[postgis-users] Light Weight Geometry (LWGEOM) Proposal
David Blasby
dblasby at refractions.net
Mon Feb 23 10:56:59 PST 2004
There's been some discussion; I thought I'd put in a few words.
The bounding box inside the geometry is just there for a speed
improvement. I would only put it in "large" geometries - and then it's
likely not much of a speed improvement (look at the time it takes to
pull a large geometry from disk relative to the time it takes to compute
its bounding box). It might take a few tenths of a millisecond to
compute the bounding box of a 100,000 point polygon, but several hundred
milliseconds to pull it from disk/TOAST.
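
To illustrate why the computation is cheap: a bounding box needs only one
pass over the coordinates. A minimal Python sketch, assuming a geometry is
simply a sequence of (x, y) tuples; the name bounding_box is hypothetical
and is not PostGIS code:

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def bounding_box(points: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Return (xmin, ymin, xmax, ymax) using a single pass over the points."""
    it = iter(points)
    try:
        x, y = next(it)
    except StopIteration:
        raise ValueError("an empty geometry has no bounding box") from None
    xmin = xmax = x
    ymin = ymax = y
    for x, y in it:
        # Two comparisons per ordinate; no allocation, no disk access.
        if x < xmin:
            xmin = x
        elif x > xmax:
            xmax = x
        if y < ymin:
            ymin = y
        elif y > ymax:
            ymax = y
    return xmin, ymin, xmax, ymax
```

The work is linear in the number of points, so it scales with the same
data that has to be read from disk anyway.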
It should be calculated in double precision, since storing it in single
precision is only going to save you 16 bytes on an 8kb+ geometry (0.2%).
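
The arithmetic behind that figure can be checked directly. This sketch
assumes a 2D box of four ordinates (xmin, ymin, xmax, ymax) and takes
"8kb" to mean 8192 bytes; the names are hypothetical:

```python
import struct

# Four ordinates stored as float64 versus float32.
BOX_DOUBLE_BYTES = struct.calcsize("<4d")
BOX_FLOAT_BYTES = struct.calcsize("<4f")


def box_saving_fraction(geometry_bytes: int) -> float:
    """Fraction of a geometry's size saved by storing its box as float32."""
    return (BOX_DOUBLE_BYTES - BOX_FLOAT_BYTES) / geometry_bytes


# 16 bytes saved out of 8192 is a little under 0.2%.
saving_percent = 100.0 * box_saving_fraction(8192)
```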
For sequential scans, you'll either have a large table (in which case
your disk cache is unlikely to prevent you from reading the whole table
each time - meaning the cost of computing bounding boxes is MUCH smaller
than the cost of pulling from the disk) or a small table (in which case
the overhead of computing the bounding boxes is small since there are so
few points in your geometries).
Currently, I'm leaning towards the "never pre-calculate the bounding box
in the geometry" option. Of course, the index will have the BOX2DFLOAT4
pre-calculated.
I think the pg_dump format of the geometry should be WKB (much like it
is now for the WKB type). Since we might be putting SRIDs in, we could
have the canonical form look like:
'SRID=123;01010000000000000000D9BE40000000A0DF687641'
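
For illustration, a string in that form can be taken apart as follows.
This is a sketch only: it handles just a 2D point, the function name
parse_srid_hexwkb_point is hypothetical, and it relies on the standard
WKB layout (one byte-order byte, a 4-byte type code, then the ordinates
as doubles):

```python
import struct
from typing import Optional, Tuple


def parse_srid_hexwkb_point(text: str) -> Tuple[Optional[int], float, float]:
    """Split an optional 'SRID=n;' prefix from hex WKB and decode a 2D point."""
    srid = None
    if text.upper().startswith("SRID="):
        head, sep, text = text.partition(";")
        if not sep:
            raise ValueError("missing ';' after SRID prefix")
        srid = int(head[5:])
    raw = bytes.fromhex(text)
    if raw[0] not in (0, 1):
        raise ValueError("unknown byte-order flag")
    order = "<" if raw[0] == 1 else ">"  # 1 = little endian, 0 = big endian
    (geom_type,) = struct.unpack_from(order + "I", raw, 1)
    if geom_type != 1:
        raise ValueError("this sketch only decodes WKB type 1 (Point)")
    x, y = struct.unpack_from(order + "2d", raw, 5)
    return srid, x, y


srid, x, y = parse_srid_hexwkb_point(
    "SRID=123;01010000000000000000D9BE40000000A0DF687641"
)
```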
I like the WKB form for dumps (and for pretty much everything but a
person looking at it) because you don't have to worry about any numeric
drift. We'll still have conversion to and from WKT for easy readability.
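
A small sketch of what "numeric drift" means here: the raw bytes of a
double survive a hex round trip exactly, while a decimal rendering with
a fixed number of digits may not. The 6-digit text format below is an
assumption chosen for illustration, not what any PostGIS output
function does:

```python
import struct


def via_hex(value: float) -> float:
    """Round-trip a double through the hex encoding of its 8 raw bytes."""
    hex_text = struct.pack("<d", value).hex()
    return struct.unpack("<d", bytes.fromhex(hex_text))[0]


def via_text(value: float, digits: int = 6) -> float:
    """Round-trip a double through decimal text with a fixed digit count."""
    return float(f"{value:.{digits}f}")


v = 0.1 + 0.2  # not exactly 0.3 in binary floating point
hex_is_exact = via_hex(v) == v
text_is_exact = via_text(v) == v
```

With these inputs the hex round trip returns the identical double, while
the 6-digit text round trip returns a slightly different one.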
Other people might not like this, so I'd be willing to have the "normal"
form be WKT. But I think the pg_dump format should be WKB.
On datatypes, I'm reluctant to use anything but doubles. I realize
that going to int32s or float32s could save us just under 50% storage,
so I'm sympathetic. There is enough space left in the type byte (esp. if
we remove the bounding box flag) to put in something like:
00 - float64
01 - int64
10 - float32
11 - int32
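
A sketch of how such a two-bit code could be packed into a type byte.
The codes follow the list above; the bit position (STORAGE_SHIFT) and
all names are assumptions made for illustration, since no actual layout
has been decided:

```python
import struct

# Two-bit storage code -> struct format character for one ordinate.
STORAGE_FORMATS = {0b00: "d", 0b01: "q", 0b10: "f", 0b11: "i"}

# Hypothetical position of the two bits inside the type byte.
STORAGE_SHIFT = 4
STORAGE_MASK = 0b11 << STORAGE_SHIFT


def set_storage(type_byte: int, code: int) -> int:
    """Return type_byte with its storage bits replaced by code."""
    if code not in STORAGE_FORMATS:
        raise ValueError("storage code must fit in two bits")
    return (type_byte & ~STORAGE_MASK & 0xFF) | (code << STORAGE_SHIFT)


def get_storage(type_byte: int) -> int:
    """Extract the two-bit storage code from type_byte."""
    return (type_byte & STORAGE_MASK) >> STORAGE_SHIFT


def ordinate_size(type_byte: int) -> int:
    """Bytes used per ordinate for the storage code held in type_byte."""
    return struct.calcsize("<" + STORAGE_FORMATS[get_storage(type_byte)])
```

The 32-bit codes halve the per-ordinate size (4 bytes instead of 8); any
fixed header stays the same size, which is why the overall saving is
just under 50%.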
The serialized form (and actual analysis) could still be doubles (THIS
HAS IMPLICATIONS), and you could change the format like:
UPDATE <table> SET mygeom = lggeom_tofloat32(mygeom);
NOTE: the WKB representation (and the pg_dump format) will be
converting all these floats to doubles.
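
A sketch of the precision side of this: once a double has been narrowed
to float32, widening it back to a double for WKB output does not recover
the original value. The helper name to_float32 is hypothetical:

```python
import struct


def to_float32(value: float) -> float:
    """Round a double to the nearest float32 and widen it back to a double."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Small whole numbers survive; values needing more than 24 significant
# bits do not.
exact = to_float32(7897.0)
rounded = to_float32(23498234.5)
tenth = to_float32(0.1)
```

Narrowing twice gives the same result as narrowing once, so the loss
happens only at the first conversion.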
I think we're going to cause more confusion and precision issues than
we'll gain in space savings. I don't think this is something we'll do
right away, but I can leave space available in the type and design to
account for it.
Yes, there could possibly be problems in the LWGEOM->WKB->POSTGIS and
back conversions. There shouldn't be any (if any of those
conversions isn't 100% bug-free we're in real trouble anyways), but it
is making things complex. Ideally, we'll be transferring all the PostGIS
functionality to the new type over time.
Most of the PostGIS code is 'simple' - the only core portion that's
difficult is the WKT parser/outputter (I'd love to have a brand new one
for the LWGEOM). The other complexities are in functionality like GEOS,
spherical calculations, projection, and distance() - reworking them for
LWGEOM shouldn't be difficult.
dave