[postgis-users] Light Weight Geometry (LWGEOM) Proposal

David Blasby dblasby at refractions.net
Mon Feb 23 10:56:59 PST 2004


There's been some discussion; I thought I'd put in a few words.

The bounding box inside the geometry is just there for a speed 
improvement.  I would only put it in "large" geometries - and then it's 
likely not much of a speed improvement (look at the time it takes to 
pull a large geometry from disk relative to the time it takes to compute 
its bounding box).  It might take a few tenths of a millisecond to 
compute the bounding box of a 100,000 point polygon, but several hundred 
milliseconds to pull it from disk/TOAST.
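
To illustrate why the computation is cheap: a bounding box is a single 
linear pass over the points, with four comparisons per point.  A minimal 
sketch in Python (not PostGIS code; the function name is made up for 
illustration):

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def bounding_box(points: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Return (xmin, ymin, xmax, ymax) after one pass over the points."""
    it = iter(points)
    try:
        x, y = next(it)
    except StopIteration:
        raise ValueError("an empty geometry has no bounding box")
    xmin = xmax = x
    ymin = ymax = y
    for x, y in it:
        # Four comparisons per point, no extra memory.
        if x < xmin:
            xmin = x
        if x > xmax:
            xmax = x
        if y < ymin:
            ymin = y
        if y > ymax:
            ymax = y
    return xmin, ymin, xmax, ymax
```

The cost grows linearly with the number of points, which is why it stays 
small next to the cost of reading the same points from disk.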

It should be calculated in double precision, since storing it in single 
precision (float4) is only going to save you 16 bytes on an 8kb+ 
geometry (0.2%).
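
The arithmetic behind that figure, as a small Python check.  It assumes a 
2D box of four values (xmin, ymin, xmax, ymax) and a geometry of exactly 
8 kB; the variable names are only for illustration:

```python
import struct

# A 2D box holds four coordinates: xmin, ymin, xmax, ymax.
double_box_bytes = struct.calcsize("<4d")  # four 8-byte doubles
float_box_bytes = struct.calcsize("<4f")   # four 4-byte floats

saved_bytes = double_box_bytes - float_box_bytes
geometry_bytes = 8 * 1024                  # assumed size: an 8 kB geometry
saving_percent = 100.0 * saved_bytes / geometry_bytes

print(saved_bytes, round(saving_percent, 1))
```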

For sequential scans, you'll either have a large table (in which case 
your disk cache is unlikely to prevent you from reading the whole table 
each time - meaning bounding box computation is MUCH smaller than 
pulling from the disk) or a small table (in which case the overhead of 
computing the bounding boxes is small since there are so few points in 
your geometries).

Currently, I'm leaning towards the "never pre-calculate the bounding box 
in the geometry" option.  Of course, the index will have the BOX2DFLOAT4 
pre-calculated.

I think the pg_dump format of the geometry should be WKB (much like it 
is now for the WKB type).  Since we might be putting SRIDs in, we could 
have their canonical form look like:

'SRID=123;01010000000000000000D9BE40000000A0DF687641'
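
A rough sketch of how a client could read that form, in Python.  This is 
not PostGIS code: the function name is hypothetical and it only handles a 
2D WKB Point (byte-order flag, 4-byte type code, two doubles).

```python
import struct
from typing import Optional, Tuple


def parse_srid_hexwkb_point(text: str) -> Tuple[Optional[int], float, float]:
    """Split an optional 'SRID=n;' prefix from hex WKB and decode a Point."""
    srid = None
    if text.upper().startswith("SRID="):
        prefix, _, text = text.partition(";")
        srid = int(prefix[len("SRID="):])

    raw = bytes.fromhex(text)
    # WKB byte-order flag: 0 = big endian (XDR), 1 = little endian (NDR).
    if raw[0] not in (0, 1):
        raise ValueError("bad WKB byte-order flag")
    order = "<" if raw[0] == 1 else ">"

    (geom_type,) = struct.unpack_from(order + "I", raw, 1)
    if geom_type != 1:
        raise ValueError("this sketch only handles WKB type 1 (Point)")

    x, y = struct.unpack_from(order + "2d", raw, 5)
    return srid, x, y


srid, x, y = parse_srid_hexwkb_point(
    "SRID=123;01010000000000000000D9BE40000000A0DF687641"
)
print(srid, x, y)
```

The coordinates come back bit-for-bit as they were stored, since the hex 
digits are just the bytes of the doubles.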

I like the WKB form for dumps (and for pretty much everything except 
reading by a person) because you don't have to worry about any numeric 
drift.  We'll still have conversion to and from WKT for easy readability.
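
A small Python illustration of the drift in question.  It assumes a text 
output that prints 15 significant digits (an assumption made for this 
example, not a statement about what any particular WKT writer does) and 
compares it with a hex dump of the raw double:

```python
import struct

value = 0.1 + 0.2  # a double that needs 17 significant digits to print exactly

# Text round trip with 15 significant digits (assumed output precision).
as_text = f"{value:.15g}"
from_text = float(as_text)

# Binary round trip: hex-encode the 8 bytes of the double, then decode.
as_hex = struct.pack("<d", value).hex()
(from_hex,) = struct.unpack("<d", bytes.fromhex(as_hex))

print(as_text, from_text == value)  # text form loses the last bits
print(as_hex, from_hex == value)    # hex form is exact
```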

Other people might not like this, so I'd be willing to have the "normal" 
form be WKT. But I think the pg_dump format should be WKB.


On datatypes, I'm reluctant to use anything but doubles.  I realize 
that going to int32s or float32s could save us just under 50% storage, 
so I'm sympathetic.  There is enough space left in the type byte (esp. if 
we remove the bounding box flag) to put in something like:

00   - float64
01   - int64
10   - float32
11   - int32
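
As a sketch of how two such bits could sit in the type byte (Python; the 
bit position and all names here are hypothetical, nothing is decided 
about the actual layout):

```python
import struct

# 2-bit code -> (name, struct format character), following the table above.
COORD_FORMATS = {
    0b00: ("float64", "d"),
    0b01: ("int64", "q"),
    0b10: ("float32", "f"),
    0b11: ("int32", "i"),
}

COORD_SHIFT = 6                  # hypothetical: use the top two bits
COORD_MASK = 0b11 << COORD_SHIFT


def set_coord_format(type_byte: int, code: int) -> int:
    """Return type_byte with its two coordinate-format bits set to code."""
    if code not in COORD_FORMATS:
        raise ValueError("coordinate format code must be 0..3")
    return (type_byte & ~COORD_MASK & 0xFF) | (code << COORD_SHIFT)


def coord_size(type_byte: int) -> int:
    """Bytes per ordinate implied by the format bits in type_byte."""
    code = (type_byte & COORD_MASK) >> COORD_SHIFT
    return struct.calcsize("<" + COORD_FORMATS[code][1])


point_type = 0x01                                   # low bits: geometry type
as_float32 = set_coord_format(point_type, 0b10)
print(coord_size(point_type), coord_size(as_float32))
```

Halving the bytes per ordinate is where the "just under 50%" comes from: 
the coordinates shrink by half while the header bytes stay the same.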

The serialized form (and actual analysis) could still be doubles (THIS 
HAS IMPLICATIONS), and you could change the format like:

UPDATE <table> SET mygeom = lggeom_tofloat32(mygeom);

    NOTE: the WKB representation (and the pg_dump format) will be
          converting all these floats to doubles.
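
To make the precision implication concrete, here is a Python sketch of 
what happens to one ordinate when it is stored as float32 and later 
widened back to a double.  The sample longitude and the helper name are 
made up for illustration:

```python
import struct


def through_float32(value: float) -> float:
    """Store a double as a 4-byte float, then widen it back to a double."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


lon = -123.3656                 # sample longitude in degrees (illustrative)
stored = through_float32(lon)
drift = abs(stored - lon)

print(repr(stored), drift)
```

The widened value is a perfectly valid double, so the WKB output gives no 
hint that precision was lost earlier; values that fit in 24 bits of 
mantissa (small integers, for example) survive unchanged.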

I think we're going to cause more confusion and precision issues than 
we'll gain in space savings.  I don't think this is something we'll do 
right away, but I can leave space available in the type and design to 
account for it.


Yes, there could possibly be problems in the LWGEOM->WKB->POSTGIS and 
back translations.  There shouldn't be any (if any of those 
conversions isn't 100% bug-free we're in real trouble anyway), but it is 
making things complex.  Ideally, we'll be transferring all the PostGIS 
functionality to the new type over time.
Most of the PostGIS code is 'simple' - the only core portion that's 
difficult is the WKT parser/outputter (I'd love to have a brand new one 
for the LWGEOM).  The other complexities are in functionality like GEOS, 
spherical calculations, projection, and distance() - reworking them for 
LWGEOM shouldn't be difficult.

dave