[postgis-users] Light Weight Geometry (LWGEOM) Proposal

Ralph Mason ralph.mason at telogis.com
Mon Feb 23 12:10:47 PST 2004


David Blasby wrote:

> There's been some discussion; I thought I'd put in a few words.
>
> The bounding box inside the geometry is just there for a speed
> improvement.  I would only put it in "large" geometries - and then it's
> likely not much of a speed improvement (compare the time it takes to
> pull a large geometry from disk with the time it takes to compute its
> bounding box).  It might take a few tenths of a millisecond to compute
> the bounding box of a 100,000 point polygon, but several hundred
> milliseconds to pull it from disk/TOAST.
>
> It should be calculated in double precision, since storing it in
> single precision (float4) is only going to save you 16 bytes on an
> 8kb+ geometry (0.2%).
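
As a rough illustration of the two figures quoted above, here is a minimal
Python sketch. It is not PostGIS code: the function name bounding_box and
the sample points are made up for illustration, and it assumes a box is
four coordinates and that "8kb" means 8192 bytes. It computes a 2D
bounding box in one pass over the points and reproduces the 16 byte /
0.2% storage estimate.

```python
import struct

def bounding_box(points):
    """Return (xmin, ymin, xmax, ymax) for a non-empty sequence of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

# A box is four coordinates: xmin, ymin, xmax, ymax.
box_as_doubles = struct.calcsize("<4d")  # four float64 values
box_as_floats = struct.calcsize("<4f")   # four float32 values
saving = box_as_doubles - box_as_floats  # bytes saved per stored box

# Assumption: "8kb" means 8192 bytes.
saving_percent = 100.0 * saving / 8192

print(bounding_box([(0.0, 0.0), (2.0, 3.0), (-1.0, 5.0)]))
print(saving, round(saving_percent, 1))
```

The box computation is linear in the number of points, which is why it
stays cheap next to a disk read of the same geometry.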

How is the decision made whether or not to store a box for a given
geometry?

> For sequential scans, you'll either have a large table (in which case 
> your disk cache is unlikely to prevent you from reading the whole 
> table each time - meaning bounding box computation is MUCH smaller 
> than pulling from the disk) or a small table (in which case the 
> overhead of computing the bounding boxes is small since there are so
> few points in your geometries).
>
> Currently, I'm leaning towards the "never pre-calculate the bounding
> box in the geometry" option.  Of course, the index will have the
> BOX2DFLOAT4 pre-calculated.
>
> I think the pg_dump format of the geometry should be WKB (much like it
> is now for the WKB type).  Since we might be putting SRIDs in, we
> could have the canonical form look like:
>
> 'SRID=123;01010000000000000000D9BE40000000A0DF687641'
>
> I like the WKB form for dumps (and for pretty much everything except a
> person looking at it) because you don't have to worry about any numeric
> drift.
> We'll still have conversion to and from WKT for easy readability.
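
To make the proposed form concrete, here is a small Python sketch that
splits the 'SRID=...;' prefix from the hex string in the example above
and decodes the point. The helper name parse_srid_wkb_point is
hypothetical, and the sketch assumes the standard WKB point layout (one
byte-order byte, a uint32 geometry type, then two doubles); it handles
nothing but that case.

```python
import struct

def parse_srid_wkb_point(text):
    """Split 'SRID=<n>;<hex WKB>' and decode a 2D WKB point.

    Hypothetical helper: only the point case from the example is handled.
    """
    prefix, hex_wkb = text.split(";", 1)
    key, value = prefix.split("=", 1)
    if key != "SRID":
        raise ValueError("expected an SRID= prefix")
    raw = bytes.fromhex(hex_wkb)
    # First byte is the byte-order flag: 1 = little-endian, 0 = big-endian.
    order = "<" if raw[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(order + "I", raw, 1)
    if geom_type != 1:
        raise ValueError("only WKB points (type 1) are handled here")
    x, y = struct.unpack_from(order + "2d", raw, 5)
    return int(value), x, y

srid, x, y = parse_srid_wkb_point(
    "SRID=123;01010000000000000000D9BE40000000A0DF687641"
)
print(srid, x, y)

# Re-encoding the doubles gives back the same bytes: no numeric drift.
assert struct.pack("<2d", x, y).hex().upper() == "0000000000D9BE40000000A0DF687641"
```

Because the coordinates travel as raw IEEE doubles, a dump and reload
returns the same bits, which a decimal text form only guarantees if
enough digits are printed.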
>
> Other people might not like this, so I'd be willing to have the 
> "normal" form be WKT. But I think the pg_dump format should be WKB.
>
>
> On datatypes, I'm reluctant to use anything but doubles.  I realize
> that going to int32s or float32s could save us just under 50% storage,
> so I'm sympathetic.  There is enough space left in the type byte
> (especially if we remove the bounding box flag) to put in something like:
> 00   - float64
> 01   - int64
> 10   - float32
> 11   - int32
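
A sketch of how such a two-bit flag could sit in the type byte, in
Python for brevity. The bit positions are an assumption made for
illustration (top two bits for the coordinate storage, low six bits for
the geometry type); the message only says there is room in the byte, not
where the bits would go. All names here are hypothetical.

```python
# Two-bit codes from the list above.
COORD_FLOAT64 = 0b00
COORD_INT64 = 0b01
COORD_FLOAT32 = 0b10
COORD_INT32 = 0b11

# Assumed layout: bits 6-7 hold the coordinate code, bits 0-5 the geometry type.
COORD_SHIFT = 6
COORD_MASK = 0b11 << COORD_SHIFT
GEOM_MASK = 0xFF & ~COORD_MASK

def pack_type_byte(geom_type, coord_kind):
    """Combine a geometry type and a coordinate code into one byte value."""
    if not 0 <= geom_type <= GEOM_MASK:
        raise ValueError("geometry type does not fit in the low bits")
    if not 0 <= coord_kind <= 0b11:
        raise ValueError("coordinate code must fit in two bits")
    return (coord_kind << COORD_SHIFT) | geom_type

def unpack_type_byte(type_byte):
    """Return (geometry type, coordinate code) from a packed byte value."""
    return type_byte & GEOM_MASK, (type_byte & COORD_MASK) >> COORD_SHIFT

packed = pack_type_byte(1, COORD_FLOAT32)  # a point stored as float32
print(bin(packed), unpack_type_byte(packed))
```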
>
> The serialized form (and actual analysis) could still be doubles (THIS 
> HAS IMPLICATIONS), and you could change the format like:

After writing the previous message (at 1 am NZ time) I realized how
pointless it is to try to bundle all those types into a single
geometry field. It makes things complicated and the data bigger. So I
propose this (and it doesn't hold up going ahead with the double version):

LW_GEOM - Uses doubles. Totally compliant.
LW_GEOMF4 - Uses floats (same internal structure as LW_GEOM but with floats)
LW_GEOMI4 - Uses int32s (same internal structure as LW_GEOM but with int32s)
etc.

The internal structure of each type is exactly the same except the data 
types change.  You just create a table with a LW_GEOMI4 column or an 
LW_GEOM column. So it's a per column choice (which is the only way it 
makes sense really).

For implementation these could perhaps be templated so that a single
source can generate them all.
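
Something like the following Python sketch shows the idea of one source
serving every variant. The byte layout (a uint32 point count followed by
the coordinates) is invented for the example and is not the LWGEOM
layout; the mapping from type names to struct format codes and the
function names are likewise assumptions.

```python
import struct

# Hypothetical mapping from the proposed type names to struct format codes.
COORD_FORMATS = {
    "LW_GEOM": "d",    # float64
    "LW_GEOMF4": "f",  # float32
    "LW_GEOMI4": "i",  # int32
}

def serialize_points(type_name, points):
    """Pack (x, y) pairs as a uint32 point count followed by the coordinates."""
    code = COORD_FORMATS[type_name]
    flat = [c for pt in points for c in pt]
    return struct.pack("<I%d%s" % (len(flat), code), len(points), *flat)

def deserialize_points(type_name, data):
    """Inverse of serialize_points for the same type name."""
    code = COORD_FORMATS[type_name]
    (count,) = struct.unpack_from("<I", data, 0)
    flat = struct.unpack_from("<%d%s" % (2 * count, code), data, 4)
    return list(zip(flat[0::2], flat[1::2]))

line = [(0, 0), (10, 20), (30, 40)]
sizes = {name: len(serialize_points(name, line)) for name in COORD_FORMATS}
print(sizes)
```

Only the format code changes between the variants; the structure and the
code path are shared, and the 4 byte types come out at roughly half the
size of the double version.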

The nice bit about this is you can make WKB from LW_GEOM and an
extended WKB from the others.  They can all generate WKT, which is a
simple go-between for them.  So you can then just cast from one type to
another when you pull it out, or use it between different table
structures.  Seems nice and simple.

It also means less checking and overloading of datatypes in the code (a
function that works with LW_GEOM only gets those) - thus fewer bugs.

>
> UPDATE <table> SET mygeom = lggeom_tofloat32(mygeom);
>
>    NOTE: the WKB representation (and the pg_dump format) will be
>          converting all these floats to doubles.
>
> I think we're going to cause more confusion and precision issues than
> we'll gain in space savings.  I don't think this is something we'll do
> right away, but I can leave space available in the type and design to
> account for it.
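
The precision concern is easy to see with a quick Python sketch. The
coordinate value and the helper name to_float32 are made up for
illustration. Above 2^24 a float32 can only represent every second
integer, so a large projected coordinate loses its fractional part when
narrowed, while widening a float32 back to a double (as the WKB output
would do) is exact.

```python
import struct

def to_float32(value):
    """Round a Python float (a double) to the nearest float32 and widen it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

y = 23498234.123       # made-up coordinate with sub-unit detail
y32 = to_float32(y)    # what a float32 column would keep
error = abs(y - y32)   # detail lost by narrowing

# Widening float32 -> double is exact, so narrowing again changes nothing.
assert to_float32(y32) == y32

print(y32, error)
```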
>
>
> Yes, there could possibly be problems in the LWGEOM->WKB->POSTGIS and
> back conversions.  There shouldn't be any (if any of those
> conversions isn't 100% bug-free we're in real trouble anyway), but it
> is making things complex.  Ideally, we'll be transferring all the
> PostGIS functionality to the new type over time.
> Most of the PostGIS code is 'simple' - the only core portion that's
> difficult is the WKT parser/outputter (I'd love to have a brand new one
> for the LWGEOM).  The other complexities are in functionality like
> GEOS, spherical calculations, projection, and distance() - reworking
> them for LWGEOM shouldn't be difficult.
>
> dave 

I really could see the current geometry being fully redundant.

Ralph


