[postgis-devel] LWGEOM -- initial version ready for testing

Fri May 7 09:40:07 PDT 2004

Mark Cave-Ayland wrote:

> Ah I see, although this wasn't what I was thinking when I wrote the
> email! I was thinking that, since a point is effectively its own bounding
> box, it would be a waste of space to store the same information in the
> bounding box that the point already contains.

True, but there is still the overhead of looking at the geometry,
checking whether it's a point, constructing the BOX3D, and then
converting it to a BOX2DFLOAT4. It's not much overhead per geometry,
but it adds up in nested queries.
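To show how "it adds up" plays out, here is a minimal Python sketch. It is not LWGEOM code, all names are hypothetical, and the BOX3D-to-float conversion step is left out. It counts how often a bounding box has to be derived in a naive nested-loop comparison in two cases: when the box is recomputed on every access, and when it is computed once and stored with the geometry.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Geometry:
    kind: str                          # "POINT" or "LINESTRING" (illustrative only)
    coords: List[Tuple[float, float]]
    stored_box: Optional[Box] = None   # set once if the box is kept with the geometry


class BoxDeriver:
    """Builds boxes from coordinates and counts how often it had to."""

    def __init__(self) -> None:
        self.calls = 0

    def derive(self, g: Geometry) -> Box:
        self.calls += 1
        if g.kind == "POINT":           # a point is its own (degenerate) box
            x, y = g.coords[0]
            return (x, y, x, y)
        xs = [x for x, _ in g.coords]
        ys = [y for _, y in g.coords]
        return (min(xs), min(ys), max(xs), max(ys))

    def box(self, g: Geometry) -> Box:
        # Use the stored box if there is one, otherwise recompute it.
        return g.stored_box if g.stored_box is not None else self.derive(g)


def overlaps(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def nested_loop(n_outer: int, n_inner: int, store_box: bool) -> Tuple[int, int]:
    """Compare every outer point with every inner point; return (matches, derivations)."""
    d = BoxDeriver()

    def make(x: float) -> Geometry:
        g = Geometry("POINT", [(x, 0.0)])
        if store_box:
            g.stored_box = d.derive(g)  # paid once, when the row is written
        return g

    outer = [make(float(i)) for i in range(n_outer)]
    inner = [make(float(j)) for j in range(n_inner)]
    matches = sum(overlaps(d.box(a), d.box(b)) for a in outer for b in inner)
    return matches, d.calls


if __name__ == "__main__":
    print(nested_loop(100, 100, store_box=False))  # (100, 20000)
    print(nested_loop(100, 100, store_box=True))   # (100, 200)
```

With 100 x 100 pairs, recomputing the box costs two derivations per comparison. Storing it costs one derivation per row, paid once up front.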

> My only concern would be that LWGEOM would fail an OGC regression test that
> used data up to the full precision of a double. I also know that some of
> our lat/long datasets have very high precision because they were geocoded
> against high-resolution raster imagery. Although with LWGEOM, I thought
> that internally everything would still be a double except for the
> bounding boxes in the GiST index? Then again, thinking about it now, I
> suppose we would need the 'true' double-precision bounding box stored in
> the actual geometry for the RECHECK operator.

None of these operators are in the OGC spec, so an OGC regression test
wouldn't exercise them. The ones I'm talking about are "&&" and the
almost-never-used ones like "<&". Personally, I'd like to see all but &&
and maybe contains/contained removed.

If you call intersects(g1, g2), it uses the actual double-precision
coordinates.
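To give a sense of scale for Mark's precision worry: a BOX2DFLOAT4 holds single-precision (FLOAT4) values, while intersects() works on doubles. The Python sketch below uses only the standard `struct` module to emulate float32. The helper names `to_f32` and `f32_spacing` and the sample longitude are hypothetical. It measures the gap between adjacent float32 values near longitude 180.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a double to the nearest float32, as storing it in a FLOAT4 would."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_spacing(x: float) -> float:
    """Distance from to_f32(x) to the next larger float32 (positive, finite x)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    upper = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return upper - to_f32(x)


if __name__ == "__main__":
    lon = 179.1234567891            # hypothetical high-precision longitude (degrees)
    print(abs(to_f32(lon) - lon))   # float32 rounding error, at most half the spacing
    print(f32_spacing(180.0))       # 1.52587890625e-05 degrees between float32 values
    print(math.ulp(180.0))          # spacing of doubles near 180 (2**-45), Python 3.9+
```

A gap of 2^-16 degrees is roughly 1.7 m of longitude at the equator, far coarser than the double spacing of 2^-45 degrees. The float box can therefore only act as a coarse first-stage filter. The exact answer comes from the double-precision coordinates.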

The idea behind the operators is to support two-stage queries:

SELECT ... FROM <table> WHERE g && <geometry>
                          AND  <actual function>;

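As long as the first stage (&&) gives you correct results (no false
negatives), the 2nd stage will always work. Because of the way I
construct the box2d, the && operator will never give you a false
negative. It may let a few extra rows through, but the second stage
filters those out.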
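The message doesn't say exactly how the box2d is built, so what follows is an assumption and not LWGEOM's actual code. One way to get the no-false-negative guarantee is to round each float box outward, taking minimums down and maximums up, so the box always contains the double-precision extent. The Python sketch below emulates FLOAT4 with the standard `struct` module. `float_box`, `two_stage` and the other names are hypothetical. The sketch runs the two-stage pattern (a float-box filter, then an exact check on doubles) and compares outward rounding with plain round-to-nearest.

```python
import struct
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def to_f32(x: float) -> float:
    """Nearest float32 to x, like a C (float) cast; OverflowError if out of range."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def step_f32(f: float, direction: int) -> float:
    """Neighbouring float32 of f toward +inf (direction > 0) or -inf (direction < 0)."""
    if f == 0.0:
        tiny = struct.unpack("<f", struct.pack("<I", 1))[0]  # smallest positive subnormal
        return tiny if direction > 0 else -tiny
    bits = struct.unpack("<I", struct.pack("<f", f))[0]
    # IEEE 754 is sign-magnitude: adding 1 to the bit pattern moves away from zero.
    bits += 1 if (f > 0) == (direction > 0) else -1
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_down(x: float) -> float:
    """Largest float32 that is <= x."""
    f = to_f32(x)
    return step_f32(f, -1) if f > x else f


def f32_up(x: float) -> float:
    """Smallest float32 that is >= x."""
    f = to_f32(x)
    return step_f32(f, +1) if f < x else f


def double_box(coords: Sequence[Point]) -> Box:
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def float_box(coords: Sequence[Point], outward: bool = True) -> Box:
    """Box stored as float32 values; outward=True guarantees it contains double_box."""
    xmin, ymin, xmax, ymax = double_box(coords)
    if outward:
        return (f32_down(xmin), f32_down(ymin), f32_up(xmax), f32_up(ymax))
    return (to_f32(xmin), to_f32(ymin), to_f32(xmax), to_f32(ymax))


def overlaps(a: Box, b: Box) -> bool:
    """Box overlap test, the analogue of &&."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def two_stage(rows: List[Sequence[Point]], query: Sequence[Point],
              exact: Callable[[Sequence[Point], Sequence[Point]], bool],
              outward: bool = True) -> List[Sequence[Point]]:
    """Stage 1: cheap float-box filter. Stage 2: exact predicate on doubles."""
    qbox = double_box(query)  # query box kept in full double precision
    return [g for g in rows if overlaps(float_box(g, outward), qbox) and exact(g, query)]


def same_point(g: Sequence[Point], q: Sequence[Point]) -> bool:
    return list(g) == list(q)


if __name__ == "__main__":
    pt = [(0.1, 0.1)]  # 0.1 is not exactly representable as a float32
    print(two_stage([pt], pt, same_point, outward=True))   # [[(0.1, 0.1)]]
    print(two_stage([pt], pt, same_point, outward=False))  # [] (a false negative)
```

With round-to-nearest, 0.1 becomes a float32 slightly larger than 0.1. The stored box then no longer touches a double-precision query box at exactly 0.1, so the row is dropped before the exact check ever runs. Outward rounding can only make boxes bigger, which costs at most a few extra candidates for the second stage.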

>>You should find that it performs a wee bit slower than PostGIS, but it
>>takes *significantly* less space.
>
> Interesting. So while it may be slightly slower to begin with, it should
> still scale better because there is less data to read off disk, no?

Well, that depends on how well your system's disk cache is working. It's
really hard to measure true (cold-cache) speed because it takes a LONG
time to be reasonably sure the cache is empty (know any way to flush the
cache under Linux?).

I'm just in the process of creating a table with 3 geometry columns in
it: a 2-point LINESTRING and two points. It's going to have 17,000,000
rows.

Under PostGIS, the table would take 7.4 GB. Under LWGEOM, it only takes
a little over 1 GB. That's several times less data to read, so you can
imagine how much faster a sequential scan would go on LWGEOM. For index
searches, the smaller table is much more likely to stay in the disk
cache, so the cache would actually help.
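Here is a back-of-the-envelope check of those figures. It rests on three assumptions: "Gb" means 10^9 bytes; 1 GB stands in for the LWGEOM size (the real table is "a little over", so its per-row figure and scan time are slightly higher); and the disk reads at a purely hypothetical 50 MB/s, which is not a measurement.

```python
ROWS = 17_000_000          # rows in the table described above
POSTGIS_GB = 7.4           # table size under PostGIS
LWGEOM_GB = 1.0            # "a little over 1 GB": a lower bound for LWGEOM
DISK_MB_PER_S = 50.0       # hypothetical sequential-read throughput, not a measurement


def bytes_per_row(gigabytes: float, rows: int = ROWS) -> float:
    """Average on-disk bytes per row, assuming 1 GB = 10**9 bytes."""
    return gigabytes * 1e9 / rows


def scan_seconds(gigabytes: float, mb_per_s: float = DISK_MB_PER_S) -> float:
    """Time to read the whole table once at a fixed, cache-cold throughput."""
    return gigabytes * 1e9 / (mb_per_s * 1e6)


if __name__ == "__main__":
    for name, gb in (("PostGIS", POSTGIS_GB), ("LWGEOM", LWGEOM_GB)):
        print(f"{name}: {bytes_per_row(gb):.0f} bytes/row, {scan_seconds(gb):.0f} s full scan")
    # PostGIS: 435 bytes/row, 148 s full scan
    # LWGEOM: 59 bytes/row, 20 s full scan
```

Whatever throughput you plug in, the ratio between the two cold scans is just the ratio of table sizes, which is a little under 7.4x here. A warm disk cache narrows the gap, which is why timing this fairly is hard.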

If the LWGEOM stuff were a little better tested, I'd use it, but it takes
4 days of processing to build the table, so I don't want to screw it up.

dave