[postgis-devel] LWGEOM -- inital version ready for testing

Tue May 4 10:34:39 PDT 2004

Mark Cave-Ayland wrote:

> Firstly a big well done for the work you've put into the LWGEOM - I had
> hoped to have taken a more active role in developing the code but
> unfortunately my time has been needed on other things :)

Taking a look at it will help me a lot.

> Anyway, haven't seen many responses from the other guys on the list so

Ya - I wasnt feeling The Love...

> 2. At this stage it think it would be useful to include some sort of
> debugging function to 
>    dump the LWGEOM format into a human-readable form, for example:
> 
> 	> SELECT raw_lwgeom(geom) FROM geomtable;
> 	
> 	| size | has_srid | has_bbox | dimensionality | wkb_type |
> wkt_geom | wkb_geom | 
> 	
> +------+----------+----------+----------------+----------+----------+---
> -------+
> 
> 
>    I can see this being useful for helping users in situations where
> they are not sure 
>    exactly how they have configured the particular geometry in the
> table. What do you
>    think?

I'll see what I can do.  Its a bit tricky to have a single command 
return multiple columns of data, but I can easily make 
lwgeom_rawinfo_size()-type functions and a summary function that would 
give a text version of all the above info.

> 3. Should we prevent users from adding bounding boxes to point columns?
> (i.e. is the 
>    single/double precision conversion fast enough to make this a waste
> of disk space?)

I think we should make a decision on if we're going to always have 
bbounding box automatically added or not.

For most people, not having bounding boxes is the "best" option - the 
geometries are small, and simple queries arent noticably slower.

For queries like:
SELECT * FROM <table> WHERE lwgeom && '<geom>';

You will not miss the bounding boxes inside the geometries because it 
will be looking at the pre-generated bounding boxes in the index. 
Unfortunately, because of the way GiST does its searching this is 
actually faster:

SELECT * FROM <table> WHERE lwgeom && AddBBox('<geom>');

Because GiST will ask for the bounding box of the search geometry many 
many times during the index scan (once for every level in the tree, then 
once for each tuple in the index leaf [about 140]).  You'll probably not 
notice a speed difference as it usually just a few milliseconds.  I 
tried to get GiST to pre-cache the bounding box of the search geometry, 
but I havent been able to do it - its a bit silly.

When you start cross-joining tables - a query that does a lot of 
sub-index scans, adding the bounding box significantly improves 
performance.  Crossing a 10,000 row table with itself takes about 2 
second when there's bounding boxes but about 20 seconds when there's not.

> 4. I think it would be useful to include BBOXes in LWGEOM by default. My
> thinking here 
>    would be that those users who would be knowlegable enough to be
> concerned about the 
>    space saving will more than likely to be knowledgable enough to
> remove it where as new 
>    users may get confused when certain queries that used to perform well
> in PostGIS 
>    perform poorly using LWGEOM.

This is a good point.

> 5. As far as I can see, assuming a non-index scan, the LWGEOM operators
> call the box2d_* 
>    functions directly which is defined using float4s. It looks like this
> is contrary to 
>    the OGC spec since all coordinates (and therefore I would guess
> operators) are defined 
>    as doubles? :(.
> 
>    I guess that we would need to maintain a box2d type which uses
> doubles as well as 
>    floats and use this for all the LWGEOM operators/functions (the box2d
> float4 would  
>    still be used for the indexes). Here it would be compulsory to add
> RECHECK to the 
>    operator classes since when expanding the box2d(double) to
> box2d(float) extra 
>    geometries may be returned by an overlap calculation. The RECHECK
> would ensure that 
>    these would be stripped out before the result set was returned.

None of the LWGEOM operators are defined by the OGC - they're there 
because the GiST index needs them. When you do a "<geom1> && <geom2>", 
you should actually be calling the GEOS "intersects(geom1,geom2)".

I must admit that the only operator I've actually ever used is the "&&". 
  The way BOX2Ds are formed, you'll always get an 'appropriate' answer.

Its a bit more complex for the other operators, but you'll usually get 
the correct answer.

If you want to do things in double-precision, you can create 
double-precision bounding boxes (BOX3D) from lwgeoms "box3d(lwgeom)".

I understand you point, but I think it's a lot of work (mostly 
computation) to do things in double when the single-precision results 
are "good enough".

If people feel strongly on this, it isnt difficult to make the change - 
but it will have to compute the double-precision bounding box every time 
since there's no way to pre-compute it.

> 6. It looks like you can add bboxes and srids to a geometry but not
> remove them? 

Ya - I need to add this ability.  I'll add it once we decide if we're 
going to have bounding boxes by default or not.

>Also if I 
>    follow the instructions in the README to add bounding boxes to a
> column:
> 
>    DROP INDEX <lwgeom index name>;
>    UPDATE <table> SET <lwgeom column> = AddBBOX(<lwgeom column>);
>    CREATE INDEX <lwgeom index name> ON <table> USING GIST 
>      (<lwgeom column> GIST_LWGEOM_OPS);
>    VACUUM ANALYSE <table>;
> 
>    That would seem to work great. My question is now if I add more
> geometries to the    
>    table, would these new geometries not have a bounding box attached?
> Would the only way   
>    to add bounding boxes to these geometries be by running the update
> query and reindexing 
>    again?

Yes - this is a problem.  You'll have a mix of geometries with and 
without bounding boxes.

> Anyway glad to see that this stuff is becoming a reality, and I hope
> that this will get some discussion started on the list about your work
> so far :) If I get a chance I will try and setup a test database with
> some sample data and see how it performs....

You should find that it performs a wee bit slower than postgis, but it 
takes *significantly* less space.

To find out how "big" things are:

1) vacuum analyse; --- very important!
2) SELECT relname, relpages FROM pg_class ORDER BY relpages DESC; 	
		-- gives size in disk pages (probably 8k)
You'll notice that there will often be a toast table associated with a 
geometry table if you have "large" geometries.

Thanks for the comments, looks like theres a few things to change.

dave