[postgis-devel] LWGEOM -- initial version ready for testing

Fri May 7 01:52:45 PDT 2004

Hi Dave,

> Ya - I wasn't feeling The Love...

Yeah, I'm quite surprised that no-one else has commented on this
either...?

> I'll see what I can do.  It's a bit tricky to have a single command
> return multiple columns of data, but I can easily make
> lwgeom_rawinfo_size()-type functions and a summary function that would
> give a text version of all the above info.

Cool.

> > 3. Should we prevent users from adding bounding boxes to point
> >    columns? (i.e. is the single/double precision conversion fast
> >    enough to make this a waste of disk space?)
> 
> I think we should make a decision on whether we're always going to
> have a bounding box added automatically or not.
> 
> For most people, not having bounding boxes is the "best" option - the 
> geometries are small, and simple queries aren't noticeably slower.
> 
> For queries like:
> SELECT * FROM <table> WHERE lwgeom && '<geom>';
> 
> You will not miss the bounding boxes inside the geometries because it 
> will be looking at the pre-generated bounding boxes in the index. 
> Unfortunately, because of the way GiST does its searching this is 
> actually faster:
> 
> SELECT * FROM <table> WHERE lwgeom && AddBBox('<geom>');
> 
> Because GiST will ask for the bounding box of the search geometry many
> many times during the index scan (once for every level in the tree,
> then once for each tuple in the index leaf [about 140]).  You'll
> probably not notice a speed difference as it is usually just a few
> milliseconds.  I tried to get GiST to pre-cache the bounding box of the
> search geometry, but I haven't been able to do it - it's a bit silly.
> 
> 
> When you start cross-joining tables - a query that does a lot of
> sub-index scans - adding the bounding box significantly improves
> performance.  Crossing a 10,000 row table with itself takes about 2
> seconds when there are bounding boxes but about 20 seconds when there
> are not.
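
To put rough numbers on the point above, here is a back-of-envelope
sketch in Python of how many times the search geometry's bounding box
would be derived during a self-join when it is not stored or cached.
The tree depth of 3 is purely an assumption for illustration, as is the
idea that each scan visits a single leaf; the ~140 tuples per leaf and
the 10,000 rows are the figures quoted above.

```python
# Back-of-envelope count of bounding-box derivations during index scans.
LEAF_TUPLES = 140       # tuples per index leaf, figure quoted above
ROWS = 10_000           # rows in the self-joined table, figure quoted above
ASSUMED_TREE_DEPTH = 3  # hypothetical GiST tree depth, not from the email


def bbox_requests_per_scan(tree_depth, leaf_tuples):
    """One request per tree level, then one per tuple in the leaf visited."""
    return tree_depth + leaf_tuples


def bbox_requests_for_self_join(rows, tree_depth, leaf_tuples):
    """A self-join runs one index scan per outer row."""
    return rows * bbox_requests_per_scan(tree_depth, leaf_tuples)


per_scan = bbox_requests_per_scan(ASSUMED_TREE_DEPTH, LEAF_TUPLES)
total = bbox_requests_for_self_join(ROWS, ASSUMED_TREE_DEPTH, LEAF_TUPLES)
print(per_scan, total)
```

Under these assumptions a single lookup pays for a little over a hundred
derivations, while the self-join pays for more than a million, which
would be consistent with the cost only becoming visible on joins.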

Ah I see, although this wasn't what I was thinking when I wrote the
email! I was thinking that, since a point is its own bounding box, it
would be a waste of space to store the same information again in a
separate bounding box.
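
A rough size comparison for that case, as a Python sketch. It assumes a
2D point stored as two doubles and a bounding box stored as four float4s
(xmin, ymin, xmax, ymax), and it ignores any type byte, SRID or other
header, since the exact on-disk layout is not given in this thread.

```python
import struct

# Assumed layouts (headers ignored): point = 2 doubles, box = 4 float4s.
POINT_2D = struct.Struct("<2d")
BOX2D_FLOAT4 = struct.Struct("<4f")


def point_bbox(x, y):
    """The bounding box of a point is degenerate: min and max coincide."""
    return (x, y, x, y)


point_bytes = POINT_2D.size
box_bytes = BOX2D_FLOAT4.size
print(point_bytes, box_bytes, point_bytes + box_bytes)
print(point_bbox(-4.1, 50.4))
```

Under these assumptions the box holds nothing that cannot be derived
from the two coordinates, yet it would double the coordinate payload of
every point.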

(cut)

> > 5. As far as I can see, assuming a non-index scan, the LWGEOM
> >    operators call the box2d_* functions directly, which are defined
> >    using float4s. It looks like this is contrary to the OGC spec
> >    since all coordinates (and therefore I would guess operators) are
> >    defined as doubles? :(.
> >
> >    I guess that we would need to maintain a box2d type which uses
> >    doubles as well as floats and use this for all the LWGEOM
> >    operators/functions (the box2d float4 would still be used for the
> >    indexes). Here it would be compulsory to add RECHECK to the
> >    operator classes since when expanding the box2d(double) to
> >    box2d(float) extra geometries may be returned by an overlap
> >    calculation. The RECHECK would ensure that these would be stripped
> >    out before the result set was returned.
> 
> None of the LWGEOM operators are defined by the OGC - they're there
> because the GiST index needs them. When you do a "<geom1> && <geom2>",
> you should actually be calling the GEOS "intersects(geom1,geom2)".
> 
> I must admit that the only operator I've actually ever used is the
> "&&".  The way BOX2Ds are formed, you'll always get an 'appropriate'
> answer.
> 
> It's a bit more complex for the other operators, but you'll usually get
> the correct answer.
> 
> If you want to do things in double-precision, you can create 
> double-precision bounding boxes (BOX3D) from lwgeoms "box3d(lwgeom)".
> 
> I understand your point, but I think it's a lot of work (mostly
> computation) to do things in double when the single-precision results
> are "good enough".
> 
> If people feel strongly on this, it isn't difficult to make the change -
> but it will have to compute the double-precision bounding box every
> time since there's no way to pre-compute it.

My only concern would be that LWGEOM would fail an OGC regression test that
used data up to the full precision of a double. I also know that some of
our lat/long datasets can have high precision because they were geocoded
against hi-resolution raster imagery. Although with LWGEOM, I thought
that internally everything would still be a double except for the
bounding boxes in the GiST index? Then again, thinking about it now, I
suppose we would need the 'true' double precision bounding box stored in
the actual geometry for the RECHECK operator.
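
To illustrate the RECHECK idea, here is a minimal Python sketch of two
double-precision boxes that do not overlap, but whose float4 versions do
once each edge is rounded outwards. All function names are made up for
illustration and are not PostGIS functions; the outward rounding is my
assumption about how a double box would be expanded to a float box.

```python
import struct


def to_f32(x):
    """Round a double to the nearest float4 value (finite, in range)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def step_f32(x, up):
    """Return the float4 value adjacent to x (x must be a float4 value)."""
    if x == 0.0:
        bits = 0x00000001 if up else 0x80000001
    else:
        bits = struct.unpack("<I", struct.pack("<f", x))[0]
        # For positive values a larger bit pattern is a larger number;
        # for negative values it is the other way round.
        bits += 1 if (x > 0) == up else -1
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def floor_f32(x):
    """Largest float4 value that is <= x."""
    r = to_f32(x)
    return r if r <= x else step_f32(r, up=False)


def ceil_f32(x):
    """Smallest float4 value that is >= x."""
    r = to_f32(x)
    return r if r >= x else step_f32(r, up=True)


def expand_to_float_box(box):
    """Round (xmin, ymin, xmax, ymax) outwards so it still covers the box."""
    xmin, ymin, xmax, ymax = box
    return (floor_f32(xmin), floor_f32(ymin), ceil_f32(xmax), ceil_f32(ymax))


def overlaps(a, b):
    """Closed-interval box overlap test."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


a = (0.0, 0.0, 1.0000000001, 1.0)
b = (1.0000000002, 0.0, 2.0, 1.0)

index_hit = overlaps(expand_to_float_box(a), expand_to_float_box(b))
recheck_hit = overlaps(a, b)  # the double-precision recheck
print(index_hit, recheck_hit)
```

The float boxes report an overlap while the double boxes do not, so a
recheck against the double-precision box would be what strips the extra
row from the result.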

> You should find that it performs a wee bit slower than postgis, but it
> takes *significantly* less space.

Interesting. So while it may be slightly slower to begin with, it should
still scale better for the reason that there is less data to pull off
disk, no?

Cheers,

Mark.

---

Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446
