[postgis-devel] Caching Double-based Boxes

Nicklas Avén nicklas.aven at jordogskog.no
Thu Nov 24 23:11:06 PST 2011


I don't know if "my" issue has any practical impact, but just so I
understand:

I have never thought about it before, but from this discussion I
understand that the boxes are stored in two places: both in the index
and together with the geometry. Am I right?

Just some very naive thoughts.
Do we have no access to the index bbox to use in the functions?
Is there nothing that points to the index key from the geometry?
Is that the same reason we can't get the extent of a table from the
index?

It just seemed quite ugly and redundant when I thought about it.

/Nicklas

On Thu, 2011-11-24 at 19:33 -0800, Paul Ramsey wrote:
> On Thu, Nov 24, 2011 at 6:48 PM, Paragon Corporation <lr at pcorp.us> wrote:
> > Cons -
> > 1) If your index is still float while your cached box is double, you still
> > have the unpleasant issue
> > of your indexed search possibly giving a different answer from the
> > non-indexed one (and in fact more so).  Or am I mistaken?
> > This falls more along Nik's issue.  I suspect completely satisfying Nik's
> > issue would require yet another dump / reload.
> > So if we are only going to solve it partly, I'd rather we not.
> 
> Not really. The && operators all go through functions which force the
> boxes into float space first, so both sequence scans and index
> assisted runs of the operators return the same results. Important
> stuff. I think there will be fewer instances of inconsistency with
> this approach.
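
To make the point about "float space" concrete, here is a minimal
sketch, not the PostGIS code. It assumes the double box is converted to
a float box by rounding outward (mins down, maxes up), which is my
assumption and is not stated in this thread. All function names are
hypothetical.

```python
import struct

def to_f32(d):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", d))[0]

def step_f32(f, up):
    """Move a float32 value one representable step up or down."""
    if f == 0.0:
        tiny = struct.unpack("<f", struct.pack("<I", 1))[0]
        return tiny if up else -tiny
    bits = struct.unpack("<I", struct.pack("<f", f))[0]
    # A larger bit pattern means a larger magnitude, so the step
    # direction depends on the sign of f.
    bits += 1 if (f > 0) == up else -1
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def f32_at_most(d):
    """Largest float32 that is <= d (assumed rounding for box minimums)."""
    f = to_f32(d)
    return f if f <= d else step_f32(f, up=False)

def f32_at_least(d):
    """Smallest float32 that is >= d (assumed rounding for box maximums)."""
    f = to_f32(d)
    return f if f >= d else step_f32(f, up=True)

def float_box(box):
    """Convert a double box (xmin, ymin, xmax, ymax) to an enclosing float box."""
    xmin, ymin, xmax, ymax = box
    return (f32_at_most(xmin), f32_at_most(ymin),
            f32_at_least(xmax), f32_at_least(ymax))

def overlaps(a, b):
    """Plain 2D box overlap test on whatever precision the boxes carry."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def operator_overlaps(a, b):
    """Overlap test done in float space, as described for the && operators."""
    return overlaps(float_box(a), float_box(b))
```

With a = (0, 0, 1, 1) and b = (1.0000000001, 0, 2, 2), overlaps(a, b)
is false on the doubles while operator_overlaps(a, b) is true. The
float-space answer is coarser, but if the sequence scan and the index
both use it, they agree with each other.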
> 
> > 2) This still requires a dump / reload for those using PostGIS 2.0 in a
> > pseudo production environment
> > or am I mistaken?
> 
> Yes, this is going to force another dump/reload, as the serialization
> has changed.
> 
> > 3) Disk space.  So how much exactly are we talking about here in a good
> > case?
> > 10% above what the cached box used to take up?
> 
> As noted, 28% worse in the very worst case (three-point line). In the
> 2D case, the 16-byte box becomes a 32-byte box. Note that two vertices
> take 32 bytes. So in a ten-point feature, we're talking about a
> vanishingly small difference.
> 
> (Why is a three-point line the worst case? Because I've been adding
> infrastructure to allow two-point lines and one-entry multi-points to
> go box-less, since their boxes are implicit in their structure.)
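
A rough sketch of how the fixed box growth is diluted as vertices are
added. It uses only the sizes quoted above (16-byte float box, 32-byte
double box, 16 bytes per 2D vertex). The overhead_bytes parameter is a
hypothetical stand-in for header bytes that this thread does not
specify, so the sketch does not try to reproduce the 28% figure
exactly.

```python
FLOAT_BOX_2D = 16    # bytes, per the thread
DOUBLE_BOX_2D = 32   # bytes, per the thread
VERTEX_2D = 16       # bytes, since two vertices take 32 bytes

def relative_growth(npoints, overhead_bytes=0):
    """Fractional size increase for a 2D line when the cached box
    goes from float to double.

    overhead_bytes is a hypothetical allowance for any header bytes;
    the real serialized layout is not described in this thread.
    """
    if npoints <= 2:
        # Two-point lines are meant to go box-less, so nothing grows.
        return 0.0
    old_size = overhead_bytes + npoints * VERTEX_2D + FLOAT_BOX_2D
    return (DOUBLE_BOX_2D - FLOAT_BOX_2D) / old_size

for n in (2, 3, 10, 100):
    print(n, round(relative_growth(n) * 100, 1), "%")
```

The increase is largest for the smallest geometry that still carries a
box and shrinks steadily as the vertex count goes up.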
> 
> > Keep in mind I'm not so much concerned about on disk space as I am about the
> > increase in time to do
> > inserts / updates which may be negligible relative to the size of the
> > geometries except for the case of points and small line strings.
> > Would require testing though to see the impact.
> 
> This is why I've posted, I'm hoping someone *cough* will do a little
> ground truthing. The code is there on the spike
> <http://svn.osgeo.org/postgis/spike/pramsey/doublebox>, it compiles
> and passes its regressions. (Had to change a few tests to do with
> object sizes, otherwise no regressions.)
> 
> There's no further work to do; if we want this change, we merge it in.
> 
> Paul
> 
> > Thanks,
> > Regina
> >
> >
> >
> >> -----Original Message-----
> >> From: postgis-devel-bounces at postgis.refractions.net
> >> [mailto:postgis-devel-bounces at postgis.refractions.net] On
> >> Behalf Of Paul Ramsey
> >> Sent: Thursday, November 24, 2011 5:13 PM
> >> To: PostGIS Development Discussion
> >> Subject: [postgis-devel] Caching Double-based Boxes
> >>
> >> http://svn.osgeo.org/postgis/spike/pramsey/doublebox
> >>
> >> We've been around the horn on this issue a number of times...
> >> - Mark has stated his preference for having double boxes
> >> cached with the geometries
> >> - Sandro wants to ensure that the bbox attribute of lwgeom is
> >> deterministic with respect to the underlying content
> >> - Nik wants the indexes to be double based so the distance
> >> queries on the index are deterministic with respect to the
> >> underlying content
> >>
> >> This change goes 2/3 of the way there.
> >> - It alters the gserialized byte stream so that the cached box
> >> is a double box instead of a float box
> >> - It continues to use float boxes as the key in the indexes
> >> - It continues to use float boxes for operator tests (&&, &&&)
> >> - It ensures (probably, I had to remove some recent hooks
> >> Sandro added) that the bbox on the lwgeom is the double-based
> >> minimum bounding rectangle, always, not sometimes
> >> - Because the index keys remain floats the index distance
> >> searches (<#>, <->) will continue to be sloppy.
> >>
> >> As far as I can see, the main gain is that when you
> >> deserialize a geometry, the bbox hanging off the lwgeom is
> >> always going to be exact.
> >> The downside is that the on disk size of the geometries
> >> (though not the indexes) is going to be larger. The worst
> >> case scenario is the three-point line, which will be 28%
> >> larger under this regime.
> >>
> >> I'm interested to hear if we want to commit to this change,
> >> or just leave well-enough alone and stick with floats in the
> >> serialization.
> >>
> >> P.
> >> _______________________________________________
> >> postgis-devel mailing list
> >> postgis-devel at postgis.refractions.net
> >> http://postgis.refractions.net/mailman/listinfo/postgis-devel
> >>




