[postgis-devel] Caching Double-based Boxes

Paragon Corporation lr at pcorp.us
Thu Nov 24 19:16:46 PST 2011


Just another general comment.  Given all we have going in 2.0, I'd be pretty
impressed if our 2.1 release DOESN'T require a dump reload of people's data.

That said, anything that requires a dump / reload, pushes our release date
any further, or is only half thought out, I'd rather punt to 2.1.

I would like to see all our cached boxes computed the same way though.
That, to me, requires some cleanup under the hood without affecting users,
and does get us closer to the double precision dream.

Thanks,
Regina

 

> -----Original Message-----
> From: postgis-devel-bounces at postgis.refractions.net 
> [mailto:postgis-devel-bounces at postgis.refractions.net] On 
> Behalf Of Paragon Corporation
> Sent: Thursday, November 24, 2011 9:49 PM
> To: 'PostGIS Development Discussion'
> Subject: Re: [postgis-devel] Caching Double-based Boxes
> 
> Paul,
> I guess I'm still against the idea, but I'm not sure my 
> reasons are sound since I suspect my knowledge of the subject 
> is less than the aforementioned folks for it.
> 
> Here are my issues. 
> Pro -
> I do really like the benefit that functions won't need to 
> recompute the bbox when they need an accurate one, and that 
> we would no longer have the issue of the same geometry ending 
> up with different cached bboxes depending on which function 
> put the box in.  
> 
> Cons -
> 1) If your index is still float while your cached box is 
> double, you still have the unpleasant issue of your indexed 
> search possibly giving a different answer from the 
> non-indexed one (and in fact more so).  Or am I mistaken?
> This falls more along Nik's issue.  I suspect completely 
> satisfying Nik's issue would require yet another dump / reload.
> So if we are only going to partly solve it, I'd rather not 
> solve it just partly.
> 
> 2) This still requires a dump / reload for those using 
> PostGIS 2.0 in a pseudo production environment or am I mistaken? 
> Granted we never promised people wouldn't have to do this so 
> it's just a minor concern, but I suspect there are a few 
> people who jumped in because of the raster functionality and 
> some of the new geometry functions.  This is more of an issue 
> for raster folks since raster data tends to be much larger 
> than geometry and thus more of a pain to reload.
> 
> 3) Disk space.  So how much exactly are we talking about here 
> in a good case?
> 10% above what the cached box used to take up?
> 
> Keep in mind I'm not so much concerned about on disk space as 
> I am about the increase in time to do inserts / updates which 
> may be negligible relative to the size of the geometries 
> except for the case of points and small line strings.  
> Would require testing though to see the impact.
> 
> Thanks,
> Regina
> 
> 
> 
> > -----Original Message-----
> > From: postgis-devel-bounces at postgis.refractions.net
> > [mailto:postgis-devel-bounces at postgis.refractions.net] On Behalf Of 
> > Paul Ramsey
> > Sent: Thursday, November 24, 2011 5:13 PM
> > To: PostGIS Development Discussion
> > Subject: [postgis-devel] Caching Double-based Boxes
> > 
> > http://svn.osgeo.org/postgis/spike/pramsey/doublebox
> > 
> > We've been around the horn on this issue a number of times...
> > - Mark has stated his preference for having double boxes cached with
> >   the geometries
> > - Sandro wants to ensure that the bbox attribute of lwgeom is
> >   deterministic with respect to the underlying content
> > - Nik wants the indexes to be double based so the distance queries on
> >   the index are deterministic with respect to the underlying content
> > 
> > This change goes 2/3 of the way there.
> > - It alters the gserialized byte stream so that the cached box is a
> >   double box instead of a float box
> > - It continues to use float boxes as the key in the indexes
> > - It continues to use float boxes for operator tests (&&, &&&)
> > - It ensures (probably, I had to remove some recent hooks Sandro
> >   added) that the bbox on the lwgeom is the double-based minimum
> >   bounding rectangle, always, not sometimes
> > - Because the index keys remain floats the index distance searches
> >   (<#>, <->) will continue to be sloppy.
> > 
> > As far as I can see, the main gain is that when you deserialize a 
> > geometry, the bbox hanging off the lwgeom is always going to be exact.
> > The downside is that the on disk size of the geometries (though not 
> > the indexes) is going to be larger. The worst case scenario is the 
> > three-point-line which will be 28% larger under this regime.
> > 
> > I'm interested to hear if we want to commit to this change, or just 
> > leave well-enough alone and stick with floats in the serialization.
> > 
> > P.
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-devel
> > 
> 
> 
> 
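To make the first "con" above concrete, here is a minimal sketch of how a float key box and a double cached box can disagree about whether two geometries overlap. It is plain Python, not PostGIS code. The `Box` type, the helper names, and the assumption that float boxes are rounded outward (min down, max up) are all illustrative, not taken from the spike.

```python
import struct
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical 2D bounding box; Python floats are doubles."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def to_float32(value: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def step_float32(f: float, up: bool) -> float:
    """Return the neighbouring float32 of f, where f is already a float32 value."""
    if f == 0.0:
        tiny = struct.unpack("<f", struct.pack("<I", 1))[0]  # smallest subnormal
        return tiny if up else -tiny
    bits = struct.unpack("<I", struct.pack("<f", f))[0]
    # For positive values a larger bit pattern is a larger number;
    # for negative values the ordering is reversed.
    bits += 1 if (f > 0) == up else -1
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def float32_floor(value: float) -> float:
    """Largest float32 that is <= value."""
    f = to_float32(value)
    return f if f <= value else step_float32(f, up=False)


def float32_ceil(value: float) -> float:
    """Smallest float32 that is >= value."""
    f = to_float32(value)
    return f if f >= value else step_float32(f, up=True)


def to_float_box(box: Box) -> Box:
    """Assumed behaviour: round outward so the float box contains the double box."""
    return Box(float32_floor(box.xmin), float32_floor(box.ymin),
               float32_ceil(box.xmax), float32_ceil(box.ymax))


def overlaps(a: Box, b: Box) -> bool:
    """Box overlap test in the spirit of the && operator."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


# Two boxes separated by a gap far smaller than a float32 step near 1.0.
a = Box(0.0, 0.0, 1.00000001, 1.0)
b = Box(1.00000002, 0.0, 2.0, 1.0)

exact_overlap = overlaps(a, b)                              # False
float_overlap = overlaps(to_float_box(a), to_float_box(b))  # True
```

If the index answers with the float boxes and a later recheck uses the double boxes, the two tests disagree for pairs like this one, which is the mismatch described in con 1.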

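On the disk space question, a rough back-of-envelope sketch may help frame the testing. The byte counts below are assumptions about the serialized layout (an 8-byte header, a 4-byte type plus a 4-byte point count, double coordinates), not figures from the spike, and the function names are hypothetical.

```python
HEADER_BYTES = 8          # assumed: 4-byte size + 3-byte SRID + 1-byte flags
TYPE_AND_COUNT_BYTES = 8  # assumed: 4-byte geometry type + 4-byte point count
FLOAT_BYTES = 4
DOUBLE_BYTES = 8


def linestring_size(npoints: int, dims: int = 2,
                    box_ordinate_bytes: int = FLOAT_BYTES) -> int:
    """Estimated serialized size of a linestring that carries a cached box."""
    box = 2 * dims * box_ordinate_bytes    # one min and one max per dimension
    coords = npoints * dims * DOUBLE_BYTES
    return HEADER_BYTES + box + TYPE_AND_COUNT_BYTES + coords


def growth(npoints: int, dims: int = 2) -> float:
    """Relative size increase when the cached box switches from float to double."""
    old = linestring_size(npoints, dims, FLOAT_BYTES)
    new = linestring_size(npoints, dims, DOUBLE_BYTES)
    return (new - old) / old


for n in (3, 10, 100, 1000):
    print(n, f"{growth(n):.2%}")
```

Under these assumptions the extra cost is a fixed 16 bytes per 2D geometry, so the relative growth is largest for very short lines and falls off quickly as the point count rises. The exact percentage for a three-point line depends on which bytes are counted in the baseline.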



