[postgis-devel] Caching Double-based Boxes

Paragon Corporation lr at pcorp.us
Thu Nov 24 18:48:40 PST 2011


Paul,
I guess I'm still against the idea, but I'm not sure my reasons are sound,
since I suspect I know less about the subject than the folks you mention
who are for it.

Here are my issues.
Pro -
I do really like that functions won't need to recompute the bbox when they
need an accurate one, and that we would no longer have identical geometries
carrying different cached bboxes depending on which function put the box
in.

Cons -  
1) If your index is still float while your cached box is double, you still
have the unpleasant issue of your indexed search possibly giving a different
answer from the non-indexed one (and in fact more so).  Or am I mistaken?
This falls more along the lines of Nik's issue.  I suspect completely
satisfying Nik's issue would require yet another dump / reload.
So if we are only going to solve it partly, I'd rather not do just a
partial fix.
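
To make the concern in point 1 concrete, here is a minimal Python sketch
of how a float-box test and a double-box test can disagree.  It assumes
the float box is produced by rounding the double box outward to the
nearest float32 values, so that it always contains the double box.  All
function names and coordinates are made up for illustration; none of this
is PostGIS code.

```python
import struct

def to_f32(x):
    """Round a double to the nearest float32 value (returned as a Python float)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def next_f32(f, up):
    """Neighbouring float32 value of f (f must already be a finite float32 value)."""
    if f == 0.0:
        tiny = struct.unpack("<f", struct.pack("<I", 1))[0]  # smallest subnormal
        return tiny if up else -tiny
    bits = struct.unpack("<I", struct.pack("<f", f))[0]
    grows = (f > 0) == up  # a larger bit pattern means a larger magnitude
    bits += 1 if grows else -1
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def floor_f32(x):
    """Largest float32 value <= x."""
    f = to_f32(x)
    return f if f <= x else next_f32(f, up=False)

def ceil_f32(x):
    """Smallest float32 value >= x."""
    f = to_f32(x)
    return f if f >= x else next_f32(f, up=True)

def float_box(box):
    """Assumed outward rounding: mins go down, maxes go up."""
    xmin, ymin, xmax, ymax = box
    return (floor_f32(xmin), floor_f32(ymin), ceil_f32(xmax), ceil_f32(ymax))

def overlaps(a, b):
    """2D box overlap test on (xmin, ymin, xmax, ymax) tuples."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Hypothetical boxes: b starts just right of a, inside a's float rounding slack.
a = (10.0, 10.0, 20.1, 20.0)
gap = (20.1 + ceil_f32(20.1)) / 2.0
b = (gap, 10.0, 30.0, 20.0)

print(overlaps(a, b))                        # False: the double boxes do not touch
print(overlaps(float_box(a), float_box(b)))  # True: the float boxes do
```

Which of the two tests a given query path actually runs is exactly the
open question; the sketch only shows that the two answers can differ for
boxes that sit within float rounding distance of each other.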

2) This still requires a dump / reload for those using PostGIS 2.0 in a
pseudo production environment, or am I mistaken?
Granted, we never promised people wouldn't have to do this, so it's just a
minor concern, but I suspect there are a few people who jumped in because of
the raster functionality and some of the new geometry functions.  This is
more of an issue for raster folks, since raster data tends to be much larger
than geometry and thus more of a pain to reload.

3) Disk space.  So how much exactly are we talking about here in a good
case?
10% above what the cached box used to take up?

Keep in mind I'm not so much concerned about on-disk space as I am about the
increase in time to do inserts / updates, which may be negligible relative
to the size of the geometries except in the case of points and small
linestrings.
It would require testing, though, to see the impact.
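
As a rough back-of-the-envelope for the disk space question, the sketch
below assumes a 2D box of four coordinates, so switching from 4-byte floats
to 8-byte doubles adds a fixed 16 bytes per geometry.  The serialized sizes
fed in are hypothetical round numbers, not measured gserialized sizes, and
the sketch does not model the real byte layout.

```python
FLOAT_BYTES = 4
DOUBLE_BYTES = 8

def box_bytes(ndims, coord_bytes):
    """Size of a box storing a min and a max per dimension."""
    return 2 * ndims * coord_bytes

def relative_growth(size_with_float_box, ndims=2):
    """Fractional size increase when the cached float box becomes a double box."""
    extra = box_bytes(ndims, DOUBLE_BYTES) - box_bytes(ndims, FLOAT_BYTES)
    return extra / size_with_float_box

# Hypothetical serialized sizes in bytes (float box included).
for size in (64, 160, 1600):
    print(size, "{:.1%}".format(relative_growth(size)))
# 64 25.0%
# 160 10.0%
# 1600 1.0%
```

Under these assumptions a 10% increase corresponds to a geometry of about
160 bytes; anything smaller grows by more than that, and large polygons or
long lines barely notice.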

Thanks,
Regina



> -----Original Message-----
> From: postgis-devel-bounces at postgis.refractions.net 
> [mailto:postgis-devel-bounces at postgis.refractions.net] On 
> Behalf Of Paul Ramsey
> Sent: Thursday, November 24, 2011 5:13 PM
> To: PostGIS Development Discussion
> Subject: [postgis-devel] Caching Double-based Boxes
> 
> http://svn.osgeo.org/postgis/spike/pramsey/doublebox
> 
> We've been around the horn on this issue a number of times...
> - Mark has stated his preference for having double boxes 
> cached with the geometries
> - Sandro wants to ensure that the bbox attribute of lwgeom is 
> deterministic with respect to the underlying content
> - Nik wants the indexes to be double based so the distance 
> queries on the index are deterministic with respect to the 
> underlying content
> 
> This change goes 2/3 of the way there.
> - It alters the gserialized byte stream so that the cached box 
> is a double box instead of a float box
> - It continues to use float boxes as the key in the indexes
> - It continues to use float boxes for operator tests (&&, &&&)
> - It ensures (probably, I had to remove some recent hooks Sandro 
> added) that the bbox on the lwgeom is the double-based minimum 
> bounding rectangle, always, not sometimes
> 
> Because the index keys remain floats, the index distance searches 
> (<#>, <->) will continue to be sloppy.
> 
> As far as I can see, the main gain is that when you 
> deserialize a geometry, the bbox hanging off the lwgeom is 
> always going to be exact.
> The downside is that the on-disk size of the geometries 
> (though not the indexes) is going to be larger. The worst-case 
> scenario is the three-point line, which will be 28% 
> larger under this regime.
> 
> I'm interested to hear if we want to commit to this change, 
> or just leave well-enough alone and stick with floats in the 
> serialization.
> 
> P.




