[postgis-devel] Caching Double-based Boxes

Paul Ramsey pramsey at opengeo.org
Thu Nov 24 19:33:14 PST 2011


On Thu, Nov 24, 2011 at 6:48 PM, Paragon Corporation <lr at pcorp.us> wrote:
> Cons -
> 1) If your index is still float while your cached box is double, you still
> have the unpleasant issue
> of your indexed search possibly giving a different answer from the
> non-indexed one (and in fact more so).  Or am I mistaken?
> This falls more along Nik's issue.  I suspect completely satisfying Nik's
> issue would require yet another dump / reload.
> So if we are only going to partly solve it I'd rather not just partly so.

Not really. The && operators all go through functions which force the
boxes into float space first, so both sequential scans and
index-assisted runs of the operators return the same results. Important
stuff. I think there will be fewer instances of inconsistency with
this approach.
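
To make the float-space point concrete, here is a minimal sketch (not the PostGIS code). It assumes the float box is obtained by rounding the double box outward so it always contains it; the function names are hypothetical, and coordinates are assumed finite and within float32 range.

```python
import struct

def to_f32(x):
    """Round a double to the nearest float32 value (returned as a Python float)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def step_f32(f, up):
    """Return the float32 neighbour one ULP above (up=True) or below f.

    f must already be exactly representable as a finite float32.
    """
    if f == 0.0:
        tiny = struct.unpack("<f", struct.pack("<I", 1))[0]  # smallest subnormal
        return tiny if up else -tiny
    bits = struct.unpack("<I", struct.pack("<f", f))[0]
    # A larger bit pattern means a larger magnitude, so the direction
    # of the step depends on the sign of f.
    bits += 1 if (f > 0) == up else -1
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def f32_down(x):
    """Largest float32 value that is <= x."""
    f = to_f32(x)
    return f if f <= x else step_f32(f, up=False)

def f32_up(x):
    """Smallest float32 value that is >= x."""
    f = to_f32(x)
    return f if f >= x else step_f32(f, up=True)

def to_float_box(box):
    """Assumed conversion: round (xmin, ymin, xmax, ymax) outward to float32."""
    xmin, ymin, xmax, ymax = box
    return (f32_down(xmin), f32_down(ymin), f32_up(xmax), f32_up(ymax))

def overlaps(a, b):
    """Plain box overlap test on (xmin, ymin, xmax, ymax) tuples."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def overlaps_op(a, b):
    """&&-style test: both boxes are forced into float space first."""
    return overlaps(to_float_box(a), to_float_box(b))

a = (0.0, 0.0, 1.0, 1.0)
b = (1.00000001, 0.0, 2.0, 1.0)
print(overlaps(a, b))     # comparison on the double boxes
print(overlaps_op(a, b))  # comparison after forcing into float space
```

Because converting an already-float box is a no-op in this sketch, a key read back from the index and a box converted on the fly during a sequential scan feed the same values into the comparison, which is why the two paths agree even when the double boxes alone would give a different answer.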

> 2) This still requires a dump / reload for those using PostGIS 2.0 in a
> pseudo production environment
> or am I mistaken?

Yes, this is going to force another dump/reload, as the serialization
has changed.

> 3) Disk space.  So how much exactly are we talking about here in a good
> case?
> 10% above what the cached box used to take up?

As noted, 28% worse in the very worst case (three-point line). In the
2D case, the 16-byte box becomes a 32-byte box. Note that two vertices
take 32 bytes. So in a ten-point feature, we're talking about a
vanishingly small difference.
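
A back-of-envelope sketch of how the extra 16 bytes dilute as the point count grows. The box and vertex sizes come from the figures above; `fixed_overhead` (header plus type and count fields) is a hypothetical parameter, so the percentages printed here need not match the 28% quoted, which depends on exactly which bytes are counted.

```python
VERTEX_2D = 16      # two doubles per vertex (two vertices take 32 bytes)
FLOAT_BOX_2D = 16   # four floats
DOUBLE_BOX_2D = 32  # four doubles

def growth(npoints, fixed_overhead=16):
    """Relative size increase of a 2D line when the cached box goes double.

    fixed_overhead is an assumed byte count for everything that is
    neither box nor coordinates; adjust it to match the real layout.
    """
    old = fixed_overhead + FLOAT_BOX_2D + npoints * VERTEX_2D
    new = fixed_overhead + DOUBLE_BOX_2D + npoints * VERTEX_2D
    return (new - old) / old

for n in (3, 10, 100):
    print(n, f"{growth(n):.1%}")
```

Whatever overhead is assumed, the increase is a constant 16 bytes, so its share of the total shrinks roughly in proportion to the number of vertices.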

(Why is a three-point line the worst case? Because I've been adding
infrastructure to allow two-point lines and one-entry multi-points to
go box-less, since their boxes are implicit in their structure.)
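
As an illustration of "implicit in their structure", a minimal sketch with hypothetical helper names (this is not the spike's code): the box of a two-point line or a one-entry multi-point can be rebuilt from the coordinates with a couple of comparisons, so nothing is lost by not storing it.

```python
def two_point_line_box(p, q):
    """Box of a two-point line: the min/max of its two endpoints."""
    (x1, y1), (x2, y2) = p, q
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))

def single_point_box(p):
    """Box of a one-entry multi-point: the point itself, twice."""
    x, y = p
    return (x, y, x, y)

print(two_point_line_box((3.0, 1.0), (-2.0, 4.0)))
print(single_point_box((1.5, 2.5)))
```

With those cases exempt, the three-point line is the smallest geometry that still carries a stored box, so it is where the fixed 16-byte increase weighs most.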

> Keep in mind I'm not so much concerned about on disk space as I am about the
> increase in time to do
> inserts / updates which may be negligible relative to the size of the
> geometries except for the case of points and small line strings.
> Would require testing though to see the impact.

This is why I've posted: I'm hoping someone *cough* will do a little
ground truthing. The code is there on the spike
<http://svn.osgeo.org/postgis/spike/pramsey/doublebox>; it compiles
and passes its regressions. (Had to change a few tests to do with
object sizes, otherwise no regressions.)

There's no further work to do; if we want this change, we merge it in.

Paul

> Thanks,
> Regina
>
>
>
>> -----Original Message-----
>> From: postgis-devel-bounces at postgis.refractions.net
>> [mailto:postgis-devel-bounces at postgis.refractions.net] On
>> Behalf Of Paul Ramsey
>> Sent: Thursday, November 24, 2011 5:13 PM
>> To: PostGIS Development Discussion
>> Subject: [postgis-devel] Caching Double-based Boxes
>>
>> http://svn.osgeo.org/postgis/spike/pramsey/doublebox
>>
>> We've been around the horn on this issue a number of times...
>> - Mark has stated his preference for having double boxes
>> cached with the geometries
>> - Sandro wants to ensure that the bbox attribute of lwgeom is
>> deterministic with respect to the underlying content
>> - Nik wants the indexes to be double based so the distance
>> queries on the index are deterministic with respect to the
>> underlying content
>>
>> This change goes 2/3 of the way there.
>> It alters the gserialized byte stream so that the cached box
>> is a double box instead of a float box
>> It continues to use float boxes as the key in the indexes
>> It continues to use float boxes for operator tests (&&, &&&)
>> It ensures (probably, I had to remove some recent hooks Sandro
>> added) that the bbox on the lwgeom is the double-based minimum
>> bounding rectangle, always, not sometimes
>> Because the index keys remain floats the index distance
>> searches (<#>, <->) will continue to be sloppy.
>>
>> As far as I can see, the main gain is that when you
>> deserialize a geometry, the bbox hanging off the lwgeom is
>> always going to be exact.
>> The downside is that the on-disk size of the geometries
>> (though not the indexes) is going to be larger. The
>> worst-case scenario is the three-point line, which will be 28%
>> larger under this regime.
>>
>> I'm interested to hear if we want to commit to this change,
>> or just leave well-enough alone and stick with floats in the
>> serialization.
>>
>> P.
>> _______________________________________________
>> postgis-devel mailing list
>> postgis-devel at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>


