[postgis-users] Improving Performance

Ralph Mason ralph.mason at telogis.com
Thu Feb 19 14:58:58 PST 2004


David Blasby wrote:

> Ralph Mason wrote:
>
>> I think it wouldn't be too difficult to benchmark the bounding box 
>> being floats / doubles / gone altogether.  The optimum setting is 
>> likely to be based on the actual machine it's running on.  I wouldn't 
>> be surprised if no bounding box was actually the fastest on a modern 
>> fast processor with a good sized cache.
>
>
> You could be right - for simple geometries with just a few points, 
> constructing the bounding box 'when required' could be more efficient 
> than storing the geometries.
>
> This is certainly true for index searches (since the bounding box is 
> pre-computed and stored in the index).  This is the normal use for 
> bounding box operations.  The only other real uses for them are (1) 
> the "funny" bbox operations (contains/within) which aren't used very 
> often and (2) envelope() function calls.  The impact shouldn't be too bad.
>
> Unfortunately, for sequential scans, we would have to compute the 
> bounding box for all the geometries in a table for each query.  This 
> is a fairly high-cost operation - it's unlikely that the CPU time spent 
> computing these will be lower than the disk access time for loading 
> the extra 16 bytes/geometry.

As the aim here is performance, one would hope that you are never doing 
sequential scans - so the bounding boxes in the index will suffice.  A 
configure option to do away with the stored bounding box would be great, 
as not all users are the same. 16 bytes is 16 bytes; when you multiply 
that by millions of rows it adds up very quickly. The 'default' could be 
to have float bounding boxes.
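
To put rough numbers on the trade-off, here is a minimal Python sketch. 
It assumes the 16 bytes per geometry corresponds to a 2d box of four 
4-byte floats (xmin, ymin, xmax, ymax); the function names are made up 
for illustration and are not PostGIS code.

```python
import struct

def bbox(points):
    """Compute (xmin, ymin, xmax, ymax) on demand for non-empty (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

# Storage cost of a 2d box, assuming four floats versus four doubles.
FLOAT_BOX_BYTES = struct.calcsize("<4f")
DOUBLE_BOX_BYTES = struct.calcsize("<4d")

def bbox_overhead_bytes(rows, box_bytes=FLOAT_BOX_BYTES):
    """Total bytes spent on stored boxes for a table with `rows` geometries."""
    return rows * box_bytes

# A small linestring: computing its box is a single pass over the points.
line = [(1.0, 5.0), (3.5, 2.0), (-2.0, 4.0)]
box = bbox(line)
packed = struct.pack("<4f", *box)  # what a stored float box would occupy
```

With these assumptions, ten million rows carry 160,000,000 bytes of 
float boxes, and twice that if the boxes are doubles.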

>> I don't think it needs to lead to a proliferation of types.  Just 
>> another type geometry_2d or something like that.  I am also in favor 
>> of removing the projection, so that functions working with 2d 
>> geometries don't need to consider it.
>> It would be interesting to know for sure, but I suspect that most 
>> users of postgis are using 2d geometries and all their data is in one 
>> projection.  Meaning that a faster smaller 2d type would probably 
>> make up the bulk of the use, with the full geometry being used to 
>> massage data into the smaller type.
>
>
> This is why I'm promoting the WKB version.  The WKB version supports 
> both 2d and 3d points.  The OGC SF SQL has the full definition of WKB, 
> but here's a 2d point and 3d point:
>
> 2d point (25 bytes):
> <int32>  // postgresql variable-length datatype overhead
> <byte>   // endian flag
> <int32>  // type ("2d point")
> <double> // "X"
> <double> // "Y"
>
> 3d point (33 bytes):
> <int32>  // postgresql variable-length datatype overhead
> <byte>   // endian flag
> <int32>  // type ("3d point")
> <double> // "X"
> <double> // "Y"
> <double> // "Z"
>
> The definitions of things like linestring, polygon, multipoint, 
> multilinestring, multipolygon, and geometrycollection are pretty much 
> straightforward.

I like this idea, nice and simple - however I would be inclined to dump 
the endian flag so that the fields line up on 4 byte boundaries. This is 
likely to have a good positive effect on code size and performance (esp 
on machines that can't easily do unaligned memory accesses).  The 
functions that convert to and from WKB can add the flag back, set to 
whatever endianness the machine they are running on uses.
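
A small Python sketch of the alignment point, using the 2d point layout 
quoted above. The 4-byte length header is taken from that message; the 
helper names are hypothetical, and this only illustrates the byte 
offsets, not how PostGIS would implement it.

```python
import struct
import sys

# Packed layouts ("<" = no padding): length header, [endian flag], type, X, Y.
WITH_FLAG = "<IBIdd"
WITHOUT_FLAG = "<IIdd"

def x_offset(fmt):
    """Byte offset of X: the size of everything before the two doubles."""
    return struct.calcsize(fmt[:-2])

def to_wkb_point(x, y):
    """Emit a standard WKB 2d point, adding the flag for the host byte order."""
    flag = 1 if sys.byteorder == "little" else 0  # WKB: 1 = little, 0 = big
    order = "<" if flag else ">"
    return struct.pack(order + "BIdd", flag, 1, x, y)  # type 1 = Point

size_with = struct.calcsize(WITH_FLAG)        # 25 bytes, X at offset 9
size_without = struct.calcsize(WITHOUT_FLAG)  # 24 bytes, X at offset 8
wkb = to_wkb_point(10.0, 20.0)                # 21 bytes, no length header
```

With the flag, X starts at byte 9 and every double is unaligned; without 
it, X starts at byte 8. The flag only needs to exist in the bytes handed 
to clients, where the output function can add it.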


> If we do full support for WKB, then you'll be able to store 2d and 3d 
> geometries natively and efficiently!
>
Very nice.

> NOTE: I haven't put an SRID (int32) in these structures.
>
It seems that a little flexibility is lost here, although in the very 
few cases where the SRID is used on a per row basis the user could store 
it in a column of the table themselves:

SELECT    asBinary(Transform(setSRID(GEOM_COL,SRID_COL), 222)) ;

> Logically, you should still be able to do something like this:
>
> SELECT    asBinary(Transform(setSRID(<wkb>,111), 222)) ;
>
> This will convert the WKB to GEOMETRY, give it an SRID of 111.  The 
> geometry would then be transformed to SRID 222.  Then it's converted 
> back to a WKB.
>
> You'll also be able to do things like:
>
> SELECT asBinary(  intersection(<WKB 1>, <WKB 2>)) ;
>
On the downside, it seems like a big change though, one that affects 
most of the postgis code.

Ralph



