[postgis-users] question on gist performance

Paul Ramsey pramsey at refractions.net
Fri Sep 14 09:10:17 PDT 2007


This isn't a "bug" per se, because it's doing exactly what we "want", 
but I think it's very possible we don't "want" to do this.

We have so many other checks and balances with regard to SRID coherence 
(in index building, in the table constraints build by the default 
loader, in the geometry comparison functions) that we could drop this 
check for the sake of the extra performance benefit.

And if people use && on geometries of mixed SRID, tough on them.

P

Mark Cave-Ayland wrote:
> On Thu, 2007-09-13 at 15:26 +0200, Stefan Zweig wrote:
>> Mark,
>>
>> thank you very much for your response. As far as I understand, pre-computed bounding boxes are stored (by default) to each Geometry object (wherever that is) to "make bounding box based queries faster", see
>>
>> http://main.merlin.com.ua/doc/postgis/docs/ch06.html#hasbbox
>> http://main.merlin.com.ua/doc/postgis/docs/ch06.html#addbbox
>>
>> A full scan wouldn't be necessary if hasbbox(the_geom) returns true, which would make *much* sense here. Obviously this cached information is not evaluated when determining the bounding box while RECHECKing? Is this a "bug"?
> 
> Sure. But the bounding box is stored as part of the heap geometry, so
> for every index match PostgreSQL must load the entire geometry into
> memory first and decompress it before it can get to the bounding box
> info. If you have a geometry with several thousand points, the idea of
> the BBOX cache is to ensure that SELECT extent(my_big_geom) returns in a
> fixed amount of time, rather than having to linearly scan the entire set
> of points within the geometry.
> 
> I'm not saying there definitely isn't a bug - did you try removing the
> RECHECK clause as per my previous email? If that doesn't solve the
> problem then send a test case and I will look into it.
> 
>> Mark, I have another question. I want to find out the *fastest* way to retrieve the binary content of Geometry objects to perform conversions later on. I found that SELECT the_geom returns a Geometry object in the hexewkb format and does not require any conversion operations by the database. Now, some questions occurred:
>>
>> a) Each selected Geometry object seems always to be copied into memory before one can access it, even static object data. Is there any "clean" way to access the raw data stored in the associated table files?
> 
> If you want to delve into PostgreSQL innards, you could open the files
> directly and read the contents - but this really isn't recommended for
> beginners. Perhaps using a cursor would be useful to you?
> 
>> b) hexewkb data must always be converted to bytes first in order to get integers or doubles. Is there any way to store geometric data directly as ewkb or wkb (btw, this would save 50% disc space)?
> 
> All geometries are stored internally as binary - they are just converted
> to/from HEXEWKB when accepting input from the user or displaying it back
> to the user.
> 
>> c) double precision is not necessary to store coordinates if you know the SRID and the required accuracy. Is there any way to store geometric data in a format other than double, for example, float or int?
> 
> No is the quick answer to this.
> 
>> d) How can I find out what disc files belong to a table, and how much disc space a particular table needs?
> 
> You'll probably find this reference handy:
> http://www.postgresql.org/docs/8.2/interactive/functions-admin.html.
> 
> 
> ATB,
> 
> Mark.
> 


-- 

   Paul Ramsey
   Refractions Research
   http://www.refractions.net
   pramsey at refractions.net
   Phone: 250-383-3022
   Cell: 250-885-0632



More information about the postgis-users mailing list