[postgis-devel] Coordinate compression for native PostGIS geometry storage?

Martin Davis mbdavis at refractions.net
Tue Mar 10 08:45:59 PDT 2009


Thanks, Mark.  Interesting: I hadn't realized that PG already 
compresses large values.  I can imagine that access is expensive if 
decompression has to be done every time!

I've done some further experimentation, and am no longer so 
enthusiastic that binary compression of floating-point coordinates will 
save enough to be worthwhile.  Most geometry datasets I've looked at 
offer at most about 25% redundancy.  That probably isn't enough to 
justify the extra code and the performance hit.
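To make the "identical high-order bits" idea concrete, here is a minimal Python sketch (not PostGIS code) that XORs each ordinate's IEEE 754 bit pattern with the previous ordinate in the same dimension and counts the leading bits the two share. That count is an upper bound on what an XOR-with-previous encoder could drop per value. The function names are hypothetical, and the estimate ignores the few bits a real encoder must spend recording how many leading bits were dropped, so actual savings would be somewhat lower.

```python
import struct


def double_bits(x: float) -> int:
    """Return the IEEE 754 bit pattern of a double as a 64-bit unsigned int."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def shared_high_bits(a: float, b: float) -> int:
    """Count the leading bits that a and b have in common (0..64)."""
    diff = double_bits(a) ^ double_bits(b)
    return 64 if diff == 0 else 64 - diff.bit_length()


def redundancy(ordinates: list) -> float:
    """Fraction of all ordinate bits that are leading bits shared with the
    previous value. Assumes the first value is stored in full and ignores
    per-value encoding overhead, so this overstates real savings."""
    if len(ordinates) < 2:
        return 0.0
    saved = sum(shared_high_bits(p, c) for p, c in zip(ordinates, ordinates[1:]))
    return saved / (64 * len(ordinates))


# Hypothetical X ordinates from one linestring (apply per dimension).
xs = [483012.25, 483015.5, 483019.75, 483023.0]
print(f"{redundancy(xs):.0%} of X bits are shared leading bits")
```

Treating each dimension separately matters: consecutive X values resemble each other far more than an X value resembles the Y value stored next to it.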

Mark Cave-Ayland wrote:
> Martin Davis wrote:
>
>> Has anyone thought about or tested the idea of compressing the 
>> coordinate sequences in PostGIS' internal geometry data structure?
>> Typical sequences of ordinates for real-world geometries tend to have 
>> a significant number of identical bits at the high end of each number 
>> (sometimes as much as 40%).  With a suitable encoding method a lot of 
>> this redundancy could be squeezed out.
>> This is obviously a time/space trade-off, but CPU is so much faster 
>> than disk I/O that it seems like it could be a win to compress.
>>
>> Perhaps something to discuss with the big brains at the code sprint...
>
> Martin,
>
> Just for reference, PostgreSQL automatically compresses geometries 
> larger than 2K for storage on disk. However, we have seen that there 
> is considerable overhead involved with access because at the moment 
> the geometry has to be decompressed on *every* access to the geometry 
> which is expensive. There is talk of setting up a per-query cache to 
> help with this, but currently there is no complete implementation 
> available.
>
>
> HTH,
>
> Mark.
>
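The per-query cache Mark mentions could work roughly like the following minimal Python sketch: each stored value is decompressed at most once during a query and then served from memory. The class name and its keying scheme are hypothetical, and zlib stands in for PostgreSQL's own compressor (pglz); this illustrates the idea only, not any actual PostgreSQL or PostGIS implementation.

```python
import zlib
from typing import Callable, Dict


class PerQueryDecompressCache:
    """Hypothetical cache that decompresses each stored value at most once
    per query. Keys identify the stored datum (e.g. a row id); zlib is a
    stand-in for PostgreSQL's pglz compressor."""

    def __init__(self, decompress: Callable[[bytes], bytes] = zlib.decompress):
        self._decompress = decompress
        self._cache: Dict[int, bytes] = {}
        self.decompressions = 0  # how many times decompression actually ran

    def get(self, key: int, stored: bytes) -> bytes:
        """Return the decompressed value, decompressing only on first access."""
        if key not in self._cache:
            self._cache[key] = self._decompress(stored)
            self.decompressions += 1
        return self._cache[key]

    def end_query(self) -> None:
        """Discard all cached values when the query finishes."""
        self._cache.clear()


# A large, geometry-sized blob, compressed once for "storage".
stored = zlib.compress(bytes(8192))
cache = PerQueryDecompressCache()
for _ in range(3):  # e.g. several functions touching the same geometry
    cache.get(1, stored)
```

The trade-off is memory: the cache holds every decompressed value touched during the query, which is why it would be scoped to a single query and cleared when the query ends.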

-- 
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022
