[postgis-devel] Bit setting to compress PostGIS geometries

Björn Harrtell bjorn.harrtell at gmail.com
Sat Mar 17 07:42:42 PDT 2018


Ah yes - getting the same results as in your blog post:

create table circles as
select st_buffer(st_makepoint(-180 + 360*random(), -90 + 180*random()),
random(), 128) as geom
from generate_series(1, 1.06e4);
select pg_size_pretty(pg_total_relation_size('"circles"'));
-- 100 MB

create table circles_9dig as select st_quantizegeometry(geom, array[9,9])
from circles;
select pg_size_pretty(pg_total_relation_size('"circles_9dig"'));
-- 77 MB

What made me confused was that when I tried it on some real world data it
had been snapped to grid and apparently that also removes noise:

create table circles_snap_test as select st_snaptogrid(geom, 0.0001) from
circles;
select pg_size_pretty(pg_total_relation_size('"circles_snap_test"'));
-- 82 MB

create table circles_snap_test as select st_snaptogrid(geom, 0.001) from
circles;
select pg_size_pretty(pg_total_relation_size('"circles_snap_test"'));
-- 41 MB

PS. My updated SQL implementation with fixed handling of polygon rings can
be found at
https://gist.github.com/bjornharrtell/30d804c5df44feb9003bb33a8cd1e508.

/Björn

2018-03-17 0:18 GMT+01:00 Daniel Baston <dbaston at gmail.com>:

> Are you now getting reasonable compression? The reason for negative
> precision is to represent values that are significant to the nearest 100,
> for example. (This is in a "digits of precision" world, not a "significant
> digits" world.)
>
> On Fri, Mar 16, 2018 at 5:48 PM, Björn Harrtell <bjorn.harrtell at gmail.com>
> wrote:
>
>> Had to slightly modify to make it work and I'm not sure why you want to
>> go into negative digit precision.
>>
>> SQL:
>>
>> WITH test AS (SELECT 'POINT (123.456789123456
>> 123.456789123456)'::geometry AS geom)
>> SELECT
>>   digits,
>>   encode(ST_QuantizeGeometry(geom, ARRAY[digits, digits, digits]),
>> 'hex'),
>>   ST_AsText(ST_QuantizeGeometry(geom, ARRAY[digits, digits, digits]))
>> FROM test, generate_series(15, -15, -1) AS digits;
>>
>> Result:
>>
>> 15 01010000005c9a72083cdd5e405c9a72083cdd5e40 POINT(123.456789123456
>> 123.456789123456)
>> 14 0101000000409a72083cdd5e40409a72083cdd5e40 POINT(123.456789123456
>> 123.456789123456)
>> 13 0101000000009a72083cdd5e40009a72083cdd5e40 POINT(123.456789123455
>> 123.456789123455)
>> 12 0101000000009072083cdd5e40009072083cdd5e40 POINT(123.456789123418
>> 123.456789123418)
>> 11 0101000000008072083cdd5e40008072083cdd5e40 POINT(123.45678912336
>> 123.45678912336)
>> 10 0101000000000070083cdd5e40000070083cdd5e40 POINT(123.456789121032
>> 123.456789121032)
>> 9 0101000000000040083cdd5e40000040083cdd5e40 POINT(123.456789076328
>> 123.456789076328)
>> 8 0101000000000000083cdd5e40000000083cdd5e40 POINT(123.456789016724
>> 123.456789016724)
>> 7 0101000000000000003cdd5e40000000003cdd5e40 POINT(123.456787109375
>> 123.456787109375)
>> 6 0101000000000000003cdd5e40000000003cdd5e40 POINT(123.456787109375
>> 123.456787109375)
>> 5 01010000000000000038dd5e400000000038dd5e40 POINT(123.45654296875
>> 123.45654296875)
>> 4 01010000000000000000dd5e400000000000dd5e40 POINT(123.453125 123.453125)
>> 3 01010000000000000000dc5e400000000000dc5e40 POINT(123.4375 123.4375)
>> 2 01010000000000000000c05e400000000000c05e40 POINT(123 123)
>> 1 01010000000000000000005e400000000000005e40 POINT(120 120)
>> 0 010100000000000000000050400000000000005040 POINT(64 64)
>> -1 010100000000000000000000400000000000000040 POINT(2 2)
>> -2 010100000000000000000000400000000000000040 POINT(2 2)
>> -3 010100000000000000000000400000000000000040 POINT(2 2)
>> -4 010100000000000000000000000000000000000000 POINT(0 0)
>> -5 010100000000000000000000000000000000000000 POINT(0 0)
>> -6 010100000000000000000000000000000000000000 POINT(0 0)
>> -7 010100000000000000000000000000000000000000 POINT(0 0)
>> -8 010100000000000000000000000000000000000000 POINT(0 0)
>> -9 010100000000000000000000000000000000000000 POINT(0 0)
>> -10 010100000000000000000000000000000000000000 POINT(0 0)
>> -11 010100000000000000000000000000000000000000 POINT(0 0)
>> -12 010100000000000000000000000000000000000000 POINT(0 0)
>> -13 010100000000000000000000000000000000000000 POINT(0 0)
>> -14 010100000000000000000000000000000000000000 POINT(0 0)
>> -15 010100000000000000000000000000000000000000 POINT(0 0)
>>
>> /Björn
>>
>> 2018-03-15 20:37 GMT+01:00 Daniel Baston <dbaston at gmail.com>:
>>
>>> Hi Björn,
>>>
>>> I haven't fully gone through the implementation (seems impressive,
>>> though!). If you run a query like the one in the docs (copied below) do you
>>> see a similar amount of trimming?
>>>
>>> WITH test AS (SELECT 'POINT (123.456789123456
>>> 123.456789123456)'::geometry AS geom)
>>> SELECT
>>>   digits,
>>>   encode(ST_QuantizeCoordinates(geom, digits), 'hex'),
>>>   ST_AsText(ST_QuantizeCoordinates(geom, digits))
>>> FROM test, generate_series(15, -15, -1) AS digits;
>>>
>>> Dan
>>>
>>> On Thu, Mar 15, 2018 at 3:22 PM, Björn Harrtell <
>>> bjorn.harrtell at gmail.com> wrote:
>>>
>>>> Apart from a bug with polygons that have more than one ring I think my
>>>> implementation does what it should but I'm seeing *very* limited space
>>>> reductions when going from 15 digits to 6. Any ideas why?
>>>>
>>>>
>>>> 2018-03-14 9:05 GMT+01:00 Björn Harrtell <bjorn.harrtell at gmail.com>:
>>>>
>>>>> Ok so I said why not implement using SQL (i.e PL/pgSQL) and I could
>>>>> not let it go even though it's clearly not a sane way of doing it.
>>>>>
>>>>> Note that second parameter is an array to be able to set the precision
>>>>> per ordinal. Here's the implementation for your pleasure:
>>>>>
>>>>> --example use
>>>>> select ST_AsText(ST_QuantizeGeometry(st_geomfromtext('POINT(123.12345
>>>>> 123.12345)'), ARRAY[5,5]));
>>>>> select ST_AsText(ST_QuantizeGeometry(st_geomfromtext('POINTZ(123.12345
>>>>> 123.12345 9.9999)'), ARRAY[3,3,1]));
>>>>> select ST_AsText(ST_QuantizeGeometry(st_geomfromtext('LINESTRING(123.12345
>>>>> 223.12345, 334.12345 434.12345, 999.999 999.999)'), ARRAY[1,1]));
>>>>> select ST_AsText(ST_QuantizeGeometry(st_geomfromtext('POLYGON((1 1,
>>>>> 2.555 2.555, 3 3, 1 1))'), ARRAY[1,3]));
>>>>>
>>>>> --drop type _quantize_geometry_context cascade;
>>>>> create type _quantize_geometry_context as (
>>>>> buffer bytea,
>>>>> endianness bool,
>>>>> off integer,
>>>>> masks bit(64)[],
>>>>> num_ordinals integer
>>>>> );
>>>>>
>>>>> --drop function _quantize_geometry_coord(c _quantize_geometry_context);
>>>>> create or replace function _quantize_geometry_coord(c
>>>>> _quantize_geometry_context) returns _quantize_geometry_context as
>>>>> $$
>>>>> declare
>>>>> masked_byte int;
>>>>> begin
>>>>> for o in 1 .. c.num_ordinals loop
>>>>> for i in 0 .. 7 loop
>>>>> masked_byte = get_byte(c.buffer, c.off) & substring(c.masks[o] from (i
>>>>> * 8) + 1 for 8)::int;
>>>>> c.buffer = set_byte(c.buffer, c.off, masked_byte);
>>>>> c.off = c.off + 1;
>>>>> end loop;
>>>>> end loop;
>>>>> return c;
>>>>> end;
>>>>> $$
>>>>> language plpgsql;
>>>>>
>>>>> --drop function _quantize_geometry_coords(c
>>>>> _quantize_geometry_context);
>>>>> create or replace function _quantize_geometry_coords(c
>>>>> _quantize_geometry_context) returns _quantize_geometry_context as
>>>>> $$
>>>>> declare
>>>>> num_coords int;
>>>>> begin
>>>>> num_coords = get_byte(c.buffer, c.off + 3);
>>>>> c.off = c.off + 4;
>>>>> for i in 0 .. num_coords - 1 loop
>>>>> c = _quantize_geometry_coord(c);
>>>>> end loop;
>>>>> return c;
>>>>> end;
>>>>> $$
>>>>> language plpgsql;
>>>>>
>>>>> --drop function ST_QuantizeGeometry(geom geometry, digits_precision
>>>>> integer[]);
>>>>> create or replace function ST_QuantizeGeometry(geom geometry,
>>>>> digits_precision integer[]) returns geometry as
>>>>> $$
>>>>> declare
>>>>> c _quantize_geometry_context;
>>>>> geomtype integer;
>>>>> bits_to_keep integer;
>>>>> ordinalstype integer;
>>>>> num_rings integer;
>>>>> num_objects integer;
>>>>> begin
>>>>> c.buffer = st_asbinary(geom, 'XDR');
>>>>> -- TODO: parse geometry type from wkb
>>>>> geomtype = st_dimension(geom)+1;
>>>>> -- TODO: parse number of ordinals from wkb
>>>>> c.num_ordinals = st_ndims(geom);
>>>>> for i in 1 .. c.num_ordinals loop
>>>>> bits_to_keep = ceil(digits_precision[i] / log(10, 2));
>>>>> c.masks[i] = B'1111111111111111111111111111
>>>>> 111111111111111111111111111111111111' << (52 - bits_to_keep);
>>>>> end loop;
>>>>> c.off = 5;
>>>>> if geomtype = 1 then
>>>>> c = _quantize_geometry_coord(c);
>>>>> elsif geomtype = 2 then
>>>>> c = _quantize_geometry_coords(c);
>>>>> elsif geomtype = 3 then
>>>>> num_rings = get_byte(c.buffer, c.off + 3);
>>>>> c.off = c.off + 4;
>>>>> for i in 0..num_rings-1 loop
>>>>> c = _quantize_geometry_coords(c);
>>>>> end loop;
>>>>>     else
>>>>>         raise 'Unsupported geometry type';
>>>>> end if;
>>>>> return st_geomfromwkb(c.buffer);
>>>>> end;
>>>>> $$
>>>>> language plpgsql;
>>>>>
>>>>> /Björn
>>>>>
>>>>> 2018-03-08 20:52 GMT+01:00 Regina Obe <lr at pcorp.us>:
>>>>>
>>>>>> I think the extra word is superfluous and hurts my fingers to type.
>>>>>>
>>>>>> Bur Dan likes it and he's the author so oh well.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: postgis-devel [mailto:postgis-devel-bounces at lists.osgeo.org]
>>>>>> On Behalf Of Paul Ramsey
>>>>>> Sent: Thursday, March 08, 2018 9:25 AM
>>>>>> To: PostGIS Development Discussion <postgis-devel at lists.osgeo.org>
>>>>>> Subject: Re: [postgis-devel] Bit setting to compress PostGIS
>>>>>> geometries
>>>>>>
>>>>>> ST_QuantizeCoordinates()?  or is the extra word superfluous?
>>>>>>
>>>>>> On Thu, Mar 8, 2018 at 6:21 AM, Daniel Baston <dbaston at gmail.com>
>>>>>> wrote:
>>>>>> > My understanding is that it should work on big-endian architectures,
>>>>>> > but I haven't tested it. Maybe Sandro will set up a new bot using
>>>>>> QEMU
>>>>>> > :)
>>>>>> >
>>>>>> > ST_Quantize, then? Or postgis_quantize_geometry?
>>>>>> >
>>>>>> > Dan
>>>>>> > _______________________________________________
>>>>>> > postgis-devel mailing list
>>>>>> > postgis-devel at lists.osgeo.org
>>>>>> > https://lists.osgeo.org/mailman/listinfo/postgis-devel
>>>>>> _______________________________________________
>>>>>> postgis-devel mailing list
>>>>>> postgis-devel at lists.osgeo.org
>>>>>> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>>>>>>
>>>>>> _______________________________________________
>>>>>> postgis-devel mailing list
>>>>>> postgis-devel at lists.osgeo.org
>>>>>> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>>>>>>
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> postgis-devel mailing list
>>>> postgis-devel at lists.osgeo.org
>>>> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>>>>
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20180317/4d0e1e09/attachment-0001.html>


More information about the postgis-devel mailing list