[postgis-devel] I'd like to remove this Toast discussion

Javier Santana jsantana at carto.com
Fri Jun 30 08:55:43 PDT 2017


Yes, we'd need to change all the way to pglz_compress.

So maybe the second option would be better (implement postgis own
compression/decompression)

On Fri, Jun 30, 2017 at 5:51 PM Paul Ramsey <pramsey at cleverelephant.ca>
wrote:

> I feel like if we did it PgSQL side we'd have to be further up the chain,
> working around the area of detoast_datum_slice. We might also need a change
> to the decompressor though, to support partial decompression. Annoying, I
> know I've looked at this problem several times and put it aside each time,
> but I cannot remember why :)
>
>
> P
>
> On Fri, Jun 30, 2017 at 8:44 AM, Javier Santana <jsantana at carto.com>
> wrote:
>
>>
>> When I implemented the st_getgeometrytype version [1] (that Paul finally
>> merged into postgis master) it was faster but Paul is right. My proposal
>> here would be to change the pglz_decompress [2] function, called
>> from PG_DETOAST_DATUM_SLICE (which is the macro we use to get a slice of
>> the data).
>>
>> The algorithm seems easy enough to change and it might get into postgres
>> 10.
>>
>> In any case, there are two things we could do to improve this:
>> 1) use the same pglz_compress postgres has but calling it from postgis
>> source and have a HEADER + COMPRESSED_GEOMETRY
>> 2) if we do that, we could encode the geometry by component so it
>>  compress better
>>
>> Cheers!
>>
>> [1]
>> https://github.com/javisantana/postgis/commit/fbc45b0486be558f77b8c89de0f3f30274be2b90
>> [2]
>> https://github.com/postgres/postgres/blob/master/src/common/pg_lzcompress.c#L682
>>
>> On Fri, Jun 30, 2017 at 5:18 PM Paul Ramsey <pramsey at cleverelephant.ca>
>> wrote:
>>
>>> On Thu, Jun 29, 2017 at 10:31 PM, Regina Obe <lr at pcorp.us> wrote:
>>>
>>>>
>>>> https://postgis.net/docs/performance_tips.html#small_tables_large_objects
>>>>
>>>> This is the last bit of business before I can release PostGIS 2.3.3.
>>>>
>>>> I'd like to get rid of this whole discussion of TOAST as I don't think
>>>> it's
>>>> relevant anymore.
>>>>
>>>> Am I correct in assuming that?
>>>>
>>>> My assumption of why I think it's not relevant anymore is that we store
>>>> cached boxes for most geometries except for Points and 2 point lines.
>>>> So isn't it so that even if there is no index on a geometry, we do not
>>>> need
>>>> to fully detoast (or possibly not to detoast at all) the geometry to
>>>> read
>>>> the bounding box.
>>>> That it is part of the head of the geometry?
>>>>
>>>
>>> No, that is not so. If the object is large enough that it requires
>>> toasting, the whole thing is stuffed off to the side table and replaced
>>> with a reference in the main tuple. In order to get at that bbox on the
>>> header, we have to .... go to the toast tables and re-assemble the thing.
>>>
>>> "Worse", since we use compression on large objects, the object is
>>> compressed in the side tables and is pulled back in its entirety, then
>>> decompressed, then just the header is read.
>>>
>>> Changing that would involve storing objects uncompressed and
>>> implementing any compression we wanted internal to the postgis library. So
>>> we'd have some logic to compress anything larger than N vertices, but would
>>> always leave the header uncompressed so we could peak at it for free.
>>>
>>> That would *still* leave the toasting problem, which I'd have to
>>> re-confirm whether a pg_datum_slice cleanly pulls out just one piece of the
>>> toasted object or actually pulls the whole thing back first.
>>>
>>> P.
>>>
>>>
>>>>
>>>> I did a quick test of my states table - 52 records (granted they aren't
>>>> huge, but toast size is about 14 MB)
>>>>
>>>> When I do something like below without forcing enable_seqscan = false;
>>>> it
>>>> doesn't use an index (well sometimes it does even without urging)
>>>>
>>>> SELECT stusps
>>>> from tiger_data.state_all
>>>> WHERE the_geom &&
>>>>
>>>> ST_Buffer('0101000020AD100000242F4BF9462754C0BD0F7DA33F524340'::geometry,
>>>> 5);
>>>>
>>>> But whether it uses an index or not, the timing is still about the same
>>>> -
>>>> under 10ms.
>>>>
>>>> Though interestingly switching to
>>>>
>>>> SELECT stusps
>>>> FROM tiger_data.state_all
>>>> WHERE ST_Intersects(the_geom,
>>>>
>>>> ST_Buffer('0101000020AD100000242F4BF9462754C0BD0F7DA33F524340'::geometry,
>>>> 5));
>>>>
>>>> Does always force an index scan.
>>>>
>>>>
>>>> Thanks,
>>>> Regina
>>>>
>>>>
>>>> _______________________________________________
>>>> postgis-devel mailing list
>>>> postgis-devel at lists.osgeo.org
>>>> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>>>
>>> _______________________________________________
>>> postgis-devel mailing list
>>> postgis-devel at lists.osgeo.org
>>> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>>
>> --
>> -- CARTO CTO
>>
>> _______________________________________________
>> postgis-devel mailing list
>> postgis-devel at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>>
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel

-- 
-- CARTO CTO
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20170630/2757c042/attachment.html>


More information about the postgis-devel mailing list