[postgis-devel] gzip support for ST_AsMVT
Paul Ramsey
pramsey at cleverelephant.ca
Mon Nov 4 09:30:22 PST 2019
> On Nov 4, 2019, at 9:21 AM, Yuri Astrakhan <yuriastrakhan at gmail.com> wrote:
>
> looks good, thanks again for the blazing fast reply & fix! I wonder how soon the demand will bring it into core PG :)
Well, keep up with the agitating :) if only getting it built out as a package by pgdg.
I think core would resist it, on the theory that “compression is already done”, in the default TOAST storage. (I know, not *quite* the use case you’re looking at.)
I think you can also already hack a gzip compression with the right calls in pgcrypto, though I didn’t quite figure that out, building an extension felt like more fun anyways.
P.
>
> On Mon, Nov 4, 2019 at 11:19 AM Paul Ramsey <pramsey at cleverelephant.ca <mailto:pramsey at cleverelephant.ca>> wrote:
> Text as a column can be quite unhappy if you feed it with binary data (things like the printing code gets unhappy with the zero-byte parts, so you can end up with unexpected stops in printouts).
> I have no idea why text would have any advantage over bytea, since if you check the source, they are both just typedefs over top of varlena.
> It would be possible to do a gzip(text) returns bytea, but you’d still have a bytea in the end, and it wouldn’t be nice and bidirectional, since text::bytea works, but bytea::text doesn’t return the original ascii encoding.
>
> There shouldn’t be any conversion cost in a cast, since the memory structure of text and bytea is identical, it’s just re-typing a pointer.
>
> I can add a gzip(text) => bytea very easily though (heck, you can, just create a new gzip function that looks exactly like the bytea one, but takes text as the argument, it’ll work fine.
>
> ATB
>
> P
>
>> On Nov 4, 2019, at 1:49 AM, Yuri Astrakhan <yuriastrakhan at gmail.com <mailto:yuriastrakhan at gmail.com>> wrote:
>>
>> Paul, one (possibly newbie) question - I noticed you used `gzip(bytea)`, and I read in [1] that text performs about 17% better (for read queries, not sure about in-memory operations). Does it make sense to add gzip(text) variant? I concatenate ST_AsMVT() data with STRING_AGG, thus end up with a TEXT, and wonder if there is a conversion cost. Thanks!!
>>
>> [1] http://engineering.pivotal.io/post/bytea_versus_text_in_postgresql/ <http://engineering.pivotal.io/post/bytea_versus_text_in_postgresql/>
>> On Mon, Nov 4, 2019 at 4:24 AM Yuri Astrakhan <yuriastrakhan at gmail.com <mailto:yuriastrakhan at gmail.com>> wrote:
>> Paul rulez!!!! Thank you!!! :)
>>
>> Experimentation ensues...
>>
>> On Sun, Nov 3, 2019 at 9:29 PM Paul Ramsey <pramsey at cleverelephant.ca <mailto:pramsey at cleverelephant.ca>> wrote:
>> Excuse me,
>>
>> https://github.com/pramsey/pgsql-gzip <https://github.com/pramsey/pgsql-gzip>
>>
>> P
>>
>> On Sun, Nov 3, 2019 at 6:24 PM Paul Ramsey <pramsey at cleverelephant.ca <mailto:pramsey at cleverelephant.ca>> wrote:
>> >
>> > I could not find a gzip extension on the web, even though I swear one
>> > already exists.
>> > So, I wrote up one, it's quite a small piece of work, but hopefully it serves.
>> >
>> > https://github.com/pramsey/psql-gzip <https://github.com/pramsey/psql-gzip>
>> >
>> > ATB,
>> >
>> > P
>> >
>> > On Sun, Nov 3, 2019 at 11:57 AM Yuri Astrakhan <yuriastrakhan at gmail.com <mailto:yuriastrakhan at gmail.com>> wrote:
>> > >
>> > > OSM -> PostgreSQL is done by Imposm 3, which updates database tables on the daily/hourly/minute basis. Every time it runs, it generates a list of changed tiles. I do not know what process they use for it -- from my perspective, I simply get the list of updated tiles on zoom 14 as a file, and I could use it to regenerate cached tiles or purge them from Varnish. Perhaps Imposm maintainers could find a good use for that functionality?
>> > >
>> > > [1] https://imposm.org/docs/imposm3/latest/tutorial.html#expire-tiles <https://imposm.org/docs/imposm3/latest/tutorial.html#expire-tiles>
>> > >
>> > > On Sun, Nov 3, 2019 at 2:41 PM Martin Davis <mtnclimb at gmail.com <mailto:mtnclimb at gmail.com>> wrote:
>> > >>
>> > >> Good background to know. Quite a bit to grok there!
>> > >>
>> > >> One question: does any part of that toolchain regenerate tile cache subsets depending on detection of feature change? And if so, would it be useful to have a DB function which can determine the set of tile ids that need to be refreshed (i.e. by mapping a (set of or single) geometry (or envelopes) into a set of tile ids?
>> > >>
>> > >> On Sun, Nov 3, 2019 at 11:00 AM Yuri Astrakhan <yuriastrakhan at gmail.com <mailto:yuriastrakhan at gmail.com>> wrote:
>> > >>>
>> > >>> Martin, I am working on improving OpenMapTiles tooling [1] - the ultimate goal is to have tiles generated in real time from the up-to-date OSM data, and serve them directly to user's browser via some caching layer (i.e. Varnish). The tools already contain postserve - a simple python server that queries for MVT tiles (no compression yet, but can be easily added)
>> > >>>
>> > >>> The other task is tile pre-generation using tilelive-copy (nodejs) - I wrote a tilelive-pgquery plugin [2] that queries PG for the tile, compresses it, and passes it on to tilelive-copy for storage.
>> > >>>
>> > >>> [1] OpenMapTiles tools - https://github.com/openmaptiles/openmaptiles-tools <https://github.com/openmaptiles/openmaptiles-tools>
>> > >>> [2] tilelive-pgquery - https://www.npmjs.com/package/tilelive-pgquery <https://www.npmjs.com/package/tilelive-pgquery>
>> > >>>
>> > >>> On Sun, Nov 3, 2019 at 12:55 PM Martin Davis <mtnclimb at gmail.com <mailto:mtnclimb at gmail.com>> wrote:
>> > >>>>
>> > >>>> Great to hear that ST_AsMVT is useful.
>> > >>>>
>> > >>>> The other PostGIS capability that is useful for web spatial applications is the (recently enhanced) ST_AsGeoJSON. This should also be gzipped over the wire. So this suggests a modular gzip capability would be more useful.
>> > >>>>
>> > >>>> If this isn't provided in Postgres in some way (now or in near term) perhaps we should just add a ST_Gzip function to PostGIS.
>> > >>>>
>> > >>>> Out of curiosity, what platform do you use for your external gzipping layer?
>> > >>>>
>> > >>>> On Sun, Nov 3, 2019 at 8:29 AM nyurik <yuriastrakhan at gmail.com <mailto:yuriastrakhan at gmail.com>> wrote:
>> > >>>>>
>> > >>>>> The amazing ST_AsMVT() has two common usage patterns: copy resulting MVTs to
>> > >>>>> a tile cache (e.g. .mbtiles file or a materialized view), or serve MVT to
>> > >>>>> the users (direct SQL->browser approach). Both patterns still require one
>> > >>>>> additional data processing step -- gziping.
>> > >>>>>
>> > >>>>> Thus, rather than having a horizontally scalable db plus a simple IO-bound
>> > >>>>> SQL->Web or a SQL->store process, one has to add a relatively CPU-intensive
>> > >>>>> gzipping layer. This is especially relevant if I try to create a PG table
>> > >>>>> with the pre-generated tiles - I must use an external data compression
>> > >>>>> process to retrieve a tile, gzip it, and store it back, instead of running a
>> > >>>>> single query for copying all tiles. My cursory look at the tile sizes
>> > >>>>> indicate gzipping shrinks MVTs 50% to 300%.
>> > >>>>>
>> > >>>>> Note that a similar CPU-intensive step - creating MD5 tile hashes for a more
>> > >>>>> efficient storage - can be easily done with PG's `md5()` function, whereas
>> > >>>>> `gzip()` doesn't appear to exist.
>> > >>>>>
>> > >>>>> I would like to propose two possible solutions:
>> > >>>>> * Implement ST_AsMVT(..., compress) parameter - NULL=no compression,
>> > >>>>> 0-9=compression level.
>> > >>>>> PROs: adds just the required functionality to where it is needed (YAGNI
>> > >>>>> principle), does not require ungzip yet (ST_AsMVT is a one way function
>> > >>>>> without the corresponding MVT->Table method)
>> > >>>>> CONs: less generic (unusable for non-MVT usage)
>> > >>>>> * Implement gzip() or ST_gzip()
>> > >>>>> PROs: a more generic approach not tied to MVTs
>> > >>>>> CONs: logically implies the need of ungzip(), requires PG community to
>> > >>>>> agree this functionality is needed
>> > >>>>>
>> > >>>>> Thanks!
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>>> --
>> > >>>>> Sent from: http://postgis.17.x6.nabble.com/PostGIS-Dev-f3570762.html <http://postgis.17.x6.nabble.com/PostGIS-Dev-f3570762.html>
>> > >>>>> _______________________________________________
>> > >>>>> postgis-devel mailing list
>> > >>>>> postgis-devel at lists.osgeo.org <mailto:postgis-devel at lists.osgeo.org>
>> > >>>>> https://lists.osgeo.org/mailman/listinfo/postgis-devel <https://lists.osgeo.org/mailman/listinfo/postgis-devel>
>> > >>>>
>> > >>>> _______________________________________________
>> > >>>> postgis-devel mailing list
>> > >>>> postgis-devel at lists.osgeo.org <mailto:postgis-devel at lists.osgeo.org>
>> > >>>> https://lists.osgeo.org/mailman/listinfo/postgis-devel <https://lists.osgeo.org/mailman/listinfo/postgis-devel>
>> > >>>
>> > >>> _______________________________________________
>> > >>> postgis-devel mailing list
>> > >>> postgis-devel at lists.osgeo.org <mailto:postgis-devel at lists.osgeo.org>
>> > >>> https://lists.osgeo.org/mailman/listinfo/postgis-devel <https://lists.osgeo.org/mailman/listinfo/postgis-devel>
>> > >>
>> > >> _______________________________________________
>> > >> postgis-devel mailing list
>> > >> postgis-devel at lists.osgeo.org <mailto:postgis-devel at lists.osgeo.org>
>> > >> https://lists.osgeo.org/mailman/listinfo/postgis-devel <https://lists.osgeo.org/mailman/listinfo/postgis-devel>
>> > >
>> > > _______________________________________________
>> > > postgis-devel mailing list
>> > > postgis-devel at lists.osgeo.org <mailto:postgis-devel at lists.osgeo.org>
>> > > https://lists.osgeo.org/mailman/listinfo/postgis-devel <https://lists.osgeo.org/mailman/listinfo/postgis-devel>
>> _______________________________________________
>> postgis-devel mailing list
>> postgis-devel at lists.osgeo.org <mailto:postgis-devel at lists.osgeo.org>
>> https://lists.osgeo.org/mailman/listinfo/postgis-devel <https://lists.osgeo.org/mailman/listinfo/postgis-devel>_______________________________________________
>> postgis-devel mailing list
>> postgis-devel at lists.osgeo.org <mailto:postgis-devel at lists.osgeo.org>
>> https://lists.osgeo.org/mailman/listinfo/postgis-devel <https://lists.osgeo.org/mailman/listinfo/postgis-devel>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org <mailto:postgis-devel at lists.osgeo.org>
> https://lists.osgeo.org/mailman/listinfo/postgis-devel <https://lists.osgeo.org/mailman/listinfo/postgis-devel>_______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20191104/6732425e/attachment-0001.html>
More information about the postgis-devel
mailing list