[postgis-devel] lwgeom_*_in_place()

Tue Oct 3 05:07:38 PDT 2017

On Mon, Oct 2, 2017 at 10:24 PM, Björn Harrtell <bjorn.harrtell at gmail.com>
wrote:

> I like it, but with some incoherent comments.
>
> To start, it has caused quite a bit of confusion on my part that it's not
> clear which methods are modifying in place and which are not. Before
> https://github.com/postgis/postgis/commit/f45231f3e4ea7217fa270e9c8b207c
> 619e7c430e the code would sometimes corrupt the actual LWGEOM on disk
> (how that works is still a mystery to me).
>

Your issue would have been the affine, in cases where you weren't
intersecting. In that case you'd have been writing on an uncopied piece of
pgsql memory (from PG_GETARG_GSERIALIZED_P()) which it might not like. I
too am a little unclear on the lifecycle which would allow it to get
corrupted before you escape by serializing your own return value to a fresh
piece of memory. But we've seen it happen in lots of other functions, so.

> To throw in an observation on signature - lwgeom_affine returns void so
> it's clear that it must be mutating? I would also guess that anything that
> uses GEOS will not mutate because the geometry has to be converted. Is not
> returning void enough to indicate a function (that does something) mutates?
> In either case, I'm not certain adding a *_in_place() variants to a lot of
> places would result in a nicer API.
>

The in_place() variants will all return an integer (status code), and
they'll all have a non-const LWGEOM *argument. The copying variants will
all return an LWGEOM *value and will have a const LWGEOM *argument. So
there will be a number of signals that some mutation is going to occur.

>From a PostGIS usage point of view the main requirement to use an
in_place() method is that you start off your PostGIS function with a
PG_GETARG_GSERIALIZED_P_COPY(), so that you own your memory before you
start scribbling on it. This should be cheaper than doing a
PG_GETARG_GSERIALIZED_P() lwgeom_from_gserialized() and then an
lwgeom_clone_deep(), as it involves one big allocation rather than
multliple small ones.

Hopefully between the variation of arguments and return values and
signature it'll be obvious which functions do what.

(Remember that, in all these discussions, when I say "cheaper" I mean very
barely, since allocation overhead is quite small, only noticeable for bulk
handling of small objects)

P.

>
> /Björn
>
> 2017-10-02 23:53 GMT+02:00 Paul Ramsey <pramsey at cleverelephant.ca>:
>
>> Hey devs,
>> So, in pursuit of spare CPU cycles for the good folks at Carto, I've been
>> putting various workloads through a profiler. Some of the results you've
>> seen in recent tweaks committed.
>>
>> One thing I found as a general rule was that as the number of simple
>> objects goes up, the more noticeable the overhead of things like memory
>> allocation / deallocation becomes. So the overhead of just constructing an
>> LWPOINT on a point, when you're slamming through 1M of them, is not nothing.
>>
>> For some workloads, like generating MVT (there it is) we call numerous
>> geometry processing functions in a row. For some functions, that expect
>> const LWGEOM inputs, that can result in a lot of cloning of objects, and
>> hence, memory management. For other functions, that expect to modify things
>> in place, the coordinates are just altered right there.
>>
>> This has always been a little ugliness in our API, and for a while we
>> settled on defaulting to making copies, so the API was at least getting
>> cleaner. But also getting slower.
>>
>> I built a prototype for generating MVT that does all the coordinate
>> processing in place, and for bulk simple objects it ran about 30% faster
>> than the existing code. There's a benefit to be had to in working in place.
>>
>> I propose that we... cut the baby in half.
>>
>> * All liblwgeom functions that do in place coordinate modification should
>> be re-named to *_in_place() and a bare version that takes a const LWGEOM
>> and returns a copy as necessary be added.
>> * All liblwgeom functions that already take const LWGEOM be supplemented
>> with *_in_place() variants so that it's possible to build processing chains
>> that are capable of doing processing with a minimal allocation/deallocation
>> footprint.
>>
>> Functions on my list immediately:
>>
>>   * remove_repeated_points
>>   * simplify
>>   * affine
>>   * grid
>>   * transform
>>
>> I will, however, do a full review and attempt to ensure all non-GEOS
>> functions have in_place variants available in liblwgeom.
>>
>> Thoughts?
>>
>> P.
>>
>>
>>
>> _______________________________________________
>> postgis-devel mailing list
>> postgis-devel at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>>
>
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20171003/22ff25f2/attachment.html>