[postgis-devel] lwgeom_*_in_place()

Tue Oct 3 09:53:09 PDT 2017

2017-10-03 14:07 GMT+02:00 Paul Ramsey <pramsey at cleverelephant.ca>:

>
>
> On Mon, Oct 2, 2017 at 10:24 PM, Björn Harrtell <bjorn.harrtell at gmail.com>
> wrote:
>
>> I like it, but with some incoherent comments.
>>
>> To start, it has caused quite a bit of confusion on my part that it's not
>> clear which methods are modifying in place and which are not. Before
>> https://github.com/postgis/postgis/commit/f45231f3e4e
>> a7217fa270e9c8b207c619e7c430e the code would sometimes corrupt the
>> actual LWGEOM on disk (how that works is still a mystery to me).
>>
>
> Your issue would have been the affine, in cases where you weren't
> intersecting. In that case you'd have been writing on an uncopied piece of
> pgsql memory (from PG_GETARG_GSERIALIZED_P()) which it might not like. I
> too am a little unclear on the lifecycle which would allow it to get
> corrupted before you escape by serializing your own return value to a fresh
> piece of memory. But we've seen it happen in lots of other functions, so.
>
>
>> To throw in an observation on signature - lwgeom_affine returns void so
>> it's clear that it must be mutating? I would also guess that anything that
>> uses GEOS will not mutate because the geometry has to be converted. Is not
>> returning void enough to indicate a function (that does something) mutates?
>> In either case, I'm not certain adding a *_in_place() variants to a lot of
>> places would result in a nicer API.
>>
>
> The in_place() variants will all return an integer (status code), and
> they'll all have a non-const LWGEOM *argument. The copying variants will
> all return an LWGEOM *value and will have a const LWGEOM *argument. So
> there will be a number of signals that some mutation is going to occur.
>
> From a PostGIS usage point of view the main requirement to use an
> in_place() method is that you start off your PostGIS function with a
> PG_GETARG_GSERIALIZED_P_COPY(), so that you own your memory before you
> start scribbling on it. This should be cheaper than doing a
> PG_GETARG_GSERIALIZED_P() lwgeom_from_gserialized() and then an
> lwgeom_clone_deep(), as it involves one big allocation rather than
> multliple small ones.
>
> Hopefully between the variation of arguments and return values and
> signature it'll be obvious which functions do what.
>
> (Remember that, in all these discussions, when I say "cheaper" I mean very
> barely, since allocation overhead is quite small, only noticeable for bulk
> handling of small objects)
>
>
Good rationale and details of your suggested changes to the API. So, now I
like it without incoherent comments.

While the real world benefit might be small depending on use case, I would
assume it can affect more than "lots of small objects" i.e when processing
or preseeding any large volume of geometries/tiles. Will use
PG_GETARG_GSERIALIZED_P_COPY in my stuff if you do not beat me to it.

/Björn

> P.
>
>
>
>>
>> /Björn
>>
>> 2017-10-02 23:53 GMT+02:00 Paul Ramsey <pramsey at cleverelephant.ca>:
>>
>>> Hey devs,
>>> So, in pursuit of spare CPU cycles for the good folks at Carto, I've
>>> been putting various workloads through a profiler. Some of the results
>>> you've seen in recent tweaks committed.
>>>
>>> One thing I found as a general rule was that as the number of simple
>>> objects goes up, the more noticeable the overhead of things like memory
>>> allocation / deallocation becomes. So the overhead of just constructing an
>>> LWPOINT on a point, when you're slamming through 1M of them, is not nothing.
>>>
>>> For some workloads, like generating MVT (there it is) we call numerous
>>> geometry processing functions in a row. For some functions, that expect
>>> const LWGEOM inputs, that can result in a lot of cloning of objects, and
>>> hence, memory management. For other functions, that expect to modify things
>>> in place, the coordinates are just altered right there.
>>>
>>> This has always been a little ugliness in our API, and for a while we
>>> settled on defaulting to making copies, so the API was at least getting
>>> cleaner. But also getting slower.
>>>
>>> I built a prototype for generating MVT that does all the coordinate
>>> processing in place, and for bulk simple objects it ran about 30% faster
>>> than the existing code. There's a benefit to be had to in working in place.
>>>
>>> I propose that we... cut the baby in half.
>>>
>>> * All liblwgeom functions that do in place coordinate modification
>>> should be re-named to *_in_place() and a bare version that takes a const
>>> LWGEOM and returns a copy as necessary be added.
>>> * All liblwgeom functions that already take const LWGEOM be supplemented
>>> with *_in_place() variants so that it's possible to build processing chains
>>> that are capable of doing processing with a minimal allocation/deallocation
>>> footprint.
>>>
>>> Functions on my list immediately:
>>>
>>>   * remove_repeated_points
>>>   * simplify
>>>   * affine
>>>   * grid
>>>   * transform
>>>
>>> I will, however, do a full review and attempt to ensure all non-GEOS
>>> functions have in_place variants available in liblwgeom.
>>>
>>> Thoughts?
>>>
>>> P.
>>>
>>>
>>>
>>> _______________________________________________
>>> postgis-devel mailing list
>>> postgis-devel at lists.osgeo.org
>>> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>>>
>>
>>
>> _______________________________________________
>> postgis-devel mailing list
>> postgis-devel at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20171003/54c80f4f/attachment.html>