[postgis-devel] brainstorming about topology polygonizer

Sandro Santilli strk at kbt.io
Thu Sep 15 23:00:27 PDT 2016


On Thu, Sep 15, 2016 at 05:21:43PM +0200, Sandro Santilli wrote:

> Right now I'm using the "Arezzo UCS" dataset, which is composed by
> 16746 shells (CCW rings) and 1817 holes (CW rings) composed by
> a total of 47708 edges.

[...]

> https://git.osgeo.org/gogs/strk/postgis/src/batch-topo )
> goes as follows (pseudo-code):
> 
>  For each yet-to-visit edge-side:
>    Compute edge-side ring (walking)
>    If edge-side ring is a shell (ccw):
>      - Create a face, register it in each of the ring edge sides
>        (marking the edge side as visited)
>      - Save the shell in a "shells container"
>    Otherwise (is an hole, clockwise): 
>      - Register each of the ring edge sides as being an "hole"
>        (marking the edge side as being an hole, and thus visited)
>      - Save the ring in a "holes container"
> 
>  For each of the elements in the "holes container":
>    - Find face-shell containing an arbitrary vertex of the hole ring
>      (from the "shells container")
>    - Register it in each of the ring edge sides

Analisys of the backend/database interaction.
Being there a total of 18563 rings we have: 

 - 18563 queries to select next yet-to-be-visited edge
   (WHERE left_face=NULL or right_face=NULL)
   each returns only one edge_id

 - 18563 queries to find an edge side ring
   (recursive CTE walking on the edge side)
   each returns an array of edge_id

 - 18563 queries to extract the geometries of ring edges
   (edge_id IN ARRAY[...])
   each returns an array of edge_id,deserialized_geom

 - 18563 queries to update left_face of edges
   (where edge_id = updated_data.edge_id)

 - 18563 queries to update right_face of edges
   (where edge_id = updated_data.edge_id)

It makes a total of 92815 SQL queries to be performed (rings x5).
And it's still fast.

Edge geometries are extracted twice (once per side ring) so that makes
a total of 95416 detoasts and deserializations.
what needs some love to release that memory.

> This is proving effective, but memory hungry (stopped the process
> while taking more than 20 GB of RAM).
>
> Theoretically, holding "holes" and "shells" in memory should not
> take much more than the size of all the face geometries, which
> I've computed for this case to be ~228 MB.

Reading this with a fresh mind I realize I mixed things up.
The "Arezzo UCS" test actually completes under 5 minutes
(proved effective) and uses less than 1GB of ram.

The killed process and 20+GB of ram was for a different dataset, namely
"rt09_wgs84_topo", having 2773950 edges and 1340262 faces.
The ~228 MB was the memory size (st_memsize) of the collection
of all face geometries in "rt09_wgs84_topo".

In the "Arezzo UCS" case, the size of collected faces is 13MB (for
under 1GB of resident memory used).

> Even considering the multiple representations of each face geometry
> component (edges, polygon, geos, prepared) I could understand a x10
> increase in size, but this is a x100 increase (20000 MB from 228 MB).

Or 1000 MB from 13 MB (the Arezzo case).

> I'll try a different approach, along these lines:
> 
>    geom = (GSERIALIZED *)PG_DETOAST_DATUM_COPY(dat);
>    lwg = lwgeom_from_gserialized(geom);
>    edge->geom = lwgeom_clone_deep(lwg);
>    lwgeom_free(lwg);
>    pfree(geom);
> 
> I'm afraid that doing so would still keep the Datum memory around
> unless context memory is switched, which I suspect is not the case
> as we call SPI_connect only once for the whole lifetime of the
> function.

This test reduced the Maximum resident set size (kbytes) of the
"Arezzo UCS" case from 
772400 to
722460

Not a huge benefit !

--strk;



More information about the postgis-devel mailing list