[postgis-devel] brainstorming about topology polygonizer
Sandro Santilli
strk at kbt.io
Thu Sep 15 23:00:27 PDT 2016
On Thu, Sep 15, 2016 at 05:21:43PM +0200, Sandro Santilli wrote:
> Right now I'm using the "Arezzo UCS" dataset, which consists of
> 16746 shells (CCW rings) and 1817 holes (CW rings), made up of
> a total of 47708 edges.
[...]
> https://git.osgeo.org/gogs/strk/postgis/src/batch-topo )
> goes as follows (pseudo-code):
>
> For each yet-to-visit edge-side:
>   Compute edge-side ring (walking)
>   If edge-side ring is a shell (ccw):
>     - Create a face, register it in each of the ring edge sides
>       (marking the edge side as visited)
>     - Save the shell in a "shells container"
>   Otherwise (it is a hole, clockwise):
>     - Register each of the ring edge sides as being a "hole"
>       (marking the edge side as being a hole, and thus visited)
>     - Save the ring in a "holes container"
>
> For each of the elements in the "holes container":
>   - Find face-shell containing an arbitrary vertex of the hole ring
>     (from the "shells container")
>   - Register it in each of the ring edge sides
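
The shell/hole decision in the pseudo-code above depends only on the
winding of the walked ring. As a minimal sketch, assuming the ring is
available as a closed array of 2D points (first point repeated as last),
the sign of the shoelace sum tells CCW from CW. The Point type and both
function names are hypothetical, for illustration only; this is not the
code from the branch.

```c
#include <stddef.h>

/* Hypothetical 2D point type, for illustration only. */
typedef struct { double x, y; } Point;

/*
 * Twice the signed area of a closed ring (shoelace formula).
 * Assumes pts[n-1] repeats pts[0]. Positive means counter-clockwise.
 */
double ring_signed_area2(const Point *pts, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i + 1 < n; i++)
        sum += pts[i].x * pts[i + 1].y - pts[i + 1].x * pts[i].y;
    return sum;
}

/* Returns 1 for a shell (CCW ring), 0 for a hole (CW or degenerate ring). */
int ring_is_ccw(const Point *pts, size_t n)
{
    return ring_signed_area2(pts, n) > 0.0;
}
```

A unit square walked as (0,0) (1,0) (1,1) (0,1) (0,0) would be classified
as a shell, and the same points in reverse order as a hole.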
Analysis of the backend/database interaction.
With a total of 18563 rings, we have:
- 18563 queries to select the next yet-to-be-visited edge
  (WHERE left_face IS NULL OR right_face IS NULL);
  each returns only one edge_id
- 18563 queries to find an edge side ring
  (recursive CTE walking on the edge side);
  each returns an array of edge_id
- 18563 queries to extract the geometries of ring edges
  (edge_id IN ARRAY[...]);
  each returns an array of edge_id,deserialized_geom
- 18563 queries to update left_face of edges
  (where edge_id = updated_data.edge_id)
- 18563 queries to update right_face of edges
  (where edge_id = updated_data.edge_id)
That makes a total of 92815 SQL queries to be performed (5 per ring).
And it's still fast.
Edge geometries are extracted twice (once per side ring), which makes
a total of 95416 detoasts and deserializations. This is probably
what needs some love to release that memory.
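
The counts above follow directly from the dataset figures: rings are
shells plus holes, each ring costs five queries, and each edge is
detoasted once per side. A small sketch of that arithmetic (the Workload
struct and estimate_workload name are made up for this example):

```c
/* Hypothetical container for the derived counts. */
typedef struct {
    long rings;    /* shells + holes */
    long queries;  /* 5 SQL queries per ring */
    long detoasts; /* each edge geometry is read once per side */
} Workload;

Workload estimate_workload(long shells, long holes, long edges)
{
    Workload w;
    w.rings = shells + holes;
    w.queries = w.rings * 5;
    w.detoasts = edges * 2;
    return w;
}
```

Calling estimate_workload(16746, 1817, 47708) reproduces the 18563 rings,
92815 queries and 95416 detoasts quoted for "Arezzo UCS".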
> This is proving effective, but memory hungry (stopped the process
> while taking more than 20 GB of RAM).
>
> Theoretically, holding "holes" and "shells" in memory should not
> take much more than the size of all the face geometries, which
> I've computed for this case to be ~228 MB.
Reading this with a fresh mind I realize I mixed things up.
The "Arezzo UCS" test actually completes in under 5 minutes
(so it proved effective) and uses less than 1 GB of RAM.
The killed process and the 20+ GB of RAM were for a different dataset,
namely "rt09_wgs84_topo", having 2773950 edges and 1340262 faces.
The ~228 MB was the memory size (st_memsize) of the collection
of all face geometries in "rt09_wgs84_topo".
In the "Arezzo UCS" case, the size of collected faces is 13 MB (for
under 1 GB of resident memory used).
> Even considering the multiple representations of each face geometry
> component (edges, polygon, geos, prepared) I could understand a x10
> increase in size, but this is a x100 increase (20000 MB from 228 MB).
Or 1000 MB from 13 MB (the Arezzo case).
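
For reference, the blow-up factor is just resident memory divided by the
size of the collected face geometries. Using the rounded figures from
this thread, a quick sketch (blowup_factor is a made-up helper name)
gives roughly x88 for rt09_wgs84_topo and roughly x77 for Arezzo UCS,
so the same order of magnitude as the "x100" above:

```c
/* Ratio of resident memory to the size of the face geometries (both in MB). */
double blowup_factor(double resident_mb, double faces_mb)
{
    return resident_mb / faces_mb;
}
```

Note that the inputs are approximate ("20+ GB", "under 1 GB"), so these
ratios are only indicative.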
> I'll try a different approach, along these lines:
>
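>   /* Detoast into a private copy, so that it can be pfree'd below */
>   geom = (GSERIALIZED *)PG_DETOAST_DATUM_COPY(dat);
>   lwg = lwgeom_from_gserialized(geom);
>   /* Deep-clone so that edge->geom does not point into geom */
>   edge->geom = lwgeom_clone_deep(lwg);
>   lwgeom_free(lwg);
>   pfree(geom);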
>
> I'm afraid that doing so would still keep the Datum memory around
> unless the memory context is switched, which I suspect is not the
> case, as we call SPI_connect only once for the whole lifetime of the
> function.
This test reduced the maximum resident set size of the "Arezzo UCS"
case from 772400 to 722460 kbytes.
Not a huge benefit!
--strk;