[postgis-devel] Postgis topology creation - O(n-squared)? - creates problems with large datasets.
Sandro Santilli
strk at keybit.net
Mon Jan 13 13:11:48 PST 2014
On Fri, Jan 10, 2014 at 05:27:22PM +0000, Graeme B. Bell wrote:
> >> What about producing a syntetic dataset to serve as base for analisys
> >> and see how it performs on differently configured systems ?
> >
> > I had the same thought, and I'm making one right now.
>
> http://graemebell.net/topologish/million_poly_topo1.zip
> http://graemebell.net/topologish/million_poly_topo2.zip
>
> If anyone has time to help, I would be super-interested to see other people's timings of building topologies with these datasets of topologish* data. > 1 million polys each.
>
> e.g. pick polys, order by gid limit 5000, 10000, 20000, 40000, 80000, 160000, 320000, 640000. SRID is 25833.
I did up to 160000, after which I stopped because it took ~10 hours,
and I've been running twice: once with ST_CreateTopoGeo and once
with TopoGeo_addPolygon.
= Environment =
PostgreSQL 9.3.1 on x86_64-unknown-linux-gnu (default config)
POSTGIS="2.2.0dev r12137" GEOS="3.4.3dev-CAPI-1.8.3 r3958"
= Data =
million_poly_topo1: 1161235 polygons, 5/7/116 vertices (min/avg/max)
= Queries =
SELECT ST_CreateTopoGeo('million_poly_topo1', ST_Collect(geom))
FROM ( SELECT geom from million_poly_topo1 order by gid limit XXXX) as f;
SELECT TopoGeo_addPolygon('million_poly_topo1', ST_GeometryN(geom,1))
FROM ( SELECT geom from million_poly_topo1 order by gid limit XXXX) as f;
= Timing (ms) =
Rows | ST_CreateTopoGeo | TopoGeo_addPolygon |
------+-------------------+--------------------|
5000 | 129427.274 | 116104.691 |
10000 | 327703.598 | 301611.556 |
20000 | 936224.702 | 884240.862 |
40000 | 2976178.984 | 2891650.767 |
80000 | 10343526.099 | 10254117.319 | 80071 nodes, 80071 edges, 80071 faces
160000 | 38352283.535 | 38375126.950 | 160167 nodes, 160167 edges, 160167 faces
Attached is a plot of those numbers
= Stats =
These are the function that are run for insertion of a _single_ polygon when
the topology already contains 165000 faces.
-- Ordered by "self_time":
strk=# select * from pg_stat_user_functions order by self_time desc;
funcid | schemaname | funcname | calls | total_time | self_time
--------+------------+-------------------------+--------+------------+-----------
62973 | topology | _st_addfacesplit | 2 | 424.356 | 192.07
52356 | public | st_lineinterpolatepoint | 165013 | 136.911 | 136.911
52097 | public | geometry_overlaps | 165015 | 92.684 | 92.684
54503 | topology | st_addedgemodface | 1 | 438.418 | 13.539
54508 | topology | st_addisonode | 1 | 7.39 | 7.359
54491 | topology | getringedges | 2 | 2.552 | 2.552
54533 | topology | topogeo_addlinestring | 1 | 450.942 | 2.385
54534 | topology | topogeo_addpoint | 2 | 9.287 | 1.837
Interesting enough, "geometry_overlaps" and "st_lineinterpolatepoint" are called
once for each and every already-loaded geometry, which suggests there's indeed a
problem with index usage. The "geometry_overlaps" function is the one that
implements operator&&, supposedly an index user!
--strk;
-------------- next part --------------
A non-text attachment was scrubbed...
Name: topotime.png
Type: image/png
Size: 27030 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20140113/5adb48d4/attachment.png>
More information about the postgis-devel
mailing list