<div dir="ltr"><div><div><div>OK,<br></div>is it the timing with<br></div>CREATE INDEX ON node (containing_face);<br>?<br></div>OR is it without index?<br><br>Maybe a little vacuum analyze would help?<br><br>Cheers,<br>Rémi-C<br>
</div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-07-30 12:24 GMT+02:00 Sandro Santilli <span dir="ltr"><<a href="mailto:strk@keybit.net" target="_blank">strk@keybit.net</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="">On Wed, Jul 30, 2014 at 12:09:05PM +0200, Rémi Cura wrote:<br>
> The ticket :<br>
> <a href="http://trac.osgeo.org/postgis/ticket/2860" target="_blank">http://trac.osgeo.org/postgis/ticket/2860</a><br>
><br>
> If fitting in memory, I would expect the 160k geom of the benchmark to be<br>
> imported in less than 30 sec in grass, cgal and geos.<br>
<br>
</div>Sum of memory size used to hold those 160k geometries in PostGIS memory<br>
is 26,592,360 (~26MB).<br>
<div class=""><br>
> 300 sec for Postgis Topology batch mode would be a good goal (reachable and<br>
> nice achievement).<br>
<br>
</div>Current best time (in that table) was 4984 secs (~2h20m)<br>
Still, we don't know exactly how much time went in triggers checking,<br>
how much in disk writing, how much in toasting/detoasting values etc.<br>
<br>
Speaking of which, another postgresql-level optimization that could<br>
be taken at population time could be setting the storage type, and<br>
setting the fill factor for tables should also make a difference.<br>
<br>
A lot to play with, having the time...<br>
<br>
Unfortunately running the population query for 5000 geometries right<br>
now gives me a longer time than I was getting back when I took the<br>
times for that test, and I'm not sure why. Maybe it's because I enabled<br>
gathering statistics about the called user functions, or because I changed<br>
some parameters about available memory work memory and such. I suspect<br>
those configurations change a lot the experience.<br>
<br>
Anyway, having stats on, will dump here the ones for loading 5000 geometries<br>
from that dataset (ordered by gid):<br>
<br>
strk=# select * from pg_stat_user_functions order by total_time desc limit 10;<br>
funcid | schemaname | funcname | calls | total_time | self_time<br>
--------+------------+-----------------------+-------+------------+-----------<br>
54535 | topology | topogeo_addpolygon | 5000 | 55761.337 | 1369.537<br>
54533 | topology | topogeo_addlinestring | 5004 | 48302.105 | 4038.88<br>
54503 | topology | st_addedgemodface | 5004 | 33837.97 | 10328.094<br>
62973 | topology | _st_addfacesplit | 10008 | 22461.719 | 17372.12<br>
54534 | topology | topogeo_addpoint | 10008 | 9712.522 | 3570.827<br>
54508 | topology | st_addisonode | 5004 | 6032.572 | 4861.636<br>
54491 | topology | getringedges | 10008 | 4876.065 | 4876.065<br>
54514 | topology | st_getfacegeometry | 11921 | 4674.631 | 3712.201<br>
52455 | public | st_pointonsurface | 9591 | 1267.424 | 1267.424<br>
52275 | public | st_buildarea | 11921 | 962.43 | 962.43<br>
(10 rows)<br>
<br>
You'll note that _ST_AddFaceSplit is called 10008 times, and at every<br>
call the number of points should be at least 1 more than the previous,<br>
with last call finding ~5000 nodes in the table.<br>
<br>
For the record, my current timing for loading those 5000 rows is 55822.181 ms<br>
whereas my last best was 36134.786 ms (just to get an idea about how much<br>
database configuration could change times, as I think the machine was<br>
really the same).<br>
<br>
--strk;<br>
<div><div class="h5"><br>
> 2014-07-30 11:56 GMT+02:00 Rémi Cura <<a href="mailto:remi.cura@gmail.com">remi.cura@gmail.com</a>>:<br>
><br>
> > Almost a 10x factor, really nice !<br>
> > Cheers,<br>
> > Rémi-C<br>
> ><br>
> ><br>
> > 2014-07-30 11:38 GMT+02:00 Sandro Santilli <<a href="mailto:strk@keybit.net">strk@keybit.net</a>>:<br>
> ><br>
> > On Wed, Jul 30, 2014 at 11:29:00AM +0200, Sandro Santilli wrote:<br>
> >><br>
> >> > There's a reference dataset produced some time ago by a user who<br>
> >> resulted<br>
> >> > in a fix making topology creation much faster:<br>
> >> > <a href="http://lists.osgeo.org/pipermail/postgis-devel/2014-January/024078.html" target="_blank">http://lists.osgeo.org/pipermail/postgis-devel/2014-January/024078.html</a><br>
> >> ><br>
> >> > I still have a couple of diagrams showing population time before/after<br>
> >> > that are waiting for a blog post that I never find the time to write...<br>
> >><br>
> >> Just a sneak preview of the blog post that might never be written<br>
> >><br>
> >> Queries:<br>
> >><br>
> >> SELECT ST_CreateTopoGeo('million_poly_topo1', ST_Collect(geom))<br>
> >> FROM ( SELECT geom from million_poly_topo1 order by gid limit 160000) as<br>
> >> f;<br>
> >><br>
> >> SELECT TopoGeo_addPolygon('million_poly_topo1', ST_GeometryN(geom,1))<br>
> >> FROM ( SELECT geom from million_poly_topo1 order by gid limit 160000) as<br>
> >> f;<br>
> >><br>
> >> Output topology has:<br>
> >><br>
> >> 160167 nodes, 160167 edges, 160167 faces<br>
> >><br>
> >> Times:<br>
> >><br>
> >> a: before starting<br>
> >> b: Ensure face splitting algorithm uses the edge index (#2610)<br>
> >> c: Drop all calls to geometry::text during topology population (#2616)<br>
> >><br>
> >> ST_CreateTopoGeo TopoGeo_addPolygon<br>
> >> +-------------------+--------------------+<br>
> >> a | 38352283.535 | 38375126.950 |<br>
> >> b | 12088554.776 | 9644736.277 |<br>
> >> c | 11963757.923 | 4984225.402 |<br>
> >><br>
> >> I've a table with more numbers, with limits from 5000 to 160000.<br>
> >> The index change would make another configuration (d, I guess).<br>
> >><br>
> >> Please please please file the ticket, maybe referencing this thread<br>
> >> and #2610 and #2616 as I'd really love to grow the optimizations further<br>
> >><br>
> >> --strk;<br>
> >> _______________________________________________<br>
> >> postgis-devel mailing list<br>
> >> <a href="mailto:postgis-devel@lists.osgeo.org">postgis-devel@lists.osgeo.org</a><br>
> >> <a href="http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel" target="_blank">http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel</a><br>
> >><br>
> ><br>
> ><br>
<br>
> _______________________________________________<br>
> postgis-devel mailing list<br>
> <a href="mailto:postgis-devel@lists.osgeo.org">postgis-devel@lists.osgeo.org</a><br>
> <a href="http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel" target="_blank">http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel</a><br>
<br>
<br>
</div></div>--<br>
<div class="im HOEnZb"><br>
() ASCII ribbon campaign -- Keep it simple !<br>
/\ <a href="http://strk.keybit.net/rants/ascii_mails.txt" target="_blank">http://strk.keybit.net/rants/ascii_mails.txt</a><br>
</div><div class="HOEnZb"><div class="h5">_______________________________________________<br>
postgis-devel mailing list<br>
<a href="mailto:postgis-devel@lists.osgeo.org">postgis-devel@lists.osgeo.org</a><br>
<a href="http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel" target="_blank">http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel</a><br>
</div></div></blockquote></div><br></div>