[postgis-devel] topology : new index on node table

Sandro Santilli strk at keybit.net
Wed Jul 30 03:24:18 PDT 2014


On Wed, Jul 30, 2014 at 12:09:05PM +0200, Rémi Cura wrote:
> The ticket :
> http://trac.osgeo.org/postgis/ticket/2860
> 
> If fitting in memory, I would expect the 160k geom of the benchmark to be
> imported in less than 30 sec in grass, cgal and geos.

Sum of memory size used to hold those 160k geometries in PostGIS memory
is 26,592,360 (~26MB).

> 300 sec for Postgis Topology batch mode would be a good goal (reachable and
> nice achievement).

Current best time (in that table) was 4984 secs (~2h20m)
Still, we don't know exactly how much time went in triggers checking,
how much in disk writing, how much in toasting/detoasting values etc.

Speaking of which, another postgresql-level optimization that could
be taken at population time could be setting the storage type, and
setting the fill factor for tables should also make a difference.

A lot to play with, having the time...

Unfortunately running the population query for 5000 geometries right
now gives me a longer time than I was getting back when I took the
times for that test, and I'm not sure why. Maybe it's because I enabled
gathering statistics about the called user functions, or because I changed
some parameters about available memory work memory and such. I suspect
those configurations change a lot the experience.

Anyway, having stats on, will dump here the ones for loading 5000 geometries
from that dataset (ordered by gid):

strk=# select * from pg_stat_user_functions  order by total_time desc limit 10;
 funcid | schemaname |       funcname        | calls | total_time | self_time
--------+------------+-----------------------+-------+------------+-----------
  54535 | topology   | topogeo_addpolygon    |  5000 |  55761.337 |  1369.537
  54533 | topology   | topogeo_addlinestring |  5004 |  48302.105 |   4038.88
  54503 | topology   | st_addedgemodface     |  5004 |   33837.97 | 10328.094
  62973 | topology   | _st_addfacesplit      | 10008 |  22461.719 |  17372.12
  54534 | topology   | topogeo_addpoint      | 10008 |   9712.522 |  3570.827
  54508 | topology   | st_addisonode         |  5004 |   6032.572 |  4861.636
  54491 | topology   | getringedges          | 10008 |   4876.065 |  4876.065
  54514 | topology   | st_getfacegeometry    | 11921 |   4674.631 |  3712.201
  52455 | public     | st_pointonsurface     |  9591 |   1267.424 |  1267.424
  52275 | public     | st_buildarea          | 11921 |     962.43 |    962.43
(10 rows)

You'll note that _ST_AddFaceSplit is called 10008 times, and at every
call the number of points should be at least 1 more than the previous,
with last call finding ~5000 nodes in the table.

For the record, my current timing for loading those 5000 rows is 55822.181 ms
whereas my last best was 36134.786 ms (just to get an idea about how much
database configuration could change times, as I think the machine was
really the same).

--strk;

> 2014-07-30 11:56 GMT+02:00 Rémi Cura <remi.cura at gmail.com>:
> 
> > Almost a 10x factor, really nice !
> > Cheers,
> > Rémi-C
> >
> >
> > 2014-07-30 11:38 GMT+02:00 Sandro Santilli <strk at keybit.net>:
> >
> > On Wed, Jul 30, 2014 at 11:29:00AM +0200, Sandro Santilli wrote:
> >>
> >> > There's a reference dataset produced some time ago by a user who
> >> resulted
> >> > in a fix making topology creation much faster:
> >> > http://lists.osgeo.org/pipermail/postgis-devel/2014-January/024078.html
> >> >
> >> > I still have a couple of diagrams showing population time before/after
> >> > that are waiting for a blog post that I never find the time to write...
> >>
> >> Just a sneak preview of the blog post that might never be written
> >>
> >> Queries:
> >>
> >>  SELECT ST_CreateTopoGeo('million_poly_topo1', ST_Collect(geom))
> >>  FROM ( SELECT geom from million_poly_topo1 order by gid limit 160000) as
> >> f;
> >>
> >>  SELECT TopoGeo_addPolygon('million_poly_topo1', ST_GeometryN(geom,1))
> >>  FROM ( SELECT geom from million_poly_topo1 order by gid limit 160000) as
> >> f;
> >>
> >> Output topology has:
> >>
> >>  160167 nodes, 160167 edges, 160167 faces
> >>
> >> Times:
> >>
> >>  a: before starting
> >>  b: Ensure face splitting algorithm uses the edge index (#2610)
> >>  c: Drop all calls to geometry::text during topology population (#2616)
> >>
> >>       ST_CreateTopoGeo     TopoGeo_addPolygon
> >>     +-------------------+--------------------+
> >>  a  |    38352283.535   |    38375126.950    |
> >>  b  |    12088554.776   |     9644736.277    |
> >>  c  |    11963757.923   |     4984225.402    |
> >>
> >> I've a table with more numbers, with limits from 5000 to 160000.
> >> The index change would make another configuration (d, I guess).
> >>
> >> Please please please file the ticket, maybe referencing this thread
> >> and #2610 and #2616 as I'd really love to grow the optimizations further
> >>
> >> --strk;
> >> _______________________________________________
> >> postgis-devel mailing list
> >> postgis-devel at lists.osgeo.org
> >> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
> >>
> >
> >

> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel


-- 

 ()  ASCII ribbon campaign  --  Keep it simple !
 /\  http://strk.keybit.net/rants/ascii_mails.txt  



More information about the postgis-devel mailing list