[postgis-devel] creating topology

Sandro Santilli strk at keybit.net
Wed Jul 17 13:46:34 PDT 2013


Hi Paul,
sorry for the late answer, I hope you still have your use case around.
Replies inline.

On Thu, Jun 27, 2013 at 03:24:08PM +0200, Paul Goldstein wrote:

> I'm creating a topology of about 70k of geometry-segments. The
> smallest of them is just a few meters, the longest is about 10 km
> long.

[...]

> # part 1 #
> With help of strk in postgis-irc (thanks again!) it took about 3
> hours. It was made by divide all geometries in chunks by their id. One
> chunk were 1000 pieces.

I've seen that chunks of 500 pieces were faster, did you try 100 ?

[...]

> # part 2 #
> I tried to switch off autovacuum (autovacuum_enabled = false) for all
> tables in topology-scheme and for my segment-table. This was a massive
> speed-improvement cause it took "only" 2 hours.
> But strangely there was a knee after 42k of updates where speed
> raises. - I don't know why.

Maybe indices started being used, for some reason ? Hard to tell.

Topology building leaves a lot of dead tuples around, so possibly
a vacuum every now and then helps. Could you try running a
vacuum analyze after 40k updates ?

It's also curious how speed decreases before 40k and before 60k


> # part 4 #
> strk asked me to measure another idea: creating chunks not on their id
> instead use their geographic coordinates. so one part is e.g. (0 0, 1
> 1) -> (1 0, 2 1) -> (2 0, 3 1) ---> (0 1, 1 2) -> (1 1, 2 2) -> (2 1,
> 3 2) ---> ... - i hope its clear what i meant.
> thats my appropriate statement:
> 
> update public.segment
>   set topogeom = topology.totopogeom(st_force_2d(geom), 'segment_topo', 1, 0)
>   where (st_intersects(geom, st_envelope(st_geomfromewkt(?))) = true
>     or st_contains(geom, st_envelope(st_geomfromewkt(?))) = true)
>   and topogeom is null;
> 
> where ? is this: "srid=4326;linestring(%f %f,%f %f)"
> 
> This took very too long. I aborted it after 1,5 hours, because it was too slow.

Pity, your fastest run lasted 2 hours...
For sure the curve seems more "regular" :)

> Now my questions:
> Does anybody know how to improve the speed to create a topology?

You already found a couple of good ways, those would be worth noting
somewhere (postgis wiki would be a good place). If you still have that
dataset I'd be curious to see the curve with smaller chunks, then with
manual VACUUM ANALYZE added and maybe finally with geometries ordered
by number of vertices (fewer vertices first). 

> Or is it possible to not creating my topology every night if something
> changes in geometry. Instead alter topology in an
> after-statement-trigger when something happens with geometries.

You can surely call toTopoGeom again with the modified geometry.
With PostGIS-2.1 the function also accepts the old TopoGeometry to
effectively replace it (rather than leaving an orphaned TopoGeometry
around). You'll still need to "manually" remove primitives (edges,
nodes) that are not used anymore. They won't hurt, but with time they
tend to slow things down.

> strk said when totopogeom() is called TopoGeo_addLinestring() is
> entered many times. Is it possible to improve that function? (Maybe
> writing it in C - instead of plpgsql :)

Yes, writing that function in C may considerably speed things up, but
there's currently nobody working on it. Contributions are welcome.

> Here you can see some graphs for the different parts:
> http://privatepaste.com/download/ef41296b0d
> blue is part 1: chunks of ids
> red is part 2: chunks of ids but without autovacuum
> yellow is part 3: as red but 96MB of shared_buffer
> green is part 4: chunks of squared geography elements
> 
> Thanks for any help or suggestions,

Thank you for the excellent report !

--strk;

 http://www.cartodb.com - Map, analyze and build applications with your data

                                       ~~ http://strk.keybit.net 




More information about the postgis-devel mailing list