[postgis-devel] topology : new index on node table

Wed Jul 30 02:29:00 PDT 2014

On Wed, Jul 30, 2014 at 11:02:36AM +0200, Rémi Cura wrote:
> OK,
> I ended looking for every instance of "containing_face" in the topology.sql
> file in postgres extension
> 
> Note that i found
> "  -- TODO: consider also adding an index on node.containing_face "
> 
> The index would be used in
> _the "intersects" function (case of point/polygon) "'WHERE  ... AND
> r2.element_id = n.containing_face" '"
> _the addface function ''WHERE containing_face IS NOT NULL"
> _ the ST_RemEdgeNewFace function "    -- Update containing_face for all
> nodes still referencing old faces"
> _the ST_RemEdgeModFace function "    -- Update containing_face for all
> nodes still referencing old faces"
> _ the _ST_AddFaceSplit function "  -- Update isolated nodes in new new face
> "
> 
> I think there is significant evidence that such an index will be used.

Yep, agreed. Especially for _ST_AddFaceSplit it could make a big difference.

> My usage of topology is very limited (I only use edge topo) and I don't
> have the benchmark to prove that such index would speed the operations.

There's a reference dataset produced some time ago by a user who resulted
in a fix making topology creation much faster:
http://lists.osgeo.org/pipermail/postgis-devel/2014-January/024078.html

I still have a couple of diagrams showing population time before/after
that are waiting for a blog post that I never find the time to write...

> If somebody uses face/node, it should be simple to benchmark : drop
> topology, create topology, create index, import data in topology .
> Then a simple look on index stats should tell if it was used ans how much.
> 
> However :
> _the above function will obviously be speeded.
> _index are mandatory on primary key (postgres default), this should be the
> same for "simple" foreign key
> _ the basic btree index is very low cost (time & memory)

Agreed, but I'd still like to double check with numbers.
If not else it'd make good material for advertising the change :)
Do you feel like filing a ticket to track progress on this ?

> About putting/removing indexes and constraint,
> I think your right about trigger/constraint in one case : a batch topology
> import.
> I fear the changes to do so efficiently would be important (more or less
> rewrite every function the batch way, and of course disable all
> triggers/check) .
> 
> I feel disabling index, filling table, then computing indexes is the kind
> of optimization that we are far far from needing. Maybe when data import
> would be in the speed range of grass data import, thats a 100 factor to
> gain before such optimization.

Agreed index batch disabling/enabling is not important, and splitting
functions between "checking" and "non-checking" should help.

--strk;