[postgis-users] Is the abs(r.element_id) function needed in ST_ModEdgeSplit postgis topology ? It's is causing big performance decrease.

Lars Aksel Opsahl lop at skogoglandskap.no
Sat Mar 28 12:16:10 PDT 2015


Hi 

Thanks a lot for the quick response. Sorry I forget to say that I have already have added indexes to get things to turn around with these number rows. I also refer to this id's from my own tables.

Here are the indexes that I have added on topo_ar5_sysdata.relation 

    "opo_ar5_sysdata_relation_element_id_idx" btree (element_id)
    "opo_ar5_sysdata_relation_layer_id_idx" btree (layer_id)
    "opo_ar5_sysdata_relation_topogeo_id_idx" btree (topogeo_id)


Here is the explain with out the abs function

explain analyze SELECT r.* FROM topo_ar5_sysdata.relation r, topology.layer l  WHERE  l.topology_id = 21 AND l.level = 0  AND l.layer_id = r.layer_id  AND r.element_id = 30145 AND r.element_type = 2
;
                                                                        QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.44..4.00 rows=1 width=16) (actual time=0.029..0.035 rows=1 loops=1)
   Join Filter: (r.layer_id = l.layer_id)
   Rows Removed by Join Filter: 1
   ->  Index Scan using opo_ar5_sysdata_relation_element_id_idx on relation r  (cost=0.44..3.45 rows=1 width=16) (actual time=0.018..0.021 rows=1 loops=1)
         Index Cond: (element_id = 30145)
         Filter: (element_type = 2)
         Rows Removed by Filter: 1
   ->  Seq Scan on layer l  (cost=0.00..0.53 rows=2 width=4) (actual time=0.006..0.007 rows=2 loops=1)
         Filter: ((topology_id = 21) AND (level = 0))
 Total runtime: 0.081 ms
(10 rows)


Here is with the abs function.

explain analyze SELECT r.* FROM topo_ar5_sysdata.relation r, topology.layer l  WHERE  l.topology_id = 21 AND l.level = 0  AND l.layer_id = r.layer_id  AND abs(r.element_id) = 30145 AND r.element_type = 2
;
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..570856.25 rows=100890 width=16) (actual time=12.486..3134.417 rows=1 loops=1)
   Join Filter: (r.layer_id = l.layer_id)
   Rows Removed by Join Filter: 1
   ->  Seq Scan on relation r  (cost=0.00..567829.02 rows=100890 width=16) (actual time=12.471..3134.398 rows=1 loops=1)
         Filter: ((element_type = 2) AND (abs(element_id) = 30145))
         Rows Removed by Filter: 27794697
   ->  Materialize  (cost=0.00..0.54 rows=2 width=4) (actual time=0.011..0.014 rows=2 loops=1)
         ->  Seq Scan on layer l  (cost=0.00..0.53 rows=2 width=4) (actual time=0.008..0.010 rows=2 loops=1)
               Filter: ((topology_id = 21) AND (level = 0))
 Total runtime: 3134.461 ms
(10 rows)

Time: 3241.620 ms


I will add a comment to http://trac.osgeo.org/postgis/ticket/2083

Thanks Lars
________________________________________
Fra: Sandro Santilli [sandro.santilli at gmail.com] på vegne av Sandro Santilli [strk at keybit.net]
Sendt: 28. mars 2015 19:09
Til: PostGIS Users Discussion; Lars Aksel Opsahl
Emne: Re: [postgis-users] Is the abs(r.element_id) function needed in ST_ModEdgeSplit postgis topology ? It's is causing big performance decrease.

On Sat, Mar 28, 2015 at 06:46:14PM +0100, Sandro Santilli wrote:

> For lineal TopoGeometry the sign in the relation table indicates
> the direction of the line. It's not really well-tested in reality
> but that's the theory.
>
> The only index on the relation table is a btree on
>
>  (layer_id, topogeo_id, element_id, element_type)
>
> So there should be no way for that query to use the index unless the
> "topogeo_id" is also known, which is what we're looking for instead.
>
> Since you instead get a 200 times speedup, I'd be curious to find
> out why. Maybe that's Could you run that query trough EXPLAIN to check ?
> Is it really that abs(int) for 27,794,698 times takes ~3 seconds ?
> How would changing the query to use EXISTS help, if at all ?

For the record, these are the numbers I get on a 35,128,772 rows table:

strk=# with c as ( select id from tweets ) select count(*) from c;
 35128772
Time: 10305.404 ms
strk=# with c as ( select abs(id) from tweets ) select count(*) from c;
 35128772
Time: 11495.359 ms

> I suggest you file an enhancement ticket on trac for this.
> I'd be in favor of adding an index to speed up checking existance
> of TopoGeometries defined by given primitives.

I found a ticket which was filed starting from another function
( ST_RemEdgeModFace ) but is due to the same issue:
http://trac.osgeo.org/postgis/ticket/2083

You may add your observations there.

--strk;

  ()   Free GIS & Flash consultant/developer
  /\   http://strk.keybit.net/services.html


More information about the postgis-users mailing list