[gdal-dev] ogr2ogr intersect very slow at command line with large sqlite input

Stephen Woodbridge woodbri at swoodbridge.com
Mon Apr 10 19:11:19 PDT 2017


I'm not suggesting that you change your workflow yet, only that you try 
some experiments. I've worked with both Spatialite and postgis, but I 
have much more experience with postgis.

While they both have many of same functions, the interactions with the 
indexes is much more automatic and intuitive with postgis. I was 
recently surprised to learn that st_intersects() is faster than 
st_dwithin() in postgis. I can't say if the same is true for spatialite 
because of the indexing differences that you have already run into.

Regarding your roads data, OSM stores everything as two noded straight 
line segments, but many import tools chain them together into multiple 
point line segments, which makes the bbox for the linestring larger, 
which causes it to intersect with more other linestring bboxes so more 
computational test.

If you are only looking for intersections at line ends and only 
intersections at line ends where the roads have different names then you 
could try a different approach

create a table of line end points, you could snap the floating point xy 
values to a grid.

create table nodes (nid serial, geom geometry);
insert into nodes (geom)
select distinct geom
select st_snaptogrid(st_startpoint(geom), 0.000001) as geom from table
union all
select st_snaptogrid(st_endpoint(geom), 0.000001) as geom from table;

Then create a table assigning nid values to the lines

create table node2line (gid integer, nid integer, wend integer);

-- insert the start nodes into table
insert into node2line
select a.gid, b.nid, 0
   from table a, nodes b
  where st_snaptogrid(st_startpoint(a.geom), 0.000001)=b.geom;

-- insert the end nodes into table
insert into node2line
select a.gid, b.nid, 1
   from table a, nodes b
  where st_snaptogrid(st_endpoint(a.geom), 0.000001)=b.geom;

-- get the intersections from the node ids
select a.nid, a.gid, a.wend, b.gid, b.wend
   from node2line a, node2line b
  where a.nid=b.nid;

You will probably want to add some more filtering to eliminate dups, and 
maybe another join to match sort out street names. But this can be done 
mostly with simple btree indexes.

Also because you have snaptogrid, you can make the geom column text and 
and populate it with st_astext()

I do something similar in postgis in my geocoder to identify intersections.


On 4/10/2017 6:38 PM, CTL101 wrote:
> Hi again Steve, The input is osm data, filtered so that only highways of some
> significance remain. I am trying to get road intersections as points based
> on certain criteria. I have an ogr2ogr sqlite based workflow that works on
> test data. But scaling up is proving to be a problem for this particular
> intersect operation, which occurs early on. Based on my source and sample
> data i would say that there are possibly several million intersections
> between the tables to compute. The intersection is between lines (roads),
> but I am only interested in the point output where lines meet. If there is a
> more efficient algorithm that i could use in ogr2ogr then i would be
> grateful of advice.
> I had not considered postgresql for this part of the process but would
> prefer to streamline the workflow rather than create a new one if avoidable.
> Might conversion of the sqlite tables to another format in ogr2ogr yield
> significant performance dividends? Or do I need to resign myself to trying a
> fresh approach?
> Regards, Chris
> --
> View this message in context: http://osgeo-org.1560.x6.nabble.com/ogr2ogr-intersect-very-slow-at-command-line-with-large-sqlite-input-tp5316545p5316749.html
> Sent from the GDAL - Dev mailing list archive at Nabble.com.
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev

This email has been checked for viruses by Avast antivirus software.

More information about the gdal-dev mailing list