[gdal-dev] ogr2ogr intersect very slow at command line with large sqlite input
Stephen Woodbridge
woodbri at swoodbridge.com
Mon Apr 10 19:11:19 PDT 2017
Chris,
I'm not suggesting that you change your workflow yet, only that you try
some experiments. I've worked with both Spatialite and postgis, but I
have much more experience with postgis.
While they both have many of same functions, the interactions with the
indexes is much more automatic and intuitive with postgis. I was
recently surprised to learn that st_intersects() is faster than
st_dwithin() in postgis. I can't say if the same is true for spatialite
because of the indexing differences that you have already run into.
Regarding your roads data, OSM stores everything as two noded straight
line segments, but many import tools chain them together into multiple
point line segments, which makes the bbox for the linestring larger,
which causes it to intersect with more other linestring bboxes so more
computational test.
If you are only looking for intersections at line ends and only
intersections at line ends where the roads have different names then you
could try a different approach
create a table of line end points, you could snap the floating point xy
values to a grid.
create table nodes (nid serial, geom geometry);
insert into nodes (geom)
select distinct geom
select st_snaptogrid(st_startpoint(geom), 0.000001) as geom from table
union all
select st_snaptogrid(st_endpoint(geom), 0.000001) as geom from table;
Then create a table assigning nid values to the lines
create table node2line (gid integer, nid integer, wend integer);
-- insert the start nodes into table
insert into node2line
select a.gid, b.nid, 0
from table a, nodes b
where st_snaptogrid(st_startpoint(a.geom), 0.000001)=b.geom;
-- insert the end nodes into table
insert into node2line
select a.gid, b.nid, 1
from table a, nodes b
where st_snaptogrid(st_endpoint(a.geom), 0.000001)=b.geom;
-- get the intersections from the node ids
select a.nid, a.gid, a.wend, b.gid, b.wend
from node2line a, node2line b
where a.nid=b.nid;
You will probably want to add some more filtering to eliminate dups, and
maybe another join to match sort out street names. But this can be done
mostly with simple btree indexes.
Also because you have snaptogrid, you can make the geom column text and
and populate it with st_astext()
I do something similar in postgis in my geocoder to identify intersections.
-Steve
On 4/10/2017 6:38 PM, CTL101 wrote:
> Hi again Steve, The input is osm data, filtered so that only highways of some
> significance remain. I am trying to get road intersections as points based
> on certain criteria. I have an ogr2ogr sqlite based workflow that works on
> test data. But scaling up is proving to be a problem for this particular
> intersect operation, which occurs early on. Based on my source and sample
> data i would say that there are possibly several million intersections
> between the tables to compute. The intersection is between lines (roads),
> but I am only interested in the point output where lines meet. If there is a
> more efficient algorithm that i could use in ogr2ogr then i would be
> grateful of advice.
>
> I had not considered postgresql for this part of the process but would
> prefer to streamline the workflow rather than create a new one if avoidable.
> Might conversion of the sqlite tables to another format in ogr2ogr yield
> significant performance dividends? Or do I need to resign myself to trying a
> fresh approach?
> Regards, Chris
>
>
>
> --
> View this message in context: http://osgeo-org.1560.x6.nabble.com/ogr2ogr-intersect-very-slow-at-command-line-with-large-sqlite-input-tp5316545p5316749.html
> Sent from the GDAL - Dev mailing list archive at Nabble.com.
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
More information about the gdal-dev
mailing list