[postgis-users] How to match lines between two similar graphs?

Thu Jan 26 07:41:48 PST 2012

> Hi All,
>
> I have an interesting problem I am trying to solve and would love some
> feedback on how to best go about it.
>
> I have road data from two different vendor data sets. But this could also be
> one road network and a GPS track, so I think this is a pretty common use
> case.
>
> Assumptions:
>
> o the networks are similar, ie: they have similar roadway coverage
> o the two sets might be slightly misaligned, ie: shifted by some amount
> o the segments in the two data sets do not have to be broken into equivalent
> segments, ie: one segment in A might be represented my multiple segments in
> B
> o segments are not aligned end point wise, ie: a segment in A might go from
> mid-point one segment in B to the midpoint of a connected segment in B
> o in many cases I will be working with a set of lines in one set that I need
> to match to the other to select a matching set of lines.
>
> So strategies for matching these:
>
> 1. take a segment from A and buffer it, then intersect the data in B and
> select the longest intersected object. I can probably throw out any pieces
> smaller than the buffer distance.
>
> 2. Do the same but buffer and union the set of lines into a single
> multipolygon, and intersect that with the other set.
>
> 3. ??? Other ideas?
>
> Thoughts on performance?
>
> Typically I will have a small set (1-20) of segments to compare against a
> larger (100K-2M) set. Obvious a spatial index will will be used. But I'm
> wondering what is the fast way to do this matching computationally. I think
> I will want to be able to compare 1-200 sets like this every 5 mins as data
> comes in from a feed, while supporting other queries.
>
> Thoughts would be appreciated.
>

Hi,

I worked on a similar project some month ago: The purpose was to
"transfer" routes made by users based on Navteq 2005 (NVQ05) onto the
Navteq 2009 (NVQ2009) dataset: the 2 networks are not perfectly
aligned, and some road are replaced by new ones (typically, straight
roads can be replaced by roundabouts).

The dataset to process was huge (hundreds of millions of segments to process).

Working with buffers seem to give a good result, but the performance
was very slow, even on the small sample I tested against.
Finally, I came out with the following solution (giving between 95% to
100% of network length rebuilt on the new network, for about 650 urban
areas):

The idea was to use the fact that segments are topologically clean:
they connected to each other.

• For each route in the input NVQ05 dataset, find and build the set of
possible candidates in NVQ09: those closer than a certain distance
(the maximum shift between the 2 networks). This builds a "sub
network" for each input route. Its characteristic is to have a lot of
non connected segments, and only a few fully connected segments. These
represent the possible new routes.

• Determine the new start and end points of routes in the new network,
by choosing the closest, most aligned segment in NVQ09.

• Use an iterative SQL (WITH RECURSIVE) query into a PLPGSQL loop to
rebuild each road.
>From the new start point, find the next connected vertex that belongs
this start segment and to another connected segment in the subnetwork.
Proceed with the next vertex.
The iteration "rebuilds" the route from the start point, to the last point.

The result was not very good at this stage, and I decided to use
pgRouting to complete missing parts in the rebuilt network:
Once the first operation is processed, if the new route is still a
multilinestring, I use pgRouting to route between the last point of a
linestring to the first point of the next linestring. I fed pgRouting
with a small set of candidates (typically, the subnetwork built
earlier) and let it find the route.

The result here was pretty good, as most of the time, the shortest
path is the right solution to match the 2 linestrings.

The trick to be fast was to build a table of "good" vertices, from the
subnetwork. This table stores, for each subnetwork vertex, the number
of segments this vertex connects to.
During the iteration, I use this table to find next good vertex.

I realize, at the end of this email, that a good schema could have
been a good idea...

Nicolas
Nicolas