[postgis-devel] Discussion about routing in PostGIS
Chris Hodgson
chodgson at refractions.net
Thu Apr 7 13:28:03 PDT 2005
I think that in order to do routing _well_ in the database, there first
needs to be some concept of graph structure storage in the database. It
is clearly _possible_ to build a graph using standard geometries with
node ID's and/or edge ID's, and using standard indexes on the ID's will
speed up access to them. But traversing this network will be pretty
slow if done by querying one ID at a time (ie. get all edges adjacent to
this node, repeated many times over). The only alternative (if using
standard database tables and indexes) is to read the entire table into
memory and build an in-memory graph data structure - which will
obviously be much faster, but we will be limited to dealing with graph
structures that are small enough to fit into memory - a limitation which
I don't think is acceptable.
So, I think to do this right, we need a way to store a graph structure
in the database which is somehow more advanced than just a table of
geometries with their adjacencies identified. What we want is a way to
store on disk, the same graph structure that we would build in memory,
in a randomly-accessible fashion. That is to say, connected sub-graphs
of the complete graph should be stored in each table row, with some kind
of index to allow speedy access to the nieghboring sub-graphs.
My idea to build this "graph table" is something like this:
1) break the graph into connected sub-graphs, with every edge belonging
to exactly one subgraph, and every node belonging to one or more sub-graph
2) index the graphs by the IDs of the of nodes on the edge of the subgraph
Now we can change the "get all edges adjacent to this node" query to a
"get all subgraphs adjacent to this node" query, which will have to be
done significantly fewer times within any given algorithm (eg.
shortest-path/routing).
I'm not sure what the ideal characteristics of the subgraphs are. They
definitely need to fit into a reasonable amount of memory - probably
fitting them into the page size of the database would be the way to go.
The geometric and topological characteristics need to be considered as
well... the more nodes inside each subgraph, and the fewer nodes on the
edges of each subgraph, the fewer queries will be required to traverse
them (I think?). Should they be geometrically clustered, or
topologically clustered?
One way I might see this working is that the graph structure table is
initially constructed from the geometry table, and then a trigger causes
any updates to the geometry table to cause similar updates in the graph
table - if this is the case, we also need to consider how our
subgraphing algorithm will be affected by updates to the geometries -
ideally the subgraphs will be able to shrink, and to grow and eventually
split, in an organic fashion.
Also, we would need to decide whether the attributes and/or geometries
of the features associated with each edge of the graph should be stored
inside the graph structure, or not. If they are not, then the edges of
the graph would simply store pointers to the associated features, and
our graph stucture becomes more of a new type of index than anything
else. I don't know if this sort of "graph index" could be implemented
using GiST, but I think it may be too different of an idea for GiST to
handle.
For the specific application of routing, it makes sense to store the
cost of each edge inside the graph structure, as this is the only
attribute (including the geometry) of interest for each edge. This could
be an optional additional piece of information stored in the index, or
we could decide that the graph should actually be stored its own table,
completely separate from the original feature table once it has been built.
Sorry if I'm rambling here, but since we're brainstormning, I think it
is relevant.
After considering the above, I think the idea that I like best is to
implement the graph structure inside a new kind of index on the geometry
table. This index would either need some additional parameters to
specify exactly how the graph is built, or the geometry table would be
required to have specific field names for the pieces of information
required to build the graph (node IDs). The index could also have the
option of storing the weight value of each edge inside the index, in
order to speed up routing-oriented queries.
Having not done much of the work on PostGIS, and never working
specifically within the Postgresql code, I'm guessing this would require
significant work, probably as much or more than PostGIS itself, and it
may need to be tightly integrated into postgresql (ie. need to have
changes made within postgresql in order to support it). Paul Ramsey and
I have discussed this sort of functionality before... I think it would
be very exciting to implement and could potentially be very "killer app"
to add to PostGIS - as if PostGIS wasn't already a killer app itself ;)
I think I'll leave it at that for now...
Chris
More information about the postgis-devel
mailing list