[postgis-devel] Discussion about routing in PostGIS

Thu Apr 7 13:28:03 PDT 2005

I think that in order to do routing _well_ in the database, there first 
needs to be some concept of graph structure storage in the database. It 
is clearly _possible_ to build a graph using standard geometries with 
node ID's and/or edge ID's, and using standard indexes on the ID's will 
speed up access to them. But  traversing this network will be pretty 
slow if done by querying one ID at a time (ie. get all edges adjacent to 
this node, repeated many times over). The only alternative (if using 
standard database tables and indexes) is to read the entire table into 
memory and build an in-memory graph data structure - which will 
obviously be much faster, but we will be limited to dealing with graph 
structures that are small enough to fit into memory - a limitation which 
I don't think is acceptable.

So, I think to do this right, we need a way to store a graph structure 
in the database which is somehow more advanced than just a table of 
geometries with their adjacencies identified. What we want is a way to 
store on disk, the same graph structure that we would build in memory, 
in a randomly-accessible fashion. That is to say, connected sub-graphs 
of the complete graph should be stored in each table row, with some kind 
of index to allow speedy access to the nieghboring sub-graphs.

My idea to build this "graph table" is something like this:

1) break the graph into connected sub-graphs, with every edge belonging 
to exactly one subgraph, and every node belonging to one or more sub-graph
2) index the graphs by the IDs of the of nodes on the edge of the subgraph

Now we can change the "get all edges adjacent to this node" query to a 
"get all subgraphs adjacent to this node" query, which will have to be 
done significantly fewer times within any given algorithm (eg. 
shortest-path/routing).

I'm not sure what the ideal characteristics of the subgraphs are. They 
definitely need to fit into a reasonable amount of memory - probably 
fitting them into the page size of the database would be the way to go. 
The geometric and topological characteristics need to be considered as 
well... the more nodes inside each subgraph, and the fewer nodes on the 
edges of each subgraph, the fewer queries will be required to traverse 
them (I think?). Should they be geometrically clustered, or 
topologically clustered?

One way I might see this working is that the graph structure table is 
initially constructed from the geometry table, and then a trigger causes 
any updates to the geometry table to cause similar updates in the graph 
table - if this is the case, we also need to consider how our 
subgraphing algorithm will be affected by updates to the geometries - 
ideally the subgraphs will be able to shrink, and to grow and eventually 
split, in an organic fashion.

Also, we would need to decide whether the attributes and/or geometries 
of the features associated with each edge of the graph should be stored 
inside the graph structure, or not. If they are not, then the edges of 
the graph would simply store pointers to the associated features, and 
our graph stucture becomes more of a new type of index than anything 
else. I don't know if this sort of "graph index" could be implemented 
using GiST, but I think it may be too different of an idea for GiST to 
handle.

For the specific application of routing, it makes sense to store the 
cost of each edge inside the graph structure, as this is the only 
attribute (including the geometry) of interest for each edge. This could 
be an optional additional piece of information stored in the index, or 
we could decide that the graph should actually be stored its own table, 
completely separate from the original feature table once it has been built.

Sorry if I'm rambling here, but since we're brainstormning, I think it 
is relevant.

After considering the above, I think the idea that I like best is to 
implement the graph structure inside a new kind of index on the geometry 
table. This index would either need some additional parameters to 
specify exactly how the graph is built, or the geometry table would be 
required to have specific field names for the pieces of information 
required to build the graph (node IDs). The index could also have the 
option of storing the weight value of each edge inside the index, in 
order to speed up routing-oriented queries.

Having not done much of the work on PostGIS, and never working 
specifically within the Postgresql code, I'm guessing this would require 
significant work, probably as much or more than PostGIS itself, and it 
may need to be tightly integrated into postgresql (ie. need to have 
changes made within postgresql in order to support it). Paul Ramsey and 
I have discussed this sort of functionality before... I think it would 
be very exciting to implement and could potentially be very "killer app" 
to add to PostGIS - as if PostGIS wasn't already a killer app itself ;)

I think I'll leave it at that for now...

Chris