[postgis-devel] Discussion about routing in PostGIS

Tue Apr 19 02:44:36 PDT 2005

Hello,

I understand the concept of subgraph is a quite powerful datastructure
for dealing with large graph, avoiding the intermediate structure in
memory, and finding quickly neighbour nodes and edges.

I would however propose another more simple approach, which is not tied
to geographical graphs. This can be some "proof of concept" which could
lead to another implementation using your subgraph proposal.

Basically, we have a node and an edges table. The node table contains an
id and optional fields like the geometry. The edge table contains edge
id, node start and end id's, and other optional fields. The cost could
either be in the node / edge table, or in another table mapping
node/edge id to their cost (useful for precomputed costs scenarios (no
main-roads, fastest / shortest path, ...).

For the implementation we have either the dglib approach which consist
in fetching all nodes/edges, building the internal dglib graph
structure, and computing the shortest path. This is only one sql query
for fetching all data, but may consume more memory.

The other approach is to implement the Dijkstra algorithm directly in
PostGIS, fetching nodes successors for each step of the algorithm. This
approach will generate a query for each node at least.

If this compromise seems acceptable, I will do some testing with these
different approaches, and give you some feedback about the results.

Cheers,

Sylvain

Le jeudi 07 avril 2005 à 13:28 -0700, Chris Hodgson a écrit : 
> I think that in order to do routing _well_ in the database, there first 
> needs to be some concept of graph structure storage in the database. It 
> is clearly _possible_ to build a graph using standard geometries with 
> node ID's and/or edge ID's, and using standard indexes on the ID's will 
> speed up access to them. But  traversing this network will be pretty 
> slow if done by querying one ID at a time (ie. get all edges adjacent to 
> this node, repeated many times over). The only alternative (if using 
> standard database tables and indexes) is to read the entire table into 
> memory and build an in-memory graph data structure - which will 
> obviously be much faster, but we will be limited to dealing with graph 
> structures that are small enough to fit into memory - a limitation which 
> I don't think is acceptable.
> So, I think to do this right, we need a way to store a graph structure 
> in the database which is somehow more advanced than just a table of 
> geometries with their adjacencies identified. What we want is a way to 
> store on disk, the same graph structure that we would build in memory, 
> in a randomly-accessible fashion. That is to say, connected sub-graphs 
> of the complete graph should be stored in each table row, with some kind 
> of index to allow speedy access to the nieghboring sub-graphs.
> 
> My idea to build this "graph table" is something like this:
> 
> 1) break the graph into connected sub-graphs, with every edge belonging 
> to exactly one subgraph, and every node belonging to one or more sub-graph
> 2) index the graphs by the IDs of the of nodes on the edge of the subgraph
> 
> Now we can change the "get all edges adjacent to this node" query to a 
> "get all subgraphs adjacent to this node" query, which will have to be 
> done significantly fewer times within any given algorithm (eg. 
> shortest-path/routing).
> 
> I'm not sure what the ideal characteristics of the subgraphs are. They 
> definitely need to fit into a reasonable amount of memory - probably 
> fitting them into the page size of the database would be the way to go. 
> The geometric and topological characteristics need to be considered as 
> well... the more nodes inside each subgraph, and the fewer nodes on the 
> edges of each subgraph, the fewer queries will be required to traverse 
> them (I think?). Should they be geometrically clustered, or 
> topologically clustered?
> 
> One way I might see this working is that the graph structure table is 
> initially constructed from the geometry table, and then a trigger causes 
> any updates to the geometry table to cause similar updates in the graph 
> table - if this is the case, we also need to consider how our 
> subgraphing algorithm will be affected by updates to the geometries - 
> ideally the subgraphs will be able to shrink, and to grow and eventually 
> split, in an organic fashion.
> 
> Also, we would need to decide whether the attributes and/or geometries 
> of the features associated with each edge of the graph should be stored 
> inside the graph structure, or not. If they are not, then the edges of 
> the graph would simply store pointers to the associated features, and 
> our graph stucture becomes more of a new type of index than anything 
> else. I don't know if this sort of "graph index" could be implemented 
> using GiST, but I think it may be too different of an idea for GiST to 
> handle.
> 
> For the specific application of routing, it makes sense to store the 
> cost of each edge inside the graph structure, as this is the only 
> attribute (including the geometry) of interest for each edge. This could 
> be an optional additional piece of information stored in the index, or 
> we could decide that the graph should actually be stored its own table, 
> completely separate from the original feature table once it has been built.
> 
> Sorry if I'm rambling here, but since we're brainstormning, I think it 
> is relevant.
> 
> After considering the above, I think the idea that I like best is to 
> implement the graph structure inside a new kind of index on the geometry 
> table. This index would either need some additional parameters to 
> specify exactly how the graph is built, or the geometry table would be 
> required to have specific field names for the pieces of information 
> required to build the graph (node IDs). The index could also have the 
> option of storing the weight value of each edge inside the index, in 
> order to speed up routing-oriented queries.
> 
> Having not done much of the work on PostGIS, and never working 
> specifically within the Postgresql code, I'm guessing this would require 
> significant work, probably as much or more than PostGIS itself, and it 
> may need to be tightly integrated into postgresql (ie. need to have 
> changes made within postgresql in order to support it). Paul Ramsey and 
> I have discussed this sort of functionality before... I think it would 
> be very exciting to implement and could potentially be very "killer app" 
> to add to PostGIS - as if PostGIS wasn't already a killer app itself ;)
> 
> I think I'll leave it at that for now...
> 
> Chris
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel