[pgrouting-dev] GSoC partitioning project
Stephen Woodbridge
woodbri at swoodbridge.com
Fri Jul 5 07:19:59 PDT 2013
Mukul asks:
> can you help me coming up with basic class skeletons for retrieving
> partitions using an index key and for keeping a track of loaded
> partitions as well .
The problem:
* we have an incomplete graph to solve
* we need to find a node in the graph but it is not in the graph
* we call a method graph::LoadPartition(partition_id) to load the
partition with that node
* if this fails then the route fails and we clean up, and throw an error
* otherwise we continue solving the graph until we need to load another
partition
This assumes it is cheap to find a node or to find that it is missing.
But the question is how do we define "missing".
A node has two pieces of information, the node_id and the partition_id.
We can can make some assumptions here that if both ends of an edge are
in the same partition then we will never find it in the graph unless the
partition has been loaded. So edges that cross partition boundaries are
the ones that concern us because the adjacent partition may or may not
have been loaded.
So we might define an edge like:
eid -- edge id
sid -- source node id
spid -- source partition id
tid -- target node id
tpid -- target partition id
cost
rcost
...
We could set both spid and tpid to zero if the edge is not a boundary
edge. So a simple check if either of these is zero then we know both are
loaded. If they are not zero then we need to check if the partition is
loaded and if not call graph::LoadPartiton(partition_id).
I would recommend maintaining a bit array where the partition_id >0 is
the index into the bit array.
How does graph::LoadPartiton(partition_id) work?
After we partition the data we have a tables like:
CREATE TABLE grid_vertices
(
id integer NOT NULL, -- node id
pid integer, -- partition id
the_geom geometry,
quadkey text,
CONSTRAINT grid_vertices_pkey PRIMARY KEY (id)
);
The following is the edge table and it can have other columns as need or
it can be a view with the source, target, sgid, and tgid columns joined
from another table.
CREATE TABLE grid
(
gid serial NOT NULL,
source integer, -- source node_id
target integer, -- target node_id
cost double precision,
sgid integer, -- source grid_id (partition_id)
tgid integer, -- target grid_id (partition_id)
CONSTRAINT grid_pkey PRIMARY KEY (gid)
);
So the LoadPartition() method will call a C function that will run a
query using the postgresql SPI functions, build a struct array and
return the array pointer to the method that will load the array into the
graph and update the bit array that tracks the loaded partitions. Then
free the struct array and return ok or fail.
The C code will execute a query to select all the edges that have at
least one end in that partition:
select * from grid where sgid=<pid>
union
select * from grid where tgid=<pid>
both sgid and tgid columns should be indexed on the table.
The struct will be something like:
typedef struct partition_t {
integer gid;
integer source;
integer spart;
integer target;
integer tpart
double precision cost;
double precision rcost;
...
}
You will need to return the edge count for the partition_t[] array or
add a dummy record with gid = -1 to signal the end of the array.
When the method loads the edges, if will zero the partition_ids if both
are the same. It will need to check if they are different, and check if
the other partition is already load if that edge as already been loaded
into the graph to avoid duplication of the edges for these boundary edges.
Does this cover everything?
Do you need some help with the C code? and processing the SQL query?
-Steve
More information about the pgrouting-dev
mailing list