[RouterGeocoder] Routing Data Schema

Mon Mar 30 02:18:29 EDT 2009

Hi all,

I suggested that a potential SoC project might be building an abstract 
schema for OpenRouter. I thought I might take this opportunity to better 
define what I'm proposing and open it for input, comments, and 
suggestions. Even if it is not appropriate for SoC, I think it has value 
  regardless and I look forward to a good discussion on it.

Goal:

Define an abstract schema that can be used as an intermediate data store 
for building router import tools. It is not expected that any router 
would use the data in this form. The idea is that it would be easy for 
people to write import tool into this schema, and that routing engines 
could then import data from this schema.

Why bother with a two step process?

Routing engines, and there may be more than one, only need to write a 
single import tool from this schema to benefit from many vendor specific 
imports the load data into this schema. This means that instead of 
needing M*N import tools, we only need M+N import tools.

The second reason is that this allows us to develop vendor specific 
tools prior to or in parallel to router engine development.

pgRouting  -+             +- OSM import
OpenRouter -+    Open     +- Navteq import
...        -+<-- Data  <--+- TeleAtlas import
...        -+    Schema   +- ...
...        -+             +- ...

Other benefits:

There is some preprocessing of the data that can be achieved within this 
  Data schema that can minimize the additional processing needed to 
import it into a specific router's internal storage which will likely be 
optimized for that routing engine.

Thought on Table Design:

My current thinking on this is a set of tables that could be represented 
in a database at least for the purpose of prototyping the concept. Most 
of the tables are normalized, except the ARC table. This could be 
changed, but my thought was that this might be more useful in a 
partially denormalized form.

ARC  - Table of road segments
---------------
  UID           - ARC_ID, Unique record ID for table
  DATASRC_ID    - reference to entry in DATASRC table
  SRC_ID        - Unique ID within the original source data or index num.
  X1            - X value of FROM end of segment
  Y1            - Y value of FROM end of segment
  Z1            - Zlevel of FROM end of segment
  X2            - X value of TO end of segment
  Y2            - Y value of TO end of segment
  Z2            - Zlevel of TO end of segment
  NODE1_ID      - NODE_ID for FROM end of segment
  NODE2_ID      - NODE_ID for TO end of segment
  COST          - COST to traverse segment FROM -> TO
  RCOST         - COST to traverse segment TO -> FROM
  LENGTH        - Length of segment in meters
  SPEED         - Average speed for segment
  ONEWAY        - Flag to indicate ONEWAY {B|F|T}
  HAS_XATTRIB   - Flag to indicate if this segment has XATTRIB records
  HAS_ARESTRICT - Flag to indicate if this segment has ARESTRICT records

DATASRC  - Table of data sources the ARCs were loaded from
---------------
  UID           - DATASRC_ID, Unique record ID for table
  SRC_TYPE      - Type of data source, this probably needs to be a well
                  defined set.
  SRC           - SRC identifier, like filename
  GEOM_COLUMN   - Column name for geometry column if a database
  OTHER         - SRC_TYPE specific information, like a database
                  connection string.

NODE  - List of unique nodes in the model
---------------
  UID           - NODE_ID, Unique record ID for table
  X             - X value for node
  Y             - Y value for node
  Z             - Zlevel for node

XATTRIB  - List of extended attributes associate with any given ARC
---------------
  UID           - UID, Unique record id for table
  ARC_ID        - ARC_ID that this record belongs to
  SEQNO         - Sequence number in case attributes need order
  KEY           - Type of attribute
  VALUE         - Value of attribute

ARESTRICT  - List of Restrictions associated with any given ARC
---------------
  UID           - UID, Unique record id for table
  ARC_ID        - ARC_ID that this record belongs to
  TIME_FROM     - Start time for restriction, DATE-TIME or TIME
  TIME_TO       - End time for restriction, DATE-TIME or TIME
  VEHICLE_CLASS - Vehicle class that this restriction belongs to
  KEY           - Type of restriction
  VALUE         - Value of restriction

NRESTRICT  - List of Restrictions associated with any given NODE
---------------
  UID           - UID, Unique record id for table
  NODE_ID       - NODE_ID that this record belongs to
  TIME_FROM     - Start time for restriction, DATE-TIME or TIME
  TIME_TO       - End time for restriction, DATE-TIME or TIME
  VEHICLE_CLASS - Vehicle class that this restriction belongs to
  KEY           - Type of restriction
  VALUE         - Value of restriction

Some Notes and thoughts on this:

In general, things like KEY values need to be a well defined set of 
values and should be validated on input.

1) For bicycling and heavy truck hauling we might want to have altitude 
loss or gain or grade. This can probably be held in the XATTRIB table.

2) We might want to have a METADATA table with key/value pairs to 
indicate what type of restriction and/or attributes the data set 
includes. This should also include a VERSION key with an appropriate 
value indicating the schema version number.

3) ARESTRICT could be used to indicate, NO TRUCKS, Weight restrictions, 
PEDESTRIANS ONLY, 4 Wheel drive only, etc. These can then be used to 
eliminate ARCs that might not be appropriate when importing into the 
router or to add additional information to the ARCs in the router 
depending on features and capabilities of the engine.

4) NRESTRICT could be used to store turn restrictions or any other 
restrictions that might need to be applied to a route passing through 
this NODE. KEY indicates the type of restriction, and VALUE for TURN 
restrictions, would be a list of CHILDREN_IDs, ":", list of 
ANCESTOR_IDs. And CHILDREN would be the potential CHILDREN that would be 
restricted if the path came from the listed ancestors

5) This does not include information that would be needed for route 
explication. This should probably be added. This also does not copy the 
ARC geometry to a local table.

6) I'm not sure we need to keep DATASRC table or not. If we keep a local 
copy of all the needed data then we might not. There might be some value 
if we wanted to extend this to support incremental updates, for example 
if a whole state or country was a single data source you could 
potentially purge all the ARCs associated with it, and any referential 
data records, then import that data again without having to reload the 
whole thing.

7) I have not really thought through signage yet. Navteq data has 
information about Signs along the route that can be useful during 
explication. These may fit in the XATTRIB, but I need to check this and 
how you determine which sign(s) are used for which MANEUVER.

8) I also think we can define pseudo-code process of how to load the 
data like:
   a) open a data source
   b) insert a record into the DATASRC table
   c) foreach record in the data source
   d)    insert a record into the ARC
   e)    add any required XATTRIB or ARESTRICT records to each ARC
   f) generate the NODE table from the ARC table
   g)    and update the ARCs with NODE_IDs
   h) Add Turn restrictions to the NRESTRICT table
   i) Run sanity checking for:
      1. zero or negative cost/length segments
      2. isolated single segments
      3. other TBD

Sorry it is late here and I can not think through this anymore tonight. 
This is a good start and I hope seeing the data laid out like this helps 
to show how a simple schema might be useful and that having an 
intermediate schema like this might be of value and not too overly 
complex to implement.

Questions, Concerns, Issues, Missing information?

Best regards,
   -Steve