[pgrouting-dev] integer vs bigint for edge and node ids

Sat Mar 14 16:46:43 PDT 2015

Developers,

This topic came up off list and I thought this should be part of a more 
public discussion, so here are my thoughts on it. I think both the ideas 
below are worth pursuing but are huge tasks that are currently not 
funded and I'm concerned that we should not tackle them on a piecemeal, 
ad-hoc basis.

Problem:

postgis loads data with a gid column that is typed bigint.
OSM data has node and edge ids that can not be held in an integer field.
Today, all pgrouting code assumes ids are integer and larger numbers 
overflow the field causing problems.

So the obvious answer is to update the code to be bigint compatible.

The issues related to doing this are as follows:

* pgrouting has the bad habit of using ids as indexes into arrays which 
is a problem. The code partially minimizes the indexing issue by located 
the minimum id and subtracting that from all other ids shifting the 
range to eliminate the leading empty ids, but if the (max_id - min_id) 
is large and the ids are sparsely allocated across the range we still 
need to allocate (max_id - min_id + 1) elements in our storage 
structures. For example if you have two ids '2' and '1,000,002' the code 
allocates 1,000,001 elements. This can obviously be fixed with some 
additional coding.

* the code would/might need to be changed to such that the boost 
structures can support bigint values and this will double the amount of 
storage needed for ids. There is also the possibility that if the point 
above does renumbering and we limit the number of ids in any given 
request to unsigned integer that the code might not need changes and we 
would only limit the size of the problem we can solve and max unsigned 
integer sized problem is still larger than anyone would want to wait for 
a solution.

* this would also imply that the output data types we use would get 
converted to bigint also.

* If we make this change, we need to do it to all our algorithms so we 
keep the commands consistent and compatible with one another. This would 
involve updating all of the follow algorithms in pgrouting:

apsp_johnson
apsp_warshall
astar
bd_astar
bd_dijkstra
common
dijkstra
driving_distance
kdijkstra
ksp
shooting_star ** deprecated, will be deleted **
trsp
tsp
vrp_basic
vrpdptw

While this does not have anything to do with this specific problem, we 
need to consider any major reworking of the code like this would involve 
with the fact for historical evolution of the code base we have copied 
source code from one algorithm to another so we have a lot of duplicated 
code. It would make a lot of sense at this point to refactor some of 
this code into common set of reusable C++ base classes. To put this into 
context all of the follow code is in someway based off of the original 
dijkstra code:
   astar
   bd_astar
   bd_dijkstra
   dijkstra
   driving_distance
   kdijkstra
   ksp
So every time we make a change to one of these algorithms, there is a 
good chance that the problem exists in one or more of the other ones. We 
have been struggling with this problem for some time.

I don't want to blowup this problem into something that is so big it 
will never get done, but I do think it is important that we think about 
these issues and that we put together a plan that takes these issues 
into account in the planning of these and other potential changes.

Thoughts,
   -Steve