[pgrouting-dev] integer vs bigint for edge and node ids
Stephen Woodbridge
woodbri at swoodbridge.com
Sat Mar 14 16:46:43 PDT 2015
Developers,
This topic came up off list and I thought this should be part of a more
public discussion, so here are my thoughts on it. I think both the ideas
below are worth pursuing but are huge tasks that are currently not
funded and I'm concerned that we should not tackle them on a piecemeal,
ad-hoc basis.
Problem:
postgis loads data with a gid column that is typed bigint.
OSM data has node and edge ids that can not be held in an integer field.
Today, all pgrouting code assumes ids are integer and larger numbers
overflow the field causing problems.
So the obvious answer is to update the code to be bigint compatible.
The issues related to doing this are as follows:
* pgrouting has the bad habit of using ids as indexes into arrays which
is a problem. The code partially minimizes the indexing issue by located
the minimum id and subtracting that from all other ids shifting the
range to eliminate the leading empty ids, but if the (max_id - min_id)
is large and the ids are sparsely allocated across the range we still
need to allocate (max_id - min_id + 1) elements in our storage
structures. For example if you have two ids '2' and '1,000,002' the code
allocates 1,000,001 elements. This can obviously be fixed with some
additional coding.
* the code would/might need to be changed to such that the boost
structures can support bigint values and this will double the amount of
storage needed for ids. There is also the possibility that if the point
above does renumbering and we limit the number of ids in any given
request to unsigned integer that the code might not need changes and we
would only limit the size of the problem we can solve and max unsigned
integer sized problem is still larger than anyone would want to wait for
a solution.
* this would also imply that the output data types we use would get
converted to bigint also.
* If we make this change, we need to do it to all our algorithms so we
keep the commands consistent and compatible with one another. This would
involve updating all of the follow algorithms in pgrouting:
apsp_johnson
apsp_warshall
astar
bd_astar
bd_dijkstra
common
dijkstra
driving_distance
kdijkstra
ksp
shooting_star ** deprecated, will be deleted **
trsp
tsp
vrp_basic
vrpdptw
While this does not have anything to do with this specific problem, we
need to consider any major reworking of the code like this would involve
with the fact for historical evolution of the code base we have copied
source code from one algorithm to another so we have a lot of duplicated
code. It would make a lot of sense at this point to refactor some of
this code into common set of reusable C++ base classes. To put this into
context all of the follow code is in someway based off of the original
dijkstra code:
astar
bd_astar
bd_dijkstra
dijkstra
driving_distance
kdijkstra
ksp
So every time we make a change to one of these algorithms, there is a
good chance that the problem exists in one or more of the other ones. We
have been struggling with this problem for some time.
I don't want to blowup this problem into something that is so big it
will never get done, but I do think it is important that we think about
these issues and that we put together a plan that takes these issues
into account in the planning of these and other potential changes.
Thoughts,
-Steve
More information about the pgrouting-dev
mailing list