[pgrouting-dev] integer vs bigint for edge and node ids
Stephen Woodbridge
woodbri at swoodbridge.com
Wed Mar 18 13:07:52 PDT 2015
On 3/18/2015 2:40 PM, Luís de Sousa wrote:
> Hi Steve,
>
> Has there been any GSOC campaign with pgRouting? That would be the
> place to start IMHO. After that would be a call for BSc/MSc thesis;
> even if there is no budget, students may become interested in the
> project.
After many years of mentoring GSoC, I decided that I need a break this
year. Daniel may still take a student and I'll probably co-mentor for
that. But I'm not sure what that project will be. One thing we do need
to get get some students/developers with strong C++ backgrounds because
as you mention below we need to do some refactoring of the code and this
is not a good project for a junior programmer.
> The main difficulty with the problems you describe is that they are
> are not easy to tackle concurrently. The first step would have to be
> the refactoring. It could start with a first task to refactor just two
> of modules and extend it to the others on a second step. The bigint
> issue would be the last to tackle.
Yes, I agree. This will also require some restructuring ot the source
tree to move common classes somewhere that they are easily accessed by
the various algorithm code that needs them.
If the refactored code is written to work with bigints, then it is just
a matter of changing the C code that interfaces with postgresql to
handle the bigints. I think one of the utility classes in the
refactoring needs to be renumbering class that supports old-new and
new-old number translations. old-new renumbers the postgres ids into a
new internal numbering, and new-old translates the internal numbers back
to the postgres numbers.
If anyone has some time and/or ideas on this, we could start some wiki
pages to discuss the refactoring. The are a list of dijkstra related
algorithms below that would be candidates for refactoring.
One of the ugly areas of the code is all the C code that implements the
postgresql interfaces. This code is hugely redundant. I have been trying
to think of ways to clean this up. Some random ideas are to make a code
generator that takes a structure defining the interface and them
automatically generate the C code. Or make the interface code data
driven using some structures and standard functions to refactor the
current copy and paste mess. I'm sure there are other ideas out there.
> Just my thoughts. Regards,
Good ideas, thanks!
-Steve
>
> Luís
>
> On 15 March 2015 at 00:46, Stephen Woodbridge <woodbri at swoodbridge.com> wrote:
>> Developers,
>>
>> This topic came up off list and I thought this should be part of a more
>> public discussion, so here are my thoughts on it. I think both the ideas
>> below are worth pursuing but are huge tasks that are currently not funded
>> and I'm concerned that we should not tackle them on a piecemeal, ad-hoc
>> basis.
>>
>> Problem:
>>
>> postgis loads data with a gid column that is typed bigint.
>> OSM data has node and edge ids that can not be held in an integer field.
>> Today, all pgrouting code assumes ids are integer and larger numbers
>> overflow the field causing problems.
>>
>> So the obvious answer is to update the code to be bigint compatible.
>>
>> The issues related to doing this are as follows:
>>
>> * pgrouting has the bad habit of using ids as indexes into arrays which is a
>> problem. The code partially minimizes the indexing issue by located the
>> minimum id and subtracting that from all other ids shifting the range to
>> eliminate the leading empty ids, but if the (max_id - min_id) is large and
>> the ids are sparsely allocated across the range we still need to allocate
>> (max_id - min_id + 1) elements in our storage structures. For example if you
>> have two ids '2' and '1,000,002' the code allocates 1,000,001 elements. This
>> can obviously be fixed with some additional coding.
>>
>> * the code would/might need to be changed to such that the boost structures
>> can support bigint values and this will double the amount of storage needed
>> for ids. There is also the possibility that if the point above does
>> renumbering and we limit the number of ids in any given request to unsigned
>> integer that the code might not need changes and we would only limit the
>> size of the problem we can solve and max unsigned integer sized problem is
>> still larger than anyone would want to wait for a solution.
>>
>> * this would also imply that the output data types we use would get
>> converted to bigint also.
>>
>> * If we make this change, we need to do it to all our algorithms so we keep
>> the commands consistent and compatible with one another. This would involve
>> updating all of the follow algorithms in pgrouting:
>>
>> apsp_johnson
>> apsp_warshall
>> astar
>> bd_astar
>> bd_dijkstra
>> common
>> dijkstra
>> driving_distance
>> kdijkstra
>> ksp
>> shooting_star ** deprecated, will be deleted **
>> trsp
>> tsp
>> vrp_basic
>> vrpdptw
>>
>>
>> While this does not have anything to do with this specific problem, we need
>> to consider any major reworking of the code like this would involve with the
>> fact for historical evolution of the code base we have copied source code
>> from one algorithm to another so we have a lot of duplicated code. It would
>> make a lot of sense at this point to refactor some of this code into common
>> set of reusable C++ base classes. To put this into context all of the follow
>> code is in someway based off of the original dijkstra code:
>> astar
>> bd_astar
>> bd_dijkstra
>> dijkstra
>> driving_distance
>> kdijkstra
>> ksp
>> So every time we make a change to one of these algorithms, there is a good
>> chance that the problem exists in one or more of the other ones. We have
>> been struggling with this problem for some time.
>>
>> I don't want to blowup this problem into something that is so big it will
>> never get done, but I do think it is important that we think about these
>> issues and that we put together a plan that takes these issues into account
>> in the planning of these and other potential changes.
>>
>> Thoughts,
>> -Steve
>> _______________________________________________
>> pgrouting-dev mailing list
>> pgrouting-dev at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/pgrouting-dev
> _______________________________________________
> pgrouting-dev mailing list
> pgrouting-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/pgrouting-dev
>
More information about the pgrouting-dev
mailing list