[pgrouting-users] dijkstra_sp_delta throwing a signal 6 (SIGABRT?)

Stephen Woodbridge woodbri at swoodbridge.com
Mon Feb 28 17:43:36 EST 2011


The only thing(s) that I can think of are:

1. it could be caused by a call to abort() or assert() in the C code, but:

woodbri at mappy:~/work/pgrouting-git/pgrouting$ find * -type f -exec grep -l -i abort {} \;
woodbri at mappy:~/work/pgrouting-git/pgrouting$ find * -type f -exec grep -l -i assert {} \;
core/src/CMakeFiles/routing.dir/depend.make
core/src/CMakeFiles/routing.dir/astar_boost_wrapper.o
core/src/CMakeFiles/routing.dir/shooting_star_boost_wrapper.o
core/src/CMakeFiles/routing.dir/depend.internal
core/src/CMakeFiles/routing.dir/boost_wrapper.o
core/src/CMakeFiles/routing.dir/CXX.includecache
lib/librouting.so

So it does not look like we have one in our source code, but there 
appear to be references in the .o files that might come from 
compiler-generated code or from includes outside our source tree, like 
Boost or system libs.
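
To separate real hits from build residue, the same search can be limited 
to source files. A minimal sketch in Python; the extension list in 
SOURCE_EXTS and the name find_calls are assumptions for illustration, not 
anything taken from the pgRouting tree:

```python
import os
import re

# Assumed set of source-file extensions; adjust to the actual tree.
SOURCE_EXTS = {".c", ".h", ".cpp", ".hpp", ".cc"}

# Case-insensitive like grep -i; also matches BOOST_ASSERT( or static_assert(.
PATTERN = re.compile(r"(abort|assert)\s*\(", re.IGNORECASE)


def find_calls(root):
    """Return sorted (path, line_number, text) hits in source files under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in SOURCE_EXTS:
                continue  # skip .o, .so, CMake caches and other build output
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as fh:
                for lineno, line in enumerate(fh, 1):
                    if PATTERN.search(line):
                        hits.append((path, lineno, line.strip()))
    return sorted(hits)
```

An empty result from find_calls(".") would support the idea that the 
references come from headers outside the tree.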

2. I suppose it is possible that the server is sending a SIGABRT to a 
child process that is doing something bad like taking too much memory. 
Or maybe there is an OOM (Out Of Memory) watchdog process killing it, 
although the Linux kernel's OOM killer normally sends SIGKILL (signal 9) 
rather than SIGABRT.

Have you watched this with top? or some other process watcher?
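
If top is awkward to leave running, a small script can sample the 
backend's memory instead. A rough sketch, assuming Linux (it reads 
/proc/<pid>/status) and assuming you already know the backend PID, for 
example from SELECT pg_backend_pid() in the same session; the function 
names are made up for this example:

```python
import os
import time


def rss_kb(pid):
    """Return resident set size in kB from /proc/<pid>/status, or None."""
    try:
        with open(f"/proc/{pid}/status") as fh:
            for line in fh:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        return None  # process gone, or no Linux-style /proc available
    return None  # no VmRSS line (e.g. kernel threads)


def watch(pid, interval=1.0, samples=5):
    """Sample RSS up to `samples` times; stop early if the process vanishes."""
    readings = []
    for _ in range(samples):
        kb = rss_kb(pid)
        if kb is None:
            break
        readings.append(kb)
        time.sleep(interval)
    return readings
```

A steadily climbing series from watch(pid) just before the crash would 
point towards memory exhaustion.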

Hopefully, you can extract the SQL and run it from the command line so 
we can get a better handle on what is happening and what the query is.
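
One way to capture the failing statement is to write each query to disk 
before running it, so the last line in the log after a crash is the 
culprit. A minimal sketch; run_logged and the default log file name are 
hypothetical, and execute is assumed to be any callable taking (sql, 
params), such as a DB-API cursor's execute method:

```python
import os


def run_logged(execute, sql, params, log_path="last_query.log"):
    """Record sql and params on disk, then call execute(sql, params)."""
    with open(log_path, "a") as log:
        log.write(f"{sql!r} {params!r}\n")
        log.flush()
        os.fsync(log.fileno())  # force the line to disk before the query runs
    return execute(sql, params)
```

With a multi-process script, giving each worker its own log_path avoids 
interleaved lines.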

-Steve


On 2/28/2011 2:49 PM, Richard Marsden wrote:
> Well I've moved forward and now have code in production calculating
> mileages from OpenStreetMap data: I've calculated mileage charts for
> Oceania and Africa.
> The secret to getting that far was to move operating systems from Windows to
> Ubuntu and then upgrade to pgRouting 1.05. (PostgreSQL 8.4, Ubuntu 10)
> The computations are being performed with dijkstra_sp_delta.
>
> However now I'm hitting another "server closed the connection
> unexpectedly" error.
>
> Looking in the server logs, I find the LOG message "server process (PID
> 19133) was terminated by signal 6: Aborted"
> From what I can tell, Signal 6 on Ubuntu is indeed a SIGABRT. There
> are no other log messages to indicate why Postgres/pgRouting threw a
> SIGABRT.
>
> This is then followed by warnings and log messages saying other active
> server processes are being terminated, transactions rolled back, etc.
>
> This error occurs at a reproducible point in a fairly sophisticated
> (multi-processor, Python, psycopg) script. Although I'm pretty certain
> of the SQL that is causing the problem, at the moment I don't have the
> exact parameters (i.e. graph nodes). I'm about to run the script
> single-threaded with diagnostics so I should be able to get a single SQL
> statement to reproduce the problem on a psql command line. In the worst
> case, this could take a couple of days.
> No other programs are running that are calling Postgres.
>
> My graph consists of the global OSM street data loaded into PostGIS with
> osm2po. I have checked for links of zero length. In fact all links <1m
> long have been taken out of the graph. I've just double checked costs
> and reverse_costs:  all are positive (I've set these to the lengths)
>
> I've just checked for start & end nodes being the same (i.e. resulting in
> dijkstra_sp_delta being called with the same node identifier for the
> start and end): Yes my data has a few of these, but I'm pretty certain
> the crash occurs before they appear. However, I'm going to add code to
> detect these - there's no point in executing an SQL statement for
> something that can be calculated in a trivial line of python.
>
> What else should I be looking for? Are there any known problems I should
> look for? Is there any way of finding out what is causing the Signal 6?
>
> Once I have the node identifiers that are causing the problem, I should
> be able to make an exportable extract of the graph to give a
> reproducible dataset and matching SQL statement. Would anyone be able to
> investigate this?
>
> Is there any way of making pgRouting / PostgreSQL handle these situations
> more cleanly? At the moment, the crash is taking the server down with
> it. The crash is perhaps the first to occur after roughly 1 million
> route calculations: I can live with that failure rate - but only if my
> scripts can cleanly detect and recover from it. I guess ideally the
> server should stay up and a status value (or exception - but that
> probably wouldn't work across so many code boundaries) be returned.
>
>
>
> Best regards,
>
>
> Richard Marsden


