[pgrouting-users] dijkstra_sp_delta throwing a signal 6 (SIGABRT?)

Stephen Woodbridge woodbri at swoodbridge.com
Mon Feb 28 22:11:50 EST 2011


On 2/28/2011 8:54 PM, Richard Marsden wrote:
> Thanks for the reply and suggestions.
>
> Well I have now run the same script but with just one thread/process. I
> would have expected this to have worked if it was a "bulk" out of memory
> problem (only one pgRouting process running). It failed. Also with
> better diagnostics of my own, I tried to recreate the SQL statements
> on the command line - unfortunately these worked!
>
> I have been using the System Monitor for a while. Previously this showed
> it hitting swap memory occasionally, so I've bumped the machine memory
> from 4GB to 8GB, and it hasn't done since.
> (Yes this is running 32 bit mainly because PostGIS is 32 bit, but I

I run Postgres on my amd64 boxes and I'm pretty sure they are all 64bit 
processes. I do not have that much memory though. I'm running Debian.

> understand modern Linux has a way to handle more memory (but limited per
> process) - and it was using the full 4GB). I do note that it does not
> appear to have gone beyond a full 4GB (+ephemera) memory usage.

Offhand I would say it is not good that you are running close to 4GB 
because, if I recall, this is one of those magic boundaries where a 
32-bit pointer wraps around or overflows.

Also, since you run threads, all your threads are in the same process 
memory and their combined memory cannot exceed whatever the process 
limit is.
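
As a quick check of the pointer-width point, here is a minimal Python sketch. The helper names pointer_bits and address_space_gib are my own, purely for illustration. It only reports the theoretical ceiling; the usable per-process limit on a 32-bit Linux system is normally lower, since the kernel reserves part of the address space.

```python
import struct


def pointer_bits():
    """Width of a native pointer in the running interpreter, in bits."""
    return struct.calcsize("P") * 8


def address_space_gib(bits):
    """Theoretical address space for a given pointer width, in GiB."""
    return 2 ** bits / 2 ** 30


bits = pointer_bits()
print(f"{bits}-bit process, at most {address_space_gib(bits):g} GiB addressable")
```

Run inside the same interpreter as the client script, this shows whether the script itself is a 32-bit process. It says nothing about the Postgres backends, which are separate processes.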

> I tried to adjust the shared memory parameter in PostGres but I think
> the default must be close to the maximum for standard Ubuntu (something
> about having to rebuild the kernel to change SHMEM). So the PostGres
> shared memory setting is back to its default (28MB). work_mem has been
> upped to 256MB. This change was after the first crash.

From:
http://www.postgresql.org/docs/8.4/static/kernel-resources.html

Linux

     The default maximum segment size is 32 MB, which is only adequate 
for small PostgreSQL installations. However, the remaining defaults are 
quite generously sized, and usually do not require changes. The maximum 
shared memory segment size can be changed via the sysctl interface. For 
example, to allow 128 MB, and explicitly set the maximum total shared 
memory size to 2097152 pages (the default):

     $ sysctl -w kernel.shmmax=134217728
     $ sysctl -w kernel.shmall=2097152

     In addition these settings can be saved between reboots in 
/etc/sysctl.conf.

     Older distributions might not have the sysctl program, but 
equivalent changes can be made by manipulating the /proc file system:

     $ echo 134217728 >/proc/sys/kernel/shmmax
     $ echo 2097152 >/proc/sys/kernel/shmall


And from:
http://www.postgresql.org/docs/7.4/static/kernel-resources.html
Linux

     The default shared memory limit (both SHMMAX and SHMALL) is 32 MB 
in 2.2 kernels, but it can be changed in the proc file system (without 
reboot). For example, to allow 128 MB:

     $ echo 134217728 >/proc/sys/kernel/shmall
     $ echo 134217728 >/proc/sys/kernel/shmmax

     You could put these commands into a script run at boot-time.

     Alternatively, you can use sysctl, if available, to control these 
parameters. Look for a file called /etc/sysctl.conf and add lines like 
the following to it:

     kernel.shmall = 134217728
     kernel.shmmax = 134217728

     This file is usually processed at boot time, but sysctl can also be 
called explicitly later.

     Other parameters are sufficiently sized for any application. If you 
want to see for yourself look in /usr/src/linux/include/asm-xxx/shmparam.h 
and /usr/src/linux/include/linux/sem.h.
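
To see what the kernel is currently set to before changing anything, something like the following Python sketch should work. The function names are hypothetical. It assumes the /proc files named in the docs above exist, that shmmax is in bytes, and that shmall is counted in pages (as the 8.4 text implies).

```python
import os
from pathlib import Path


def read_kernel_limit(name, root="/proc/sys/kernel"):
    """Read one integer kernel parameter, e.g. 'shmmax' or 'shmall'."""
    return int(Path(root, name).read_text().strip())


def shm_limits_mb(root="/proc/sys/kernel", page_size=None):
    """Return (shmmax, shmall) in MB.

    Assumption: shmmax is in bytes and shmall is in pages.
    """
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    shmmax_bytes = read_kernel_limit("shmmax", root)
    shmall_pages = read_kernel_limit("shmall", root)
    mb = 2 ** 20
    return shmmax_bytes / mb, shmall_pages * page_size / mb


# Usage on a Linux box:
#   max_mb, all_mb = shm_limits_mb()
#   print(f"shmmax = {max_mb:g} MB, shmall = {all_mb:g} MB")
```

With the example values from the 8.4 docs and 4096-byte pages this reports 128 MB for shmmax and 8192 MB for shmall.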

Hope this helps,
   -Steve

> Otherwise it is difficult to watch with top or the system monitor
> because so far it has had to run a while (hours) before the crash occurs.
>
>
> I guess as a kludgy workaround I could try trapping the client error,
> wait, and skip (or try again). This should work for a single thread, but
> might pose problems for my multi-threaded app. That's the problem when
> the server dies - all client threads have trouble until it restarts.
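
A rough sketch of that trap, wait and retry idea in Python. The helper run_with_retry and its parameters are hypothetical, not part of psycopg or pgRouting. It assumes the caller passes in a connect callable (for example lambda: psycopg2.connect(dsn)) and the exception class to treat as retryable; with psycopg2 I would expect that to be psycopg2.OperationalError, but check what your driver actually raises when the server drops the connection.

```python
import time


def run_with_retry(connect, sql, params, retryable,
                   attempts=3, wait_s=5.0, sleep=time.sleep):
    """Run one query, reconnecting and retrying if the server went away.

    connect   -- callable returning a new DB-API connection
    retryable -- exception class (or tuple of classes) worth retrying
    """
    last_error = None
    for attempt in range(attempts):
        try:
            # A fresh connection per attempt: the old one is useless
            # once the server has restarted.
            conn = connect()
            try:
                cur = conn.cursor()
                cur.execute(sql, params)
                return cur.fetchall()
            finally:
                conn.close()
        except retryable as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Give the server time to finish crash recovery.
                sleep(wait_s)
    raise last_error
```

In a multi-threaded client every thread would need its own call like this, since all of their connections are dropped when the server restarts. A route that still fails after the last attempt is re-raised so the caller can log the node pair and skip it.
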
>
>
> Richard Marsden
>
>
>
>
> On Mon, Feb 28, 2011 at 4:43 PM, Stephen Woodbridge
> <woodbri at swoodbridge.com <mailto:woodbri at swoodbridge.com>> wrote:
>
>     The only thing(s) that I can think of are:
>
>     1. it could be caused by a call to abort() or assert() in the C
>     code, but:
>
>     woodbri at mappy:~/work/pgrouting-git/pgrouting$ find * -type f -exec
>     grep -l -i abort {} \;
>     woodbri at mappy:~/work/pgrouting-git/pgrouting$ find * -type f -exec
>     grep -l -i assert {} \;
>     core/src/CMakeFiles/routing.dir/depend.make
>     core/src/CMakeFiles/routing.dir/astar_boost_wrapper.o
>     core/src/CMakeFiles/routing.dir/shooting_star_boost_wrapper.o
>     core/src/CMakeFiles/routing.dir/depend.internal
>     core/src/CMakeFiles/routing.dir/boost_wrapper.o
>     core/src/CMakeFiles/routing.dir/CXX.includecache
>     lib/librouting.so
>
>     So it does not look like we have one in our source code, but there
>     appears to be references in the .o that might be referenced by
>     compiler generated code or includes outside our source tree like
>     boost or system libs.
>
>     2. I suppose it is possible that the server is sending a SIGABRT to
>     a child process that is doing something bad like taking too much
>     memory. Or maybe there is an OOM (Out Of Memory) watchdog process
>     killing it with a SIGABRT.
>
>     Have you watched this with top? or some other process watcher?
>
>     Hopefully, you can extract the SQL and run it from the command line
>     so we can get a better handle on what is happening and what the query is.
>
>     -Steve
>
>
>
>     On 2/28/2011 2:49 PM, Richard Marsden wrote:
>
>         Well I've moved forward and now have code in production calculating
>         mileages from OpenStreetMap data: I've calculated mileage charts for
>         Oceania and Africa.
>         The secret to get that far was to move operating systems from
>         Windows to Ubuntu and then upgrade to pgRouting 1.05. (PostGres 8.4,
>         Ubuntu 10)
>         The computations are being performed with dijkstra_sp_delta.
>
>         However now I'm hitting another "server closed the connection
>         unexpectedly" error.
>
>         Looking in the server logs, I find the LOG message "server process
>         (PID 19133) was terminated by signal 6: Aborted"
>         From what I can tell, Signal 6 on Ubuntu is indeed a SIGABRT. There
>         are no other log messages to indicate why Postgres/pgRouting threw a
>         SIGABRT.
>
>         This is then followed by warnings and log messages saying other
>         active server processes are being terminated, transactions rolled
>         back, etc.
>
>         This error occurs at a reproducible point in a fairly sophisticated
>         (multi-processor, Python, psycopg) script. Although I'm pretty
>         certain of the SQL that is causing the problem, at the moment I
>         don't have the exact parameters (ie. graph nodes). I'm about to run
>         the script single-threaded with diagnostics so I should be able to
>         get a single SQL statement to reproduce the problem on a psql
>         command line. In the worst case, this could take a couple of days.
>         No other programs are running that are calling Postgres.
>
>         My graph consists of the global OSM street data loaded into PostGIS
>         with osm2po. I have checked for links of zero length. In fact all
>         links <1m long have been taken out of the graph. I've just double
>         checked costs and reverse_costs: all are positive (I've set these
>         to the lengths).
>
>         I've just checked for start & end nodes being the same (ie.
>         resulting in dijkstra_sp_delta being called with the same node
>         identifier for the start and end): Yes, my data has a few of these,
>         but I'm pretty certain the crash occurs before they appear.
>         However, I'm going to add code to detect these - there's no point
>         in executing an SQL statement for something that can be calculated
>         in a trivial line of Python.
>
>         What else should I be looking for? Are there any known problems I
>         should look for? Is there any way of finding out what is causing
>         the Signal 6?
>
>         Once I have the node identifiers that are causing the problem, I
>         should be able to make an exportable extract of the graph to give a
>         reproducible dataset and matching SQL statement. Would anyone be
>         able to investigate this?
>
>         Is there any way of making pgRouting / PostGres handle these
>         situations more cleanly? At the moment, the crash is taking the
>         server down with it. The crash is perhaps the first to occur after
>         roughly 1 million route calculations: I can live with that failure
>         rate - but only if my scripts can cleanly detect and recover from
>         it. I guess ideally the server should stay up and a status value
>         (or exception - but that probably wouldn't work across so many code
>         boundaries) be returned.
>
>
>
>         Best regards,
>
>
>         Richard Marsden
>
>
>
>
> _______________________________________________
> Pgrouting-users mailing list
> Pgrouting-users at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/pgrouting-users


