[postgis-devel] [pgrouting] need help with std::bad_alloc issue

Stephen Woodbridge woodbri at swoodbridge.com
Wed Jun 12 11:40:01 PDT 2013


On 6/12/2013 5:20 AM, Sandro Santilli wrote:
> On Tue, Jun 11, 2013 at 11:32:47AM -0400, Stephen Woodbridge wrote:
>> On 6/11/2013 10:58 AM, Bborie Park wrote:
>>> Steve,
>>>
>>> On what platform? Windows? Linux?
>>
>> Linux, pg 9.2.4
>>>
>>> By the looks of it (I'm not very good at C++), std::bad_alloc comes
>>> from a failed new allocation. Run gdb or valgrind yet?
>>
>> Yes, gdb is not very useful because postgresql is compiled with
>> -PIE and gdb does not support that well. valgrind is my friend; I
>> have run it before, but it does not report anything useful in this
>> case.
>>
>> I'm not great with C++ either, but I'm stuck on the fact that it
>> seems like C++ does not know how much memory is available, so it
>> fails, but after reconnecting to the database it seems to have a
>> better idea.
>
> Did you look at the system memory state while testing ?
> I suspect you're just not releasing memory associated with results
> of queries run in previous connection, so that re-connection releases
> them all for you, or something like that. Alternatively there might be
> a memory leak in some database-side functions so that re-connecting
> quits the old backend and releases all leaked memory with it.
> Valgrind doesn't help because the memory isn't really lost, but rather
> held by the postgresql backend pool (released on reconnect).
>
> As you said, bad_alloc doesn't come from palloc, but who's keeping
> all the system memory busy is still not known at this stage, so it
> could be either side. To be frank I think it's more likely for it
> to be in pgrouting code itself, what do you think ?

So I went back to the simplest case that I can run to reproduce this:

1. createdb
2. connect and create postgis and pgrouting extensions
3. create a simple small table
4. run the query, get the error
5. \c to reconnect in psql
6. run the same query; it works

So: new database, new connection, minimal work in the session. Nothing
that is requesting a huge amount of memory. This happens consistently
regardless of server load and whether or not I restart postgresql.

A slight variation in the above sequence:

1. createdb
2. connect and create postgis and pgrouting extensions
3. \c to reconnect in psql
4. create a simple small table
5. run the query, get NO error

You ask: What do I think?

This is harder to say. The symptoms all point to something systemic 
happening. That is not to say that pgrouting is not triggering the 
problem in some way, it is just not obvious.

There is a big difference in the structure of pgrouting functions
compared to the postgis functions: almost all of our functions are
SRFs (set-returning functions), and almost all of them use the SPI
facility to run queries that fetch data from the database. I know we
have to be careful not to hold on to data palloc'd during SPI across
the multiple SRF calls. At one point I reviewed our code to make sure
we did not do that.

I have reviewed most (all?) of the code and made changes to a lot of
the C++ code (scary because I'm not a C++ programmer), such as wrapping
all C++ functions called from C in try-catch blocks so that they report
errors rather than crash the server. I have also changed a bunch of the
std::vector allocations to reserve the needed memory when I know up
front how much they will need. This has helped resolve a bunch of
std::bad_alloc errors, but all of those were reproducible regardless of
the reconnection.

I found some code that I can call that reports the status of system
memory usage. I think I will add it to the catch block and see what it
reports. I have run this problem under valgrind without getting
anything useful, and I have added debug output to trace down the
specific line generating the error, but that was not illuminating.
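
One way such a memory report could look on Linux is sketched below.
This is an assumption, not the code referred to above: it reads the
"Key: value kB" lines from /proc/self/status and /proc/meminfo, and the
helper names read_status_kb and memory_report are hypothetical. Note
that it allocates strings itself, so it may fail if memory really is
exhausted when the catch block runs.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Returns the number from a "Key:   12345 kB" line, or -1 if the key
// is absent or has no numeric value.
long read_status_kb(std::istream &in, const std::string &key) {
    const std::string prefix = key + ":";
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) == 0) {
            std::istringstream fields(line.substr(prefix.size()));
            long kb = -1;
            if (fields >> kb) return kb;
            return -1;
        }
    }
    return -1;
}

// Linux-only assumption: summarises process and system memory.
// A missing file simply yields -1 for that value.
std::string memory_report() {
    std::ifstream self("/proc/self/status");
    std::ifstream sys("/proc/meminfo");
    std::ostringstream out;
    out << "VmSize=" << read_status_kb(self, "VmSize") << " kB, ";
    out << "VmRSS=" << read_status_kb(self, "VmRSS") << " kB, ";
    out << "MemFree=" << read_status_kb(sys, "MemFree") << " kB";
    return out.str();
}
```

Logging memory_report() from the catch block on the first (failing)
connection and again after \c would show whether the process or the
system is actually short of memory when the exception is thrown.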

I hate problems like this.

Thank you for your thoughts and comments. I'll keep plugging away at this.

-Steve

> --strk;
>
>>
>> We use a lot of std::vector structures. These are arrays that get
>> dynamically extended and as a result have a lot of realloc
>> equivalents on them that cause memory fragmentation, and where
>> possible I have changed the code to reserve a minimum size, which has
>> helped a lot, but this kind of issue is consistently reproduced
>> regardless of reconnecting.
>>
>> I'll take another run at it with valgrind, but I'm pretty sure this
>> will not show anything new.
>>
>> Thanks,
>>    -Steve
>>
>>> -bborie
>>>
>>> On Tue, Jun 11, 2013 at 7:33 AM, Stephen Woodbridge
>>> <woodbri at swoodbridge.com> wrote:
>>>> Hi devs,
>>>>
>>>> I have run into a strange problem with some pgrouting functions that you
>>>> guys might have already seen in postgis.
>>>>
>>>> We have stored procedures that are C and C++ and in general they work fine,
>>>> but if I create a database, connect to it, create extensions and run some
>>>> commands I get std::bad_alloc error. If I simply reconnect to the database,
>>>> the same command does not generate an error, and, in fact, if I reconnect
>>>> after installing the extension, I never get this error.
>>>>
>>>> I have traced this down to the particular statement that is throwing the
>>>> error, but there is nothing unique or particular about it. And we have seen
>>>> this behavior in multiple commands.
>>>>
>>>> So I have to conclude that:
>>>>
>>>> 1. we use the same pattern for most of our commands so that might be flawed
>>>> in some basic way regarding memory
>>>>
>>>> 2. that there is something strange about create extension in that the
>>>> libraries are not getting initialized correctly (or completely?) until a
>>>> connection is made. We are trying to verify this is or is not unique to pg
>>>> 9.2.
>>>>
>>>> 3. pgrouting installs multiple shared libraries in our extension and maybe
>>>> postgresql assumes there is only going to be one shared library
>>>>
>>>> 4. or something else totally different that we are missing
>>>>
>>>> So has anyone seen anything like this with postgis code?
>>>> Any thoughts on what this might be? or how to run it down?
>>>>
>>>> I did post an inquiry to the postgresql list, and Tom responded that there
>>>> was not enough information and that I should compile the server with
>>>> --enable-cassert, which I did (assuming my Debian recompile worked
>>>> correctly), but since this is a C++ error and not a postgresql palloc
>>>> issue, we have not seen any cassert errors.
>>>>
>>>> Thoughts?
>>>>
>>>> -Steve



