[gdal-dev] Assert due to stack corruption in FlatGeoBuf export
Simon Eves
simon.eves at heavy.ai
Tue Feb 20 13:10:51 PST 2024
Here's the stack trace for the original assert. Something is stepping on
scratch_: it starts out null when the flatbuffer object is created, but by
the time the code gets to allocating memory it has become 0x1000000000.
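
For anyone reading along without the source open, here is a minimal sketch of the condition that fires, using only the three member names from the assert text. The BufferPointers struct and the invariant_holds and dump helpers are hypothetical stand-ins written for illustration; this is not the flatbuffers vector_downward code. It only shows that the bogus scratch_ value alone is enough to break the check.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for the three pointers named in the assert message.
// This is NOT the real flatbuffers::vector_downward class.
struct BufferPointers {
  const std::uint8_t* buf_ = nullptr;      // start of the allocation
  const std::uint8_t* scratch_ = nullptr;  // must lie between buf_ and cur_
  const std::uint8_t* cur_ = nullptr;      // current write position
};

// Same condition as the assert text: cur_ >= scratch_ && scratch_ >= buf_.
// The pointers are compared as integers so that a garbage value can be
// tested without dereferencing it.
bool invariant_holds(const BufferPointers& p) {
  const auto buf = reinterpret_cast<std::uintptr_t>(p.buf_);
  const auto scratch = reinterpret_cast<std::uintptr_t>(p.scratch_);
  const auto cur = reinterpret_cast<std::uintptr_t>(p.cur_);
  return cur >= scratch && scratch >= buf;
}

// Debug helper: print the three pointers and whether the check passes.
void dump(const char* label, const BufferPointers& p) {
  std::fprintf(stderr, "%s: buf_=%p cur_=%p scratch_=%p invariant=%s\n",
               label,
               static_cast<const void*>(p.buf_),
               static_cast<const void*>(p.cur_),
               static_cast<const void*>(p.scratch_),
               invariant_holds(p) ? "ok" : "BROKEN");
}
```

With all three pointers null the check passes. Setting only scratch_ to 0x1000000000 while buf_ and cur_ are still null makes it fail, which matches the trace below. Calling something like dump() at construction and again just before the allocation would narrow down when the value changes.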
On Tue, Feb 20, 2024 at 1:05 PM Simon Eves <simon.eves at heavy.ai> wrote:
> (starting a new thread to avoid derailing the static-build one any further)
>
> Totally agreed on the mismatch idea, but the code in question is all
> self-contained down in *ogr/ogrsf_frmts/flatgeobuf* and the *flatbuffers*
> sub-project (which is a snapshot of a Google OSS project) so I'm struggling
> to see how there could be a mismatch.
>
> Also, although we're building on CentOS 7, we're using relatively new
> compilers (GCC 11.4 and Clang 14.0.6), and we bundle the matching newer
> runtimes.
>
> We don't have a full static build stack on our normal dev platform (Ubuntu
> 22.04) so I haven't been able to repro the problem there.
>
> I should have mentioned the first time that we have tried using ASAN, and
> it definitely catches something wrong, but the behavior is different, and
> varies if you add more debug printfs. For example:
>
> DEBUG: vector_downward::push() num = 16
> DEBUG: about to reallocate, buf_ = 0, cur_ = 0, scratch = 0
> DEBUG: reallocated, buf_ = 0x61900062d380, cur_ = 0x61900062cf80, scratch
> = 0
> DEBUG: vector_downward::push() ptr = 0x61900062cf70, about to do memcpy
> =================================================================
> ==25459==ERROR: AddressSanitizer: heap-buffer-overflow on address
> 0x61900062cf70 at pc 0x7f8933eb87f6 bp 0x7fffa7aa0e70 sp 0x7fffa7aa0620
> WRITE of size 16 at 0x61900062cf70 thread T0
>
> ...but it's still not obvious what exactly is going wrong. The code and
> data flow makes perfect sense when you step through it in a dynamic build
> that doesn't fail.
>
> Like I said, the frustrating part is that a simple test program (attached)
> compiled against the same set of static libs works fine.
>
> S
>
> On Tue, Feb 20, 2024 at 12:33 PM Robert Coup <robert.coup at koordinates.com>
> wrote:
>
>> Hi Simon,
>>
>> On Tue, 20 Feb 2024 at 18:58, Simon Eves via gdal-dev <
>> gdal-dev at lists.osgeo.org> wrote:
>>
>>> We still have one VERY strange issue whereby FlatGeoBuf export fails in
>>> a very consistent and reproducible form down in the flatbuffer code, but
>>> only in the static build, and only in the full system. I have written a
>>> simple test harness that links the very same static libgdal and does a
>>> simple GDAL startup and FGB export of a single feature and that works fine.
>>> It's some kind of data/stack corruption when it first tries to write to the
>>> flatbuffer on the first feature, which results in a pointer member of the
>>> buffer class becoming 0x100000000000 (always) instead of null, and then it
>>> stops on an assert. There is also one private function in the
>>> vector_downward class which the debugger won't even step into in that
>>> build. I can even put printfs in that function and they don't come out.
>>> I've tried it on CentOS and on Ubuntu, with GCC and Clang, and it's always
>>> the same. Everything else in GDAL works just fine (we have LOTS of
>>> import/export unit tests). This makes zero sense as all the FGB code is
>>> internal to GDAL and compiled together. I've been poking at it for over a
>>> week and it's doing my head in.
>>>
>>
>> One cause of this sort of crash is a header/library mismatch somewhere
>> where a function is expecting different parameters/types than the caller is
>> actually providing. Otherwise, maybe a bug in glibc/libstdc++/gcc/something
>> that's been fixed in the intervening ten years since CentOS 7 was released?
>>
>>
>> If you run your *build* on a modern distro/libc/gcc/etc does it change
>> things? If it's the same, maybe hints more towards the former.
>>
>> ASAN (https://github.com/google/sanitizers/wiki/AddressSanitizer) might
>> help track down stack/heap corruption.
>>
>> Rob :)
>>
>>
-------------- next part --------------
Thread 1 "ImportExportTes" received signal SIGABRT, Aborted.
__GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) where
#0 __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff5ec6859 in __GI_abort () at abort.c:79
#2 0x00007ffff5ec6729 in __assert_fail_base (fmt=0x7ffff605c588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x13bf5a98 "cur_ >= scratch_ && scratch_ >= buf_",
file=0x13bf5a40 "/build/scripts/gdal-3.7.3/ogr/ogrsf_frmts/flatgeobuf/flatbuffers/vector_downward.h", line=136, function=<optimized out>) at assert.c:92
#3 0x00007ffff5ed7fd6 in __GI___assert_fail (assertion=0x13bf5a98 "cur_ >= scratch_ && scratch_ >= buf_",
file=0x13bf5a40 "/build/scripts/gdal-3.7.3/ogr/ogrsf_frmts/flatgeobuf/flatbuffers/vector_downward.h", line=136,
function=0x13bf5a00 "size_t flatbuffers::vector_downward::ensure_space(size_t)") at assert.c:101
#4 0x000000000a240bec in flatbuffers::vector_downward::ensure_space (this=0x7fffffff9070, len=16) at /build/scripts/gdal-3.7.3/ogr/ogrsf_frmts/flatgeobuf/flatbuffers/vector_downward.h:136
#5 0x000000000a240c8c in flatbuffers::vector_downward::make_space (this=0x7fffffff9070, len=16) at /build/scripts/gdal-3.7.3/ogr/ogrsf_frmts/flatgeobuf/flatbuffers/vector_downward.h:146
#6 0x000000000a240e38 in flatbuffers::vector_downward::push (this=0x7fffffff9070, bytes=0x1bfc2a70 "", num=16)
at /build/scripts/gdal-3.7.3/ogr/ogrsf_frmts/flatgeobuf/flatbuffers/vector_downward.h:182
#7 0x000000000a241153 in flatbuffers::FlatBufferBuilder::PushBytes (this=0x7fffffff9070, bytes=0x1bfc2a70 "", size=16)
at /build/scripts/gdal-3.7.3/ogr/ogrsf_frmts/flatgeobuf/flatbuffers/flatbuffer_builder.h:262
#8 0x000000000a245f50 in flatbuffers::FlatBufferBuilder::CreateVector<double> (this=0x7fffffff9070, v=0x1bfc2a70, len=2)
at /build/scripts/gdal-3.7.3/ogr/ogrsf_frmts/flatgeobuf/flatbuffers/flatbuffer_builder.h:634
#9 0x000000000a2445c5 in flatbuffers::FlatBufferBuilder::CreateVector<double, std::allocator<double> > (this=0x7fffffff9070, v=...)
at /build/scripts/gdal-3.7.3/ogr/ogrsf_frmts/flatgeobuf/flatbuffers/flatbuffer_builder.h:684
#10 0x000000000a259152 in FlatGeobuf::CreateGeometryDirect (_fbb=..., ends=0x0, xy=0x7fffffff8f88, z=0x0, m=0x0, t=0x0, tm=0x0, type=FlatGeobuf::GeometryType::Unknown, parts=0x0)
at /build/scripts/gdal-3.7.3/ogr/ogrsf_frmts/flatgeobuf/feature_generated.h:182
#11 0x000000000a258d44 in ogr_flatgeobuf::GeometryWriter::write (this=0x7fffffff8f70, depth=0) at /build/scripts/gdal-3.7.3/ogr/ogrsf_frmts/flatgeobuf/geometrywriter.cpp:243
#12 0x000000000a23b51d in OGRFlatGeobufLayer::ICreateFeature (this=0x1bd2a160, poNewFeature=0x1b9b94d0) at /build/scripts/gdal-3.7.3/ogr/ogrsf_frmts/flatgeobuf/ogrflatgeobuflayer.cpp:2159
#13 0x000000000a2d799b in OGRLayer::CreateFeature (this=0x1bd2a160, poFeature=0x1b9b94d0) at /build/scripts/gdal-3.7.3/ogr/ogrsf_frmts/generic/ogrlayer.cpp:642
#14 0x0000000006d9ae95 in import_export::QueryExporterGDAL::exportResults (this=0x1c9c8d70, query_results=...)
at /home/simon.eves/work/master/heavydb-internal/ImportExport/QueryExporterGDAL.cpp:716
#15 0x0000000006bde00c in Parser::ExportQueryStmt::execute (this=0x7fffffff9d30, session=..., read_only_mode=false) at /home/simon.eves/work/master/heavydb-internal/Parser/ParserNode.cpp:6601