[gdal-dev] Assert due to stack corruption in FlatGeoBuf export
Simon Eves
simon.eves at heavy.ai
Sat Feb 24 17:27:44 PST 2024
OK, so I tried a custom build of 3.7.3 with the latest *flatbuffers*
(23.5.26), which was a drop-in replacement for 2.0.6 other than the version
asserts.
This does not exhibit the original problem either.
However, while it produces files that the stock static build, the static
build with the older *flatbuffers* (2.0.0), and the Ubuntu dynamic build can
all read just fine, it is unable to read ANY files back in again (in the
context of our server geo importer, anyway).
GDAL throws a *CE_Failure* of *Header failed consistency verification (1)*,
which is from *OGRFlatGeobufLayer::Open()*, and the dataset reports no layers
(or at least, no vector layers).
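For anyone who wants to poke at this, the read-side check boils down to
something like the following (a minimal sketch against the GDAL C API; the
file name and the bare-bones error handling are illustrative, not our actual
importer code):

    #include "gdal.h"
    #include "cpl_error.h"
    #include <cstdio>

    int main()
    {
        GDALAllRegister();
        // Open the FGB file as a vector dataset.
        GDALDatasetH hDS = GDALOpenEx("us_states.fgb", GDAL_OF_VECTOR,
                                      nullptr, nullptr, nullptr);
        // With the failing build, either the open fails outright or the
        // dataset comes back with no vector layers, and the CPL error
        // stack holds "Header failed consistency verification (1)".
        if (hDS == nullptr || GDALDatasetGetLayerCount(hDS) == 0)
        {
            std::printf("open failed: %s\n", CPLGetLastErrorMsg());
            return 1;
        }
        std::printf("layer count: %d\n", GDALDatasetGetLayerCount(hDS));
        GDALClose(hDS);
        return 0;
    }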
This also appears to be a side-effect of it being a static build, as
*ogrinfo* built from the same source (with *flatbuffers* 2.0.0), but in
regular shared libs mode, can read all three files just fine. I have been
unable to achieve a full-static tools build, so I can't try that right now.
This either means that the problem is still there in some form in the
latest *flatbuffers*, but has moved, or that the higher-level FGB file
schema verification can be affected by the *flatbuffers* version. Both are
equally concerning.
Anyway, the build with the older *flatbuffers* 2.0.0 extracted from the
v3.5.3 tree (with the *Table::VerifyField* mod) seems to work fine in all
ways, so we're probably gonna go with that, in the absence of anything else.
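For reference, the *Table::VerifyField* mod is tiny: the generated FlatGeobuf
headers in 3.7.3 call *VerifyField()* with a third alignment argument, which
the 2.0.0 library does not have. The exact change is in the fork linked in
the quoted mail below; one way to express it (a sketch only, and ignoring the
alignment rather than checking it is my simplification) is a forwarding
overload inside *class Table* in the bundled flatbuffers.h:

    // Accept the (verifier, field, align) form used by the newer generated
    // headers and forward to the stock 2.0.0 two-argument overload, which
    // performs no alignment check.
    template<typename T>
    bool VerifyField(const Verifier &verifier, voffset_t field,
                     size_t /*align*/) const
    {
        return VerifyField<T>(verifier, field);
    }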
One other weirdness is that, of the three files, the two produced by the
static builds (*flatbuffers* 2.0.0 and *flatbuffers* 23.5.26) are 16 bytes
longer than the one from the Ubuntu dynamic build. All three read just fine
with *ogrinfo* and our server geo importer, and result in the same table.
Here is a link to a bundle with all three files plus the GeoJSON equivalent
(*MULTIPOLYGON* US states with some metadata).
https://drive.google.com/file/d/1ETRuV63gvUL4aNAT_4KvjrtK1uiCrFun/view?usp=sharing
As ever, happy to get into the weeds with more details of the original
problem, but pretty sure that 95% of the readers of this list don't want
this thread to get any longer! :)
Simon
On Fri, Feb 23, 2024 at 3:46 PM Simon Eves <simon.eves at heavy.ai> wrote:
> Our emails crossed. I am indeed testing with the latest flatbuffers now
> too.
>
> Agreed on the rest.
>
> On Fri, Feb 23, 2024 at 3:42 PM Even Rouault <even.rouault at spatialys.com>
> wrote:
>
>> Simon,
>>
>> did you try to update to the latest
>> https://github.com/google/flatbuffers/releases to see if that would
>> solve the issue? If that worked, that would be the best way forward...
>>
>> Otherwise, if the issue persists with the latest flatbuffers release, an
>> (admittedly rather tedious) option would be to do a git bisect on the
>> flatbuffers code to identify the culprit commit. With some luck, the root
>> cause might be obvious if a single culprit commit can be exhibited
>> (perhaps some subtle C++ undefined behaviour triggered? also it is a bit
>> mysterious that it hits only for static builds), or otherwise raise it with
>> the upstream flatbuffers project to ask for their expertise.
>>
>> Even
>> On 23/02/2024 23:54, Simon Eves via gdal-dev wrote:
>>
>> I was able to create a fork of 3.7.3 with just the *flatbuffers*
>> replaced with the pre-3.6.x version (2.0.0).
>>
>> This seemed to only require changes to the version asserts and adding an
>> *align* parameter to *Table::VerifyField()* to match the newer API.
>>
>>
>> https://github.com/heavyai/gdal/tree/simon.eves/release/3.7/downgrade_to_flatbuffers_2.0.0
>>
>> Our system works correctly and passes all GDAL I/O tests with that
>> version. Obviously this isn't an ideal solution, but this is otherwise a
>> release blocker for us.
>>
>> I would still very much like to discuss the original problem more deeply,
>> and hopefully come up with a better solution.
>>
>> Yours hopefully,
>>
>> Simon
>>
>>
>>
>> On Thu, Feb 22, 2024 at 10:22 PM Simon Eves <simon.eves at heavy.ai> wrote:
>>
>>> Thank you, Robert, for the RR tip. I shall try it.
>>>
>>> I have new findings to report, however.
>>>
>>> First of all, I confirmed that a build against GDAL 3.4.1 (the version
>>> we were on before) still works. I also confirmed that builds against 3.7.3
>>> and 3.8.4 still failed even with no additional library dependencies (just
>>> sqlite3 and proj), in case it was a side-effect of us also adding more of
>>> those. I then tried 3.5.3, with the CMake build (same config as we use for
>>> 3.7.3) and that worked. I then tried 3.6.4 (again, same CMake config) and
>>> that failed. These were all from bundles.
>>>
>>> I then started delving through the GDAL repo itself. I found the common
>>> root commit of 3.5.3 and 3.6.4, and all the commits in the
>>> *ogr/ogrsf_frmts/flatgeobuf* sub-project between that one and the final
>>> commit of each. For 3.5.3, there were only two. I built and tested both, and they
>>> were fine. I then tried the very first one that was new in the 3.6.4 chain
>>> (not in the history of 3.5.3), which was actually a bulk update to the
>>> *flatbuffers* sub-library, committed by Bjorn Harrtell on May 8 2022
>>> (SHA f7d8876). That one had the issue. I then tried the
>>> immediately-preceding commit (an unrelated docs change) and that one was
>>> fine.
>>>
>>> My current hypothesis, therefore, is that the *flatbuffers* update
>>> introduced the issue, or at least the susceptibility to it.
>>>
>>> I still cannot explain why it only occurs in an all-static build, and am
>>> even less able to explain why it only occurs in our full system and not
>>> with the simple test program against the very same static lib build that
>>> does the very same sequence of GDAL API calls, but I repeated the build
>>> tests of the commits either side and a few other random ones a bit further
>>> away in each direction, and the results were consistent. Again, it happens
>>> with both GCC 11 and Clang 14 builds, Debug or Release.
>>>
>>> I will continue tomorrow to look at the actual changes to *flatbuffers* in
>>> that update, although they are quite significant. Certainly, the
>>> *vector_downward* class, which is directly involved, was in a new file in
>>> that update (although on inspection of that file's history in the
>>> *google/flatbuffers* repo, it seems it was just split out of another
>>> header).
>>>
>>> Bjorn, I don't mean to call you out directly, but I am CC'ing you to
>>> ensure you see this, as you appear to be a significant contributor to the
>>> *flatbuffers* project itself. Any insight you may have would be very
>>> welcome. I am of course happy to describe my debugging findings in more
>>> detail, privately if you wish, rather than spamming the list.
>>>
>>> Simon
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Feb 20, 2024 at 1:49 PM Robert Coup <robert.coup at koordinates.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> On Tue, 20 Feb 2024 at 21:44, Robert Coup <robert.coup at koordinates.com>
>>>> wrote:
>>>>
>>>>> Hi Simon,
>>>>>
>>>>> On Tue, 20 Feb 2024 at 21:11, Simon Eves <simon.eves at heavy.ai> wrote:
>>>>>
>>>>>> Here's the stack trace for the original assert. Something is stepping
>>>>>> on scratch_ to make it 0x1000000000 instead of null, which it starts out as
>>>>>> when the flatbuffer object is created, but by the time it gets to
>>>>>> allocating memory, it's broken.
>>>>>>
>>>>>
>>>>> What happens if you set a watchpoint in gdb when the flatbuffer is
>>>>> created?
>>>>>
>>>>> watch -l myfb->scratch
>>>>> or watch *0x1234c0ffee
>>>>>
>>>>
>>>> Or I've also had success with Mozilla's rr: https://rr-project.org/ —
>>>> you can run to a point where scratch is wrong, set a watchpoint on it, and
>>>> then run the program backwards to find out what touched it.
>>>>
>>>> Rob :)
>>>>
>>>
>> _______________________________________________
>> gdal-dev mailing list
>> gdal-dev at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>
>> -- http://www.spatialys.com
>> My software is free, but my time generally not.
>>
>>