<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Simon,</p>
<p>Did you try updating to the latest flatbuffers release
<a class="moz-txt-link-freetext" href="https://github.com/google/flatbuffers/releases">https://github.com/google/flatbuffers/releases</a> to see whether that
solves the issue? If it does, that would be the best way
forward.</p>
<p>Otherwise, if the issue persists with the latest flatbuffers
release, an (admittedly rather tedious) option would be to run a git
bisect on the flatbuffers code to identify the culprit commit.
With some luck, the root cause might be obvious if a single
culprit commit can be isolated (perhaps some subtle C++
undefined behaviour is being triggered? It is also a bit mysterious
that it only hits static builds). Failing that, you could raise the
issue with the upstream flatbuffers project and ask for their expertise.</p>
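<p>In case it helps, here is a toy sketch of how <b>git bisect run</b>
automates that kind of search. It builds a throwaway repository in a
temporary directory, so everything in it (the <b>find_first_bad</b>
function, the <b>value.txt</b> and <b>counter.txt</b> files, the commit
messages) is hypothetical and has nothing to do with the real
flatbuffers history. In the real case the test command would be a
script of your own that rebuilds the static library and runs the
failing program. It should exit with 0 for a good commit, with 125 for
a commit that cannot be tested, and with any other code between 1 and
127 for a bad one.</p>
<pre>
```shell
# Toy demonstration of `git bisect run` on a throwaway repository.
# All file names and commit messages here are made up for illustration.
find_first_bad() (
    set -eu
    repo=$(mktemp -d)
    cd "$repo"
    git init -q .
    git config user.email "bisect@example.invalid"
    git config user.name "Bisect Demo"

    # Create 8 commits; commit 5 introduces the "bug" (value.txt turns bad).
    i=1
    while [ "$i" -le 8 ]; do
        if [ "$i" -ge 5 ]; then
            echo bad > value.txt
        else
            echo good > value.txt
        fi
        echo "$i" > counter.txt
        git add value.txt counter.txt
        git commit -q -m "commit $i"
        i=$((i + 1))
    done

    good=$(git rev-list --max-parents=0 HEAD)  # first commit: known good
    bad=$(git rev-parse HEAD)                  # last commit: known bad

    # Start bisecting, then let git drive the test command at each step.
    # The test command exits 0 when the commit is good, 1 when it is bad.
    git bisect start "$bad" "$good" >/dev/null
    git bisect run sh -c 'grep -q good value.txt' >/dev/null

    # refs/bisect/bad now points at the first bad commit.
    git log -1 --format=%s refs/bisect/bad

    git bisect reset >/dev/null
    cd /
    rm -rf "$repo"
)

find_first_bad   # prints: commit 5
```
</pre>
<p>The function body runs in a subshell, so the temporary directory and
the <b>set -eu</b> options do not leak into the calling shell.</p>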
<p>Even<br>
</p>
<div class="moz-cite-prefix">On 23/02/2024 at 23:54, Simon Eves via
gdal-dev wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAJf0KTSrA_Pod7EK3Cd4SYDZiHqdtXGUadWSGEEcx_G_NZn9eA@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">I was able to create a fork of 3.7.3 with just the <b>flatbuffers</b>
replaced with the pre-3.6.x version (2.0.0).
<div><br>
</div>
<div>This seemed to only require changes to the version asserts
and adding an <b>align</b> parameter to <b>Table::VerifyField()</b> to
match the newer API.
<div><br>
</div>
<div><a
href="https://github.com/heavyai/gdal/tree/simon.eves/release/3.7/downgrade_to_flatbuffers_2.0.0"
moz-do-not-send="true" class="moz-txt-link-freetext">https://github.com/heavyai/gdal/tree/simon.eves/release/3.7/downgrade_to_flatbuffers_2.0.0</a><br>
</div>
<div><br>
</div>
<div>Our system works correctly and passes all GDAL I/O tests
with that version. Obviously this isn't an ideal solution,
but this is otherwise a release blocker for us.</div>
<div><br>
</div>
<div>I would still very much like to discuss the original
problem more deeply, and hopefully come up with a better
solution.</div>
<div><br>
</div>
<div>Yours hopefully,</div>
<div><br>
</div>
<div>Simon</div>
<div><br>
</div>
<div><br>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Feb 22, 2024 at
10:22 PM Simon Eves &lt;<a href="mailto:simon.eves@heavy.ai"
moz-do-not-send="true" class="moz-txt-link-freetext">simon.eves@heavy.ai</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">Thank you, Robert, for the RR tip. I shall try
it.
<div><br>
</div>
<div>I have new findings to report, however.</div>
<div><br>
</div>
<div>First of all, I confirmed that a build against GDAL
3.4.1 (the version we were on before) still works. I also
confirmed that builds against 3.7.3 and 3.8.4 still failed
even with no additional library dependencies (just sqlite3
and proj), in case it was a side-effect of us also adding
more of those. I then tried 3.5.3, with the CMake build
(same config as we use for 3.7.3) and that worked. I then
tried 3.6.4 (again, same CMake config) and that failed.
These were all from bundles.</div>
<div><br>
</div>
<div>I then started delving through the GDAL repo itself. I
found the common root commit of 3.5.3 and 3.6.4, and all
the commits in the <b>ogr/ogrsf_frmts/flatgeobuf</b> sub-project
between that one and the final of each. For 3.5.3, this
was only two. I built and tested both, and they were fine.
I then tried the very first one that was new in the 3.6.4
chain (not in the history of 3.5.3), which was actually a
bulk update to the <b>flatbuffers</b> sub-library,
committed by Bjorn Harrtell on May 8 2022 (SHA f7d8876).
That one had the issue. I then tried the
immediately-preceding commit (an unrelated docs change)
and that one was fine.</div>
<div><br>
</div>
<div>My current hypothesis, therefore, is that the <b>flatbuffers</b>
update introduced the issue, or at least the
susceptibility to it.</div>
<div><br>
</div>
<div>I still cannot explain why it only occurs in an
all-static build, and I am even less able to explain why it
only occurs in our full system and not with the simple
test program against the very same static lib build that
does the very same sequence of GDAL API calls, but I
repeated the build tests of the commits either side and a
few other random ones a bit further away in each
direction, and the results were consistent. Again, it
happens with both GCC 11 and Clang 14 builds, Debug or
Release.<br>
</div>
<div><br>
</div>
<div>I will continue tomorrow to look at the actual changes
to <b>flatbuffers</b> in that update, although they are
quite significant. Certainly, the <b>vector_downward</b>
class, which is directly involved, was a new file in that
update (although on inspection of that file's history in
the <b>google/flatbuffers</b> repo, it seems it was just
split out of another header).</div>
<div><br>
</div>
<div>Bjorn, I don't mean to call you out directly, but I am
CC'ing you to ensure you see this, as you appear to be a
significant contributor to the <b>flatbuffers</b> project
itself. Any insight you may have would be very welcome. I
am of course happy to describe my debugging findings in
more detail, privately if you wish, rather than spamming
the list.</div>
<div><br>
</div>
<div>Simon</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, Feb 20, 2024 at
1:49 PM Robert Coup &lt;<a
href="mailto:robert.coup@koordinates.com"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">robert.coup@koordinates.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">Hi,</div>
<div dir="ltr"><br>
</div>
<div dir="ltr">On Tue, 20 Feb 2024 at 21:44, Robert Coup
&lt;<a href="mailto:robert.coup@koordinates.com"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">robert.coup@koordinates.com</a>&gt;
wrote:<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>Hi Simon,</div>
<div><br>
</div>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, 20 Feb
2024 at 21:11, Simon Eves &lt;<a
href="mailto:simon.eves@heavy.ai"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">simon.eves@heavy.ai</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">Here's the stack trace for the
original assert. Something is stepping on
scratch_ to make it 0x1000000000 instead of
null, which it starts out as when the
flatbuffer object is created, but by the
time it gets to allocating memory, it's
broken.</div>
</blockquote>
<div><br>
</div>
What happens if you set a watchpoint in gdb
when the flatbuffer is created?
<div><br>
</div>
<div><span style="color:rgb(0,0,0)"><font
face="monospace">watch -l myfb->scratch</font></span></div>
<div><span style="color:rgb(0,0,0)">or </span><span
style="color:rgb(0,0,0);font-family:monospace">watch *0x1234c0ffee</span></div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div dir="ltr">Or I've also had success with Mozilla's
rr: <a href="https://rr-project.org/"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">https://rr-project.org/</a>
— you can run to a point where scratch is wrong, set
a watchpoint on it, and then run the program
backwards to find out what touched it.</div>
<div dir="ltr"><br>
</div>
<div>Rob :)</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br>
</blockquote>
<pre class="moz-signature" cols="72">--
<a class="moz-txt-link-freetext" href="http://www.spatialys.com">http://www.spatialys.com</a>
My software is free, but my time generally not.</pre>
</body>
</html>