<div dir="ltr">I was able to create a fork of 3.7.3 with only <b>flatbuffers</b> replaced by the pre-3.6.x version (2.0.0).<div><br></div><div>This seemed to require only changes to the version asserts and the addition of an <b>align</b> parameter to <b>Table::VerifyField()</b> to match the newer API.<div><br></div><div><a href="https://github.com/heavyai/gdal/tree/simon.eves/release/3.7/downgrade_to_flatbuffers_2.0.0">https://github.com/heavyai/gdal/tree/simon.eves/release/3.7/downgrade_to_flatbuffers_2.0.0</a><br></div><div><br></div><div>Our system works correctly and passes all GDAL I/O tests with that version. Obviously this isn't an ideal solution, but this is otherwise a release blocker for us.</div><div><br></div><div>I would still very much like to discuss the original problem more deeply, and hopefully come up with a better solution.</div><div><br></div><div>Yours hopefully,</div><div><br></div><div>Simon</div><div><br></div><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Feb 22, 2024 at 10:22 PM Simon Eves &lt;<a href="mailto:simon.eves@heavy.ai">simon.eves@heavy.ai</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Thank you, Robert, for the RR tip. I shall try it.<div><br></div><div>I have new findings to report, however.</div><div><br></div><div>First of all, I confirmed that a build against GDAL 3.4.1 (the version we were on before) still works. I also confirmed that builds against 3.7.3 and 3.8.4 still failed even with no additional library dependencies (just sqlite3 and proj), in case the failure was a side effect of our also adding more of those. I then tried 3.5.3, with the CMake build (same config as we use for 3.7.3) and that worked. I then tried 3.6.4 (again, same CMake config) and that failed. These were all from bundles.</div><div><br></div><div>I then started delving through the GDAL repo itself. 
I found the common root commit of 3.5.3 and 3.6.4, and all the commits in the <b>ogr/ogrsf_frmts/flatgeobuf</b> sub-project between that one and the final commit of each. For 3.5.3, this was only two. I built and tested both, and they were fine. I then tried the very first one that was new in the 3.6.4 chain (not in the history of 3.5.3), which was actually a bulk update to the <b>flatbuffers</b> sub-library, committed by Bjorn Harrtell on May 8 2022 (SHA f7d8876). That one had the issue. I then tried the immediately-preceding commit (an unrelated docs change) and that one was fine.</div><div><br></div><div>My current hypothesis, therefore, is that the <b>flatbuffers</b> update introduced the issue, or at least the susceptibility to it.</div><div><br></div><div>I still cannot explain why it only occurs in an all-static build, and I am even less able to explain why it only occurs in our full system and not with the simple test program against the very same static lib build that does the very same sequence of GDAL API calls, but I repeated the build tests of the commits on either side and a few other random ones a bit further away in each direction, and the results were consistent. Again, it happens with both GCC 11 and Clang 14 builds, Debug or Release.<br></div><div><br></div><div>I will continue tomorrow to look at the actual changes to <b>flatbuffers</b> in that update, although they are quite significant. Certainly, the <b>vector_downward</b> class, which is directly involved, was a new file in that update (although on inspection of that file's history in the <b>google/flatbuffers</b> repo, it seems it was just split out of another header).</div><div><br></div><div>Bjorn, I don't mean to call you out directly, but I am CC'ing you to ensure you see this, as you appear to be a significant contributor to the <b>flatbuffers</b> project itself. Any insight you may have would be very welcome. 
I am of course happy to describe my debugging findings in more detail, privately if you wish, rather than spamming the list.</div><div><br></div><div>Simon</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Feb 20, 2024 at 1:49 PM Robert Coup &lt;<a href="mailto:robert.coup@koordinates.com" target="_blank">robert.coup@koordinates.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">Hi,</div><div dir="ltr"><br></div><div dir="ltr">On Tue, 20 Feb 2024 at 21:44, Robert Coup &lt;<a href="mailto:robert.coup@koordinates.com" target="_blank">robert.coup@koordinates.com</a>&gt; wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi Simon,</div><div><br></div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 20 Feb 2024 at 21:11, Simon Eves &lt;<a href="mailto:simon.eves@heavy.ai" target="_blank">simon.eves@heavy.ai</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Here's the stack trace for the original assert. 
Something is stepping on scratch_ to make it 0x1000000000 instead of null, which it starts out as when the flatbuffer object is created, but by the time it gets to allocating memory, it's broken.</div></blockquote><div><br></div>What happens if you set a watchpoint in gdb when the flatbuffer is created?<div><br></div><div><span style="color:rgb(0,0,0)"><font face="monospace">watch -l myfb->scratch</font></span></div><div><span style="color:rgb(0,0,0)">or </span><span style="color:rgb(0,0,0);font-family:monospace">watch *0x1234c0ffee</span></div></div></div></blockquote><div><br></div><div dir="ltr">Or I've also had success with Mozilla's rr: <a href="https://rr-project.org/" target="_blank">https://rr-project.org/</a> — you can run to a point where scratch is wrong, set a watchpoint on it, and then run the program backwards to find out what touched it.</div><div dir="ltr"><br></div><div>Rob :)</div></div></div>
</blockquote></div>
</blockquote></div>