[gdal-dev] core dump on dir info

Even Rouault even.rouault at spatialys.com
Sun Feb 4 03:55:21 PST 2024


OK, so I believe this is the AVX2 issue I was talking about: I realize 
that AVX2 is enabled by default when TileDB is built from source (which 
the Docker image does), and must be explicitly disabled with 
"./bootstrap --disable-avx2". I've just changed the build recipe to 
include that; it will take effect the next time the images are refreshed.

To confirm, can you send or just check the output of:

cat /proc/cpuinfo|grep sse|head -n 1

If there is no "avx2" in it, that is almost certainly (99.9%) the cause 
of the issue.
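
The same check can be scripted. This is only a minimal sketch, assuming 
a Linux host where /proc/cpuinfo exposes a "flags" line; has_cpu_flag is 
a hypothetical helper name, not part of GDAL or TileDB. If AVX2 is 
missing, it applies the GDAL_SKIP=TileDB workaround mentioned earlier in 
this thread for the current shell.

```shell
#!/bin/sh
# Succeeds if the named CPU flag is listed in a cpuinfo-style file.
# $1: flag name; $2: file to read (defaults to /proc/cpuinfo, Linux only).
# has_cpu_flag is a hypothetical helper name used for illustration.
has_cpu_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null | grep -qw "$1"
}

if has_cpu_flag avx2; then
    echo "avx2 present: the SIGILL probably has another cause"
else
    echo "avx2 missing: skipping the TileDB driver for this shell"
    # Workaround from this thread: stop GDAL from loading the TileDB driver.
    export GDAL_SKIP=TileDB
fi
```

Run it with "." (source) rather than as a child process if the exported 
GDAL_SKIP should persist in the interactive shell.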

Even

Le 04/02/2024 à 06:20, Michael Sumner a écrit :
> skipping TileDB does fix:
>
> ogr2ogr /tmp/newdir 
> https://github.com/SymbolixAU/geojsonsf/raw/master/inst/examples/geo_melbourne.geojson -f 
> "ESRI Shapefile"
> export GDAL_SKIP=TileDB
> ogrinfo /tmp/newdir/
> INFO: Open of `/tmp/newdir/'
>       using driver `ESRI Shapefile' successful.
> 1: geo_melbourne (Polygon)
>
> unset GDAL_SKIP
> ogrinfo /tmp/newdir/
> Illegal instruction (core dumped)
>
> I failed to explain that I'm using gdal containers from the repo:
>
> docker run --rm -ti ghcr.io/osgeo/gdal:ubuntu-full-latest
>
> apt update
> apt install -y gdb
>
> Here's the output under gdb as you suggested; there was a lot, so I 
> put it in a gist: 
> https://gist.github.com/mdsumner/839ae6e05ededf640b65bfee3a20a4c0
>
> gdb --args ogrinfo /tmp/newdir/
> > run
> > thread apply all bt
>
> Thanks!
>
>
>
>
>
> On Sat, Feb 3, 2024 at 7:49 PM Even Rouault 
> <even.rouault at spatialys.com> wrote:
>
>     - When it crashes under gdb, type "thread apply all bt" to get the
>     stack trace of all threads
>
>     - I suspect there is a connection with
>     https://github.com/OSGeo/gdal/pull/9170 , but that pull request
>     wouldn't help here as "/tmp/newdir" could be a valid connection to
>     TileDB
>
>     - how did you get TileDB installed? It looks to be packaged? Which
>     distribution do you use?
>
>     - SIGILL reminds me of issues with some TileDB builds using the
>     AVX2 instruction set by default, which can crash on host CPUs
>     that don't have AVX2 (unlikely on recent hardware though)
>
>     - Setting GDAL_SKIP=TileDB should be a workaround
>
>
>     Le 03/02/2024 à 07:15, Michael Sumner a écrit :
>>     Thanks Even, so there's something about TileDB under gdb (or
>>     maybe I am mangling the context; I will try variants of the host
>>     I'm using). Run with valgrind included below.
>>
>>     gdb --args ogrinfo /tmp/newdir/
>>     ...
>>     (gdb) run
>>     Starting program: /usr/local/bin/ogrinfo /tmp/newdir/
>>     [Thread debugging using libthread_db enabled]
>>     Using host libthread_db library
>>     "/lib/x86_64-linux-gnu/libthread_db.so.1".
>>     [New Thread 0x7fffe7757640 (LWP 988)]
>>     [New Thread 0x7fffe6f56640 (LWP 989)]
>>     [New Thread 0x7fffde755640 (LWP 990)]
>>     [New Thread 0x7fffd5f54640 (LWP 991)]
>>     [New Thread 0x7fffc5753640 (LWP 992)]
>>     [New Thread 0x7fffc4f52640 (LWP 993)]
>>     [New Thread 0x7fffb4751640 (LWP 994)]
>>     [New Thread 0x7fffabf50640 (LWP 995)]
>>     [New Thread 0x7fffab74f640 (LWP 996)]
>>     [New Thread 0x7fffa2f4e640 (LWP 997)]
>>     [New Thread 0x7fff9a74d640 (LWP 998)]
>>     [New Thread 0x7fff91f4c640 (LWP 999)]
>>     [New Thread 0x7fff8974b640 (LWP 1000)]
>>     [New Thread 0x7fff78f4a640 (LWP 1001)]
>>     [New Thread 0x7fff78749640 (LWP 1002)]
>>     [New Thread 0x7fff6f5ff640 (LWP 1003)]
>>
>>     Thread 1 "ogrinfo" received signal SIGILL, Illegal instruction.
>>     0x00007ffff3773c9e in
>>     tiledb::common::ThreadPool::ThreadPool(unsigned long) () from
>>     /lib/x86_64-linux-gnu/libtiledb.so.2.16
>>
>>
>>
>>
>>     valgrind -s ogrinfo /tmp/newdir
>>     ==704== Memcheck, a memory error detector
>>     ==704== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward
>>     et al.
>>     ==704== Using Valgrind-3.18.1 and LibVEX; rerun with -h for
>>     copyright info
>>     ==704== Command: ogrinfo /tmp/newdir
>>     ==704==
>>     INFO: Open of `/tmp/newdir'
>>           using driver `ESRI Shapefile' successful.
>>     1: geo_melbourne (Polygon)
>>     ==704==
>>     ==704== HEAP SUMMARY:
>>     ==704==     in use at exit: 25,486 bytes in 216 blocks
>>     ==704==   total heap usage: 15,761 allocs, 15,545 frees,
>>     2,390,169 bytes allocated
>>     ==704==
>>     ==704== LEAK SUMMARY:
>>     ==704==    definitely lost: 0 bytes in 0 blocks
>>     ==704==    indirectly lost: 0 bytes in 0 blocks
>>     ==704==      possibly lost: 544 bytes in 1 blocks
>>     ==704==    still reachable: 24,942 bytes in 215 blocks
>>     ==704==         suppressed: 0 bytes in 0 blocks
>>     ==704== Rerun with --leak-check=full to see details of leaked memory
>>     ==704==
>>     ==704== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0
>>     from 0)
>>
>>
>>     ogrinfo /tmp/newdir
>>     Illegal instruction (core dumped)
>>
>>     Cheers, Mike
>>
>>
>>
>>
>>     On Sat, Feb 3, 2024 at 12:46 PM Even Rouault
>>     <even.rouault at spatialys.com> wrote:
>>
>>         Michael,
>>
>>         I'm wondering if there might be something wrong with your
>>         build or runtime environment. Or there's something subtle,
>>         because that works fine for me with my dev build or in the
>>         ghcr.io/osgeo/gdal:alpine-normal-3.8.3 Docker image
>>
>>         Try running "valgrind ogrinfo /tmp/newdir/" or "gdb --args
>>         ogrinfo /tmp/newdir/" (type "run") to get more useful information
>>
>>         Even
>>
>>         Le 03/02/2024 à 02:35, Michael Sumner via gdal-dev a écrit :
>>>         I'm getting Illegal instruction / core dumped on ogrinfo of
>>>         a directory:
>>>
>>>         ogr2ogr /tmp/newdir
>>>         https://github.com/SymbolixAU/geojsonsf/raw/master/inst/examples/geo_melbourne.geojson
>>>         -f "ESRI Shapefile"
>>>
>>>         ogrinfo /tmp/newdir/
>>>         Illegal instruction (core dumped)
>>>
>>>         I've worked back through some docker images and it wasn't a
>>>         problem in 3.6.0, but I'm getting it since 3.7.0 - or I'm
>>>         doing something wrong entirely.
>>>
>>>         Cheers, Mike
>>>
>>>
>>>         -- 
>>>         Michael Sumner
>>>         Software and Database Engineer
>>>         Australian Antarctic Division
>>>         Hobart, Australia
>>>         e-mail: mdsumner at gmail.com
>>>
>>>         _______________________________________________
>>>         gdal-dev mailing list
>>>         gdal-dev at lists.osgeo.org
>>>         https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>
>>         -- 
>>         http://www.spatialys.com
>>         My software is free, but my time generally not.
>>
>>
>>
>>     -- 
>>     Michael Sumner
>>     Software and Database Engineer
>>     Australian Antarctic Division
>>     Hobart, Australia
>>     e-mail: mdsumner at gmail.com
>
>     -- 
>     http://www.spatialys.com
>     My software is free, but my time generally not.
>
>
>
> -- 
> Michael Sumner
> Software and Database Engineer
> Australian Antarctic Division
> Hobart, Australia
> e-mail: mdsumner at gmail.com

-- 
http://www.spatialys.com
My software is free, but my time generally not.