[gdal-dev] core dump on dir info

Michael Sumner mdsumner at gmail.com
Mon Feb 5 12:54:28 PST 2024


Yes, a jammy VM on OpenStack is the host (and is where I run pretty much
everything, though I will increasingly use AWS).

Thanks for the note, I'll try on other systems too. We need a
security-allow set for vsicurl to work so if there are other little details
I'll be keen to flush them out.

Cheers, Mike

On Mon, 5 Feb 2024, 18:44 Javier Jimenez Shaw, <j1 at jimenezshaw.com> wrote:

> Hi Mike
>
> Out of curiosity, are you running it in a virtual machine?
> A few years ago I had problems running a program in a virtual machine
> (VirtualBox, but I read it happens in others) due to a missing SSE
> instruction. The solution there was to "enable" the missing instructions in
> the virtual machine configuration (I don't know why that was not the
> default).
>
> Cheers
>
>
> On Mon, 5 Feb 2024, 02:04 Even Rouault via gdal-dev, <
> gdal-dev at lists.osgeo.org> wrote:
>
>> ghcr.io/osgeo/gdal:ubuntu-full-latest has been regenerated with the
>> rebuild of TileDB without AVX2. I've also enabled the
>> drivers-with-external-dependencies-built-as-plugin GDAL build mode, so it is
>> easy to just remove a given plugin by deleting the corresponding .so in
>> /usr/lib/x86_64-linux-gnu/gdalplugins
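A minimal sketch of the plugin removal described above. The helper name `remove_gdal_plugin`, the `DELETE` switch, and the assumption that the plugin file name contains the driver name (something like `gdal_TileDB.so`) are illustrative and not confirmed in this thread, so the function only lists matches unless `DELETE=1` is set:

```shell
# Hypothetical helper, not a GDAL command.
# remove_gdal_plugin DRIVER [PLUGIN_DIR]
# Lists plugin files whose name contains DRIVER; deletes them only if DELETE=1.
# Returns non-zero when nothing matched.
remove_gdal_plugin() {
    driver=$1
    plugin_dir=${2:-/usr/lib/x86_64-linux-gnu/gdalplugins}
    found=1
    for f in "$plugin_dir"/*"$driver"*.so; do
        [ -e "$f" ] || continue        # glob matched nothing
        found=0
        if [ "${DELETE:-0}" = 1 ]; then
            rm -- "$f" && echo "removed $f"
        else
            echo "would remove $f"
        fi
    done
    return "$found"
}

# Usage (inside the container):
#   remove_gdal_plugin TileDB             # dry run, shows what matches
#   DELETE=1 remove_gdal_plugin TileDB    # actually delete
```

Running the dry run first shows the real file name before anything is deleted.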
>>
>> Even
>> On 04/02/2024 at 22:51, Michael Sumner wrote:
>>
>> indeed there's no avx2:
>>
>> cat /proc/cpuinfo|grep sse|head -n 1
>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
>> mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt
>> pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni
>> pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes
>> xsave avx f16c hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a
>> misalignsse 3dnowprefetch osvw xop fma4 tbm perfctr_core ssbd ibpb vmmcall
>> tsc_adjust bmi1 virt_ssbd arat npt nrip_save arch_capabilities
>>
>> Cheers, Mike
>>
>>
>>
>> On Sun, Feb 4, 2024 at 10:55 PM Even Rouault <even.rouault at spatialys.com>
>> wrote:
>>
>>> ok, so I believe this is the AVX2 issue I was talking about, as I
>>> realize that enabling AVX2 is the default mode when TileDB is built from
>>> source (which the Docker image does), and must be explicitly disabled with
>>> "./bootstrap --disable-avx2" (I've just changed the build recipe to include
>>> that, will take effect next time the images are refreshed)
>>>
>>> To confirm, can you send or just check the output of:
>>> cat /proc/cpuinfo|grep sse|head -n 1
>>>
>>> If there is no "avx2" in it, this is 99.9% likely the cause of the issue.
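The check above can also be scripted. This is a sketch, assuming a Linux x86 host where `/proc/cpuinfo` has a `flags` line; the function name `has_cpu_flag` is made up for illustration. It matches whole words only, because a plain substring search for `avx` would also match `avx2`:

```shell
# has_cpu_flag FLAG [FILE]
# Succeeds if FLAG appears as a whole word on the first "flags" line of FILE
# (default /proc/cpuinfo). The optional FILE argument is only there so the
# function can be tried against a saved copy of the output.
has_cpu_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}

if has_cpu_flag avx2; then
    echo "avx2 present"
else
    echo "avx2 missing: binaries built with AVX2 may die with SIGILL"
fi
```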
>>>
>>> Even
>>> On 04/02/2024 at 06:20, Michael Sumner wrote:
>>>
>>> skipping TileDB does fix:
>>>
>>> ogr2ogr /tmp/newdir
>>> https://github.com/SymbolixAU/geojsonsf/raw/master/inst/examples/geo_melbourne.geojson
>>> -f "ESRI Shapefile"
>>> export GDAL_SKIP=TileDB
>>> ogrinfo /tmp/newdir/
>>> INFO: Open of `/tmp/newdir/'
>>>       using driver `ESRI Shapefile' successful.
>>> 1: geo_melbourne (Polygon)
>>>
>>> unset GDAL_SKIP
>>> ogrinfo /tmp/newdir/
>>> Illegal instruction (core dumped)
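The export/unset pair above changes the whole shell session. A small sketch of scoping the workaround to a single command instead; the wrapper name `skip_driver` is hypothetical, while `GDAL_SKIP` is the variable used in the transcript:

```shell
# skip_driver DRIVER COMMAND [ARGS...]
# Runs COMMAND with GDAL_SKIP=DRIVER set only in that process's environment,
# so the calling shell is left untouched.
skip_driver() {
    driver=$1
    shift
    env GDAL_SKIP="$driver" "$@"
}

# Usage:
#   skip_driver TileDB ogrinfo /tmp/newdir/
```

Using `env` rather than `export` means there is nothing to unset afterwards.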
>>>
>>> I failed to explain that I'm using gdal containers from the repo:
>>>
>>> docker run --rm -ti ghcr.io/osgeo/gdal:ubuntu-full-latest
>>>
>>> apt update
>>> apt install -y gdb
>>>
>>> Here's the output under gdb, as you suggested; there was a lot, so I
>>> put it in a gist:
>>> https://gist.github.com/mdsumner/839ae6e05ededf640b65bfee3a20a4c0
>>>
>>> gdb --args ogrinfo /tmp/newdir/
>>> > run
>>> > thread apply all bt
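The two interactive gdb steps above can be run in one go with gdb's `-batch` and `-ex` options. A sketch: the function name `bt_all` and the `GDB` override variable are illustrative assumptions (the override only exists so the command line can be inspected without gdb installed):

```shell
# bt_all COMMAND [ARGS...]
# Runs COMMAND under gdb non-interactively; when the program stops on a
# signal such as SIGILL, prints the backtrace of every thread, then exits.
bt_all() {
    ${GDB:-gdb} -batch -ex run -ex 'thread apply all bt' --args "$@"
}

# Usage:
#   bt_all ogrinfo /tmp/newdir/ > backtrace.txt 2>&1
```

Redirecting to a file makes it easy to attach long output to a gist or bug report.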
>>>
>>> Thanks!
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Feb 3, 2024 at 7:49 PM Even Rouault <even.rouault at spatialys.com>
>>> wrote:
>>>
>>>> - When it crashes under gdb, type "thread apply all bt" to get the
>>>> stack trace of all threads
>>>>
>>>> - I suspect there is a connection with
>>>> https://github.com/OSGeo/gdal/pull/9170 , but that pull request
>>>> wouldn't help here as "/tmp/newdir" could be a valid connection to TileDB
>>>>
>>>> - how did you get TileDB installed? It looks to be packaged? Which
>>>> distribution do you use?
>>>>
>>>> - SIGILL reminds me of issues with some TileDB builds using the AVX2
>>>> instruction set by default, which can crash on host CPUs that
>>>> don't have AVX2 (unlikely on recent hardware though)
>>>>
>>>> - Setting GDAL_SKIP=TileDB should be a workaround
>>>>
>>>>
>>>> On 03/02/2024 at 07:15, Michael Sumner wrote:
>>>>
>>>> Thanks Even, so there's something about TileDB under gdb (or maybe I am
>>>> mangling the context; I will try variants of the host I'm using). A run
>>>> with valgrind is included below.
>>>>
>>>> gdb --args ogrinfo /tmp/newdir/
>>>> ...
>>>> (gdb) run
>>>> Starting program: /usr/local/bin/ogrinfo /tmp/newdir/
>>>> [Thread debugging using libthread_db enabled]
>>>> Using host libthread_db library
>>>> "/lib/x86_64-linux-gnu/libthread_db.so.1".
>>>> [New Thread 0x7fffe7757640 (LWP 988)]
>>>> [New Thread 0x7fffe6f56640 (LWP 989)]
>>>> [New Thread 0x7fffde755640 (LWP 990)]
>>>> [New Thread 0x7fffd5f54640 (LWP 991)]
>>>> [New Thread 0x7fffc5753640 (LWP 992)]
>>>> [New Thread 0x7fffc4f52640 (LWP 993)]
>>>> [New Thread 0x7fffb4751640 (LWP 994)]
>>>> [New Thread 0x7fffabf50640 (LWP 995)]
>>>> [New Thread 0x7fffab74f640 (LWP 996)]
>>>> [New Thread 0x7fffa2f4e640 (LWP 997)]
>>>> [New Thread 0x7fff9a74d640 (LWP 998)]
>>>> [New Thread 0x7fff91f4c640 (LWP 999)]
>>>> [New Thread 0x7fff8974b640 (LWP 1000)]
>>>> [New Thread 0x7fff78f4a640 (LWP 1001)]
>>>> [New Thread 0x7fff78749640 (LWP 1002)]
>>>> [New Thread 0x7fff6f5ff640 (LWP 1003)]
>>>>
>>>> Thread 1 "ogrinfo" received signal SIGILL, Illegal instruction.
>>>> 0x00007ffff3773c9e in tiledb::common::ThreadPool::ThreadPool(unsigned
>>>> long) () from /lib/x86_64-linux-gnu/libtiledb.so.2.16
>>>>
>>>>
>>>>
>>>>
>>>> valgrind -s ogrinfo /tmp/newdir
>>>> ==704== Memcheck, a memory error detector
>>>> ==704== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
>>>> ==704== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright
>>>> info
>>>> ==704== Command: ogrinfo /tmp/newdir
>>>> ==704==
>>>> INFO: Open of `/tmp/newdir'
>>>>       using driver `ESRI Shapefile' successful.
>>>> 1: geo_melbourne (Polygon)
>>>> ==704==
>>>> ==704== HEAP SUMMARY:
>>>> ==704==     in use at exit: 25,486 bytes in 216 blocks
>>>> ==704==   total heap usage: 15,761 allocs, 15,545 frees, 2,390,169
>>>> bytes allocated
>>>> ==704==
>>>> ==704== LEAK SUMMARY:
>>>> ==704==    definitely lost: 0 bytes in 0 blocks
>>>> ==704==    indirectly lost: 0 bytes in 0 blocks
>>>> ==704==      possibly lost: 544 bytes in 1 blocks
>>>> ==704==    still reachable: 24,942 bytes in 215 blocks
>>>> ==704==         suppressed: 0 bytes in 0 blocks
>>>> ==704== Rerun with --leak-check=full to see details of leaked memory
>>>> ==704==
>>>> ==704== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
>>>>
>>>>
>>>> ogrinfo /tmp/newdir
>>>> Illegal instruction (core dumped)
>>>>
>>>> Cheers, Mike
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Feb 3, 2024 at 12:46 PM Even Rouault <
>>>> even.rouault at spatialys.com> wrote:
>>>>
>>>>> Michael,
>>>>>
>>>>> I'm wondering if there might be something wrong with your build or
>>>>> runtime environment. Or there's something subtle, because that works fine
>>>>> for me with my dev build or in the
>>>>> ghcr.io/osgeo/gdal:alpine-normal-3.8.3 Docker image
>>>>>
>>>>> Try running "valgrind ogrinfo /tmp/newdir/" or "gdb --args ogrinfo
>>>>> /tmp/newdir/" (type "run") to get more useful information
>>>>>
>>>>> Even
>>>>> On 03/02/2024 at 02:35, Michael Sumner via gdal-dev wrote:
>>>>>
>>>>> I'm getting Illegal instruction / core dumped on ogrinfo of a
>>>>> directory:
>>>>>
>>>>> ogr2ogr /tmp/newdir
>>>>> https://github.com/SymbolixAU/geojsonsf/raw/master/inst/examples/geo_melbourne.geojson
>>>>> -f "ESRI Shapefile"
>>>>>
>>>>> ogrinfo /tmp/newdir/
>>>>> Illegal instruction (core dumped)
>>>>>
>>>>> I've worked back through some docker images and it wasn't a problem in
>>>>> 3.6.0, but I'm getting it since 3.7.0 - or I'm doing something wrong
>>>>> entirely.
>>>>>
>>>>> Cheers, Mike
>>>>>
>>>>>
>>>>> --
>>>>> Michael Sumner
>>>>> Software and Database Engineer
>>>>> Australian Antarctic Division
>>>>> Hobart, Australia
>>>>> e-mail: mdsumner at gmail.com
>>>>>
>>>>> _______________________________________________
>>>>> gdal-dev mailing list
>>>>> gdal-dev at lists.osgeo.org
>>>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>>>>
>>>>> -- http://www.spatialys.com
>>>>> My software is free, but my time generally not.
>>>>>
>>>>>
>>>>
>>>
>>
>