[gdal-dev] gdalinfo on large vrt takes a long time
William Kyngesburye
woklist at kyngchaos.com
Wed Jul 19 06:47:58 PDT 2023
macOS has lldb rather than gdb. Once I figured that out, I got:
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x00007ff81eefdeb2 libsystem_kernel.dylib`stat$INODE64 + 10
frame #1: 0x000000010140c669 GDAL`VSIStatExL + 96
frame #2: 0x0000000101699994 GDAL`VRTSimpleSource::GetFileList(char***, int*, int*, _CPLHashSet*) + 118
frame #3: 0x000000010169709b GDAL`VRTSourcedRasterBand::GetFileList(char***, int*, int*, _CPLHashSet*) + 71
frame #4: 0x00000001016a603c GDAL`VRTDataset::GetFileList() + 114
frame #5: 0x0000000101ee40df GDAL`GDALInfo + 543
frame #6: 0x0000000100003911 gdalinfo`main + 914
frame #7: 0x000000010001552e dyld`start + 462
It looks like just getting the file list touches the files to make sure they exist?
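
For the basic info that lives in the VRT file itself, one way to sidestep the per-file stat() entirely might be to read the XML directly instead of going through gdalinfo. Below is a rough sketch only, not how GDAL does it: it assumes a plain VRT whose sources carry SourceFilename elements, and the function name vrt_basic_info and the sample VRT text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-tile VRT, used only to exercise the sketch below.
SAMPLE_VRT = """<VRTDataset rasterXSize="512" rasterYSize="256">
  <GeoTransform>440720.0, 60.0, 0.0, 3751320.0, 0.0, -60.0</GeoTransform>
  <VRTRasterBand dataType="Byte" band="1">
    <SimpleSource>
      <SourceFilename relativeToVRT="1">tiles/a.tif</SourceFilename>
    </SimpleSource>
    <SimpleSource>
      <SourceFilename relativeToVRT="1">tiles/b.tif</SourceFilename>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>"""


def vrt_basic_info(vrt_xml):
    """Return size, bands, geotransform and source names from VRT XML text.

    Only the XML is parsed; the source files are never opened or stat()ed.
    """
    root = ET.fromstring(vrt_xml)
    bands = root.findall("VRTRasterBand")

    # Collect source filenames exactly as written in the VRT (they may be
    # relative to the VRT), de-duplicated while keeping first-seen order.
    names = ((e.text or "").strip() for e in root.iter("SourceFilename"))
    sources = list(dict.fromkeys(n for n in names if n))

    # GeoTransform is optional; an absent element yields an empty list.
    gt_text = root.findtext("GeoTransform", "")
    geotransform = [float(v) for v in gt_text.split(",") if v.strip()]

    return {
        "width": int(root.get("rasterXSize")),
        "height": int(root.get("rasterYSize")),
        "band_count": len(bands),
        "data_types": [b.get("dataType") for b in bands],
        "geotransform": geotransform,
        "sources": sources,
    }


info = vrt_basic_info(SAMPLE_VRT)
print(info["width"], info["height"], info["band_count"], len(info["sources"]))
```

For a real file, the text would come from reading the .vrt itself, which is a single file read over the VPN no matter how many sources it lists. This skips anything GDAL derives by opening the sources, so it only fits when the needed values are written in the VRT.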
-----
William Kyngesburye
<kyngchaos*at*kyngchaos*dot*com>
<https://www.kyngchaos.com>
Don't Panic
> On Jul 18, 2023, at 5:24 PM, Even Rouault <even.rouault at spatialys.com> wrote:
>
> Can you run it under gdb, break when it takes a long time, and display the stack trace?
>
> Something along these lines (best with a debug build):
>
> gdb --args gdalinfo your.vrt
>
> run
>
> hit ctrl-c when it takes some time
>
> bt
>
>
>> On 19/07/2023 at 00:17, William Kyngesburye wrote:
>> (sorry, my email sorting rules missed your reply somehow, just found it)
>>
>> The env var didn't help, and the vrt does have statistics.
>>
>> Pathnames in the vrt are relative to the vrt, if that might be a problem in this situation.
>>
>> I turned on CPL_DEBUG. gdalinfo is doing whatever is taking time after GDALDefaultOverviews::OverviewScan(). The next output that appears when that's done is the list of files. The individual files do not have overviews, and the vrt has no overviews. When I add the vrt overviews (I disable them most of the time by renaming the over file because they're out of date), I get the overview scan message, a list of overviews with no delay, then another overview scan message that has the long processing time again, then the file list.
>>
>> -----
>> William Kyngesburye
>> <kyngchaos*at*kyngchaos*dot*com>
>> <https://www.kyngchaos.com>
>>
>> Don't Panic
>>
>>>> On Jun 8, 2023, at 6:18 PM, Even Rouault <even.rouault at spatialys.com> wrote:
>>> William,
>>>
>>> it might be related to the GetMinimum() call done by gdalinfo. Cf. https://trac.osgeo.org/gdal/ticket/5444
>>>
>>> But normally it should only try to open the first source, and not all of them. At least that's what I could confirm in a quick test. But I do see that the CanUseSourcesMinMaxImplementations() method will stat() sources whose filename looks like a local file (obviously, if that's a mounted file system / VPN thing, it will not realize it is remote).
>>>
>>> Try setting the VRT_MIN_MAX_FROM_SOURCES=NO environment variable / configuration option to see if that makes a difference. If it does, the CanUseSourcesMinMaxImplementations() logic should be modified to avoid doing those stat() calls.
>>>
>>> If that's confirmed to be linked to GetMinimum(), you may also work around the issue by doing a "gdalinfo -stats the.vrt" (from the server) to have statistics incorporated in the VRT; then GetMinimum() should be instant.
>>>
>>> Even
>>>
>>> On 09/06/2023 at 00:43, William Kyngesburye wrote:
>>>> I'm writing a script that needs some info from a vrt raster, and one has thousands of files. When reading the vrt over the internet (vpn to our server) it takes a long time. I think it's looking at every file of the vrt. What is gdal reading from the files that's not in the vrt itself? I used all the -no options and I'm not adding any other info options like checksums or stats. I just need basic info from the vrt that's in the vrt file.
>>>>
>>>> -----
>>>> William Kyngesburye
>>>> <kyngchaos*at*kyngchaos*dot*com>
>>>> <https://www.kyngchaos.com>
>>>>
>>>> Don't Panic
>>>> _______________________________________________
>>>> gdal-dev mailing list
>>>> gdal-dev at lists.osgeo.org
>>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>> --
>>> http://www.spatialys.com
>>> My software is free, but my time generally not.
>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.
>