[gdal-dev] gdalinfo on large vrt takes a long time

William Kyngesburye woklist at kyngchaos.com
Wed Jul 19 06:47:58 PDT 2023


macOS has lldb; once I figured that out, I got:

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007ff81eefdeb2 libsystem_kernel.dylib`stat$INODE64 + 10
    frame #1: 0x000000010140c669 GDAL`VSIStatExL + 96
    frame #2: 0x0000000101699994 GDAL`VRTSimpleSource::GetFileList(char***, int*, int*, _CPLHashSet*) + 118
    frame #3: 0x000000010169709b GDAL`VRTSourcedRasterBand::GetFileList(char***, int*, int*, _CPLHashSet*) + 71
    frame #4: 0x00000001016a603c GDAL`VRTDataset::GetFileList() + 114
    frame #5: 0x0000000101ee40df GDAL`GDALInfo + 543
    frame #6: 0x0000000100003911 gdalinfo`main + 914
    frame #7: 0x000000010001552e dyld`start + 462

It looks like just getting the file list touches the files to make sure they exist?
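The backtrace shows VRTDataset::GetFileList() reaching VSIStatExL through VRTSimpleSource::GetFileList(), which suggests one stat() per source. To estimate how much of the delay that could explain on a VPN mount, one could time a plain stat() of every source the VRT references. The Python sketch below approximates this outside GDAL and is not GDAL's own logic. It assumes sources appear in `SourceFilename` elements and that `relativeToVRT="1"` paths resolve against the VRT's directory. It does not handle /vsi* paths, and the helper names are hypothetical.

```python
import os
import sys
import time
import xml.etree.ElementTree as ET


def source_paths(vrt_path):
    """Return the source paths referenced by a VRT, resolved to on-disk paths."""
    vrt_dir = os.path.dirname(os.path.abspath(vrt_path))
    paths = []
    for elem in ET.parse(vrt_path).iter("SourceFilename"):
        name = (elem.text or "").strip()
        # Relative sources are assumed to be relative to the VRT's own directory.
        if elem.get("relativeToVRT") == "1":
            name = os.path.join(vrt_dir, name)
        paths.append(name)
    return paths


def time_stats(vrt_path):
    """stat() each distinct source once; return (count, missing, seconds)."""
    unique = list(dict.fromkeys(source_paths(vrt_path)))  # keep order, drop duplicates
    missing = 0
    start = time.perf_counter()
    for path in unique:
        try:
            os.stat(path)
        except OSError:
            missing += 1
    return len(unique), missing, time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    count, missing, elapsed = time_stats(sys.argv[1])
    print(f"{count} sources, {missing} missing, {elapsed:.2f} s spent in stat()")
```

If the reported time is close to the gdalinfo delay, the per-source stat() is the likely culprit.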

-----
William Kyngesburye
<kyngchaos*at*kyngchaos*dot*com>
<https://www.kyngchaos.com>

Don't Panic

> On Jul 18, 2023, at 5:24 PM, Even Rouault <even.rouault at spatialys.com> wrote:
> 
> Can you run it under gdb, break when it takes a long time, and display the stack trace?
> 
> Something along these lines (preferably with a debug build):
> 
> gdb --args gdalinfo your.vrt
> 
> run
> 
> hit Ctrl-C when it starts taking a long time
> 
> bt
> 
> 
>> On 19/07/2023 at 00:17, William Kyngesburye wrote:
>> (Sorry, my email sorting rules somehow missed your reply; I just found it.)
>> 
>> The env var didn't help, and the vrt does have statistics.
>> 
>> Pathnames in the vrt are relative to the vrt, in case that matters here.
>> 
>> I turned on CPL_DEBUG. Whatever is taking the time happens after GDALDefaultOverviews::OverviewScan(): the next output that appears once it finishes is the list of files. The individual files do not have overviews, and the vrt has no overviews. When I enable the vrt overviews (most of the time I disable them by renaming the .ovr file, because they're out of date), I get the overview scan message, a list of overviews with no delay, then another overview scan message followed by the same long delay, then the file list.
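With CPL_DEBUG on, timestamping each debug line makes it easier to see exactly where the time goes between OverviewScan() and the end of the run. A minimal Python sketch (the helper name is hypothetical): it runs a command, records when each stderr line arrives, and appends the exit time. The command's stdout, i.e. the gdalinfo report itself, is discarded, so the final entry shows how long the run continued after the last debug message.

```python
import os
import subprocess
import time


def timed_stderr(cmd, extra_env=None):
    """Run cmd; return [(seconds_since_start, stderr_line), ...] plus an exit entry."""
    env = dict(os.environ, **(extra_env or {}))
    start = time.perf_counter()
    proc = subprocess.Popen(cmd, env=env, stdout=subprocess.DEVNULL,
                            stderr=subprocess.PIPE, text=True)
    lines = []
    for line in proc.stderr:  # lines are read as the child writes them
        lines.append((time.perf_counter() - start, line.rstrip("\n")))
    rc = proc.wait()
    lines.append((time.perf_counter() - start, f"<exit status {rc}>"))
    return lines
```

Usage, assuming gdalinfo is on PATH: `for t, line in timed_stderr(["gdalinfo", "your.vrt"], {"CPL_DEBUG": "ON"}): print(f"{t:8.2f}s  {line}")`.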
>> 
>>>> On Jun 8, 2023, at 6:18 PM, Even Rouault <even.rouault at spatialys.com> wrote:
>>> William,
>>> 
>>> it might perhaps be related to the GetMinimum() call made by gdalinfo. Cf. https://trac.osgeo.org/gdal/ticket/5444
>>> 
>>> But normally it should only try to open the first source, not all of them. At least that's what I could confirm with some quick testing. However, I do see that the CanUseSourcesMinMaxImplementations() method will stat() sources whose filename looks like a local file (obviously, if that's a mounted file system or VPN share, it will not realize it is remote).
>>> 
>>> Try setting the VRT_MIN_MAX_FROM_SOURCES=NO environment variable / configuration option to see if that makes a difference. If it does, the CanUseSourcesMinMaxImplementations() logic should be modified to avoid those stat() calls.
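For an A/B comparison, gdalinfo also accepts configuration options on its command line through the generic `--config KEY VALUE` switch, so the same run can be timed with and without VRT_MIN_MAX_FROM_SOURCES=NO. The Python helpers below are a sketch; their names are hypothetical and they assume gdalinfo is on PATH.

```python
import shutil
import subprocess
import time


def gdalinfo_cmd(vrt_path, config=None):
    """Build a gdalinfo argv, passing each configuration option as --config KEY VALUE."""
    cmd = ["gdalinfo"]
    for key, value in (config or {}).items():
        cmd += ["--config", key, value]
    cmd.append(vrt_path)
    return cmd


def time_gdalinfo(vrt_path, config=None):
    """Run gdalinfo once and return wall-clock seconds, or None if it is not on PATH."""
    if shutil.which("gdalinfo") is None:
        return None
    start = time.perf_counter()
    # Discard the report itself; only the elapsed time matters here.
    subprocess.run(gdalinfo_cmd(vrt_path, config), stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start
```

Usage: compare `time_gdalinfo("your.vrt")` with `time_gdalinfo("your.vrt", {"VRT_MIN_MAX_FROM_SOURCES": "NO"})`; a large drop in the second figure would point at the min/max code path.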
>>> 
>>> If that's confirmed to be linked to GetMinimum(), you may also work around the issue by running "gdalinfo -stats the.vrt" (from the server) to have statistics incorporated in the VRT; then GetMinimum() should be instant.
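To check whether statistics actually ended up inside the VRT (rather than, say, a sidecar .aux.xml file), one could inspect the XML directly. This Python sketch assumes the statistics are stored as `MDI` items with `STATISTICS_*` keys in each `VRTRasterBand`'s default-domain `Metadata` element; the helper names are hypothetical, and values are kept as strings because not every `STATISTICS_*` item is necessarily numeric.

```python
import xml.etree.ElementTree as ET


def band_statistics(vrt_path):
    """Map band number -> {STATISTICS_* key: value string}, read from the VRT XML only."""
    result = {}
    root = ET.parse(vrt_path).getroot()
    for index, band in enumerate(root.findall("VRTRasterBand"), start=1):
        stats = {}
        for md in band.findall("Metadata"):
            if md.get("domain"):  # skip non-default metadata domains
                continue
            for item in md.findall("MDI"):
                key = item.get("key", "")
                if key.startswith("STATISTICS_"):
                    stats[key] = (item.text or "").strip()
        result[int(band.get("band", index))] = stats
    return result


def has_min_max(vrt_path):
    """True if every band carries both STATISTICS_MINIMUM and STATISTICS_MAXIMUM."""
    stats = band_statistics(vrt_path)
    return bool(stats) and all(
        "STATISTICS_MINIMUM" in s and "STATISTICS_MAXIMUM" in s for s in stats.values())
```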
>>> 
>>> Even
>>> 
>>> On 09/06/2023 at 00:43, William Kyngesburye wrote:
>>>> I'm writing a script that needs some info from a vrt raster, and one of the vrts has thousands of files. When reading the vrt over the internet (VPN to our server), it takes a long time; I think it's looking at every file of the vrt. What is GDAL reading from the files that's not in the vrt itself? I used all the -no options and I'm not adding any other info options like checksums or stats. I just need basic info that's in the vrt file itself.
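If a script only needs what is already written in the VRT, one workaround is to bypass GDAL and read the XML directly, so no source file is ever touched. A minimal Python sketch, assuming the standard VRT element and attribute names (`rasterXSize`, `rasterYSize`, `SRS`, `GeoTransform`, `VRTRasterBand`, `dataType`, `SourceFilename`); the function name is hypothetical, and this is not what gdalinfo itself does.

```python
import xml.etree.ElementTree as ET


def vrt_summary(vrt_path):
    """Read size, georeferencing, band types and source count from the VRT XML alone."""
    root = ET.parse(vrt_path).getroot()
    bands = root.findall("VRTRasterBand")
    geotransform = root.findtext("GeoTransform")
    return {
        "width": int(root.get("rasterXSize")),
        "height": int(root.get("rasterYSize")),
        "srs": (root.findtext("SRS") or "").strip() or None,
        # GeoTransform is stored as six comma-separated numbers.
        "geotransform": [float(v) for v in geotransform.split(",")] if geotransform else None,
        "band_types": [b.get("dataType") for b in bands],
        # One per source entry; the same file listed in several bands is counted each time.
        "source_count": sum(1 for _ in root.iter("SourceFilename")),
    }
```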
>>>> 
>>>> _______________________________________________
>>>> gdal-dev mailing list
>>>> gdal-dev at lists.osgeo.org
>>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>> -- 
>>> http://www.spatialys.com
>>> My software is free, but my time generally not.

