Hello,<br><br><span id="BugEvents" style="border: 0px none ; margin: 0px;"> We have
identified a serious performance problem with the reading of sibling
files performed in the GDALOpenInfo constructor.<br><br>When
we commented out lines 123-127 in gdalopeninfo.cpp (the VSIReadDir call), the runtime of our
application went down from 150 days to 15 days! The application is 100%
I/O-bound (it uses no CPU time according to Task Manager).<br><br>This is our setup:<br><br>7.5
million small (~20 KB) jpeg files with corresponding world files for a
total of 15 million files, distributed in 50000 directories
(approximately 300 files per directory).<br><br>The files reside on a
fast 15K RPM SAS disk in a Windows Server 2003 machine with 8 cores and 4
GB of RAM. The file system is NTFS (no compression or indexing).<br><br>Due
to the way the files are organized, neighboring jpeg files are located
in different directories. This means that we always have to read the
entire directory in order to open just one file.<br><br>Our application needs
to read the entire dataset in geographic order. Unfortunately,
changing the directory layout is not an option.<br><br>Reading one
complete directory means reading ~1.5 MB of data from disk. The data is
read non-sequentially, since the NTFS directory structure is a B-Tree
and FindNextFile returns the contents sorted alphabetically.<br>The disk cache is exhausted after reading 2700 directories, which means that previously read directory data is never reused before it is evicted.<br><br>I realise that this might be quite an unusual case, but it would be very nice if the sibling reading in </span><span id="BugEvents" style="border: 0px none ; margin: 0px;">GDALOpenInfo could be made optional.<br>
<br>
I don't think that the changes made in ticket </span><span id="BugEvents" style="border: 0px none ; margin: 0px;">#2158 (<a href="http://trac.osgeo.org/gdal/ticket/2158">http://trac.osgeo.org/gdal/ticket/2158</a>) would help in this case, since the application is I/O-bound and CPU utilization was almost zero.<br>
<br>
Regards,<br>
Daniel Bäck</span>