[gdal-dev] Performance and sibling files

Even Rouault even.rouault at mines-paris.org
Tue Jan 29 14:30:22 EST 2008


Daniel,

I have added your message to ticket #2158 and put you in CC on the ticket, 
so you can follow how it evolves.

Best regards,
Even

On Tuesday 29 January 2008 16:19:34, Daniel wrote:
> Hello,
>
> We have identified a serious performance problem with the reading of sibling
> files performed in the GDALOpenInfo constructor.
>
> When we commented out lines 123-127 in gdalopeninfo.cpp (the VSIReadDir
> call), the runtime of our application went down from 150 days to 15! The
> application is 100% I/O-bound (it uses no CPU time according to the task
> manager).
>
> This is our setup:
>
> 7.5 million small (~20 KB) jpeg files with corresponding world files for a
> total of 15 million files, distributed in 50000 directories (approximately
> 300 files per directory).
>
> The files reside on a fast 15K SAS disk running in a Windows 2003 server
> with 8 cores and 4 GB RAM. The filesystem is NTFS (no compression /
> indexing).
>
> Due to the way the files are organized, neighboring jpeg files are located
> in different directories. This means that we always have to read the entire
> directory in order to open just one file.
>
> Our app needs to read the entire dataset in geographic order.
> Unfortunately, changing the directory layout is not an option.
>
> Reading one complete directory means reading ~1.5 MB of data from disk. The
> data is read non-sequentially, since the NTFS directory structure is a
> B-Tree and FindNextFile returns the contents sorted alphabetically.
> The disk cache gets exhausted after reading 2700 directories. This means
> that we never re-use the previously read directory data.
>
> I realise that this might be quite an unusual case, but it would be very nice
> if the sibling reading in GDALOpenInfo was optional.
>
> I don't think that the changes made in ticket #2158 (
> http://trac.osgeo.org/gdal/ticket/2158) would help in this case since there
> was almost no CPU utilization.
>
> Regards,
>   Daniel Bäck
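
A back-of-the-envelope check of the figures quoted above, as a minimal
sketch. The input numbers are taken from Daniel's message. The variable
names, the binary unit convention (1 GB = 1024 MB) and the "one directory
read per JPEG open, never served from cache" worst case are assumptions
made for illustration only, not measurements.

```python
# Figures quoted in the message above.
TOTAL_FILES = 15_000_000       # JPEG files plus world files
JPEG_FILES = 7_500_000
DIRECTORIES = 50_000
DIR_READ_MB = 1.5              # data read to list one directory
DIRS_UNTIL_CACHE_FULL = 2_700  # directories read before the cache is exhausted

# Assumption: binary units (1 GB = 1024 MB, 1 TB = 1024 GB).
MB_PER_GB = 1024
MB_PER_TB = 1024 * 1024

# 15 million files spread over 50000 directories.
files_per_directory = TOTAL_FILES // DIRECTORIES

# Directory data held when the cache runs out; compare with the 4 GB of RAM.
cache_filled_mb = DIR_READ_MB * DIRS_UNTIL_CACHE_FULL
cache_filled_gb = cache_filled_mb / MB_PER_GB

# Assumed worst case: every JPEG open re-reads its whole directory from disk.
worst_case_dir_reads_tb = JPEG_FILES * DIR_READ_MB / MB_PER_TB

print(f"files per directory: {files_per_directory}")
print(f"directory data at cache exhaustion: {cache_filled_gb:.2f} GB")
print(f"worst-case directory data read: {worst_case_dir_reads_tb:.1f} TB")
```

Under these assumptions the 2700-directory limit matches the 4 GB of RAM
closely, which is consistent with the directory listings alone filling
the disk cache.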



