[gdal-dev] Problem using gdalbuildvrt with a large number of source datasets

Even Rouault even.rouault at spatialys.com
Wed Dec 3 02:24:07 PST 2014


Homme,

> 
> I've come up against a problem with `gdalbuildvrt` taking a long time to
> create a VRT when it is passed a large number of source datasets. I am
> trying to create a VRT file for a zoom level in a TMS structure containing
> JPEG tiles. The command I'm using is:
> 
> gdalbuildvrt output.vrt `find ./tiles/18 -iname *.jpg -printf "%p "`
> 
> where the number of tiles is:
> 
> $ find ./tiles/18 -iname *.jpg | wc -l
> 767104
> 
> The processing seemed to progress reasonably quickly, with the progress bar
> outputting '0... etc ...100 - done'. However, `gdalbuildvrt` continued
> running until I killed it 8 hours later. Looking at `output.vrt` just
> before I killed the program showed that it was still empty (0 bytes).

I've looked a bit at the code, and I spotted a potential performance problem
when serializing the in-memory VRT into XML with a large number of sources.
I've just committed an improvement to trunk that makes the complexity of
source serialization linear instead of quadratic.
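
To see why quadratic serialization matters at this scale, here is a minimal
Python sketch of the general pattern. It is not GDAL code. It assumes (this is
not confirmed from the GDAL source) that the cost came from appending each
source node to the end of a singly linked list of sibling nodes, walking from
the first child to the tail on every append. With n = 767,104 sources that
walk traverses about (n-1)(n-2)/2, roughly 2.9 x 10^11 links, whereas
remembering the tail makes each append O(1). The names `Node`,
`append_by_walking`, `build_quadratic`, `build_linear` and `child_names` are
hypothetical; `VRTRasterBand` and `SimpleSource` are used only as
illustrative labels.

```python
# Illustrative sketch only (not GDAL code): element children kept in a
# singly linked sibling list, as many XML trees store them.
class Node:
    def __init__(self, name):
        self.name = name
        self.first_child = None
        self.next_sibling = None


def append_by_walking(parent, child):
    """Append child by walking to the tail; returns sibling links traversed."""
    if parent.first_child is None:
        parent.first_child = child
        return 0
    steps = 0
    last = parent.first_child
    while last.next_sibling is not None:  # O(k) for k existing children
        last = last.next_sibling
        steps += 1
    last.next_sibling = child
    return steps


def build_quadratic(n):
    """Append n source nodes, re-walking the list each time: O(n^2) total."""
    band = Node("VRTRasterBand")
    steps = 0
    for i in range(n):
        steps += append_by_walking(band, Node(f"SimpleSource{i}"))
    return band, steps


def build_linear(n):
    """Append n source nodes while remembering the tail: O(n) total."""
    band = Node("VRTRasterBand")
    tail = None
    for i in range(n):
        child = Node(f"SimpleSource{i}")
        if tail is None:
            band.first_child = child
        else:
            tail.next_sibling = child  # O(1): no walk from the head
        tail = child
    return band, 0  # no sibling links are ever traversed


def child_names(parent):
    """Collect child names in list order."""
    names = []
    node = parent.first_child
    while node is not None:
        names.append(node.name)
        node = node.next_sibling
    return names


if __name__ == "__main__":
    n = 2000
    q_band, q_steps = build_quadratic(n)
    l_band, l_steps = build_linear(n)
    assert child_names(q_band) == child_names(l_band)
    print(q_steps, l_steps)  # 1997001 0
```

Both versions produce the same child order; only the cost of each append
differs, which is why the output file can stay empty for hours after the
progress bar has reached 100.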

> 
> Before digging any deeper, is there something I'm missing? Am I expecting
> too much of `gdalbuildvrt`, or indeed the VRT format, in processing this
> many source datasets?
> 
> Conceptually, in this instance it seems as if it would be useful for a VRT
> file (and `gdalbuildvrt`) to reference the output of `gdaltindex` or
> something similar. I'm not sure how efficiently source datasets are indexed
> in VRTs, or whether this might be contributing to the problem.

There's no indexing in VRT. So yes, with that many sources there might be
performance problems, since each RasterIO() request has to test whether each
source intersects the requested area of interest. Adding an in-memory spatial
index after opening the VRT would likely be possible, provided that the
non-negligible size of the VRT XML doesn't make opening it too slow. That
depends on the use case.

Yes, referencing a shapefile tile index is a possible enhancement.

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

