[gdal-dev] Problem using gdalbuildvrt with a large number of source datasets
Even Rouault
even.rouault at spatialys.com
Wed Dec 3 02:24:07 PST 2014
Homme,
>
> I've come up against a problem with `gdalbuildvrt` taking a long time to
> create a VRT when it is passed a large number of source datasets. I am
> trying to create a VRT file for a zoom level in a TMS structure containing
> JPEG tiles. The command I'm using is:
>
> gdalbuildvrt output.vrt `find ./tiles/18 -iname *.jpg -printf "%p "`
>
> where the number of tiles is:
>
> $ find ./tiles/18 -iname *.jpg | wc -l
> 767104
>
> The processing seemed to progress reasonably quickly, with the progress bar
> outputting '0... etc ...100 - done'. However, `gdalbuildvrt` continued
> running until I killed it 8 hours later. Looking at `output.vrt` just
> before I killed the program showed it remained empty (0 bytes).
I've had a quick look at the code and spotted a potential performance
problem when serializing the in-memory VRT to XML with a large number of
sources. I've just committed an improvement into trunk that makes the
complexity of source serialization linear instead of quadratic.
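
To illustrate the kind of difference involved, here is a toy Python model
(not the GDAL code). It assumes that the children of an XML node are kept in
a singly linked sibling list, so that appending a source by walking to the
end of the list costs more and more as the list grows, while remembering the
tail keeps each append constant-time. The `Node` class and the function names
are made up for the illustration.

```python
class Node:
    """Hypothetical XML node: children form a singly linked sibling list."""

    def __init__(self, name):
        self.name = name
        self.next = None         # next sibling
        self.first_child = None  # head of the child list


def append_child_walk(parent, child):
    """Append by walking to the end of the child list; return steps walked."""
    if parent.first_child is None:
        parent.first_child = child
        return 0
    steps = 0
    cur = parent.first_child
    while cur.next is not None:
        cur = cur.next
        steps += 1
    cur.next = child
    return steps


def build_by_walking(n):
    """Quadratic overall: every append rescans the existing siblings."""
    root = Node("VRTRasterBand")
    total_steps = 0
    for i in range(n):
        total_steps += append_child_walk(root, Node(f"Source{i}"))
    return root, total_steps


def build_with_tail(n):
    """Linear overall: keep a pointer to the last child appended."""
    root = Node("VRTRasterBand")
    tail = None
    for i in range(n):
        child = Node(f"Source{i}")
        if tail is None:
            root.first_child = child
        else:
            tail.next = child
        tail = child
    return root


def child_names(parent):
    """Collect child names in order, to check both builders agree."""
    names = []
    cur = parent.first_child
    while cur is not None:
        names.append(cur.name)
        cur = cur.next
    return names
```

In this model, the walking version performs (n-1)(n-2)/2 pointer hops for n
sources, which is why the cost only becomes noticeable at several hundred
thousand sources.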
>
> Before digging any deeper is there something I'm missing? Am I expecting
> too much of `gdalbuildvrt`, or indeed the VRT format, in processing this
> many source datasets?
>
> Conceptually in this instance it seems as if it would be useful for a
> VRT file (and `gdalbuildvrt`) to reference the output of `gdaltindex` or
> something similar. I'm not sure how efficiently source datasets are
> indexed in VRTs and whether this might be contributing to the problem?
There's no indexing in VRT. So yes, for that many sources there might
be performance problems, since each RasterIO() request has to test whether
each source intersects the requested area of interest. Adding an in-memory
spatial index after opening the VRT would likely be possible, provided that
the non-negligible size of the VRT/XML doesn't make opening it too slow. That
depends on the use case.
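
As a rough sketch of what such an in-memory index could look like (a toy
Python model under my own assumptions, not an existing GDAL feature): source
extents are bucketed into a uniform grid, so a read request only tests the
sources registered in the cells its window touches instead of all of them.
`GridIndex`, `linear_scan` and the `(xmin, ymin, xmax, ymax)` box layout are
hypothetical names chosen for the example.

```python
from collections import defaultdict


def intersects(a, b):
    """True if boxes (xmin, ymin, xmax, ymax) overlap; touching edges do not count."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def linear_scan(sources, window):
    """Unindexed behaviour: test every source against the window."""
    return [i for i, box in enumerate(sources) if intersects(box, window)]


class GridIndex:
    """Uniform grid mapping each cell to the indices of sources overlapping it."""

    def __init__(self, sources, cell_size):
        self.sources = sources
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        for i, box in enumerate(sources):
            for key in self._cell_keys(box):
                self.cells[key].append(i)

    def _cell_keys(self, box):
        """Yield the (column, row) keys of every grid cell the box may touch."""
        x0 = int(box[0] // self.cell_size)
        x1 = int(box[2] // self.cell_size)
        y0 = int(box[1] // self.cell_size)
        y1 = int(box[3] // self.cell_size)
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                yield (cx, cy)

    def query(self, window):
        """Return sorted indices of sources that really intersect the window."""
        candidates = set()
        for key in self._cell_keys(window):
            candidates.update(self.cells.get(key, ()))
        # The cells give a superset; the exact test removes false positives.
        return sorted(i for i in candidates if intersects(self.sources[i], window))
```

Building the index is itself a pass over all sources, so it only pays off
when many requests are made against the same opened VRT.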
Yes, perhaps referencing a shapefile tile index could be a possible
enhancement.
Even
--
Spatialys - Geospatial professional services
http://www.spatialys.com