[gdal-dev] Problem using gdalbuildvrt with a large number of source datasets
Homme Zwaagstra
hrz at geodata.soton.ac.uk
Wed Dec 3 02:31:21 PST 2014
Even,
On 03/12/14 10:24, Even Rouault wrote:
> Homme,
>
>>
>> I've come up against a problem with `gdalbuildvrt` taking a long time to
>> create a VRT when it is passed a large number of source datasets. I am
>> trying to create a VRT file for a zoom level in a TMS structure containing
>> JPEG tiles. The command I'm using is:
>>
>> gdalbuildvrt output.vrt `find ./tiles/18 -iname '*.jpg' -printf "%p "`
>>
>> where the number of tiles is:
>>
>> $ find ./tiles/18 -iname '*.jpg' | wc -l
>> 767104
>>
>> The processing seemed to progress reasonably quickly, with the progress bar
>> outputting '0... etc ...100 - done'. However, `gdalbuildvrt` continued
>> running until I killed it 8 hours later. Looking at `output.vrt` just
>> before I killed the program showed it remained empty (0 bytes).
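
As an aside on passing this many file names on the command line: `gdalbuildvrt` also accepts an `-input_file_list` option that reads the source paths from a text file, one per line. A minimal sketch of preparing such a list follows. The helper names (`find_tiles`, `write_file_list`, `buildvrt_command`) are made up for illustration, and this only avoids the very long argument list; it does not address the slow serialization described above.

```python
import fnmatch
import os


def find_tiles(root, pattern="*.jpg"):
    """Yield paths under root whose file name matches pattern, ignoring case."""
    wanted = pattern.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Lowercase both sides to mimic find's -iname behaviour.
            if fnmatch.fnmatch(name.lower(), wanted):
                yield os.path.join(dirpath, name)


def write_file_list(paths, list_path):
    """Write one path per line to list_path and return how many were written."""
    count = 0
    with open(list_path, "w") as f:
        for path in paths:
            f.write(path + "\n")
            count += 1
    return count


def buildvrt_command(list_path, vrt_path):
    """Return the argument vector for gdalbuildvrt (hypothetical helper)."""
    return ["gdalbuildvrt", "-input_file_list", list_path, vrt_path]
```

With GDAL installed, the command could then be run with something like `subprocess.run(buildvrt_command("tiles.txt", "output.vrt"), check=True)` after calling `write_file_list(find_tiles("./tiles/18"), "tiles.txt")`.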
>
> I've looked a bit at the code, and I spotted a potential performance
> problem when serializing the in-memory VRT into XML with a large number of
> sources. I've just committed an improvement into trunk that makes the
> complexity of source serialization linear instead of quadratic.
Many thanks! I will give it a spin and report back...
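
To illustrate why quadratic versus linear matters at this scale, here is a minimal sketch of one common way the problem arises: appending each child to a singly linked list by walking from the head every time, compared with remembering the tail. This is not GDAL's code and I have not checked that it matches the actual change; the `Node` class and function names are made up for illustration.

```python
class Node:
    """A minimal singly linked list node standing in for an XML child."""

    def __init__(self, value):
        self.value = value
        self.next = None


def build_by_walking(n):
    """Append n nodes, walking from the head each time. Returns (head, steps)."""
    head = None
    steps = 0
    for i in range(n):
        node = Node(i)
        if head is None:
            head = node
            continue
        cur = head
        while cur.next is not None:  # O(current length) per append
            cur = cur.next
            steps += 1
        cur.next = node
    return head, steps


def build_with_tail(n):
    """Append n nodes while tracking the tail. Returns (head, steps)."""
    head = tail = None
    steps = 0
    for i in range(n):
        node = Node(i)
        if tail is None:
            head = node
        else:
            tail.next = node  # O(1) per append
        tail = node
        steps += 1
    return head, steps


def to_list(head):
    """Collect node values in order, to check both builders agree."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values
```

Both builders produce the same list, but the walking version takes (n-1)(n-2)/2 pointer hops in total. For n = 767104 that is roughly 2.9e11 hops, against 767104 appends when the tail is tracked.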
>
>>
>> Before digging any deeper, is there something I'm missing? Am I expecting
>> too much of `gdalbuildvrt`, or indeed the VRT format, in processing this
>> many source datasets?
>>
>> Conceptually, in this instance it seems as if it would be useful for a
>> VRT file (and `gdalbuildvrt`) to reference the output of `gdaltindex` or
>> something similar. I'm not sure how efficiently source datasets are
>> indexed in VRTs and whether this might be contributing to the problem?
>
> There's no indexing in VRT. So yes, for that many sources there might be
> performance problems, since each RasterIO() request has to test whether
> each source intersects the requested area of interest. Adding an in-memory
> spatial index after opening the VRT would likely be possible, provided that
> the non-negligible size of the VRT/XML doesn't make opening it too slow.
> That depends on the use case.
>
> Yes, perhaps referencing a shapefile tile index could be a possible
> enhancement.
Ok, that's useful to know, thanks. Unless I hear back otherwise, I'll submit
an enhancement request on the issue tracker to bookmark the issue.
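
For the record, the difference between testing every source on each request and consulting an in-memory spatial index can be sketched as below. This is only an illustration under my own assumptions: sources are plain `(xmin, ymin, xmax, ymax)` boxes, the index is a simple uniform grid, and `GridIndex` is a made-up name, not anything that exists in GDAL.

```python
from collections import defaultdict


def intersects(a, b):
    """True if boxes (xmin, ymin, xmax, ymax) overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def linear_scan(sources, window):
    """Test every source against the window: O(number of sources) per request."""
    return [i for i, box in enumerate(sources) if intersects(box, window)]


class GridIndex:
    """Uniform grid mapping each cell to the indices of sources touching it."""

    def __init__(self, sources, cell_size):
        self.sources = sources
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        for i, box in enumerate(sources):
            for key in self._cells_for(box):
                self.cells[key].append(i)

    def _cells_for(self, box):
        """Yield the (column, row) keys of all grid cells a box touches."""
        c = self.cell_size
        x0, y0 = int(box[0] // c), int(box[1] // c)
        x1, y1 = int(box[2] // c), int(box[3] // c)
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                yield (cx, cy)

    def query(self, window):
        """Return sorted indices of sources intersecting the window."""
        candidates = set()
        for key in self._cells_for(window):
            candidates.update(self.cells.get(key, ()))
        # Candidates only share a cell with the window, so re-check exactly.
        return sorted(i for i in candidates if intersects(self.sources[i], window))
```

With 256x256 tiles and a matching cell size, a query only inspects the handful of cells the window touches rather than every source. The index still has to be built once after opening, which is the cost Even mentions.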
Best regards,
Homme
>
>
> Even
>