[gdal-dev] Problem using gdalbuildvrt with a large number of source datasets

Homme Zwaagstra hrz at geodata.soton.ac.uk
Wed Dec 3 02:31:21 PST 2014


Even,

On 03/12/14 10:24, Even Rouault wrote:
> Homme,
 >
 >>
 >> I've come up against a problem with `gdalbuildvrt` taking a long time to
 >> create a VRT when it is passed a large number of source datasets. I am
 >> trying to create a VRT file for a zoom level in a TMS structure containing
 >> JPEG tiles.  The command I'm using is:
 >>
 >> gdalbuildvrt output.vrt `find ./tiles/18 -iname '*.jpg' -printf "%p "`
 >>
 >> where the number of tiles is:
 >>
 >> $ find ./tiles/18 -iname '*.jpg' | wc -l
 >> 767104
 >>
 >> The processing seemed to progress reasonably quickly with the progress bar
 >> outputting '0... etc ...100 - done'.  However `gdalbuildvrt` continued
 >> running until I killed it 8 hours later.  Looking at `output.vrt` just
 >> before I killed the program showed it remained empty (0 bytes).
 >
 > I've had a look at the code, and I spotted a potential performance
 > problem when serializing the in-memory VRT into XML with a large number
 > of sources. I've just committed an improvement into trunk that will make
 > the complexity of source serialization linear instead of quadratic.

Many thanks! I will give it a spin and report back...
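
For anyone following along, here is a rough Python sketch of how serialization can end up quadratic and how it becomes linear. It is not the GDAL code: the `Node` class and the `build_*` helpers are hypothetical, and it assumes the cost came from re-walking a singly linked list of child nodes on every append, which the message above does not actually state.

```python
class Node:
    """Minimal stand-in for an XML node with a singly linked child list."""

    def __init__(self, name):
        self.name = name
        self.first_child = None
        self.next = None


def append_walking(parent, child):
    """Append by walking the sibling list from its head; return hops taken."""
    if parent.first_child is None:
        parent.first_child = child
        return 0
    hops = 0
    last = parent.first_child
    while last.next is not None:
        last = last.next
        hops += 1
    last.next = child
    return hops


def build_walking(n):
    """Quadratic: every append re-walks all previously added children."""
    root = Node("VRTRasterBand")
    total_hops = 0
    for _ in range(n):
        total_hops += append_walking(root, Node("SimpleSource"))
    return root, total_hops


def build_with_tail(n):
    """Linear: remember the last child so each append is constant time."""
    root = Node("VRTRasterBand")
    tail = None
    for _ in range(n):
        child = Node("SimpleSource")
        if tail is None:
            root.first_child = child
        else:
            tail.next = child
        tail = child
    return root


def count_children(parent):
    """Count the children of a node by following the sibling links."""
    count = 0
    node = parent.first_child
    while node is not None:
        count += 1
        node = node.next
    return count
```

With n sources the walking version takes (n-1)(n-2)/2 hops in total, so at 767104 sources that is on the order of 10^11 pointer hops, whereas the tail-pointer version does none.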

>
 >>
 >> Before digging any deeper is there something I'm missing? Am I expecting
 >> too much of `gdalbuildvrt`, or indeed the VRT format, in processing this
 >> many source datasets?
 >>
 >> Conceptually in this instance it seems as if it would be useful for a
 >> VRT file (and `gdalbuildvrt`) to reference the output of `gdaltindex` or
 >> something similar.  I'm not sure how efficiently source datasets are
 >> indexed in VRTs and whether this might be contributing to the problem?
 >
 > There's no indexing in VRT. So yes, for that large a number of sources
 > there might be performance problems, since each RasterIO() request has to
 > test whether each source intersects the requested area of interest. Adding
 > an in-memory spatial index after opening the VRT would likely be possible,
 > provided that the non-negligible size of the VRT/XML doesn't make opening
 > it too slow. That depends on the use case.
 >
 > Yes, perhaps referencing a shapefile tile index could be a possible
 > enhancement.

Ok, that's useful to know, thanks.  Unless I hear back otherwise, I'll
submit an enhancement request on the issue tracker to bookmark the issue.
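
To make the indexing idea concrete, here is a minimal Python sketch comparing the per-request linear scan described above with a simple in-memory grid index over source extents. It is only an illustration under stated assumptions: `GridIndex`, `linear_query` and `intersects` are hypothetical names, extents are plain `(minx, miny, maxx, maxy)` tuples, and nothing here reflects how GDAL stores or would index VRT sources.

```python
from collections import defaultdict


def intersects(a, b):
    """True if boxes (minx, miny, maxx, maxy) overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def linear_query(extents, window):
    """Test every source against the window, as happens with no index."""
    return [i for i, ext in enumerate(extents) if intersects(ext, window)]


class GridIndex:
    """Hypothetical uniform-grid index mapping grid cells to source ids."""

    def __init__(self, extents, cell_size):
        self.extents = extents
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        for i, ext in enumerate(extents):
            for key in self._keys(ext):
                self.cells[key].append(i)

    def _keys(self, box):
        """Yield every grid cell that the box touches."""
        c = self.cell_size
        x0, y0 = int(box[0] // c), int(box[1] // c)
        x1, y1 = int(box[2] // c), int(box[3] // c)
        for gx in range(x0, x1 + 1):
            for gy in range(y0, y1 + 1):
                yield gx, gy

    def query(self, window):
        """Collect candidates from touched cells, then test them exactly."""
        candidates = set()
        for key in self._keys(window):
            candidates.update(self.cells.get(key, ()))
        return sorted(
            i for i in candidates if intersects(self.extents[i], window)
        )


# Example: a 100 x 100 grid of 256-pixel tiles, queried with a small window.
tiles = [
    (x * 256, y * 256, (x + 1) * 256, (y + 1) * 256)
    for y in range(100)
    for x in range(100)
]
index = GridIndex(tiles, cell_size=256)
window = (500, 500, 1000, 1000)
hits = index.query(window)
```

The linear scan touches all 10000 extents for every request, while the grid lookup only examines the sources registered in the few cells the window covers. Building the index is itself a pass over all sources, which is the opening-time cost mentioned above.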

Best regards,

Homme

>
 >
 > Even
 >

