[gdal-dev] Problem using gdalbuildvrt with a large number of source datasets

Homme Zwaagstra hrz at geodata.soton.ac.uk
Thu Dec 4 07:15:03 PST 2014


Hello Even,

I've had a chance to test the fix in trunk and can report that it works
very well: the `gdalbuildvrt` run completed in just over an hour, with the
progress meter giving a much more accurate report on progress.

I have submitted an enhancement request regarding the VRT indexing at 
<http://trac.osgeo.org/gdal/ticket/5762>.

Many thanks and best regards,

Homme

On 03/12/14 10:31, Homme Zwaagstra wrote:
> Even,
>
> On 03/12/14 10:24, Even Rouault wrote:
> > Homme,
> >
> >>
> >> I've come up against a problem with `gdalbuildvrt` taking a long time to
> >> create a VRT when it is passed a large number of source datasets. I am
> >> trying to create a VRT file for a zoom level in a TMS structure containing
> >> JPEG tiles.  The command I'm using is:
> >>
> >> gdalbuildvrt output.vrt `find ./tiles/18 -iname '*.jpg' -printf "%p "`
> >>
> >> where the number of tiles is:
> >>
> >> $ find ./tiles/18 -iname '*.jpg' | wc -l
> >> 767104
> >>
> >> The processing seemed to progress reasonably quickly, with the progress bar
> >> outputting '0... etc ...100 - done'.  However, `gdalbuildvrt` continued
> >> running until I killed it 8 hours later.  Looking at `output.vrt` just
> >> before I killed the program showed it remained empty (0 bytes).
> >
> > I've had a quick look at the code and spotted a potential performance
> > problem when serializing the in-memory VRT to XML with a large number of
> > sources. I've just committed an improvement into trunk that makes the
> > complexity of source serialization linear instead of quadratic.
>
> Many thanks! I will give it a spin and report back...
>
> >
> >>
> >> Before digging any deeper is there something I'm missing? Am I expecting
> >> too much of `gdalbuildvrt`, or indeed the VRT format, in processing this
> >> many source datasets?
> >>
> >> Conceptually in this instance it seems as if it would be useful for a VRT
> >> file (and `gdalbuildvrt`) to reference the output of `gdaltindex` or
> >> something similar.  I'm not sure how efficiently source datasets are
> >> indexed in VRTs and whether this might be contributing to the problem?
> >
> > There's no indexing in VRT. So yes, for that many sources there might be
> > performance problems, since each RasterIO() request has to test whether
> > each source intersects the requested area of interest. Adding an in-memory
> > spatial index after opening the VRT would likely be possible, provided that
> > the non-negligible size of the VRT/XML doesn't make opening it too slow.
> > That depends on the use case.
> >
> > Yes, perhaps referencing a shapefile tile index could be a possible
> > enhancement.
>
> Ok, that's useful to know, thanks.  Unless I hear back otherwise, I'll submit
> an enhancement request on the issue tracker to bookmark the issue.
>
> Best regards,
>
> Homme
>
> >
> >
> > Even
> >
>
>
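
As a rough illustration of the indexing point in the quoted discussion: with no index, every read window has to be tested against every source, whereas an in-memory spatial index only visits sources registered near the window. The sketch below is not GDAL code and does not reflect how the VRT driver is implemented. `Box`, `linear_lookup` and `GridIndex` are hypothetical names, and a uniform grid is just one simple choice of index, assumed here because TMS tiles are regularly spaced.

```python
from collections import defaultdict
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box (hypothetical helper, not a GDAL type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Box, b: Box) -> bool:
    # Strict overlap: boxes that only share an edge do not intersect.
    return (a.xmin < b.xmax and b.xmin < a.xmax
            and a.ymin < b.ymax and b.ymin < a.ymax)


def linear_lookup(sources, window):
    """Unindexed case: test every source against the window, O(n) per request."""
    return [i for i, box in enumerate(sources) if intersects(box, window)]


class GridIndex:
    """Uniform-grid spatial index built once after all sources are known."""

    def __init__(self, sources, cell_size):
        self.sources = sources
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        for i, box in enumerate(sources):
            for key in self._keys(box):
                self.cells[key].append(i)

    def _keys(self, box):
        # Yield every grid cell that the box touches.
        c = self.cell_size
        for gx in range(int(box.xmin // c), int(box.xmax // c) + 1):
            for gy in range(int(box.ymin // c), int(box.ymax // c) + 1):
                yield (gx, gy)

    def lookup(self, window):
        # Collect candidates from the touched cells, then test them exactly.
        candidates = set()
        for key in self._keys(window):
            candidates.update(self.cells.get(key, ()))
        return sorted(i for i in candidates
                      if intersects(self.sources[i], window))


# Usage: a 100 x 100 grid of 256-unit tiles, stored row by row.
tiles = [Box(c * 256, r * 256, (c + 1) * 256, (r + 1) * 256)
         for r in range(100) for c in range(100)]
index = GridIndex(tiles, cell_size=256)
window = Box(300, 300, 700, 520)
print(index.lookup(window) == linear_lookup(tiles, window))
```

Both lookups return the same source indices. The difference is that `linear_lookup` visits all 10,000 boxes per request, while `GridIndex.lookup` visits only the handful registered in the cells the window touches. Building the index is a one-off cost at open time, which is the trade-off against VRT/XML size mentioned in the quoted reply.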
