[gdal-dev] Problem using gdalbuildvrt with a large number of source datasets
Homme Zwaagstra
hrz at geodata.soton.ac.uk
Thu Dec 4 07:15:03 PST 2014
Hello Even,
I've had a chance to test the fix in trunk and can report that it works
very well: the `gdalbuildvrt` run completed in just over an hour, with the
progress meter giving a much more accurate report on progress.
I have submitted an enhancement request regarding the VRT indexing at
<http://trac.osgeo.org/gdal/ticket/5762>.
Many thanks and best regards,
Homme
On 03/12/14 10:31, Homme Zwaagstra wrote:
> Even,
>
> On 03/12/14 10:24, Even Rouault wrote:
> > Homme,
> >
> >>
> >> I've come up against a problem with `gdalbuildvrt` taking a long time to
> >> create a VRT when it is passed a large number of source datasets. I am
> >> trying to create a VRT file for a zoom level in a TMS structure containing
> >> JPEG tiles. The command I'm using is:
> >>
> >> gdalbuildvrt output.vrt `find ./tiles/18 -iname '*.jpg' -printf "%p "`
> >>
> >> where the number of tiles is:
> >>
> >> $ find ./tiles/18 -iname '*.jpg' | wc -l
> >> 767104
> >>
> >> The processing seemed to progress reasonably quickly, with the progress
> >> bar outputting '0... etc ...100 - done'. However, `gdalbuildvrt` continued
> >> running until I killed it 8 hours later. Looking at `output.vrt` just
> >> before I killed the program showed that it remained empty (0 bytes).
> >
> > I've looked a bit at the code, and I spotted a potential performance
> > problem when serializing the in-memory VRT into XML with a large number of
> > sources. I've just committed an improvement into trunk that will make the
> > complexity of source serialization linear instead of quadratic.
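
The quadratic-versus-linear distinction can be illustrated with a small Python sketch. This is not GDAL's code: the `Node` class and both builder functions are hypothetical, and the idea that the slowdown comes from re-walking a sibling list on every append is an assumption made for illustration, not something stated in the thread.

```python
class Node:
    """Hypothetical stand-in for an XML node whose children form a linked list."""

    def __init__(self, name):
        self.name = name
        self.first_child = None
        self.next = None


def build_by_walking(n):
    """Append n children, walking to the end of the list each time.

    Returns the parent and the total number of link hops, which grows
    quadratically with n.
    """
    parent = Node("VRTRasterBand")
    hops = 0
    for _ in range(n):
        child = Node("SimpleSource")
        if parent.first_child is None:
            parent.first_child = child
            continue
        last = parent.first_child
        while last.next is not None:  # re-walk the whole sibling list
            last = last.next
            hops += 1
        last.next = child
    return parent, hops


def build_with_tail(n):
    """Append n children while remembering the tail: constant work per append."""
    parent = Node("VRTRasterBand")
    tail = None
    for _ in range(n):
        child = Node("SimpleSource")
        if tail is None:
            parent.first_child = child
        else:
            tail.next = child
        tail = child
    return parent


def count_children(parent):
    """Count children by following the sibling links once."""
    count = 0
    node = parent.first_child
    while node is not None:
        count += 1
        node = node.next
    return count
```

Both builders produce the same list of children. Only the first does work proportional to the current list length on every append, which is the kind of cost that becomes prohibitive at several hundred thousand sources.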
>
> Many thanks! I will give it a spin and report back...
>
> >
> >>
> >> Before digging any deeper, is there something I'm missing? Am I expecting
> >> too much of `gdalbuildvrt`, or indeed the VRT format, in processing this
> >> many source datasets?
> >>
> >> Conceptually, in this instance it seems as if it would be useful for a VRT
> >> file (and `gdalbuildvrt`) to reference the output of `gdaltindex` or
> >> something similar. I'm not sure how efficiently source datasets are
> >> indexed in VRTs and whether this might be contributing to the problem?
> >
> > There's no indexing in VRT. So yes, for that large a number of sources there
> > might be performance problems, since each RasterIO() request has to test
> > whether each source intersects the requested area of interest. Adding an
> > in-memory spatial index after opening the VRT would likely be possible,
> > provided that the non-negligible size of the VRT/XML doesn't make opening
> > it too slow. That depends on the use case.
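
To make the cost of the per-request scan concrete, here is a minimal Python sketch comparing a linear intersection test over all sources with a lookup through a simple uniform-grid index. It is only an illustration of the idea of an in-memory spatial index: `GridIndex`, `make_tiles`, the rectangle layout `(xmin, ymin, xmax, ymax)` and the cell size are all hypothetical, and none of this reflects how GDAL stores or would index VRT sources.

```python
from collections import defaultdict


def intersects(a, b):
    """True if rectangles (xmin, ymin, xmax, ymax) overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def linear_lookup(sources, window):
    """Test every source against the window, as an unindexed scan would."""
    return [i for i, rect in enumerate(sources) if intersects(rect, window)]


class GridIndex:
    """Hypothetical uniform-grid index over source rectangles."""

    def __init__(self, sources, cell_size):
        self.sources = sources
        self.cell_size = cell_size
        self.buckets = defaultdict(list)
        for i, rect in enumerate(sources):
            for key in self._cells(rect):
                self.buckets[key].append(i)

    def _cells(self, rect):
        """Yield the grid cells covered by a rectangle."""
        x0 = int(rect[0] // self.cell_size)
        x1 = int(rect[2] // self.cell_size)
        y0 = int(rect[1] // self.cell_size)
        y1 = int(rect[3] // self.cell_size)
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                yield (cx, cy)

    def lookup(self, window):
        """Collect candidates from the covered cells, then test them exactly."""
        candidates = set()
        for key in self._cells(window):
            candidates.update(self.buckets.get(key, ()))
        return sorted(i for i in candidates
                      if intersects(self.sources[i], window))


def make_tiles(nx, ny, size):
    """Build an nx-by-ny mosaic of square tiles, row by row (illustrative data)."""
    return [(tx * size, ty * size, (tx + 1) * size, (ty + 1) * size)
            for ty in range(ny) for tx in range(nx)]
```

With `tiles = make_tiles(100, 100, 256)` and `index = GridIndex(tiles, 1024)`, `index.lookup(window)` returns the same source numbers as `linear_lookup(tiles, window)` while only examining the tiles registered in the grid cells the window touches. The trade-off raised in the thread still applies: the index has to be built after the VRT is opened, so it only pays off if enough requests follow.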
> >
> > Yes, perhaps referencing a shapefile tile index could be a possible
> > enhancement.
>
> Ok, that's useful to know, thanks. Unless I hear back otherwise, I'll submit
> an enhancement request on the issue tracker to bookmark the issue.
>
> Best regards,
>
> Homme
>
> >
> >
> > Even
> >
>
>