[Qgis-developer] VRT functionality
Zoltan Szecsei
zoltans at geograph.co.za
Mon Oct 27 01:43:34 PDT 2014
On 2014/10/27 10:38, Even Rouault wrote:
> On Monday 27 October 2014 08:04:21, Zoltan Szecsei wrote:
>> On 2014/10/26 19:22, Even Rouault wrote:
>>> On Sunday 26 October 2014 16:44:37, Zoltan Szecsei wrote:
>>>> Hi,
>>>> I just want to clear up my mindset as to how a VRT is implemented in
>>>> QGIS.
>>> Zoltan,
>>>
>>> In fact those are more OGR questions than QGIS questions. QGIS makes no
>>> difference when reading a plain shapefile (through OGR) or a VRT.
>>>
>>>> I'd like to understand when QGIS opens a file, when it reads the
>>>> contents, and when it writes (if need be) and closes a file.
>>>> In this context, I am thinking about SHP files - especially the NGI
>>>> dataset which comes out "cut up" into degree squares.
>>>> Let's just deal with 1 feature type: Rivers lines. My VRT looks like:
>>>> <OGRVRTDataSource>
>>>>   <OGRVRTUnionLayer name="Rivers">
>>>>     <OGRVRTLayer name="2730_RIVER_LINE_2006_06">
>>>>       <SrcDataSource relativeToVRT="1">2730/2730_RIVER_LINE_2006_06.shp</SrcDataSource>
>>>>     </OGRVRTLayer>
>>>>     <OGRVRTLayer name="2731_RIVER_LINE_2006_04">
>>>>       <SrcDataSource relativeToVRT="1">2731/2731_RIVER_LINE_2006_04.shp</SrcDataSource>
>>>>     </OGRVRTLayer>
>>>>     <OGRVRTLayer name="2732_RIVER_LINE_2006_04">
>>>>       <SrcDataSource relativeToVRT="1">2732/2732_RIVER_LINE_2006_04.shp</SrcDataSource>
>>>>     </OGRVRTLayer>
>>>>   </OGRVRTUnionLayer>
>>>> </OGRVRTDataSource>
>>>>
>>>> * When I open the VRT in QGIS, does QGIS open ALL the VRT files and
>>>>   look for the extent of each of the files?
>>> If QGIS issues a GetExtent() on the VRT, then with the above definition,
>>> it will query the 3 shapefiles to find the extent of each. But on
>>> shapefiles this is a fast operation.
>>> You could define <ExtentXMin>, etc. as child elements of OGRVRTUnionLayer
>>> if you really want a fast GetExtent().
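
The extent query is cheap on shapefiles because the bounding box is stored in the fixed 100-byte header of the .shp file. As a minimal sketch (standard-library Python, not the OGR code path; the function name read_shp_extent is hypothetical), the values that would go into <ExtentXMin> and friends can be read like this:

```python
import struct


def read_shp_extent(path):
    """Return (xmin, ymin, xmax, ymax) from the 100-byte .shp main file header."""
    with open(path, "rb") as f:
        header = f.read(100)
    if len(header) < 100:
        raise ValueError("file too short to be a shapefile")
    # Bytes 0-3 hold the file code 9994 as a big-endian integer.
    (file_code,) = struct.unpack(">i", header[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile main file")
    # Bytes 36-67 hold Xmin, Ymin, Xmax, Ymax as little-endian doubles.
    return struct.unpack("<4d", header[36:68])
```

Only the header is read, so the cost does not depend on the number of features in the file.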
>>>
>>>>   o If my VRT had the extents included for each of the files, would
>>>>     this stop QGIS from (at this stage) opening the files and
>>>>     reading the extents?
>>> Yes, but QGIS probably asks GetFeatureCount(), so it will need to open
>>> each shapefile, unless you define <FeatureCount> as well.
>>> But QGIS will also ask for the field definitions, and will need to open each
>>> ...., unless you define <Field>.
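
To make the elements mentioned above concrete, here is a rough Python sketch that writes a union-layer VRT with the extent, feature count, geometry type and fields declared up front. The element names are those of the OGR VRT driver documentation; the function name build_union_vrt, the field list and the numeric values in the usage note are assumptions for illustration only, and the result should be checked against the driver documentation before use.

```python
import xml.etree.ElementTree as ET


def build_union_vrt(layer_name, sources, extent, fields, geometry_type,
                    feature_count=None):
    """Return the text of an OGR VRT union layer with pre-declared metadata.

    sources: list of (source layer name, path relative to the VRT)
    extent:  (xmin, ymin, xmax, ymax) of the whole union layer
    fields:  list of (field name, OGR field type name), e.g. ("NAME", "String")
    """
    root = ET.Element("OGRVRTDataSource")
    union = ET.SubElement(root, "OGRVRTUnionLayer", {"name": layer_name})
    for src_name, rel_path in sources:
        layer = ET.SubElement(union, "OGRVRTLayer", {"name": src_name})
        src = ET.SubElement(layer, "SrcDataSource", {"relativeToVRT": "1"})
        src.text = rel_path
    ET.SubElement(union, "GeometryType").text = geometry_type
    for name, type_name in fields:
        ET.SubElement(union, "Field", {"name": name, "type": type_name})
    if feature_count is not None:
        ET.SubElement(union, "FeatureCount").text = str(feature_count)
    tags = ("ExtentXMin", "ExtentYMin", "ExtentXMax", "ExtentYMax")
    for tag, value in zip(tags, extent):
        ET.SubElement(union, tag).text = repr(float(value))
    return ET.tostring(root, encoding="unicode")
```

For example, build_union_vrt("Rivers", [("2730_RIVER_LINE_2006_06", "2730/2730_RIVER_LINE_2006_06.shp")], (30.0, -28.0, 31.0, -27.0), [("NAME", "String")], "wkbLineString") returns a one-source VRT; the extent and the NAME field here are made-up values, not taken from the NGI data.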
>>>
>>>> * Before rendering the VRT, does QGIS look at the extents of my
>>>>   viewport and only then physically open my files and render their
>>>>   contents?
>>> QGIS will call SetSpatialFilter() on the layer with the extent, so that
>>> the layer can use a spatial index if it has one. Reviewing my code in
>>> VRT union layer, I can see that the spatial filter will be forwarded to
>>> each source layer. So it will need to open them, but the shapefile
>>> driver won't scan any feature if setting a spatial filter that does not
>>> intersect the extent of the shapefile, so that should be fast. A
>>> possible optimization could be done in the VRT union layer to take into
>>> account the extent of the source layer to avoid iterating on it if the
>>> spatial filter on the union layer doesn't intersect that extent.
>>>
>>> To be efficient, you likely need to compute a .qix spatial index on each
>>> shapefile.
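
The possible optimization described above amounts to a bounding-box test per source layer. A minimal sketch of that idea in Python (this is not code from the VRT driver; the function names and the three extents are hypothetical):

```python
def intersects(a, b):
    """True when boxes a and b, each (xmin, ymin, xmax, ymax), share a point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def sources_to_scan(source_extents, spatial_filter):
    """Names of the source layers whose declared extent meets the filter box."""
    return [name for name, extent in source_extents.items()
            if intersects(extent, spatial_filter)]


# Hypothetical one-degree extents for the three source layers.
extents = {
    "2730": (30.0, -28.0, 31.0, -27.0),
    "2731": (31.0, -28.0, 32.0, -27.0),
    "2732": (32.0, -28.0, 33.0, -27.0),
}
print(sources_to_scan(extents, (30.2, -27.8, 30.6, -27.3)))  # ['2730']
print(sources_to_scan(extents, (30.9, -27.5, 31.1, -27.4)))  # ['2730', '2731']
```

Source layers that fail the test would never need to be opened for that viewport.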
>>>
>>>>   In other words, if I first zoom into a known area, then open my
>>>>   VRT, will QGIS at this stage still open all the subfiles (instead
>>>>   of waiting until a specific subfile needs opening)?
>>>>
>>>> * As I pan around my map, does QGIS open and close the VRT subfiles
>>>>   that are out of my current viewing region?
>>> The VRT driver will maintain a pool of a maximum of 100 source layers by
>>> default (that number can be altered by setting the OGR_VRT_MAX_OPENED
>>> configuration option) and will transparently close the older ones.
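
The pooling behaviour can be pictured with a small bounded cache. The sketch below is only an illustration in Python, not the driver's implementation: the class name LayerPool is hypothetical, and evicting the least recently used handle is an assumption about what "older" means.

```python
from collections import OrderedDict


class LayerPool:
    """Keep at most max_opened handles open, closing the least recently used."""

    def __init__(self, opener, closer, max_opened=100):
        self._opener = opener          # callable: key -> opened handle
        self._closer = closer          # callable: handle -> None
        self._max_opened = max_opened
        self._handles = OrderedDict()  # ordered oldest use -> newest use

    def get(self, key):
        if key in self._handles:
            # Already open: mark it as the most recently used.
            self._handles.move_to_end(key)
            return self._handles[key]
        handle = self._opener(key)
        self._handles[key] = handle
        # Close the oldest handles until the pool is back within its limit.
        while len(self._handles) > self._max_opened:
            _, oldest = self._handles.popitem(last=False)
            self._closer(oldest)
        return handle

    def opened(self):
        """Keys of the currently open handles, oldest use first."""
        return list(self._handles)
```

With max_opened=2, requesting a, b, a and then c closes b, because a was used more recently.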
>>>
>>>> * Presumably if any of my VRT subfiles touch or overlap my current
>>>>   viewport, they would be "processed" depending on what I am doing?
>>>>
>>>> * Is there a way to structure a VRT file so that you can have access
>>>>   to the underlying files that make up the VRT? (Even edit access?)
>>> Not sure what you mean by "have access to". But a union VRT can be opened
>>> in update mode and the update mode will be forwarded to the source
>>> layers (provided they support it). You can delete or modify features.
>>> For creation of new features, you need to specify <SourceLayerFieldName>
>>> as documented in http://gdal.org/drv_vrt.html
>>>
>>>> Or, is the VRT just an easy way to bunch a whole lot of maps under one
>>>> name, and there is no processing benefit depending on the area you are
>>>> viewing or working in?
>>> Your above VRT should work reasonably fast, unless you have several
>>> hundreds or thousands of source layers. In that case, you may need to
>>> define more optional elements in the VRT to avoid the scans, and
>>> some enhancements in the OGRUnionLayer class might be
>>> needed.
>>>
>>> Even
>> Hi Even,
>> Thanks for the detailed thought, and for the effort of reviewing your code.
>> I'm fiddling with setting up quite a big dataset - likely to have over
>> 1000 shapefiles in the VRT - maybe even up to 3000 - but I will
>> experiment and see what is both logical and practical.
>> My goal with the above questions is to try to avoid opening all the
>> shapefiles at the time the VRT is opened, so that there won't be a
>> "million and one" physical disk IOs.
>> If the user then loads my VRT with rendering off, it should load very
>> quickly (if I can supply all the details needed, in the VRT file).
>> Once the user has zoomed into his/her area of interest, and turns
>> rendering on for the VRT, then (hopefully) only the underlying
>> shapefiles in that AOI need to be physically accessed.
>>
>> So, how compatible is the current code when opening a VRT, to zeroing
>> the need to open any underlying VRT files before any rendering or other
>> operations are done by the user, and if the user is "zoomed in", to
>> limiting the underlying VRT file-actions only to those affected by the
>> current zoom level?
> Zoltan,
>
> You definitely need to define all fields (otherwise the VRT driver will open each
> file to compute the union of fields) or declare
> <FieldStrategy>FirstLayer</FieldStrategy>, as well as the geometry type and the
> global extent. And ultimately, you would also need to declare the extent per layer,
> once an extra optimization has been done in the union layer to avoid opening
> files whose declared extent doesn't intersect the area of interest declared by
> SetSpatialFilter().
> An enhanced version of ogrbuildvrt, written as a Python script that
> retrieves all the needed information from the source layers, could be useful
> too.
> I can also imagine an interesting improvement: you could declare some option
> in the VRT saying that by default GetNextFeature() on the union layer should
> return nothing, except if the area of interest doesn't cover more than X
> source layers, so that when the VRT is zoomed out, one doesn't try to open
> thousands of files.
> If you're interested in some of those improvements, you can contact me.
> Perhaps some spatial indexing of the bounding boxes of the shapefiles would
> also help, instead of sequential iteration, but for a few thousand of them
> that probably isn't necessary yet (spatial indices are generally worthwhile
> only starting from tens of thousands of geometries).
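
The proposed "return nothing when zoomed out" behaviour could look roughly like the Python sketch below. It is purely illustrative: no such VRT option exists in the thread, the names select_sources and max_sources are hypothetical, and the grid of one-degree extents is made up. The extents are scanned sequentially, with no spatial index.

```python
def select_sources(source_extents, area_of_interest, max_sources):
    """Return the sources to read for a viewport, or [] when there are too many.

    source_extents:   dict mapping a source key to (xmin, ymin, xmax, ymax)
    area_of_interest: (xmin, ymin, xmax, ymax) of the current viewport
    max_sources:      threshold X above which nothing is returned
    """
    ax0, ay0, ax1, ay1 = area_of_interest
    hits = [key for key, (x0, y0, x1, y1) in source_extents.items()
            if x0 <= ax1 and ax0 <= x1 and y0 <= ay1 and ay0 <= y1]
    return hits if len(hits) <= max_sources else []


# Made-up grid of one-degree squares, keyed by their lower-left corner.
grid = {(lon, lat): (lon, lat, lon + 1, lat + 1)
        for lon in range(16, 33) for lat in range(-35, -22)}

print(select_sources(grid, (30.2, -27.8, 30.6, -27.3), max_sources=4))
# [(30, -28)]
print(select_sources(grid, (16, -35, 33, -22), max_sources=4))
# []
```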
>
>> ogrinfo -al -so gives a lot of info that could be added to the static
>> VRT file, but is it enough to stop QGIS's implementation of VRT from
>> physically querying the underlying files until absolutely necessary?
>>
>> Also, when a VRT is opened, do you really need all the knowledge (like
>> featurecount) at this stage?
> I know QGIS asks for that in some situations, for example when displaying
> information about layers when opening a multi-layer dataset. You could likely
> put in a dummy value, like -1, since I don't think QGIS uses it except for
> informative purposes.
>
>> One negative of me putting the featurecount into the VRT xml, is that
>> someone could change that shape file, and the actual feature count would
>> then differ from that in the xml file.
>>
>> So, probably to negate the direction I am hoping to go in (like putting
>> details into the VRT file so that opening the VRT would cause minimal
>> disk io), the correct way would be to optimise the QGIS code so that the
>> information about the underlying files is only read by QGIS when
>> absolutely necessary.
> Yes, there are perhaps optimizations possible on QGIS side as well. If you use
> GDAL trunk, compiled as a debug build, you can use the OGR C API spy
> mechanism. I added it recently to help debug my improvements in the
> MapInfo driver when I spotted bugs in it while using QGIS. See
> http://www.gdal.org/ograpispy_8h.html
>
> Even
>
OK - some good thoughts there, thanks.
At the moment it is knowledge gaining and experimenting for me, but if I
ever need to deploy what I am trying to set up, I'll most certainly try
to influence any possible optimisation - by whatever method makes
everyone happy to get involved.
Regards & keep well,
Zoltan
--
===========================================
Zoltan Szecsei PrGISc [PGP0031]
Geograph (Pty) Ltd.
GIS and Photogrammetric Services
P.O. Box 7, Muizenberg 7950, South Africa.
Mobile: +27-83-6004028
Fax: +27-86-6115323 www.geograph.co.za
===========================================