[Qgis-developer] VRT functionality

Zoltan Szecsei zoltans at geograph.co.za
Mon Oct 27 00:04:21 PDT 2014


On 2014/10/26 19:22, Even Rouault wrote:
> On Sunday 26 October 2014 16:44:37, Zoltan Szecsei wrote:
>> Hi,
>> I just want to clear up my mindset as to how a VRT is implemented in QGIS.
> Zoltan,
>
> In fact those are more OGR questions than QGIS questions. QGIS makes no
> distinction between reading a plain shapefile (through OGR) and reading a VRT.
>
>> I'd like to understand when QGIS opens a file, when it reads the
>> contents, and when it writes (if need be) and closes a file.
>> In this context, I am thinking about SHP files - especially the NGI
>> dataset which comes out "cut up" into degree squares.
>> Let's just deal with 1 feature type: Rivers lines. My VRT looks like:
>> <OGRVRTDataSource>
>>     <OGRVRTUnionLayer name="Rivers">
>>     <OGRVRTLayer name="2730_RIVER_LINE_2006_06">
>>       <SrcDataSource
>> relativeToVRT="1">2730/2730_RIVER_LINE_2006_06.shp</SrcDataSource>
>>     </OGRVRTLayer>
>>     <OGRVRTLayer name="2731_RIVER_LINE_2006_04">
>>       <SrcDataSource
>> relativeToVRT="1">2731/2731_RIVER_LINE_2006_04.shp</SrcDataSource>
>>     </OGRVRTLayer>
>>     <OGRVRTLayer name="2732_RIVER_LINE_2006_04">
>>       <SrcDataSource
>> relativeToVRT="1">2732/2732_RIVER_LINE_2006_04.shp</SrcDataSource>
>>     </OGRVRTLayer>
>>     </OGRVRTUnionLayer>
>> </OGRVRTDataSource>
>>
>>    * When I open the VRT in QGIS, does QGIS open ALL the VRT files and
>>      look for the extent of each of the files?
> If QGIS issues a GetExtent() on the VRT, then with the above definition it
> will query the 3 shapefiles to find the extent of each. But on shapefiles this
> is a fast operation.
> You could define <ExtentXMin>, etc. inside OGRVRTUnionLayer if you
> really want a fast GetExtent().
>
>>        o If my VRT had the extents included for each of the files, would
>>          this stop QGIS from (at this stage) opening the files and
>>          reading the extents?
> Yes, but QGIS probably calls GetFeatureCount(), so it will need to open each
> shapefile, unless you define <FeatureCount> as well.
> QGIS will also ask for the field definitions, and will need to open each ...
> unless you define <Field> elements.
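To illustrate the suggestion above, here is a minimal sketch using only the Python standard library that writes a union VRT carrying the optional <ExtentXMin>/<ExtentYMin>/<ExtentXMax>/<ExtentYMax>, <FeatureCount> and <Field> elements alongside the source layers. The function name `build_union_vrt`, the field list and any numbers passed to it are hypothetical; check the element names and their placement against the VRT driver documentation (http://gdal.org/drv_vrt.html) for your GDAL version.

```python
import xml.etree.ElementTree as ET


def build_union_vrt(layer_name, sources, extent, feature_count, fields):
    """Return the XML text of an OGR union VRT with precomputed metadata.

    sources: list of (source_layer_name, shapefile_path_relative_to_vrt)
    extent: (xmin, ymin, xmax, ymax) of the whole union layer
    feature_count: total number of features over all source layers
    fields: list of (field_name, ogr_field_type), e.g. ("NAME", "String")
    """
    root = ET.Element("OGRVRTDataSource")
    union = ET.SubElement(root, "OGRVRTUnionLayer", name=layer_name)

    # One OGRVRTLayer per shapefile, as in the hand-written VRT above.
    for src_name, rel_path in sources:
        layer = ET.SubElement(union, "OGRVRTLayer", name=src_name)
        src = ET.SubElement(layer, "SrcDataSource", relativeToVRT="1")
        src.text = rel_path

    # Static metadata, intended to let the driver answer extent,
    # feature-count and field-definition requests without opening
    # every source shapefile.
    tags = ("ExtentXMin", "ExtentYMin", "ExtentXMax", "ExtentYMax")
    for tag, value in zip(tags, extent):
        ET.SubElement(union, tag).text = repr(float(value))
    ET.SubElement(union, "FeatureCount").text = str(feature_count)
    for field_name, field_type in fields:
        ET.SubElement(union, "Field", name=field_name, type=field_type)

    return ET.tostring(root, encoding="unicode")
```

Generating the file rather than editing it by hand makes it cheap to regenerate whenever a source shapefile changes, which matters because static counts and extents go stale otherwise.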
>
>>    * Before rendering the VRT, does QGIS look at the extents of my
>>      viewport and only physically open my files and render its contents?
> QGIS will call SetSpatialFilter() on the layer with the view extent, so that
> the layer can use a spatial index if it has one. Reviewing my code in the VRT
> union layer, I can see that the spatial filter will be forwarded to each source
> layer. So it will need to open them, but the shapefile driver won't scan any
> feature if the spatial filter does not intersect the extent of the
> shapefile, so that should be fast. A possible optimization could be done in the
> VRT union layer: take into account the extent of each source layer to avoid
> iterating over it if the spatial filter on the union layer doesn't intersect
> that extent.
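The optimization described here is only proposed, not current behaviour. A minimal Python sketch of the idea, assuming the extent of each source layer is already known (for instance cached in the VRT), is a plain bounding-box intersection test; `BBox` and `sources_to_scan` are hypothetical names, not GDAL API.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Boxes that merely touch along an edge count as intersecting.
        return not (self.xmax < other.xmin or other.xmax < self.xmin or
                    self.ymax < other.ymin or other.ymax < self.ymin)


def sources_to_scan(source_extents, spatial_filter):
    """Return the names of source layers worth iterating over.

    source_extents: dict mapping source layer name -> BBox
    spatial_filter: BBox of the current spatial filter (e.g. the view)
    Layers whose extent misses the filter are skipped entirely, so they
    would not even need to be opened.
    """
    return [name for name, ext in source_extents.items()
            if ext.intersects(spatial_filter)]
```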
>
> To be efficient, you will likely need to compute a .qix spatial index on each shapefile.
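With hundreds or thousands of shapefiles the .qix files are best created in a batch. The OGR shapefile driver accepts the statement `CREATE SPATIAL INDEX ON <layer>` through `ogrinfo -sql`; the sketch below only builds those command lines, assuming GDAL's `ogrinfo` is on the PATH, so they can be reviewed before being run (for example with `subprocess.run(cmd, check=True)`). `qix_commands` is a hypothetical helper.

```python
from pathlib import PurePath


def qix_commands(shapefiles):
    """Build one ogrinfo invocation per shapefile that asks the OGR
    shapefile driver to write a .qix spatial index next to the file."""
    cmds = []
    for shp in shapefiles:
        # For a shapefile, the OGR layer name is the file name without .shp.
        layer = PurePath(shp).stem
        cmds.append(["ogrinfo", shp, "-sql", f"CREATE SPATIAL INDEX ON {layer}"])
    return cmds
```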
>
>
>>      In other words, if I first zoom into a known area and then open my VRT,
>>      will QGIS at this stage still open all the subfiles, instead of
>>      waiting until a specific subfile needs opening?
>>
>>    * As I pan around my map, does QGIS open and close the VRT subfiles
>>      that are out of my current viewing region?
> The VRT driver will maintain a pool of at most 100 source layers by
> default (that number can be altered by setting the OGR_VRT_MAX_OPENED
> configuration option) and will transparently close the oldest ones.
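The pooling can be pictured as a least-recently-used cache. The Python model below is only an illustration under that assumption: `LayerPool`, `open_fn` and `close_fn` are hypothetical, and GDAL's actual eviction order may not be strict LRU. With the GDAL command-line tools the limit itself can be raised with `--config OGR_VRT_MAX_OPENED <n>`.

```python
from collections import OrderedDict


class LayerPool:
    """Keep at most `max_opened` source layers open, closing the least
    recently used one when another layer has to be opened."""

    def __init__(self, open_fn, close_fn, max_opened=100):
        self._open = open_fn
        self._close = close_fn
        self._max = max_opened
        self._opened = OrderedDict()  # path -> handle, least recent first

    def get(self, path):
        if path in self._opened:
            self._opened.move_to_end(path)  # mark as most recently used
            return self._opened[path]
        if len(self._opened) >= self._max:
            _, old_handle = self._opened.popitem(last=False)
            self._close(old_handle)  # evict the least recently used layer
        handle = self._open(path)
        self._opened[path] = handle
        return handle
```

Panning back and forth over the same few tiles keeps their handles warm, while tiles not touched for a while are closed automatically.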
>
>>    * Presumably if any of my VRT subfiles touch or overlap my current
>>      viewport, they would be "processed" depending on what I am doing?
>>
>>    * Is there a way to structure a VRT file so that you can have access
>>      to the underlying files that make up the VRT? (Even edit access?)
> Not sure what you mean by "have access to". But a union VRT can be opened in
> update mode, and the update mode will be forwarded to the source layers
> (provided they support it). You can delete or modify features. For creation of
> new features, you need to specify <SourceLayerFieldName> as documented in
> http://gdal.org/drv_vrt.html
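For feature creation, <SourceLayerFieldName> names an extra attribute on the union layer whose value identifies the source layer of each feature, so a new feature has to carry the name of the source layer it should be written to (assuming the behaviour described in the VRT documentation). The sketch below models just that routing decision in Python; `target_source_layer` and the field name `src_layer` are hypothetical, and this is not the GDAL implementation.

```python
def target_source_layer(feature_attrs, source_layer_names, source_field="src_layer"):
    """Return the source layer a new union-layer feature should go to.

    feature_attrs: dict of attribute name -> value for the new feature
    source_layer_names: names of the OGRVRTLayer children of the union
    source_field: the field named in <SourceLayerFieldName> (hypothetical)
    """
    target = feature_attrs.get(source_field)
    if target is None:
        # Without the routing field there is no way to pick a shapefile.
        raise ValueError(f"new feature has no '{source_field}' value")
    if target not in source_layer_names:
        raise ValueError(f"unknown source layer: {target!r}")
    return target
```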
>
>>
>> Or, is the VRT just an easy way to bunch a whole lot of maps under one
>> name, and there is no processing benefit depending on the area you are
>> viewing or working in?
> Your VRT above should work reasonably fast, unless you have several hundreds
> or thousands of source layers, in which case you may need to define more
> optional elements in the VRT to avoid the scans, and there would perhaps be a
> need for some enhancements in the OGRUnionLayer class.
>
> Even
>

Hi Even,
Thanks for the detailed thought, and for the effort of reviewing your code.
I'm fiddling with setting up quite a big dataset - likely to have over 
1000 shapefiles in the VRT - maybe even up to 3000 - but I will 
experiment and see what is both logical and practical.
My goal with the above questions is to try to avoid opening all the 
shapefiles at the time the VRT is opened, so that there won't be a 
"million and one" physical disk IOs.
If the user then loads my VRT with rendering off, it should load very 
quickly (if I can supply all the needed details in the VRT file).
Once the user has zoomed into his/her area of interest, and turns 
rendering on for the VRT, then (hopefully) only the underlying 
shapefiles in that AOI need to be physically accessed.

So, how close does the current code come, when opening a VRT, to 
eliminating the need to open any of the underlying files before the user 
does any rendering or other operation, and, when the user is zoomed in, 
to limiting file access to only the underlying files affected by the 
current view?

ogrinfo -al -so gives a lot of info that could be added to the static 
VRT file, but is it enough to stop QGIS's implementation of VRT from 
physically querying the underlying files until absolutely necessary?
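Copying those values by hand for 1000 to 3000 shapefiles would be error-prone, so the `Feature Count` and `Extent` lines of `ogrinfo -al -so` could be parsed and merged automatically into the totals a union VRT needs. The sketch below assumes the usual single-layer summary format (`Feature Count: N` and `Extent: (xmin, ymin) - (xmax, ymax)`); the helper names are hypothetical.

```python
import re

# Patterns for the summary lines printed by `ogrinfo -al -so` (assumed format).
_COUNT_RE = re.compile(r"^\s*Feature Count: (\d+)", re.M)
_NUM = r"([-+0-9.eE]+)"
_EXTENT_RE = re.compile(
    rf"^\s*Extent: \({_NUM}, {_NUM}\) - \({_NUM}, {_NUM}\)", re.M)


def parse_ogrinfo_summary(text):
    """Return (feature_count, (xmin, ymin, xmax, ymax)) for the first
    layer described in `ogrinfo -al -so` output."""
    count = _COUNT_RE.search(text)
    extent = _EXTENT_RE.search(text)
    if count is None or extent is None:
        raise ValueError("no 'Feature Count' or 'Extent' line found")
    return int(count.group(1)), tuple(float(v) for v in extent.groups())


def merge_summaries(summaries):
    """Combine per-shapefile summaries into the union layer's total
    feature count and overall extent."""
    counts, extents = zip(*summaries)
    xmins, ymins, xmaxs, ymaxs = zip(*extents)
    return sum(counts), (min(xmins), min(ymins), max(xmaxs), max(ymaxs))
```

Rerunning such a script whenever a shapefile is edited is one way to keep static counts and extents in the VRT from drifting out of date.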

Also, when a VRT is opened, do you really need all the knowledge (like 
the feature count) at this stage?
One drawback of putting the feature count into the VRT XML is that 
someone could change that shapefile, and the actual feature count would 
then differ from the one in the XML file.

So, perhaps contrary to the direction I was hoping to go in (putting 
details into the VRT file so that opening the VRT causes minimal 
disk I/O), the correct approach may be to optimise the QGIS code so that 
information about the underlying files is only read by QGIS when 
absolutely necessary.


Regards & thanks again for your interest.
Zoltan



-- 

===========================================
Zoltan Szecsei PrGISc [PGP0031]
Geograph (Pty) Ltd.
GIS and Photogrammetric Services

P.O. Box 7, Muizenberg 7950, South Africa.

Mobile: +27-83-6004028
Fax:    +27-86-6115323     www.geograph.co.za
===========================================


