[Qgis-developer] VRT functionality

Even Rouault even.rouault at spatialys.com
Mon Oct 27 01:38:55 PDT 2014


On Monday 27 October 2014 08:04:21, Zoltan Szecsei wrote:
> On 2014/10/26 19:22, Even Rouault wrote:
> > On Sunday 26 October 2014 16:44:37, Zoltan Szecsei wrote:
> >> Hi,
> >> I just want to clear up my mindset as to how a VRT is implemented in
> >> QGIS.
> > 
> > Zoltan,
> > 
> > In fact, these are OGR questions more than QGIS questions. QGIS makes no
> > distinction between reading a plain shapefile (through OGR) and a VRT.
> > 
> >> I'd like to understand when QGIS opens a file, when it reads the
> >> contents, and when it writes (if need be) and closes a file.
> >> In this context, I am thinking about SHP files - especially the NGI
> >> dataset which comes out "cut up" into degree squares.
> >> Let's just deal with one feature type: river lines. My VRT looks like:
> >> <OGRVRTDataSource>
> >>   <OGRVRTUnionLayer name="Rivers">
> >>     <OGRVRTLayer name="2730_RIVER_LINE_2006_06">
> >>       <SrcDataSource relativeToVRT="1">2730/2730_RIVER_LINE_2006_06.shp</SrcDataSource>
> >>     </OGRVRTLayer>
> >>     <OGRVRTLayer name="2731_RIVER_LINE_2006_04">
> >>       <SrcDataSource relativeToVRT="1">2731/2731_RIVER_LINE_2006_04.shp</SrcDataSource>
> >>     </OGRVRTLayer>
> >>     <OGRVRTLayer name="2732_RIVER_LINE_2006_04">
> >>       <SrcDataSource relativeToVRT="1">2732/2732_RIVER_LINE_2006_04.shp</SrcDataSource>
> >>     </OGRVRTLayer>
> >>   </OGRVRTUnionLayer>
> >> </OGRVRTDataSource>
> >> 
> >>    * When I open the VRT in QGIS, does QGIS open ALL the files in the VRT
> >>      and look for the extent of each of them?
> > 
> > If QGIS issues a GetExtent() on the VRT, then with the above definition
> > it will query the 3 shapefiles to find the extent of each. But on
> > shapefiles this is a fast operation.
> > You could define <ExtentXMin>, <ExtentYMin>, <ExtentXMax> and <ExtentYMax>
> > inside the OGRVRTUnionLayer element if you really want a fast GetExtent().
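
A minimal sketch of declaring that global extent, using only Python's standard library: it takes per-shapefile extents gathered beforehand (the numbers below are made up), computes their union, and adds the four Extent* elements to the union layer. The helper name add_union_extent is hypothetical; check element names and placement against http://gdal.org/drv_vrt.html.

```python
import xml.etree.ElementTree as ET


def add_union_extent(union_layer, extents):
    """Declare the global extent on an OGRVRTUnionLayer element.

    extents: iterable of (xmin, ymin, xmax, ymax) tuples, one per source
    layer (hypothetical input, e.g. collected beforehand with ogrinfo).
    Returns the union extent as a tuple.
    """
    extents = list(extents)
    union = (
        min(e[0] for e in extents),
        min(e[1] for e in extents),
        max(e[2] for e in extents),
        max(e[3] for e in extents),
    )
    for tag, value in zip(("ExtentXMin", "ExtentYMin", "ExtentXMax", "ExtentYMax"), union):
        ET.SubElement(union_layer, tag).text = str(value)
    return union


ds = ET.Element("OGRVRTDataSource")
rivers = ET.SubElement(ds, "OGRVRTUnionLayer", name="Rivers")
add_union_extent(rivers, [(30.0, -28.0, 31.0, -27.0), (31.0, -28.0, 32.0, -27.0)])
print(ET.tostring(ds, encoding="unicode"))
```

In a real file the OGRVRTLayer children would sit in the same OGRVRTUnionLayer element next to these declarations.
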
> > 
> >>        o If my VRT had the extents included for each of the files, would
> >>          this stop QGIS from (at this stage) opening the files and
> >>          reading the extents?
> > 
> > Yes, but QGIS probably calls GetFeatureCount(), so the driver will still
> > need to open each shapefile unless you define <FeatureCount> as well.
> > QGIS will also ask for the field definitions, which again requires opening
> > each shapefile unless you declare the fields with <Field> elements.
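
Along the same lines, a hedged sketch of declaring <FeatureCount> and <Field> on the union layer so that both questions can be answered from the XML alone. The per-layer counts and the field names NAME and LENGTH are invented for illustration; real values would come from the source shapefiles. The type names follow OGR's field types (String, Real, Integer, ...).

```python
import xml.etree.ElementTree as ET


def declare_schema(union_layer, feature_counts, fields):
    """Add <FeatureCount> and <Field> declarations to an OGRVRTUnionLayer.

    feature_counts: per-source-layer counts; their sum is declared so that
    GetFeatureCount() can be answered without opening every shapefile.
    fields: (name, type) pairs using OGR field type names, e.g. "String".
    """
    ET.SubElement(union_layer, "FeatureCount").text = str(sum(feature_counts))
    for name, field_type in fields:
        ET.SubElement(union_layer, "Field", name=name, type=field_type)


rivers = ET.Element("OGRVRTUnionLayer", name="Rivers")
# Hypothetical counts and field names, for illustration only.
declare_schema(rivers, [1520, 980, 2210], [("NAME", "String"), ("LENGTH", "Real")])
print(ET.tostring(rivers, encoding="unicode"))
```

The trade-off is that a declared count goes stale if a shapefile is edited later.
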
> > 
> >>    * Before rendering the VRT, does QGIS look at the extents of my
> >>      viewport and only physically open my files and render their
> >>      contents?
> > 
> > QGIS will call SetSpatialFilter() on the layer with the view extent, so
> > that the layer can use a spatial index if it has one. Reviewing my code
> > for the VRT union layer, I can see that the spatial filter will be
> > forwarded to each source layer. So the driver will need to open them,
> > but the shapefile driver won't scan any feature if the spatial filter
> > does not intersect the extent of the shapefile, so that should be fast.
> > A possible optimization in the VRT union layer would be to take into
> > account the extent of each source layer, and skip iterating on it when
> > the spatial filter on the union layer doesn't intersect that extent.
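
The optimization suggested above can be sketched in plain Python: given an extent declared for each source layer, only the layers whose box intersects the spatial filter would be opened. This illustrates the idea only and is not the OGRUnionLayer code; the extents in the usage are hypothetical.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def layers_to_open(declared: Dict[str, BBox], spatial_filter: BBox) -> List[str]:
    """Source layers whose declared extent meets the spatial filter.

    Layers outside the filter are skipped without opening their files.
    """
    return [name for name, extent in declared.items() if intersects(extent, spatial_filter)]


# Hypothetical extents for three adjacent one-degree tiles.
declared = {
    "2730_RIVER_LINE_2006_06": (30.0, -28.0, 31.0, -27.0),
    "2731_RIVER_LINE_2006_04": (31.0, -28.0, 32.0, -27.0),
    "2732_RIVER_LINE_2006_04": (32.0, -28.0, 33.0, -27.0),
}
print(layers_to_open(declared, (30.2, -27.8, 30.6, -27.3)))  # ['2730_RIVER_LINE_2006_06']
```
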
> > 
> > To be efficient, you likely need to compute a .qix spatial index for
> > each shapefile.
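
One way to produce the .qix files, assuming the shapefile driver's CREATE SPATIAL INDEX ON <layer> SQL statement issued through ogrinfo -sql. The sketch only builds the command lines (a shapefile's layer name is its file name without the extension); each can then be run with subprocess.run(cmd, check=True) once GDAL's command-line tools are on the PATH.

```python
import os
import shlex


def qix_commands(shapefiles):
    """One ogrinfo command per shapefile that asks the shapefile driver to
    write a .qix spatial index via its CREATE SPATIAL INDEX ON <layer> SQL."""
    commands = []
    for path in shapefiles:
        layer = os.path.splitext(os.path.basename(path))[0]  # layer name = file stem
        commands.append(["ogrinfo", path, "-sql", f"CREATE SPATIAL INDEX ON {layer}"])
    return commands


for cmd in qix_commands(["2730/2730_RIVER_LINE_2006_06.shp", "2731/2731_RIVER_LINE_2006_04.shp"]):
    print(shlex.join(cmd))  # or: subprocess.run(cmd, check=True)
```
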
> > 
> >>      (In other words, if I first zoom into a known area and then open my
> >>      VRT, will QGIS at this stage still open all the subfiles, instead
> >>      of waiting until a specific subfile needs opening?)
> >>    
> >>    * As I pan around my map, does QGIS open and close the VRT subfiles
> >>      that are out of my current viewing region?
> > 
> > The VRT driver maintains a pool of at most 100 opened source layers by
> > default (that number can be changed with the OGR_VRT_MAX_OPENED
> > configuration option) and transparently closes the older ones.
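
To picture that pool, here is a toy Python model of a bounded set of opened source layers. It assumes a least-recently-used eviction order and uses stand-in open/close callbacks; it illustrates the described behaviour and is not the driver's code. With the Python bindings the limit itself can be raised with gdal.SetConfigOption("OGR_VRT_MAX_OPENED", "1000"), or with --config OGR_VRT_MAX_OPENED 1000 on GDAL command-line tools.

```python
from collections import OrderedDict


class SourceLayerPool:
    """Toy bounded pool of opened source layers (LRU order is an assumption)."""

    def __init__(self, open_fn, close_fn, max_opened=100):
        self._open = open_fn
        self._close = close_fn
        self._max = max_opened
        self._layers = OrderedDict()  # path -> opened handle, oldest first

    def get(self, path):
        if path in self._layers:
            self._layers.move_to_end(path)  # mark as most recently used
            return self._layers[path]
        if len(self._layers) >= self._max:
            _, old_handle = self._layers.popitem(last=False)
            self._close(old_handle)  # evict the least recently used layer
        handle = self._open(path)
        self._layers[path] = handle
        return handle

    def opened(self):
        return list(self._layers)


closed = []
pool = SourceLayerPool(open_fn=lambda p: p, close_fn=closed.append, max_opened=2)
for p in ["a.shp", "b.shp", "a.shp", "c.shp"]:
    pool.get(p)
print(pool.opened(), closed)  # ['a.shp', 'c.shp'] ['b.shp']
```
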
> > 
> >>    * Presumably if any of my VRT subfiles touch or overlap my current
> >>      viewport, they would be "processed" depending on what I am doing?
> >>
> >>    * Is there a way to structure a VRT file so that you can have access
> >>      to the underlying files that make up the VRT? (Even edit access?)
> > 
> > Not sure what you mean by "have access to". But a union VRT can be opened
> > in update mode, and the update mode will be forwarded to the source
> > layers (provided they support it). You can delete or modify features.
> > To create new features, you need to specify <SourceLayerFieldName>,
> > as documented in http://gdal.org/drv_vrt.html
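
A sketch of a union layer prepared for feature creation, built with Python's ElementTree. As we read the driver documentation, <SourceLayerFieldName> names an extra field holding each feature's source layer name, and a newly created feature must carry the target layer's name in that field. The field name SRC_LAYER and the helper name are hypothetical.

```python
import xml.etree.ElementTree as ET


def union_layer_for_update(name, sources, source_field="SRC_LAYER"):
    """Build an OGRVRTUnionLayer that declares <SourceLayerFieldName>.

    sources: (layer_name, relative_path) pairs, one per shapefile.
    source_field: hypothetical name of the field that will hold, for each
    feature, the name of the source layer it belongs to.
    """
    union = ET.Element("OGRVRTUnionLayer", name=name)
    ET.SubElement(union, "SourceLayerFieldName").text = source_field
    for layer_name, path in sources:
        layer = ET.SubElement(union, "OGRVRTLayer", name=layer_name)
        ET.SubElement(layer, "SrcDataSource", relativeToVRT="1").text = path
    return union


u = union_layer_for_update("Rivers", [("2730_RIVER_LINE_2006_06", "2730/2730_RIVER_LINE_2006_06.shp")])
print(ET.tostring(u, encoding="unicode"))
```
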
> > 
> >> Or, is the VRT just an easy way to bunch a whole lot of maps under one
> >> name, with no processing benefit depending on the area you are viewing
> >> or working in?
> > 
> > Your VRT above should work reasonably fast, unless you have several
> > hundreds or thousands of source layers. In that case, you may need to
> > define more optional elements in the VRT to avoid the scans, and some
> > enhancements to the OGRUnionLayer class might be needed.
> > 
> > Even
> 
> Hi Even,
> Thanks for the detailed thought, and for the effort of reviewing your code.
> I'm fiddling with setting up quite a big dataset - likely to have over
> 1000 shapefiles in the VRT - maybe even up to 3000 - but I will
> experiment and see what is both logical and practical.
> My goal with the above questions is to try to avoid opening all the
> shapefiles at the time the VRT is opened, so that there won't be a
> "million and one" physical disk IOs.
> If the user then loads my VRT with rendering off, it should load very
> quickly (if I can supply all the details needed, in the VRT file).
> Once the user has zoomed into his/her area of interest, and turns
> rendering on for the VRT, then (hopefully) only the underlying
> shapefiles in that AOI need to be physically accessed.
> 
> So, how close is the current code, when opening a VRT, to eliminating
> the need to open any of the underlying files before the user does any
> rendering or other operation, and, if the user is zoomed in, to limiting
> file access to only the underlying files affected by the current zoom
> level?

Zoltan,

You definitely need to define all fields (otherwise the VRT driver will open
each file to compute the union of fields) or declare
<FieldStrategy>FirstLayer</FieldStrategy>, and also declare the geometry type
and the global extent. Ultimately, you would also need to declare the extent
per layer, once an extra optimization has been done in the union layer to
avoid opening files whose declared extent doesn't intersect the area of
interest set by SetSpatialFilter().
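
What those declarations might look like when generated, as a standard-library Python sketch: FieldStrategy set to FirstLayer, a declared GeometryType, and an extent on each OGRVRTLayer (the global extent can be added to the union layer the same way). Per the paragraph above, the per-layer extents only pay off once the union layer uses them; wkbLineString and the sample tile extent are assumptions for a river-lines layer.

```python
import xml.etree.ElementTree as ET


def build_union_layer(name, sources, geometry_type="wkbLineString"):
    """Build an OGRVRTUnionLayer carrying the declarations listed above.

    sources: (layer_name, relative_path, (xmin, ymin, xmax, ymax)) tuples.
    """
    union = ET.Element("OGRVRTUnionLayer", name=name)
    # Take the field list from the first source instead of scanning them all.
    ET.SubElement(union, "FieldStrategy").text = "FirstLayer"
    ET.SubElement(union, "GeometryType").text = geometry_type
    for layer_name, path, extent in sources:
        layer = ET.SubElement(union, "OGRVRTLayer", name=layer_name)
        ET.SubElement(layer, "SrcDataSource", relativeToVRT="1").text = path
        # Per-layer extent, useful once the union layer honours it.
        for tag, value in zip(("ExtentXMin", "ExtentYMin", "ExtentXMax", "ExtentYMax"), extent):
            ET.SubElement(layer, tag).text = str(value)
    return union


u = build_union_layer("Rivers", [
    ("2730_RIVER_LINE_2006_06", "2730/2730_RIVER_LINE_2006_06.shp", (30.0, -28.0, 31.0, -27.0)),
])
print(ET.tostring(u, encoding="unicode"))
```
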
An enhanced version of ogrbuildvrt, as a Python script that retrieves all the
needed information from the source layers, could be useful too.
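
A first building block for such a script, assuming ogrinfo -al -so prints its usual "Layer name:", "Feature Count:" and "Extent: (xmin, ymin) - (xmax, ymax)" lines: parse that summary into per-layer counts and extents, which can then be written into the VRT. The sample output below is made up for illustration.

```python
import re

_NUM = r"(-?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)"
_LAYER = re.compile(r"^Layer name: (.+)$", re.MULTILINE)
_COUNT = re.compile(r"^Feature Count: (-?\d+)", re.MULTILINE)
_EXTENT = re.compile(rf"^Extent: \({_NUM}, {_NUM}\) - \({_NUM}, {_NUM}\)", re.MULTILINE)


def parse_ogrinfo_summary(text):
    """Map each layer name to (feature_count, extent) from `ogrinfo -al -so` text.

    extent is (xmin, ymin, xmax, ymax), or None when no Extent line is found.
    """
    result = {}
    starts = [m.start() for m in _LAYER.finditer(text)] + [len(text)]
    for begin, end in zip(starts, starts[1:]):
        block = text[begin:end]  # one layer's section of the summary
        name = _LAYER.search(block).group(1).strip()
        count = _COUNT.search(block)
        extent = _EXTENT.search(block)
        result[name] = (
            int(count.group(1)) if count else None,
            tuple(float(v) for v in extent.groups()) if extent else None,
        )
    return result


# Made-up output for one shapefile; in practice capture it with
# subprocess.run(["ogrinfo", "-al", "-so", path], capture_output=True, text=True).stdout
sample = """INFO: Open of `2730/2730_RIVER_LINE_2006_06.shp'
      using driver `ESRI Shapefile' successful.

Layer name: 2730_RIVER_LINE_2006_06
Geometry: Line String
Feature Count: 1520
Extent: (30.000000, -28.000000) - (31.000000, -27.000000)
"""
print(parse_ogrinfo_summary(sample))
```
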
I can also imagine an interesting improvement: you could declare an option in
the VRT saying that, by default, GetNextFeature() on the union layer should
return nothing unless the area of interest covers no more than X source
layers, so that when the VRT is zoomed out, the driver doesn't try to open
thousands of files.
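
The proposed option can be illustrated with a short Python function. It sketches a proposed behaviour, not an existing VRT option: max_layers plays the role of X, and read_layer is a hypothetical stand-in for opening a shapefile and iterating over its features.

```python
def features_for_view(source_extents, spatial_filter, max_layers, read_layer):
    """Sketch of a 'zoomed-out guard' for a union layer.

    source_extents: {layer_name: (xmin, ymin, xmax, ymax)} declared in the VRT.
    Returns no features when the area of interest touches more than
    max_layers source layers; otherwise reads only the intersecting ones.
    """
    fx0, fy0, fx1, fy1 = spatial_filter
    hits = [name for name, (x0, y0, x1, y1) in source_extents.items()
            if x0 <= fx1 and fx0 <= x1 and y0 <= fy1 and fy0 <= y1]
    if len(hits) > max_layers:
        return []  # zoomed out too far: do not open any file
    features = []
    for name in hits:
        features.extend(read_layer(name))  # only now is a file opened
    return features


extents = {"a": (0, 0, 1, 1), "b": (1, 0, 2, 1), "c": (2, 0, 3, 1)}
fake_reader = lambda name: [f"{name}:feature"]
print(features_for_view(extents, (0, 0, 3, 1), 2, fake_reader))          # []
print(features_for_view(extents, (0.2, 0.2, 0.5, 0.5), 2, fake_reader))  # ['a:feature']
```
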
If you're interested in some of those improvements, you can contact me. 
Perhaps some spatial indexing of the bounding boxes of the shapefiles would
also help, instead of sequential iteration, but for a few thousand files that
probably isn't necessary yet (spatial indices are generally worthwhile only
starting from tens of thousands of geometries).
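
If indexing the shapefile bounding boxes ever becomes worthwhile, a uniform grid is one simple option (swapped in here for illustration; an R-tree is the more common choice). The sketch buckets each box into the grid cells it covers and confirms candidates with an exact box test; the cell size is in map units and is an assumption to tune, e.g. 1.0 for one-degree tiles.

```python
import math
from collections import defaultdict


def _overlaps(a, b):
    """True if boxes (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class BBoxGrid:
    """Uniform-grid index over source-layer bounding boxes."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (i, j) -> names of boxes touching that cell
        self.boxes = {}

    def _cells_for(self, box):
        xmin, ymin, xmax, ymax = box
        s = self.cell_size
        for i in range(math.floor(xmin / s), math.floor(xmax / s) + 1):
            for j in range(math.floor(ymin / s), math.floor(ymax / s) + 1):
                yield (i, j)

    def insert(self, name, box):
        self.boxes[name] = box
        for cell in self._cells_for(box):
            self.cells[cell].add(name)

    def query(self, box):
        """Names of indexed boxes intersecting box, sorted."""
        candidates = set()
        for cell in self._cells_for(box):
            candidates |= self.cells.get(cell, set())
        # Confirm grid candidates with an exact bounding-box test.
        return sorted(n for n in candidates if _overlaps(self.boxes[n], box))


grid = BBoxGrid(1.0)
grid.insert("2730_RIVER_LINE_2006_06", (30.0, -28.0, 31.0, -27.0))
grid.insert("2731_RIVER_LINE_2006_04", (31.0, -28.0, 32.0, -27.0))
print(grid.query((30.2, -27.8, 30.6, -27.3)))  # ['2730_RIVER_LINE_2006_06']
```
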

> 
> ogrinfo -al -so gives a lot of info that could be added to the static
> VRT file, but is it enough to stop QGIS's implementation of VRT from
> physically querying the underlying files until absolutely necessary?
> 
> Also, when a VRT is opened, do you really need all the knowledge (like
> feature count) at this stage?

I know QGIS asks for that in some situations, for example when displaying
information about the layers when opening a multi-layer dataset. You could
likely put a dummy value, like -1, since I don't think QGIS uses it except
for informative purposes.

> One drawback of putting the feature count into the VRT XML is that
> someone could change that shapefile, and the actual feature count would
> then differ from the one in the XML file.
> 
> So, probably negating the direction I am hoping to go in (putting
> details into the VRT file so that opening the VRT causes minimal disk
> I/O), the correct way would be to optimise the QGIS code so that the
> information about the underlying files is only read by QGIS when
> absolutely necessary.

Yes, there are perhaps optimizations possible on the QGIS side as well. If you
use GDAL trunk, compiled as a debug build, you can use the OGR C API spy
mechanism. I added it recently to help debug my improvements to the MapInfo
driver, after I spotted bugs in it while using QGIS. See
http://www.gdal.org/ograpispy_8h.html
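
Once the spy output is available (as we understand the page above, it is enabled by setting the OGR_API_SPY_FILE configuration option and is written as a Python script replaying the OGR calls), a quick tally shows how often QGIS asks for extents, counts or features. The regular expression assumes calls appear as object.Method(...), and the sample lines are made up.

```python
import re
from collections import Counter

# Assumed shape of a spied call: "<object>.<Method>(" somewhere on a line.
_CALL = re.compile(r"\.(\w+)\(")

DEFAULT_METHODS = ("GetExtent", "GetFeatureCount", "GetLayerDefn",
                   "SetSpatialFilterRect", "GetNextFeature")


def tally_ogr_calls(spy_text, methods=DEFAULT_METHODS):
    """Count how often each method name appears as a call in the spy output."""
    counts = Counter(_CALL.findall(spy_text))
    return {name: counts[name] for name in methods}


# Made-up excerpt in the assumed format.
sample = """ds1 = ogr.Open('rivers.vrt', update = 0)
lyr1 = ds1.GetLayer(0)
lyr1.GetFeatureCount()
lyr1.GetExtent(force = 1)
lyr1.GetFeatureCount()
"""
print(tally_ogr_calls(sample))
```
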

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

