[gdal-dev] Open option for vectors in the cloud

Even Rouault even.rouault at spatialys.com
Thu Oct 24 11:05:00 PDT 2019


On Thursday 24 October 2019 17:42:23 CEST Rahkonen Jukka (MML) wrote:
> Hi,
> 
> I was experimenting with accessing some vector files through http (same data
> as FlatGeobuf, GeoPackage, and shapefile). The file size in each format
> was about 850 MB and the amount of data was about 240000 linestrings. I
> made an ogrinfo request with a spatial filter that selects one feature and
> checked the number of http requests and the amount of requested data.
> 
> FlatGeobuf
> 19 http requests
> 33046509 bytes read
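
As a side note, this kind of measurement can be reproduced with a small script
along the following lines, with CPL_DEBUG=ON so that the /vsicurl/ downloads
show up in the log (the URL and the filter rectangle below are placeholders):

from osgeo import gdal, ogr

# Enable debug output so that /vsicurl/ range requests appear in the log
gdal.SetConfigOption("CPL_DEBUG", "ON")

# Placeholder URL: substitute the actual location of the test file
ds = ogr.Open("/vsicurl/https://example.com/data.fgb")
lyr = ds.GetLayer(0)

# Spatial filter selecting a single feature (the extent is made up)
lyr.SetSpatialFilterRect(500000, 6750000, 500100, 6750100)
for f in lyr:
    print(f.GetFID())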

Looking at the debug log, FlatGeobuf currently loads the whole index-of-
features array ("Reading feature offsets index"), which accounts for 32.7 MB 
of the above 33 MB. This could probably be avoided by only loading the offsets 
of the selected features. The shapefile driver had the same issue a few years 
ago, and it was fixed by initializing the offset array to zeroes and loading 
the offsets on demand when needed.
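
The idea of that fix, as an illustrative sketch only (not the actual driver
code; read_offset_at() is a stand-in for a ranged read into the index section
of the file), is:

class FeatureOffsetIndex:
    # Sketch: entries are fetched only when first needed, instead of
    # reading the whole offsets array up front.
    def __init__(self, feature_count, read_offset_at):
        # 0 serves as the "not loaded yet" marker, as in the shapefile fix
        self.offsets = [0] * feature_count
        self.read_offset_at = read_offset_at

    def get_offset(self, fid):
        if self.offsets[fid] == 0:
            # Load on demand, typically with a small range request
            self.offsets[fid] = self.read_offset_at(fid)
        return self.offsets[fid]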

> If somebody
> really finds a use case for reading vector data from the web, it seems
> obvious that having the possibility to cache and re-use the spatial index
> would be very beneficial. I can imagine that with shapefile it would mean
> downloading the .qix file, with GeoPackage reading the contents of the
> rtree index table, and with FlatGeobuf probably extracting the static
> packed Hilbert R-tree index.

A general caching logic in /vsicurl/ would be preferable (the download of the 
'data' part of files might potentially evict the indexes, but having dedicated 
logic in each driver to tell which files / regions of the files should be 
cached would be a bit annoying). Basically, doing a HEAD request on the file 
to get its last update date and keeping a local cache of the downloaded pieces 
would be a more general solution.
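
As a crude sketch of that idea (nothing that exists in GDAL today; the cache
location and chunk size are arbitrary, and eviction is left aside):

import hashlib
import os
import requests

CACHE_DIR = "/tmp/vsicurl_cache"   # arbitrary cache location
CHUNK = 16384                      # cache granularity in bytes

def cached_read(url, offset, size):
    # HEAD request: the Last-Modified date invalidates stale cache entries
    last_mod = requests.head(url).headers.get("Last-Modified", "")
    key = hashlib.sha1((url + last_mod).encode()).hexdigest()
    os.makedirs(CACHE_DIR, exist_ok=True)

    data = b""
    first = offset // CHUNK
    last = (offset + size - 1) // CHUNK
    for n in range(first, last + 1):
        path = os.path.join(CACHE_DIR, "%s_%d" % (key, n))
        if os.path.exists(path):
            # Piece already downloaded earlier: serve it from disk
            with open(path, "rb") as f:
                data += f.read()
        else:
            # Fetch the piece with a range request and cache it
            r = requests.get(url, headers={
                "Range": "bytes=%d-%d" % (n * CHUNK, (n + 1) * CHUNK - 1)})
            with open(path, "wb") as f:
                f.write(r.content)
            data += r.content
    start = offset - first * CHUNK
    return data[start:start + size]

Keying the cache entries on the URL plus the Last-Modified header means the
cache is naturally invalidated whenever the remote file is updated.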

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

