[gdal-dev] Open option for vectors in the cloud
Even Rouault
even.rouault at spatialys.com
Thu Oct 24 11:05:00 PDT 2019
On Thursday 24 October 2019 17:42:23 CEST Rahkonen Jukka (MML) wrote:
> Hi,
>
> I was experimenting with accessing some vector files over HTTP (the same
> data in FlatGeobuf, GeoPackage, and shapefile formats). The file size in
> each format was about 850 MB and the data amounted to about 240000
> linestrings. I made an ogrinfo request with a spatial filter that selects
> one feature and checked the number of HTTP requests and the amount of
> requested data.
>
> FlatGeobuf
> 19 http requests
> 33046509 bytes read
Looking at the debug log, FlatGeobuf currently loads the whole index-of-
features array ("Reading feature offsets index"), which accounts for 32.7 MB
of the above 33 MB. This could probably be avoided by loading only the
offsets of the selected features. The shapefile driver had the same issue a
few years ago; it was fixed by initializing the offset array to zeroes and
loading the offsets on demand when needed.
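To make the idea concrete, here is a minimal, hypothetical sketch of that
lazy-loading technique (not the actual GDAL driver code): the offset array is
initialized to a zero sentinel, and each feature offset is fetched with a
small range read only when that feature is actually requested. The
`read_range` callable, the 8-byte offset size, and the index layout are all
assumptions for illustration.

```python
# Hypothetical sketch of on-demand feature-offset loading, as described
# above for the shapefile driver fix. Not real GDAL code.

OFFSET_SIZE = 8  # assumed bytes per feature offset in the index


class LazyOffsetIndex:
    def __init__(self, read_range, index_start, feature_count):
        """read_range(start, length) -> bytes, e.g. an HTTP range request."""
        self.read_range = read_range
        self.index_start = index_start
        # 0 acts as the "not loaded yet" sentinel, as in the shapefile fix.
        self.offsets = [0] * feature_count
        self.bytes_fetched = 0

    def offset(self, i):
        if self.offsets[i] == 0:  # load on demand, one small read
            pos = self.index_start + i * OFFSET_SIZE
            data = self.read_range(pos, OFFSET_SIZE)
            self.bytes_fetched += len(data)
            self.offsets[i] = int.from_bytes(data, "little")
        return self.offsets[i]


# Simulate a remote file whose index holds 240000 offsets.
N = 240000
index_bytes = b"".join(
    (100 + i).to_bytes(OFFSET_SIZE, "little") for i in range(N)
)

def fake_range_read(start, length):
    return index_bytes[start:start + length]

idx = LazyOffsetIndex(fake_range_read, index_start=0, feature_count=N)
print(idx.offset(12345))   # -> 12445
print(idx.bytes_fetched)   # -> 8 (instead of the whole index)
```

With a spatial filter selecting a single feature, only a handful of offsets
would be read this way instead of the full multi-megabyte index array.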
> If somebody
> really finds a use case for reading vector data from the web, it seems
> obvious that having a possibility to cache and re-use the spatial index
> would be very beneficial. I can imagine that with shapefile it would mean
> downloading the .qix file, with GeoPackage reading the contents of the
> rtree index table, and with FlatGeobuf probably extracting the static
> packed Hilbert R-tree index.
A general caching logic in /vsicurl/ would be preferable (although the
download of the 'data' part of files might potentially evict the indexes,
having dedicated logic in each driver to tell which files / regions of the
files should be cached would be a bit annoying). Basically, doing a HEAD
request on the file to get its last update date and keeping a local cache of
the downloaded pieces would be a more general solution.
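As a rough illustration of that scheme, here is a hypothetical sketch (not
/vsicurl/'s actual implementation): one HEAD request records the file's last
modification date, and downloaded byte ranges are cached locally keyed by
(url, last-modified), so the cache invalidates itself when the remote file
changes. The `head` and `get_range` callables stand in for real HTTP calls.

```python
# Hypothetical sketch of a /vsicurl/-style cache of downloaded pieces,
# validated by the file's last update date from a HEAD request.

class CachedRemoteFile:
    def __init__(self, url, head, get_range):
        self.url = url
        self.get_range = get_range
        self.last_modified = head(url)   # one HEAD request up front
        self.cache = {}                  # (start, length) -> bytes
        self.range_requests = 0

    def read(self, start, length):
        key = (start, length)
        if key not in self.cache:        # only hit the network on a miss
            self.range_requests += 1
            self.cache[key] = self.get_range(self.url, start, length)
        return self.cache[key]


# Simulated remote server.
DATA = bytes(range(256)) * 4

def fake_head(url):
    return "Thu, 24 Oct 2019 18:00:00 GMT"

def fake_get_range(url, start, length):
    return DATA[start:start + length]

f = CachedRemoteFile("https://example.com/data.fgb", fake_head, fake_get_range)
first = f.read(0, 16)
second = f.read(0, 16)      # served from the local cache
print(f.range_requests)     # -> 1
```

A persistent version would key the on-disk cache by the (url, last_modified)
pair, so a changed remote file naturally starts a fresh cache rather than
serving stale index blocks.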
Even
--
Spatialys - Geospatial professional services
http://www.spatialys.com