[gdal-dev] Open option for vectors in the cloud

Björn Harrtell bjorn.harrtell at gmail.com
Wed Oct 30 18:17:27 PDT 2019


Thanks for trying out accessing FlatGeobuf via http.

For the record, I've been aware of this particular efficiency problem and
I aim to improve it when I can get to it, because this is a use case
where I definitely want FlatGeobuf to take first place. :)

/Björn

Den tors 24 okt. 2019 kl 20:05 skrev Even Rouault <
even.rouault at spatialys.com>:

> On jeudi 24 octobre 2019 17:42:23 CEST Rahkonen Jukka (MML) wrote:
> > Hi,
> >
> > I was experimenting with accessing some vector files through http (same
> data
> > as FlatGeoBuffers, GeoPackage, and shapefile). The file size in each
> format
> > was about 850 MB and the amount of data was about 240000 linestrings. I
> > made an ogrinfo request with a spatial filter that selects one feature
> > and checked the number of http requests and the amount of requested data.
> >
> > FlatGeoBuffers
> > 19 http requests
> > 33046509 bytes read
>
> Looking at the debug log, FlatGeoBuf currently loads the whole index-of-
> features array ("Reading feature offsets index"), which accounts for 32.7
> MB of the above 33 MB. This could probably be avoided by loading only the
> offsets of the selected features. The shapefile driver had the same issue
> a few years ago, and it was fixed by initializing the offset array to
> zeroes and loading the offsets on demand when needed.
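
The on-demand offset loading described above could be sketched roughly as
follows (Python, with illustrative names; the actual shapefile and
FlatGeobuf drivers live in GDAL's C++ code, so this only shows the idea):

```python
import io
import struct

class LazyOffsetIndex:
    """Zero-initialized offset array, filled on demand (illustrative names).

    Rather than reading the whole feature-offsets index up front, each
    8-byte offset is fetched from the file only when that feature is
    actually requested, so a spatial filter touching one feature costs
    one small range read instead of downloading the full index.
    """

    ENTRY_SIZE = 8  # one little-endian uint64 offset per feature

    def __init__(self, fileobj, index_start, feature_count):
        self.f = fileobj
        self.index_start = index_start
        self.offsets = [0] * feature_count  # 0 means "not loaded yet"
        self.reads = 0                      # range reads done, for illustration

    def offset(self, feature_id):
        if self.offsets[feature_id] == 0:   # assumes real offsets are non-zero
            self.f.seek(self.index_start + feature_id * self.ENTRY_SIZE)
            raw = self.f.read(self.ENTRY_SIZE)
            self.offsets[feature_id], = struct.unpack("<Q", raw)
            self.reads += 1
        return self.offsets[feature_id]

# Demo: a fake file with a 64-byte header followed by 5 offsets.
data = b"\x00" * 64 + b"".join(struct.pack("<Q", 1000 + 100 * i) for i in range(5))
idx = LazyOffsetIndex(io.BytesIO(data), index_start=64, feature_count=5)
print(idx.offset(2))  # -> 1200, after reading exactly one 8-byte entry
```

Against a /vsicurl/-backed file the single `seek`/`read` would translate
into one small HTTP range request instead of a multi-megabyte download.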
>
> > If somebody
> > really has a use case for reading vector data from the web, it seems
> > obvious that the possibility to cache and re-use the spatial index
> > would be very beneficial. I can imagine that with shapefile it would
> > mean downloading the .qix file, with GeoPackage reading the contents of
> > the rtree index table, and with FlatGeoBuffers probably extracting the
> > static packed Hilbert R-tree index.
>
> General caching logic in /vsicurl/ would be preferable (the download of
> the 'data' part of files might evict the indexes, but having dedicated
> logic in each driver to tell which files / regions of the files should
> be cached would be a bit annoying). Basically, doing a HEAD request on
> the file to get its last update date and keeping a local cache of
> downloaded pieces would be a more general solution.
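
That HEAD-validated cache of downloaded pieces could be sketched like this
(Python, all names hypothetical; this is not the actual /vsicurl/
implementation, just the shape of the idea):

```python
class BlockCache:
    """A local cache of fixed-size downloaded blocks, keyed by a
    validator token (e.g. Last-Modified from a HEAD request), so a
    changed remote file naturally misses all stale cache entries.
    """

    BLOCK = 16384  # cache granularity in bytes

    def __init__(self, head, get_range):
        self.head = head            # head(url) -> validator token
        self.get_range = get_range  # get_range(url, start, size) -> bytes
        self.cache = {}             # (url, token, block_no) -> bytes

    def read(self, url, start, size):
        token = self.head(url)
        out, pos, end = b"", start, start + size
        while pos < end:
            block_no = pos // self.BLOCK
            key = (url, token, block_no)
            if key not in self.cache:  # download the whole block once
                self.cache[key] = self.get_range(url, block_no * self.BLOCK, self.BLOCK)
            block = self.cache[key]
            off = pos - block_no * self.BLOCK
            take = min(end - pos, len(block) - off)
            out += block[off:off + take]
            pos += take
        return out

# Demo with an in-memory "remote file" instead of real HTTP.
remote = bytes(range(256)) * 200  # 51200 bytes
calls = []
cache = BlockCache(
    head=lambda url: "2019-10-24",  # pretend HEAD always returns this stamp
    get_range=lambda url, s, n: calls.append((s, n)) or remote[s:s + n],
)
a = cache.read("http://example.com/data.fgb", 100, 50)
b = cache.read("http://example.com/data.fgb", 120, 50)  # served from cache
print(len(calls))  # -> 1: the second, overlapping read hit the cached block
```

Because the validator token is part of the cache key, no explicit
invalidation is needed: when the remote file changes, old entries are
simply never looked up again.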
>
> Even
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com
>