[gdal-dev] Does OAPIF paging work as supposed?

Rahkonen Jukka (MML) jukka.rahkonen at maanmittauslaitos.fi
Mon Sep 27 07:09:57 PDT 2021


At least in this case, an INITIAL_REQUEST_PAGE_SIZE=number open option would resolve the practical problem. We have collections with millions of features, so a large page size is essential when downloading a whole collection. On the other hand, we have complicated geometries such as lake polygons, and reading 10000 large geometries just to resolve the schema is pretty heavy. We also have 126 collections in the service, which means 126 x 10000 features are read just to resolve the schemas if the aim is to clip a small area from all collections into a GeoPackage.
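To put rough numbers on this, here is a minimal Python sketch of the arithmetic. It assumes every collection holds at least one full page of features, so the figures are upper bounds; the function name is made up for illustration, and the value 10 is the default page size mentioned elsewhere in this thread.

```python
# Figures from this message: 126 collections, PAGE_SIZE=10000.
COLLECTIONS = 126
PAGE_SIZE = 10000


def schema_probe_features(collections, first_page_size):
    """Features fetched only to resolve schemas: one first page per collection."""
    return collections * first_page_size


# Cost when the first request uses the full page size.
full_page_cost = schema_probe_features(COLLECTIONS, PAGE_SIZE)
# Cost if the first request asked for only 10 features per collection.
small_page_cost = schema_probe_features(COLLECTIONS, 10)

print(full_page_cost)   # 1260000
print(small_page_cost)  # 1260
```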

-Jukka Rahkonen-

From: Even Rouault <even.rouault at spatialys.com>
Sent: Monday, 27 September 2021 16:47
To: Rahkonen Jukka (MML) <jukka.rahkonen at maanmittauslaitos.fi>; 'gdal-dev at lists.osgeo.org' <gdal-dev at lists.osgeo.org>
Subject: Re: [gdal-dev] Does OAPIF paging work as supposed?


Your analysis is completely correct. Whether this is expected or not probably depends on the situation. Should we have an INITIAL_REQUEST_PAGE_SIZE=number open option to override the number of features retrieved specifically in the first request?

Regarding the spatial filter, it is generally passed through the OGR API after the schema has been queried, and for most OGR datasources it wouldn't influence the schema, so there isn't much that can be done here, except maybe adding a BBOX=west,south,east,north open option.

One option to avoid both issues would be for the service to publish DescribedBy links at the collection level that point to an XML schema (using a GML Simple Feature schema profile, such as the one understood by the GML driver) or to a JSON schema (as long as it is not too complicated). Both are handled by the driver.
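As a rough illustration of what such links could look like from the client side, the Python sketch below picks the schema links out of a collection document. The JSON document, the example.com hrefs and the media types are hypothetical and not taken from the service discussed here, and the helper name is invented; this is not how the driver itself is implemented.

```python
import json

# Hypothetical collection document; hrefs and media types are assumptions.
COLLECTION_JSON = """
{
  "id": "osoitepiste",
  "links": [
    {"rel": "items", "type": "application/geo+json",
     "href": "https://example.com/collections/osoitepiste/items"},
    {"rel": "describedby", "type": "application/schema+json",
     "href": "https://example.com/collections/osoitepiste/schema"},
    {"rel": "describedby", "type": "application/xml",
     "href": "https://example.com/collections/osoitepiste/schema.xsd"}
  ]
}
"""


def schema_links(collection):
    """Return (media type, href) pairs for links whose rel is 'describedby'."""
    return [
        (link.get("type", ""), link["href"])
        for link in collection.get("links", [])
        if link.get("rel", "").lower() == "describedby"
    ]


links = schema_links(json.loads(COLLECTION_JSON))
for media_type, href in links:
    print(media_type, href)
```

With links like these, a client could fetch the schema document directly instead of reading a page of features to infer the fields.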

On 27/09/2021 at 15:21, Rahkonen Jukka (MML) wrote:

I tried to read a relatively small BBOX from an OAPIF server but the process feels rather slow and I do not quite understand what I am seeing in the log.

ogr2ogr -f GPKG test.gpkg OAPIF:https://avoin-paikkatieto.maanmittauslaitos.fi/maastotiedot/features/v1/?api-key=xxxx -spat 25 65 25.1 65.1 -oo PAGE_SIZE=10000 --debug on --config cpl_curl_verbose yes

Excerpts from the log:

HTTP: Fetch(https://avoin-paikkatieto.maanmittauslaitos.fi/maastotiedot/features/v1/collections/osoitepiste/items?api-key=xxxx&f=json&limit=10000)
HTTP: These HTTP headers were set: Accept: application/geo+json, application/json
> GET /maastotiedot/features/v1/collections/osoitepiste/items?api-key=xxxx&f=json&limit=10000 HTTP/1.1
Host: avoin-paikkatieto.maanmittauslaitos.fi
Accept-Encoding: gzip
Accept: application/geo+json, application/json

* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
GeoJSON: First pass: 56.54 %
GeoJSON: First pass: 100.00 %
HTTP: Fetch(https://avoin-paikkatieto.maanmittauslaitos.fi/maastotiedot/features/v1/collections/osoitepiste/items?api-key=xxxx&f=json&limit=10000&bbox=25,65,25.1000000000000014,65.0999999999999943)
> GET /maastotiedot/features/v1/collections/osoitepiste/items?api-key=xxxx&f=json&limit=10000&bbox=25,65,25.1000000000000014,65.0999999999999943 HTTP/1.1
< Content-Length: 309
GDALVectorTranslate: 0 features written in layer 'osoitepiste'

Am I reading this right: GDAL first reads one page, this time 10000 features without the BBOX, perhaps to resolve the schema, and then makes a new query with the BBOX? In this case the BBOX query finds nothing. Reading 10000 features in the first round and then discarding everything feels too expensive. Would it be enough to read, for example, 10 features (the default page size) in the first round instead of the full page?
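The two requests in the log can be reproduced with a small Python sketch that only builds the URLs and fetches nothing. The helper name items_url is made up for illustration, and the %.18g-style number formatting is an assumption chosen because it matches the bbox values printed in the log; it is not taken from the GDAL source.

```python
from urllib.parse import urlencode

BASE = "https://avoin-paikkatieto.maanmittauslaitos.fi/maastotiedot/features/v1"


def items_url(collection, limit, bbox=None, api_key="xxxx"):
    """Build an items URL shaped like the ones in the log (hypothetical helper)."""
    params = {"api-key": api_key, "f": "json", "limit": limit}
    if bbox is not None:
        # 18 significant digits, matching how the values appear in the log.
        params["bbox"] = ",".join(format(v, ".18g") for v in bbox)
    # Keep commas unescaped so the bbox reads as in the log.
    return f"{BASE}/collections/{collection}/items?" + urlencode(params, safe=",")


# First request: a full page without bbox, apparently used to resolve the schema.
first = items_url("osoitepiste", 10000)
# Second request: the same page size, now with the -spat extent as bbox.
second = items_url("osoitepiste", 10000, bbox=(25, 65, 25.1, 65.1))

print(first)
print(second)
```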

-Jukka Rahkonen-


My software is free, but my time generally not.