[gdal-dev] Large GPKG files in S3
Rahkonen Jukka
jukka.rahkonen at maanmittauslaitos.fi
Thu Sep 25 02:03:21 PDT 2025
Hi,
R*Tree is standard SQLite module, so maybe it is best to look at the SQLite documentation https://www.sqlite.org/rtree.html.
Finding Spatialite documents from the website is always difficult (but it is Open Source and anybody can offer their help).
Spatialite documents contain some background information and examples and they are worth reading. You could start from a historic https://www.gaia-gis.it/gaia-sins/spatialite-tutorial-2.3.1.html#t8.
The current documentation about using spatial index with Spatialite is https://www.gaia-gis.it/fossil/libspatialite/wiki?name=SpatialIndex, but it describes the usage of the alternative way to access R*Tree tables with VirtualSpatialIndex https://www.gaia-gis.it/gaia-sins/SpatialIndex-Update.pdf that works only with Spatialite databases. With GeoPackage R*Tree tables must be queried directly.
-Jukka Rahkonen-
________________________________________
Lähettäjä: gdal-dev <gdal-dev-bounces at lists.osgeo.org> käyttäjän Even Rouault via gdal-dev <gdal-dev at lists.osgeo.org> puolesta
Lähetetty: Torstai 25. syyskuuta 2025 3.01
Vastaanottaja: Tim Harris <trharris78 at gmail.com>; gdal dev <gdal-dev at lists.osgeo.org>
Aihe: Re: [gdal-dev] Large GPKG files in S3
Hi Tim,
yes you're running into a well known s/limitation/feature/ of spatial
SQLite (I'm pretty sure that the Spatialite website has one page about
that, but can't find it right now). The use of the ST_ functions doesn't
trigger the RTree automatically, and you need to query it explicitly by
joining with the rtree_{table_name}_{geom_column_name} virtual table.
For example the following completes quite quickly:
$ ogrinfo -ro -al -q
/vsicurl/https://storage.googleapis.com/open-geodata/linz-examples/nz-building-outlines.gpkg
-sql "SELECT * FROM nz_building_outlines p JOIN
rtree_nz_building_outlines_geom rt ON p.rowid = rt.id WHERE rt.minx <
1771150 and rt.maxx > 1771140 and rt.miny < 5894877 and rt.maxy >
5894870 and ST_Intersects(p.geom, ST_GeomFromText('POINT(1771141
5894876)')) = 1"
Layer name: SELECT
OGRFeature(SELECT):11471
building_id (Integer) = 4199821
name (String) =
use (String) = Unknown
suburb_locality (String) = Karaka
town_city (String) =
territorial_authority (String) = Auckland
capture_method (String) = Feature Extraction
capture_source_group (String) = NZ Aerial Imagery
capture_source_id (Integer) = 1014
capture_source_name (String) = Auckland 0.075m Urban Aerial Photos (2017)
capture_source_from (Date) = 2017/03/15
capture_source_to (Date) = 2017/05/06
last_modified (Date) = 2019/04/05
id (Integer64) = 11471
minx (Real) = 1771139.875
maxx (Real) = 1771152
miny (Real) = 5894865.5
maxy (Real) = 5894883
MULTIPOLYGON (((1771139.91348946 5894880.21926274,1771146.68318411
5894882.80032223,1771151.97657734 5894868.91662014,1771145.02123927
5894866.26478195,1771143.49962591 5894870.255725,1771142.39909482
5894869.83612922,1771140.01981817 5894876.07658206,1771141.30599197
5894876.56695731,1771139.91348946 5894880.21926274)))
Even
Le 25/09/2025 à 01:38, Tim Harris via gdal-dev a écrit :
> Hi, trying to do some queries of fairly large GPKG files sitting in S3
> in the hopes that spatial indexes and S3 range requests make it fast
> and efficient. But it seems that's not the case and I'm wondering if
> I'm doing something wrong.
>
> The examples below are just looking for a geometry at a point. The
> filename is fake but the geoms.gpkg contains a layer "geoms" with a
> bunch of polygons. It's about 1.7 GB in size.
>
> Something like this runs relatively fast, it makes about 17 HTTP
> requests and outputs the feature:
>
> ---
> ogrinfo -ro -json -features /vsis3/bucket/geoms.gpkg geoms -spat -105
> 40 -105 40
> ---
>
> But this, using -sql instead of -spat, sits forever without finishing.
> With CPL_CURL_VERBOSE it sort of looks like it's just scanning through
> the file a range request at a time:
>
> ---
> ogrinfo -ro -json -features /vsis3/bucket/geoms.gpkg geoms -sql
> "select * from geoms where ST_Intersects(geom,
> ST_GeomFromText('POINT(-105 40)', 4326)) = 1"
> ---
>
> I also tried replacing the ST_GeomFromText with other things
> like MakePoint(-105, 40, 4326) and ST_EnvIntersects(geom, -105, 40,
> -105, 40), just in case. It seems like -spat uses the spatial index
> efficiently but -sql doesn't. I also tried using them both, but having
> the -sql there seems to make it slow.
>
> I also tried to add "explain query plan" to the sql statement:
>
> ---
> ogrinfo -ro
> /vsis3/prod-tilerepo-va2/metadata/20250920T041443/geoms.gpkg geoms
> -sql "EXPLAIN QUERY PLAN select * from geoms where ST_Intersects(geom,
> ST_GeomFromText('POINT(-105 40)', 4326)) = 1"
> ---
>
> That only sort of confirms the scan, and reports "SCAN geoms" in the
> output. This page, https://sqlite.org/eqp.html, explains the "explain
> query plan" output and I was hoping to see a "SEARCH geoms USING INDEX
> ___" of some sort.
>
> Is there something about the use of -sql that precludes it from using
> the spatial index? Maybe we need to do an ogr2ogr with just -spat to
> narrow it down, then apply a -sql to that result if we need additional
> filtering?
>
> Thanks
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
--
http://www.spatialys.com/
My software is free, but my time generally not.
_______________________________________________
gdal-dev mailing list
gdal-dev at lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
More information about the gdal-dev
mailing list