[gdal-dev] Large GPKG files in S3
Even Rouault
even.rouault at spatialys.com
Wed Sep 24 17:01:19 PDT 2025
Hi Tim,
yes you're running into a well known s/limitation/feature/ of spatial
SQLite (I'm pretty sure that the Spatialite website has one page about
that, but can't find it right now). The use of the ST_ functions doesn't
trigger the RTree automatically, and you need to query it explicitly by
joining with the rtree_{table_name}_{geom_column_name} virtual table.
For example the following completes quite quickly:
$ ogrinfo -ro -al -q
/vsicurl/https://storage.googleapis.com/open-geodata/linz-examples/nz-building-outlines.gpkg
-sql "SELECT * FROM nz_building_outlines p JOIN
rtree_nz_building_outlines_geom rt ON p.rowid = rt.id WHERE rt.minx <
1771150 and rt.maxx > 1771140 and rt.miny < 5894877 and rt.maxy >
5894870 and ST_Intersects(p.geom, ST_GeomFromText('POINT(1771141
5894876)')) = 1"
Layer name: SELECT
OGRFeature(SELECT):11471
building_id (Integer) = 4199821
name (String) =
use (String) = Unknown
suburb_locality (String) = Karaka
town_city (String) =
territorial_authority (String) = Auckland
capture_method (String) = Feature Extraction
capture_source_group (String) = NZ Aerial Imagery
capture_source_id (Integer) = 1014
capture_source_name (String) = Auckland 0.075m Urban Aerial Photos (2017)
capture_source_from (Date) = 2017/03/15
capture_source_to (Date) = 2017/05/06
last_modified (Date) = 2019/04/05
id (Integer64) = 11471
minx (Real) = 1771139.875
maxx (Real) = 1771152
miny (Real) = 5894865.5
maxy (Real) = 5894883
MULTIPOLYGON (((1771139.91348946 5894880.21926274,1771146.68318411
5894882.80032223,1771151.97657734 5894868.91662014,1771145.02123927
5894866.26478195,1771143.49962591 5894870.255725,1771142.39909482
5894869.83612922,1771140.01981817 5894876.07658206,1771141.30599197
5894876.56695731,1771139.91348946 5894880.21926274)))
Even
Le 25/09/2025 à 01:38, Tim Harris via gdal-dev a écrit :
> Hi, trying to do some queries of fairly large GPKG files sitting in S3
> in the hopes that spatial indexes and S3 range requests make it fast
> and efficient. But it seems that's not the case and I'm wondering if
> I'm doing something wrong.
>
> The examples below are just looking for a geometry at a point. The
> filename is fake but the geoms.gpkg contains a layer "geoms" with a
> bunch of polygons. It's about 1.7 GB in size.
>
> Something like this runs relatively fast, it makes about 17 HTTP
> requests and outputs the feature:
>
> ---
> ogrinfo -ro -json -features /vsis3/bucket/geoms.gpkg geoms -spat -105
> 40 -105 40
> ---
>
> But this, using -sql instead of -spat, sits forever without finishing.
> With CPL_CURL_VERBOSE it sort of looks like it's just scanning through
> the file a range request at a time:
>
> ---
> ogrinfo -ro -json -features /vsis3/bucket/geoms.gpkg geoms -sql
> "select * from geoms where ST_Intersects(geom,
> ST_GeomFromText('POINT(-105 40)', 4326)) = 1"
> ---
>
> I also tried replacing the ST_GeomFromText with other things
> like MakePoint(-105, 40, 4326) and ST_EnvIntersects(geom, -105, 40,
> -105, 40), just in case. It seems like -spat uses the spatial index
> efficiently but -sql doesn't. I also tried using them both, but having
> the -sql there seems to make it slow.
>
> I also tried to add "explain query plan" to the sql statement:
>
> ---
> ogrinfo -ro
> /vsis3/prod-tilerepo-va2/metadata/20250920T041443/geoms.gpkg geoms
> -sql "EXPLAIN QUERY PLAN select * from geoms where ST_Intersects(geom,
> ST_GeomFromText('POINT(-105 40)', 4326)) = 1"
> ---
>
> That only sort of confirms the scan, and reports "SCAN geoms" in the
> output. This page, https://sqlite.org/eqp.html, explains the "explain
> query plan" output and I was hoping to see a "SEARCH geoms USING INDEX
> ___" of some sort.
>
> Is there something about the use of -sql that precludes it from using
> the spatial index? Maybe we need to do an ogr2ogr with just -spat to
> narrow it down, then apply a -sql to that result if we need additional
> filtering?
>
> Thanks
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
--
http://www.spatialys.com
My software is free, but my time generally not.
More information about the gdal-dev
mailing list