[gdal-dev] Large GPKG files in S3

Even Rouault even.rouault at spatialys.com
Wed Sep 24 17:01:19 PDT 2025


Hi Tim,

yes you're running into a well known s/limitation/feature/ of spatial 
SQLite (I'm pretty sure that the Spatialite website has one page about 
that, but can't find it right now). The use of the ST_ functions doesn't 
trigger the RTree automatically, and you need to query it explicitly by 
joining with the rtree_{table_name}_{geom_column_name} virtual table.

For example the following completes quite quickly:

$ ogrinfo -ro -al -q 
/vsicurl/https://storage.googleapis.com/open-geodata/linz-examples/nz-building-outlines.gpkg 
-sql "SELECT * FROM nz_building_outlines p JOIN 
rtree_nz_building_outlines_geom rt ON p.rowid = rt.id WHERE rt.minx < 
1771150 and rt.maxx > 1771140 and rt.miny < 5894877 and rt.maxy > 
5894870 and ST_Intersects(p.geom, ST_GeomFromText('POINT(1771141 
5894876)')) = 1"

Layer name: SELECT
OGRFeature(SELECT):11471
   building_id (Integer) = 4199821
   name (String) =
   use (String) = Unknown
   suburb_locality (String) = Karaka
   town_city (String) =
   territorial_authority (String) = Auckland
   capture_method (String) = Feature Extraction
   capture_source_group (String) = NZ Aerial Imagery
   capture_source_id (Integer) = 1014
   capture_source_name (String) = Auckland 0.075m Urban Aerial Photos (2017)
   capture_source_from (Date) = 2017/03/15
   capture_source_to (Date) = 2017/05/06
   last_modified (Date) = 2019/04/05
   id (Integer64) = 11471
   minx (Real) = 1771139.875
   maxx (Real) = 1771152
   miny (Real) = 5894865.5
   maxy (Real) = 5894883
   MULTIPOLYGON (((1771139.91348946 5894880.21926274,1771146.68318411 
5894882.80032223,1771151.97657734 5894868.91662014,1771145.02123927 
5894866.26478195,1771143.49962591 5894870.255725,1771142.39909482 
5894869.83612922,1771140.01981817 5894876.07658206,1771141.30599197 
5894876.56695731,1771139.91348946 5894880.21926274)))


Even

Le 25/09/2025 à 01:38, Tim Harris via gdal-dev a écrit :
> Hi, trying to do some queries of fairly large GPKG files sitting in S3 
> in the hopes that spatial indexes and S3 range requests make it fast 
> and efficient. But it seems that's not the case and I'm wondering if 
> I'm doing something wrong.
>
> The examples below are just looking for a geometry at a point. The 
> filename is fake but the geoms.gpkg contains a layer "geoms" with a 
> bunch of polygons. It's about 1.7 GB in size.
>
> Something like this runs relatively fast, it makes about 17 HTTP 
> requests and outputs the feature:
>
> ---
> ogrinfo -ro -json -features /vsis3/bucket/geoms.gpkg geoms -spat -105 
> 40 -105 40
> ---
>
> But this, using -sql instead of -spat, sits forever without finishing. 
> With CPL_CURL_VERBOSE it sort of looks like it's just scanning through 
> the file a range request at a time:
>
> ---
> ogrinfo -ro -json -features /vsis3/bucket/geoms.gpkg geoms -sql 
> "select * from geoms where ST_Intersects(geom, 
> ST_GeomFromText('POINT(-105 40)', 4326)) = 1"
> ---
>
> I also tried replacing the ST_GeomFromText with other things 
> like MakePoint(-105, 40, 4326) and ST_EnvIntersects(geom, -105, 40, 
> -105, 40), just in case. It seems like -spat uses the spatial index 
> efficiently but -sql doesn't. I also tried using them both, but having 
> the -sql there seems to make it slow.
>
> I also tried to add "explain query plan" to the sql statement:
>
> ---
> ogrinfo -ro 
> /vsis3/prod-tilerepo-va2/metadata/20250920T041443/geoms.gpkg geoms 
> -sql "EXPLAIN QUERY PLAN select * from geoms where ST_Intersects(geom, 
> ST_GeomFromText('POINT(-105 40)', 4326)) = 1"
> ---
>
> That only sort of confirms the scan, and reports "SCAN geoms" in the 
> output. This page, https://sqlite.org/eqp.html, explains the "explain 
> query plan" output and I was hoping to see a "SEARCH geoms USING INDEX 
> ___" of some sort.
>
> Is there something about the use of -sql that precludes it from using 
> the spatial index? Maybe we need to do an ogr2ogr with just -spat to 
> narrow it down, then apply a -sql to that result if we need additional 
> filtering?
>
> Thanks
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev

-- 
http://www.spatialys.com
My software is free, but my time generally not.



More information about the gdal-dev mailing list