<div dir="ltr">Thank you both, that's exactly what I was looking for.</div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Thu, Sep 25, 2025 at 3:03 AM Rahkonen Jukka <<a href="mailto:jukka.rahkonen@maanmittauslaitos.fi">jukka.rahkonen@maanmittauslaitos.fi</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>
<br>
R*Tree is standard SQLite module, so maybe it is best to look at the SQLite documentation <a href="https://www.sqlite.org/rtree.html" rel="noreferrer" target="_blank">https://www.sqlite.org/rtree.html</a>.<br>
Finding Spatialite documents from the website is always difficult (but it is Open Source and anybody can offer their help).<br>
Spatialite documents contain some background information and examples and they are worth reading. You could start from a historic <a href="https://www.gaia-gis.it/gaia-sins/spatialite-tutorial-2.3.1.html#t8" rel="noreferrer" target="_blank">https://www.gaia-gis.it/gaia-sins/spatialite-tutorial-2.3.1.html#t8</a>.<br>
The current documentation about using spatial index with Spatialite is <a href="https://www.gaia-gis.it/fossil/libspatialite/wiki?name=SpatialIndex" rel="noreferrer" target="_blank">https://www.gaia-gis.it/fossil/libspatialite/wiki?name=SpatialIndex</a>, but it describes the usage of the alternative way to access R*Tree tables with VirtualSpatialIndex <a href="https://www.gaia-gis.it/gaia-sins/SpatialIndex-Update.pdf" rel="noreferrer" target="_blank">https://www.gaia-gis.it/gaia-sins/SpatialIndex-Update.pdf</a> that works only with Spatialite databases. With GeoPackage R*Tree tables must be queried directly.<br>
<br>
-Jukka Rahkonen-<br>
<br>
<br>
________________________________________<br>
Lähettäjä: gdal-dev <<a href="mailto:gdal-dev-bounces@lists.osgeo.org" target="_blank">gdal-dev-bounces@lists.osgeo.org</a>> käyttäjän Even Rouault via gdal-dev <<a href="mailto:gdal-dev@lists.osgeo.org" target="_blank">gdal-dev@lists.osgeo.org</a>> puolesta<br>
Lähetetty: Torstai 25. syyskuuta 2025 3.01<br>
Vastaanottaja: Tim Harris <<a href="mailto:trharris78@gmail.com" target="_blank">trharris78@gmail.com</a>>; gdal dev <<a href="mailto:gdal-dev@lists.osgeo.org" target="_blank">gdal-dev@lists.osgeo.org</a>><br>
Aihe: Re: [gdal-dev] Large GPKG files in S3<br>
<br>
<br>
Hi Tim,<br>
<br>
yes you're running into a well known s/limitation/feature/ of spatial<br>
SQLite (I'm pretty sure that the Spatialite website has one page about<br>
that, but can't find it right now). The use of the ST_ functions doesn't<br>
trigger the RTree automatically, and you need to query it explicitly by<br>
joining with the rtree_{table_name}_{geom_column_name} virtual table.<br>
<br>
For example the following completes quite quickly:<br>
<br>
$ ogrinfo -ro -al -q<br>
/vsicurl/<a href="https://storage.googleapis.com/open-geodata/linz-examples/nz-building-outlines.gpkg-sql" rel="noreferrer" target="_blank">https://storage.googleapis.com/open-geodata/linz-examples/nz-building-outlines.gpkg<br>
-sql</a> "SELECT * FROM nz_building_outlines p JOIN<br>
rtree_nz_building_outlines_geom rt ON p.rowid = <a href="http://rt.id" rel="noreferrer" target="_blank">rt.id</a> WHERE rt.minx <<br>
1771150 and rt.maxx > 1771140 and rt.miny < 5894877 and rt.maxy ><br>
5894870 and ST_Intersects(p.geom, ST_GeomFromText('POINT(1771141<br>
5894876)')) = 1"<br>
<br>
Layer name: SELECT<br>
OGRFeature(SELECT):11471<br>
building_id (Integer) = 4199821<br>
name (String) =<br>
use (String) = Unknown<br>
suburb_locality (String) = Karaka<br>
town_city (String) =<br>
territorial_authority (String) = Auckland<br>
capture_method (String) = Feature Extraction<br>
capture_source_group (String) = NZ Aerial Imagery<br>
capture_source_id (Integer) = 1014<br>
capture_source_name (String) = Auckland 0.075m Urban Aerial Photos (2017)<br>
capture_source_from (Date) = 2017/03/15<br>
capture_source_to (Date) = 2017/05/06<br>
last_modified (Date) = 2019/04/05<br>
id (Integer64) = 11471<br>
minx (Real) = 1771139.875<br>
maxx (Real) = 1771152<br>
miny (Real) = 5894865.5<br>
maxy (Real) = 5894883<br>
MULTIPOLYGON (((1771139.91348946 5894880.21926274,1771146.68318411<br>
5894882.80032223,1771151.97657734 5894868.91662014,1771145.02123927<br>
5894866.26478195,1771143.49962591 5894870.255725,1771142.39909482<br>
5894869.83612922,1771140.01981817 5894876.07658206,1771141.30599197<br>
5894876.56695731,1771139.91348946 5894880.21926274)))<br>
<br>
<br>
Even<br>
<br>
Le 25/09/2025 à 01:38, Tim Harris via gdal-dev a écrit :<br>
> Hi, trying to do some queries of fairly large GPKG files sitting in S3<br>
> in the hopes that spatial indexes and S3 range requests make it fast<br>
> and efficient. But it seems that's not the case and I'm wondering if<br>
> I'm doing something wrong.<br>
><br>
> The examples below are just looking for a geometry at a point. The<br>
> filename is fake but the geoms.gpkg contains a layer "geoms" with a<br>
> bunch of polygons. It's about 1.7 GB in size.<br>
><br>
> Something like this runs relatively fast, it makes about 17 HTTP<br>
> requests and outputs the feature:<br>
><br>
> ---<br>
> ogrinfo -ro -json -features /vsis3/bucket/geoms.gpkg geoms -spat -105<br>
> 40 -105 40<br>
> ---<br>
><br>
> But this, using -sql instead of -spat, sits forever without finishing.<br>
> With CPL_CURL_VERBOSE it sort of looks like it's just scanning through<br>
> the file a range request at a time:<br>
><br>
> ---<br>
> ogrinfo -ro -json -features /vsis3/bucket/geoms.gpkg geoms -sql<br>
> "select * from geoms where ST_Intersects(geom,<br>
> ST_GeomFromText('POINT(-105 40)', 4326)) = 1"<br>
> ---<br>
><br>
> I also tried replacing the ST_GeomFromText with other things<br>
> like MakePoint(-105, 40, 4326) and ST_EnvIntersects(geom, -105, 40,<br>
> -105, 40), just in case. It seems like -spat uses the spatial index<br>
> efficiently but -sql doesn't. I also tried using them both, but having<br>
> the -sql there seems to make it slow.<br>
><br>
> I also tried to add "explain query plan" to the sql statement:<br>
><br>
> ---<br>
> ogrinfo -ro<br>
> /vsis3/prod-tilerepo-va2/metadata/20250920T041443/geoms.gpkg geoms<br>
> -sql "EXPLAIN QUERY PLAN select * from geoms where ST_Intersects(geom,<br>
> ST_GeomFromText('POINT(-105 40)', 4326)) = 1"<br>
> ---<br>
><br>
> That only sort of confirms the scan, and reports "SCAN geoms" in the<br>
> output. This page, <a href="https://sqlite.org/eqp.html" rel="noreferrer" target="_blank">https://sqlite.org/eqp.html</a>, explains the "explain<br>
> query plan" output and I was hoping to see a "SEARCH geoms USING INDEX<br>
> ___" of some sort.<br>
><br>
> Is there something about the use of -sql that precludes it from using<br>
> the spatial index? Maybe we need to do an ogr2ogr with just -spat to<br>
> narrow it down, then apply a -sql to that result if we need additional<br>
> filtering?<br>
><br>
> Thanks<br>
><br>
> _______________________________________________<br>
> gdal-dev mailing list<br>
> <a href="mailto:gdal-dev@lists.osgeo.org" target="_blank">gdal-dev@lists.osgeo.org</a><br>
> <a href="https://lists.osgeo.org/mailman/listinfo/gdal-dev" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/gdal-dev</a><br>
<br>
--<br>
<a href="http://www.spatialys.com/" rel="noreferrer" target="_blank">http://www.spatialys.com/</a><br>
My software is free, but my time generally not.<br>
<br>
_______________________________________________<br>
gdal-dev mailing list<br>
<a href="mailto:gdal-dev@lists.osgeo.org" target="_blank">gdal-dev@lists.osgeo.org</a><br>
<a href="https://lists.osgeo.org/mailman/listinfo/gdal-dev" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/gdal-dev</a><br>
<br>
<br>
</blockquote></div>