[gdal-dev] Large GPKG files in S3
Tim Harris
trharris78 at gmail.com
Thu Sep 25 07:09:13 PDT 2025
Thank you both, that's exactly what I was looking for.
On Thu, Sep 25, 2025 at 3:03 AM Rahkonen Jukka <
jukka.rahkonen at maanmittauslaitos.fi> wrote:
> Hi,
>
> R*Tree is standard SQLite module, so maybe it is best to look at the
> SQLite documentation https://www.sqlite.org/rtree.html.
> Finding Spatialite documents from the website is always difficult (but it
> is Open Source and anybody can offer their help).
> Spatialite documents contain some background information and examples and
> they are worth reading. You could start from a historic
> https://www.gaia-gis.it/gaia-sins/spatialite-tutorial-2.3.1.html#t8.
> The current documentation about using spatial index with Spatialite is
> https://www.gaia-gis.it/fossil/libspatialite/wiki?name=SpatialIndex, but
> it describes the usage of the alternative way to access R*Tree tables with
> VirtualSpatialIndex
> https://www.gaia-gis.it/gaia-sins/SpatialIndex-Update.pdf that works only
> with Spatialite databases. With GeoPackage R*Tree tables must be queried
> directly.
>
> -Jukka Rahkonen-
>
>
> ________________________________________
> Lähettäjä: gdal-dev <gdal-dev-bounces at lists.osgeo.org> käyttäjän Even
> Rouault via gdal-dev <gdal-dev at lists.osgeo.org> puolesta
> Lähetetty: Torstai 25. syyskuuta 2025 3.01
> Vastaanottaja: Tim Harris <trharris78 at gmail.com>; gdal dev <
> gdal-dev at lists.osgeo.org>
> Aihe: Re: [gdal-dev] Large GPKG files in S3
>
>
> Hi Tim,
>
> yes you're running into a well known s/limitation/feature/ of spatial
> SQLite (I'm pretty sure that the Spatialite website has one page about
> that, but can't find it right now). The use of the ST_ functions doesn't
> trigger the RTree automatically, and you need to query it explicitly by
> joining with the rtree_{table_name}_{geom_column_name} virtual table.
>
> For example the following completes quite quickly:
>
> $ ogrinfo -ro -al -q
> /vsicurl/
> https://storage.googleapis.com/open-geodata/linz-examples/nz-building-outlines.gpkg
> -sql
> <https://storage.googleapis.com/open-geodata/linz-examples/nz-building-outlines.gpkg-sql>
> "SELECT * FROM nz_building_outlines p JOIN
> rtree_nz_building_outlines_geom rt ON p.rowid = rt.id WHERE rt.minx <
> 1771150 and rt.maxx > 1771140 and rt.miny < 5894877 and rt.maxy >
> 5894870 and ST_Intersects(p.geom, ST_GeomFromText('POINT(1771141
> 5894876)')) = 1"
>
> Layer name: SELECT
> OGRFeature(SELECT):11471
> building_id (Integer) = 4199821
> name (String) =
> use (String) = Unknown
> suburb_locality (String) = Karaka
> town_city (String) =
> territorial_authority (String) = Auckland
> capture_method (String) = Feature Extraction
> capture_source_group (String) = NZ Aerial Imagery
> capture_source_id (Integer) = 1014
> capture_source_name (String) = Auckland 0.075m Urban Aerial Photos
> (2017)
> capture_source_from (Date) = 2017/03/15
> capture_source_to (Date) = 2017/05/06
> last_modified (Date) = 2019/04/05
> id (Integer64) = 11471
> minx (Real) = 1771139.875
> maxx (Real) = 1771152
> miny (Real) = 5894865.5
> maxy (Real) = 5894883
> MULTIPOLYGON (((1771139.91348946 5894880.21926274,1771146.68318411
> 5894882.80032223,1771151.97657734 5894868.91662014,1771145.02123927
> 5894866.26478195,1771143.49962591 5894870.255725,1771142.39909482
> 5894869.83612922,1771140.01981817 5894876.07658206,1771141.30599197
> 5894876.56695731,1771139.91348946 5894880.21926274)))
>
>
> Even
>
> Le 25/09/2025 à 01:38, Tim Harris via gdal-dev a écrit :
> > Hi, trying to do some queries of fairly large GPKG files sitting in S3
> > in the hopes that spatial indexes and S3 range requests make it fast
> > and efficient. But it seems that's not the case and I'm wondering if
> > I'm doing something wrong.
> >
> > The examples below are just looking for a geometry at a point. The
> > filename is fake but the geoms.gpkg contains a layer "geoms" with a
> > bunch of polygons. It's about 1.7 GB in size.
> >
> > Something like this runs relatively fast, it makes about 17 HTTP
> > requests and outputs the feature:
> >
> > ---
> > ogrinfo -ro -json -features /vsis3/bucket/geoms.gpkg geoms -spat -105
> > 40 -105 40
> > ---
> >
> > But this, using -sql instead of -spat, sits forever without finishing.
> > With CPL_CURL_VERBOSE it sort of looks like it's just scanning through
> > the file a range request at a time:
> >
> > ---
> > ogrinfo -ro -json -features /vsis3/bucket/geoms.gpkg geoms -sql
> > "select * from geoms where ST_Intersects(geom,
> > ST_GeomFromText('POINT(-105 40)', 4326)) = 1"
> > ---
> >
> > I also tried replacing the ST_GeomFromText with other things
> > like MakePoint(-105, 40, 4326) and ST_EnvIntersects(geom, -105, 40,
> > -105, 40), just in case. It seems like -spat uses the spatial index
> > efficiently but -sql doesn't. I also tried using them both, but having
> > the -sql there seems to make it slow.
> >
> > I also tried to add "explain query plan" to the sql statement:
> >
> > ---
> > ogrinfo -ro
> > /vsis3/prod-tilerepo-va2/metadata/20250920T041443/geoms.gpkg geoms
> > -sql "EXPLAIN QUERY PLAN select * from geoms where ST_Intersects(geom,
> > ST_GeomFromText('POINT(-105 40)', 4326)) = 1"
> > ---
> >
> > That only sort of confirms the scan, and reports "SCAN geoms" in the
> > output. This page, https://sqlite.org/eqp.html, explains the "explain
> > query plan" output and I was hoping to see a "SEARCH geoms USING INDEX
> > ___" of some sort.
> >
> > Is there something about the use of -sql that precludes it from using
> > the spatial index? Maybe we need to do an ogr2ogr with just -spat to
> > narrow it down, then apply a -sql to that result if we need additional
> > filtering?
> >
> > Thanks
> >
> > _______________________________________________
> > gdal-dev mailing list
> > gdal-dev at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
> --
> http://www.spatialys.com/
> My software is free, but my time generally not.
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/gdal-dev/attachments/20250925/8fc5dd10/attachment.htm>
More information about the gdal-dev
mailing list