[gdal-dev] Attribute filter on remote Parquet file is slow
Daniel Baston
dbaston at gmail.com
Wed Aug 28 09:45:35 PDT 2024
Hello,
I'm trying to use ogr2ogr with an attribute filter to pull 14 polygons
from Overture Maps. Running the following command with CPL_DEBUG=ON
reports "PARQUET: Attribute filter fully translated to Arrow", yet the
command takes about 7 minutes to complete and appears to download
quite a bit of data:
    ogr2ogr /tmp/vt.geojson \
      "PARQUET:/vsis3/overturemaps-us-west-2/release/2024-08-20.0/theme=divisions/type=division_area" \
      -select "id,division_id,names.primary" \
      -where "subtype='county' AND country='US' AND region='US-VT'"
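For anyone who wants to reproduce this from a script, here is a minimal Python sketch that assembles the same invocation as an argument list, so that the single quotes in the -where clause survive shell quoting. The helper name build_ogr2ogr_args is hypothetical (it is not part of GDAL), and setting CPL_DEBUG through the generic --config option is an assumption about how the debug output was enabled; an environment variable would work equally well.

```python
import shlex

# Dataset path and filter copied from the command above.
DATASET = ("PARQUET:/vsis3/overturemaps-us-west-2/release/2024-08-20.0/"
           "theme=divisions/type=division_area")
WHERE = "subtype='county' AND country='US' AND region='US-VT'"


def build_ogr2ogr_args(dst, src, select, where, debug=True):
    """Return an ogr2ogr argument list (hypothetical helper, not a GDAL API)."""
    args = ["ogr2ogr", dst, src, "-select", ",".join(select), "-where", where]
    if debug:
        # Assumption: enable debug output via GDAL's generic --config option.
        args += ["--config", "CPL_DEBUG", "ON"]
    return args


args = build_ogr2ogr_args("/tmp/vt.geojson", DATASET,
                          ["id", "division_id", "names.primary"], WHERE)

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(args))
```

The list can be passed directly to subprocess.run(args), wrapped in time.perf_counter() calls to measure the elapsed time, which avoids the shell entirely.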
Have I made a mistake in my ogr2ogr invocation? For comparison,
running what I believe to be an equivalent query in DuckDB takes about
10 seconds:
    SELECT
        id,
        division_id,
        names.primary,
        ST_GeomFromWKB(geometry) as geometry
    FROM
        read_parquet(
            's3://overturemaps-us-west-2/release/2024-08-20.0/theme=divisions/type=division_area/*',
            hive_partitioning=1)
    WHERE
        subtype = 'county'
        AND country = 'US'
        AND region = 'US-VT';
I am using GDAL master (e09d07a7) and libarrow 16.1.
Thanks,
Dan