[gdal-dev] Attribute filter on remote Parquet file is slow
Scott
public at postholer.com
Wed Aug 28 10:27:59 PDT 2024
I could be completely wrong here.
My understanding is that DuckDB uses httpfs, or possibly some variant of fsspec.
I believe /vsis3/ uses only libcurl, which doesn't *appear* to have
support for httpfs.
Again, I could be wildly wrong.
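If it helps narrow things down, one way to see how much data GDAL actually
fetches is to rerun the same command with network statistics turned on.
Below is a minimal Python sketch that only assembles the command line; it
does not run anything. The helper name build_ogr2ogr_cmd is made up for
illustration, and I'm assuming the GDAL build in use honours the
CPL_VSIL_SHOW_NETWORK_STATS config option.

```python
import shlex

# Dataset path and filter copied from the quoted ogr2ogr command.
SRC = ("PARQUET:/vsis3/overturemaps-us-west-2/release/2024-08-20.0/"
       "theme=divisions/type=division_area")
WHERE = "subtype='county' AND country='US' AND region='US-VT'"


def build_ogr2ogr_cmd(dst="/tmp/vt.geojson", network_stats=True):
    """Return the ogr2ogr argument list with diagnostics enabled (hypothetical helper)."""
    cmd = ["ogr2ogr"]
    # CPL_DEBUG=ON is what the original report already used.
    cmd += ["--config", "CPL_DEBUG", "ON"]
    if network_stats:
        # Assumption: this build honours CPL_VSIL_SHOW_NETWORK_STATS, which
        # should report request and byte counts when the process exits.
        cmd += ["--config", "CPL_VSIL_SHOW_NETWORK_STATS", "YES"]
    cmd += [dst, SRC,
            "-select", "id,division_id,names.primary",
            "-where", WHERE]
    return cmd


if __name__ == "__main__":
    # Print a copy-pasteable shell command; nothing is executed here.
    print(shlex.join(build_ogr2ogr_cmd()))
```

Comparing the reported request and byte counts against the size of the
files under that prefix might show whether the filter is really being
pushed down, or whether whole files are being read.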
On 8/28/24 09:45, Daniel Baston via gdal-dev wrote:
> Hello,
>
> I'm trying to use ogr2ogr with an attribute filter to pull 14 polygons
> from Overture maps. Running the following command with CPL_DEBUG=ON
> tells me that "PARQUET: Attribute filter fully translated to Arrow"
> yet it takes about 7 minutes to complete, and appears to download
> quite a bit of data:
>
> ogr2ogr /tmp/vt.geojson
> "PARQUET:/vsis3/overturemaps-us-west-2/release/2024-08-20.0/theme=divisions/type=division_area"
> -select "id,division_id,names.primary" -where "subtype='county' AND
> country='US' AND region='US-VT'"
>
> Have I made a mistake in my ogr2ogr invocation? For comparison,
> running what I believe to be an equivalent query in DuckDB takes about
> 10 seconds:
>
> SELECT
> id,
> division_id,
> names.primary,
> ST_GeomFromWKB(geometry) as geometry
> FROM
> read_parquet('s3://overturemaps-us-west-2/release/2024-08-20.0/theme=divisions/type=division_area/*',
> hive_partitioning=1)
> WHERE
> subtype = 'county'
> AND country = 'US'
> AND region = 'US-VT';
>
> I am using GDAL master (e09d07a7) and libarrow 16.1.
>
> Thanks,
> Dan