[gdal-dev] Extracting data from a parquet file
Joaquim Manuel Freire Luís
jluis at ualg.pt
Mon Jul 22 14:21:30 PDT 2024
Ah, easy 😊
From: Even Rouault <even.rouault at spatialys.com>
Sent: Monday, July 22, 2024 8:30 PM
To: Joaquim Manuel Freire Luís <jluis at ualg.pt>; gdal-dev at lists.osgeo.org
Subject: Re: [gdal-dev] Extracting data from a parquet file
On 22/07/2024 at 21:10, Joaquim Manuel Freire Luís wrote:
Even,
Thanks for the explanation. But how did you find the names of the geometries (geo_point_2d and geo_shape)? Loading “world-administrative-boundaries.parquet” in a binary editor I can see them there, but that’s certainly not the way to find these things.
$ ogrinfo world-administrative-boundaries.parquet -al -so | grep "Geometry Column"
Geometry Column 1 = geo_point_2d
Geometry Column 2 = geo_shape
Joaquim
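To script the same lookup, here is a minimal Python sketch. It assumes `ogrinfo` is on the PATH and that its `-al -so` summary contains lines of the form `Geometry Column N = name`, as in the output above (it also accepts `Geometry Column = name`, the form assumed for a layer with a single named geometry field). The helpers `ogrinfo_summary` and `geometry_columns` are hypothetical, not part of GDAL; with the GDAL Python bindings installed, the same names can be read from the layer definition instead of parsing text.

```python
from __future__ import annotations

import re
import subprocess

# Assumed line format, e.g. "Geometry Column 1 = geo_point_2d" or
# "Geometry Column = geom"; the column number is optional.
_GEOM_LINE = re.compile(r"^\s*Geometry Column(?: \d+)? = (.+?)\s*$")


def ogrinfo_summary(path: str) -> str:
    """Run `ogrinfo <path> -al -so` and return its text output (needs GDAL)."""
    result = subprocess.run(
        ["ogrinfo", path, "-al", "-so"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def geometry_columns(summary: str) -> list[str]:
    """Return geometry field names in the order ogrinfo lists them."""
    names = []
    for line in summary.splitlines():
        match = _GEOM_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    sample = "Geometry Column 1 = geo_point_2d\nGeometry Column 2 = geo_shape\n"
    print(geometry_columns(sample))  # ['geo_point_2d', 'geo_shape']
```

In practice one would call `geometry_columns(ogrinfo_summary("world-administrative-boundaries.parquet"))`.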
From: Even Rouault <even.rouault at spatialys.com>
Sent: Monday, July 22, 2024 2:29 PM
To: Joaquim Manuel Freire Luís <jluis at ualg.pt>; gdal-dev at lists.osgeo.org
Subject: Re: [gdal-dev] Extracting data from a parquet file
Joaquim,
The GeoPackage format only supports one geometry field per layer, and the QGIS OGR provider doesn't currently know how to handle several geometry fields per layer either.
To do what you want, you need to explicitly select the desired geometry field name with:
ogr2ogr out.gpkg world-administrative-boundaries.parquet -sql "select geo_shape, * from \"world-administrative-boundaries\""
Actually, if you were outputting to a format that supports several geometry fields per layer (let's say PostGIS), the above wouldn't work. You would need to exclude the geometry fields from the wildcard * selection with:
ogr2ogr out.gpkg world-administrative-boundaries.parquet -sql "select geo_shape, * exclude (geo_point_2d, geo_shape) from \"world-administrative-boundaries\""
Even
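The two ogr2ogr commands above differ only in the SQL statement, so a script can generate it. The following Python sketch is an illustration, not part of GDAL: `select_geometry_sql` and `ogr2ogr_args` are hypothetical helpers, and the `* exclude (...)` syntax assumes a GDAL version whose SQL dialect supports it, as Even's second command does. Building the argument list for `subprocess.run` also avoids the shell escaping of the double-quoted layer name.

```python
from __future__ import annotations


def select_geometry_sql(layer: str, keep: str, geometry_fields: list[str]) -> str:
    """Build a statement that outputs `keep` as the only geometry field.

    Every geometry field, including `keep`, is excluded from the `*`
    wildcard so the output layer ends up with a single geometry column.
    """
    if keep not in geometry_fields:
        raise ValueError(f"{keep!r} is not one of {geometry_fields!r}")
    excluded = ", ".join(geometry_fields)
    # The layer name contains hyphens, so it is double-quoted as in the thread.
    return f'select {keep}, * exclude ({excluded}) from "{layer}"'


def ogr2ogr_args(dst: str, src: str, layer: str, keep: str,
                 geometry_fields: list[str]) -> list[str]:
    """Argument vector for subprocess.run(); no shell quoting is needed."""
    return ["ogr2ogr", dst, src, "-sql",
            select_geometry_sql(layer, keep, geometry_fields)]


if __name__ == "__main__":
    args = ogr2ogr_args(
        "out.gpkg", "world-administrative-boundaries.parquet",
        "world-administrative-boundaries", "geo_shape",
        ["geo_point_2d", "geo_shape"],
    )
    print(args[-1])
    # To actually run the conversion (requires GDAL):
    # import subprocess; subprocess.run(args, check=True)
```

For this file the printed statement is `select geo_shape, * exclude (geo_point_2d, geo_shape) from "world-administrative-boundaries"`, matching the command above.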
On 19/07/2024 at 16:58, Joaquim Manuel Freire Luís via gdal-dev wrote:
Hi,
I finally managed to build a working GDAL with the arrow/parquet driver and I’m now trying to convert this file
(https://public.opendatasoft.com/api/explore/v2.1/catalog/datasets/world-administrative-boundaries/exports/parquet?lang=en&timezone=Europe%2FLondon)
but I can only extract the “Point” geometries, not the “Multi Polygon” ones.
ogrinfo world-administrative-boundaries.parquet
INFO: Open of `world-administrative-boundaries.parquet'
using driver `Parquet' successful.
1: world-administrative-boundaries (Point, Multi Polygon)
This gets only the points:
ogr2ogr lixo.gpkg world-administrative-boundaries.parquet
The same happens if I open the file in QGIS: points only, no polygons.
But if I run ogrinfo -al, it prints all the data in the file:
ogrinfo -al world-administrative-boundaries.parquet
…
OGRFeature(world-administrative-boundaries):255
iso3 (String) = GIB
status (String) = UK Non-Self-Governing Territory
color_code (String) = GBR
name (String) = Gibraltar
…
So, how can we tell ogr2ogr to extract the polygons?
_______________________________________________
gdal-dev mailing list
gdal-dev at lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
--
http://www.spatialys.com
My software is free, but my time generally not.