[gdal-dev] writing arrow geometry

Michael Sumner mdsumner at gmail.com
Mon Oct 7 07:51:07 PDT 2024


Thanks Dewey!  That does indeed fix it.


ogr2ogr ~/fromgdal.arrows ogr/data/arrow/from_paleolimbot_geoarrow/polygon-default.ipc
## using the *.arrows extension triggers the same behavior as -lco FORMAT=STREAM

then in R

nanoarrow::read_nanoarrow("~/fromgdal.arrows")
<nanoarrow_array_stream struct<row_num: int32, geometry: geoarrow.polygon{list<item: list<item: struct<x: double, y: double>>>}>>
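On the remaining `item` names: per Joris's note quoted below, the spec says "should" rather than "must" for the list child names, so a consumer can compare layouts while ignoring them. A toy sketch of that idea in Python follows. The nested tuples are a hypothetical stand-in for a schema, not a nanoarrow, pyarrow, or GDAL data structure, and `layout` / `same_layout` are names made up for illustration.

```python
# Hypothetical stand-in for an Arrow type: (kind, [(child_name, child_type), ...]).
# This is NOT a nanoarrow, pyarrow, or GDAL structure; it only illustrates the idea.

def layout(t):
    """Return the nesting of a type, dropping child names except for struct fields."""
    kind, children = t
    keep_names = kind == "struct"  # struct field names (x, y) carry meaning
    return (
        kind,
        tuple((name if keep_names else None, layout(child)) for name, child in children),
    )


def same_layout(a, b):
    """True when two types nest identically, ignoring list child names."""
    return layout(a) == layout(b)


DOUBLE = ("double", [])
XY = ("fixed_size_list(2)", [("xy", DOUBLE)])
XY_STRUCT = ("struct", [("x", DOUBLE), ("y", DOUBLE)])

# Child names as in the geoarrow test file
named = ("list", [("rings", ("list", [("vertices", XY)]))])
# Child names as written with GEOMETRY_ENCODING=GEOARROW_INTERLEAVED
generic = ("list", [("item", ("list", [("item", XY)]))])
# Default output: same nesting, but struct<x, y> coordinates
separated = ("list", [("item", ("list", [("item", XY_STRUCT)]))])
```

Here `same_layout(named, generic)` is true, since only the list child names differ, while `same_layout(named, separated)` is false because the coordinate storage itself differs.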

I had even toyed with the FORMAT=FILE/STREAM and the .arrows extension ...
but was a bit lost about what to expect.
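
For anyone else unsure which of the two formats a given output is, the first bytes are enough to tell. This is a minimal sketch in Python (standard library only), not anything taken from GDAL or nanoarrow; the function names `sniff_arrow_ipc` and `as_int32_le` are made up for illustration. It relies on the Arrow IPC file format opening with the magic string "ARROW1", and on current IPC streams opening with the 0xFFFFFFFF continuation marker.

```python
import os
import struct

ARROW_MAGIC = b"ARROW1"             # opens (and closes) an Arrow IPC *file*
CONTINUATION = b"\xff\xff\xff\xff"  # opens each message in a current IPC *stream*


def sniff_arrow_ipc(path):
    """Return 'file', 'stream', or 'unknown', looking only at the first bytes."""
    with open(os.path.expanduser(path), "rb") as f:
        head = f.read(8)
    if head.startswith(ARROW_MAGIC):
        return "file"
    if head.startswith(CONTINUATION):
        return "stream"
    return "unknown"


def as_int32_le(four_bytes):
    """Interpret four bytes as a little-endian signed 32-bit integer."""
    return struct.unpack("<i", four_bytes)[0]
```

Run against the two outputs in this thread, `sniff_arrow_ipc` should report "file" for ~/fromgdal.arrow and "stream" for ~/fromgdal.arrows. As an aside, `as_int32_le(b"ARRO")` is 1330795073, four less than the 1330795077 in the error quoted below, which suggests (a guess from the arithmetic, not confirmed against nanoarrow's source) that the stream reader took the start of the file magic for a message length.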

Thanks all, Mike



On Tue, Oct 8, 2024 at 1:34 AM Dewey Dunnington <dewey at voltrondata.com>
wrote:

> Thank you Michael for the report and Even for the heads up!
>
> On nanoarrow's end, I believe that what's happening here is that
> Michael wrote an arrow "file" and then attempted to read with
> nanoarrow's IPC reader (which only reads streams). I believe that
> `ogr2ogr ~/fromgdal.arrow` -> `ogr2ogr ~/fromgdal.arrows` (note the s
> suffix) would write an IPC stream.
>
> (As a note, both Joris and I attempted to retain the more informative
> name "feather" for "Arrow files" to avoid this type of confusion).
>
> Let me know if that does not work!
>
> Cheers,
>
> -dewey
>
> On Mon, Oct 7, 2024 at 9:07 AM Joris Van den Bossche
> <jorisvandenbossche at gmail.com> wrote:
> >
> > The section about MultiPolygons at
> > https://geoarrow.org/format.html#memory-layouts mentions:
> >
> > > The child name of the outer list should be “polygons”; the child name
> > > of the middle list should be “rings”; the child name of the inner list
> > > should be “vertices”.
> >
> > So this is currently phrased as a "should" and not "must" regarding
> > the list's field names, and so the data generated by GDAL is valid
> > under that description.
> >
> > It might be good to verify with other consumers that those indeed can
> > handle such data (and add some test data with varying field names to
> > test against).
> >
> > Joris
> >
> > On Mon, 7 Oct 2024 at 15:39, Even Rouault via gdal-dev
> > <gdal-dev at lists.osgeo.org> wrote:
> > >
> > > Michael,
> > >
> > > my understanding of https://geoarrow.org/format.html#memory-layouts
> > > is that what OGR writes is supposed to be fine, since they mention types
> > > like 'List<List<FixedSizeList<double>[2]>>'. Perhaps I've missed something,
> > > or nanoarrow has stricter expectations? CC'ing Dewey Dunnington
> > >
> > > Even
> > >
> > > On 07/10/2024 at 15:23, Michael Sumner via gdal-dev wrote:
> > >
> > > I realize I left out the INTERLEAVING, ie.
> > >
> > > ogr2ogr ~/fromgdal.arrow ogr/data/arrow/from_paleolimbot_geoarrow/polygon-default.ipc -lco GEOMETRY_ENCODING=GEOARROW_INTERLEAVED
> > >
> > > but still, I get these list<item elements rather than their
> > > rings/vertices/geoarrow.point type names:
> > >
> > > <nanoarrow_array_stream struct<row_num: int32, geometry: geoarrow.polygon{list<item: list<item: fixed_size_list(2)<xy: double>>>}>>
> > >
> > >
> > >
> > > On Tue, Oct 8, 2024 at 12:19 AM Michael Sumner <mdsumner at gmail.com> wrote:
> > >>
> > >> When I investigate the schema in one of the test files
> > >>
> > >> ogr/data/arrow/from_paleolimbot_geoarrow/polygon-default.ipc
> > >>
> > >> I see expected list<polygons and list<rings and xy etc. I'm printing
> > >> this by using R nanoarrow::read_arrow, or from poLayer->GetArrowStream and
> > >> I get the same output:
> > >>
> > >> <nanoarrow_array_stream struct<row_num: int32, geometry: geoarrow.polygon{list<rings: list<vertices: geoarrow.point{fixed_size_list(2)<xy: double>}>>}>>
> > >>
> > >> If I write a new .arrow with GDAL
> > >>
> > >> ogr2ogr ~/fromgdal.arrow ogr/data/arrow/from_paleolimbot_geoarrow/polygon-default.ipc
> > >>
> > >> the stream schema looks like this:
> > >>
> > >> <nanoarrow_array_stream struct<row_num: int32, geometry: geoarrow.polygon{list<item: list<item: struct<x: double, y: double>>>}>>
> > >>
> > >> and from nanoarrow I see
> > >>
> > >> nanoarrow::read_nanoarrow("~/fromgdal.arrow")
> > >> Error in read_nanoarrow.character("~/fromgdal.arrow") :
> > >>   array_stream->get_schema(): [29] Expected >= 1330795077 bytes of remaining data but found 2266 bytes in buffer
> > >>
> > >> Are we in-between moves regarding specifications, or something? I'm
> > >> having good results generally and this seems like a problem in the Arrow
> > >> driver for write.
> > >>
> > >> Cheers, Mike
> > >>
> > >>
> > >> --
> > >> Michael Sumner
> > >> Research Software Engineer
> > >> Australian Antarctic Division
> > >> Hobart, Australia
> > >> e-mail: mdsumner at gmail.com
> > >
> > >
> > >
> > > _______________________________________________
> > > gdal-dev mailing list
> > > gdal-dev at lists.osgeo.org
> > > https://lists.osgeo.org/mailman/listinfo/gdal-dev
> > >
> > > --
> > > http://www.spatialys.com
> > > My software is free, but my time generally not.
> > >
>


-- 
Michael Sumner
Research Software Engineer
Australian Antarctic Division
Hobart, Australia
e-mail: mdsumner at gmail.com