[gdal-dev] Read a /vsigzip/ csv.gz all at once
Joaquim Manuel Freire Luís
jluis at ualg.pt
Tue Jun 24 13:28:11 PDT 2025
To finish this: the problem was in the GMT.jl wrapper.
Short history:
1. Julia 1.9 started to crash on some GMT.jl tests.
2. I opened an issue (https://github.com/JuliaLang/julia/issues/47003) but got little to no help.
3. I found a patch for the situation (one that seemed innocuous):
https://github.com/GenericMappingTools/GMT.jl/blob/master/src/gdal.jl#L1953
but that patch ended up causing this extreme slowdown. Since Julia doesn’t crash anymore, I removed that patch, and now
julia> @time gdalread("/vsigzip//vsicurl/https://bulk.meteostat.net/v2/hourly/2022/08554.csv.gz");
1.007115 seconds (130.95 k allocations: 3.804 MiB)
Thanks for the discussion, which helped a lot in figuring out the problem.
Joaquim
From: gdal-dev <gdal-dev-bounces at lists.osgeo.org> On Behalf Of Joaquim Manuel Freire Luís via gdal-dev
Sent: Tuesday, June 24, 2025 6:19 PM
To: Erik Schnetter <schnetter at gmail.com>; gdal-dev at lists.osgeo.org
Subject: Re: [gdal-dev] Read a /vsigzip/ csv.gz all at once
Even,
In case you are spending any time on this, please do not. At this point I’m persuaded that this is a Julia wrapper(s) issue, but I have no time to investigate it much further right now.
From: Joaquim Manuel Freire Luís <jluis at ualg.pt>
Sent: Tuesday, June 24, 2025 5:52 PM
To: Joaquim Manuel Freire Luís <jluis at ualg.pt>; Erik Schnetter <schnetter at gmail.com>; gdal-dev at lists.osgeo.org
Subject: RE: [gdal-dev] Read a /vsigzip/ csv.gz all at once
Erik, BINGO.
Since, in this case, I know that the field type is string, if I replace the calls to getfield(…) with
OGR_F_GetFieldAsString(f.ptr, k)
I get these timings (on a local file)
julia> @time gdalread("/vsigzip/C:/TMP/.meteostat/cache/hourly/2025/08554.csv.gz");
0.046272 seconds (56.04 k allocations: 1.751 MiB)
and, for a remote one
julia> @time gdalread("/vsigzip//vsicurl/https://bulk.meteostat.net/v2/hourly/2022/08554.csv.gz");
1.191215 seconds (113.43 k allocations: 3.537 MiB)
From: gdal-dev <gdal-dev-bounces at lists.osgeo.org> On Behalf Of Joaquim Manuel Freire Luís via gdal-dev
Sent: Tuesday, June 24, 2025 5:33 PM
To: Erik Schnetter <schnetter at gmail.com>; gdal-dev at lists.osgeo.org
Subject: Re: [gdal-dev] Read a /vsigzip/ csv.gz all at once
To complement what Erik said, here’s the ‘getfield’ function (this code was taken from ArchGDAL, so we are talking about the same thing):
https://github.com/GenericMappingTools/GMT.jl/blob/master/src/gdal.jl#L2161
From: gdal-dev <gdal-dev-bounces at lists.osgeo.org> On Behalf Of Erik Schnetter via gdal-dev
Sent: Tuesday, June 24, 2025 5:21 PM
To: gdal-dev at lists.osgeo.org
Subject: Re: [gdal-dev] Read a /vsigzip/ csv.gz all at once
The Julia wrapper (ArchGDAL.jl) for `getfield` calls `OGR_FD_GetFieldDefn` and several related functions (to get the type of the field etc.). Are these possibly expensive operations in GDAL?
Any C function in GDAL can easily be called from Julia. Which C function would get all fields at once? I assume that e.g. `OGR_F_GetFieldAsDoubleList` would not work; this would be for values that are themselves lists?
The Julia code for `getfield` does quite a bit of work to find out the type of the field. This includes a bit of reference counting, allocating small structures on the heap, registering finalizers for them, etc. This could be avoided by adding a Julia wrapper that calls `getfield` repeatedly (even from Julia, calling C has no overhead by itself) for a range of integers. This would avoid the additional overhead of handling types and of the Julia/GDAL reference counting. Even, is that what you had in mind?
-erik
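A minimal sketch of the kind of batch wrapper described above, under stated assumptions: the field count is looked up once by the caller, and a single string getter is then called in a tight loop with no per-call type inspection. The names `read_all_fields`, `fake_features` and `fake_getstring` are hypothetical, and stand-in data replaces GDAL so the block runs on its own. With GDAL, the getter would presumably be something like `(f, k) -> OGR_F_GetFieldAsString(f.ptr, k)`, the call quoted elsewhere in this thread.

```julia
# Read every field of every feature as a String in one pass.
#   getstring : callable (feature, k) -> String, with k zero-based as in OGR
#   features  : any iterable of feature handles
#   nfields   : number of fields per feature, determined once by the caller
function read_all_fields(getstring, features, nfields::Integer)
    rows = Vector{Vector{String}}()
    for f in features
        row = Vector{String}(undef, nfields)
        for k in 1:nfields
            # No field-definition lookup here: the caller already knows
            # that every field can be read as a string.
            row[k] = getstring(f, k - 1)
        end
        push!(rows, row)
    end
    return rows
end

# Stand-in data (made up for illustration): each "feature" is a tuple of strings.
fake_features = [("a", "b", "c"), ("d", "e", "f")]
fake_getstring(f, k) = f[k + 1]   # zero-based k, like the OGR field index

table = read_all_fields(fake_getstring, fake_features, 3)
println(table)
```

Passing the getter as an argument keeps the loop independent of any particular wrapper, so the same function can be timed against both a typed `getfield` and a plain string getter.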
On Jun 24, 2025, at 11:01, Even Rouault via gdal-dev <gdal-dev at lists.osgeo.org> wrote:
Hi,
I don't know anything about Julia but I'd suspect that there must be something particularly slow in the way it interacts with C. For comparison, "time python3 swig/python/gdal-utils/osgeo_utils/samples/ogrinfo.py /vsigzip//vsicurl/https://bulk.meteostat.net/v2/hourly/2022/08554.csv.gz -al > /dev/null", which does essentially your loop and also prints to stdout, runs in 1.5 seconds (compared to native ogrinfo, which runs in 0.7 s). Perhaps you could write a Julia wrapper to get all fields of a feature at once and return whatever dictionary or equivalent data structure is idiomatic (and efficient) in Julia? Also, are you sure your Julia wrapper is built with optimization enabled?
Even
On 24/06/2025 at 16:33, Joaquim Manuel Freire Luís via gdal-dev wrote:
Hi,
I’m trying to read files like
https://bulk.meteostat.net/v2/hourly/2022/08554.csv.gz
in my Julia wrapper. The point is that, although I’m kind of succeeding, the whole operation is very slow.
What I’m doing (code not committed yet so can’t post a link) is to read like this
layer = getlayer(dataset, 0)
for f in layer
    for k = 1:Gdal.nfield(f)
        Gdal.getfield(f, k-1)
        …
This works but it’s extremely slow because each “getfield” takes about 1e-4 seconds and the file has ~8 k rows, each with 13 fields. That amounts to > 10 sec.
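The estimate above can be checked with a few lines of arithmetic. The three inputs are the approximate figures quoted in the message, not measurements made here.

```julia
# Back-of-the-envelope cost of calling getfield once per field.
nrows      = 8_000    # ~8 k rows in the file
nfields    = 13       # fields per row
t_getfield = 1e-4     # seconds per getfield call (approximate)

ncalls = nrows * nfields        # 104_000 calls in total
total  = ncalls * t_getfield    # roughly 10.4 seconds

println("calls: ", ncalls, ", estimated time: ", round(total, digits=1), " s")
```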
I’ve searched but couldn’t find a way to read the entire file at once (which takes 1e-2 seconds if I read it, locally, with a gzip wrapper) and return it as a single string array that I could parse later.
Is that possible?
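For the "parse later" half of the question, here is a rough sketch of turning already-decompressed CSV text into a string array using only Julia's standard library. It assumes the decompression has been done elsewhere (for example by the gzip wrapper mentioned above), that there are no quoted fields containing commas or newlines, and that every row has the same number of fields. The function name `csv_to_matrix` and the sample text are hypothetical.

```julia
# Split decompressed CSV text into a Matrix{String}, one row per line.
# Naive on purpose: no support for quoted fields or embedded newlines.
function csv_to_matrix(text::AbstractString; delim::Char=',')
    # Drop blank lines (including the empty piece after a trailing newline).
    lines = [l for l in split(text, '\n') if !isempty(strip(l))]
    isempty(lines) && return Matrix{String}(undef, 0, 0)
    ncols = length(split(lines[1], delim))
    out = Matrix{String}(undef, length(lines), ncols)
    for (i, line) in enumerate(lines)
        # Remove a trailing carriage return left over from CRLF line endings.
        parts = split(rstrip(line, '\r'), delim)
        length(parts) == ncols ||
            error("row $i has $(length(parts)) fields, expected $ncols")
        for j in 1:ncols
            out[i, j] = String(parts[j])   # empty fields stay as ""
        end
    end
    return out
end

# Made-up sample text, only to exercise the function.
m = csv_to_matrix("a,b,,c\r\n1,2,3,4\n")
println(size(m))
```

Keeping everything as `String` defers type conversion to a later step, which matches the idea of reading the file in one go and parsing afterwards.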
Thanks
Joaquim
_______________________________________________
gdal-dev mailing list
gdal-dev at lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
--
http://www.spatialys.com
My software is free, but my time generally not.