[gdal-dev] gdal.Rasterize with same OGR dataset from two python threads
Even Rouault
even.rouault at spatialys.com
Mon Oct 28 09:08:51 PDT 2024
On 28/10/2024 at 17:01, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND
APPLICATIONS INC] via gdal-dev wrote:
>
> I have two calls to gdal.Rasterize, each of which target a separate
> GDAL memory dataset but source the same OGR memory dataset, that I
> hoped could be run in parallel using Python’s concurrent.futures. The
> idea being that each GDAL call unlocks the Python GIL, and performing
> read only operations on the vector database (except for storing memory
> for the results) could in principle be a safe and effective
> optimization, as the feature layers themselves are not mutated. The
> SQL dialect is SQLite, so presumably the OGR dataset has to be
> converted to a SQLite (memory) database. Technically SQLite supports
> multiple readers just fine, but this doesn’t mean GDAL/OGR does. The
> multithreading documentation page doesn’t explicitly mention OGR /
> vector datasets but I presume they inherit similar stateful
> restrictions (Yes RFC 101 is coming). However, running these SQL
> queries at the same time causes OGR to trip over itself (I presume
> OGR assumes only one query statement is being evaluated at the same time).
>
> So I think the intended workaround is either: accept this as a
> serially dependent task, or copy the dataset and have each Rasterize()
> work on a copy, yes?
>
It is not clear to me whether you share a single Python source vector
dataset object between the threads, or whether you open your source
dataset once for each thread. The first case is a big no-no: anything
could happen, including wrong results and crashes. One dataset object
per thread is the way to go. If the processing spends most of its time
acquiring source features, you may hit a global lock at the SQLite
level, but there isn't much we can do about that. In that case you would
need to use multi-processing parallelization instead of multi-threading.
But you certainly don't need to copy your source dataset.
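A minimal sketch of the one-dataset-object-per-thread pattern, assuming
the source is a file-based dataset that each thread can open on its own
(the original question uses an in-memory OGR dataset, which this sketch
does not cover). The function names, the path, the raster size and the
geotransform are all hypothetical placeholders:

```python
from concurrent.futures import ThreadPoolExecutor


def rasterize_one(src_path, sql, width=256, height=256,
                  geotransform=(0.0, 1.0, 0.0, 256.0, 0.0, -1.0)):
    """Runs in a worker thread and opens its OWN handle on the source."""
    # Imported lazily so this module can be loaded without GDAL installed.
    from osgeo import gdal

    gdal.UseExceptions()
    # One source dataset object per thread; never shared with other threads.
    src = gdal.OpenEx(src_path, gdal.OF_VECTOR)
    # Each call also gets its own in-memory target raster.
    dst = gdal.GetDriverByName("MEM").Create("", width, height, 1,
                                             gdal.GDT_Byte)
    dst.SetGeoTransform(geotransform)  # placeholder grid, adjust to your data
    gdal.Rasterize(dst, src, SQLStatement=sql, SQLDialect="SQLITE",
                   burnValues=[1])
    src = None  # release this thread's source handle
    return dst


def rasterize_all(src_path, sql_statements, worker=rasterize_one):
    """Run one worker call per SQL statement, each in its own thread."""
    with ThreadPoolExecutor(max_workers=len(sql_statements)) as pool:
        # map() returns the results in the order of sql_statements.
        return list(pool.map(lambda sql: worker(src_path, sql),
                             sql_statements))
```

It could be called as rasterize_all("source.gpkg", [query_a, query_b]),
where the path and the two SQL strings stand for your own data. Swapping
ThreadPoolExecutor for ProcessPoolExecutor would be one way to try the
multi-processing route, with the caveat that dataset objects cannot be
returned across processes, so each worker would have to write its result
to a file or return plain arrays instead.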
>
> In the same spirit as RFC 101, which gives some thread safety to
> raster read-only workloads, is there interest in expanding this to
> vector datasets?
>
That would be tricky. What would be the expected result if a user
called GetNextFeature() on a thread-safe OGRLayer? Would users expect
each thread to see all features, or the features to be distributed
among the calling threads?
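The two possible behaviours can be illustrated with a toy model in plain
Python. This is not GDAL code and says nothing about how a thread-safe
OGRLayer would actually be implemented; the Cursor class and both reader
functions are hypothetical stand-ins for a layer's read position:

```python
import threading


class Cursor:
    """Toy stand-in for a layer: a read position protected by a lock."""

    def __init__(self, features):
        self._it = iter(features)
        self._lock = threading.Lock()

    def get_next_feature(self):
        with self._lock:
            return next(self._it, None)  # None signals end of features


def _drain(cursor):
    out = []
    while (feature := cursor.get_next_feature()) is not None:
        out.append(feature)
    return out


def _run(cursors):
    # One thread per cursor entry; collect what each thread has read.
    results = [None] * len(cursors)

    def work(i):
        results[i] = _drain(cursors[i])

    threads = [threading.Thread(target=work, args=(i,))
               for i in range(len(cursors))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def read_with_shared_cursor(features, n_threads=2):
    """Features are distributed: each one goes to exactly one thread."""
    shared = Cursor(features)
    return _run([shared] * n_threads)


def read_with_private_cursors(features, n_threads=2):
    """Every thread has its own read position and sees all features."""
    return _run([Cursor(features) for _ in range(n_threads)])
```

With a shared cursor, which thread receives which feature depends on
scheduling, so only the union of the per-thread results is predictable.
With private cursors, every thread gets the full sequence in order.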
Even
--
http://www.spatialys.com
My software is free, but my time generally not.
Butcher of all kinds of standards, open or closed formats. At the end, this is just about bytes.