[gdal-dev] gdal.Rasterize with same OGR dataset from two python threads

Even Rouault even.rouault at spatialys.com
Mon Oct 28 09:08:51 PDT 2024


On 28/10/2024 at 17:01, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND 
APPLICATIONS INC] via gdal-dev wrote:
>
> I have two calls to gdal.Rasterize, each of which targets a separate 
> GDAL memory dataset but sources the same OGR memory dataset, that I 
> hoped could be run in parallel using Python’s concurrent.futures.  The 
> idea being that each GDAL call unlocks the Python GIL, and performing 
> read only operations on the vector database (except for storing memory 
> for the results) could in principle be a safe and effective 
> optimization, as the feature layers themselves are not mutated.  The 
> SQL dialect is SQLite, so presumably the OGR dataset has to be 
> converted to a SQLite (memory) database. Technically SQLite supports 
> multiple readers just fine, but this doesn’t mean GDAL/OGR does.  The 
> multithreading documentation page doesn’t explicitly mention OGR / 
> vector datasets but I presume they inherit similar stateful 
> restrictions (yes, RFC 101 is coming).  However, running these SQL 
> queries at the same time causes OGR to trip over itself (I presume 
> OGR assumes only one query statement is being evaluated at a time).
>
> So I think the intended workaround is either: accept this as a 
> serially dependent task, or copy the dataset and have each Rasterize() 
> work on a copy, yes?
>
It's not clear to me whether you share the same Python source vector 
dataset object between threads, or open your source dataset once per 
thread. The first case is a big no-no: anything could happen, including 
wrong results and crashes. One dataset object per thread is the way to 
go. If the processing is dominated by acquiring source features, you may 
hit a global lock at the SQLite level, but there isn't much we can do 
about that, other than using multi-processing parallelization instead of 
multi-threading. But you certainly don't need to copy your source dataset.
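
As a rough sketch of the one-object-per-thread pattern, assuming a source
that can be opened by name: each worker opens its own handle and rasterizes
into its own in-memory raster. The helper names, the path, the SQL text and
the pixel size below are hypothetical placeholders, not taken from the
original code.

```python
from concurrent.futures import ThreadPoolExecutor


def rasterize_job(src_path, sql):
    """Open a private handle on the source and rasterize one SQL selection."""
    # Imported here so that the threading helper below has no GDAL dependency.
    from osgeo import gdal

    gdal.UseExceptions()
    # Each call opens its own dataset object; nothing is shared between threads.
    src_ds = gdal.OpenEx(src_path, gdal.OF_VECTOR)
    # An empty destination name with format="MEM" yields an in-memory raster.
    return gdal.Rasterize(
        "",
        src_ds,
        format="MEM",
        xRes=10.0,  # hypothetical pixel size
        yRes=10.0,
        burnValues=[1],
        SQLStatement=sql,
        SQLDialect="SQLite",
    )


def run_in_threads(worker, jobs, max_workers=2):
    """Run worker(*job) for each job on a thread pool; results keep job order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(worker, *job) for job in jobs]
        return [future.result() for future in futures]
```

It would be called along the lines of
run_in_threads(rasterize_job, [("source.gpkg", sql_a), ("source.gpkg", sql_b)]),
where both jobs name the same source but each thread gets its own dataset
object.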
>
> In the same spirit as RFC 101, which gives some thread safety to 
> raster read-only workloads, is there interest in expanding this to 
> vector datasets?
>
That would be tricky. What would be the expected result if a user 
called GetNextFeature() on a thread-safe OGRLayer: would each thread be 
expected to see all features, or would features be distributed among the 
calling threads?

Even

-- 
http://www.spatialys.com
My software is free, but my time generally not.
Butcher of all kinds of standards, open or closed formats. At the end, this is just about bytes.