[gdal-dev] gdal.Rasterize with same OGR dataset from two python threads

Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] jesse.r.meyer at nasa.gov
Mon Oct 28 09:01:58 PDT 2024


I have two calls to gdal.Rasterize, each of which targets a separate GDAL memory dataset but sources the same OGR memory dataset, that I hoped could be run in parallel using Python’s concurrent.futures. The idea is that each GDAL call releases the Python GIL, and performing read-only operations on the vector dataset (apart from allocating memory for the results) could in principle be a safe and effective optimization, since the feature layers themselves are not mutated. The SQL dialect is SQLite, so presumably the OGR dataset has to be converted to a SQLite (memory) database. Technically SQLite supports multiple readers just fine, but this doesn’t mean GDAL/OGR does. The multithreading documentation page doesn’t explicitly mention OGR/vector datasets, but I presume they inherit similar stateful restrictions (yes, RFC 101 is coming). However, running these SQL queries at the same time causes OGR to trip over itself (I presume OGR assumes only one SQL statement is being evaluated at a time).
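For comparison, SQLite's tolerance of multiple readers rests on each reader having its own connection, and therefore its own prepared-statement state. The minimal sketch below uses plain stdlib `sqlite3`, not OGR, so it only illustrates the SQLite side of the argument; the `features` table, its rows, and the temporary file database are hypothetical.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def count_kind(db_path: str, kind: str) -> int:
    # Each reader opens its own connection, so no cursor or statement state is shared.
    conn = sqlite3.connect(db_path)
    try:
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM features WHERE kind = ?", (kind,)
        ).fetchone()
        return n
    finally:
        conn.close()


def run_demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.sqlite")
        # Populate the hypothetical `features` table once, then close the writer.
        writer = sqlite3.connect(db_path)
        writer.execute("CREATE TABLE features (id INTEGER PRIMARY KEY, kind TEXT)")
        writer.executemany(
            "INSERT INTO features (kind) VALUES (?)",
            [("road",), ("river",), ("road",), ("lake",)],
        )
        writer.commit()
        writer.close()
        # Two concurrent read-only queries, one connection per worker thread.
        with ThreadPoolExecutor(max_workers=2) as pool:
            return list(pool.map(count_kind, [db_path, db_path], ["road", "river"]))


if __name__ == "__main__":
    print(run_demo())
```

A single OGR dataset handle is closer to one connection shared by both threads, which is the situation SQLite itself does not expect either.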

So I think the intended workaround is either to accept this as a serially dependent task, or to copy the dataset and have each Rasterize() call work on its own copy, yes?

In the same spirit as RFC 101, which adds some thread safety for read-only raster workloads, is there interest in extending this to vector datasets?

Best,
Jesse

Lead Computer Scientist
Science Systems and Applications, Inc.
Dr Compton Tucker Team
NASA Goddard Space Flight Center

