[gdal-dev] [EXTERNAL] Re: Expected runtime of polygonize (GDAL 3.9.0) for few very large features.
Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC]
jesse.r.meyer at nasa.gov
Tue Jul 23 12:08:30 PDT 2024
Excellent, thanks Even! Do you recall what the runtime was before these changes on your test system?
From: Even Rouault <even.rouault at spatialys.com>
Date: Tuesday, July 23, 2024 at 3:00 PM
To: Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] <jesse.r.meyer at nasa.gov>, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] via gdal-dev <gdal-dev at lists.osgeo.org>
Subject: [EXTERNAL] Re: [gdal-dev] Expected runtime of polygonize (GDAL 3.9.0) for few very large features.
Hi,
I've had a chance to look at your test dataset. In https://github.com/OSGeo/gdal/pull/10477, I've reduced the runtime to 8 minutes (with GeoParquet output, without spatial sorting) by optimizing some implementation details. I believe this could be reduced further, as most of the time is still spent in malloc/free of temporary objects (the output is 90 million polygons!) and some objects could be reused, but that would require more extensive changes.
Even
On 01/07/2024 at 18:40, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] via gdal-dev wrote:
Hi,
We’ve encountered a few images with what seem like pathological performance problems with Polygonize. The details below are a report from another developer that I haven’t yet independently verified.
We threshold a raster image to a binary mask in a memory dataset, then pass that band as its own mask band so that the background is masked out:
gdal.Polygonize(nn_mem_band, nn_mem_band, ogr_mem_lyr, -1)
We have a number of 32k x 32k raster images that feature a number of very large same-valued regions (some as large as 80% of the entire raster). We’re seeing ~10 hours on a modern workstation to complete the line of code above. OpenCV can apparently construct a connected components list in mere seconds, on the same workstation and image, so we’re considering constructing the OGR geometries directly from those as a temporary workaround.
Is this situation a known pitfall with the current algorithm / data structures behind Polygonize?
I’m able to share the problematic tile(s) if of interest,
Best
Jesse
_______________________________________________
gdal-dev mailing list
gdal-dev at lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
--
http://www.spatialys.com
My software is free, but my time generally not.