[gdal-dev] [EXTERNAL] Re: Expected runtime of polygonize (GDAL 3.9.0) for few very large features.

Even Rouault even.rouault at spatialys.com
Tue Jul 23 12:32:04 PDT 2024


On 23/07/2024 at 21:08, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND 
APPLICATIONS INC] wrote:
>
> Excellent, thanks Even!  Do you recall what the runtime was before 
> these changes on your test system?
>
I killed the process after about half an hour. I don't recall how far it 
had progressed, maybe 40%-50%.
>
> *From: *Even Rouault <even.rouault at spatialys.com>
> *Date: *Tuesday, July 23, 2024 at 3:00 PM
> *To: *Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS 
> INC] <jesse.r.meyer at nasa.gov>, Meyer, Jesse R. (GSFC-618.0)[SCIENCE 
> SYSTEMS AND APPLICATIONS INC] via gdal-dev <gdal-dev at lists.osgeo.org>
> *Subject: *[EXTERNAL] Re: [gdal-dev] Expected runtime of polygonize 
> (GDAL 3.9.0) for few very large features.
>
> Hi,
>
> I've had a chance to look at your test dataset. In 
> https://github.com/OSGeo/gdal/pull/10477, I've reduced the runtime to 
> 8 minutes (with GeoParquet output, without spatial sorting), by 
> optimizing some implementation details. I believe this could be 
> reduced further, as most of the time is still spent in malloc/free of 
> temporary objects (the output is 90 million polygons!) and some 
> objects could be reused, but that would require more extensive changes.
>
> Even
>
> On 01/07/2024 at 18:40, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS 
> AND APPLICATIONS INC] via gdal-dev wrote:
>
>     Hi,
>
>     We’ve encountered a few images with what seems like pathological
>     performance problems with polygonise.  The details below are a
>     report from another developer that I haven’t yet independently
>     verified.
>
>     We threshold a raster image to a binary mask in a memory dataset,
>     and use that band as its own mask to mask out the background.
>
>     gdal.Polygonize(nn_mem_band, nn_mem_band, ogr_mem_lyr, -1)
>
>     We have a number of 32k x 32k raster images that feature a number of
>     very large same-valued regions (some as large as 80% of the entire
>     raster).  We’re seeing ~10hrs on a modern workstation to complete
>     the line of code above. OpenCV can apparently construct a
>     connected components list in mere seconds, on the same workstation
>     and image, so we’re considering constructing the OGR geometries
>     directly from those as a temporary work around.
>
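For readers unfamiliar with the connected-components step mentioned above, here is a minimal sketch of what such a labeling pass computes on a binary mask. It is pure Python for illustration only: it is not OpenCV's implementation, nor the algorithm used by Polygonize. The function name `label_components`, the list-of-lists mask layout, and the choice of 4-connectivity are all assumptions of this sketch.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected regions of truthy cells; 0 marks background.

    `mask` is a rectangular list of lists (hypothetical layout for this sketch).
    Returns (labels, count), where labels has the same shape as mask.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            # Skip background cells and cells that already carry a label.
            if not mask[r][c] or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill over the 4 direct neighbours.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Tiny thresholded mask: two separate foreground regions.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
labels, count = label_components(mask)
```

Each cell is visited a bounded number of times, so the pass is linear in the number of pixels. Turning the labeled regions into OGR geometries (tracing the boundaries, handling holes) is the remaining work and is not shown here.
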
>     Is this situation a known pitfall with the current algorithm /
>     data structures behind Polygonize?
>
>     I’m able to share the problematic tile(s) if of interest,
>
>     Best
>
>     Jesse
>
> -- 
> http://www.spatialys.com
> My software is free, but my time generally not.

-- 
http://www.spatialys.com
My software is free, but my time generally not.