[gdal-dev] [EXTERNAL] Re: Expected runtime of polygonize (GDAL 3.9.0) for few very large features.
Even Rouault
even.rouault at spatialys.com
Tue Jul 23 12:32:04 PDT 2024
On 23/07/2024 at 21:08, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND
APPLICATIONS INC] wrote:
>
> Excellent, thanks Even! Do you recall what the runtime was before
> these changes on your test system?
>
I killed the process after about half an hour. I don't recall how far it
had progressed, maybe 40%-50%.
>
> *From: *Even Rouault <even.rouault at spatialys.com>
> *Date: *Tuesday, July 23, 2024 at 3:00 PM
> *To: *Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS
> INC] <jesse.r.meyer at nasa.gov>, Meyer, Jesse R. (GSFC-618.0)[SCIENCE
> SYSTEMS AND APPLICATIONS INC] via gdal-dev <gdal-dev at lists.osgeo.org>
> *Subject: *[EXTERNAL] Re: [gdal-dev] Expected runtime of polygonize
> (GDAL 3.9.0) for few very large features.
>
> Hi,
>
> I've had a chance to look at your test dataset. In
> https://github.com/OSGeo/gdal/pull/10477, I've reduced the runtime to
> 8 minutes (with GeoParquet output, without spatial sorting) by
> optimizing some implementation details. I believe this could be
> reduced further, as most of the time is still spent in malloc/free of
> temporary objects (the output is 90 million polygons!) and some
> objects could be reused, but that would require more extensive changes.
>
> Even
>
> On 01/07/2024 at 18:40, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS
> AND APPLICATIONS INC] via gdal-dev wrote:
>
> Hi,
>
> We’ve encountered a few images with what seem like pathological
> performance problems with Polygonize. The details below are a
> report from another developer that I haven’t yet independently
> verified.
>
> We threshold a raster image to a binary mask in a memory dataset,
> then use that band as its own mask to mask out the background.
>
> gdal.Polygonize(nn_mem_band, nn_mem_band, ogr_mem_lyr, -1)
>
> We have a number of 32k x 32k raster images that feature a number of
> very large same-valued regions (some as large as 80% of the entire
> raster). We’re seeing ~10 hours on a modern workstation to complete
> the line of code above. OpenCV can apparently construct a
> connected components list in mere seconds, on the same workstation
> and image, so we’re considering constructing the OGR geometries
> directly from those as a temporary workaround.
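
For context on the comparison above, connected-component labelling only has to visit each pixel a bounded number of times, so its cost grows linearly with the pixel count. The following is a minimal pure-Python sketch of 4-connected labelling on a small binary mask. It is a stand-in for illustration only: it is neither OpenCV's implementation nor the algorithm behind gdal.Polygonize, and the function name `label_components` and the sample mask are assumptions made up for this example.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected regions of non-zero cells in a 2D list.

    Returns (labels, sizes): labels holds 0 for background and 1..n for
    each component; sizes maps each label to its pixel count.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    sizes = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue  # background, or already assigned to a component
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            count = 0
            # Breadth-first flood fill from the seed pixel.
            while queue:
                y, x = queue.popleft()
                count += 1
                # Visit the four edge-sharing neighbours.
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            sizes[next_label] = count
    return labels, sizes


# Hypothetical 4x4 mask with two separate foreground regions.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
labels, sizes = label_components(mask)
print(sizes)
```

Note that a label grid like this is not yet a set of polygons: building OGR geometries from it would still require tracing each region's boundary (including holes), which this sketch does not attempt.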
>
> Is this situation a known pitfall with the current algorithm /
> data structures behind Polygonize?
>
> I’m able to share the problematic tile(s) if of interest,
>
> Best
>
> Jesse
>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.