<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi,</p>
<p>I've had a chance to look at your test dataset. In
<a class="moz-txt-link-freetext" href="https://github.com/OSGeo/gdal/pull/10477">https://github.com/OSGeo/gdal/pull/10477</a>, I've reduced the runtime
to 8 minutes (with GeoParquet output, without spatial sorting) by
optimizing some implementation details. I believe this could be
reduced further, as most of the time is still spent in malloc/free
of temporary objects (the output is 90 million polygons!) and some
objects could be reused, but that would require more extensive changes.</p>
<p>Even<br>
</p>
<div class="moz-cite-prefix">On 01/07/2024 at 18:40, Meyer, Jesse R.
(GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] via gdal-dev
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:MN2PR09MB5932834D78C863A6F0BCC7C0C9D32@MN2PR09MB5932.namprd09.prod.outlook.com">
<div class="WordSection1">
<p class="MsoNormal">Hi,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We’ve encountered a few images with what
seems like pathological performance problems with polygonise.
The details below are a report from another developer that I
haven’t yet independently verified.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We threshold a raster image to a binary
mask in a memory dataset, then use that band as its own mask to
mask out the background:<o:p></o:p></p>
<p class="MsoNormal">gdal.Polygonize(nn_mem_band, nn_mem_band,
ogr_mem_lyr, -1)<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We have a number of 32k x 32k raster images
that feature a number of very large same-valued regions (some as
large as 80% of the entire raster). We’re seeing ~10hrs on a
modern workstation to complete the line of code above. OpenCV
can apparently construct a connected components list in mere
seconds, on the same workstation and image, so we’re
considering constructing the OGR geometries directly from
those as a temporary workaround.<o:p></o:p></p>
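<p class="MsoNormal">For readers unfamiliar with the term, connected-component labelling of a binary mask can be sketched roughly as below. This is a minimal pure-Python illustration using 4-connectivity and a breadth-first flood fill; it is not OpenCV's implementation, and the function name label_components and the sample mask are hypothetical, chosen only for this sketch.</p>
<pre>
```python
from collections import deque


def label_components(mask):
    """Label 4-connected regions of nonzero cells.

    Returns (labels, count): labels has the same shape as mask, with 0 for
    background and 1..count for each connected region.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            # Skip background cells and cells that already carry a label.
            if mask[r][c] == 0 or labels[r][c] != 0:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill over the four direct neighbours.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = ny in range(rows) and nx in range(cols)
                    if inside and mask[ny][nx] != 0 and labels[ny][nx] == 0:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical 3 x 4 mask with two separate foreground regions.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
labels, count = label_components(mask)
```
</pre>
<p class="MsoNormal">Each cell is visited a bounded number of times, so the cost grows linearly with the number of pixels. Turning the labelled regions into OGR geometries would still need a separate boundary-tracing step, which this sketch does not cover.</p>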
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Is this situation a known pitfall with the
current algorithm / data structures behind Polygonize?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I’m able to share the problematic tile(s)
if that is of interest.<o:p></o:p></p>
<p class="MsoNormal">Best<o:p></o:p></p>
<p class="MsoNormal">Jesse<o:p></o:p></p>
</div>
<br>
<fieldset class="moz-mime-attachment-header"></fieldset>
<pre class="moz-quote-pre" wrap="">_______________________________________________
gdal-dev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:gdal-dev@lists.osgeo.org">gdal-dev@lists.osgeo.org</a>
<a class="moz-txt-link-freetext" href="https://lists.osgeo.org/mailman/listinfo/gdal-dev">https://lists.osgeo.org/mailman/listinfo/gdal-dev</a>
</pre>
</blockquote>
<pre class="moz-signature" cols="72">--
<a class="moz-txt-link-freetext" href="http://www.spatialys.com">http://www.spatialys.com</a>
My software is free, but my time generally not.</pre>
</body>
</html>