[gdal-dev] gdal_polygonize.py TIF to JSON performance

Graeme B. Bell grb at skogoglandskap.no
Tue Jan 13 01:37:09 PST 2015


>> 
>> The reason for so many reads (though 2.3 seconds out of "a few hours" is
>> negligible overhead) is that the algorithm operates on a pair of adjacent
>> raster lines at a time. This allows processing of extremely large images
>> with very modest memory requirements. It's been a while since I've looked at
>> the code, but from my recollection, the algorithm should scale approximately
>> linearly in the number of pixels and polygons in the image. Far more
>> important to the run-time is the nature of the image itself. If the input is
>> something like a satellite photo, your output can be orders of magnitude
>> larger than the input image, as you can get a polygon for nearly every
>> pixel. If the output format is a verbose format like KML or JSON, the number
>> of bytes to describe each pixel is large. How big was the output in your
>> colleague's run?

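To make the quoted point about image content concrete, here is a minimal sketch that counts the regions a polygoniser would have to emit while holding only the current and previous raster rows. It is an illustration only, not GDAL's implementation: the function name `count_regions`, the 4-connectivity rule and the union-find bookkeeping are all assumptions made for the example.

```python
def count_regions(rows):
    """Count 4-connected regions of equal pixel value.

    Illustrative only (not GDAL's code). Pixels are read one row at a
    time and only the current and previous rows are kept; the label
    table still grows with the number of provisional regions.
    Assumes every row has the same length.
    """
    parent = []  # union-find forest over provisional region labels

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        """Merge two labels; return 1 if they were distinct regions."""
        ra, rb = find(a), find(b)
        if ra == rb:
            return 0
        parent[rb] = ra
        return 1

    regions = 0
    prev_row, prev_labels = None, None
    for row in rows:
        labels = []
        for x, value in enumerate(row):
            if x > 0 and row[x - 1] == value:
                label = labels[x - 1]       # continue the run to the left
            else:
                label = len(parent)         # start a new provisional region
                parent.append(label)
                regions += 1
            labels.append(label)
            if prev_row is not None and prev_row[x] == value:
                # Same value directly above: the two regions are one.
                regions -= union(label, prev_labels[x])
        prev_row, prev_labels = row, labels
    return regions


uniform = [[7] * 4 for _ in range(4)]
checkerboard = [[(x + y) % 2 for x in range(4)] for y in range(4)]
print(count_regions(uniform), count_regions(checkerboard))
```

A uniform 4x4 image yields one region, while a 4x4 checkerboard yields one region per pixel, which is the "polygon for nearly every pixel" case described in the quote.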

Three points. 

- Until last year, "Dan's GDAL scripts" had a polygonisation routine that was an order of magnitude faster than gdal_polygonize for our use cases.

- Locally, for geometry burning, raster processing and polygonising, we use 'rbuild' (I'm the author) to manage tiling. Parallelisation, together with smaller tasks that fit better in cache, can give e.g. a 100x speedup. You can find it at http://github.com/gbb. You can run a polygon merge afterwards on the union of the tiles to consolidate the polygons.

- There are two 'worst case' situations for polygonisation. You outline one of them above (zillions of tiny polygons). In my experience this problem was handled well by either gdal_polygonize or Dan's scripts; I can't remember which. The other 'worst case' occurs frequently, as follows:

Whenever you deal with national-scale data for any country with a coastline, you frequently end up with a single, absolutely gigantic and horrifically complex polygon that depicts the coastline and all the rivers throughout the country as one continuous edge. This mega-polygon, so often present and so often necessary, is very time-consuming for gdal_polygonize to produce, and the result is very painful for every GIS geometry package to handle.

It would be great if the people behind gdal_polygonize could put some thought into making sure this situation is handled well, since it is extremely common for anyone working with country- or continent-scale rasters. It has certainly affected us a great deal when working with data at up to 2m resolution for a country larger than the UK...
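For anyone who wants to experiment with the tiling idea from the second point above, the following is a rough sketch of how a raster could be cut into pixel windows before each tile is polygonised separately. It is not rbuild's code. The helper names `tile_windows` and `tile_commands`, the tile size and the `tile_00000.tif` output naming are hypothetical; the only GDAL feature assumed is the `-srcwin xoff yoff xsize ysize` option of gdal_translate.

```python
def tile_windows(width, height, tile_size):
    """Yield (xoff, yoff, xsize, ysize) pixel windows covering the raster.

    Tiles on the right and bottom edges are clipped to the raster extent.
    """
    for yoff in range(0, height, tile_size):
        for xoff in range(0, width, tile_size):
            yield (xoff, yoff,
                   min(tile_size, width - xoff),
                   min(tile_size, height - yoff))


def tile_commands(src, width, height, tile_size):
    """Build one gdal_translate argument list per tile (nothing is run)."""
    commands = []
    windows = tile_windows(width, height, tile_size)
    for i, (xoff, yoff, xsize, ysize) in enumerate(windows):
        dst = f"tile_{i:05d}.tif"  # hypothetical output naming scheme
        commands.append([
            "gdal_translate", "-srcwin",
            str(xoff), str(yoff), str(xsize), str(ysize),
            src, dst,
        ])
    return commands


# Example: a 5000 x 3000 pixel raster cut into tiles of at most 2048 pixels.
for cmd in tile_commands("input.tif", 5000, 3000, 2048):
    print(" ".join(cmd))
```

Each resulting tile could then be polygonised independently (for example, one worker process per tile), with a polygon merge across tile boundaries afterwards, as described above.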

Graeme.



