[gdal-dev] gdal_polygonize.py TIF to JSON performance

David Strip gdal at stripfamily.net
Mon Jan 12 08:04:10 PST 2015


Your team writes that the image is usually exported as a vector file, e.g. a
shapefile. Can they do this successfully for the 1.4GB image? If so,
have you tried just converting that shapefile to GeoJSON? That might be the
simplest solution.
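If the shapefile route does work, the conversion itself is a one-liner
through ogr2ogr. A minimal sketch driven from Python, with made-up file
names:

import subprocess

# Hypothetical file names; ogr2ogr takes the destination before the source.
subprocess.run(["ogr2ogr", "-f", "GeoJSON", "classified.geojson",
                "classified.shp"], check=True)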

If that doesn't work, you could try tiling, as you mention. As Even has
already noted, the challenge in threading the code is rejoining the
polygons at the tile boundaries. It's not an overwhelming problem, but it is
a challenge, and it requires buffering the output rather than streaming it.

You could do a poor-man's version of multi-threading (a rough code sketch
follows this list):
1. Tile your input image. I would probably try something bigger than the
1024x1024 you mention, perhaps 4K x 4K or even 8K x 8K. Overlap the
tiles by a pixel or two on all edges.
    For the initial experiment, just a couple of adjacent tiles are
sufficient.
2. Feed each tile to gdal_polygonize in as many processes as you have
available processors.
3. Take the resulting polygon files and merge them into a single
shapefile (or another equivalent format). You can do this with ogr2ogr or
in QGIS.
4. Dissolve using the classification value.
5. Split multipart polygons into single polygons.
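Here is a rough, untested sketch of steps 1-3 in Python, driving the same
command-line tools (gdal_translate, gdal_polygonize.py, ogr2ogr). The file
names, the 4K tile size, the 2-pixel overlap and the "DN" field name are
just assumptions to make it concrete:

import subprocess
from multiprocessing import Pool
from pathlib import Path

from osgeo import gdal

SRC = "classification.tif"   # hypothetical input file name
TILE = 4096                  # step 1: try 4K x 4K rather than 1024x1024
OVERLAP = 2                  # a couple of pixels so adjacent tiles touch


def tile_windows(path):
    """Yield (xoff, yoff, xsize, ysize) pixel windows covering the raster."""
    ds = gdal.Open(path)
    xsize, ysize = ds.RasterXSize, ds.RasterYSize
    for yoff in range(0, ysize, TILE):
        for xoff in range(0, xsize, TILE):
            yield (xoff, yoff,
                   min(TILE + OVERLAP, xsize - xoff),
                   min(TILE + OVERLAP, ysize - yoff))


def polygonize_tile(args):
    """Step 2: cut one tile out of the source and polygonize it."""
    i, (xoff, yoff, w, h) = args
    tif, shp = f"tile_{i}.tif", f"tile_{i}.shp"
    subprocess.run(["gdal_translate", "-srcwin", str(xoff), str(yoff),
                    str(w), str(h), SRC, tif], check=True)
    subprocess.run(["gdal_polygonize.py", tif, "-f", "ESRI Shapefile",
                    shp, f"tile_{i}", "DN"], check=True)
    return shp


if __name__ == "__main__":
    with Pool() as pool:  # one worker per available core by default
        shapefiles = pool.map(polygonize_tile,
                              enumerate(tile_windows(SRC)))

    # Step 3: merge the per-tile layers into one shapefile with ogr2ogr.
    merged = "merged.shp"
    for shp in shapefiles:
        if not Path(merged).exists():
            subprocess.run(["ogr2ogr", "-f", "ESRI Shapefile", merged, shp],
                           check=True)
        else:
            subprocess.run(["ogr2ogr", "-update", "-append", merged, shp,
                            "-nln", "merged"], check=True)

Using subprocess keeps it close to what you would type by hand; you could
equally call gdal.Translate() and gdal.Polygonize() from the Python bindings
if you prefer to stay in-process.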

I don't know anything about how the dissolve algorithm is written, so I
can't predict its performance or how it will scale with image size and
the number of tiles. However, if it takes advantage of spatial indices, it
could scale fairly well unless you have shapes (like roads) that tend to
stretch from one tile boundary to the next.
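If you end up doing the dissolve with ogr2ogr rather than QGIS, one
possibility for steps 4 and 5 is the SQLite dialect with ST_Union, followed
by -explodecollections. This assumes a GDAL build with SpatiaLite support
and that the classification field is called DN (gdal_polygonize's default),
so treat it as a sketch rather than a recipe:

import subprocess

# Step 4: dissolve on the classification value with ST_Union (SQLite
# dialect, needs SpatiaLite). "merged" is the layer produced above.
subprocess.run([
    "ogr2ogr", "-f", "ESRI Shapefile", "dissolved.shp", "merged.shp",
    "-dialect", "sqlite",
    "-sql", "SELECT ST_Union(geometry) AS geometry, DN FROM merged GROUP BY DN",
], check=True)

# Step 5: split the resulting multipolygons into single polygons while
# writing the GeoJSON the team ultimately wants.
subprocess.run([
    "ogr2ogr", "-explodecollections", "-f", "GeoJSON",
    "single_parts.geojson", "dissolved.shp",
], check=True)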

On 1/12/2015 3:07 AM, chris snow wrote:
> Hi David,
>
> Thanks for your response.  I have a little more information since
> feeding your response to the project team:
>
> "The tif file is around 1.4GB as you noted and the data is similar to
> that of the result of an image classification where each pixel value
> is in a range between (say) 1-5. After a classification this image is
> usually exported as a vector file (EVF of Shapefile) but in this case
> we want to use geojson. This has taken both Mark and myself weeks to
> complete with gdal_polygonize as you noted.
>
> I think an obvious way to speed this up would be threading: breaking
> the tiff file into tiles (say 1024x1024) and spreading these over the
> available cores. Then there would need to be a way to dissolve the
> tile boundaries to complete the polygons, as we would not want obvious
> tile lines."
>
> Does this help?
>
>


