[gdal-dev] gdal_polygonize.py TIF to JSON performance

Mon Jan 12 08:19:10 PST 2015

Hi David,

Thanks for the response.  I'll feed your question about converting the
shapefile to geojson back to the team.

In the meantime, I have also received some more info on your previous questions:

"The input file was 1.4GB, the output geojson was around 17GB IIRC.

The raster file contains a UK Flood River model, it works on a 5 metre
grid with pixels representing the associated flood risk at a given
point on the map.  The data is continuous in that a long river would
have a band of colour / risk which would follow its course and this
could run for miles.  Vectorising this could result in a very large
long vector with extremely complex geometry (imagine trying to
vectorise the whole of the Thames for example)."
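A rough illustration of why vectorising a long river band makes the output so large follows. The sketch counts the vertices of a pixel-edge outline for a hypothetical diagonal band two pixels wide. It then compares the size of that outline as GeoJSON text with the size of the raster it came from. The band shape, the coordinate origin and the one byte per uncompressed pixel are all assumptions made for illustration. The code only mimics the staircase outlines that raster-to-vector conversion produces. It is not how gdal_polygonize.py traces polygons internally.

```python
import json

CELL = 5.0                      # metres, the 5 m grid described above
X0, Y0 = 400000.0, 300000.0     # hypothetical projected origin in metres (assumption)


def staircase_band_ring(n_rows, cell=CELL):
    """Pixel-edge outline of a band two pixels wide that shifts one pixel
    right on every row (a crude diagonal "river"). Returns a closed ring of
    [x, y] pairs; y grows with the row number to keep the arithmetic simple."""
    ring = [(0, 0)]
    for i in range(n_rows):              # right-hand edge, top to bottom
        ring += [(i + 2, i), (i + 2, i + 1)]
    for i in reversed(range(n_rows)):    # left-hand edge, bottom to top
        ring += [(i, i + 1), (i, i)]     # finishes on (0, 0), closing the ring
    return [[X0 + x * cell, Y0 + y * cell] for x, y in ring]


def geojson_bytes(ring, dn=3):
    """Length of one GeoJSON Feature holding the ring (json.dumps defaults)."""
    feature = {"type": "Feature", "properties": {"DN": dn},
               "geometry": {"type": "Polygon", "coordinates": [ring]}}
    return len(json.dumps(feature))


if __name__ == "__main__":
    n = 10_000                           # 10,000 rows, i.e. 50 km north-south
    ring = staircase_band_ring(n)
    raster_bytes = 2 * n                 # assumes 1 byte per pixel, uncompressed
    print(len(ring), "vertices;", geojson_bytes(ring), "bytes of GeoJSON vs",
          raster_bytes, "bytes of raster")
```

Each pixel row adds four vertices. Each vertex costs roughly twenty bytes of text, against about one byte per pixel in the raster. This is the kind of blow-up seen in the 1.4GB to 17GB figures.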

Many thanks,

Chris

On 12 January 2015 at 16:04, David Strip <gdal at stripfamily.net> wrote:
> Your team writes that the image is usually exported as a vector file, e.g.
> shapefile. Can they do this successfully for the 1.4GB image? If so,
> have you tried just converting the shapefile to geojson? Might be the
> simplest solution.
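For the shapefile-to-GeoJSON route, the conversion itself is a single ogr2ogr call. The Python sketch below only builds that command and optionally runs it. The file names are hypothetical. The optional COORDINATE_PRECISION layer creation option of the GeoJSON driver may help trim output size, but check it against the driver documentation for the GDAL version in use.

```python
import shlex
import shutil
import subprocess


def shp_to_geojson_cmd(src_shp, dst_geojson, precision=None):
    """ogr2ogr command line converting a Shapefile to GeoJSON.

    precision, if given, sets the GeoJSON driver's COORDINATE_PRECISION
    layer creation option (digits after the decimal point)."""
    cmd = ["ogr2ogr", "-f", "GeoJSON"]
    if precision is not None:
        cmd += ["-lco", f"COORDINATE_PRECISION={precision}"]
    return cmd + [dst_geojson, src_shp]     # ogr2ogr takes destination first


def convert(src_shp, dst_geojson, precision=None, dry_run=True):
    """Print the command (dry run, or ogr2ogr not on PATH) or run it."""
    cmd = shp_to_geojson_cmd(src_shp, dst_geojson, precision)
    if dry_run or shutil.which("ogr2ogr") is None:
        print(shlex.join(cmd))
        return False
    subprocess.run(cmd, check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file names.
    convert("flood_classes.shp", "flood_classes.geojson", precision=2)
```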
>
> If that doesn't work, you could try tiling, as you mention. As Even has
> already noted, the challenge to threading the code is rejoining the
> polygons at the boundaries. It's not an overwhelming problem, but it is
> a challenge and requires buffering the output rather than streaming it.
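A small pure-Python sketch can make the rejoining problem concrete. It handles the raster-side version of the problem, which is a simplification and not what gdal_polygonize does. First it labels 4-connected regions of equal class inside each tile independently. It keeps every tile's labels in memory, which is the buffering mentioned above. Then a union-find structure merges labels wherever equal-class pixels face each other across a tile seam. The tiles here abut rather than overlap, because the seam comparison does the joining.

```python
from collections import deque


def label_tile(grid, r0, r1, c0, c1, tile_id):
    """4-connected labelling of rows r0..r1-1, cols c0..c1-1.
    Returns {(row, col): (tile_id, local_label)}."""
    labels, n = {}, 0
    for r in range(r0, r1):
        for c in range(c0, c1):
            if (r, c) in labels:
                continue
            value = grid[r][c]
            labels[(r, c)] = (tile_id, n)
            queue = deque([(r, c)])
            while queue:                          # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (r0 <= ny < r1 and c0 <= nx < c1
                            and (ny, nx) not in labels and grid[ny][nx] == value):
                        labels[(ny, nx)] = (tile_id, n)
                        queue.append((ny, nx))
            n += 1
    return labels


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, a):
        self.parent.setdefault(a, a)
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def count_regions_tiled(grid, tile):
    """Label each tile on its own, then stitch labels across tile seams."""
    rows, cols = len(grid), len(grid[0])
    labels, tile_id = {}, 0
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            labels.update(label_tile(grid, r0, min(r0 + tile, rows),
                                     c0, min(c0 + tile, cols), tile_id))
            tile_id += 1
    uf = UnionFind()
    for c in range(tile, cols, tile):             # vertical seams
        for r in range(rows):
            if grid[r][c - 1] == grid[r][c]:
                uf.union(labels[(r, c - 1)], labels[(r, c)])
    for r in range(tile, rows, tile):             # horizontal seams
        for c in range(cols):
            if grid[r - 1][c] == grid[r][c]:
                uf.union(labels[(r - 1, c)], labels[(r, c)])
    return len({uf.find(lab) for lab in labels.values()})


if __name__ == "__main__":
    grid = [[1, 1, 2, 2, 2],
            [1, 3, 3, 3, 2],
            [1, 1, 1, 3, 2],
            [2, 2, 1, 3, 3]]
    print([count_regions_tiled(grid, t) for t in (1, 2, 3, 100)])
```

Whatever the tile size, the stitched count matches labelling the whole grid at once. Tiling changes only how much work happens at the seams.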
>
> You could do a poor-man's version of multi-threading.
> 1. Tile your input image. I would probably try something bigger than
> 1024x1024 that you mention. Perhaps 4K x 4K, maybe 8K x 8K. Overlap the
> tiles by a pixel or two on all edges.
>     For the initial experiment just a couple of adjacent tiles are
> sufficient.
> 2. Feed each tile to gdal_polygonize in as many processes as you have
> available processors.
> 3. Take the resulting polygon files and merge them into a single
> shapefile (or other equivalent format).  You can do this with ogr2ogr or
> in QGIS.
> 4. Dissolve using the classification value
> 5. Split multipart polygons to single polygons.
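The five steps above could be scripted roughly as follows. This is only a sketch. It builds gdal_translate, gdal_polygonize.py and ogr2ogr command lines and, with dry_run=True, just prints them. Several choices are assumptions: the tile size, the overlap, the file names, and the DN field name (gdal_polygonize.py's default). The dissolve step also assumes a GDAL build where the SQLite SQL dialect has SpatiaLite's ST_Union available.

```python
import os
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def tile_windows(width, height, tile=4096, overlap=2):
    """Step 1: (xoff, yoff, xsize, ysize) pixel windows covering the raster,
    each grown by `overlap` pixels on every side and clipped to the extent."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            x0, y0 = max(x - overlap, 0), max(y - overlap, 0)
            x1 = min(x + tile + overlap, width)
            y1 = min(y + tile + overlap, height)
            yield x0, y0, x1 - x0, y1 - y0


def tile_commands(src, width, height, tile=4096, overlap=2):
    """Steps 1-2: one job per tile; cut it out with gdal_translate -srcwin,
    then polygonize it into its own shapefile (field name DN)."""
    jobs = []
    for i, (x, y, w, h) in enumerate(tile_windows(width, height, tile, overlap)):
        tif, shp = f"tile_{i}.tif", f"tile_{i}.shp"
        jobs.append([
            ["gdal_translate", "-srcwin", str(x), str(y), str(w), str(h), src, tif],
            ["gdal_polygonize.py", tif, "-f", "ESRI Shapefile", shp, f"tile_{i}", "DN"],
        ])
    return jobs


def merge_commands(n_tiles):
    """Steps 3-5: append all tiles into merged.shp, dissolve on DN with
    ST_Union, and explode multipart results into single parts."""
    cmds = [["ogr2ogr", "-f", "ESRI Shapefile", "-nln", "merged",
             "merged.shp", "tile_0.shp"]]
    for i in range(1, n_tiles):
        cmds.append(["ogr2ogr", "-update", "-append", "-nln", "merged",
                     "merged.shp", f"tile_{i}.shp"])
    cmds.append(["ogr2ogr", "-f", "ESRI Shapefile", "-explodecollections",
                 "-dialect", "sqlite", "-sql",
                 "SELECT ST_Union(geometry) AS geometry, DN FROM merged GROUP BY DN",
                 "dissolved.shp", "merged.shp"])
    return cmds


def run_job(cmds, dry_run=True):
    """Run (or just print) a list of commands in order."""
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


def run_pipeline(src, width, height, tile=4096, overlap=2, dry_run=True):
    jobs = tile_commands(src, width, height, tile, overlap)
    # The heavy lifting happens in separate GDAL processes, so a thread pool
    # is enough to keep one tile job running per core.
    with ThreadPoolExecutor(max_workers=os.cpu_count() or 1) as pool:
        list(pool.map(lambda job: run_job(job, dry_run), jobs))
    run_job(merge_commands(len(jobs)), dry_run)


if __name__ == "__main__":
    # Hypothetical input name and size; read the real size with gdalinfo.
    run_pipeline("flood_risk.tif", width=20000, height=12000, dry_run=True)
```

The overlap means neighbouring tiles produce polygons that overlap slightly along each seam. The ST_Union in the dissolve step merges those overlapping pieces when they share a DN value.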
>
> I don't know anything about how the dissolve algorithm is written, so I
> can't predict its performance or how it will scale with image size and
> number of tiles. However, if it takes advantage of spatial indices, it
> could scale fairly well unless you have shapes (like roads) that tend to
> stretch from one tile boundary to the next.
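Nothing in this thread says which spatial index the ogr2ogr or QGIS dissolve uses. Still, the scaling effect described above shows up even with the simplest index. The sketch below is a generic uniform-grid index over bounding boxes. It is a stand-in for whatever the real tools use, such as an R-tree. Small polygons land in one or two cells and only get compared with near neighbours. A long polygon's box is registered in every cell it crosses, so it turns up as a candidate for many queries.

```python
from collections import defaultdict


class GridIndex:
    """Uniform-grid spatial index over boxes (minx, miny, maxx, maxy)."""

    def __init__(self, cell):
        self.cell = cell
        self.buckets = defaultdict(set)   # grid cell -> keys whose box touches it
        self.boxes = {}

    def _cells(self, box):
        minx, miny, maxx, maxy = box
        c = self.cell
        for gx in range(int(minx // c), int(maxx // c) + 1):
            for gy in range(int(miny // c), int(maxy // c) + 1):
                yield gx, gy

    def insert(self, key, box):
        self.boxes[key] = box
        for cell in self._cells(box):
            self.buckets[cell].add(key)

    def query(self, box):
        """Keys whose boxes intersect `box` (touching edges count)."""
        minx, miny, maxx, maxy = box
        hits = set()
        for cell in self._cells(box):
            for key in self.buckets.get(cell, ()):
                bx0, by0, bx1, by1 = self.boxes[key]
                if bx0 <= maxx and minx <= bx1 and by0 <= maxy and miny <= by1:
                    hits.add(key)
        return hits


if __name__ == "__main__":
    idx = GridIndex(cell=100.0)
    idx.insert("a", (0, 0, 50, 50))
    idx.insert("b", (60, 0, 120, 40))
    idx.insert("river", (0, 200, 1000, 260))   # a long, thin "river" polygon
    print(idx.query((40, 10, 70, 20)))
    print(sum("river" in keys for keys in idx.buckets.values()), "cells hold the river")
```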
>
> On 1/12/2015 3:07 AM, chris snow wrote:
>> Hi David,
>>
>> Thanks for your response.  I have a little more information since
>> feeding your response to the project team:
>>
>> "The tif file is around 1.4GB as you noted and the data is similar to
>> that of the result of an image classification where each pixel value
>> is in a range between (say) 1-5. After a classification this image is
>> usually exported as a vector file (EVF or Shapefile) but in this case
>> we want to use geojson. This has taken both Mark and me weeks to
>> complete with gdal_polygonize as you noted.
>>
>> I think an obvious way to speed this up would be threading by breaking
>> the tiff file into tiles (say 1024x1024) and spreading these over the
>> available cores, then there would need to be a way to dissolve the
>> tile boundaries to complete the polygons as we would not want obvious
>> tile lines."
>>
>> Does this help?
>>
>>
>
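On tile size, here is a back-of-the-envelope comparison of the 1024x1024 tiles proposed above with the 4K and 8K sizes David suggests. Fewer, larger tiles mean far fewer seams where polygons have to be rejoined. The raster dimensions below are hypothetical, since the thread only gives the 1.4GB file size. The trade-off is that there still need to be at least as many tiles as cores to keep every processor busy.

```python
import math


def tiling_stats(width, height, tile):
    """Return (number of tiles, total internal seam length in pixels) for a
    regular tiling of a width x height raster. Each seam is a place where a
    polygon may be cut and later need dissolving back together."""
    nx = math.ceil(width / tile)
    ny = math.ceil(height / tile)
    seam_len = (nx - 1) * height + (ny - 1) * width   # vertical + horizontal seams
    return nx * ny, seam_len


if __name__ == "__main__":
    width, height = 40_000, 35_000   # hypothetical raster size in pixels
    for tile in (1024, 4096, 8192):
        tiles, seams = tiling_stats(width, height, tile)
        print(f"{tile:>5} px tiles: {tiles:>5} tiles, {seams:,} seam pixels")
```

For that hypothetical size, 1024 px tiles give 1400 tiles and 2,725,000 seam pixels. At 4096 px this drops to 90 tiles and 635,000 seam pixels, and at 8192 px to 25 tiles and 300,000.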