[gdal-dev] gdal_polygonize.py TIF to JSON performance

chris snow chsnow123 at gmail.com
Sun Jan 11 09:11:53 PST 2015


A colleague tells me that converting a 1.4 GB TIF file with
gdal_polygonize.py on a g2.2xlarge Amazon instance (8 vCPUs, 15 GB RAM)
took over two weeks of continuous processing. I have also been told that
the same conversion completed in a few minutes using commercial tooling.

As a result, I'm investigating whether the performance of the
gdal_polygonize.py TIF to JSON conversion can be improved. I ran strace
while attempting the same conversion, but stopped it after a few hours (the
gdal_polygonize.py progress indicator was showing between 5% and 7.5%
complete). The strace summary is:


% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.40    2.348443           9    252474           read
  0.18    0.004139           3      1554           lseek
  0.12    0.002878           7       439       269 open
  0.10    0.002447          20       123        87 stat
  0.10    0.002429           5       459           mmap
  0.02    0.000561           3       208           munmap
  0.02    0.000529           2       216           fstat
  0.02    0.000504           3       188           mprotect
  0.01    0.000314           2       188           brk
  0.01    0.000173          29         6         5 unlink
  0.00    0.000109          11        10           getdents
  0.00    0.000098           1        67           rt_sigaction
  0.00    0.000000           0         4           write
  0.00    0.000000           0       173           close
  0.00    0.000000           0        12           lstat
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           rt_sigreturn
  0.00    0.000000           0         5         1 ioctl
  0.00    0.000000           0        91        91 access
  0.00    0.000000           0        20           mremap
  0.00    0.000000           0         5         3 execve
  0.00    0.000000           0         1           getcwd
  0.00    0.000000           0         4         2 readlink
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         1           getuid
  0.00    0.000000           0         1           getgid
  0.00    0.000000           0         1           geteuid
  0.00    0.000000           0         1           getegid
  0.00    0.000000           0         2           arch_prctl
  0.00    0.000000           0         4         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         5           openat
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    2.362624                256268       459 total
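
For anyone who wants to work with these figures, here is a minimal Python
sketch that parses a few rows copied from the summary above and derives the
share of calls that are reads and the mean time per read. The function name
parse_strace_summary and the dictionary layout are my own for illustration;
they are not part of strace or GDAL.

```python
# A few rows copied verbatim from the strace summary above.
# Columns: % time, seconds, usecs/call, calls, errors (optional), syscall
SUMMARY = """\
 99.40    2.348443           9    252474           read
  0.18    0.004139           3      1554           lseek
  0.12    0.002878           7       439       269 open
  0.10    0.002447          20       123        87 stat
"""

# Total number of calls, taken from the "total" line of the summary.
TOTAL_CALLS = 256268


def parse_strace_summary(text):
    """Parse summary rows into dictionaries (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) not in (5, 6):
            continue  # skip blank or unexpected lines
        rows.append({
            "syscall": fields[-1],
            "seconds": float(fields[1]),
            "calls": int(fields[3]),
            # The errors column is empty for most rows.
            "errors": int(fields[4]) if len(fields) == 6 else 0,
        })
    return rows


rows = parse_strace_summary(SUMMARY)
read_row = next(r for r in rows if r["syscall"] == "read")

# Fraction of all system calls that were reads.
read_share = read_row["calls"] / TOTAL_CALLS
# Mean time per read call, in microseconds.
mean_read_usecs = read_row["seconds"] / read_row["calls"] * 1e6

if __name__ == "__main__":
    print(f"read share of calls: {read_share:.1%}")
    print(f"mean time per read:  {mean_read_usecs:.1f} usecs")
```

Note that the summary accounts for only about 2.36 seconds in system calls
in total, while the run itself lasted a few hours.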


FYI: I ran my test inside a Vagrant VirtualBox guest with 30 GB of
memory and 8 CPUs assigned to the guest.

Nearly all of the system calls are reads (252,474 of 256,268), which
suggests that the input TIF file is read in small pieces at a time.
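
To illustrate why the size of each read matters, here is a minimal Python
sketch that counts how many read calls are needed to consume the same file
with small and with large chunks. It is a generic illustration and not
GDAL's actual reading logic; the helper names (count_reads,
make_sample_file) and the chunk sizes are assumptions chosen for the
example.

```python
import os
import tempfile


def count_reads(path, chunk_size):
    """Read the whole file in chunk_size pieces and count the reads.

    buffering=0 disables Python's own buffering, so each f.read()
    maps to one read request of at most chunk_size bytes.
    """
    calls = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break  # end of file
            calls += 1
    return calls


def make_sample_file(size_bytes):
    """Create a temporary file of the given size (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size_bytes)
    return path


if __name__ == "__main__":
    path = make_sample_file(4 * 1024 * 1024)  # 4 MiB sample file
    try:
        small = count_reads(path, 4096)          # 4 KiB chunks
        large = count_reads(path, 1024 * 1024)   # 1 MiB chunks
        print(f"4 KiB chunks: {small} reads")
        print(f"1 MiB chunks: {large} reads")
    finally:
        os.remove(path)
```

The number of reads grows in inverse proportion to the chunk size, so a
high read count for a given file size is consistent with small reads.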

I am sharing the results here in case anyone else is looking at
optimising the performance of the conversion, or already has ideas about
where the code could be optimised.

Best regards,

Chris