<div dir="ltr"><div><span style="font-family:monospace,monospace">A colleague who tried to convert a 1.4 GB TIF file using gdal_polygonize.py on a g2.2xlarge Amazon instance (8 vCPU, 15 GB RAM) reports that the processing took over two weeks of continuous running.  I have also been told that the same conversion using commercial tooling completed in a few minutes.<br><br>As a result, I'm investigating whether the performance of the gdal_polygonize.py TIF-to-JSON conversion can be improved. I ran strace while attempting the same conversion, but stopped after a few hours (the gdal_polygonize.py status indicator was showing between 5% and 7.5% complete). The strace results are:<br><br><br>% time    seconds usecs/call    calls   errors syscall<br>------ ----------- ----------- --------- --------- ----------------<br> 99.40   2.348443          9   252474          read<br> 0.18   0.004139          3     1554          lseek<br> 0.12   0.002878          7      439      269 open<br> 0.10   0.002447         20      123       87 stat<br> 0.10   0.002429          5      459          mmap<br> 0.02   0.000561          3      208          munmap<br> 0.02   0.000529          2      216          fstat<br> 0.02   0.000504          3      188          mprotect<br> 0.01   0.000314          2      188          brk<br> 0.01   0.000173         29        6        5 unlink<br> 0.00   0.000109         11       10          getdents<br> 0.00   0.000098          1       67          rt_sigaction<br> 0.00   0.000000          0        4          write<br> 0.00   0.000000          0      173          close<br> 0.00   0.000000          0       12          lstat<br> 0.00   0.000000          0        1          rt_sigprocmask<br> 0.00   0.000000          0        2          rt_sigreturn<br> 0.00   0.000000          0        5        1 ioctl<br> 0.00   0.000000          0       91       91 access<br> 0.00   0.000000          0       20          
mremap<br> 0.00   0.000000          0        5        3 execve<br> 0.00   0.000000          0        1          getcwd<br> 0.00   0.000000          0        4        2 readlink<br> 0.00   0.000000          0        1          getrlimit<br> 0.00   0.000000          0        1          getuid<br> 0.00   0.000000          0        1          getgid<br> 0.00   0.000000          0        1          geteuid<br> 0.00   0.000000          0        1          getegid<br> 0.00   0.000000          0        2          arch_prctl<br> 0.00   0.000000          0        4        1 futex<br> 0.00   0.000000          0        1          set_tid_address<br> 0.00   0.000000          0        5          openat<br> 0.00   0.000000          0        1          set_robust_list<br>------ ----------- ----------- --------- --------- ----------------<br>100.00   2.362624               256268      459 total<br><br><br></span></div><span style="font-family:monospace,monospace">FYI - I performed my test inside a Vagrant VirtualBox guest with 30 GB of memory and 8 CPUs assigned to the guest.<br></span><div><span style="font-family:monospace,monospace"><br>It appears that the input TIF file is read in small pieces: read accounts for 252,474 of the 256,268 system calls recorded.<br></span><div><span style="font-family:monospace,monospace"><br>I have shared the results here in case anyone else is looking at optimising the performance of the conversion or already has ideas about where the code can be optimised.<br><br></span></div><div><span style="font-family:monospace,monospace">Best regards,<br><br></span></div><div><span style="font-family:monospace,monospace">Chris<br></span></div></div></div>