<div dir="ltr"><div><span style="font-family:monospace,monospace">A colleague who attempted to convert a 1.4 GB TIF file with gdal_polygonize.py on an Amazon g2.2xlarge instance (8 vCPUs, 15 GB RAM) tells me that the processing ran continuously for over 2 weeks. I have also been told that the same conversion with commercial tooling completed in a few minutes.<br><br>As a result, I am investigating whether the performance of the gdal_polygonize.py TIF-to-JSON conversion can be improved. I ran the same conversion under strace, but stopped it after a few hours (the gdal_polygonize.py progress indicator was showing between 5% and 7.5% complete). The strace summary is:<br><br><br>% time     seconds  usecs/call     calls    errors syscall<br>------ ----------- ----------- --------- --------- ----------------<br> 99.40    2.348443           9    252474           read<br>  0.18    0.004139           3      1554           lseek<br>  0.12    0.002878           7       439       269 open<br>  0.10    0.002447          20       123        87 stat<br>  0.10    0.002429           5       459           mmap<br>  0.02    0.000561           3       208           munmap<br>  0.02    0.000529           2       216           fstat<br>  0.02    0.000504           3       188           mprotect<br>  0.01    0.000314           2       188           brk<br>  0.01    0.000173          29         6         5 unlink<br>  0.00    0.000109          11        10           getdents<br>  0.00    0.000098           1        67           rt_sigaction<br>  0.00    0.000000           0         4           write<br>  0.00    0.000000           0       173           close<br>  0.00    0.000000           0        12           lstat<br>  0.00    0.000000           0         1           rt_sigprocmask<br>  0.00    0.000000           0         2           rt_sigreturn<br>  0.00    0.000000           0         5         1 ioctl<br>
  0.00    0.000000           0        91        91 access<br>  0.00    0.000000           0        20           mremap<br>  0.00    0.000000           0         5         3 execve<br>  0.00    0.000000           0         1           getcwd<br>  0.00    0.000000           0         4         2 readlink<br>  0.00    0.000000           0         1           getrlimit<br>  0.00    0.000000           0         1           getuid<br>  0.00    0.000000           0         1           getgid<br>  0.00    0.000000           0         1           geteuid<br>  0.00    0.000000           0         1           getegid<br>  0.00    0.000000           0         2           arch_prctl<br>  0.00    0.000000           0         4         1 futex<br>  0.00    0.000000           0         1           set_tid_address<br>  0.00    0.000000           0         5           openat<br>  0.00    0.000000           0         1           set_robust_list<br>------ ----------- ----------- --------- --------- ----------------<br>100.00    2.362624                256268       459 total<br><br><br></span></div><span style="font-family:monospace,monospace">FYI: I performed my test inside a Vagrant VirtualBox guest with 30 GB of memory and 8 CPUs assigned.<br></span><div><span style="font-family:monospace,monospace"><br>It appears that the input TIF file is read a small piece at a time: read accounts for 99.40% of the system-call time, across 252,474 calls.<br></span><div><span style="font-family:monospace,monospace"><br>I have shared the results here in case anyone else is looking at optimising the performance of the conversion, or already has ideas about where the code could be optimised.<br><br></span></div><div><span style="font-family:monospace,monospace">Best regards,<br><br></span></div><div><span style="font-family:monospace,monospace">Chris<br></span></div></div></div>
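<div><span style="font-family:monospace,monospace">P.S. In case it is useful to anyone comparing runs, here is a minimal Python sketch (standard library only) that parses an strace summary table like the one above and reports each syscall's share of the total syscall time and its mean cost per call. The column layout is assumed from the output above, and the names parse_strace_summary, summarize and SAMPLE are hypothetical, chosen only for illustration. Note that strace -c reports system CPU time spent inside system calls (2.36 s in total here), not wall-clock time, so that total can be set against the hours the run actually took.<br></span></div>

```python
"""Summarize an `strace -c` table: each syscall's share of the total
syscall time and its mean cost per call. Sketch only: the column layout
is assumed from the table in this email."""

from typing import List, NamedTuple


class SyscallRow(NamedTuple):
    name: str
    seconds: float
    calls: int
    errors: int


def parse_strace_summary(text: str) -> List[SyscallRow]:
    """Parse the data rows of an `strace -c` summary (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and the final "total" row.
        if not parts or parts[-1] == "total":
            continue
        # Data rows start with a numeric "% time" value; the header
        # ("% time ...") and the dashed separators do not.
        try:
            float(parts[0])
        except ValueError:
            continue
        # Columns: % time, seconds, usecs/call, calls, [errors], syscall.
        # The errors column is blank for syscalls that never failed.
        errors = int(parts[4]) if len(parts) == 6 else 0
        rows.append(SyscallRow(parts[-1], float(parts[1]), int(parts[3]), errors))
    return rows


def summarize(rows: List[SyscallRow], top: int = 3) -> float:
    """Print the most expensive syscalls and return the total seconds."""
    total = sum(r.seconds for r in rows)
    for r in sorted(rows, key=lambda r: r.seconds, reverse=True)[:top]:
        share = 100.0 * r.seconds / total if total else 0.0
        mean_us = 1e6 * r.seconds / r.calls if r.calls else 0.0
        print(f"{r.name:<12} {share:6.2f}%  {r.calls:>9} calls  {mean_us:6.2f} us/call")
    return total


# Excerpt of the table in this email (first three rows plus the total line).
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.40    2.348443           9    252474           read
  0.18    0.004139           3      1554           lseek
  0.12    0.002878           7       439       269 open
------ ----------- ----------- --------- --------- ----------------
100.00    2.362624                256268       459 total
"""

if __name__ == "__main__":
    total = summarize(parse_strace_summary(SAMPLE))
    print(f"syscall time in excerpt: {total:.3f} s")
```

<div><span style="font-family:monospace,monospace">With only the excerpt, the shares are relative to the three rows shown; pasting in the full table instead should give shares that closely match the % time column, since that column is also each row's seconds divided by the total.<br></span></div>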