[gdal-dev] gdal_polygonize.py TIF to JSON performance
chris snow
chsnow123 at gmail.com
Sun Jan 11 09:11:53 PST 2015
I have been informed by a colleague that converting a 1.4 GB TIF file
with gdal_polygonize.py on an Amazon g2.2xlarge instance (8 vCPUs, 15 GB
RAM) took over 2 weeks of continuous processing. I have also been told
that the same conversion using commercial tooling completed in a few
minutes.
As a result, I'm investigating whether there is an opportunity to
improve the performance of the gdal_polygonize.py TIF to JSON
conversion. I ran the same conversion under strace but stopped it after
a few hours, when the gdal_polygonize.py progress indicator was showing
between 5% and 7.5% complete. The strace summary was:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.40    2.348443           9    252474           read
  0.18    0.004139           3      1554           lseek
  0.12    0.002878           7       439       269 open
  0.10    0.002447          20       123        87 stat
  0.10    0.002429           5       459           mmap
  0.02    0.000561           3       208           munmap
  0.02    0.000529           2       216           fstat
  0.02    0.000504           3       188           mprotect
  0.01    0.000314           2       188           brk
  0.01    0.000173          29         6         5 unlink
  0.00    0.000109          11        10           getdents
  0.00    0.000098           1        67           rt_sigaction
  0.00    0.000000           0         4           write
  0.00    0.000000           0       173           close
  0.00    0.000000           0        12           lstat
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           rt_sigreturn
  0.00    0.000000           0         5         1 ioctl
  0.00    0.000000           0        91        91 access
  0.00    0.000000           0        20           mremap
  0.00    0.000000           0         5         3 execve
  0.00    0.000000           0         1           getcwd
  0.00    0.000000           0         4         2 readlink
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         1           getuid
  0.00    0.000000           0         1           getgid
  0.00    0.000000           0         1           geteuid
  0.00    0.000000           0         1           getegid
  0.00    0.000000           0         2           arch_prctl
  0.00    0.000000           0         4         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         5           openat
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    2.362624                256268       459 total
FYI: I performed my test inside a Vagrant VirtualBox guest with 30 GB of
memory and 8 CPUs assigned to the guest.
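It appears that the input TIF file is read in small pieces at a time:
read() accounts for 252,474 of the 256,268 system calls and 99.40% of
the time spent in system calls.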
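To double-check which system call dominates, the summary can be tallied
programmatically. This is only a sketch, not part of GDAL or strace:
`parse_strace_summary` and `SyscallStat` are hypothetical helpers, and the
parser assumes the usual `strace -c` column layout, in which the errors
column is left blank for system calls that never failed.

```python
"""Summarise the table printed by `strace -c`.

A minimal sketch (hypothetical helper, not part of GDAL or strace). It
assumes the usual column layout:
    % time, seconds, usecs/call, calls, [errors], syscall
where the errors column is left blank when a syscall never failed.
"""
from typing import Dict, NamedTuple


class SyscallStat(NamedTuple):
    pct_time: float
    seconds: float
    calls: int
    errors: int


def parse_strace_summary(text: str) -> Dict[str, SyscallStat]:
    """Map each syscall name to its statistics, skipping the total row."""
    stats: Dict[str, SyscallStat] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header (starts with "%") and dashed rules.
        if len(fields) < 5 or fields[0].startswith(("%", "-")):
            continue
        name = fields[-1]
        if name == "total":
            continue  # the total row has no usecs/call value
        pct, secs, _usecs, calls = fields[:4]
        # Six fields means the errors column is populated.
        errors = int(fields[4]) if len(fields) == 6 else 0
        stats[name] = SyscallStat(float(pct), float(secs), int(calls), errors)
    return stats


# A few rows copied from the summary above.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.40    2.348443           9    252474           read
  0.18    0.004139           3      1554           lseek
  0.12    0.002878           7       439       269 open
------ ----------- ----------- --------- --------- ----------------
100.00    2.362624                256268       459 total
"""

if __name__ == "__main__":
    stats = parse_strace_summary(SAMPLE)
    busiest = max(stats, key=lambda name: stats[name].seconds)
    print(busiest, stats[busiest].calls)  # read 252474
```

Keep in mind that `strace -c` only accounts for time spent inside system
calls; in this run those add up to about 2.4 seconds, a small fraction of
the several hours the conversion was running.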
I have shared the results here in case anyone else is looking at
optimising the performance of the conversion, or already has ideas about
where the code could be optimised.
Best regards,
Chris