[gdal-dev] help using gdal_grid ...
Paul Spencer
pagameba at gmail.com
Thu Jun 5 09:09:29 EDT 2008
Thanks Andrey. Sorry for the delay in replying - I was busy trying
different options :) I'll say at the start that I ended up using
GRASS and getting the result I wanted in pretty decent time. I'd
never used GRASS before and I was totally amazed at how quickly I was
able to get what I wanted done - I guess you could say that GDAL drove
me to use GRASS ;)
What I did try with gdal_grid was to process the individual files.
Each file consists of several thousand points collected over a roughly
square area whose bounding box I know exactly. My script combined
each target file with its surrounding files (where they existed),
giving a moving 3x3 grid of input files to supply points for the
central file, and fed that to gdal_grid to produce a raster of about
40x40 pixels at my desired resolution of 70 m per pixel.
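In outline, the per-tile step looked something like this (a sketch
rather than my exact script; the tile paths, the CSV header and the
radius value are illustrative):

    import subprocess

    # points.vrt wraps combined.csv so OGR exposes x/y/z as point geometry
    VRT = """<OGRVRTDataSource>
      <OGRVRTLayer name="points">
        <SrcDataSource>combined.csv</SrcDataSource>
        <GeometryType>wkbPoint</GeometryType>
        <GeometryField encoding="PointFromColumns" x="x" y="y" z="z"/>
      </OGRVRTLayer>
    </OGRVRTDataSource>"""

    def grid_tile(centre_csv, neighbour_csvs, bbox, out_tif):
        """Concatenate the 3x3 neighbourhood of XYZ files, then run
        gdal_grid over the centre tile's bounding box only."""
        with open("combined.csv", "w") as out:
            out.write("x,y,z\n")
            for path in [centre_csv] + neighbour_csvs:
                with open(path) as f:
                    next(f)              # skip each file's header row
                    out.writelines(f)
        with open("points.vrt", "w") as f:
            f.write(VRT)
        xmin, ymin, xmax, ymax = bbox
        subprocess.check_call([
            "gdal_grid", "-a", "invdist:radius1=210:radius2=210",
            "-txe", str(xmin), str(xmax), "-tye", str(ymin), str(ymax),
            "-outsize", "40", "40", "-of", "GTiff", "-ot", "Float32",
            "-l", "points", "points.vrt", out_tif])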
I also broke the job up so that I could run several processes
simultaneously. On an Amazon EC2 high-compute instance, I let this
run for about 24 hours with 4 parallel processes and it still didn't
finish.
I also tried to combine some of the resulting TIFFs using gdal_merge.py
to see what the output looked like, and ran into problems with the
TIFF files being vertically flipped (which confused gdal_merge). I'm
reasonably certain the input was being fed in correctly according to
the docs, so I am not sure why they come out upside down. I saw
another post recently with this problem, but I didn't dig into it and
assume the problem is on my side.
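If anyone else runs into the upside-down TIFFs, the quickest check I
know of is the geotransform: a north-up raster should have a negative
pixel height. Something along these lines (the tile name is
hypothetical):

    from osgeo import gdal

    ds = gdal.Open("tile_042.tif")        # hypothetical tile name
    gt = ds.GetGeoTransform()
    # gt = (origin_x, pixel_width, 0, origin_y, 0, pixel_height);
    # for a north-up image gt[5] is negative.  A positive value means
    # the rows run bottom-to-top, which is what confuses gdal_merge.
    print("pixel height:", gt[5])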
OK, so that's what I tried with about 1.8 million points (I miscounted
originally), and I never actually succeeded in producing a DEM. GRASS
was able to ingest the points and produce a smoothed DEM in about 4
hours on my Mac, including my learning curve, having never used GRASS
before. It is entirely possible that my scripts are inefficient, etc.,
but it seems to me that gdal_grid is not really suitable for
processing large amounts of elevation data at this point in time.
With all due respect to the developers, I would suggest that the web
page for gdal_grid give a bit more detail about the potential
performance problems and some strategies for optimizing gdal_grid.
It might also be a reasonable enhancement to introduce a mechanism for
windowing the input data into small buckets based on some grid size.
For instance, with a 10x10 window the input points would be
pre-processed into spatial buckets, and the analysis for each window
would then use the points from that window plus the surrounding 8
windows; see the sketch below. I think this approach would (at least
in my case) have produced much faster results with no difference in
the resulting output.
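In rough Python, the bucketing I have in mind is just a spatial hash
(a sketch of the idea, not anything that exists in GDAL today):

    import math
    from collections import defaultdict

    def bucket_points(points, cell_size):
        """Hash each (x, y, z) point into a square spatial bucket."""
        buckets = defaultdict(list)
        for x, y, z in points:
            key = (math.floor(x / cell_size), math.floor(y / cell_size))
            buckets[key].append((x, y, z))
        return buckets

    def window_points(buckets, key):
        """Points for one window: its own bucket plus the 8 neighbours."""
        i, j = key
        pts = []
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                pts.extend(buckets.get((i + di, j + dj), []))
        return pts

gdal_grid would then only ever scan a small, bounded set of points per
output window instead of the whole input.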
Cheers
Paul
On 2-Jun-08, at 7:29 AM, Andrey Kiselev wrote:
> Paul,
>
> On Sun, Jun 01, 2008 at 09:16:08PM -0400, Paul Spencer wrote:
>> I have a set of x,y,z data in text files. There are about 1800
>> individual files. Each file has several thousand points. The sum
>> total is about 14.5 million entries.
>
> For every pixel of the output raster the whole set of input points
> will be scanned, so the best way to go is to split your area into
> smaller tiles. If you have one XYZ file per tile, then you can just
> combine 9 XYZ files together for each output raster tile and run
> gdal_grid on those smaller sets, then gdal_merge the resulting raster
> tiles. This should all be quite scriptable; special care is only
> needed at the region borders.
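This is essentially the approach my script above takes. Once the
per-tile rasters exist and are all north-up, the merge step is a
one-liner (tile names hypothetical):

    gdal_merge.py -o merged_dem.tif tile_*.tif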
>
>> * what is a reasonable -outsize value? Originally I thought 5900 x
>> 3000 based on the 70 m per measurement figure, but perhaps that is
>> way too big?
>
> That depends entirely on your final purpose and expected final
> resolution. If you want the best possible resolution, then choose a
> step close to the average distance between your input points
> (someday, computation of this metric will be added to the list of
> gdal_grid features :-).
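Until that lands, a crude estimate of the average spacing is
sqrt(bounding-box area / point count), assuming roughly uniform
coverage; for example:

    import math

    def avg_spacing(xmin, ymin, xmax, ymax, npoints):
        """Rough grid-step estimate assuming uniform point coverage."""
        area = (xmax - xmin) * (ymax - ymin)
        return math.sqrt(area / npoints)

    # e.g. 14.5 million points over a 413 km x 210 km box -> ~77 m
    print(avg_spacing(0.0, 0.0, 413_000.0, 210_000.0, 14_500_000))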
>
>> * invdist seems to be the slowest algorithm based on some quick tests
>> on individual files. Is there much difference between average and
>> nearest? What values of radius1 and radius2 will work the fastest
>> while still producing reasonable results for the -outsize above?
>
> I have created a preliminary version of the GDAL Grid tutorial. It
> does not contain many examples yet, but the basic information is
> already there:
>
> http://www.gdal.org/grid_tutorial.html
>
> Nearest Neighbour is quite different from Moving Average. Actually,
> the best use of NN is to convert an XYZ array that was created from a
> regular grid back into that grid. If you know that your points are
> located near the grid nodes, then NN will do the job for you.
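For reference, the algorithm and its parameters are packed into the -a
switch, so switching between invdist, average and nearest is just a
string change (the extent, sizes and layer name here are illustrative):

    gdal_grid -a average:radius1=210:radius2=210:min_points=1 \
              -txe 0 413000 -tye 0 210000 -outsize 5900 3000 \
              -ot Float32 -l points points.vrt dem.tif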
>
>> * would it be better to convert from CSV to something else (shp?)
>> first?
>
> No way. But I think that importing the data into a database, with
> the tiling approach suggested above, may help. Spatial filtering
> could be done efficiently inside the DB.
>
>> * would it be better to process individual input files then run
>> gdal_merge.py on the result?
>
> Yes, I think so.
>
> Best regards,
> Andrey
>
> --
> Andrey V. Kiselev
> ICQ# 26871517