[gdal-dev] help using gdal_grid ...
Paul Spencer
pagameba at gmail.com
Thu Jun 5 09:09:29 EDT 2008
Thanks Andrey. Sorry for the delay in replying - I was busy trying
different options :) I'll say at the start that I ended up using
GRASS and getting the result I wanted in pretty decent time. I'd
never used GRASS before and I was totally amazed at how quickly I was
able to get what I wanted done - I guess you could say that GDAL drove
me to use GRASS ;)
What I did try with gdal_grid was to process the individual files.
Each file consists of several thousand points collected over a roughly
square area whose bounding box I know exactly. My script combined
each target file with its surrounding files (where they existed),
giving a moving 3x3 grid of input files to supply points for the
central file, and fed that to gdal_grid to produce a raster of about
40x40 pixels at my desired resolution of 70 m per pixel.
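In outline, the per-tile step looked something like this (a sketch
rather than my exact script; the tile paths, the CSV header and the
radius value are illustrative):

    import subprocess

    # points.vrt wraps combined.csv so OGR exposes x/y/z as point geometry
    VRT = """<OGRVRTDataSource>
      <OGRVRTLayer name="points">
        <SrcDataSource>combined.csv</SrcDataSource>
        <GeometryType>wkbPoint</GeometryType>
        <GeometryField encoding="PointFromColumns" x="x" y="y" z="z"/>
      </OGRVRTLayer>
    </OGRVRTDataSource>"""

    def grid_tile(centre_csv, neighbour_csvs, bbox, out_tif):
        """Concatenate the 3x3 neighbourhood of XYZ files, then run
        gdal_grid over the centre tile's bounding box only."""
        with open("combined.csv", "w") as out:
            out.write("x,y,z\n")
            for path in [centre_csv] + neighbour_csvs:
                with open(path) as f:
                    next(f)              # skip each file's header row
                    out.writelines(f)
        with open("points.vrt", "w") as f:
            f.write(VRT)
        xmin, ymin, xmax, ymax = bbox
        subprocess.check_call([
            "gdal_grid", "-a", "invdist:radius1=210:radius2=210",
            "-txe", str(xmin), str(xmax), "-tye", str(ymin), str(ymax),
            "-outsize", "40", "40", "-of", "GTiff", "-ot", "Float32",
            "-l", "points", "points.vrt", out_tif])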
I also broke the job up so that I could run several processes
simultaneously. On an Amazon EC2 high-compute instance, I let this
run for about 24 hours with 4 parallel processes and it still didn't
finish.
I also tried to combine some of the resulting TIFFs using gdal_merge.py
to see what the output looked like, and ran into problems with the
TIFF files being vertically flipped (which confused gdal_merge). I'm
reasonably certain the input was being fed in correctly according to
the docs, so I am not sure why they come out upside down. I saw
another post recently with this problem, but I didn't dig into it and
assume the problem is on my side.
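If anyone else runs into the upside-down TIFFs, the quickest check I
know of is the geotransform: a north-up raster should have a negative
pixel height. Something along these lines (the tile name is
hypothetical):

    from osgeo import gdal

    ds = gdal.Open("tile_042.tif")        # hypothetical tile name
    gt = ds.GetGeoTransform()
    # gt = (origin_x, pixel_width, 0, origin_y, 0, pixel_height);
    # for a north-up image gt[5] is negative.  A positive value means
    # the rows run bottom-to-top, which is what confuses gdal_merge.
    print("pixel height:", gt[5])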
OK, so that's what I tried with about 1.8 million points (I miscounted
originally), and I never actually succeeded in producing a DEM. GRASS
was able to ingest the points and produce a smoothed DEM in about 4
hours on my Mac, including my learning curve, having never used GRASS
before. It is entirely possible that my scripts are inefficient, etc.,
but it seems to me that gdal_grid is not really suitable for
processing large amounts of elevation data at this point in time.
With all due respect to the developers, I would suggest that the web
page for gdal_grid give a bit more detail about the potential
performance problems and some strategies for optimizing gdal_grid.
It might also be a reasonable enhancement to introduce a mechanism for
windowing the input data into small buckets based on some grid size.
For instance, with a 10x10 window the input points would be
pre-processed into spatial buckets, and the analysis for each window
would then use the points from that window plus the surrounding 8
windows; see the sketch below. I think this approach would (at least
in my case) have produced much faster results with no difference in
the resulting output.
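In rough Python, the bucketing I have in mind is just a spatial hash
(a sketch of the idea, not anything that exists in GDAL today):

    import math
    from collections import defaultdict

    def bucket_points(points, cell_size):
        """Hash each (x, y, z) point into a square spatial bucket."""
        buckets = defaultdict(list)
        for x, y, z in points:
            key = (math.floor(x / cell_size), math.floor(y / cell_size))
            buckets[key].append((x, y, z))
        return buckets

    def window_points(buckets, key):
        """Points for one window: its own bucket plus the 8 neighbours."""
        i, j = key
        pts = []
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                pts.extend(buckets.get((i + di, j + dj), []))
        return pts

gdal_grid would then only ever scan a small, bounded set of points per
output window instead of the whole input.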
Cheers
Paul
On 2-Jun-08, at 7:29 AM, Andrey Kiselev wrote:
> Paul,
>
> On Sun, Jun 01, 2008 at 09:16:08PM -0400, Paul Spencer wrote:
>> I have a set of x,y,z data in text files. There are about 1800
>> individual files. Each file has several thousand points. The sum
>> total is about 14.5 million entries.
>
> For every pixel of the output raster the whole set of input points
> will be scanned, so the best way to go is to split your area into
> smaller tiles. If you have one XYZ file per tile, then you can just
> combine 9 XYZ files together for each output raster tile and run
> gdal_grid on those smaller sets, then gdal_merge the resulting raster
> tiles. This should all be quite scriptable; special care is only
> needed at the region borders.
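This is essentially the approach my script above takes. Once the
per-tile rasters exist and are all north-up, the merge step is a
one-liner (tile names hypothetical):

    gdal_merge.py -o merged_dem.tif tile_*.tif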
>
>> * what is a reasonable -outsize value? Originally I thought 5900 x
>> 3000 based on the 70 m per measurement figure, but perhaps that is
>> way too big?
>
> That depends entirely on your final purpose and expected final
> resolution. If you want the best possible resolution, then choose a
> step close to the average distance between your input points
> (someday, computation of this metric will be added to the list of
> gdal_grid features :-).
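Until that lands, a crude estimate of the average spacing is
sqrt(bounding-box area / point count), assuming roughly uniform
coverage; for example:

    import math

    def avg_spacing(xmin, ymin, xmax, ymax, npoints):
        """Rough grid-step estimate assuming uniform point coverage."""
        area = (xmax - xmin) * (ymax - ymin)
        return math.sqrt(area / npoints)

    # e.g. 14.5 million points over a 413 km x 210 km box -> ~77 m
    print(avg_spacing(0.0, 0.0, 413_000.0, 210_000.0, 14_500_000))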
>
>> * invdist seems to be the slowest algorithm based on some quick tests
>> on individual files. Is there much difference between average and
>> nearest? What values of radius1 and radius2 will work the fastest
>> while still producing reasonable results for the -outsize above?
>
> I have created a preliminary version of the GDAL Grid tutorial. It
> does not contain many examples yet, but the basic information is
> already there:
>
> http://www.gdal.org/grid_tutorial.html
>
> Nearest Neighbour is quite different from Moving Average. Actually,
> the best use of NN is to convert an XYZ array that was created from a
> regular grid back into that grid. If you know that your points are
> located near the grid nodes, then NN will do the job for you.
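For reference, the algorithm and its parameters are packed into the -a
switch, so switching between invdist, average and nearest is just a
string change (the extent, sizes and layer name here are illustrative):

    gdal_grid -a average:radius1=210:radius2=210:min_points=1 \
              -txe 0 413000 -tye 0 210000 -outsize 5900 3000 \
              -ot Float32 -l points points.vrt dem.tif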
>
>> * would it be better to convert from CSV to something else (shp?)
>> first?
>
> No way. But I think that importing the data into a database, with
> the tiling approach suggested above, may help. Spatial filtering
> could be done efficiently inside the DB.
>
>> * would it be better to process individual input files then run
>> gdal_merge.py on the result?
>
> Yes, I think so.
>
> Best regards,
> Andrey
>
> --
> Andrey V. Kiselev
> ICQ# 26871517