[gdal-dev] help using gdal_grid ...

Paul Spencer pagameba at gmail.com
Mon Jun 2 07:09:26 EDT 2008


Thanks for the feedback, Tamas.

I took the drastic step of going and looking at the code last  
night :)  I'm not a GDAL developer so this was a pretty scary step  
for me.  What I discovered is that (in pseudo code) the logic of  
gdal_grid is:

gdal_grid
   for each column in outsize
     for each row in outsize
       cell = grid_function()

grid_function
   for each point in the input
     calculate cell value.
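
For concreteness, that structure boils down to something like the  
following (a minimal Python sketch of the shape of the loops, using  
plain inverse distance weighting - not GDAL's actual code):

   import math

   def grid_function(points, cx, cy, power=2.0):
       # inverse-distance value of one cell centre, from EVERY input point
       num = den = 0.0
       for x, y, z in points:
           d2 = (x - cx) ** 2 + (y - cy) ** 2
           if d2 == 0.0:
               return z            # a point sits exactly on the cell centre
           w = 1.0 / math.pow(math.sqrt(d2), power)
           num += w * z
           den += w
       return num / den

   def gdal_grid(points, xmin, ymin, xmax, ymax, ncols, nrows):
       # brute force: ncols * nrows * len(points) distance calculations
       dx, dy = (xmax - xmin) / ncols, (ymax - ymin) / nrows
       out = []
       for row in range(nrows):
           cy = ymax - (row + 0.5) * dy
           out.append([grid_function(points, xmin + (col + 0.5) * dx, cy)
                       for col in range(ncols)])
       return out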

So in my case, even the least expensive algorithm loops roughly  
1.3 x 10^14 times (14.5 million points for every cell of a 3200 x  
2900 grid), plus doing whatever calculation is required per point.  
This doesn't seem efficient to me - I can only conceive of about 50  
points (in my data set) being significant in calculating any one  
cell value.  Even several hundred points per cell would be a vast  
improvement.

Right now, I am considering a couple of options, since it seems  
unlikely I have the skill level to change gdal_grid:

1. all my data is split over about 1800 files, each covering a much  
smaller geographic area than I am computing for the final result.  I  
think a much more efficient option will be to process them  
individually and combine the resulting DEMs.  However, I expect this  
to produce noticeable edge effects.  So I am thinking of combining  
each file with its eight neighbours (nine files in all), processing  
those points for a very small raster (say 32x32) covering just the  
centre file, and moving that window across the data to produce about  
1800 tif files that can then be merged into a single DEM (roughly  
like the sketch after this list).

2. putting all the points into a PostGIS table, selecting out  
windows of points to calculate cell values, and writing some sort of  
(*gasp*) Python script to write the individual cell values.
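
For option 1, the driver could be a small Python script along these  
lines.  This is only a sketch: the concatenate_csv_files() helper,  
the x/y/z column headers, the temporary file names, the 200 m search  
radius and the 32x32 output size are all assumptions about how my  
tiles are laid out, not tested code.

   import subprocess

   def write_vrt(csv_path, vrt_path, layer):
       # wrap a concatenated x,y,z CSV in an OGR VRT so gdal_grid can read it
       open(vrt_path, "w").write("""<OGRVRTDataSource>
     <OGRVRTLayer name="%s">
       <SrcDataSource>%s</SrcDataSource>
       <GeometryType>wkbPoint</GeometryType>
       <GeometryField encoding="PointFromColumns" x="x" y="y" z="z"/>
     </OGRVRTLayer>
   </OGRVRTDataSource>""" % (layer, csv_path))

   def grid_tile(tile_csv, neighbour_csvs, xmin, ymin, xmax, ymax, out_tif):
       # grid one tile from its own points plus its 8 neighbours' points,
       # but only rasterize the centre tile's own extent
       concatenate_csv_files([tile_csv] + neighbour_csvs, "window.csv")  # assumed helper
       write_vrt("window.csv", "window.vrt", "window")
       subprocess.check_call([
           "gdal_grid", "-l", "window",
           "-a", "invdist:power=2.0:radius1=200.0:radius2=200.0",
           "-txe", str(xmin), str(xmax), "-tye", str(ymin), str(ymax),
           "-outsize", "32", "32",
           "window.vrt", out_tif])

The ~1800 little tifs could then be combined at the end with  
gdal_merge.py -o dem.tif tile_*.tif.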

I just did some rough calculations on the PostGIS option and,  
assuming 100 ms per query and calculation, it should complete in  
about 25 hours - waaay better!
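
The inner loop I have in mind for the PostGIS route is roughly the  
following - again only a sketch, and the table name (points), the  
geometry column (geom), the z column (z), the 200 m search radius  
and the -9999 nodata value are all assumptions:

   import math
   import numpy
   import psycopg2
   from osgeo import gdal

   def cell_value(cur, cx, cy, radius=200.0, power=2.0):
       # inverse-distance value of one cell from points within `radius` metres
       cur.execute(
           "SELECT ST_X(geom), ST_Y(geom), z FROM points "
           "WHERE ST_DWithin(geom, ST_SetSRID(ST_MakePoint(%s, %s), 2036), %s)",
           (cx, cy, radius))
       num = den = 0.0
       for x, y, z in cur.fetchall():
           d2 = (x - cx) ** 2 + (y - cy) ** 2
           if d2 == 0.0:
               return z
           w = 1.0 / math.pow(math.sqrt(d2), power)
           num += w * z
           den += w
       return num / den if den else -9999.0   # nodata when nothing is in range

   def write_dem(cur, xmin, ymin, xmax, ymax, ncols, nrows, path):
       ds = gdal.GetDriverByName("GTiff").Create(path, ncols, nrows, 1,
                                                 gdal.GDT_Float32)
       dx, dy = (xmax - xmin) / ncols, (ymax - ymin) / nrows
       ds.SetGeoTransform((xmin, dx, 0.0, ymax, 0.0, -dy))
       band = ds.GetRasterBand(1)
       band.SetNoDataValue(-9999.0)
       for row in range(nrows):
           cy = ymax - (row + 0.5) * dy
           values = [cell_value(cur, xmin + (col + 0.5) * dx, cy)
                     for col in range(ncols)]
           band.WriteArray(numpy.array([values], dtype=numpy.float32), 0, row)
       ds = None   # close and flush

This assumes a GiST index on the geometry column (e.g. CREATE INDEX  
points_geom_idx ON points USING GIST (geom)), otherwise each query  
would end up scanning all 14.5 million rows anyway.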

I think a vast improvement could be made to gdal_grid by first  
processing all the input points to index them spatially, then only  
passing a subset of the points to the grid function for actual  
processing.  I have no clue where to start with this in GDAL/OGR,  
although I suspect that much of the functionality is already there  
in the code.
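
As a toy illustration of what I mean by indexing first (not GDAL  
code - just Python, with an arbitrary 500 m bucket size):

   import math
   from collections import defaultdict

   class BucketIndex:
       # hash the points into square buckets `cell` metres on a side
       def __init__(self, points, cell=500.0):
           self.cell = cell
           self.buckets = defaultdict(list)
           for x, y, z in points:
               self.buckets[(int(x // cell), int(y // cell))].append((x, y, z))

       def near(self, cx, cy, radius):
           # only the handful of buckets the search circle touches are examined,
           # so each cell sees a few dozen points instead of all 14.5 million
           r = int(math.ceil(radius / self.cell))
           bx, by = int(cx // self.cell), int(cy // self.cell)
           hits = []
           for ix in range(bx - r, bx + r + 1):
               for iy in range(by - r, by + r + 1):
                   for x, y, z in self.buckets.get((ix, iy), ()):
                       if (x - cx) ** 2 + (y - cy) ** 2 <= radius * radius:
                           hits.append((x, y, z))
           return hits

The grid function would then be handed index.near(cx, cy, radius)  
instead of the full point list.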

Fortunately, I should see Frank today and will bend his ear about  
this :)

Cheers

Paul

On 2-Jun-08, at 5:17 AM, Tamas Szekeres wrote:

> Paul,
>
> In the last few days I've run into the same performance problem
> because of the large number of scattered points.  I found I could
> get a significant performance improvement by reducing the number of
> floating point operations in the inner loop of the invdist
> implementation.  I've also created a custom invdist for the case
> where the power is 2.0, so that the pow() function can be eliminated
> entirely in that special case.  I've been thinking of contributing
> these changes back, but I'm hesitant to mess things up with funky
> optimizations inside the code.  Maybe a bug report along with a
> patch would be more reasonable.
>
> Best regards,
>
> Tamas
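
For illustration, the power == 2.0 special case Tamas describes  
boils down to something like this - Python here rather than the C++  
of GDAL's grid code:

   import math

   def invdist_weight(dx, dy, power):
       # general case: one sqrt() plus one pow() per point
       return 1.0 / math.pow(math.sqrt(dx * dx + dy * dy), power)

   def invdist_weight_pow2(dx, dy):
       # power == 2.0: 1/d^2 == 1/(dx^2 + dy^2), so pow() and sqrt() drop out
       return 1.0 / (dx * dx + dy * dy)

Skipping both calls in the innermost loop is where the speedup comes  
from.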
>
>
>
> 2008/6/2 Paul Spencer <pagameba at gmail.com>:
>> Hi,
>>
>> I have a set of x,y,z data in text files.  There are about 1800  
>> individual
>> files.  Each file has several thousand points.  The sum total is  
>> about 14.5
>> million entries.
>>
>> I would like to convert this into a DEM so I can make contours and
>> hillshade.
>>
>> My first attempt has been to concatenate all the files into a  
>> single file,
>> create a VRT file, and run gdal_grid on the resulting file.  It  
>> took about 4
>> hours for gdal_grid (running at 99.5% of one core on my MacBook Pro)  
>> to
>> output the first '0' in the progress monitor so I abandoned this  
>> process :)
>>
>> So I would like some advice on how to tune the parameters to  
>> gdal_grid to
>> get reasonable results in a more suitable amount of time.
>>
>> The data has been collected at approximately 70 meter intervals or  
>> less
>> depending on the terrain.
>>
>> The area of coverage is in EPSG:2036 and is
>>
>> 2302989.998137,7597784.001173 - 2716502.001863,7388826.998827
>>
>> which is about 413000 m x 209000 m
>>
>> Questions:
>>
>> * what is a reasonable -outsize value?  Originally I thought 5900 x  
>> 3000
>> based on the 70 m per measurement thing, but perhaps that is way  
>> too big?
>>
>> * invdist seems to be the slowest algorithm based on some quick  
>> tests on
>> individual files.  Is there much difference between average and  
>> nearest?
>> What values of radius1 and radius2 will work the fastest while still
>> producing reasonable results at the -outsize above?
>>
>> * would it be better to convert from CSV to something else (shp?)  
>> first?
>>
>> * would it be better to process individual input files then run
>> gdal_merge.py on the result?
>>
>> Cheers
>>
>> Paul
>>
>> __________________________________________
>>
>> Paul Spencer
>> Chief Technology Officer
>> DM Solutions Group Inc
>> http://www.dmsolutions.ca/


