[gdal-dev] help using gdal_grid ...
Paul Spencer
pagameba at gmail.com
Mon Jun 2 07:09:26 EDT 2008
Thanks for the feedback, Tamas.
I took the drastic step of going and looking at the code last
night :) I'm not a gdal developer so this was a pretty scary step for
me. What I discovered is that (in pseudo code) the logic of gdal_grid
is:
gdal_grid:
    for each column in outsize:
        for each row in outsize:
            cell = grid_function()

grid_function:
    for each point in the input:
        calculate cell value
so in my case, even the least expensive operation loops about
133,168,000,000,000 times (roughly 14.5 million points for each cell of
a 3200 x 2900 grid), plus whatever calculation is required per point.
This doesn't seem efficient to me - in my data set I can only conceive
of about 50 points being significant in calculating any one cell value.
Even considering several hundred points per cell would be a vast
improvement.
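
In other words, something like this (an untested sketch with made-up
names, not the actual gdal_grid code):

# Untested sketch of the structure above; the names are made up and
# this is not the actual gdal_grid implementation.

def invdist_cell(cx, cy, points, power=2.0):
    # One cell's value: every input point is visited.
    num = den = 0.0
    for px, py, pz in points:
        d2 = (px - cx) ** 2 + (py - cy) ** 2
        if d2 == 0.0:
            return pz          # cell centre falls exactly on a sample
        w = 1.0 / d2 ** (power / 2.0)
        num += w * pz
        den += w
    return num / den

def brute_force_grid(points, cols, rows, cell_centre):
    # cols * rows * len(points) inner iterations in total.
    grid = []
    for row in range(rows):
        line = []
        for col in range(cols):
            cx, cy = cell_centre(col, row)   # georeferenced cell centre
            line.append(invdist_cell(cx, cy, points))
        grid.append(line)
    return grid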
Right now, I am considering a couple of options, since it seems
unlikely I have the skill level to change gdal_grid:
1. All my data is split over about 1800 files, each covering a much
smaller geographic area than the final result. I think a much more
efficient option will be to process them individually and combine the
resulting DEMs. However, I expect this to produce noticeable edge
effects. So I am thinking of combining the nine files surrounding the
one I am working on, gridding those points into a very small raster
(say 32x32), and moving that window over the data set to produce about
1800 tif files that can then be merged into a single DEM (see the
sketch after this list).
2. Putting all the points into a PostGIS table, selecting out windows
of points to calculate cell values, and writing some sort of (*gasp*)
python script to write the individual cell values.
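
For option 1, the per-tile step would look roughly like this (untested;
the file layout is hypothetical, and the gdal_grid switches are just
the standard extent/size/algorithm options):

# Untested sketch of the per-tile step in option 1.  The VRT layout and
# file naming are hypothetical; the gdal_grid options are standard.
import subprocess

def grid_tile(vrt_path, layer_name, xmin, xmax, ymin, ymax, out_tif):
    # vrt_path should union the tile's points with its neighbours'
    # points so that samples just outside the tile still contribute
    # near the edges, while -txe/-tye clip the output to the tile.
    subprocess.check_call([
        "gdal_grid",
        "-a", "invdist:power=2.0",
        "-txe", str(xmin), str(xmax),
        "-tye", str(ymin), str(ymax),
        "-outsize", "32", "32",
        "-l", layer_name,
        vrt_path, out_tif,
    ])

# ...loop grid_tile() over the ~1800 tiles, then merge with e.g.:
#    gdal_merge.py -o dem.tif tile_*.tif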
I just did some rough calculations on the second option: assuming 100
ms per query and calculation, it should complete in about 25 hours -
waaay better! A rough sketch of what I have in mind is below.
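
For option 2, roughly (untested; the table and column names are
placeholders, the extent is rounded from my original numbers, and the
search radius is a guess):

# Untested sketch of option 2: one PostGIS window query per output
# cell, inverse-distance weighting the points it returns.
# elevation_points(geom, z) is a placeholder table, assumed to be
# stored in EPSG:2036.
import numpy as np
import psycopg2
from osgeo import gdal

XMIN, XMAX = 2302990.0, 2716502.0     # extent, rounded
YMIN, YMAX = 7388827.0, 7597784.0
COLS, ROWS = 3200, 2900
CELLX = (XMAX - XMIN) / COLS
CELLY = (YMAX - YMIN) / ROWS
RADIUS = 200.0    # search window half-width in metres (a guess)

conn = psycopg2.connect("dbname=dem")
cur = conn.cursor()

ds = gdal.GetDriverByName("GTiff").Create("dem.tif", COLS, ROWS, 1,
                                          gdal.GDT_Float32)
ds.SetGeoTransform((XMIN, CELLX, 0.0, YMAX, 0.0, -CELLY))
band = ds.GetRasterBand(1)

for row in range(ROWS):
    cy = YMAX - (row + 0.5) * CELLY
    line = np.zeros((1, COLS), dtype=np.float32)
    for col in range(COLS):
        cx = XMIN + (col + 0.5) * CELLX
        cur.execute(
            "SELECT z, ST_Distance(geom,"
            "       ST_SetSRID(ST_MakePoint(%s, %s), 2036))"
            " FROM elevation_points"
            " WHERE geom && ST_Expand("
            "       ST_SetSRID(ST_MakePoint(%s, %s), 2036), %s)",
            (cx, cy, cx, cy, RADIUS))
        num = den = 0.0
        for z, d in cur.fetchall():
            w = 1.0 / max(d, 0.001) ** 2    # avoid division by zero
            num += w * z
            den += w
        line[0, col] = num / den if den else 0.0
    band.WriteArray(line, 0, row)

ds = None    # flush and close the GeoTIFF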
I think a vast improvement could be made to gdal_grid by first indexing
all the input points spatially and then passing only the nearby subset
of points to the grid function for each cell. I have no clue where to
start with this in gdal/ogr, although I suspect that much of the needed
functionality is already there in the code. Something like the
bucketing sketch below is what I have in mind.
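
Roughly this kind of bucketing (untested, and not based on the actual
gdal/ogr code):

# Untested sketch of the indexing idea: bucket the points into a coarse
# grid once, then each output cell only looks at the points in its own
# bucket and the eight surrounding ones.
from collections import defaultdict

BUCKET = 500.0   # bucket edge length in metres (a tuning parameter)

def build_index(points):
    index = defaultdict(list)
    for x, y, z in points:
        index[(int(x // BUCKET), int(y // BUCKET))].append((x, y, z))
    return index

def nearby_points(index, cx, cy):
    # Candidate points for the cell centred at (cx, cy).
    bx, by = int(cx // BUCKET), int(cy // BUCKET)
    for i in range(bx - 1, bx + 2):
        for j in range(by - 1, by + 2):
            for p in index.get((i, j), ()):
                yield p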
Fortunately, I should see Frank today and will bend his ear about
this :)
Cheers
Paul
On 2-Jun-08, at 5:17 AM, Tamas Szekeres wrote:
> Paul,
>
> In recent days I've run into the same performance problem due to the
> large number of scattered points. I found I could get a significant
> performance improvement by reducing the number of floating point
> operations in the inner loop of the invdist implementation. I've also
> created a custom invdist for the case where the power is 2.0, so that
> the pow() function can be eliminated entirely in that special case.
> I've been thinking of contributing these changes back, but I'm
> hesitant to mess things up with funky optimizations inside the code.
> Maybe a bug report along with a patch would be more reasonable from
> me.
>
> Best regards,
>
> Tamas
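
(For illustration only, and not Tamas's actual patch: the power == 2.0
special case he describes amounts to dropping pow() from the weight,
e.g.:)

# Illustration only, not the actual patch: with power == 2.0 the
# inverse distance weight 1/d**power is just 1/(dx*dx + dy*dy), so the
# inner loop needs neither pow() nor sqrt().  Zero-distance handling
# is omitted here.
def invdist_weight(dx, dy, power):
    return 1.0 / (dx * dx + dy * dy) ** (power / 2.0)  # general case

def invdist_weight_power2(dx, dy):
    return 1.0 / (dx * dx + dy * dy)                   # power == 2.0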
>
>
>
> 2008/6/2 Paul Spencer <pagameba at gmail.com>:
>> Hi,
>>
>> I have a set of x,y,z data in text files. There are about 1800
>> individual files. Each file has several thousand points. The sum
>> total is about 14.5 million entries.
>>
>> I would like to convert this into a DEM so I can make contours and
>> hillshade.
>>
>> My first attempt was to concatenate all the files into a single
>> file, create a VRT file, and run gdal_grid on the result. It took
>> about 4 hours for gdal_grid (running at 99.5% of one core on my
>> MacBook Pro) to output the first '0' in the progress monitor, so I
>> abandoned this process :)
>>
>> So I would like some advice on how to tune the parameters to
>> gdal_grid to get reasonable results in a more suitable amount of
>> time.
>>
>> The data has been collected at approximately 70 meter intervals or
>> less, depending on the terrain.
>>
>> The area of coverage is in EPSG:2036 and is
>>
>> 2302989.998137,7597784.001173 - 2716502.001863,7388826.998827
>>
>> which is about 413000 m x 209000 m
>>
>> Questions:
>>
>> * what is a reasonable -outsize value? Originally I thought 5900 x
>> 3000 based on the 70 m measurement spacing, but perhaps that is way
>> too big?
>>
>> * invdist seems to be the slowest algorithm based on some quick
>> tests on individual files. Is there much difference between average
>> and nearest? What values of radius1 and radius2 will work the
>> fastest while still producing reasonable results at the -outsize
>> above?
>>
>> * would it be better to convert from CSV to something else (shp?)
>> first?
>>
>> * would it be better to process individual input files then run
>> gdal_merge.py on the result?
>>
>> Cheers
>>
>> Paul
>>
>> __________________________________________
>>
>> Paul Spencer
>> Chief Technology Officer
>> DM Solutions Group Inc
>> http://www.dmsolutions.ca/
>> _______________________________________________
>> gdal-dev mailing list
>> gdal-dev at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>>