[OSGeo-Discuss] Image Management in an RDBMS...(was OS Spatialenvironment 'sizing')

Andy Turner A.G.D.Turner at leeds.ac.uk
Fri Feb 22 08:27:48 PST 2008


Hi,
 
I'm processing a dataset for the Cairngorms National Park in the UK.
The source is NextMap data on a 5 metre square raster grid. It has
30000 columns and 24000 rows. Amongst other things I calculate roughness
for kernels taking in all values within 64 cell distances. The roughness
output is calculated at the same resolution as the input (along with
around 60 other metrics).
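The message does not define the roughness metric. One common choice, assumed here, is the standard deviation of all cell values within a circular kernel of radius 64 cells. A minimal sketch for a single output cell on an in-memory list-of-lists grid; the function name and the `nodata` handling are hypothetical:

```python
import math

def roughness(grid, row, col, radius, nodata=None):
    """Assumed roughness metric: population standard deviation of all cell
    values whose centres lie within `radius` cell distances of (row, col)."""
    nrows, ncols = len(grid), len(grid[0])
    values = []
    # Scan the bounding square of the kernel, clipped to the grid edges.
    for r in range(max(0, row - radius), min(nrows, row + radius + 1)):
        for c in range(max(0, col - radius), min(ncols, col + radius + 1)):
            # Keep only cells inside the circular kernel.
            if (r - row) ** 2 + (c - col) ** 2 <= radius ** 2:
                v = grid[r][c]
                if v != nodata:
                    values.append(v)
    if not values:
        return nodata
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
```

With a radius of 64 the kernel covers roughly 12,900 cells (about pi x 64 x 64) for every output value, which is why these runs are compute-heavy as well as storage-heavy.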
 
This is a small dataset in comparison with some data for Mars that I am
processing in a similar way. I am also grappling with the SRTM90 data
from 60 South to 60 North, which has several hundred thousand rows and
columns. On the hardware side, I need terabytes of disc space, but only
a gigabyte or so of faster-access memory to do this work. The point is, as
many of you know, there are big raster datasets out there, and now is as
good a time as any to process them.
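A back-of-envelope check on the sizes involved, assuming uncompressed 4-byte floating-point cells (the message does not state the cell type):

```python
# Assumption: each cell is a 4-byte (32-bit) float, uncompressed,
# with no per-file header overhead.
BYTES_PER_CELL = 4

def grid_bytes(nrows, ncols, bytes_per_cell=BYTES_PER_CELL):
    """Uncompressed size in bytes of one raster grid."""
    return nrows * ncols * bytes_per_cell

# Cairngorms NextMap grid from the message: 24000 rows x 30000 columns.
one_grid = grid_bytes(24000, 30000)    # 2,880,000,000 bytes
# One input plus roughly 60 output metrics at the same resolution.
all_grids = 61 * one_grid              # 175,680,000,000 bytes
# A single 500 x 500 chunk, the unit held in memory at any one time.
one_chunk = grid_bytes(500, 500)       # 1,000,000 bytes

print(f"one grid:  {one_grid / 1e9:.2f} GB")
print(f"61 grids:  {all_grids / 1e9:.2f} GB")
print(f"one chunk: {one_chunk / 1e6:.2f} MB")
```

Under these assumptions one Cairngorms grid is about 2.9 GB and the full set of outputs is on the order of 175 GB, while each 500 by 500 chunk is only about 1 MB, so disc rather than memory is the binding constraint.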
 
I split the data into chunks and store them as files on disc. I have
problems when the number of files gets too large and when the size of
each chunk gets too large. As a compromise, at the moment I tend to
use chunks of about 500 rows by 500 columns (my program can use any
chunk size so long as all chunks have the same dimensions). The problem of
too many files is, I think, an operating system problem. The problem of
chunks that are too large is more down to the implementation of raster
processing and its memory handling. I try to hold enough data in memory in
my program to get answers in a reasonable time frame. (BTW, my
programs are FOSS and I'll release a new version of Grids soon, which you
can pick up via http://www.geog.leeds.ac.uk/people/a.turner/src/).
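With fixed-size chunks, finding which file holds a given cell is just integer division. A sketch, assuming 500 by 500 chunks stored one per file under a hypothetical naming scheme (this is not necessarily how Grids lays out its files):

```python
CHUNK_ROWS = 500
CHUNK_COLS = 500

def locate(row, col, chunk_rows=CHUNK_ROWS, chunk_cols=CHUNK_COLS):
    """Map a global cell index to (chunk_row, chunk_col, local_row, local_col)."""
    chunk_row, local_row = divmod(row, chunk_rows)
    chunk_col, local_col = divmod(col, chunk_cols)
    return chunk_row, chunk_col, local_row, local_col

def chunk_count(nrows, ncols, chunk_rows=CHUNK_ROWS, chunk_cols=CHUNK_COLS):
    """Number of chunk files per grid; ceiling division, because a partial
    chunk at the right or bottom edge still needs its own file."""
    return -(-nrows // chunk_rows) * -(-ncols // chunk_cols)

def chunk_path(grid_name, chunk_row, chunk_col):
    """Hypothetical naming scheme: one directory per grid, one file per chunk."""
    return f"{grid_name}/{chunk_row}_{chunk_col}.bin"
```

For the 24000 by 30000 Cairngorms grid, `chunk_count` gives 48 x 60 = 2880 files per grid; across one input and about 60 outputs that is roughly 175,000 files, which shows how quickly the file count grows as chunks shrink.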
 
I do the geomorphometric processing both on my PC and on some High End
Computers (HECs). On HECs I am considering using a federated datastore
like SRB and parallelising, as the task is "embarrassingly parallel" and
so reasonably easy to split up. I am also looking into more Grid/Web
Services (SOA) ways of doing this.
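Because each output cell depends only on input cells within 64 cells of it, chunks can be processed independently, provided each worker reads its chunk plus a 64-cell border (a "halo"). A sketch using Python's `concurrent.futures` on a single machine; the function names are hypothetical, the worker body is a placeholder, and nothing here uses SRB:

```python
from concurrent.futures import ProcessPoolExecutor

RADIUS = 64        # kernel radius in cells, from the message
CHUNK_SIZE = 500   # square chunks, as described in the message

def halo_window(chunk_row, chunk_col, nrows, ncols,
                chunk_size=CHUNK_SIZE, radius=RADIUS):
    """Half-open input window (row0, row1, col0, col1) a chunk needs:
    its own cells plus a `radius`-cell border, clipped to the grid edges."""
    row0 = chunk_row * chunk_size
    col0 = chunk_col * chunk_size
    return (max(0, row0 - radius), min(nrows, row0 + chunk_size + radius),
            max(0, col0 - radius), min(ncols, col0 + chunk_size + radius))

def process_chunk(task):
    """Hypothetical worker: a real one would read `window` from disc,
    compute kernel metrics for the chunk's own cells and write the outputs."""
    chunk_row, chunk_col, nrows, ncols = task
    window = halo_window(chunk_row, chunk_col, nrows, ncols)
    return (chunk_row, chunk_col), window

def run_all(nrows, ncols, workers=4):
    """Process every chunk independently in a pool of worker processes."""
    n_chunk_rows = -(-nrows // CHUNK_SIZE)   # ceiling division
    n_chunk_cols = -(-ncols // CHUNK_SIZE)
    tasks = [(cr, cc, nrows, ncols)
             for cr in range(n_chunk_rows) for cc in range(n_chunk_cols)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_chunk, tasks))
```

Call `run_all(24000, 30000)` from under an `if __name__ == "__main__":` guard, which `ProcessPoolExecutor` needs on platforms that start workers by spawning. On an HEC the same per-chunk decomposition would be spread across nodes rather than local processes; the halo means neighbouring chunks re-read some shared border cells, which is the price of needing no communication between workers.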
 
In the past I have used the approach of storing rasters as BLOBs in a
database. I don't know which is best, but I'm working with files again now.
I have found that for some things it is best to use RandomAccessFiles and
manipulate information directly on disc rather than, say, using a swap
approach. I think what is best depends on what you are doing and how many
raster datasets you use simultaneously in the processing. Nearly all of my
processing these days involves computing kernel metrics at the same
resolution as the inputs, with (sometimes) multiple inputs (but usually
just one input) and multiple outputs (about 50 or so).
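The message mentions Java's `RandomAccessFile`; the same idea in Python is seeking to a computed byte offset in a flat binary file and reading or writing just that cell. A minimal sketch, assuming a headerless, row-major file of little-endian 4-byte floats (the layout and function names are assumptions, not the Grids file format):

```python
import struct

# Assumed layout: headerless, row-major, little-endian 4-byte floats.
CELL = struct.Struct("<f")

def cell_offset(row, col, ncols):
    """Byte offset of cell (row, col) in a row-major grid file."""
    return (row * ncols + col) * CELL.size

def read_cell(f, row, col, ncols):
    """Read one cell directly from disc without loading the grid."""
    f.seek(cell_offset(row, col, ncols))
    return CELL.unpack(f.read(CELL.size))[0]

def write_cell(f, row, col, ncols, value):
    """Overwrite one cell in place; `f` must be opened in "r+b" mode."""
    f.seek(cell_offset(row, col, ncols))
    f.write(CELL.pack(value))
```

Only the bytes for the requested cell are read or written, so memory use stays flat whatever the grid size; the cost is a seek per access, so visiting cells in row order is much cheaper than jumping around the file.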
 
Best wishes,
 
Andy
http://www.geog.leeds.ac.uk/people/a.turner/
  
 

________________________________

From: discuss-bounces at lists.osgeo.org
[mailto:discuss-bounces at lists.osgeo.org] On Behalf Of
Bruce.Bannerman at dpi.vic.gov.au
Sent: 22 February 2008 04:53
To: OSGeo Discussions
Subject: Re: [OSGeo-Discuss] Image Management in an RDBMS...(was OS
Spatialenvironment 'sizing')



IMO 

> 
> 12 million records is teensy. Stuff it into PostGIS. It's the
> billion-point LIDAR sets that leave me queasy, but I can't begin to think
> of a reasonable architecture for that without learning more about how the
> points are actually USED, which I really am not clear on at the moment.
> 

Paul, 

Agreed. 

Generation of TINs or surfaces of roughness over that number of points
will challenge any data management solution. 

However, the time is coming / has come when people will want to do it. 

It is perhaps a good candidate for Grid architectures and high
performance computing. 

Bruce 
