data proliferation or the data that ate the disk space

Sat Mar 25 06:16:23 PST 2006

Richard,

Data management is a common problem. The best practice for me has been 
to separate physical storage from logical storage. This is easiest to do 
on Linux systems with symbolic links. For physical storage, I like to 
keep datasets self-contained, especially if I have to update them at any 
frequency. Because these are self-contained (i.e. in a single directory 
tree), it is easy to create a parallel tree with new data and just swap 
out the old data for the new data by changing the symlink to the new 
data. This also allows any data tree to reside on any partition.
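
As a rough illustration of the swap, here is a minimal Python sketch. 
The function name swap_symlink is my own, not part of any tool, and it 
assumes a POSIX system where renaming one symlink over another is 
atomic. It builds the new link under a temporary name and then renames 
it over the old one, so readers never see a missing link.

```python
import os

def swap_symlink(link_path, new_target):
    """Point link_path at new_target, replacing any existing symlink.

    Hypothetical helper: assumes POSIX rename semantics and that
    link_path has no trailing slash.
    """
    # Create the new link under a temporary name next to the old one.
    tmp_link = f"{link_path}.tmp.{os.getpid()}"
    os.symlink(new_target, tmp_link)
    # Rename it over the old link; the rename acts on the link itself,
    # not on the directory it points to.
    os.replace(tmp_link, link_path)
```

For example, swap_symlink("/u/application/tiger", "/u2/data/tiger2005fe/") 
would repoint the logical name at the newer tree without touching either 
data directory.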

For logical storage, I think in terms of maps or applications and I 
build a single directory for each. Into this directory, I link the 
physical datasets I need and I create all the tileindexes relative to 
that directory. Then in the mapfile I set DATAPATH to point to that 
directory. So for example, I have tiger data directories for the 
separate tiger releases with physical names like:

/u/data/tiger2004fe/
/u/data/tiger2004se/
/u2/data/tiger2005fe/

In my application directory I have something like:

/u/application/tiger -> /u/data/tiger2005fe/

I call the tiger data by "tiger" regardless of the version I am showing. 
That way I can change the underlying data without the application caring 
and I don't need to rebuild the tileindexes.
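
Setting up such an application directory could be scripted along these 
lines. This is only a sketch: link_datasets and the dictionary of 
logical names are made up for illustration, and the paths are the 
example ones from above.

```python
import os

def link_datasets(app_dir, datasets):
    """Create app_dir and link each logical name to its physical tree.

    datasets maps a stable logical name (e.g. "tiger") to the physical
    directory currently backing it. Hypothetical helper.
    """
    os.makedirs(app_dir, exist_ok=True)
    for name, target in datasets.items():
        link = os.path.join(app_dir, name)
        # Drop a stale link so the logical name can be repointed.
        if os.path.islink(link):
            os.remove(link)
        os.symlink(target, link)
    # Report the logical names now present in the application directory.
    return sorted(os.listdir(app_dir))
```

Calling link_datasets("/u/application", {"tiger": "/u/data/tiger2005fe/"}) 
gives the layout shown above; tileindexes built relative to 
/u/application then only ever see the name "tiger".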

If I want to move the application to another server, I move the physical 
datasets I need and the application directory, and fix up the symlinks to 
point to the respective new locations. 99% of the time I do not need 
to rebuild the tileindexes.
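
The fix-up step after a move might look something like the following 
sketch. The name retarget_links and the prefix-rewriting approach are my 
assumptions for illustration; it simply rewrites every symlink in the 
application directory whose target starts with the old data root.

```python
import os

def retarget_links(app_dir, old_prefix, new_prefix):
    """Rewrite symlinks in app_dir from old_prefix to new_prefix.

    Hypothetical helper: only direct children of app_dir are examined,
    and links whose targets do not start with old_prefix are left alone.
    """
    changed = []
    for name in sorted(os.listdir(app_dir)):
        link = os.path.join(app_dir, name)
        if not os.path.islink(link):
            continue
        target = os.readlink(link)
        if target.startswith(old_prefix):
            new_target = new_prefix + target[len(old_prefix):]
            # Replace the link; the logical name stays the same.
            os.remove(link)
            os.symlink(new_target, link)
            changed.append((name, new_target))
    return changed
```

Because the logical names inside the application directory do not 
change, tileindexes built relative to that directory keep working.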

Hope this helps,
   -Steve W.

Richard Taylor wrote:
> Hello LIST
> 
> this is not just a MapServer question, but perhaps some of you farther 
> down the path have insights that you are willing to pass on.
> 
> As my learning curve progresses I find that local data volume is 
> increasing rapidly. It started of course with local apps, then expanded 
> with my introduction to MapServer, in my case ms4w, for getting the 
> basics, then has continued on to local directories to send up to remote 
> unix system instances.
> 
> While the mapfiles allow one to give a full path to your data, meaning 
> locally you can get at it wherever it is, that structure does not hold 
> up well, or at all, with remote instances. The end result is multiple 
> copies of many files, some of which are quite large: one for local apps, 
> one for ms4w, and one for each remote mapserver.
> 
> One solution is to keep getting larger storage space, but feeling this 
> might be a common problem, I wonder if any of the long term users or 
> those with large data volumes have come to a 'best practises' solution 
> to this issue.
> 
> thanks in advance
> 
> richard taylor
>