[mapserver-users] Mapserver Storage
tigana.fluens at gmail.com
Fri Jan 28 10:57:25 EST 2011
Thank you for the prompt replies so far! : )
Great, that's the first time I've heard about that method. I'm doing a
reality check here, haha. Sadly I don't think we can do that kind of work
right now (skill-wise), and money is an issue as well, since EC2 instances
cost so much more for small startups. Although I'm aware that as
computing needs grow larger, the AWS platform delivers a lot more
value. Do you typically use this setup for new projects?
OK, at least we can cross S3/EBS or any similar service off our "what
if" list. Not to mention it's so damn expensive once you get to that
amount of data. The Backblaze solution, I think, would work very well
for backups or archiving purposes, but I'm really concerned about
storage that is viable for actively working with raster images and
other files. Unfortunately I'm not a hardware guy, although I'm trying
my best to understand all of this :D
What exactly do you mean by pathetic I/O on S3/EBS?
You said you went the route of multiple CPUs; does that mean you had
many MapServer machines? In your opinion, is RAID 1+0 a good enough
config for something MapServer can work from?
Does this imply that getting a single dedicated server with mass storage
(everything on one box: MapServer itself and the data) is not a good idea?
And lastly, what is the best practice for storing raster images? I'm
not sure if MapServer can do this, but assuming you have one "raster
map", can you divide the image into, say, 10 huge tiles and let MapServer
put them together at runtime? If so, what is a good way to go about it?
Do you partition it into 1 GB images?
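From what I've been able to dig up so far, this is roughly what a tile-index setup would look like: you build a shapefile index of the tile footprints with GDAL's gdaltindex tool, then point a RASTER layer at that index. Just a sketch based on the docs; the paths and layer name here are made up:

```
# Build the index once, outside the mapfile (GDAL command line):
#   gdaltindex raster_index.shp tiles/*.tif
#
# Then reference the index from a MapServer LAYER:
LAYER
  NAME "raster_mosaic"
  TYPE RASTER
  STATUS ON
  TILEINDEX "raster_index.shp"
  TILEITEM "location"  # attribute holding each tile's path (gdaltindex default)
END
```

If that's right, MapServer only opens the tiles whose footprints intersect the requested extent, so tile size becomes a tuning knob rather than a hard limit.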
On 1/28/2011 11:50 PM, Mark Korver wrote:
> There are ways to use S3 as the store for source images by using tools
> like s3fs (a FUSE-based file system backed by Amazon S3) and writing
> front-end code that intercepts the incoming WMS request, filters using
> a grid, then routes to the appropriate EC2 MapServer instance. This
> allows particular instances to do region-based caching of source data.
> The first request that "hits" a new source file is slow, but the second
> one is read from EC2's "built-in" storage. This kind of setup allows you
> to run n-number of mapservers all looking at the same data stored on
> S3, but would require some work up front. The smaller the grid, the
> more mapservers; and if you want to scale more you can use LB and
On 1/28/2011 11:29 PM, Bob Basques wrote:
> I'm working on a similar project currently. Setting up 50 TB of
> storage, we went the route of multiple CPUs with large disks and a
> redundant RAID config, so half of the physical disk is available for
> storage. We're at 30+ TB of real storage across a 4U setup right
> now. Cost (with hardware/setup/initial config) is below those numbers
> below (so far), because we're building from scratch and learning along
> the way.
> I would tend to agree on not using the off-site stuff; once you
> consider moving the data, the idea of co-lo to some other remote
> location starts to fall apart. The transfer costs, in bandwidth
> and/or time, really start to eat into things cost-wise.
> Some of this depends on the end uses as well. We're building a data
> site for distribution of really large files and datasets.
> >>> Paul Spencer <pspencer at dmsolutions.ca> wrote:
> I would personally recommend against AWS S3/EBS for anything of this
> scale as the I/O is pretty pathetic unless you invest in their very
> high end instances. We've set up a 4TB 'SAN' using glusterfs on AWS
> EC2 using 1TB EBS volumes and separate instances for each - the
> performance has been so poor that we have had to redesign our workflow
> to get copies of data onto EBS attached to each mapserver instance -
> for scaling that sucks and even then the I/O performance of EBS is not
> that great on the normal instances.
> I'm not a hardware guy, but I think the purpose of a dedicated SAN box
> is to provide high-bandwidth access to large amounts of storage so that
> the data can effectively be distributed to/from multiple machines over
> a network - ideal for scaling mapserver onto multiple servers while
> rendering from the same data. I read an article about a year ago
> from a company that provides petabyte-scale online storage; it
> details how they built their storage devices - they say $7867 for 67
> Seems pretty geeky, but perhaps you are the hardware type or know
> someone who is :)
> On 2011-01-28, at 3:41 AM, tigana.fluens at gmail.com wrote:
> > Hello guys, we're a startup and new to MapServer. We're expecting
> large amounts of data to come in (at least on our scale): around
> 40-60 TB of raster images for MapServer to render. My question is about
> the infrastructure: what is the most cost-efficient way to store this?
> > - Do we just get a dedicated server with a lot of HDDs? I'm looking
> at a 48TB setup in RAID 1+0, so I get 24TB, right? What happens if we
> need more? Also, how can we scale from the MapServer side? Is access
> to different storage servers possible?
> > - I've considered SANs, but then it's not practical, right, because
> only one machine will access the storage?
> > - What about Amazon's S3? Or EBS? Anything we can use there?
> > I hope to get some awesome advice on this storage issue, basically
> what the considered best practice is for the mapserver people :P Thanks
> > _______________________________________________
> > mapserver-users mailing list
> > mapserver-users at lists.osgeo.org
> > http://lists.osgeo.org/mailman/listinfo/mapserver-users
> Paul Spencer
> Chief Technology Officer
> DM Solutions Group Inc