[mapserver-users] Mapserver Storage
tigana.fluens at gmail.com
Fri Jan 28 10:57:25 EST 2011
Thank you for the prompt replies so far! : )
Great, that's the first time I've heard about that method. I'm doing a
reality check here, haha. Sadly I don't think we can do that kind of work
right now (skill-wise), and money is an issue as well, since EC2 instances
cost so much more for small startups. Although I'm aware that as
computing needs grow larger, the AWS platform delivers a lot more
value. Do you typically use this setup for new projects?
OK, at least we can cross S3/EBS or any similar service off our "what
if" list. Not to mention it's so damn expensive once you get to that
amount of data. The Backblaze solution, I think, would work very well
for backups or archiving purposes, but I'm really concerned about
storage that is viable for actively working with raster images and
other files. Unfortunately I'm not a hardware guy, although I'm trying
my best to understand all of this :D
What exactly do you mean by pathetic I/O on S3/EBS?
You said you went the route of multiple CPUs; does that mean you had
many MapServer machines? In your opinion, is RAID 1+0 a good enough
config for something MapServer can work from?
Does this imply that getting a single dedicated server with mass storage
(everything on one box: MapServer itself and the data) is not a good idea?
And lastly, what is the best practice for storing raster images? I'm
not sure if MapServer can do this, but assuming you have one "raster
map", can you divide the image into, say, 10 huge tiles and let MapServer
put them together at runtime? If so, what is a good way to go about it?
Do you partition it into 1 GB images?
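From what I've been able to dig up so far, this is roughly what a tile-index setup would look like: you build a shapefile index of the tile footprints with GDAL's gdaltindex tool, then point a RASTER layer at that index. Just a sketch based on the docs; the paths and layer name here are made up:

```
# Build the index once, outside the mapfile (GDAL command line):
#   gdaltindex raster_index.shp tiles/*.tif
#
# Then reference the index from a MapServer LAYER:
LAYER
  NAME "raster_mosaic"
  TYPE RASTER
  STATUS ON
  TILEINDEX "raster_index.shp"
  TILEITEM "location"  # attribute holding each tile's path (gdaltindex default)
END
```

If that's right, MapServer only opens the tiles whose footprints intersect the requested extent, so tile size becomes a tuning knob rather than a hard limit.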
On 1/28/2011 11:50 PM, Mark Korver wrote:
> There are ways to use S3 as the store for source images by using tools
> like s3fs (a FUSE-based file system backed by Amazon S3) and writing
> front-end code that intercepts the incoming WMS request, filters using
> a grid, then routes to the appropriate EC2 MapServer instance. This
> allows particular instances to do region-based caching of source data.
> The first request that "hits" a new source file is slow, but the second
> one is read from EC2's "built-in" storage. This kind of setup allows you
> to run n-number of mapservers all looking at the same data stored on
> S3, but would require some work up front. The smaller the grid, the
> more mapservers; and if you want to scale more you can use LB and
On 1/28/2011 11:29 PM, Bob Basques wrote:
> I'm working on a similar project currently. Setting up 50 TB of
> storage, we went the route of multiple CPUs with large disks and a
> redundant RAID config, so half of the physical disk is available for
> storage. We're at 30+ TB of real storage across a 4U setup right
> now. Cost (with hardware/setup/initial config) is below those numbers
> below (so far), because we're building from scratch and learning along
> the way.
> I would tend to agree on not using the off-site stuff; once you
> consider moving the data, the idea of co-lo to some other remote
> location starts to fall apart. The transfer costs, in bandwidth
> and/or time, really start to eat into things cost-wise.
> Some of this depends on the end uses as well. We're building a data
> site for distribution of really large files and datasets.
> >>> Paul Spencer <pspencer at dmsolutions.ca> wrote:
> I would personally recommend against AWS S3/EBS for anything of this
> scale as the I/O is pretty pathetic unless you invest in their very
> high end instances. We've set up a 4TB 'SAN' using glusterfs on AWS
> EC2 using 1TB EBS volumes and separate instances for each - the
> performance has been so poor that we have had to redesign our workflow
> to get copies of data onto EBS attached to each mapserver instance -
> for scaling that sucks and even then the I/O performance of EBS is not
> that great on the normal instances.
> I'm not a hardware guy, but I think the purpose of a dedicated SAN box
> is to provide high-bandwidth access to large amounts of storage so that
> the data can effectively be distributed to/from multiple machines over
> a network - ideal for scaling mapserver onto multiple servers while
> rendering from the same data. I read an article about a year ago
> from a company that provides petabyte-scale online storage; it
> details how they built their storage devices - they say $7867 for 67
> Seems pretty geeky, but perhaps you are the hardware type or know
> someone who is :)
> On 2011-01-28, at 3:41 AM, tigana.fluens at gmail.com wrote:
> > Hello guys, we're a startup and new to MapServer. We're expecting
> large amounts of data to come in (at least on our scale): around
> 40-60 TB of raster images for MapServer to render. My question is about
> the infrastructure: what is the most cost-efficient way to store this?
> > - Do we just get a dedicated server with a lot of HDDs? I'm looking
> at a 48TB setup in RAID 1+0, so I get 24TB, right? What happens if we
> need more? Also, how can we scale from the MapServer side? Is access
> to different storage servers possible?
> > - I've considered SANs, but then it's not practical, right, because
> only one machine will access the storage?
> > - What about Amazon's S3? Or EBS? Anything we can use there?
> > I hope to get some awesome advice on this storage issue, basically
> what the considered best practice is for the mapserver people :P Thanks
> > _______________________________________________
> > mapserver-users mailing list
> > mapserver-users at lists.osgeo.org
> > http://lists.osgeo.org/mailman/listinfo/mapserver-users
> Paul Spencer
> Chief Technology Officer
> DM Solutions Group Inc