<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#ffffff">
Thank you for the prompt replies so far! : )<br>
<br>
@Mark<br>
Great, that's the first time I've heard about that method. I'm doing a
reality check here, haha. Sadly, I don't think we can do that kind of
work as of now (skill-wise), and money is an issue as well, since EC2
instances cost so much more for small startups, although I'm aware
that the AWS platform becomes much better value as computing needs
grow. Do you typically use this setup for new projects?<br>
<br>
@Paul<br>
Ok, at least we can cross S3/EBS or any similar service off our
"what if" list. Not to mention it's so damn expensive when you get
to that amount of data. The Backblaze solution would, I think,
work very well for backups or archiving, but I'm
really concerned about storage that is viable for working with
raster images and other files mapserver renders.<br>
<br>
Unfortunately, I'm not a hardware guy, although I'm trying my best to
understand all of this :D<br>
What exactly do you mean by pathetic I/O on S3/EBS?<br>
<br>
@Bobb<br>
You said you went through the route of multiple CPUs, meaning you
had many mapserver machines? In your opinion is RAID 1+0 a good
enough config for something mapserver can work on?<br>
<br>
@All<br>
Does this imply that getting a single dedicated server with mass storage
(everything is there, mapserver itself and the data) is not a good
idea?<br>
<br>
And lastly, what is the best practice for storing raster images? I'm
not sure if you can do this with mapserver, but assuming you have 1
"raster map", can you divide the image into 10 huge tiles and let
mapserver put them together at runtime? If so, what is a good way to
go about it? For example, do you partition it into 1GB images?<br>
<br>
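To make the tiling question above concrete, here is a minimal Python
sketch of cutting one large raster into a regular grid of pixel
windows. The function name tile_windows, the 100000 x 50000 pixel
raster and the 5 x 2 grid are all made up for illustration; it only
computes offsets and sizes and does not touch mapserver or any image
file.<br>
<pre wrap="">
```python
def tile_windows(width, height, cols, rows):
    """Split a width x height pixel raster into cols x rows windows.

    Returns (x_offset, y_offset, tile_width, tile_height) tuples in
    row-major order. Remainder pixels are spread across tiles so that
    every pixel is covered exactly once.
    """
    windows = []
    for row in range(rows):
        # Integer boundaries: tile k spans [k*size//n, (k+1)*size//n).
        y0 = row * height // rows
        y1 = (row + 1) * height // rows
        for col in range(cols):
            x0 = col * width // cols
            x1 = (col + 1) * width // cols
            windows.append((x0, y0, x1 - x0, y1 - y0))
    return windows


# Hypothetical raster: 100000 x 50000 pixels cut into 10 tiles (5 x 2).
tiles = tile_windows(100000, 50000, cols=5, rows=2)
for window in tiles:
    print(window)
```
</pre>
Each tuple could then be handed to whatever tool actually writes the
tile files; how big each tile should be on disk is still the open
question.<br>
<br>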
On 1/28/2011 11:50 PM, Mark Korver wrote:
<blockquote
cite="mid:AANLkTi=-u17YmgDBzrPe1CAA48t_0+9YE8=qCd=TeMwU@mail.gmail.com"
type="cite">
<pre wrap="">There are ways to use S3 as the store for source images by using tools
like s3fs (FUSE-based file system backed by Amazon S3) and writing
front end code that intercepts the incoming WMS request, filters using
a grid, then routes to the appropriate EC2 MapServer instance. This
allows particular instances to do region based caching of source data.
First request that "hits" a new source file is slow, but second one
is read from EC2's "built-in" storage. This kind of setup allows you
to run n-number of mapservers all looking at the same data stored on
S3, but would require some work up front. The smaller the grid, the
more mapservers, and if you want to scale more you can use LB and
autoscaling.</pre>
</blockquote>
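The routing idea Mark describes could look roughly like the Python
sketch below: parse the BBOX of an incoming WMS request, snap its
centre to a grid cell, and map that cell to one MapServer backend.
Everything here is an assumption for illustration: the 10-degree cell
size, the backend hostnames and the function names are hypothetical,
and a real front end would also have to handle requests that span
several cells.<br>
<pre wrap="">
```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical pool of EC2 MapServer instances (placeholder hostnames).
BACKENDS = ["ms-0.example.internal", "ms-1.example.internal",
            "ms-2.example.internal", "ms-3.example.internal"]
CELL_DEGREES = 10.0  # assumed grid size; smaller cells spread load further


def grid_cell(bbox):
    """Return the (col, row) grid cell containing the centre of a bbox."""
    minx, miny, maxx, maxy = bbox
    centre_x = (minx + maxx) / 2.0
    centre_y = (miny + maxy) / 2.0
    return (int(centre_x // CELL_DEGREES), int(centre_y // CELL_DEGREES))


def pick_backend(wms_url):
    """Choose a backend so the same region always hits the same instance."""
    query = parse_qs(urlsplit(wms_url).query)
    # WMS parameter names are case-insensitive, so normalise the keys.
    params = {key.upper(): values[0] for key, values in query.items()}
    bbox = tuple(float(v) for v in params["BBOX"].split(","))
    col, row = grid_cell(bbox)
    # Deterministic mapping from grid cell to an index into BACKENDS.
    cols_per_row = int(360 / CELL_DEGREES)
    index = (row * cols_per_row + col) % len(BACKENDS)
    return BACKENDS[index]


url = "http://example.com/wms?bbox=-80,40,-70,50"
print(pick_backend(url))
```
</pre>
Because the mapping is deterministic, repeated requests for a region
land on the instance that has already pulled that region's source
files onto its local storage, which is the caching effect described
above.<br>
<br>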
On 1/28/2011 11:29 PM, Bob Basques wrote:
<blockquote cite="mid:4D428C7C020000A800025AF2@heckle" type="cite">
<p style="margin-top: 0pt; margin-bottom: 0pt;"> <font
face="Comic Sans MS" size="3">All,</font> </p>
<br>
<p style="margin-top: 0pt; margin-bottom: 0pt;"> <font
face="Comic Sans MS" size="3">I'm working on a similar project
currently. Setting up 50TB of storage, we went the route of
multiple CPUs with large disks. Redundant RAID config, so
half of the physical disk is available for storage. We're at
30+TB of real storage across a 4U setup right now. Cost (with
hardware/setup/initial config) is below the numbers quoted below
(so far), because we're building from scratch and learning
along the way.</font> </p>
<br>
<p style="margin-top: 0pt; margin-bottom: 0pt;"> <font
face="Comic Sans MS" size="3">I would tend to agree on not
using the off-site stuff; once you consider moving the
data, the idea of co-lo at some other remote location
starts to fall apart. The transfer costs, in bandwidth
and/or time, really start to eat into things cost-wise. Some
of this depends on the end uses as well. We're building a
data site for distribution of really large files and datasets.</font>
</p>
<br>
<p style="margin-top: 0pt; margin-bottom: 0pt;"> <font
face="Comic Sans MS" size="3">bobb</font> </p>
<p style="margin-top: 0pt; margin-bottom: 0pt;"> <br>
<br>
>>> Paul Spencer <a class="moz-txt-link-rfc2396E" href="mailto:pspencer@dmsolutions.ca">&lt;pspencer@dmsolutions.ca&gt;</a> wrote:<br>
</p>
<table style="font-size: 1em; margin: 0pt 0pt 0pt 15px;"
bgcolor="#f3f3f3" border="0">
<tbody>
<tr>
<td>
<div style="border-left: 1px solid rgb(5, 5, 5);
padding-left: 7px;">
<p style="margin-top: 0pt; margin-bottom: 0pt;"> Hi,<br>
<br>
I would personally recommend against AWS S3/EBS for
anything of this scale as the I/O is pretty pathetic
unless you invest in their very high end instances.
We've set up a 4TB 'SAN' using glusterfs on AWS EC2
using 1TB EBS volumes and separate instances for each
- the performance has been so poor that we have had to
redesign our workflow to get copies of data onto EBS
attached to each mapserver instance - for scaling that
sucks and even then the I/O performance of EBS is not
that great on the normal instances.<br>
<br>
I'm not a hardware guy but I think the purpose of a
dedicated SAN box is to provide high bandwidth access
to large amounts of storage so that the data can
effectively be distributed to/from multiple machines
over a network - ideal for scaling mapserver onto
multiple servers but rendering from the same data. I
read an article about a year ago from a company that
provides petabyte-scale online storage; it
details how they built their storage devices. They
say $7867 for 67 terabytes<br>
<br>
<a moz-do-not-send="true"
href="http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/">http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/</a><br>
<br>
Seems pretty geeky, but perhaps you are the hardware
type or know someone who is :)<br>
<br>
<br>
On 2011-01-28, at 3:41 AM, <a class="moz-txt-link-abbreviated" href="mailto:tigana.fluens@gmail.com">tigana.fluens@gmail.com</a>
wrote:<br>
<br>
> Hello guys, we're a startup and new to mapserver.
We're expecting large amounts of data to come by (at
least on our scale) around 40-60TB of raster images
for mapserver to render. My question is for the
infrastructure, what is the best way to store this
(cost-efficiently)?<br>
><br>
> - Do we just get a dedicated server with a lot of
HDDs? I'm looking at a 48TB setup in RAID 1+0, so I get
24TB, right? What happens if we need more? Also, how
can we scale from the mapserver side? Is access to
different storage servers possible?<br>
> - I've considered SANs, but then it's not
practical, right, because only one machine will access
the storage?<br>
> - What about Amazon's S3? or EBS? Anything we can
use on that?<br>
><br>
> I wish to get awesome advice on this storage
issue, basically what the considered best practice is
for the mapserver people :P Thanks<br>
> _______________________________________________<br>
> mapserver-users mailing list<br>
> <a class="moz-txt-link-abbreviated" href="mailto:mapserver-users@lists.osgeo.org">mapserver-users@lists.osgeo.org</a><br>
> <a moz-do-not-send="true"
href="http://lists.osgeo.org/mailman/listinfo/mapserver-users">http://lists.osgeo.org/mailman/listinfo/mapserver-users</a><br>
<br>
<br>
__________________________________________<br>
<br>
Paul Spencer<br>
Chief Technology Officer<br>
DM Solutions Group Inc<br>
<a moz-do-not-send="true"
href="http://research.dmsolutions.ca/">http://research.dmsolutions.ca/</a><br>
<br>
</p>
</div>
</td>
</tr>
</tbody>
</table>
</blockquote>
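<br>
As a sanity check on the RAID 1+0 figures quoted above, here is a small
Python sketch of the arithmetic. It assumes plain mirroring, where
usable space is half of raw, and ignores filesystem overhead and hot
spares; the function names are just for illustration.<br>
<pre wrap="">
```python
def usable_tb(raw_tb):
    """Usable capacity of a RAID 1+0 array: every block is stored twice."""
    return raw_tb / 2


def raw_tb_needed(target_usable_tb):
    """Raw capacity required to end up with the target usable space."""
    return target_usable_tb * 2


# The 48TB setup from the original post.
print(usable_tb(48))

# The expected 40-60TB of raster data, expressed as raw disk to buy.
for target in (40, 60):
    print(target, raw_tb_needed(target))
```
</pre>
So under these assumptions the 48TB box covers only about half of the
expected data, and holding 40-60TB with this layout means 80-120TB of
raw disk.<br>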
<br>
</body>
</html>