<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#ffffff">

    Thank you for the prompt replies so far! : )<br>

    <br>

    @Mark<br>

    Great, that's the first time I've heard of that method. Doing a
    reality check here, haha: sadly, I don't think we can do that kind
    of work right now (skill-wise), and money is an issue as well, since
    EC2 instances cost so much more for a small startup. I'm aware that
    the AWS platform becomes more valuable as computing needs grow,
    though. Do you typically use this setup for new projects?<br>

    <br>

    @Paul<br>

    OK, at least we can cross S3/EBS or any similar service off our
    "what if" list. Not to mention it's so damn expensive when you get
    to that amount of data. I think the Backblaze solution would work
    very well for backups or archiving, but I'm really concerned about
    storage that is viable for working with the raster images and other
    files MapServer renders.<br>

    <br>

    Unfortunately, I'm not a hardware guy, although I'm trying my best
    to understand all of this :D<br>
    What exactly do you mean by pathetic I/O on S3/EBS?<br>

    <br>

    @Bobb<br>

    You said you went the route of multiple CPUs, meaning you had many
    MapServer machines? In your opinion, is RAID 1+0 a good enough
    config for storage that MapServer works from?<br>

    <br>

    @All<br>

    Does this imply that getting a single dedicated server with mass
    storage (everything on it, MapServer itself and the data) is not a
    good idea?<br>

    <br>

    And lastly, what is the best practice for storing raster images? I'm
    not sure if you can do this with MapServer, but assuming you have
    one "raster map", can you divide the image into 10 huge tiles and
    let MapServer put them together at runtime? If so, what is a good
    way to go about it? For example, do you partition it into 1GB
    images?<br>

    <br>

    On 1/28/2011 11:50 PM, Mark Korver wrote:

    <blockquote

      cite="mid:AANLkTi=-u17YmgDBzrPe1CAA48t_0+9YE8=qCd=TeMwU@mail.gmail.com"

      type="cite">

      <pre wrap="">There are ways to use S3 as the store for source images by using tools
like s3fs (FUSE-based file system backed by Amazon S3) and writing
front end code that intercepts the incoming WMS request, filters using
a grid, then routes to the appropriate EC2 MapServer instance.  This
allows particular instances to do region based caching of source data.
 First request that "hits" a new source file is slow, but second one
is read from EC2's "built-in" storage.  This kind of setup allows you
to run n-number of mapservers all looking at the same data stored on
S3, but would require some work up front.  The smaller the grid, the
more mapservers, and if you want to scale more you can use LB and
autoscaling.</pre>

    </blockquote>
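    <br>
    A minimal sketch of the grid routing idea described in the message
    quoted above, in Python. It only shows the routing decision: the
    BBOX of an incoming WMS request is mapped to a grid cell, and each
    cell is pinned to one MapServer instance, so repeat requests for a
    region reach the instance that has already cached the source files.
    The back-end URLs, the 10-unit cell size, and all function names are
    assumptions made up for illustration, not details of Mark's setup.
    It also assumes the BBOX arrives as minx,miny,maxx,maxy.<br>

```python
import math

# Hypothetical pool of MapServer back ends; the host names are placeholders.
BACKENDS = [
    "http://mapserver-0.internal/cgi-bin/mapserv",
    "http://mapserver-1.internal/cgi-bin/mapserv",
    "http://mapserver-2.internal/cgi-bin/mapserv",
]

# Grid cell width and height in map units (assumed value, e.g. degrees).
CELL_SIZE = 10.0


def parse_bbox(bbox_param):
    """Parse a WMS BBOX value "minx,miny,maxx,maxy" into four floats."""
    minx, miny, maxx, maxy = (float(v) for v in bbox_param.split(","))
    return minx, miny, maxx, maxy


def grid_cell(bbox, cell_size=CELL_SIZE):
    """Return the (column, row) of the grid cell holding the BBOX centre."""
    minx, miny, maxx, maxy = bbox
    cx = (minx + maxx) / 2.0
    cy = (miny + maxy) / 2.0
    return math.floor(cx / cell_size), math.floor(cy / cell_size)


def pick_backend(bbox_param, backends=BACKENDS):
    """Pick one back end per grid cell so a region keeps hitting the same cache."""
    col, row = grid_cell(parse_bbox(bbox_param))
    # Deterministic index: the same cell always lands on the same instance.
    index = (col * 31 + row) % len(backends)
    return backends[index]
```

    For example, pick_backend("-10,40,-5,45") and
    pick_backend("-9,41,-6,44") share a centre cell, so both resolve to
    the same instance. A smaller CELL_SIZE spreads regions over more
    cells, which fits the point above that a smaller grid means more
    mapservers.<br>
    <br>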

    On 1/28/2011 11:29 PM, Bob Basques wrote:

    <blockquote cite="mid:4D428C7C020000A800025AF2@heckle" type="cite">

      <p style="margin-top: 0pt; margin-bottom: 0pt;"> <font

          face="Comic Sans MS" size="3">All,</font> </p>

      <br>

      <p style="margin-top: 0pt; margin-bottom: 0pt;"> <font
          face="Comic Sans MS" size="3">I'm working on a similar project
          currently. We're setting up 50TB of storage and went the route
          of multiple CPUs with large disks, in a redundant RAID config,
          so half of the physical disk is available for storage. We're at
          30+TB of real storage across a 4U setup right now. Cost (with
          hardware/setup/initial config) is below the numbers quoted
          below (so far), because we're building from scratch and
          learning along the way.</font> </p>

      <br>

      <p style="margin-top: 0pt; margin-bottom: 0pt;"> <font
          face="Comic Sans MS" size="3">I would tend to agree on not
          using the off-site stuff. Just considering the moving of the
          data, the idea of co-lo at some other remote location starts
          to fall apart. The transfer costs, in bandwidth and/or time,
          really start to eat into things cost-wise. Some of this
          depends on the end uses as well. We're building a data site
          for distribution of really large files and datasets.</font>
      </p>

      <br>

      <p style="margin-top: 0pt; margin-bottom: 0pt;"> <font

          face="Comic Sans MS" size="3">bobb</font> </p>

      <p style="margin-top: 0pt; margin-bottom: 0pt;"> <br>

        <br>

        &gt;&gt;&gt; Paul Spencer <a class="moz-txt-link-rfc2396E" href="mailto:pspencer@dmsolutions.ca">&lt;pspencer@dmsolutions.ca&gt;</a> wrote:<br>

      </p>

      <table style="font-size: 1em; margin: 0pt 0pt 0pt 15px;"

        bgcolor="#f3f3f3" border="0">

        <tbody>

          <tr>

            <td>

              <div style="border-left: 1px solid rgb(5, 5, 5);

                padding-left: 7px;">

                <p style="margin-top: 0pt; margin-bottom: 0pt;"> Hi,<br>

                  <br>

                  I would personally recommend against AWS S3/EBS for

                  anything of this scale as the I/O is pretty pathetic

                  unless you invest in their very high end instances. 

                  We've set up a 4TB 'SAN' using glusterfs on AWS EC2

                  using 1TB EBS volumes and separate instances for each

                  - the performance has been so poor that we have had to

                  redesign our workflow to get copies of data onto EBS

                  attached to each mapserver instance - for scaling that

                  sucks and even then the I/O performance of EBS is not

                  that great on the normal instances.<br>

                  <br>

                  I'm not a hardware guy, but I think the purpose of a
                  dedicated SAN box is to provide high-bandwidth access
                  to large amounts of storage so that the data can
                  effectively be distributed to/from multiple machines
                  over a network - ideal for scaling mapserver onto
                  multiple servers but rendering from the same data. I
                  read an article about a year ago from a company that
                  provides petabyte-scale online storage. It details how
                  they built their storage devices - they say $7867 for
                  67 terabytes<br>

                  <br>

                  <a moz-do-not-send="true"

href="http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/">http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/</a><br>

                  <br>

                  Seems pretty geeky, but perhaps you are the hardware

                  type or know someone who is :)<br>

                  <br>

                  <br>

                  On 2011-01-28, at 3:41 AM, <a class="moz-txt-link-abbreviated" href="mailto:tigana.fluens@gmail.com">tigana.fluens@gmail.com</a>

                  wrote:<br>

                  <br>

                  > Hello guys, we're a startup and new to mapserver.

                  We're expecting large amounts of data to come by (at

                  least on our scale) around 40-60TB of raster images

                  for mapserver to render. My question is for the

                  infrastructure, what is the best way to store this

                  (cost-efficiently)?<br>

                  ><br>

                  > - Do we just get a dedicated server with a lot of
                  HDDs? I'm looking at a 48TB setup in RAID 1+0, so I
                  get 24TB, right? What happens if we need more? Also,
                  how can we scale from the mapserver side? Is access to
                  different storage servers possible?<br>
                  > - I've considered SANs, but then it's not
                  practical, right, because only one machine will access
                  the storage?<br>

                  > - What about Amazon's S3? or EBS? Anything we can

                  use on that?<br>

                  ><br>

                  > I wish to get awesome advice on this storage

                  issue, basically what the considered best practice is

                  for the mapserver people :P Thanks<br>

                  > _______________________________________________<br>

                  > mapserver-users mailing list<br>

                  > <a class="moz-txt-link-abbreviated" href="mailto:mapserver-users@lists.osgeo.org">mapserver-users@lists.osgeo.org</a><br>

                  > <a moz-do-not-send="true"

                    href="http://lists.osgeo.org/mailman/listinfo/mapserver-users">http://lists.osgeo.org/mailman/listinfo/mapserver-users</a><br>

                  <br>

                  <br>

                  __________________________________________<br>

                  <br>

                     Paul Spencer<br>

                     Chief Technology Officer<br>

                     DM Solutions Group Inc<br>

                     <a moz-do-not-send="true"

                    href="http://research.dmsolutions.ca/">http://research.dmsolutions.ca/</a><br>

                  <br>


                </p>

              </div>

            </td>

          </tr>

        </tbody>

      </table>

    </blockquote>
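    <br>
    A small worked example of the RAID 1+0 arithmetic from the original
    question quoted above (48TB raw giving 24TB usable), in Python. It
    assumes pure mirroring overhead only and ignores filesystem
    overhead, hot spares, and formatted-capacity differences; the
    function names are made up for illustration.<br>

```python
def raid10_usable_tb(raw_tb):
    """RAID 1+0 mirrors every disk, so usable space is half of raw."""
    return raw_tb / 2.0


def raid10_raw_tb(usable_tb):
    """Raw capacity needed to end up with the requested usable space."""
    return usable_tb * 2.0


if __name__ == "__main__":
    print("48TB raw gives", raid10_usable_tb(48), "TB usable")
    # The 40-60TB range mentioned in the original question.
    for target in (40, 60):
        print(target, "TB usable needs", raid10_raw_tb(target), "TB raw")
```

    Under these assumptions, holding the 40-60TB of rasters mentioned in
    the thread would take roughly 80-120TB of raw disk in RAID 1+0.<br>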

    <br>

  </body>

</html>