[SAC] Ideas for the upgrades - discussion

Alex Mandel tech_dev at wildintellect.com
Mon Jun 2 18:52:28 PDT 2014


I've been pondering options for how to approach the next round of upgrades.
http://wiki.osgeo.org/wiki/Infrastructure_Transition_Plan_2014

There seem to be two major directions we could take, and they are
somewhat incompatible with each other. Part of our current trouble is
that we are mixing these two solutions.

1. We move to software RAID 5 or RAID 10 on each machine and keep
going the redundant-power route. The backup plan is that if something
goes down, we recreate the service fresh on another machine that is
still up. With this plan I would suggest a VPS setup using LXC (maybe
with Docker) or OpenVZ. Each project would get its own LXC instance,
100% separate from other projects. Only the host needs kernel and OS
updates; each of the sub-containers can run its own separate service
stack. One way to make redeployment faster in case of failure is to
script all the site setups using Chef, Puppet, Juju, Docker or some
other means. Single machines in this setup would likely be bigger,
with more disks and power redundancy. ~$4-5000
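
To give a rough idea of what per-project containers look like in
practice, here is a minimal sketch on an LXC 1.x host (the container
name "projectx" and the package are just placeholders, not a real
project):

  # create one container per project, each running its own stack
  lxc-create -t debian -n projectx
  # start the container in the background
  lxc-start -n projectx -d
  # install that project's services inside the container
  lxc-attach -n projectx -- apt-get install -y apache2

Rebuilding a lost container on another host is then mostly a matter of
re-running whatever Chef/Puppet/etc. scripts built it the first time.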


2. Move to a cloud-oriented configuration. We didn't really do this
right last time because we didn't know how it all worked then. The
ideal is a series of relatively identical machines, usually without
RAID, where each virtual machine's disk is live-mirrored to one of the
other machines in the cluster. If a particular disk goes out, you
simply switch to the hot-copy failover while you fix the original. By
distributing the VMs' second disks around to different machines you
balance the cluster, so that if any one machine goes down it is pretty
quick to spin up the failovers on the remaining hardware. Disk
contention is avoided because VMs mostly do not share physical disks
at all. This is the ideal setup for Ganeti or OpenStack. Single
machines in this setup would likely be smaller. ~$2-3000
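
For a flavour of option 2, this is roughly what creating and failing
over a DRBD-mirrored instance looks like in Ganeti (node and instance
names are made up, and the flags are from memory, so check them
against the Ganeti docs before relying on them):

  # create an instance whose disk is mirrored from node1 to node2 via DRBD
  gnt-instance add -t drbd -n node1.example.org:node2.example.org \
      -o debootstrap+default -s 20G -B memory=4096 projectvm.example.org
  # if node1 dies, start the instance from its mirror copy on node2
  gnt-instance failover projectvm.example.org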


In either option we have the possibility of implementing large storage
separate from serving. By large storage I mean the large number of
static files we are starting to accumulate now that many projects use
something like Sphinx to generate websites instead of DB-driven sites.
We could then NFS- or iSCSI-mount that storage onto a finely tuned
front end for serving. Why would we do this? Because it would let us
use vastly different disk configurations, or run a BSD box so we could
use ZFS. AstroDog can explain this method in more detail.
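
As a crude illustration of the NFS variant (hostname, export path, and
network below are placeholders):

  # on the storage box: export the static-file tree read-only (/etc/exports)
  /srv/static  10.0.0.0/24(ro,no_subtree_check)
  # reload the export table
  exportfs -ra

  # on the front-end web server: mount it where the webserver expects it
  mount -t nfs storage.example.org:/srv/static /var/www/static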

A few things are clearer to me:
1. We will start using XFS more for Sphinx-generated sites, and will
probably try to get the OS volumes onto at least ext4 if not XFS (see
the rough sketch after this list).
2. New hard disks are likely to be a mix of SSD (120-256GB each) and
7200 rpm SATA (probably 2.5" 1TB)
3. More thought will go into which disks are used for what.
4. We need to leverage CDNs, EU mirrors, and tighter security (OWASP
guidelines) to handle the nefarious traffic that drives up our load.
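
Setting up an XFS volume for the static Sphinx content would be along
these lines (device name and mount point are invented for the example):

  # format the static-file volume as XFS and mount it with noatime
  mkfs.xfs /dev/sdb1
  mkdir -p /srv/static
  echo '/dev/sdb1  /srv/static  xfs  defaults,noatime  0 2' >> /etc/fstab
  mount /srv/static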

Thanks,
Alex

PS: Do any volunteers want to track down the liaisons from each OSGeo
project, to make sure they are aware of the planning and to request
their input on their projected 3-5 year needs?
http://wiki.osgeo.org/wiki/Project_Steering_Committees

