[SAC] OSGeo Ganeti Cluster

Lance Albertson lance at osuosl.org
Thu Jul 27 13:46:35 PDT 2017


OSGeo Admins,

I'd like to eventually make several changes to your Ganeti cluster to bring
it up to a better-supported platform and a newer version of Ganeti.
Unfortunately, this will cause some downtime for each node, but I'm fairly
confident I can do it without losing data and without downtime for certain
VMs. Both of your nodes currently run Gentoo, which we haven't been
maintaining beyond critical security issues. Also, Ganeti is currently at
version 2.6.2, while the latest stable version is 2.15.2, which includes
several improvements.

Here is a summary of what I'd like to do:

   1. Install CentOS 7 as the OS for all of the nodes
   2. Switch to managing the nodes with Chef instead of Cfengine
   3. Upgrade Ganeti from 2.6.2 to 2.15.2 (or whatever the latest stable
   release is when we get to this)

Unfortunately, this will need to be a multi-stage process, but I'm hoping
to need only one downtime window per node. I've tested this process in a
Vagrant environment and it appears to work.

Here are the actual steps I plan to take:

   1. Take osgeo3 down and reinstall its OS with CentOS 7, retaining its
   LVM data for the VMs
   2. Install Ganeti 2.6.2 on osgeo3 using Chef so that the version stays
   the same across the whole cluster
   3. Re-add osgeo3 to the cluster using its previous configuration and
   start all the VMs back up
   4. Repeat steps 1 through 3 with osgeo4
   5. Upgrade Ganeti to 2.11.8 on all the nodes (I've found this safer
   than jumping directly from 2.6.2 to 2.15, since major backend changes
   were made in the intervening versions)
   6. Finally, upgrade Ganeti to 2.15.2 or whatever the latest stable
   release is at the time.
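For illustration only, the ordering above can be written down as data and checked mechanically. This is a minimal Python sketch, not our actual Chef or Ganeti tooling; the `Step` type and `build_plan` helper are hypothetical, and the version numbers are the ones named in the steps (the final one may change by the time we get there).

```python
from dataclasses import dataclass

# Versions named in the plan; TARGET is "latest stable" and may change.
CURRENT = (2, 6, 2)
INTERMEDIATE = (2, 11, 8)
TARGET = (2, 15, 2)
NODES = ["osgeo3", "osgeo4"]


@dataclass
class Step:
    scope: str   # a node name, or "cluster" for cluster-wide steps
    action: str


def build_plan(nodes, current, intermediate, target):
    """Hypothetical helper: one reinstall window per node, then upgrades."""
    # Each upgrade hop must move strictly forward in version.
    if not (current < intermediate < target):
        raise ValueError("upgrade path must be strictly increasing")
    cur = ".".join(map(str, current))
    plan = []
    for node in nodes:
        # Steps 1-3, repeated per node (step 4); Ganeti stays at the current version.
        plan.append(Step(node, "reinstall OS as CentOS 7, keeping LVM volumes"))
        plan.append(Step(node, f"install Ganeti {cur} via Chef"))
        plan.append(Step(node, "re-add node with previous configuration and start VMs"))
    # Steps 5-6: cluster-wide upgrades, never skipping the intermediate release.
    for version in (intermediate, target):
        plan.append(Step("cluster", "upgrade Ganeti to " + ".".join(map(str, version))))
    return plan


if __name__ == "__main__":
    for i, step in enumerate(build_plan(NODES, CURRENT, INTERMEDIATE, TARGET), 1):
        print(f"{i:2}. [{step.scope}] {step.action}")
```

Versions are compared as integer tuples on purpose: as plain strings, "2.6.2" would sort after "2.11.8".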

So my questions to you are:

   1. Should any of the instances below be migrated to another node during
   their primary node's downtime? If so, and they currently use the plain
   disk template, we can convert them to DRBD and move them over; this only
   requires a short downtime (depending on how large the disk is).
   2. When could we start doing this? I was hoping to start within the next
   month or so but it can certainly be adjusted.
   3. How should we communicate in real-time if we need to? Via #osuosl on
   IRC? Other means?
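As a rough way to reason about "depending on how large the disk is" in question 1, here is a back-of-the-envelope Python sketch. It assumes the conversion downtime is dominated by copying the whole disk once to the secondary node at a constant rate; the 100 MiB/s figure and the `estimate_conversion_minutes` helper are hypothetical placeholders, not measurements of this cluster.

```python
# Hypothetical throughput; the real DRBD resync rate depends on the link
# between the nodes, disk speed and DRBD sync settings.
ASSUMED_SYNC_MIB_PER_S = 100.0


def estimate_conversion_minutes(disk_mib: float,
                                rate_mib_s: float = ASSUMED_SYNC_MIB_PER_S) -> float:
    """Rough downtime estimate for a plain -> DRBD conversion (hypothetical).

    Assumes the whole disk is copied once at a constant rate.
    """
    if rate_mib_s <= 0:
        raise ValueError("rate must be positive")
    return disk_mib / rate_mib_s / 60.0


if __name__ == "__main__":
    # Disk sizes (MiB) taken from the instance list in this message.
    for name, size in [("wiki", 20480), ("projects", 208896)]:
        print(f"{name}: ~{estimate_conversion_minutes(size):.0f} min")
```

Under that assumed rate, the smallest and largest running plain instances land at roughly 3 and 35 minutes; a slower link scales those linearly.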

Instance                   Primary node       Status      Memory (MiB)  Disk usage (MiB)  Disk template
adhoc.osgeo.osuosl.org     osgeo4.osuosl.bak  running     4096          65536             plain
base.osgeo.osuosl.org      osgeo3.osuosl.bak  ADMIN_down  -             4096              plain
download.osgeo.osuosl.org  osgeo3.osuosl.bak  running     8192          158720            plain
mail.osgeo.osuosl.org      osgeo4.osuosl.bak  running     4096          75776             plain
projects.osgeo.osuosl.org  osgeo4.osuosl.bak  running     16384         208896            plain
qgis.osgeo.osuosl.org      osgeo4.osuosl.bak  running     6144          167936            plain
secure.osgeo.osuosl.org    osgeo3.osuosl.bak  running     4096          14464             drbd
tracsvn2.osgeo.osuosl.org  osgeo3.osuosl.bak  ADMIN_down  -             86016             plain
tracsvn.osgeo.osuosl.org   osgeo3.osuosl.bak  running     8192          106496            plain
web.osgeo.osuosl.org       osgeo3.osuosl.bak  running     4096          36864             plain
webextra.osgeo.osuosl.org  osgeo3.osuosl.bak  running     4096          126976            plain
wiki.osgeo.osuosl.org      osgeo3.osuosl.bak  running     4096          20480             plain
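To help answer question 1, the instance list can be grouped by primary node to see what each node's downtime would take offline. This is a minimal Python sketch with a hypothetical `downtime_impact` helper; it assumes the memory and disk-usage figures are in MiB and that only running instances matter (ADMIN_down ones are skipped).

```python
from collections import defaultdict

# Rows copied from the instance list above:
# (instance, primary node, status, memory MiB or None, disk usage MiB, disk template)
INSTANCES = [
    ("adhoc.osgeo.osuosl.org", "osgeo4.osuosl.bak", "running", 4096, 65536, "plain"),
    ("base.osgeo.osuosl.org", "osgeo3.osuosl.bak", "ADMIN_down", None, 4096, "plain"),
    ("download.osgeo.osuosl.org", "osgeo3.osuosl.bak", "running", 8192, 158720, "plain"),
    ("mail.osgeo.osuosl.org", "osgeo4.osuosl.bak", "running", 4096, 75776, "plain"),
    ("projects.osgeo.osuosl.org", "osgeo4.osuosl.bak", "running", 16384, 208896, "plain"),
    ("qgis.osgeo.osuosl.org", "osgeo4.osuosl.bak", "running", 6144, 167936, "plain"),
    ("secure.osgeo.osuosl.org", "osgeo3.osuosl.bak", "running", 4096, 14464, "drbd"),
    ("tracsvn2.osgeo.osuosl.org", "osgeo3.osuosl.bak", "ADMIN_down", None, 86016, "plain"),
    ("tracsvn.osgeo.osuosl.org", "osgeo3.osuosl.bak", "running", 8192, 106496, "plain"),
    ("web.osgeo.osuosl.org", "osgeo3.osuosl.bak", "running", 4096, 36864, "plain"),
    ("webextra.osgeo.osuosl.org", "osgeo3.osuosl.bak", "running", 4096, 126976, "plain"),
    ("wiki.osgeo.osuosl.org", "osgeo3.osuosl.bak", "running", 4096, 20480, "plain"),
]


def downtime_impact(instances):
    """Group running instances by primary node and disk template.

    Plain instances have no second copy, so they stay down while their
    primary node is reinstalled unless they are converted to DRBD first.
    """
    impact = defaultdict(lambda: {"plain": [], "drbd": [], "plain_disk_mib": 0})
    for name, node, status, _mem, disk, template in instances:
        if status != "running":
            continue  # ADMIN_down instances are not serving anything
        entry = impact[node]
        entry[template].append(name)
        if template == "plain":
            entry["plain_disk_mib"] += disk
    return dict(impact)


if __name__ == "__main__":
    for node, entry in sorted(downtime_impact(INSTANCES).items()):
        print(f"{node}: {len(entry['plain'])} running plain "
              f"({entry['plain_disk_mib']} MiB), {len(entry['drbd'])} DRBD")
```

The plain-disk total per node is a rough proxy for how much data would need to be copied if everything on that node were converted to DRBD.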

Thanks-

-- 
Lance Albertson
Director
Oregon State University | Open Source Lab