[SAC] OSGeo Ganeti Cluster

Thu Dec 14 13:14:38 PST 2017

Resending this as plain text because it got bounced the first time.

Lance,

We have plans to retire osgeo4, so it may not be worthwhile to upgrade it.

After we get the new hardware, would it be possible for you to do the upgrade on the new hardware we send, then move all the VMs on osgeo4 to the new hardware, and then chuck osgeo4 (or use it for whatever you want)?

I think that would be ideal if it's not too much trouble.

See our upcoming agenda items for reference.

https://wiki.osgeo.org/wiki/SAC_Meeting_2017-12-21

Alex, please correct me if I misspoke.

Thanks,
Regina

From: Sac [mailto:sac-bounces at lists.osgeo.org] On Behalf Of Lance Albertson
Sent: Thursday, December 14, 2017 3:46 PM
To: sac at lists.osgeo.org
Cc: systems at osuosl.org; sysadmin at osgeo.org
Subject: Re: [SAC] OSGeo Ganeti Cluster

Hi All,

I'm not sure if you got my original email back in July but I'm finally ready to start scheduling this. I'd like to amend my plan below to the following:

Summary:
1. Upgrade Ganeti from 2.6.2 to 2.15.2
2. Install CentOS 7 as the OS for all of the nodes
3. Switch to managing said nodes with Chef instead of Cfengine

Here are the actual steps I plan to do:
1. Upgrade Ganeti to 2.15.2 on the current cluster from 2.6.2
2. Migrate high-priority instances from plain to drbd using --no-wait-for-sync [1]
3. Fail over instances on osgeo3 to osgeo4
4. Take osgeo3 down and reinstall its OS with CentOS 7, retaining its LVM data for VMs
5. Re-add osgeo3 to the cluster using its previous configuration and start all the VMs back up
6. Repeat steps 3 through 5 with osgeo4

I'd like to go ahead with #1 and then schedule a time to do #2 after that's completed.

Let me know!

[1] The -t (--disk-template) option will change the disk template of the instance. Currently only conversions between the plain and drbd disk templates are supported, and the instance must be stopped before attempting the conversion. When changing from the plain to the drbd disk template, a new secondary node must be specified via the -n option. The option --no-wait-for-sync can be used when converting to the drbd template in order to make the instance available for startup before DRBD has finished resyncing.
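
As an illustration of the conversion described in [1], here is a minimal sketch that only prints the gnt-instance commands for one plain-to-drbd conversion; it does not run them. The helper name plain_to_drbd_commands is hypothetical, and the instance and secondary node in the example are picked from the instance list in this thread purely for illustration, not because the email schedules them.

```python
import shlex


def plain_to_drbd_commands(instance, secondary_node):
    """Return the gnt-instance commands for one plain -> drbd conversion.

    The instance must be stopped for the conversion. --no-wait-for-sync lets
    it be started again before DRBD has finished resyncing.
    """
    return [
        ["gnt-instance", "shutdown", instance],
        ["gnt-instance", "modify", "-t", "drbd", "-n", secondary_node,
         "--no-wait-for-sync", instance],
        ["gnt-instance", "startup", instance],
    ]


if __name__ == "__main__":
    # Illustrative values only: web is listed with osgeo3 as its primary
    # node, so osgeo4 is used here as the new secondary node.
    commands = plain_to_drbd_commands("web.osgeo.osuosl.org",
                                      "osgeo4.osuosl.bak")
    for argv in commands:
        print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; on a real cluster they would be run on the master node.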

On Thu, Jul 27, 2017 at 1:46 PM, Lance Albertson <lance at osuosl.org> wrote:
OSGeo Admins,

I'd like to eventually make several changes to your Ganeti cluster to bring it up to a better-supported platform and a newer version of Ganeti. Unfortunately, this is going to cause some downtime for each node, but I'm pretty sure I can do it without losing data and without downtime for certain VMs. Both of your nodes are currently running Gentoo, which we haven't been maintaining other than for very important security issues that come up. Also, the version of Ganeti is currently 2.6.2 and the latest stable version is 2.15.2, which includes several improvements.

In summary, the items I'd like to do are:
1. Install CentOS 7 as the OS for all of the nodes
2. Switch to managing said nodes with Chef instead of Cfengine
3. Upgrade Ganeti from 2.6.2 to 2.15.2 (or whatever is stable at the point we get to this)

This is going to need to be a multi-stage process unfortunately, but I'm hoping I only have to do one downtime per node. I've tested this process in a Vagrant environment and it seems to work.

Here are the actual steps I plan to do:
1. Take osgeo3 down and reinstall its OS with CentOS 7, retaining its LVM data for VMs
2. Install Ganeti 2.6.2 on osgeo3 using Chef so that the version stays the same throughout the whole cluster
3. Re-add osgeo3 to the cluster using its previous configuration and start all the VMs back up
4. Repeat steps 1 through 3 with osgeo4
5. Upgrade Ganeti to 2.11.8 on all the nodes (I've found this to be safer than jumping from 2.6.2 directly to 2.15, as they made some major changes to the backend in those versions)
6. Finally, upgrade Ganeti to 2.15.2 or whatever is the latest stable version at the time.

So my questions to you are:
1. Should any of the instances below be migrated to another node during its primary node's downtime? If so, and they're currently set to plain, we can convert them to DRBD and move them over; it will just take a short downtime (depending on how large the disk is).
2. When could we start doing this? I was hoping to start within the next month or so, but it can certainly be adjusted.
3. How should we communicate in real time if we need to? Via #osuosl on IRC? Other means?

Instance                   Primary_node       Status      Memory  DiskUsage  Disk_template
adhoc.osgeo.osuosl.org     osgeo4.osuosl.bak  running       4096      65536  plain
base.osgeo.osuosl.org      osgeo3.osuosl.bak  ADMIN_down       -       4096  plain
download.osgeo.osuosl.org  osgeo3.osuosl.bak  running       8192     158720  plain
mail.osgeo.osuosl.org      osgeo4.osuosl.bak  running       4096      75776  plain
projects.osgeo.osuosl.org  osgeo4.osuosl.bak  running      16384     208896  plain
qgis.osgeo.osuosl.org      osgeo4.osuosl.bak  running       6144     167936  plain
secure.osgeo.osuosl.org    osgeo3.osuosl.bak  running       4096      14464  drbd
tracsvn2.osgeo.osuosl.org  osgeo3.osuosl.bak  ADMIN_down       -      86016  plain
tracsvn.osgeo.osuosl.org   osgeo3.osuosl.bak  running       8192     106496  plain
web.osgeo.osuosl.org       osgeo3.osuosl.bak  running       4096      36864  plain
webextra.osgeo.osuosl.org  osgeo3.osuosl.bak  running       4096     126976  plain
wiki.osgeo.osuosl.org      osgeo3.osuosl.bak  running       4096      20480  plain
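
Question 1 notes that a plain-to-DRBD conversion takes time depending on disk size. As a rough way to see how much data is involved, the sketch below tallies the DiskUsage column of the listing above for running plain instances, grouped by primary node. It assumes the figures are in MiB, which the email does not state; the function name plain_disk_per_node is made up for illustration.

```python
from collections import defaultdict

# (instance, primary node, status, disk usage, disk template), copied from
# the listing above. Disk usage is assumed to be in MiB.
INSTANCES = [
    ("adhoc.osgeo.osuosl.org", "osgeo4.osuosl.bak", "running", 65536, "plain"),
    ("base.osgeo.osuosl.org", "osgeo3.osuosl.bak", "ADMIN_down", 4096, "plain"),
    ("download.osgeo.osuosl.org", "osgeo3.osuosl.bak", "running", 158720, "plain"),
    ("mail.osgeo.osuosl.org", "osgeo4.osuosl.bak", "running", 75776, "plain"),
    ("projects.osgeo.osuosl.org", "osgeo4.osuosl.bak", "running", 208896, "plain"),
    ("qgis.osgeo.osuosl.org", "osgeo4.osuosl.bak", "running", 167936, "plain"),
    ("secure.osgeo.osuosl.org", "osgeo3.osuosl.bak", "running", 14464, "drbd"),
    ("tracsvn2.osgeo.osuosl.org", "osgeo3.osuosl.bak", "ADMIN_down", 86016, "plain"),
    ("tracsvn.osgeo.osuosl.org", "osgeo3.osuosl.bak", "running", 106496, "plain"),
    ("web.osgeo.osuosl.org", "osgeo3.osuosl.bak", "running", 36864, "plain"),
    ("webextra.osgeo.osuosl.org", "osgeo3.osuosl.bak", "running", 126976, "plain"),
    ("wiki.osgeo.osuosl.org", "osgeo3.osuosl.bak", "running", 20480, "plain"),
]


def plain_disk_per_node(instances):
    """Sum disk usage of running, plain-template instances per primary node."""
    totals = defaultdict(int)
    for _name, node, status, disk, template in instances:
        if status == "running" and template == "plain":
            totals[node] += disk
    return dict(totals)


if __name__ == "__main__":
    for node, mib in sorted(plain_disk_per_node(INSTANCES).items()):
        print(f"{node}: {mib} MiB ({mib / 1024:.0f} GiB)")
```

Under the MiB assumption, this gives 449536 MiB (439 GiB) of running plain instances on osgeo3 and 518144 MiB (506 GiB) on osgeo4.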

Thanks-

-- 
Lance Albertson
Director
Oregon State University | Open Source Lab 
