[SAC] FOSS4G meet with OSUOSL update

Alex Mandel tech_dev at wildintellect.com
Tue Sep 16 10:52:06 PDT 2014


On 09/16/2014 06:01 AM, Martin Spott wrote:
> Hi, please excuse the late reply.  I was so enthusiastic about the
> GRASS/PostGIS stuff I did last weekend that I completely forgot about the
> admin side of the coin  ;-)
> 
> On Thu, Sep 11, 2014 at 11:10:01AM -0700, Alex Mandel wrote:
> 
>> 1. OSL can give Martin (and others if we request) access to the host OS
>> of osgeo3 and osgeo4 via their VPN, which I think is the same way Martin
>> set up osgeo5/backup.
> 
> Sounds good, so we can detect and inspect issues without depending on
> OSL's support.
> Setting up the backup machine was a little bit more complex. I was
> given access to some non-public network and then, as far as memory
> serves, connected to a console-to-IP adapter using some special browser
> plugin.

It's the same deal: you have to VPN into the non-public network. The
console-to-IP adapter is only needed when the main host isn't booting.
Otherwise you can ssh to the host.

> 
>> 3. Justin thinks there's enough space on other OSL hardware to host the
>> osgeo4 VMs for approximately 1 month so we can redo osgeo4, i.e. switch
>> to software RAID 5, update the host OS, etc. We should schedule this
>> in advance based on SAC availability to do the required work.
> 
> I have to admit I have no idea what exactly they did to get 'our'
> Ganeti setup running.  I suspect they apply some sort of automated
> install onto every machine in order to keep all affected systems
> compatible with each other.  Right?
> Therefore I also suspect they have a preferred host OS for good reasons
> and I don't think it's advisable to question their choice.  I also
> didn't mean to question Ganeti in general; having such limited
> access was just a bit unfortunate.

All I know is that the host OS is currently Gentoo and that OSUOSL has
mentioned they now standardize on CentOS. Yes, I too would assume they
use something to keep all the nodes the same, possibly Chef (Justin said
that's what they tend to use).

> On Sat, Sep 13, 2014 at 10:37:55AM -0700, Alex Mandel wrote:
> 
>> 1. OSUOSL did point out that if we switch to software RAID, it will be a
>> little more difficult to swap disks. I assume Martin S. knows what needs
>> to be done to tell the RAID to drop a disk and be ready for the
>> new one. Question: does software RAID allow hot-swapping disks?
> 
> Yes - but even though I've done this many times, I still always look at
> the manual before removing a disk from the set  ;-)
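For reference, the usual mdadm sequence for replacing a member of a Linux software RAID array is fail, remove, then add the new disk, after which the rebuild starts on its own. Below is a minimal Python sketch that only builds these commands for review. It assumes hypothetical device names (`/dev/md0`, `/dev/sdb1`) and that the replacement disk has already been partitioned like the old one. Whether a disk can physically be pulled while the machine runs also depends on the controller and backplane supporting hot-plug.

```python
from typing import List


def mdadm_replace_commands(array: str, old_part: str, new_part: str) -> List[List[str]]:
    """Build (but do not run) the mdadm calls for swapping one member of a
    software RAID array. Device names passed in are hypothetical examples."""
    return [
        # Mark the failing member as faulty so the array stops using it.
        ["mdadm", "--manage", array, "--fail", old_part],
        # Remove it from the array; the physical disk can then be replaced.
        ["mdadm", "--manage", array, "--remove", old_part],
        # Add the (already partitioned) new member; the rebuild starts automatically.
        ["mdadm", "--manage", array, "--add", new_part],
    ]


if __name__ == "__main__":
    for cmd in mdadm_replace_commands("/dev/md0", "/dev/sdb1", "/dev/sdb1"):
        print(" ".join(cmd))
    # Rebuild progress can then be watched in /proc/mdstat.
```

Printing instead of executing keeps the "look at the manual first" step in the loop: the commands can be checked before being pasted or passed to `subprocess.run`.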
> 
>> 2. I forgot to talk with OSUOSL about upgrading/changing the OS on
>> osgeo4. Since we are redoing the disks I suspect this will need to
>> happen. I will email them.
> 
> See above.
> 
>> Proposal Work Plan:
>>
>> Step 1: Offload osgeo4
>>
>> 1. Hotcopy mail from osgeo3 to osgeo4.
> 
> ... "from osgeo4 to osgeo3" in order to offload osgeo4?!
> 
Yes, from osgeo4 to osgeo3, in order to offload osgeo4 with zero downtime
(or less than 1 minute of downtime) for mail.

Long version: enable DRBD, wait a few hours for it to sync, fail over to
osgeo3, then remove DRBD. It's a trick for moving a non-DRBD instance
with zero downtime, and it only works between nodes in the same Ganeti
cluster.
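The DRBD shuffle could be sketched as three `gnt-instance` calls. This is a minimal Python sketch that only builds the commands, assuming a hypothetical instance name and that both nodes belong to the same Ganeti cluster. It uses `migrate` (live migration) for the switch, since in Ganeti `failover` restarts the instance on the secondary node. Depending on the Ganeti version, the plain-to-DRBD conversion may require the instance to be stopped briefly, so check the `gnt-instance` man page before relying on the exact steps.

```python
from typing import List


def drbd_shuffle_commands(instance: str, target_node: str) -> List[List[str]]:
    """Build (but do not run) the steps for moving a plain, non-DRBD Ganeti
    instance to another node of the same cluster via a temporary DRBD mirror."""
    return [
        # 1. Convert the disks from plain to drbd with target_node as the
        #    secondary; Ganeti then mirrors the data there (may take hours).
        ["gnt-instance", "modify", "-t", "drbd", "-n", target_node, instance],
        # 2. Live-migrate so the instance's primary becomes target_node.
        ["gnt-instance", "migrate", instance],
        # 3. Convert back to plain, dropping the copy on the old node.
        ["gnt-instance", "modify", "-t", "plain", instance],
    ]


if __name__ == "__main__":
    # "mail" is a hypothetical instance name.
    for cmd in drbd_shuffle_commands("mail", "osgeo3"):
        print(" ".join(cmd))
```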

>> 2. Get a new VM, Docker, or Docker in a VM (Debian 7) on the OSUOSL Ganeti
>> cluster. Invite projects to migrate their stuff from the Projects VM over
>> to the new spot.
> 
> I think inviting projects to migrate their stuff is the most
> unpredictable item in the entire plan  :-)
> 
Agreed, see more below.

> Practically speaking I'd prefer a plan which allows changing from HW
> RAID to SW RAID (and maybe updating the host OS, if required)
> independently of any other migration plans.  Thus, if OSL offers to run
> our VMs on their cluster for a limited period, then I'd like to do
> exactly this: move our VMs off osgeo4, redo osgeo4 (using OSL's
> favourite host OS and Ganeti) and move the VMs back to osgeo4.  If we
> expect the projects to migrate their stuff in the same period, then we
> might fail to meet the schedule.
> 
For me there are 2 considerations:
1. Downtime: my proposal has almost zero downtime, by shuffling stuff
between running machines. Moving a VM between clusters requires hours
of downtime.

2. Cruft: Markus' biggest complaint about the current Projects VM is
that there is so much stuff in one place that it's hard to figure out
what's actually causing the trouble.

> Aside from that, I think the migration to Docker ("Docker" used as a
> synonym for Linux containers) is a different task.  One of the main
> benefits of using containers is to make better use of the specific
> topology of the hardware, far better than virtualization can.
> Therefore I'd refrain from using containers inside a VM, because the
> net benefit isn't convincing, and would instead start using containers
> on new hardware with no virtualization in place.
> 
> While we're at it: are we talking about redoing osgeo4 only, not osgeo3
> as well?
> 

Nope, just osgeo4; osgeo3 is running OK right now. After we redo osgeo4,
though, we could migrate all the VMs from osgeo3 over to osgeo4 so we can
upgrade the host OS on osgeo3 to match. Since the RAID 5 on osgeo3 seems
to perform OK, I'd rather not touch it.

>> Step 4: Buy new hardware (sometime in the next year)
> 
> See above.

I suggest Docker for the new hardware, since OSUOSL is familiar with it
and it might make better use of our hardware. It also means that once a
project writes a Docker setup script, we know how to deploy it anywhere.
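To illustrate the "deploy anywhere" point: once a project ships a Dockerfile, deploying it on any Docker host comes down to the same two CLI calls, `docker build` and `docker run`. This minimal Python sketch only builds those commands; the image tag, container name and port mapping are hypothetical.

```python
from typing import Dict, List


def docker_deploy_commands(tag: str, context_dir: str, name: str,
                           ports: Dict[int, int]) -> List[List[str]]:
    """Build (but do not run) the docker CLI calls that deploy a project
    from the Dockerfile in context_dir. All names are hypothetical."""
    # Build the image from the project's Dockerfile.
    build = ["docker", "build", "-t", tag, context_dir]
    # Start a detached container from that image.
    run = ["docker", "run", "-d", "--name", name]
    # Publish each container port on the given host port (host:container).
    for host_port, container_port in sorted(ports.items()):
        run += ["-p", f"{host_port}:{container_port}"]
    run.append(tag)
    return [build, run]


if __name__ == "__main__":
    for cmd in docker_deploy_commands("osgeo/example", ".", "example", {8080: 80}):
        print(" ".join(cmd))
```

Nothing in the commands depends on which host runs them, which is what makes a project-maintained Dockerfile portable between OSUOSL and any future hardware.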

> 
>> Invite projects to a new sac-announce list which sends out notices about
>> upcoming events.
> 
> I'd say we should request projects to subscribe to such a list. As far
> as I understand, communication to projects was mostly done over private
> channels in the past and, as a consequence, they expected the
> continuation of private 'care' in case of upcoming changes.  This
> didn't work out as planned, as we know  ;-)

Markus filed a ticket to create a sac-announce list, and Jachym (Board
Secretary) said he will forward a request for projects to join once I
send it to him.

> 
>> - 24 hour response team
> 
> Hah, as far as I understand, the primary reason why *I* was invited
> into the admin team was the fact that I don't live in any of the US
> time zones  :-)

Sure; the board just wants to explore ways to fund support services,
since you and I aren't always enough to respond to every emergency.

> Best regards,
> 	Martin.
> 

Thanks,
Alex
