[SAC] [Hosting] Unplanned outage: Hypervisor issue with gprod1 on primary Ganeti cluster

Lance Albertson lance at osuosl.org
Wed Jun 13 07:48:29 PDT 2018


All,

At approximately 2:36AM PDT (0936 UTC), one of the hypervisors (gprod1) in
our primary Ganeti cluster started having hardware issues. This took down
all of the instances running on that node. I attempted to bring the node
back online; however, the hardware issue prevented it from coming back up. At
that point I failed all of the VM instances over to their secondary nodes
and forced another node to become the Ganeti master (since gprod1 was the
master). All of the instances were back online by around 7:40AM PDT (1440
UTC).

Everything at this point seems to be back to normal (except for gprod1). I
will look into bringing gprod1 back online later today.

Thank you and sorry for the outages this caused.

-- 
Lance Albertson
Director
Oregon State University | Open Source Lab