[SAC] [Hosting] Partial datacenter outage this morning

Lance Albertson lance at osuosl.org
Mon Jul 2 14:52:44 PDT 2018


Looks like we had another power event while they were trying to fix the UPS
today. We didn't have any outages except for one project machine, which was
a single-PSU host. Apologies if this affected anyone's hosts. Hopefully
this is the last of this!

Thanks-

---------- Forwarded message ----------
From: Fowler, Stephen Lee <steve.fowler at oregonstate.edu>
Date: Mon, Jul 2, 2018 at 2:42 PM
Subject: Re: [Kerr_b210-announce] Saturday Power issue

All,

We had the maintenance techs in and it turns out that we had a battery
short when the UPS tried to take the load.  That resulted in a momentary
power loss until the generator was able to spin up and provide power.
While we do have these units tested on a regular basis, there is no way to
predict when a battery is going to give up the fight.  The battery has been
replaced and the unit returned to normal operation.



On Mon, Jul 2, 2018 at 10:29 AM, Lance Albertson <lance at osuosl.org> wrote:

> FYI: I got the following regarding the power event on Saturday morning.
>
> ---------- Forwarded message ----------
> From: Fowler, Stephen Lee <steve.fowler at oregonstate.edu>
> Date: Mon, Jul 2, 2018 at 10:26 AM
> Subject: [Kerr_b210-announce] Saturday Power issue
>
> All,
>
>
> I learned after the fact that we had a power event on Saturday that
> affected power in B210.  I did see that the generator came on line, but I
> did not get any alerts from the other units in that power chain.  Further
> investigation revealed that one of the UPS units suffered an inverter fault,
> which is likely the cause of some systems losing power.  While we monitor the
> systems in B210, we did not receive any errors from the UPS units themselves,
> so I was not aware there had been an issue.
>
> What is happening:
>
> I have engaged the UPS maintenance service to investigate and repair the
> faulty UPS.  I will also be talking with them about the logging and
> notification failure of both units.
>
>
> On Sat, Jun 30, 2018 at 9:21 AM, Lance Albertson <lance at osuosl.org> wrote:
>
>> All,
>>
>> It seems as though we had some kind of power event at approximately
>> 6:21 AM PDT (13:21 UTC) that affected some (but not all) of our hosts. At
>> this point I'm not entirely sure what happened, but my guess is that one of the
>> power circuits went down and then came back online. This is confusing since
>> the UPS should have prevented that. I'm going to be heading into the
>> datacenter soon to do a visual inspection.
>>
>> If you have any hosts that are offline and need me to help bring them
>> back, please send an email to support at osuosl.org and I will take a look.
>> Feel free to also reach out on IRC at #osuosl.
>>
>> Thanks-
>>
>> --
>> Lance Albertson
>> Director
>> Oregon State University | Open Source Lab
>>
>
>
>
> --
> Lance Albertson
> Director
> Oregon State University | Open Source Lab
>



-- 
Lance Albertson
Director
Oregon State University | Open Source Lab
_______________________________________________
Hosting mailing list
Hosting at osuosl.org
https://lists.osuosl.org/mailman/listinfo/hosting

