[SAC] [Hosting] RESOLVED: Core network switch reboot

Lance Albertson lance at osuosl.org
Fri Oct 28 09:40:53 PDT 2022


Hi All,

Unfortunately, it looks like this switch rebooted again last night at
around 1AM PDT. Thankfully, the impact was smaller than before because of
the adjustments we've made over the past few weeks.

I wanted to send another update on how we plan to fix this permanently.

I have racked two "new" Arista 1G switches, which will replace two of the
three Cisco Nexus fabric extenders where the majority of our hosts are
connected. Once I have those plumbed into the new switches, I can start
moving hosts over to them one by one. In a few weeks, once everything is
ready, I'll send another email with a list of the hosts this will affect.

Before that happens, we need to finish running fiber to our second core
switch, complete the MLAG configuration, and finish the backend upstream
connection. Once this is done, we'll have more redundancy in our network.
There will be another brief outage when we switch over to the new "core"
switches with MLAG.

Thanks again for your patience. Hopefully I can get all of this done before
this switch decides to reboot again!

On Mon, Sep 12, 2022 at 11:54 PM Lance Albertson <lance at osuosl.org> wrote:

> Sadly, this just happened again about 50 minutes ago. We may need to do
> some emergency firmware patching tomorrow. As a fallback, I'm also
> working on a plan to add another switch to try to minimize the impact of
> this troublesome switch.
>
> Once I gather some additional information tomorrow morning, I'll send an
> update on what we're planning to do.
>
> Thanks again for your patience.
>
> On Mon, Sep 12, 2022 at 3:14 PM Lance Albertson <lance at osuosl.org> wrote:
>
>> This happened again at approximately 10AM PDT. Since we moved our uplink
>> to this switch, everything went down while the switch rebooted.
>>
>> We're still planning to do an upgrade but don't have a date for it yet.
>> Hopefully we'll get that scheduled soon.
>>
>> Thanks for your patience.
>>
>> On Wed, Aug 24, 2022 at 7:40 AM Lance Albertson <lance at osuosl.org> wrote:
>>
>>> Unfortunately, this just happened again overnight. We may need to
>>> schedule another outage to perform a software upgrade on this switch so
>>> that this stops happening. We'll send an announcement once we have
>>> everything in place for that upgrade.
>>>
>>> Thanks-
>>>
>>> On Wed, May 25, 2022 at 11:22 PM Lance Albertson <lance at osuosl.org>
>>> wrote:
>>>
>>>> All,
>>>>
>>>> It appears that one of our core network switches had a kernel panic and
>>>> rebooted, which caused widespread outages throughout our infrastructure.
>>>> As of right now, everything appears to be back to normal, but please let
>>>> me know if that isn't the case by sending an email to support at osuosl.org.
>>>>
>>>> Apologies for the outage; we'll be looking into why this switch had
>>>> a kernel panic in the first place.
>>>>
>>>> Thanks-
>>>>
>>>> --
>>>> Lance Albertson
>>>> Director
>>>> Oregon State University | Open Source Lab
>>>>
>>>
>>>
>>> --
>>> Lance Albertson
>>> Director
>>> Oregon State University | Open Source Lab
>>>
>>
>>
>> --
>> Lance Albertson
>> Director
>> Oregon State University | Open Source Lab
>>
>
>
> --
> Lance Albertson
> Director
> Oregon State University | Open Source Lab
>


-- 
Lance Albertson
Director
Oregon State University | Open Source Lab