<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">This has been completed and everything seems to be working fine.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Keep in mind that the troublesome switch could reboot again until we figure out why it's happening. If it does, its impact should at least be smaller than before.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Thanks!</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Sep 13, 2022 at 2:42 PM Lance Albertson <<a href="mailto:lance@osuosl.org">lance@osuosl.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">I have the "new" switch set up and ready to go. I'm currently planning on making this change in about 20 minutes (3pm PDT). You will see a set of outages as I plan to do the following:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">1. Move LinkOregon uplink to "new" switch</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">2. Move oslsw3 uplink to "new" switch</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">3. Move oslsw1 uplink to "new" switch</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">4. 
Move remaining backend 10g switches</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">If anything goes wrong, I should be able to quickly revert the change.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Sep 13, 2022 at 11:08 AM Lance Albertson <<a href="mailto:lance@osuosl.org" target="_blank">lance@osuosl.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">All,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">I wanted to pass along more information on where we're at and our current plans to try and work around this issue.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Without going deep into the history of our core network infrastructure, we have two core "routers" that are both aging and we're in the process of replacing them with something newer.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Previously, our uplink was connected through our Cisco 6509. 
This switch has several 1G line cards that half of our servers are directly connected to.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">The other core switch is a Cisco Nexus 6001 with three fabric extenders (FEX) that provide 1G connectivity to the other half of our servers. When we migrated over to the LinkOregon network, we moved the uplink over to this Nexus 6k as it was much easier to get LR optics for it.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Unfortunately, this Nexus 6k has kernel panicked and rebooted multiple times over the past several months, causing these outages. Many of our downlink 10G switches are connected to this Nexus 6k, which means there's a larger impact when it goes down.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">A few years ago, a high-speed trading company donated a pallet full of Arista switches to us, and I've been slowly adding them to our infrastructure. Even though they are EOL, they still work very well and we haven't had any problems with them. And since I have a lot of them, I can easily replace one if it goes bad.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">My current plan is to set up one of these Arista switches and move all of the current 10G connections to it. This way, at least we can reduce the impact if/when this Nexus 6k switch reboots again. 
In theory, a reboot should then only affect the servers directly connected to the FEX switches. </div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">I reached out to the OSU IT community and they graciously donated two 10G-LR optical modules so that I can put this plan in place without having to wait for modules to ship.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Current plan for today:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">- Set up new Arista switch</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">- Move the upstream LinkOregon connectivity to it</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">- Move all downstream 10G links to this switch</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">I will send another email before I begin the actual outages for the cutover.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Longer term plans</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">- Work with vendors to replace our aging core network infrastructure with something that's still supported and that we can afford</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">- Look into getting redundancy put into place so that we don't have this issue anymore</div><div 
class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">- Migrate off of the older equipment</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">If anyone on this list has connections to Arista or any other major edge networking vendor, please let me know. That will certainly help our situation in the long term!</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">I had already started working on a plan to replace these systems but it seems my time may have run out (at least for the Nexus 6k switch).</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Thanks all for your patience and support!</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div></div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Sep 12, 2022 at 11:54 PM Lance Albertson <<a href="mailto:lance@osuosl.org" target="_blank">lance@osuosl.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div style="font-family:arial,helvetica,sans-serif;font-size:small">Sadly this just happened again about 50 minutes ago. We may need to do some emergency firmware patching tomorrow. 
As a backup, I'm also formulating a plan to add another switch to minimize the impact of this troublesome one.</div><div style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div style="font-family:arial,helvetica,sans-serif;font-size:small">Once I gather some additional information tomorrow morning, I'll send an update on what we're planning to do.</div><div style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div style="font-family:arial,helvetica,sans-serif;font-size:small">Thanks again for your patience.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Sep 12, 2022 at 3:14 PM Lance Albertson <<a href="mailto:lance@osuosl.org" target="_blank">lance@osuosl.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div style="font-family:arial,helvetica,sans-serif;font-size:small">This happened again at approximately 10AM PDT. Because we had moved our uplink to this switch, everything went down while it rebooted.</div><div style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div style="font-family:arial,helvetica,sans-serif;font-size:small">We're still planning an upgrade but don't have a date for it yet. 
We'll hopefully get that going soon.</div><div style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div style="font-family:arial,helvetica,sans-serif;font-size:small">Thanks for your patience.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 24, 2022 at 7:40 AM Lance Albertson <<a href="mailto:lance@osuosl.org" target="_blank">lance@osuosl.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div style="font-family:arial,helvetica,sans-serif;font-size:small">Unfortunately, this just happened again overnight. We may need to schedule another outage to perform a software upgrade on this switch so that this stops happening. We'll send an announcement out once we have everything in place to do that upgrade.</div><div style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div style="font-family:arial,helvetica,sans-serif;font-size:small">Thanks-</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, May 25, 2022 at 11:22 PM Lance Albertson <<a href="mailto:lance@osuosl.org" target="_blank">lance@osuosl.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div style="font-family:arial,helvetica,sans-serif;font-size:small">All,</div><div style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div style="font-family:arial,helvetica,sans-serif;font-size:small">It appears that one of our core network switches had a kernel panic and rebooted, which caused widespread outages throughout our infrastructure. 
As of right now, everything appears to be back to normal but please let me know if that isn't the case by sending an email to <a href="mailto:support@osuosl.org" target="_blank">support@osuosl.org</a>.</div><div style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div style="font-family:arial,helvetica,sans-serif;font-size:small">Apologies for the outage and we'll be looking into why this switch had a kernel panic in the first place.</div><div style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div style="font-family:arial,helvetica,sans-serif;font-size:small">Thanks-</div></div></blockquote></div></blockquote></div></blockquote></div></blockquote></div><div><br></div>-- <br><div dir="ltr"><div dir="ltr"><font face="arial, helvetica, sans-serif">Lance Albertson</font><div><div><font face="arial, helvetica, sans-serif">Director</font></div><div><span style="font-family:arial,helvetica,sans-serif">Oregon State University | </span><span style="font-family:arial,helvetica,sans-serif">Open Source Lab </span></div></div></div></div></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr"><div dir="ltr"><font face="arial, helvetica, sans-serif">Lance Albertson</font><div><div><font face="arial, helvetica, sans-serif">Director</font></div><div><span style="font-family:arial,helvetica,sans-serif">Oregon State University | </span><span style="font-family:arial,helvetica,sans-serif">Open Source Lab </span></div></div></div></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><font face="arial, helvetica, sans-serif">Lance Albertson</font><div><div><font face="arial, helvetica, sans-serif">Director</font></div><div><span style="font-family:arial,helvetica,sans-serif">Oregon State University | </span><span style="font-family:arial,helvetica,sans-serif">Open Source Lab </span></div></div></div></div>