From lance at osuosl.org Mon May 4 11:19:27 2026
From: lance at osuosl.org (Lance Albertson)
Date: Mon, 4 May 2026 11:19:27 -0700
Subject: [Hosting] System reboots due to Copy Fail
Message-ID:

Hi All,

As you may be aware, a serious security vulnerability in the Linux Kernel,
known as Copy Fail [1], was recently discovered. I spent much of Friday
rebooting our critical systems and am continuing that process today. We
have nearly completed reboots for all hypervisors and will proceed to other
managed systems next.

If you maintain a VM on our infrastructure, I highly recommend that you
upgrade and reboot your systems as soon as possible to ensure the necessary
fixes are applied.

Please let me know if you have any questions or concerns.

[1] https://copy.fail/

Thanks-

--
Lance Albertson
Director
Oregon State University | Open Source Lab

_______________________________________________
Hosting mailing list
Hosting at lists.osuosl.org
https://lists.osuosl.org/mailman/listinfo/hosting

From lance at osuosl.org Wed May 6 16:46:52 2026
From: lance at osuosl.org (Lance Albertson)
Date: Wed, 6 May 2026 16:46:52 -0700
Subject: [Hosting] Scheduled Network Maintenance - Thursday, May 14th, 12:00-2:00 PM PDT (19:00-21:00 UTC)
Message-ID:

Hello,

We are writing to inform you of a scheduled network maintenance window
that may affect your hosted services at OSU Open Source Lab.

*Date:* Thursday, May 14th, 2026
*Time:* 12:00 PM - 2:00 PM PDT (19:00 - 21:00 UTC)
*Expected Impact:* Brief outage (likely only a few minutes)

*What's happening:*
LinkOregon will be enabling MLAG (Multi-Chassis Link Aggregation) on our
upstream ports to our edge routers. Currently, all traffic flows through a
single router.
This change will allow us to fail over between two routers,
providing improved redundancy and reliability for all hosted projects going
forward.

We will also need to make some configuration changes on our routers ahead
of the maintenance window. We do not anticipate these preliminary changes
causing any disruption, but we will send a separate notification if that
changes.

While we expect the actual outage to last only a few minutes, we have
reserved the full 2-hour window as a precaution.

We apologize for any inconvenience this may cause. If you have any
questions or concerns, please don't hesitate to reach out.

Thank you for your patience and continued support of OSUOSL.

Thanks!

--
Lance Albertson
Director
Oregon State University | Open Source Lab

From lance at osuosl.org Thu May 14 14:29:49 2026
From: lance at osuosl.org (Lance Albertson)
Date: Thu, 14 May 2026 14:29:49 -0700
Subject: [Hosting] Scheduled Network Maintenance - Thursday, May 14th, 12:00-2:00 PM PDT (19:00-21:00 UTC)
In-Reply-To:
References:
Message-ID:

Short update on this (I need to run home).

I believe we have this mostly working. The only remaining issue seems to
be that IPv6 with SLAAC isn't working properly. We're waiting to hear back
from an Arista tech for a solution. I'll send a more detailed message later
once we figure everything out.

Thanks again for your patience and I apologize for the extended outage
window.

Thanks-

On Wed, May 6, 2026 at 4:46 PM Lance Albertson wrote:

> Hello,
>
> We are writing to inform you of a scheduled network maintenance window
> that may affect your hosted services at OSU Open Source Lab.
>
> *Date:* Thursday, May 14th, 2026
> *Time:* 12:00 PM - 2:00 PM PDT (19:00 - 21:00 UTC)
> *Expected Impact:* Brief outage (likely only a few minutes)
>
> *What's happening:*
> LinkOregon will be enabling MLAG (Multi-Chassis Link Aggregation) on our
> upstream ports to our edge routers. Currently, all traffic flows through a
> single router. This change will allow us to fail over between two routers,
> providing improved redundancy and reliability for all hosted projects going
> forward.
>
> We will also need to make some configuration changes on our routers ahead
> of the maintenance window. We do not anticipate these preliminary changes
> causing any disruption, but we will send a separate notification if that
> changes.
>
> While we expect the actual outage to last only a few minutes, we have
> reserved the full 2-hour window as a precaution.
>
> We apologize for any inconvenience this may cause. If you have any
> questions or concerns, please don't hesitate to reach out.
>
> Thank you for your patience and continued support of OSUOSL.
>
> Thanks!
>
> --
> Lance Albertson
> Director
> Oregon State University | Open Source Lab
>

--
Lance Albertson
Director
Oregon State University | Open Source Lab

From lance at osuosl.org Thu May 14 15:44:06 2026
From: lance at osuosl.org (Lance Albertson)
Date: Thu, 14 May 2026 15:44:06 -0700
Subject: [Hosting] Scheduled Network Maintenance - Thursday, May 14th, 12:00-2:00 PM PDT (19:00-21:00 UTC)
In-Reply-To:
References:
Message-ID:

We fixed the IPv6 SLAAC issue. I'm still working on the post-mortem and
hope to have something sent later today or at least by tomorrow.

Thanks-

On Thu, May 14, 2026 at 2:29 PM Lance Albertson wrote:

> Short update on this (I need to run home).
>
> I believe we have this mostly working. The only remaining issue seems to
> be that IPv6 with SLAAC isn't working properly. We're waiting to hear back
> from an Arista tech for a solution. I'll send a more detailed message later
> once we figure everything out.
>
> Thanks again for your patience and I apologize for the extended outage
> window.
>
> Thanks-
>
> On Wed, May 6, 2026 at 4:46 PM Lance Albertson wrote:
>
>> Hello,
>>
>> We are writing to inform you of a scheduled network maintenance window
>> that may affect your hosted services at OSU Open Source Lab.
>>
>> *Date:* Thursday, May 14th, 2026
>> *Time:* 12:00 PM - 2:00 PM PDT (19:00 - 21:00 UTC)
>> *Expected Impact:* Brief outage (likely only a few minutes)
>>
>> *What's happening:*
>> LinkOregon will be enabling MLAG (Multi-Chassis Link Aggregation) on our
>> upstream ports to our edge routers. Currently, all traffic flows through a
>> single router. This change will allow us to fail over between two routers,
>> providing improved redundancy and reliability for all hosted projects going
>> forward.
>>
>> We will also need to make some configuration changes on our routers ahead
>> of the maintenance window. We do not anticipate these preliminary changes
>> causing any disruption, but we will send a separate notification if that
>> changes.
>>
>> While we expect the actual outage to last only a few minutes, we have
>> reserved the full 2-hour window as a precaution.
>>
>> We apologize for any inconvenience this may cause. If you have any
>> questions or concerns, please don't hesitate to reach out.
>>
>> Thank you for your patience and continued support of OSUOSL.
>>
>> Thanks!
>>
>> --
>> Lance Albertson
>> Director
>> Oregon State University | Open Source Lab
>>
>
>
> --
> Lance Albertson
> Director
> Oregon State University | Open Source Lab
>

--
Lance Albertson
Director
Oregon State University | Open Source Lab

From lance at osuosl.org Thu May 14 17:05:21 2026
From: lance at osuosl.org (Lance Albertson)
Date: Thu, 14 May 2026 17:05:21 -0700
Subject: [Hosting] Post-mortem: Network connectivity issues during edge router upgrade
Message-ID:

Hi everyone,

*Date*: 2026-05-14
*Impact*: Intermittent IPv4 and IPv6 connectivity for some hosted services
for approximately 3 hours and 20 minutes beyond the planned maintenance
window.

Today, OSL performed scheduled maintenance to bring our second edge router
(sw-edge1) into active service alongside our existing edge router
(sw-edge2). The goal was active-active routing redundancy at our network
edge, eliminating long-standing traffic asymmetry, and enabling future edge
router maintenance without service interruption.

The maintenance hit two issues:

*1. An upstream LACP issue with our ISP (LinkOregon).*
Stale configuration on the interface facing our new switch (left over from
a pseudo-wire used during our data center migration earlier this year)
prevented the new uplink from forming an active LACP bundle. Because we had
already activated sw-edge1 as a Layer 3 router, traffic that hashed to
sw-edge1 had no clean path out and was disrupted until the bundle came up.
LinkOregon's team identified and removed the legacy configuration once they
noticed it.

*2. An ARP and IPv6 neighbor synchronization issue between our two edge
switches.*
After we resolved the LACP issues, some hosted services experienced
intermittent connectivity:
some hosts were reachable, others were not, with the pattern shifting over
time. The root cause was a subtle platform-specific behavior on our Arista
switches: by default, MLAG (the technology bonding our two edge routers
into an active-active pair) does not synchronize ARP and IPv6 neighbor
state between peer switches unless an additional software agent is active.
We had been operating under the assumption that this synchronization
happened automatically, a widespread assumption that turned out to be
incorrect for our hardware platform.

We had reviewed the migration plan with both LinkOregon and Arista
beforehand, and neither of these failure modes was anticipated by anyone
involved. We're grateful that Arista's engineer was able to join us on
short notice: the engineer who helped us had a meeting in ten minutes when
we reached out and provided a working fix within about twenty.

Their fix involved enabling a VxLAN configuration between our edge switches
(used purely to activate the synchronization agent, not to carry traffic)
and changing our IPv6 gateway addressing model to give each switch a unique
IPv6 address alongside the shared gateway. From the host perspective,
gateway addresses are unchanged.

The IPv4 fix was in place by 2:20 PM PDT; IPv6 SLAAC was fully restored by
approximately 3:20 PM PDT.

Thanks to Arista's engineer for the quick response, to LinkOregon's network
team for the fast turnaround, and to our hosted projects for their
patience.

If you observed connectivity issues you'd like us to verify against our
timeline, please reach out via support at osuosl.org.

Thanks-

--
Lance Albertson
Director
Oregon State University | Open Source Lab
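
For anyone checking their own logs against the times quoted in this thread,
the following is a minimal sketch of the timeline arithmetic. It assumes all
times fall on 2026-05-14, treats PDT as a fixed UTC-7 offset, and takes the
"approximately 3:20 PM" restoration time at face value. The variable and
function names are illustrative and do not come from the original messages.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PDT modeled as a fixed UTC-7 offset (no tz database needed).
PDT = timezone(timedelta(hours=-7), "PDT")


def pdt(hour, minute=0):
    """Return a timezone-aware datetime on 2026-05-14 in PDT."""
    return datetime(2026, 5, 14, hour, minute, tzinfo=PDT)


# Times as stated in the maintenance announcement and the post-mortem.
window_start = pdt(12)       # 12:00 PM PDT
window_end = pdt(14)         # 2:00 PM PDT
ipv4_fixed = pdt(14, 20)     # "in place by 2:20 PM PDT"
ipv6_restored = pdt(15, 20)  # "approximately 3:20 PM PDT"


def fmt(delta):
    """Format a timedelta as 'Hh MMm'."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h {minutes % 60:02d}m"


def summarize():
    """Return the window in UTC and the key intervals as strings."""
    to_utc = lambda t: t.astimezone(timezone.utc).strftime("%H:%M")
    return {
        "window_utc": (to_utc(window_start), to_utc(window_end)),
        "ipv4_after_window_end": fmt(ipv4_fixed - window_end),
        "ipv6_after_window_end": fmt(ipv6_restored - window_end),
        "ipv6_after_window_start": fmt(ipv6_restored - window_start),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under these assumptions, the window converts to 19:00 - 21:00 UTC, IPv6
restoration falls 1h 20m after the scheduled end of the window, and the
3h 20m figure matches the span from the start of the window to IPv6
restoration.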