[SAC] [Hosting] ftp-osl storage upgrade (full rebuild required) - Jun 18, 2018 9:30AM PDT (Jun 18 1630 UTC)

Lance Albertson lance at osuosl.org
Thu Jun 14 10:00:10 PDT 2018


Service(s) affected: ftp.osuosl.org

During the outage, the master syncing node for our FTP cluster (ftp-osl)
will be offline, which means any updates to our software mirrors will be
delayed.

Outage Window:
Start: Mon, Jun 18 9:30AM PDT (Mon Jun 18 1630 UTC)
End: Mon, Jun 18 3:00PM PDT (Mon Jun 18 2200 UTC)

Reason for outage:

Our FTP cluster is starting to run low on disk space, so we will be adding
more hard drives to the system. Our system currently has 9.375T of disk
space and we're planning on upgrading it to 18.75T (both figures take the
RAID6 configuration into account).

Unfortunately, due to the nature of how the disk arrays are configured,
we will not be able to grow the RAID array without a complete rebuild. This
means we're going to have to re-copy all 8.8TB of data off of the machine
and back onto it. Since this task is rather large and time-consuming, we've
come up with a better alternative so that we don't have our master FTP
server offline for very long.
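
For a rough sense of the headroom involved, the figures above can be turned
into utilization percentages. This is only a back-of-the-envelope sketch: it
assumes the 9.375T, 18.75T and 8.8TB figures are all expressed in the same
unit, and the function name is purely illustrative.

```python
def utilization_percent(used_tb: float, capacity_tb: float) -> float:
    """Return how full a volume is, as a percentage of its capacity."""
    if capacity_tb <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used_tb / capacity_tb


# Figures quoted in this announcement (assumed to share the same unit).
USED_TB = 8.8
CURRENT_CAPACITY_TB = 9.375
UPGRADED_CAPACITY_TB = 18.75

before = utilization_percent(USED_TB, CURRENT_CAPACITY_TB)
after = utilization_percent(USED_TB, UPGRADED_CAPACITY_TB)

print(f"current array:  {before:.1f}% full")
print(f"upgraded array: {after:.1f}% full")
```

Under that assumption the array is about 94% full today and would be about
47% full after the upgrade.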

We recently built a new Ceph cluster for some new storage needs at the OSL
and we are going to temporarily use this cluster to serve the ftp-osl
content. I've already copied the content onto a new volume and have tested
it enough to feel it can handle the load. This should make the transition
much easier and quicker than initially planned. This server is already out
of DNS rotation and we are planning on keeping it out of rotation until
this process is complete to reduce the I/O load.

So here's the plan thus far, starting on Monday:

1. Stop all services on the system and do one final rsync to the Ceph
volume
2. Reboot the machine, destroy the current RAID, and create a new one with
the new disks
3. Reinstall the OS
4. Bootstrap the machine without FTP components initially and set up the
Ceph volume
5. Deploy FTP components once the Ceph volume is set up and ready to go
6. Ensure inter-node FTP syncing is working using the Ceph volume
7. Sync data from the Ceph volume back over to the local disks (I'm
guessing this will take 18-24 hours)
8. Once the sync is complete, shut down all services and switch the mount
point over to the local disks
9. Profit!
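
As a sanity check on the 18-24 hour guess in step 7, the sustained transfer
rate it implies can be worked out from the 8.8TB figure. This is a rough
sketch, not a measurement: it assumes the full 8.8TB is copied, that 1 TB is
10^12 bytes, and the function name is made up for illustration.

```python
def required_throughput_mb_s(data_tb: float, hours: float) -> float:
    """Average rate in MB/s needed to move data_tb terabytes in `hours`."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    total_bytes = data_tb * 1e12   # assumption: decimal terabytes
    seconds = hours * 3600
    return total_bytes / seconds / 1e6  # bytes/s -> MB/s


DATA_TB = 8.8  # amount of data quoted in this announcement

for hours in (18, 24):
    rate = required_throughput_mb_s(DATA_TB, hours)
    print(f"{hours} hours -> {rate:.1f} MB/s sustained")
```

With those assumptions the estimate corresponds to roughly 102 to 136 MB/s
sustained over the whole copy.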

I would like to thank IBM for donating the hard drives needed for this
upgrade.

We plan to do the storage upgrades on our two other nodes (ftp-nyc &
ftp-chi) soon. However, we won't be using the Ceph cluster for those since
they are remote. The current plan is to take one machine out for several
days and sync the data back between the nodes. I will send another outage
announcement for those two nodes once we're ready for that. We still need
to ship the drives to the locations and work with the local data centers to
get them installed.

Projects affected: Any project using our FTP cluster as a master syncing
point

-- 
Lance Albertson
Director
Oregon State University | Open Source Lab