[SAC] [Hosting] ftp-osl storage upgrade (full rebuild required) - Jun 18, 2018 9:30AM PDT (Jun 18 1630 UTC)

Lance Albertson lance at osuosl.org
Mon Jun 18 15:49:12 PDT 2018


I just wanted to send you all an update on where we're at in the process.

As of right now, ftp-osl is back online and serving its content from the
Ceph volume. I've gone ahead and kicked off a few manual syncs to catch
everything up; however, if you're using us as a master, I recommend you kick
off an update job right now. I'm also currently copying the content to the
local disks, which I expect to run through sometime tomorrow.
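
As a rough sanity check on that copy time, here is a small Python sketch of
the sustained throughput implied by the figures in the original announcement
quoted below (8.8TB of data, 18-24 hours for the sync). It assumes decimal
units (1 TB = 10^12 bytes) and a steady transfer rate, neither of which the
announcement states, and the function name is purely illustrative.

```python
def required_throughput_mb_s(data_tb: float, hours: float) -> float:
    """Average throughput (MB/s) needed to copy data_tb terabytes in `hours`.

    Assumes decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    total_mb = data_tb * 1_000_000  # TB -> MB
    return total_mb / (hours * 3600)


if __name__ == "__main__":
    data_tb = 8.8  # data size from the original announcement
    for hours in (18, 24):  # sync estimate from the original announcement
        rate = required_throughput_mb_s(data_tb, hours)
        print(f"{hours}h: ~{rate:.0f} MB/s sustained")
```

Under those assumptions the copy needs somewhere in the range of 100-140 MB/s
sustained, so any contention from normal read/write traffic could push the
finish time later.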

The rebuild took a little bit longer than originally planned due to some
issues I ran into building the new RAID array. My original plan didn't work,
so I had to go with plan B, which took a little longer. Plan B resulted in
creating two separate RAID6 arrays, which means I lost about 2T in capacity
compared to my original plan.
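
To illustrate why splitting into two RAID6 arrays costs capacity: each RAID6
array gives up two disks' worth of space to parity, so two arrays give up four
disks' worth versus two for a single array. The sketch below uses a
hypothetical disk count and disk size (neither is stated in this message) and
ignores filesystem overhead.

```python
def raid6_usable(num_disks: int, disk_size_t: float) -> float:
    """Usable capacity of one RAID6 array: two disks' worth goes to parity."""
    if num_disks < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return (num_disks - 2) * disk_size_t


if __name__ == "__main__":
    # Hypothetical values for illustration only; not taken from the message.
    total_disks = 24
    disk_size_t = 1.0

    single = raid6_usable(total_disks, disk_size_t)
    split = 2 * raid6_usable(total_disks // 2, disk_size_t)
    print(f"one array:  {single:.1f}T usable")
    print(f"two arrays: {split:.1f}T usable")
    print(f"difference: {single - split:.1f}T")
```

With equal-size disks the difference is always two disks' worth, whatever the
total count, so a loss of about 2T would be consistent with disks of roughly
1T each, though that is only a guess.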

I'm keeping ftp-osl out of the public rotation for now since its I/O
throughput likely isn't as good as before while it's serving the content
via Ceph.

I'll send another update tomorrow when I'm ready to switch back over to
local storage. Please let me know if you notice any issues.

Thanks-

On Thu, Jun 14, 2018 at 3:52 PM, Lance Albertson <lance at osuosl.org> wrote:

> I had a few questions regarding this outage that I wanted to clarify for
> everyone.
>
> 1. There should be no outage during the 5.5 hour outage window for
> anything pointed to ftp.osuosl.org (unless your DNS is directly pointing
> at ftp-osl.osuosl.org).
> 2. During the 18-24hr sync from Ceph to local storage, ftp-osl should have
> normal read/write operations. There might be a slight I/O performance
> hit during that window, but it's hard to tell. However, there will be a
> short (likely 5 min) outage to reads/writes on ftp-osl when I do the final
> switch back to local storage.
>
> On Thu, Jun 14, 2018 at 10:00 AM, Lance Albertson <lance at osuosl.org>
> wrote:
>
>> Service(s) affected: ftp.osuosl.org
>>
>> During the outage, the master syncing node for our FTP cluster (ftp-osl)
>> will be offline which means any updates to our software mirrors will be
>> delayed.
>>
>> Outage Window:
>> Start: Mon, Jun 18 9:30AM PDT (Mon Jun 18 1630 UTC)
>> End: Mon, Jun 18 3:00PM PDT (Mon Jun 18 2200 UTC)
>>
>> Reason for outage:
>>
>> Our FTP cluster is starting to run low on disk space and we will be
>> adding additional hard drives to the system. Our system currently has
>> 9.375T of disk space and we're planning on upgrading it to 18.75T (this
>> takes into account the RAID6 configuration).
>>
>> Unfortunately, due to the nature of how the disk arrays are
>> configured, we will not be able to grow the RAID array without a complete
>> rebuild. This means we're going to have to re-copy all 8.8TB of data off of
>> the machine and back onto it. Since this task is rather large and time
>> consuming we've come up with a better alternative so that we don't have our
>> master FTP server offline for very long.
>>
>> We have just recently built a new Ceph cluster for some new storage needs
>> at the OSL and we are going to temporarily use this cluster to serve the
>> ftp-osl content. I've already copied the content onto a new volume and have
>> tested it enough to feel it can handle the load. This should make the
>> transition plan much easier and quicker than initially planned. This server is
>> already out of DNS rotation and we are planning on keeping it out of
>> rotation until this process is complete to reduce the I/O load.
>>
>> So here's the plan thus far starting on Monday:
>>
>> 1. Stop all services on the system and do one final rsync to the
>> Ceph volume
>> 2. Reboot the machine, destroy the current RAID, and create a new
>> one with the new disks
>> 3. Reinstall the OS
>> 4. Bootstrap the machine without FTP components initially and set up
>> the Ceph volume
>> 5. Deploy FTP components after the Ceph volume is set up and ready to go
>> 6. Ensure inter-node FTP syncing is working using the Ceph volume
>> 7. Sync data from the Ceph volume back over to local disks (I'm guessing
>> this will take 18-24 hours)
>> 8. Once the sync is complete, shut down all services and switch the mount
>> point over to the local disks
>> 9. Profit!
>>
>> I would like to thank IBM for donating the hard drives needed for this
>> upgrade.
>>
>> We plan on doing the storage upgrades on our two other nodes
>> (ftp-nyc & ftp-chi) soon; however, we won't be using the Ceph cluster for
>> this since they are remote. The current plan is to take one machine out for
>> several days and sync the data back between the nodes. I will send another
>> outage announcement for those two nodes once we're ready for that. We still
>> need to ship the drives to the locations and work with the local data
>> centers to get them installed.
>>
>> Projects affected: Any project using our FTP cluster as a master syncing
>> point
>>
>> --
>> Lance Albertson
>> Director
>> Oregon State University | Open Source Lab
>>
>
>
>
> --
> Lance Albertson
> Director
> Oregon State University | Open Source Lab
>



-- 
Lance Albertson
Director
Oregon State University | Open Source Lab