[SAC] [Hosting] ftp-osl storage upgrade (full rebuild required) - Jun 18, 2018 9:30AM PDT (Jun 18 1630 UTC)

Lance Albertson lance at osuosl.org
Tue Jun 19 08:57:47 PDT 2018


It's taking longer than I expected to sync the data back to the local
disks because the system is also rebuilding two RAID6 arrays, which I
forgot to account for. This is also making the system slower than I
expected. At this rate it might take a few days to copy all of the data
back. Hopefully once the RAID6 arrays have finished rebuilding, the I/O
rate will improve and speed up the syncing. The two arrays are currently
at 55% and 47%, and we've transferred over 993G of the 8.8T of data back
to the local disks.
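
For a rough sense of the remaining copy time, the progress figures above can
be turned into a naive ETA. This is only a sketch: the ~17 hours elapsed is an
assumption inferred from the timestamps on the two messages, not something the
post states, and the real rate should change once the rebuilds finish.

```python
# Naive ETA for the Ceph -> local-disk sync, assuming a constant average rate.
# Figures from the thread: 993 GB of 8.8 TB copied so far. The 17 h elapsed
# is an assumption read from the email timestamps, not stated in the post.

def eta_hours(done_gb, total_gb, elapsed_hours):
    """Hours remaining if the observed average transfer rate holds."""
    rate = done_gb / elapsed_hours          # GB per hour so far
    return (total_gb - done_gb) / rate

remaining = eta_hours(done_gb=993, total_gb=8800, elapsed_hours=17)
print(f"~{remaining:.0f} hours (~{remaining / 24:.1f} days) left")
```

Under those assumptions this comes out to roughly five and a half days, which
is consistent with the "few days" estimate above.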

I will send another update once I'm ready to switch the system back over.

Thanks-

On Mon, Jun 18, 2018 at 3:49 PM, Lance Albertson <lance at osuosl.org> wrote:

> I just wanted to send you all an update on where we're at in the process.
>
> As of right now, ftp-osl is back online and serving its content from the
> Ceph volume. I've gone ahead and kicked off a few manual syncs to catch
> everything up; however, if you're using us as a master, I recommend you
> kick off an update job right now. I'm also currently copying the content
> to the local disks, which I expect to finish sometime tomorrow.
>
> The rebuild took a little longer than originally planned due to some
> issues I ran into while building the new RAID array. My original plan
> didn't work, so I had to go with plan B, which took a little longer. Plan B
> resulted in creating two separate RAID6 arrays, which means I lost about 2T
> of capacity compared to my original plan.
>
> I'm keeping ftp-osl out of the public rotation for now since its I/O
> throughput likely isn't as good as before while it's serving the content
> via Ceph.
>
> I'll send another update tomorrow when I'm ready to switch back over to
> local storage. Please let me know if you notice any issues.
>
> Thanks-
>
> On Thu, Jun 14, 2018 at 3:52 PM, Lance Albertson <lance at osuosl.org> wrote:
>
>> I had a few questions regarding this outage that I wanted to clarify for
>> everyone.
>>
>> 1. There should be no visible outage during the 5.5-hour maintenance
>> window for anything pointed at ftp.osuosl.org (unless your DNS points
>> directly at ftp-osl.osuosl.org)
>> 2. During the 18-24hr sync from Ceph back to local storage, ftp-osl
>> should support normal read/write operations. There might be a slight I/O
>> performance hit during that window, but it's hard to tell. However, there
>> will be a short (likely ~5 min) outage to reads/writes on ftp-osl when I
>> do the final switch back to local storage.
>>
>> On Thu, Jun 14, 2018 at 10:00 AM, Lance Albertson <lance at osuosl.org>
>> wrote:
>>
>>> Service(s) affected: ftp.osuosl.org
>>>
>>> During the outage, the master syncing node for our FTP cluster (ftp-osl)
>>> will be offline, which means any updates to our software mirrors will be
>>> delayed.
>>>
>>> Outage Window:
>>> Start: Mon, Jun 18 9:30AM PDT (Mon Jun 18 1630 UTC)
>>> End: Mon, Jun 18 3:00PM PDT (Mon Jun 18 2200 UTC)
>>>
>>> Reason for outage:
>>>
>>> Our FTP cluster is starting to run low on disk space, and we will be
>>> adding additional hard drives to the system. The system currently has
>>> 9.375T of disk space, and we're planning on upgrading it to 18.75T (both
>>> figures take the RAID6 configuration into account).
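
[The RAID6 usable-capacity arithmetic behind those figures is simply
(disks - 2) x per-disk size, since two disks' worth of space goes to parity.
The disk counts and per-disk size below are hypothetical examples chosen to
reproduce the quoted totals; the post doesn't state the actual drive layout.]

```python
# RAID6 usable capacity: two disks' worth of space is consumed by parity.
# The 7-disk and 12-disk layouts at 1.875 T/disk are hypothetical examples
# that happen to match the quoted 9.375 T and 18.75 T; the real
# configuration isn't given in the post.

def raid6_usable(n_disks, disk_size_tb):
    """Usable capacity of a RAID6 array in the same units as disk_size_tb."""
    return (n_disks - 2) * disk_size_tb

print(raid6_usable(7, 1.875))    # 9.375  -- matches the current capacity
print(raid6_usable(12, 1.875))   # 18.75  -- matches the planned capacity
```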
>>>
>>> Unfortunately, due to how the disk arrays are configured, we will not be
>>> able to grow the RAID array without a complete rebuild. This means we're
>>> going to have to re-copy all 8.8TB of data off of the machine and back
>>> onto it. Since this task is rather large and time consuming, we've come
>>> up with an alternative so that we don't have our master FTP server
>>> offline for very long.
>>>
>>> We have just recently built a new Ceph cluster for some new storage
>>> needs at the OSL, and we are going to temporarily use this cluster to
>>> serve the ftp-osl content. I've already copied the content onto a new
>>> volume and have tested it enough to feel it can handle the load. This
>>> should make the transition much easier and quicker than initially
>>> planned. This server is already out of DNS rotation, and we are planning
>>> on keeping it out of rotation until this process is complete to reduce
>>> the I/O load.
>>>
>>> So here's the plan thus far starting on Monday:
>>>
>>> 1. Stop all services on the system and do one final rsync to the
>>> Ceph volume
>>> 2. Reboot the machine, destroy the current RAID, and create a new
>>> one with the new disks
>>> 3. Reinstall the OS
>>> 4. Bootstrap the machine without the FTP components initially; set up
>>> the Ceph volume
>>> 5. Deploy the FTP components after the Ceph volume is set up and ready
>>> 6. Ensure inter-FTP-node syncing is working using the Ceph volume
>>> 7. Sync the data from the Ceph volume back over to the local disks (I'm
>>> guessing this will take 18-24 hours)
>>> 8. Once the sync is complete, shut down all services and switch the
>>> mount point over to the local disks
>>> 9. Profit!
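
[The 18-24 hour guess in step 7 implies a sustained transfer rate that can be
sanity-checked. The arithmetic below assumes decimal units (1 TB = 10^12
bytes); the post doesn't say whether 8.8T means TB or TiB.]

```python
# Sustained rate implied by copying 8.8 TB in 18-24 hours.
# Decimal units (1 TB = 1e6 MB) are an assumption; the post doesn't specify.

def rate_mb_per_s(total_tb, hours):
    """Average MB/s needed to move total_tb terabytes in the given hours."""
    total_mb = total_tb * 1_000_000         # 1 TB = 1e6 MB
    return total_mb / (hours * 3600)

print(f"{rate_mb_per_s(8.8, 18):.0f} MB/s")  # ~136 MB/s at the 18-hour end
print(f"{rate_mb_per_s(8.8, 24):.0f} MB/s")  # ~102 MB/s at the 24-hour end
```

Either rate is plausible for a healthy local array; the follow-up message in
this thread shows the concurrent RAID6 rebuilds cut real throughput well
below this.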
>>>
>>> I would like to thank IBM for donating the hard drives needed for this
>>> upgrade.
>>>
>>> We plan on doing the storage upgrades on our two other nodes
>>> (ftp-nyc & ftp-chi) soon; however, we won't be using the Ceph cluster
>>> for those since they are remote. The current plan is to take one machine
>>> out of rotation for several days and sync the data back between the
>>> nodes. I will send another outage announcement for those two nodes once
>>> we're ready. We still need to ship the drives to the locations and work
>>> with the local data centers to get them installed.
>>>
>>> Projects affected: Any project using our FTP cluster as a master syncing
>>> point
>>>
>>
-- 
Lance Albertson
Director
Oregon State University | Open Source Lab
_______________________________________________
Hosting mailing list
Hosting at osuosl.org
https://lists.osuosl.org/mailman/listinfo/hosting

