[SAC] [support.osuosl.org #23649] [OSGeo] Failed disk in osgeo4.osuosl.bak

Alex Mandel tech_dev at wildintellect.com
Thu Apr 17 21:14:48 PDT 2014


Friday 1pm PST?

Unless I hear screams from the community about some event happening, let's
plan for that. We'll also plan to shut down most, if not all, of the VMs to
make the rebuild go faster.

Thanks,
Alex

On 04/17/2014 02:56 PM, Justin Dugger via RT wrote:
> I've received a pair of drives today marked ATTN: OSGEO. Let me know when you'd like to take a downtime and we'll get that in.
> 
> Justin
> 
> On Fri Apr 11 14:29:56 2014, tech at wildintellect.com wrote:
>> Justin,
>>
>> Thanks, we suspected this when we did the battery replacement. Two drives
>> have been ordered and should arrive early next week. Yes, we got a spare
>> this time.
>>
>> When it comes in, we should plan an outage window to turn off VMs to
>> make the rebuild go faster.
>>
>> Thanks,
>> Alex
>>
>> On 04/11/2014 01:08 PM, Justin Dugger via RT wrote:
>>> Hey, an important followup!
>>>
>>> It appears that osgeo4 has lost another drive (this time in slot 0) around 5:30AM:
>>>
>>> 05:39 PROBLEM: osgeo4.osuosl.bak/Dell RAID Array is CRITICAL, CRITICAL: 0:BBU Charged (100%):0:RAID-6:6 drives:557.75GB:Partially Drives:6 1 Bad Drives (88 Errors), Apr 11, 12:39 UTC
>>>
>>> This will affect I/O performance in the obvious ways: any read involving
>>> the affected disk will require reading the corresponding blocks from all
>>> the other drives to calculate what the missing block should be.
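
As a rough illustration of why that is (a minimal sketch using single XOR
parity as in RAID-5; RAID-6 adds a second parity calculation, but the read
pattern is the same), reconstructing one missing block means reading the
matching block from every surviving drive in the stripe:

    # Minimal sketch: rebuilding one missing block from XOR parity.
    # The 4-data + 1-parity stripe below is a made-up example, not the
    # actual layout on osgeo4's controller.

    def xor_blocks(blocks):
        """XOR equal-length byte blocks together."""
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                out[i] ^= b
        return bytes(out)

    # A stripe of data blocks plus the parity block written alongside them.
    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    parity = xor_blocks(data)

    # The drive holding data[2] fails; serving a read of that block now
    # requires touching every surviving member of the stripe.
    surviving = [data[0], data[1], data[3], parity]
    assert xor_blocks(surviving) == data[2]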
>>>
>>> On Tue Apr 08 09:20:21 2014, jldugger wrote:
>>>> And another followup to document the results of the repair:
>>>>
>>>> osgeo4 came back online complaining that one of its power supply
>>>> units had failed. It also took quite a while for the VM qgis to fsck,
>>>> and that ended up requiring a manual fsck to repair.
>>>>
>>>> We've agreed to delay osgeo3's battery replacement until next week.
>>>>
>>>> On Mon Apr 07 11:30:29 2014, jldugger wrote:
>>>>> Just to confirm/document what was discussed on IRC:
>>>>>
>>>>> The RAID array rebuilt last week, but we discovered the cause of the
>>>>> low throughput: the RAID card on osgeo4 detected a weak battery
>>>>> state and transitioned to the slower, safer WriteThrough policy.
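
For a sense of scale (the latency figures below are assumed ballpark numbers,
not measurements from this controller), the policy change hurts sync-heavy
guests most, since every write now waits on the spindles instead of the
battery-backed cache:

    # Rough sketch of why losing write-back caching slows things down.
    # Both latency figures are assumptions for illustration only.

    cache_ack_ms = 0.05   # write-back: ack once data hits the protected cache
    disk_ack_ms = 8.0     # write-through: ack waits for the disks themselves

    sync_writes = 10_000  # e.g. a VM issuing many small fsync'd writes
    print(f"write-back:    ~{sync_writes * cache_ack_ms / 1000:.1f} s")
    print(f"write-through: ~{sync_writes * disk_ack_ms / 1000:.1f} s")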
>>>>>
>>>>> We've received a pair of batteries and will be taking a planned
>>>>> downtime to install them.
>>>>>
>>>>> On Thu Apr 03 09:25:58 2014, jldugger wrote:
>>>>>> On Thu Apr 03 08:28:55 2014, ramereth wrote:
>>>>>>> On Thu, Apr 3, 2014 at 12:04 AM, tech at wildintellect.com via RT <
>>>>>>> support at osuosl.org> wrote:
>>>>>>>
>>>>>>>> Something seems amiss. The ProjectsVM stopped responding, with high
>>>>>>>> disk latency and iowait (10-11pm PST).
>>>>>>>
>>>>>>> Rebuild Progress on Device at Enclosure 32, Slot 3 Completed 82% in
>>>>>>> 200 Minutes.
>>>>>>>
>>>>>>> I've never seen a rebuild take this long before but this hardware
>>>>>>> is starting to show its age a little.
>>>>>>
>>>>>> The only time I've seen things go this slowly was the time I forgot
>>>>>> to take our (very busy) FTP mirror out of rotation for the duration
>>>>>> of a rebuild. Under RAID 5, recalculating a block on the replacement
>>>>>> drive requires reading in a block from all the other drives. So
>>>>>> rebuilds can 'steal' a lot of I/O from a system that was already
>>>>>> down 1 disk worth of I/O requests per second. While you can
>>>>>> sometimes tune the RAID firmware to rebuild at a lower priority,
>>>>>> there's a balancing act between service latency and repairing the
>>>>>> RAID array before a second drive fails.
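
As a back-of-envelope estimate (the per-drive size and rebuild rate below are
illustrative assumptions, not figures taken from osgeo4), the numbers get
ugly quickly once production I/O leaves the rebuild only a trickle of
throughput:

    # Rough rebuild-time estimate for one replaced parity-RAID member.
    # Both inputs are assumptions for illustration, not measured values.

    per_drive_gb = 140        # assumed capacity of the replaced member
    rebuild_mb_per_s = 10     # assumed effective rebuild rate once production
                              # I/O on the degraded array has taken its share

    seconds = per_drive_gb * 1024 / rebuild_mb_per_s
    print(f"~{seconds / 3600:.1f} hours at {rebuild_mb_per_s} MB/s")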
>>>>>>
>>>>>> TL;DR: sorry this is taking so long; I didn't realize the services
>>>>>> depending on it were quite so IO bound.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
> 
> 
> 
> _______________________________________________
> Sac mailing list
> Sac at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/sac
> 


