[SAC] [support.osuosl.org #23649] [OSGeo] Failed disk in osgeo4.osuosl.bak

Justin Dugger via RT support at osuosl.org
Mon Apr 7 11:30:30 PDT 2014


Just to confirm/document what was discussed on IRC:

The RAID array rebuilt last week, but we discovered that the cause of the low throughput was the RAID card on osgeo4: it detected a weak battery and fell back from its WriteBack cache policy to the slower, safer WriteThrough policy.

We've received a pair of batteries and will be taking a planned downtime to install them.

On Thu Apr 03 09:25:58 2014, jldugger wrote:
> On Thu Apr 03 08:28:55 2014, ramereth wrote:
> > On Thu, Apr 3, 2014 at 12:04 AM, tech at wildintellect.com via RT <
> > support at osuosl.org> wrote:
> >
> > > Something seems amiss. The ProjectsVM stopped responding, with high
> > > disk latency and iowait (10-11pm PST).
> >
> > Rebuild Progress on Device at Enclosure 32, Slot 3 Completed 82% in
> > 200 Minutes.
> >
> > I've never seen a rebuild take this long before but this hardware is
> > starting to show its age a little.
> 
> The only time I've seen things go this slowly was the time I forgot to
>    take our (very busy) FTP mirror out of rotation for the duration of
>    a rebuild. Under RAID 5, recalculating a block on the replacement
>    drive requires reading the corresponding block from every other
>    drive in the array. So rebuilds can 'steal' a lot of I/O from a
>    system that is already down one disk's worth of I/O operations per
>    second. While you can sometimes tune the RAID firmware to rebuild at
>    a lower priority, there's a balancing act between service latency
>    and repairing the RAID array before a second drive fails.
> 
> TL;DR: sorry this is taking so long; I didn't realize the services
>    depending on it were quite so I/O-bound.
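To illustrate the RAID 5 point above, here is a minimal Python sketch of single-stripe reconstruction. It assumes plain XOR parity and ignores parity rotation, chunk sizes, and anything controller-specific; the function names are hypothetical and this is not the RAID card's firmware logic.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return one stripe: the data blocks followed by their XOR parity block.

    Assumption: parity is simply appended; real RAID 5 rotates its position.
    """
    parity = xor_blocks(data_blocks)
    return list(data_blocks) + [parity]


def rebuild_block(stripe, failed_index):
    """Reconstruct the block that lived on the failed drive.

    Every surviving block in the stripe must be read, which is why a rebuild
    competes with normal I/O on all remaining drives.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripe = make_stripe([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
    # Pretend drive 1 failed and rebuild its block from the other three.
    print(rebuild_block(stripe, 1))
```

Rebuilding any one block touches every other drive in the stripe, so a rebuild that walks the whole array ends up reading all surviving disks end to end, on top of whatever the production services are asking for.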
