[SAC] [support.osuosl.org #23649] [OSGeo] Failed disk in osgeo4.osuosl.bak
Justin Dugger via RT
support at osuosl.org
Mon Apr 7 11:30:30 PDT 2014
Just to confirm/document what was discussed on IRC:
The RAID array finished rebuilding last week, but we discovered that the low throughput was caused by the RAID card on osgeo4 detecting a weak battery state and transitioning to the slower, safer WriteThrough policy.
We've received a pair of batteries and will be taking a planned downtime to install them.
On Thu Apr 03 09:25:58 2014, jldugger wrote:
> On Thu Apr 03 08:28:55 2014, ramereth wrote:
> > On Thu, Apr 3, 2014 at 12:04 AM, tech at wildintellect.com via RT <
> > support at osuosl.org> wrote:
> >
> > > Something seems amiss. The ProjectsVM stopped responding, high disk
> > > latency and iowait ( 10-11pm PST
> >
> > Rebuild Progress on Device at Enclosure 32, Slot 3 Completed 82% in
> > 200 Minutes.
> >
> > I've never seen a rebuild take this long before but this hardware is
> > starting to show its age a little.
>
> The only time I've seen things go this slowly was the time I forgot to
> take our (very busy) FTP mirror out of rotation for the duration of
> a rebuild. Under RAID 5, recalculating a block on the replacement
> drive requires reading a block from each of the other drives. So
> rebuilds can 'steal' a lot of I/O from a system that was already
> down 1 disk worth of I/O requests per second. While you can
> sometimes tune the RAID firmware to rebuild at a lower priority,
> there's a balancing act between service latency and repairing the
> RAID array before a second drive fails.
>
> TL;DR: sorry this is taking so long; I didn't realize the services
> depending on it were quite so IO bound.
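
For anyone curious why a RAID 5 rebuild touches every other drive, here is a
minimal sketch of the parity arithmetic. It assumes a hypothetical 4-drive
stripe with made-up block contents and a helper named xor_blocks; none of
this is taken from the controller on osgeo4, which does the same work in
firmware.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: three data blocks plus one parity block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Suppose the drive holding data[1] failed. Rebuilding its block needs
# one read from every surviving drive in the stripe.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]
```

Every rebuilt block costs one read on each surviving drive, which is why a
rebuild competes directly with normal service I/O.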