[BUGS] SCSI hardware failure?

jonathan michaels jlm at caamora.com.au
Mon Jan 14 19:38:11 EST 2008


greetings all,

On Mon, Jan 14, 2008 at 03:04:12PM +1100, Callum Gibson wrote:
> On 14Jan08 14:43, Tong Wang wrote:

=== various bits trimmed for pick a reason.

> }I am really trying to get AMANDA emails back. As for the reason why I 
> }relate the disk error with this, because I tried to reboot the server, with 
> }the first try failed saying:
> }
> }Missing Operating System

not a good sign. these machines have an ms-dos partition that holds
the ms windows toolset used to set up/repair the machine and
configure the relevant parts .. it looks like this could have
evaporated as part of the bad media report.

i am not sure how freebsd reports that kind of a situation ..
any thoughts callum ??

> }On the second try, it halted half way, with the following message:
> }
> }Drive on AIC-7902 B at slot 00, 08:07:01, SCSI ID: 0 has exceeded failure 
> }prediction threshold.
> 
> That's really bad. You need to replace that quick I reckon. So you have
> rebooted now, at least.

worse, i think .. i haven't looked this up on google (i don't go
there unless i ABSOLUTELY have to) but from memory this looks
like an embedded scsi chipset, and if so that means the
motherboard needs to be replaced .. service contract time,
hopefully

> }This is why I relate these two issues together, maybe I'm wrong though.
> 
> Yeah, hard to pinpoint what is stuffed now though. It could just be your
> mail system or just the permissions on /tmp. I still tend to think that
> disk errors should result in a total failure, rather than something odd
> like /tmp changing permission, or strange corruption throughout the
> filesystem, although I did see just that on a Solaris 8 system recently.
> And if you were getting corruption, you might expect whole subsystems to
> fail, for example. In my case, I got core-dumping binaries, libraries
> you couldn't link against, etc.
> 
> }df gives the following:
> }
> }Filesystem             1K-blocks      Used     Avail Capacity  Mounted on
> }/dev/mirror/gm0s1a        495726    107454    348614    24%    /
> }devfs                          1         1         0   100%    /dev
> }/dev/mirror/gm0s1d       2026030     78516   1785432     4%    /var
> }/dev/mirror/gm0s1e       2026030    997190    866758    53%    /tmp
> }/dev/mirror/gm0s1f      10154158   1415242   7926584    15%    /usr
> }/dev/stripe/stripe0s1a 800991544 576215642 216765988    73%    /export

just a small hint: "df -h" (see df(1)) gives more readable
output, for example,

Filesystem       Size    Used   Avail Capacity  Mounted on
/dev/idad0s1a    248M     35M    193M    15%    /
devfs            1.0K    1.0K      0B   100%    /dev
/dev/idad0s1d    4.8G    3.0G    1.5G    67%    /home
/dev/idad0s1f    1.9G    9.1M    1.8G     1%    /tmp
/dev/idad0s1e     15G    2.8G     11G    21%    /usr
/dev/idad0s1g    4.8G     19M    4.4G     0%    /var
/dev/idad0s1h    4.5G    4.0K    4.2G     0%    /var/mail
devfs            1.0K    1.0K      0B   100%    /var/named/dev
[caamora] ~> 

unless you need to know how many blocks you have on the
platters. the above is a raid'd volume (just raid5; with a tape
drive i do not see the point in raid 1+0) built from a couple of
9 gb drives, then split up by fdisk.
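
for anyone stuck with the 1K-block listing, here is a rough python
sketch of the conversion that "-h" does. the function name human_size
and the rounding rule are my own assumptions for illustration, not
what df itself implements, so the last digit may differ from real
df output.

```python
def human_size(kib_blocks):
    """Turn a count of 1 KiB blocks into a short size string.

    Hypothetical helper: the rounding only approximates df -h.
    """
    size = float(kib_blocks) * 1024  # df reports 1K-blocks, so convert to bytes
    for unit in ("B", "K", "M", "G", "T"):
        # stop at the first unit where the value drops below 1024
        if size < 1024 or unit == "T":
            # one decimal place for small values, none otherwise
            return f"{size:.1f}{unit}" if size < 10 else f"{size:.0f}{unit}"
        size /= 1024


if __name__ == "__main__":
    # block counts copied from the 1K-blocks column quoted earlier
    for name, blocks in (("/", 495726), ("/export", 800991544)):
        print(name, human_size(blocks))
```

so the 495726 blocks on / come out at a bit under half a gig, which
is a lot easier to read at a glance.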
 
> Particularly since you're using mirroring/striping, perhaps the real
> drive errors are being hidden under this software and it's incorrectly
> reporting or hiding the failures. I dunno - I haven't used that stuff

the raid subsystem usually has its own reporting system; perhaps
it's time to go looking at the onboard one (i mean the machine's
own reporting system), as with the proliant and its "smartstart"
cdrom environment.

> before (although I use amanda on a 5.4 system regularly, as it happens).
> Perhaps someone with more of a clue will pipe up on this.

i'm planning on using amanda as soon as i can relocate a couple
of 70 gb dlt-7000 tape drives into a smaller box. the 50+ kg
desktop they are in at the moment is a bit awkward to move
around <GRIN> i'm not as strong as i once was .. ah, that's life.

sorry for the potential bad news. it looks like it's time to make
sure the backups are as reliable as all the advertising claims,
been there done that ..

best wishes for the new year and the just past christmas

much kind regards

jonathan

-- 
================================================================
powered by ..
QNX, OS9 and freeBSD  --  http://caamora com au/operating system
==== === appropriate solution in an inappropriate world === ====

