[BUGS] SCSI hardware failure?

Callum Gibson callumgibson at optusnet.com.au
Mon Jan 14 15:04:12 EST 2008


On 14Jan08 14:43, Tong Wang wrote:
}su amanda
}mail -s test -F tong at hsa.com.au
}
}and here is the result:
}
}mail: /tmp/mail.RspIDEqoLfkF: Permission denied
}
}When I run amverify, which is provided by AMANDA, it always sends a 
}summary report email to my address. Besides the normal output, the 
}summary displayed on stdout showed a similar Permission denied error at 
}the end. Sending mail from the root account works, but I get the 
}permission denied error when I su to other user accounts from NIS.

Hmmm, this doesn't sound good. It could just be the permissions on /tmp.
I don't suppose you did any restores to /tmp recently? At the end of a
restore, amrecover asks whether to set permissions on '.', which might
do that. Is your /tmp still drwxrwxrwt?
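
A minimal sketch of that /tmp check, assuming a POSIX shell. The helper
name check_sticky_dir is made up for illustration and is not part of
AMANDA or the base system; it only compares the first ten characters of
the ls -ld output against drwxrwxrwt.

```shell
#!/bin/sh
# check_sticky_dir is a hypothetical helper name (an assumption, not a
# standard tool). It succeeds only if the directory is world-writable
# with the sticky bit set, i.e. ls -ld shows drwxrwxrwt.
check_sticky_dir() {
    dir=$1
    # The first 10 characters of ls -ld output are the file type and
    # permission bits; anything after that (ACL markers etc.) is ignored.
    mode=$(ls -ld "$dir" | cut -c1-10)
    if [ "$mode" = "drwxrwxrwt" ]; then
        echo "$dir: ok ($mode)"
    else
        echo "$dir: unexpected mode $mode, expected drwxrwxrwt"
        return 1
    fi
}

check_sticky_dir /tmp || echo "as root, 'chmod 1777 /tmp' would restore the usual mode"
```

If the mode turns out to be wrong, running chmod 1777 /tmp as root puts
it back to the usual world-writable, sticky setting.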

Does NIS look to be up and running properly? Other system services?
Do passwd and group look sensible? Do other aspects of permissions and
user IDs look OK (e.g. ls -l output)?
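
One rough way to run those checks from a shell is sketched below. The
helper name user_resolves is hypothetical, the account names are just the
ones mentioned in this thread, and the ypwhich step is skipped if the NIS
client tools are not installed.

```shell
#!/bin/sh
# user_resolves is a hypothetical helper (an assumption for this sketch).
# It succeeds if the name service (local files or NIS) can map the
# account name to a numeric uid.
user_resolves() {
    id -u "$1" >/dev/null 2>&1
}

# "amanda" is the account from this thread; adjust the list as needed.
for u in root amanda; do
    if user_resolves "$u"; then
        echo "$u -> uid $(id -u "$u")"
    else
        echo "$u does not resolve"
    fi
done

# If the NIS client tools are present, show which server we are bound to.
if command -v ypwhich >/dev/null 2>&1; then
    ypwhich || echo "ypwhich failed: possibly not bound to a NIS server"
fi
```

An account that resolves for root but not after su to a NIS user would
point at the name service rather than at /tmp.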

}I am really trying to get AMANDA emails back. The reason I relate the 
}disk error to this is that when I tried to reboot the server, the first 
}try failed saying:
}
}Missing Operating System
}
}On the second try, it halted half way, with the following message:
}
}Drive on AIC-7902 B at slot 00, 08:07:01, SCSI ID: 0 has exceeded failure 
}prediction threshold.

That's really bad. You need to replace that drive quickly, I reckon. So
at least you have rebooted now.

}This is why I relate these two issues together, maybe I'm wrong though.

Yeah, hard to pinpoint what is stuffed now though. It could just be your
mail system or just the permissions on /tmp. I still tend to think that
disk errors should result in a total failure, rather than something odd
like /tmp changing permissions or strange corruption throughout the
filesystem, although I did see just that on a Solaris 8 system recently.
If you were getting corruption, though, you would expect whole
subsystems to fail. In my case, I got core-dumping binaries, libraries
you couldn't link against, etc.

}df gives the following:
}
}Filesystem             1K-blocks      Used     Avail Capacity  Mounted on
}/dev/mirror/gm0s1a        495726    107454    348614    24%    /
}devfs                          1         1         0   100%    /dev
}/dev/mirror/gm0s1d       2026030     78516   1785432     4%    /var
}/dev/mirror/gm0s1e       2026030    997190    866758    53%    /tmp
}/dev/mirror/gm0s1f      10154158   1415242   7926584    15%    /usr
}/dev/stripe/stripe0s1a 800991544 576215642 216765988    73%    /export

Particularly since you're using mirroring/striping, perhaps the real
drive errors are being masked by that software layer, which is either
misreporting the failures or hiding them. I dunno - I haven't used that
stuff before (although I use amanda on a 5.4 system regularly, as it
happens). Perhaps someone with more of a clue will pipe up on this.
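
For what it's worth, the mirror and stripe layers can be asked for their
own view of the disks. This is only a sketch, assuming the FreeBSD GEOM
tools gmirror and gstripe that the df output suggests are in use;
run_if_present is a hypothetical wrapper so the script does nothing
harmful on a machine without them.

```shell
#!/bin/sh
# run_if_present is a hypothetical helper (an assumption for this
# sketch): run a tool with its arguments only if the tool is installed.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "$1: not installed, skipping"
    fi
}

# "status" prints each device with its state and component disks.
run_if_present gmirror status || echo "gmirror status reported a problem"
run_if_present gstripe status || echo "gstripe status reported a problem"
```

A mirror shown as DEGRADED rather than COMPLETE would suggest one of its
component disks has already dropped out.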

    C

-- 

Callum Gibson @ home
http://members.optusnet.com.au/callumgibson/

