[BUGS] SCSI hardware failure?

Tong Wang tong at hsa.com.au
Mon Jan 14 15:19:09 EST 2008



Callum Gibson wrote:
> On 14Jan08 14:43, Tong Wang wrote:
> }su amanda
> }mail -s test -F tong at hsa.com.au
> }
> }and here is the result:
> }
> }mail: /tmp/mail.RspIDEqoLfkF: Permission denied
> }
> }When I run amverify, which is provided by AMANDA, it always sends a
> }summary report email to my address. Besides the normal output, it
> }showed a similar "Permission denied" error at the end of the summary
> }displayed on stdout. Sending mail from the root account works, but I get
> }the permission denied error when I su to other user accounts from NIS.
>
> Hmmm, this doesn't sound good. It could just be the permissions on /tmp.
> I don't suppose you did any restores recently to /tmp?  At the end of
> restore using amrecover it asks to set permissions on '.' which might
> do that. Is your /tmp still drwxrwxrwt ?
>   
Doh!!! After all the googling, reading and pulling my hair out, I forgot
the most basic thing: checking the permissions on /tmp! Yes, it was set to
755 somehow, though I'm sure I didn't set it that way. That was the problem,
and now AMANDA is working as before. You are a true champ, Callum!
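
For anyone hitting the same symptom, here is a minimal sketch of the check Callum suggested. It compares a directory's mode string against the drwxrwxrwt that /tmp should have. The function name check_tmp_mode is made up for illustration, and the script parses ls -ld output instead of calling stat, on the assumption that stat's flags differ between FreeBSD and GNU systems.

```shell
#!/bin/sh
# Hypothetical helper (not part of AMANDA or the base system):
# report whether a directory has the mode expected of /tmp.
check_tmp_mode() {
    dir="${1:-/tmp}"
    # First 10 characters of "ls -ld" are the type and permission bits,
    # e.g. "drwxrwxrwt" for a world-writable directory with the sticky bit.
    mode=$(ls -ld "$dir" | cut -c1-10)
    if [ "$mode" = "drwxrwxrwt" ]; then
        echo "ok: $dir is $mode"
        return 0
    fi
    echo "bad: $dir is $mode, expected drwxrwxrwt"
    return 1
}

# Only print a hint; changing the mode needs root and is left to the admin.
check_tmp_mode /tmp || echo "as root, run: chmod 1777 /tmp"
```

With /tmp at 755, as it was here, only root can create files in it, which would explain why mail worked from root but failed for the amanda user.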
> Does NIS look to be up and running properly? Other system services?
> Does passwd and group look to be sensible? Do other parts of
> permissioning and user id look ok (eg. ls -l output?).
>
> }I am really trying to get AMANDA emails back. The reason I relate the
> }disk error to this is that I tried to reboot the server, and the first
> }try failed, saying:
> }
> }Missing Operating System
> }
> }On the second try, it halted half way, with the following message:
> }
> }Drive on AIC-7902 B at slot 00, 08:07:01, SCSI ID: 0 has exceeded failure 
> }prediction threshold.
>
> That's really bad. You need to replace that quick I reckon. So you have
> rebooted now, at least.
>
>   
Yeah, this is the top priority on my to-do list now. I didn't realise it was
this serious because of the "No Action Needed" message.
> }This is why I relate these two issues together, maybe I'm wrong though.
>
> Yeah, hard to pinpoint what is stuffed now though. It could just be your
> mail system or just the permissions on /tmp. I still tend to think that
> disk errors should result in a total failure, rather than something odd
> like /tmp changing permission, or strange corruption throughout the
> filesystem, although I did see just that on a Solaris 8 system recently.
> And if you were getting corruption, you might expect whole subsystems to
> fail, for example. In my case, I got core-dumping binaries, libraries
> you couldn't link against, etc.
>
> }df gives the following:
> }
> }Filesystem             1K-blocks      Used     Avail Capacity  Mounted on
> }/dev/mirror/gm0s1a        495726    107454    348614    24%    /
> }devfs                          1         1         0   100%    /dev
> }/dev/mirror/gm0s1d       2026030     78516   1785432     4%    /var
> }/dev/mirror/gm0s1e       2026030    997190    866758    53%    /tmp
> }/dev/mirror/gm0s1f      10154158   1415242   7926584    15%    /usr
> }/dev/stripe/stripe0s1a 800991544 576215642 216765988    73%    /export
>
> Particularly since you're using mirroring/striping, perhaps the real
> drive errors are being hidden under this software and it's incorrectly
> reporting or hiding the failures. I dunno - I haven't used that stuff
> before (although I use amanda on a 5.4 system regularly, as it happens).
> Perhaps someone with more of a clue will pipe up on this.
>
>     C
>
>   
All right Callum, like I said, you are a champ. I greatly appreciate your
time and help!

Thanks

Bestest Regards

Tong

