[BUGS] SCSI hardware failure?
Tong Wang
tong at hsa.com.au
Mon Jan 14 15:19:09 EST 2008
Callum Gibson wrote:
> On 14Jan08 14:43, Tong Wang wrote:
> }su amanda
> }mail -s test -F tong at hsa.com.au
> }
> }and here is the result:
> }
> }mail: /tmp/mail.RspIDEqoLfkF: Permission denied
> }
> }When running amverify which is provided by AMANDA, it always sends a
> }summary report email to my email address, and other than the normal output
> }it showed the similar Permission denied error at the end of the summary
> }which is displayed in stdout. However, when sending mails from root
> }account, it works. But I got permission denied error when su to other user
> }accounts from NIS.
>
> Hmmm, this doesn't sound good. It could just be the permissions on /tmp.
> I don't suppose you did any restores recently to /tmp? At the end of
> restore using amrecover it asks to set permissions on '.' which might
> do that. Is your /tmp still drwxrwxrwt ?
>
Doh!!! After all the googling, reading and hair-pulling, I forgot the
most basic thing: checking the permissions on /tmp! Yes, it's somehow set
to 755, though I'm sure I didn't set that myself. That was the problem,
and AMANDA is now working as before. You are a true champ, Callum!
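
For anyone hitting the same "Permission denied" from mail, here is a minimal
sketch of the check Callum suggested, assuming Python 3 is available on the
box. The helper names (permission_bits, tmp_mode_ok, check_tmp) are made up
for illustration and are not part of AMANDA. It only compares the mode of
/tmp against drwxrwxrwt (octal 1777) and does not change anything.

```python
import os
import stat

# drwxrwxrwt: readable/writable/searchable by everyone, plus the sticky bit.
EXPECTED_TMP_MODE = 0o1777


def permission_bits(st_mode: int) -> int:
    """Return only the permission bits (rwx plus sticky/setuid/setgid)."""
    return stat.S_IMODE(st_mode)


def tmp_mode_ok(st_mode: int) -> bool:
    """True if st_mode describes a directory with exactly mode 1777."""
    return stat.S_ISDIR(st_mode) and permission_bits(st_mode) == EXPECTED_TMP_MODE


def check_tmp(path: str = "/tmp") -> str:
    """Describe the mode found on path and whether it matches 1777."""
    st_mode = os.stat(path).st_mode
    found = stat.filemode(st_mode)  # same style as the first column of ls -l
    if tmp_mode_ok(st_mode):
        return f"{path}: {found} (ok)"
    return f"{path}: {found} (expected drwxrwxrwt; as root: chmod 1777 {path})"
```

Calling print(check_tmp()) reports the current mode. With /tmp at 755, only
root can create files there, which would explain why mail worked from root
but failed for amanda and the NIS accounts.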
> Does NIS look to be up and running properly? Other system services?
> Does passwd and group look to be sensible? Do other parts of
> permissioning and user id look ok (eg. ls -l output?).
>
> }I am really trying to get AMANDA emails back. As for the reason why I
> }relate the disk error with this, because I tried to reboot the server, with
> }the first try failed saying:
> }
> }Missing Operating System
> }
> }On the second try, it halted half way, with the following message:
> }
> }Drive on AIC-7902 B at slot 00, 08:07:01, SCSI ID: 0 has exceeded failure
> }prediction threshold.
>
> That's really bad. You need to replace that quick I reckon. So you have
> rebooted now, at least.
>
>
Yeah, this is the top priority on my to-do list now. I didn't realise it
was this serious because of the "No Action Needed" message.
> }This is why I relate these two issues together, maybe I'm wrong though.
>
> Yeah, hard to pinpoint what is stuffed now though. It could just be your
> mail system or just the permissions on /tmp. I still tend to think that
> disk errors should result in a total failure, rather than something odd
> like /tmp changing permission, or strange corruption throughout the
> filesystem, although I did see just that on a Solaris 8 system recently.
> And if you were getting corruption, you might expect whole subsystems to
> fail, for example. In my case, I got core-dumping binaries, libraries
> you couldn't link against, etc.
>
> }df gives the following:
> }
> }Filesystem 1K-blocks Used Avail Capacity Mounted on
> }/dev/mirror/gm0s1a 495726 107454 348614 24% /
> }devfs 1 1 0 100% /dev
> }/dev/mirror/gm0s1d 2026030 78516 1785432 4% /var
> }/dev/mirror/gm0s1e 2026030 997190 866758 53% /tmp
> }/dev/mirror/gm0s1f 10154158 1415242 7926584 15% /usr
> }/dev/stripe/stripe0s1a 800991544 576215642 216765988 73% /export
>
> Particularly since you're using mirroring/striping, perhaps the real
> drive errors are being hidden under this software and it's incorrectly
> reporting or hiding the failures. I dunno - I haven't used that stuff
> before (although I use amanda on a 5.4 system regularly, as it happens).
> Perhaps someone with more of a clue will pipe up on this.
>
> C
>
>
All right Callum, like I said, you are a champ. I greatly appreciate your
time and help!
Thanks
Bestest Regards
Tong