[BUGS] Worth pursuing or try NetBSD?

Andy Farkas chuzzwassa at gmail.com
Mon Dec 28 14:38:30 EST 2009


Hello arm-chair FreeBSD kernel hackers!

My main gateway box had been running FreeBSD 4.12 for quite a while until
I recently decided to upgrade it via the source upgrade route:

 1/ cvsup RELENG_5_5_0_RELEASE, make (GENERIC) world, install world, reboot.
 2/ cvsup RELENG_6_0_0_RELEASE, make (GENERIC) world, install world, reboot.
 3/ cvsup RELENG_6_1_0_RELEASE, make (GENERIC) world, install world, reboot.

6.1-RELEASE is where the problem started. Processes started hanging during
disk I/O. 6.0-R works flawlessly and is able to do buildworlds without fail.

The disk controller is a:

amr0 at pci0:6:0:	class=0x018000 card=0x00000000 chip=0x9010101e rev=0x03 hdr=0x00
    vendor   = 'American Megatrends Inc.'
    device   = 'MegaRAID 428 Ultra Fast Wide SCSI RAID Controller'
    class    = mass storage


After some research, I discovered there was a "mega update" merged into the
amr(4) driver between 6.0-R and 6.1-R. So this is what I am concentrating on.

Booting back into 6.0-R, I built the 6.1-R source again and included options
DDB, KDB, and BREAK_TO_DEBUGGER in the kernel config. I ran this kernel for
a bit until it started hanging. I then did a 'shutdown now'; some processes
were still hung, but I got the single user prompt. I typed 'reboot' and that
hung too.
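
In a kernel config file those three options look roughly like this. This is
a sketch of the relevant lines only; the rest of the config is assumed to be
a copy of GENERIC, and the comments are added here for illustration:

```
options 	KDB			# kernel debugger framework
options 	DDB			# interactive in-kernel debugger
options 	BREAK_TO_DEBUGGER	# a console break drops into the debugger
```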

Then I pressed CTRL-ALT-ESC and got the DDB prompt. I typed 'panic' and got
a crash dump. I then rebooted back to a working 6.0-R kernel.

So I have a crash dump and wish to track down what happened. Here is what ps
says about the dump:

<div>
hewey# ps axlHwwwM /var/crash/vmcore.3 -N /boot/kernel/kernel -O lockname
  UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME
COMMAND            PID LOCK    TT  STAT      TIME COMMAND
    0     0     0   4  96  0     0     0 -      WLs   ??    0:00.00
[swapper]            0 -       ??  WLs    0:00.00 [swapper]
    0     1     0   0   8  0   724     0 wait   DLs   ??    0:00.31
[init]               1 -       ??  DLs    0:00.31 [init]
    0     2     0   0  -8  0     0     0 -      DL    ??    0:56.69
[g_event]            2 -       ??  DL     0:56.69 [g_event]
    0     3     0   0  -8  0     0     0 -      DL    ??    0:40.25
[g_up]               3 -       ??  DL     0:40.25 [g_up]
    0     4     0   0  -8  0     0     0 -      DL    ??    0:53.29
[g_down]             4 -       ??  DL     0:53.29 [g_down]
    0     5     0   0   8  0     0     0 -      DL    ??    0:00.00
[thread taskq]       5 -       ??  DL     0:00.00 [thread taskq]
    0     6     0   0   8  0     0     0 -      DL    ??    0:00.00
[kqueue taskq]       6 -       ??  DL     0:00.00 [kqueue taskq]
    0     7     0   0  -8  0     0     0 -      DL    ??    0:05.39
[fdc0]               7 -       ??  DL     0:05.39 [fdc0]
    0     8     0   0 -16  0     0     0 psleep DL    ??    0:01.68
[pagedaemon]         8 -       ??  DL     0:01.68 [pagedaemon]
    0     9     0   4  20  0     0     0 psleep DL    ??    0:00.00
[vmdaemon]           9 -       ??  DL     0:00.00 [vmdaemon]
    0    10     0   0 -16  0     0     0 ktrace DL    ??    0:00.00
[ktrace]            10 -       ??  DL     0:00.00 [ktrace]
    0    11     0  49 171  0     0     0 -      RL    ??  5045:25.09
[idle]              11 -       ??  RL   5045:25.09 [idle]
    0    12     0   0 -44  0     0     0 -      WL    ??    1:58.20
[swi1: net]         12 -       ??  WL     1:58.20 [swi1: net]
    0    13     0   0 -32  0     0     0 -      WL    ??   13:22.54
[swi4: clock sio    13 -       ??  WL    13:22.54 [swi4: clock sio]
    0    14     0   0 -36  0     0     0 -      WL    ??    0:00.00
[swi3: vm]          14 -       ??  WL     0:00.00 [swi3: vm]
    0    15     0   0 -16  0     0     0 -      DL    ??    1:05.02
[yarrow]            15 -       ??  DL     1:05.02 [yarrow]
    0    16     0   0 -40  0     0     0 -      WL    ??    0:00.00
[swi2: cambio]      16 -       ??  WL     0:00.00 [swi2: cambio]
    0    17     0   0 -28  0     0     0 -      WL    ??    0:00.00
[swi5: +]           17 -       ??  WL     0:00.00 [swi5: +]
    0    18     0   0 -24  0     0     0 -      WL    ??    0:00.00
[swi6: +]           18 -       ??  WL     0:00.00 [swi6: +]
    0    19     0   0 -24  0     0     0 -      WL    ??    0:00.00
[swi6: task queu    19 -       ??  WL     0:00.00 [swi6: task queue]
    0    20     0   0 -64  0     0     0 -      WL    ??    0:00.00
[irq14: ata0]       20 -       ??  WL     0:00.00 [irq14: ata0]
    0    21     0   0 -64  0     0     0 -      WL    ??    0:00.00
[irq15: ata1]       21 -       ??  WL     0:00.00 [irq15: ata1]
    0    22     0   0 -64  0     0     0 -      WL    ??    0:06.59
[irq11: amr0]       22 -       ??  WL     0:06.59 [irq11: amr0]
    0    23     0   0 -68  0     0     0 -      WL    ??    0:42.83
[irq10: fxp0]       23 -       ??  WL     0:42.83 [irq10: fxp0]
    0    24     0   0 -60  0     0     0 -      RL    ??    0:00.10
[irq1: atkbd0]      24 -       ??  RL     0:00.10 [irq1: atkbd0]
    0    25     0   0 -60  0     0     0 -      WL    ??    0:00.00
[irq7: ppc0]        25 -       ??  WL     0:00.00 [irq7: ppc0]
    0    26     0   0 -48  0     0     0 -      WL    ??    0:12.15
[swi0: sio]         26 -       ??  WL     0:12.15 [swi0: sio]
    0    27     0   0 171  0     0     0 pgzero DL    ??    0:23.90
[pagezero]          27 -       ??  DL     0:23.90 [pagezero]
    0    28     0   0 -16  0     0     0 psleep DL    ??    0:07.56
[bufdaemon]         28 -       ??  DL     0:07.56 [bufdaemon]
    0    29     0   0  -4  0     0     0 getblk DL    ??    0:59.97
[syncer]            29 -       ??  DL     0:59.97 [syncer]
    0    30     0   0  -7  0     0     0 bo_wwa DL    ??    0:03.41
[vnlru]             30 -       ??  DL     0:03.41 [vnlru]
    0    31     0   0 -16  0     0     0 sdflus DL    ??    0:17.20
[softdepflush]      31 -       ??  DL     0:17.20 [softdepflush]
    0    32     0   0  96  0     0     0 -      DL    ??    1:37.15
[schedcpu]          32 -       ??  DL     1:37.15 [schedcpu]
  100   572     1   0  -4  0 14756     0 ufs    D     ??    6:23.72
[squid]            572 -       ??  D      6:23.72 [squid]
  100   604   572   0 -84  0     0     0 -      ZW    ??    0:00.00
<defunct>          604 -       ??  ZW     0:00.00 <defunct>
    0 11202     1   0  -4  0  1304     0 ufs    D     ??    0:34.76
[find]           11202 -       ??  D      0:34.76 [find]
    0 12415     1   0   8  0  1632     0 wait   Ds    ??    0:00.03
[sh]             12415 -       ??  Ds     0:00.03 [sh]
    0 12416 12415   0  -8  0  1232     0 biord  D+    ??    0:00.04
[reboot]         12416 -       ??  D+     0:00.04 [reboot]
hewey#
</div>

As you can see, PIDs 29, 30, and 31 (wait channels getblk, bo_wwa, and
sdflus) are stuck because of disk I/O.
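
A quick way to pull such entries out of a listing like the one above is to
filter on the STAT and MWCHAN columns. The following is only a sketch: the
function name stuck_on_io and the default list of wait channels are
assumptions made for illustration, and sleeping on one of these channels
does not by itself prove that a process is hung.

```shell
# Read 'ps axl' style output on stdin and print "PID MWCHAN" for every
# process in disk wait (STAT begins with "D") whose wait channel is in
# the given '|'-separated list.
# Assumed column order, as in the listing above:
#   UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME
stuck_on_io() {
    # Hypothetical default list: channels seen on D-state entries above.
    chans="${1:-ufs|biord|getblk|bo_wwa|sdflus}"
    awk -v chans="$chans" '
        # A numeric second field skips the header and wrapped lines.
        $2 ~ /^[0-9]+$/ && $10 ~ /^D/ && $9 ~ ("^(" chans ")$") {
            print $2, $9
        }
    '
}
```

It can be fed directly from the same command, for example
`ps axlHwwwM /var/crash/vmcore.3 -N /boot/kernel/kernel | stuck_on_io`, or
given a different channel list as its first argument.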

But this is where I am stuck. I do not know kgdb well enough to debug
the crash dump I have. The only thing I have learnt to do is 'info
threads' and then 'thread 22'. Then, bang, brick wall.

Is it worth pursuing this and debugging the driver for such an old piece of
hardware in an old Pentium-Pro 200MHz? Or would it be worth trying NetBSD
instead?

Perhaps I should post this to the -fs or -hackers mlist.... is there a
-scsi list?  .../me checks...

-andyf

