sparc64 hang with zfs v28

sparc64 hang with zfs v28

Roger Hammerstein


I saw the announcement for ZFS v28, so I updated an Ultra 60
to the top of the tree. Running 'zfs list', 'zpool status', or 'kldload zfs'
now hangs the machine.

I can break to the debugger via the serial console after I enabled
the alternate break sequence.

Has anyone else tried the latest ZFS on sparc64 machines?


falcon# uname -a
FreeBSD falcon 9.0-CURRENT FreeBSD 9.0-CURRENT #2: Wed Mar  2 11:16:56 EST 2011     root@falcon:/usr/obj/usr/src/sys/GENERIC  sparc64
falcon# kldload zfs
[HANG]


vmstat running in another window doesn't print anything helpful before it hangs:
 0 0 0    457M  1920M    11   0   0   0     0   0   0   0 2279  219  413  0  1 99
 0 0 0    457M  1920M     0   0   0   0     0   0   0   0 2271  117  372  0  0 100
 0 0 0    457M  1920M    11   0   0   0     0   0   0   0 2274  173  393  0  1 99
 1 0 0    460M  1913M    84   0   1   0    13   0   0   0 2331  241  855  0  5 95
[hang]




falcon# KDB: enter: Break sequence on console
[ thread pid 1013 tid 100058 ]
Stopped at      kdb_enter+0x80: ta              %xcc, 1
db>


db> ps
  pid  ppid  pgrp   uid   state   wmesg         wchan        cmd
 1013  1006  1013     0  R+      CPU 1                       kldload
 1006  1003  1006     0  Ss+     pause    0xfffff80001886dd8 csh
 1003   875  1003     0  Ss      select   0xfffff8000142cc40 sshd
 1002   998  1002     0  S+      nanslp   0xc0ac8c28 vmstat
  998   995   998     0  Ss+     pause    0xfffff80001887240 csh
  995   875   995     0  Ss      select   0xfffff8000142ca40 sshd
  994   987   994     0  S+      select   0xfffff800014c92c0 top
  987   984   987     0  Ss+     pause    0xfffff80001889240 csh
  984   875   984     0  Ss      select   0xfffff8000142c740 sshd
  980   975   980     0  S+      ttyin    0xfffff800011394a8 csh
  976     1     1     0  S       ttydcd   0xfffff800011390e8 getty
  975     1   975     0  Ss+     wait     0xfffff8000179a000 login
  974     1   974     0  Ss+     ttyin    0xfffff8000113b8a8 getty
  973     1   973     0  Ss+     ttyin    0xfffff8000113bca8 getty
  972     1   972     0  Ss+     ttyin    0xfffff800013d00a8 getty
  971     1   971     0  Ss+     ttyin    0xfffff800013d04a8 getty
  970     1   970     0  Ss+     ttyin    0xfffff800013d08a8 getty
  969     1   969     0  Ss+     ttyin    0xfffff800011380a8 getty
  968     1   968     0  Ss+     ttyin    0xfffff800011384a8 getty
  967     1   967     0  Ss+     ttyin    0xfffff800011388a8 getty
  894     1   894     0  Ss      nanslp   0xc0ac8c28 cron
  887     1   887    25  Ss      pause    0xfffff80001651240 sendmail
  883     1   883     0  Ss      select   0xfffff800017a27c0 sendmail
  875     1   875     0  Ss      select   0xfffff8000142bb40 sshd
  795     1   795     0  Ss      select   0xfffff8000142b540 ntpd
  591     1   591     0  Ss      select   0xfffff8000142b240 syslogd
  414     1   414     0  Ss      select   0xfffff8000142ae40 devd
  109     1   109     0  Ss      pause    0xfffff80001568dd8 adjkerntz
   18     0     0     0  DL      -        0xc0ac7978 [schedcpu]
   17     0     0     0  DL      sdflush  0xc0c93350 [softdepflush]
   16     0     0     0  DL      vlruwt   0xfffff800010ca8d0 [vnlru]
   15     0     0     0  DL      syncer   0xc0c84978 [syncer]
   14     0     0     0  DL      psleep   0xc0c844a8 [bufdaemon]
    9     0     0     0  DL      pgzero   0xc0c96474 [pagezero]
    8     0     0     0  DL      psleep   0xc0c952b0 [vmdaemon]
    7     0     0     0  DL      psleep   0xc0c952ec [pagedaemon]
    6     0     0     0  DL      ccb_scan 0xc0aa7fb8 [xpt_thrd]
    5     0     0     0  DL      waiting_ 0xc0c874e0 [sctp_iterator]
   13     0     0     0  DL      -        0xc0ac7978 [yarrow]
    4     0     0     0  DL      -        0xc0ac3d60 [g_down]
    3     0     0     0  DL      -        0xc0ac3d58 [g_up]
    2     0     0     0  DL      -        0xc0ac3d48 [g_event]
   12     0     0     0  RL      (threaded)                  [intr]
100026                   I                                   [vec2022: sym1]
100025                   I                                   [vec2016: sym0]
100024                   RunQ                                [vec2017: hme0]
100023                   RunQ                                [swi0: uart uart+]
100022                   I                                   [vec2024: pcib0]
100021                   I                                   [vec2021: pcib0]
100020                   I                                   [swi6: task queue]
100019                   I                                   [swi6: Giant taskq]
100016                   I                                   [swi5: +]
100015                   I                                   [swi2: cambio]
100008                   I                                   [swi3: vm]
100007                   I                                   [swi1: netisr 0]
100006                   RunQ                                [swi4: clock]
100005                   Run     CPU 0                       [swi4: clock]
   11     0     0     0  RL      (threaded)                  [idle]
100004                   CanRun                              [idle: cpu0]
100003                   CanRun                              [idle: cpu1]
    1     0     1     0  SLs     wait     0xfffff800010c9a70 [init]
   10     0     0     0  DL      audit_wo 0xc0c927d8 [audit]
    0     0     0     0  DLs     (threaded)                  [kernel]
100027                   D       -        0xc0ac7978 [deadlkres]
100018                   D       -        0xfffff8000108d400 [thread taskq]
100017                   D       -        0xfffff8000108d480 [ffs_trim taskq]
100014                   D       -        0xfffff8000108d580 [kqueue taskq]
100012                   D       -        0xfffff8000108d600 [firmware taskq]
100000                   D       sched    0xc0ac3f18 [swapper]
db>



db> trace
Tracing pid 1013 tid 100058 td 0xfffff80001428cc0
uart_intr_rxready() at uart_intr_rxready+0xbc
scc_bfe_intr() at scc_bfe_intr+0xbc
intr_event_handle() at intr_event_handle+0x64
intr_execute_handlers() at intr_execute_handlers+0x8
intr_fast() at intr_fast+0x68
-- interrupt level=0xc pil=0 %o7=0xc0477e68 --
fixup_filename() at fixup_filename+0x4
witness_checkorder() at witness_checkorder+0x98
_mtx_lock_flags() at _mtx_lock_flags+0x110
_vm_map_lock_read() at _vm_map_lock_read+0x1c
vm_map_lookup() at vm_map_lookup+0x4c
vm_fault_hold() at vm_fault_hold+0x94
vm_fault() at vm_fault+0x14
trap_pfault() at trap_pfault+0x338
trap() at trap+0x3a8
-- fast data access mmu miss tar=0xc18d4000 %o7=0xc03fa894 --
opensolaris_utsname_init() at opensolaris_utsname_init+0x8c
linker_load_dependencies() at linker_load_dependencies+0x260
link_elf_load_file() at link_elf_load_file+0x5ac
linker_load_module() at linker_load_module+0xa30
kern_kldload() at kern_kldload+0xb8
kldload() at kldload+0x60
syscallenter() at syscallenter+0x270
syscall() at syscall+0x74
-- syscall (304, FreeBSD ELF64, kldload) %o7=0x100cbc --
userland() at 0x40475108
user trace: trap %o7=0x100cbc
pc 0x40475108, sp 0x7fdffffdc51
pc 0x100a90, sp 0x7fdffffe1d1
pc 0x40206fb4, sp 0x7fdffffe291
done
db>


Has anyone else tried it yet?
(How can I show what pid 1013 is doing?)
     _______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-sparc64
To unsubscribe, send any mail to "[hidden email]"

Re: sparc64 hang with zfs v28

Marius Strobl
On Wed, Mar 02, 2011 at 11:46:38AM -0500, Roger Hammerstein wrote:
>
>
> I saw the announcement for zfs v28 so I updated an ultra60
> to the top of the tree.   a 'zfs list' or 'zpool status' or 'kldload zfs'
> will hang my machine.
>

It looks like the binutils 2.17 import broke kernel modules for reasons
unknown so far, resulting in said hang.

Marius


Re: sparc64 hang with zfs v28

Marius Strobl
On Wed, Mar 02, 2011 at 09:03:10PM +0100, Marius Strobl wrote:

> On Wed, Mar 02, 2011 at 11:46:38AM -0500, Roger Hammerstein wrote:
> >
> >
> > I saw the announcement for zfs v28 so I updated an ultra60
> > to the top of the tree.   a 'zfs list' or 'zpool status' or 'kldload zfs'
> > will hang my machine.
> >
>
> It looks like the binutils 2.17 import broke kernel modules for reasons
> unknown so far, resulting in said hang.
>

FYI, kernel modules should generally work again with r219340; I haven't
tested ZFS, though.

Marius


RE: sparc64 hang with zfs v28

Roger Hammerstein


> FYI, kernel modules generally should work again with r219340, I haven't
> tested ZFS though.


Thanks!
I cvsupped and rebuilt the kernel.




falcon# uname -a
FreeBSD falcon 9.0-CURRENT FreeBSD 9.0-CURRENT #3: Sun Mar  6 18:55:14 EST 2011     root@falcon:/usr/obj/usr/src/sys/GENERIC  sparc64
falcon#

I did a kldload zfs and it loaded ok.

falcon# kldstat
Id Refs Address            Size     Name
 1    9 0xc0000000 e42878   kernel
 2    1 0xc14a2000 32e000   zfs.ko
 3    1 0xc17d0000 104000   opensolaris.ko
falcon#
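
The Address and Size columns above give each module's load range. A rough Python sketch of that arithmetic, assuming the Size column is hexadecimal (as 'e42878' suggests); parse_kldstat and module_for are made-up helper names, not part of kldstat:

```python
def parse_kldstat(text):
    """Parse kldstat output into (name, start, end) tuples; end is exclusive."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows are "Id Refs Address Size Name"; the header row is skipped
        # because its first field is not a number.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        start = int(fields[2], 16)
        size = int(fields[3], 16)  # assumption: size is hex without a "0x" prefix
        modules.append((fields[4], start, start + size))
    return modules


def module_for(addr, modules):
    """Return the name of the module whose range contains addr, or None."""
    for name, start, end in modules:
        if start <= addr < end:
            return name
    return None


# The listing from this post, copied verbatim.
KLDSTAT = """\
Id Refs Address            Size     Name
 1    9 0xc0000000 e42878   kernel
 2    1 0xc14a2000 32e000   zfs.ko
 3    1 0xc17d0000 104000   opensolaris.ko
"""

mods = parse_kldstat(KLDSTAT)
for name, start, end in mods:
    print(f"{name:16s} {start:#x} - {end:#x}")
```

With the listing above, zfs.ko ends where opensolaris.ko starts (0xc17d0000), and opensolaris.ko ends at 0xc18d4000. That happens to equal the address in the "fast data access mmu miss tar=0xc18d4000" line of the earlier trace, but that trace came from a different kernel build (#2), so the load addresses there may not have been the same.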


But a 'zpool status' or 'zfs list' will cause the zfs or zpool process
to eat 99% of a CPU and essentially hang the shell I ran zfs/zpool in.



falcon# zfs list
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
            to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
ZFS filesystem version 5
ZFS storage pool version 28
[Hang here]





last pid:  1012;  load averages:  0.79,  0.30,  0.16                          up 0+00:13:58  20:58:43
23 processes:  2 running, 21 sleeping
CPU:  0.0% user,  0.0% nice, 52.5% system,  0.0% interrupt, 47.5% idle
Mem: 16M Active, 11M Inact, 46M Wired, 64K Cache, 12M Buf, 1915M Free
Swap: 4055M Total, 4055M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 1006 root        1  53    0 21672K  2904K CPU1    1   0:05 99.47% zfs
  998 root        1  40    0 41776K  6376K select  0   0:01  0.00% sshd
  994 root        1  16    0 11880K  3536K pause   0   0:01  0.00% csh
  795 root        1  40    0 16720K  3968K select  0   0:00  0.00% ntpd
 1001 root        1  16    0 11880K  3464K pause   0   0:00  0.00% csh
  975 root        1   8    0 25168K  2672K wait    1   0:00  0.00% login







stays at 99%.

truss -p 1006 doesn't "attach", it just hangs.



ctrl-t on the zfs list shell:

load: 0.95  cmd: zfs 1006 [running] 182.26r 0.00u 4.66s 99% 2872k
load: 0.95  cmd: zfs 1006 [running] 183.30r 0.00u 4.66s 99% 2872k
load: 0.95  cmd: zfs 1006 [running] 183.76r 0.00u 4.66s 99% 2872k
load: 0.95  cmd: zfs 1006 [running] 184.08r 0.00u 4.66s 99% 2872k
load: 0.95  cmd: zfs 1006 [running] 184.36r 0.00u 4.66s 99% 2872k




A second time, with zpool status:
last pid:  1224;  load averages:  0.98,  0.55,  0.24                                     up 0+02:07:39  23:12:33
26 processes:  2 running, 24 sleeping
CPU:  0.0% user,  0.0% nice, 50.2% system,  0.4% interrupt, 49.4% idle
Mem: 18M Active, 13M Inact, 46M Wired, 64K Cache, 12M Buf, 1911M Free
Swap: 4055M Total, 4055M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND                                      
 1200 root        1  62    0 22704K  2920K CPU1    1   0:00 99.02% zpool
  793 root        1  40    0 16720K  3968K select  0   0:02  0.00% ntpd
 1180 root        1  16    0 11880K  3536K pause   1   0:01  0.00% csh
 1184 root        1  40    0 41776K  6376K select  0   0:01  0.00% sshd
 1201 root        1  40    0 41776K  6376K select  0   0:01  0.00% sshd

falcon# truss -p 1200
truss: can not attach to target process: Device busy
falcon# truss -p 1200
truss: can not attach to target process: Device busy
falcon#


ctrl-t on the zpool status command:
load: 0.62  cmd: zpool 1200 [running] 54.30r 0.00u 0.07s 83% 2888k
load: 0.99  cmd: zpool 1200 [running] 271.73r 0.00u 0.07s 99% 2888k
load: 0.99  cmd: zpool 1200 [running] 272.37r 0.00u 0.07s 99% 2888k
load: 0.99  cmd: zpool 1200 [running] 272.75r 0.00u 0.07s 99% 2888k
load: 0.99  cmd: zpool 1200 [running] 273.38r 0.00u 0.07s 99% 2888k
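
The ctrl-t lines above are easier to compare once parsed. A minimal Python sketch, assuming the status-line layout shown above (real, user and system seconds, then a CPU percentage and a trailing size in k); the regular expression and the helper names parse_siginfo and cpu_time_delta are illustrative and not part of any FreeBSD tool:

```python
import re

# Matches the status line format seen above, e.g.
# "load: 0.99  cmd: zpool 1200 [running] 271.73r 0.00u 0.07s 99% 2888k"
SIGINFO_RE = re.compile(
    r"load:\s+(?P<load>[\d.]+)\s+cmd:\s+(?P<cmd>\S+)\s+(?P<pid>\d+)\s+"
    r"\[(?P<state>[^\]]+)\]\s+(?P<real>[\d.]+)r\s+(?P<user>[\d.]+)u\s+"
    r"(?P<sys>[\d.]+)s\s+(?P<cpu>\d+)%\s+(?P<size>\d+)k"
)


def parse_siginfo(line):
    """Parse one ctrl-t status line into a dict, or return None if it does not match."""
    m = SIGINFO_RE.search(line)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "cmd": d["cmd"],
        "pid": int(d["pid"]),
        "state": d["state"],
        "real": float(d["real"]),
        "user": float(d["user"]),
        "sys": float(d["sys"]),
        "cpu_pct": int(d["cpu"]),
        "size_k": int(d["size"]),  # trailing "NNNNk" field, kept as printed
    }


def cpu_time_delta(first, last):
    """Return (real seconds, user+sys seconds) elapsed between two samples."""
    real = last["real"] - first["real"]
    cpu = (last["user"] + last["sys"]) - (first["user"] + first["sys"])
    return real, cpu


# First and last zpool samples from the output above.
samples = [
    "load: 0.62  cmd: zpool 1200 [running] 54.30r 0.00u 0.07s 83% 2888k",
    "load: 0.99  cmd: zpool 1200 [running] 273.38r 0.00u 0.07s 99% 2888k",
]
first, last = (parse_siginfo(s) for s in samples)
real, cpu = cpu_time_delta(first, last)
print(f"real +{real:.2f}s, accounted user+sys +{cpu:.2f}s")
```

For the two zpool samples above, real time advances by about 219 seconds while the user and system counters stay at 0.00u and 0.07s, even though the line reports 99% CPU.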





truss -f zpool status:

 1014: sigprocmask(SIG_SETMASK,0x0,0x0)          = 0 (0x0)
 1014: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
 1014: sigprocmask(SIG_SETMASK,0x0,0x0)          = 0 (0x0)
 1014: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
 1014: sigprocmask(SIG_SETMASK,0x0,0x0)          = 0 (0x0)
 1014: modfind(0x40d3f140,0x9a0,0xc78,0x10a,0x1027e8,0x7fdffffe8d0) = 303 (0x12f)
 1014: open("/dev/zfs",O_RDWR,06170)             = 3 (0x3)
 1014: open("/dev/zero",O_RDONLY,0666)           = 4 (0x4)
 1014: open("/etc/zfs/exports",O_RDONLY,0666)    ERR#2 'No such file or directory'
 1014: __sysctl(0x7fdffff8de8,0x2,0x7fdffff8eb0,0x7fdffff8f18,0x40d3f118,0x13) = 0 (0x0)
 1014: __sysctl(0x7fdffff8eb0,0x4,0x40e4d084,0x7fdffff8fe0,0x0,0x0) = 0 (0x0)
[hang]
ctrl-t

load: 0.31  cmd: zpool 1014 [running] 12.47r 0.00u 0.07s 44% 2912k


 1014 root        1  54    0 22704K  2944K CPU0    0   0:00 98.47% zpool


falcon# truss -p 1014
truss: can not attach to target process: Device busy

iostat -x 1 shows no reads and no writes to any disks


There's a 2-disk ZFS mirror attached to this Ultra 60 from a FreeBSD 8 install, but I don't know
why that would cause a problem with the latest ZFS v28.

I can successfully read the labels on those two mirror disks with zdb -l /dev/da[36].



Re: sparc64 hang with zfs v28

Marius Strobl
On Sun, Mar 06, 2011 at 11:27:42PM -0500, Roger Hammerstein wrote:

> There's a 2-disk zfs mirror attached to this ultra60 from a freebsd-8 install, but I don't know
> why that would cause a problem with the latest zfs v28.
>

Me neither :) You'll probably get better help from the ZFS maintainers
than on this list.

Marius


Re: sparc64 hang with zfs v28

Marius Strobl
On Mon, Mar 07, 2011 at 09:06:26AM +0100, Marius Strobl wrote:

> On Sun, Mar 06, 2011 at 11:27:42PM -0500, Roger Hammerstein wrote:
> > There's a 2-disk zfs mirror attached to this ultra60 from a freebsd-8 install, but I don't know
> > why that would cause a problem with the latest zfs v28.
> >
>
> Me neither :) You'll probably get better help from the ZFS maintainers
> than on this list.
>

Thinking about it, this might be caused by the binutils regression
also affecting userland. If a world built with the following patch
in place still behaves the same, though, you should contact the ZFS
maintainers:
http://people.freebsd.org/~marius/elfxx-sparc.c.diff

Marius


RE: sparc64 hang with zfs v28

Roger Hammerstein


> Thinking about it this might be caused by the binutils regression
> also affecting userland. If a world built with the following patch
> in place still behaves the same you should better contact the ZFS
> maintainers though:
> http://people.freebsd.org/~marius/elfxx-sparc.c.diff

I kept the same cvsup from Sunday, added that patch and rebuilt world and kernel.

Now 'kldload zfs' or 'zpool status' locks the machine up: no serial console,
no network. It doesn't respond to breaks on the serial console
(I didn't put the alternate break sequence in).


With the kernel.old from before your patch, 'kldload zfs' will work.

falcon# kldload zfs

ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
            to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
ZFS filesystem version 5
ZFS storage pool version 28
falcon#
falcon#
falcon#
falcon#
falcon# kldstat
Id Refs Address            Size     Name
 1    9 0xc0000000 e42878   kernel
 2    1 0xc14a2000 32e000   zfs.ko
 3    1 0xc17d0000 104000   opensolaris.ko
falcon#

'zpool status' will still eat an entire cpu.

 1020 root        1  49    0 22712K  2888K CPU1    1   0:00 99.92% zpool

falcon# procstat -kk 1020
  PID    TID COMM             TDNAME           KSTACK
 1020 100063 zpool            initial thread   <running>
falcon#


falcon# zpool status
load: 0.76  cmd: zpool 1020 [running] 82.09r 0.00u 0.04s 97% 2856k
load: 0.76  cmd: zpool 1020 [running] 82.33r 0.00u 0.04s 97% 2856k
load: 0.76  cmd: zpool 1020 [running] 82.55r 0.00u 0.04s 97% 2856k
load: 0.76  cmd: zpool 1020 [running] 82.75r 0.00u 0.04s 97% 2856k
load: 0.76  cmd: zpool 1020 [running] 82.93r 0.00u 0.04s 97% 2856k




=================
A truss of 'zpool status' without the zfs module loaded gets stuck after
this (zpool status will cause the zfs module to get loaded):

falcon# truss zpool status
mmap(0x0,32768,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1076183040 (0x40254000)
issetugid(0x40256000,0x0,0x0,0x0,0x1000000000000000,0x2000000000000000) = 0 (0x0)
open("/etc/libmap.conf",O_RDONLY,0666)           ERR#2 'No such file or directory'
open("/var/run/ld-elf.so.hints",O_RDONLY,030036223340) = 3 (0x3)
read(3,"tnhE\0\0\0\^A\0\0\0\M^@\0\0\0-\0"...,128) = 128 (0x80)
mmap(0x0,40960,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1076215808 (0x4025c000)
lseek(3,0x80,SEEK_SET)                           = 128 (0x80)
read(3,"/lib:/usr/lib:/usr/lib/compat:/u"...,45) = 45 (0x2d)
close(3)                                         = 0 (0x0)
access("/lib/libavl.so.2",0)                     = 0 (0x0)
open("/lib/libavl.so.2",O_RDONLY,0666)           = 3 (0x3)
fstat(3,{ mode=-r--r--r-- ,inode=9467939,size=8416,blksize=16384 }) = 0 (0x0)
pread(0x3,0x40358a78,0x2000,0x0,0x0,0x40367220)  = 8192 (0x2000)
mmap(0x0,1056768,PROT_NONE,MAP_PRIVATE|MAP_ANON|MAP_NOCORE,-1,0x0) = 1077313536 (0x40368000)
mmap(0x40368000,8192,PROT_READ|PROT_EXEC,MAP_PRIVATE|MAP_FIXED|MAP_NOCORE,3,0x0) = 1077313536 (0x40368000)
mmap(0x40468000,8192,PROT_READ|PROT_WRITE|PROT_EXEC,MAP_PRIVATE|MAP_FIXED,3,0x0) = 1078362112 (0x40468000)
close(3)          
<SNIP more libs>

access("/lib/libthr.so.3",0)                     = 0 (0x0)
open("/lib/libthr.so.3",O_RDONLY,020)            = 3 (0x3)
fstat(3,{ mode=-r--r--r-- ,inode=9467937,size=116936,blksize=16384 }) = 0 (0x0)
pread(0x3,0x40358a78,0x2000,0x0,0x0,0x40367188)  = 8192 (0x2000)
mmap(0x0,1204224,PROT_NONE,MAP_PRIVATE|MAP_ANON|MAP_NOCORE,-1,0x0) = 1092362240 (0x411c2000)
mmap(0x411c2000,106496,PROT_READ|PROT_EXEC,MAP_PRIVATE|MAP_FIXED|MAP_NOCORE,3,0x0) = 1092362240 (0x411c2000)
mmap(0x412da000,16384,PROT_READ|PROT_WRITE|PROT_EXEC,MAP_PRIVATE|MAP_FIXED,3,0x18000) = 1093509120 (0x412da000)
mprotect(0x412de000,40960,PROT_READ|PROT_WRITE|PROT_EXEC) = 0 (0x0)
close(3)                                         = 0 (0x0)
mmap(0x0,40960,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1076256768 (0x40266000)
mmap(0x0,106496,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1076297728 (0x40270000)
sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
sysarch(0x1,0x41082fb8,0xc0792e40,0x40f75653,0x412dbde8,0x800005) = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
getpid()                                         = 1002 (0x3ea)
__sysctl(0x7fdffffe0b8,0x2,0x412e7f38,0x7fdffffe0c0,0x0,0x0) = 0 (0x0)
__sysctl(0x7fdffffdeb8,0x2,0x7fdffffdf80,0x7fdffffdfe8,0x411d96d8,0xd) = 0 (0x0)
__sysctl(0x7fdffffdf80,0x3,0x412e6e48,0x7fdffffe0c0,0x0,0x0) = 0 (0x0)
readlink("/etc/malloc.conf",0x7fdffffda06,1024)  ERR#2 'No such file or directory'
issetugid(0xffffffffffffffff,0xffffffffffffffff,0x400,0x40e965f6,0x41092968,0x400000) = 0 (0x0)
break(0x2176d8)                                  = 0 (0x0)
break(0x2176d8)                                  = 0 (0x0)
break(0x400000)                                  = 0 (0x0)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1093566464 (0x412e8000)
mmap(0x416e8000,1146880,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1097760768 (0x416e8000)
munmap(0x412e8000,1146880)                       = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
thr_self(0x41404400,0x412e6e48,0x2000,0x412dd4a8,0x412e7f44,0x412dd450) = 0 (0x0)
mmap(0x7fdffbfe000,8192,PROT_NONE,MAP_ANON,-1,0x0) = 8787498885120 (0x7fdffbfe000)
thr_set_name(0x186dd,0x411d9768,0x0,0x1000,0xffffffffffffffff,0x0) = 0 (0x0)
rtprio_thread(0x0,0x186dd,0x7fdffffdfec,0x22,0xffffffffffffffff,0x0) = 0 (0x0)
sysarch(0x2,0x410853b0,0x600,0x80,0x0,0x0)       = 0 (0x0)
sigaction(32,{ 0x411ce740 SA_SIGINFO ss_t },0x0) = 0 (0x0)
sigprocmask(SIG_UNBLOCK,0x0,0x0)                 = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|



And it stops right there after SIGINT.

     _______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-sparc64
To unsubscribe, send any mail to "[hidden email]"

Re: sparc64 hang with zfs v28

Marius Strobl
On Wed, Mar 09, 2011 at 10:03:24AM -0500, Roger Hammerstein wrote:

>
>
> > Thinking about it this might be caused by the binutils regression
> > also affecting userland. If a world built with the following patch
> > in place still behaves the same you should better contact the ZFS
> > maintainers though:
> > http://people.freebsd.org/~marius/elfxx-sparc.c.diff
>
> I kept the same cvsup from Sunday, added that patch and rebuilt world and kernel.
>
> Now 'kldload zfs' or 'zpool status' locks the machine up, no serial console,
> no network.  It doesn't respond to breaks on the serial console.
>  (I didn't put the alternate break sequence in).
>

Sorry, when porting that fix from another version of binutils to what
is in head I introduced a bug. I've corrected the patch at the above
URL accordingly. With that version applied to a r219086 checkout (i.e.
right before the ZFS v28 import), both pre-loading modules and
loading them after boot as well as ZFS now work again:
v215# kldload geom_mirror.ko
GEOM_MIRROR: Device mirror/gm0 launched (1/1).
v215# kldload zfs.ko
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
            to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.

ZFS filesystem version 4
ZFS storage pool version 15
v215#
v215# mdconfig -a -t malloc -s 100m
md0
v215# mdconfig -a -t malloc -s 100m
md1
v215# zpool create tank mirror /dev/md0 /dev/md1
v215# zpool list
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
tank  95.5M  85.5K  95.4M     0%  ONLINE  -
v215#
v215# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            md0     ONLINE       0     0     0
            md1     ONLINE       0     0     0

errors: No known data errors

Dimitry, are you ok with that version being committed to head?

As for the remaining ZFS hangs you're seeing I can reproduce these
with a r219092 checkout (i.e. right after the ZFS v28 import plus
fixes) and the above patch applied, so this problem appears to be
orthogonal:
v215# kldload zfs.ko
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
            to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
ZFS filesystem version 5
ZFS storage pool version 28
v215# mdconfig -a -t malloc -s 100m
md0
v215# mdconfig -a -t malloc -s 100m
md1
v215# zpool create tank mirror /dev/md0 /dev/md1
^Z
<hangs here>

The end of the corresponding ktrace dump is:
<...>
v215# kdump | tail
  1683 initial thread RET   madvise 0
  1683 initial thread CALL  close(0x5)
  1683 initial thread RET   close 0
  1683 initial thread CALL  __sysctl(0x7fdffff8778,0x2,0x7fdffff8840,0x7fdffff88a8,0x40d41168,0x13)
  1683 initial thread SCTL  "sysctl.name2oid"
  1683 initial thread RET   __sysctl 0
  1683 initial thread CALL  __sysctl(0x7fdffff8840,0x4,0x40e4f158,0x7fdffff8978,0,0)
  1683 initial thread SCTL  "vfs.zfs.version.spa"
  1683 initial thread RET   __sysctl 0
  1683 initial thread CALL  ioctl(0x3,0xd5985a00 ,0x7fdffff8a60)
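
As an aside, the request number in that last ioctl() call can be decoded
by hand. The following is only a minimal sketch, assuming the FreeBSD
<sys/ioccom.h> layout (3 direction bits, 13 bits of parameter length, an
8-bit group character and an 8-bit command number); ioc_decode is an
illustrative name, not an existing tool:

```sh
#!/bin/sh
# Decode a FreeBSD ioctl request number into its fields.
# Assumed layout (sys/ioccom.h): bits 31..29 = direction (IN, OUT, VOID),
# bits 28..16 = parameter length, bits 15..8 = group, bits 7..0 = command.
ioc_decode() {
        cmd=$(($1))
        dir=""
        [ $((cmd & 0x80000000)) -ne 0 ] && dir="${dir}IN|"
        [ $((cmd & 0x40000000)) -ne 0 ] && dir="${dir}OUT|"
        [ $((cmd & 0x20000000)) -ne 0 ] && dir="${dir}VOID|"
        len=$(( (cmd >> 16) & 0x1fff ))
        group=$(( (cmd >> 8) & 0xff ))
        num=$(( cmd & 0xff ))
        # Turn the group byte into its ASCII character via an octal escape.
        grp=$(printf "\\$(printf '%03o' "$group")")
        echo "dir=${dir%|} len=${len} group=${grp} num=${num}"
}

ioc_decode 0xd5985a00
```

For 0xd5985a00 this reports a read/write request in group 'Z', command
number 0, with a 5528-byte argument, so the process hangs inside a ZFS
ioctl rather than in generic sysctl code.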

Please work with the ZFS maintainers (CC'ed) to get this fixed.

Marius

>
> With the kernel.old from before your patch, 'kldload zfs' will work.
>
> falcon# kldload zfs
>
> ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
>             to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
> ZFS filesystem version 5
> ZFS storage pool version 28
> falcon#
> falcon#
> falcon#
> falcon#
> falcon# kldstat
> Id Refs Address            Size     Name
>  1    9 0xc0000000 e42878   kernel
>  2    1 0xc14a2000 32e000   zfs.ko
>  3    1 0xc17d0000 104000   opensolaris.ko
> falcon#
>
> 'zpool status' will still eat an entire cpu.
>
>  1020 root        1  49    0 22712K  2888K CPU1    1   0:00 99.92% zpool
>
> falcon# procstat -kk 1020
>   PID    TID COMM             TDNAME           KSTACK
>  1020 100063 zpool            initial thread   <running>
> falcon#
>
>
> falcon# zpool status
> load: 0.76  cmd: zpool 1020 [running] 82.09r 0.00u 0.04s 97% 2856k
> load: 0.76  cmd: zpool 1020 [running] 82.33r 0.00u 0.04s 97% 2856k
> load: 0.76  cmd: zpool 1020 [running] 82.55r 0.00u 0.04s 97% 2856k
> load: 0.76  cmd: zpool 1020 [running] 82.75r 0.00u 0.04s 97% 2856k
> load: 0.76  cmd: zpool 1020 [running] 82.93r 0.00u 0.04s 97% 2856k
>
>
>
>
> =================
> a truss of 'zpool status' without the zfs module loaded gets stuck after
> this:  (zpool status will cause the zfs module to get loaded)
>
> falcon# truss zpool status
> mmap(0x0,32768,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1076183040 (0x40254000)
> issetugid(0x40256000,0x0,0x0,0x0,0x1000000000000000,0x2000000000000000) = 0 (0x0)
> open("/etc/libmap.conf",O_RDONLY,0666)           ERR#2 'No such file or directory'
> open("/var/run/ld-elf.so.hints",O_RDONLY,030036223340) = 3 (0x3)
> read(3,"tnhE\0\0\0\^A\0\0\0\M^@\0\0\0-\0"...,128) = 128 (0x80)
> mmap(0x0,40960,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1076215808 (0x4025c000)
> lseek(3,0x80,SEEK_SET)                           = 128 (0x80)
> read(3,"/lib:/usr/lib:/usr/lib/compat:/u"...,45) = 45 (0x2d)
> close(3)                                         = 0 (0x0)
> access("/lib/libavl.so.2",0)                     = 0 (0x0)
> open("/lib/libavl.so.2",O_RDONLY,0666)           = 3 (0x3)
> fstat(3,{ mode=-r--r--r-- ,inode=9467939,size=8416,blksize=16384 }) = 0 (0x0)
> pread(0x3,0x40358a78,0x2000,0x0,0x0,0x40367220)  = 8192 (0x2000)
> mmap(0x0,1056768,PROT_NONE,MAP_PRIVATE|MAP_ANON|MAP_NOCORE,-1,0x0) = 1077313536 (0x40368000)
> mmap(0x40368000,8192,PROT_READ|PROT_EXEC,MAP_PRIVATE|MAP_FIXED|MAP_NOCORE,3,0x0) = 1077313536 (0x40368000)
> mmap(0x40468000,8192,PROT_READ|PROT_WRITE|PROT_EXEC,MAP_PRIVATE|MAP_FIXED,3,0x0) = 1078362112 (0x40468000)
> close(3)          
> <SNIP more libs>
>
> access("/lib/libthr.so.3",0)                     = 0 (0x0)
> open("/lib/libthr.so.3",O_RDONLY,020)            = 3 (0x3)
> fstat(3,{ mode=-r--r--r-- ,inode=9467937,size=116936,blksize=16384 }) = 0 (0x0)
> pread(0x3,0x40358a78,0x2000,0x0,0x0,0x40367188)  = 8192 (0x2000)
> mmap(0x0,1204224,PROT_NONE,MAP_PRIVATE|MAP_ANON|MAP_NOCORE,-1,0x0) = 1092362240 (0x411c2000)
> mmap(0x411c2000,106496,PROT_READ|PROT_EXEC,MAP_PRIVATE|MAP_FIXED|MAP_NOCORE,3,0x0) = 1092362240 (0x411c2000)
> mmap(0x412da000,16384,PROT_READ|PROT_WRITE|PROT_EXEC,MAP_PRIVATE|MAP_FIXED,3,0x18000) = 1093509120 (0x412da000)
> mprotect(0x412de000,40960,PROT_READ|PROT_WRITE|PROT_EXEC) = 0 (0x0)
> close(3)                                         = 0 (0x0)
> mmap(0x0,40960,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1076256768 (0x40266000)
> mmap(0x0,106496,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1076297728 (0x40270000)
> sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
> sysarch(0x1,0x41082fb8,0xc0792e40,0x40f75653,0x412dbde8,0x800005) = 0 (0x0)
> sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
> sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
> sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
> getpid()                                         = 1002 (0x3ea)
> __sysctl(0x7fdffffe0b8,0x2,0x412e7f38,0x7fdffffe0c0,0x0,0x0) = 0 (0x0)
> __sysctl(0x7fdffffdeb8,0x2,0x7fdffffdf80,0x7fdffffdfe8,0x411d96d8,0xd) = 0 (0x0)
> __sysctl(0x7fdffffdf80,0x3,0x412e6e48,0x7fdffffe0c0,0x0,0x0) = 0 (0x0)
> readlink("/etc/malloc.conf",0x7fdffffda06,1024)  ERR#2 'No such file or directory'
> issetugid(0xffffffffffffffff,0xffffffffffffffff,0x400,0x40e965f6,0x41092968,0x400000) = 0 (0x0)
> break(0x2176d8)                                  = 0 (0x0)
> break(0x2176d8)                                  = 0 (0x0)
> break(0x400000)                                  = 0 (0x0)
> mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1093566464 (0x412e8000)
> mmap(0x416e8000,1146880,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1097760768 (0x416e8000)
> munmap(0x412e8000,1146880)                       = 0 (0x0)
> sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
> thr_self(0x41404400,0x412e6e48,0x2000,0x412dd4a8,0x412e7f44,0x412dd450) = 0 (0x0)
> mmap(0x7fdffbfe000,8192,PROT_NONE,MAP_ANON,-1,0x0) = 8787498885120 (0x7fdffbfe000)
> thr_set_name(0x186dd,0x411d9768,0x0,0x1000,0xffffffffffffffff,0x0) = 0 (0x0)
> rtprio_thread(0x0,0x186dd,0x7fdffffdfec,0x22,0xffffffffffffffff,0x0) = 0 (0x0)
> sysarch(0x2,0x410853b0,0x600,0x80,0x0,0x0)       = 0 (0x0)
> sigaction(32,{ 0x411ce740 SA_SIGINFO ss_t },0x0) = 0 (0x0)
> sigprocmask(SIG_UNBLOCK,0x0,0x0)                 = 0 (0x0)
> sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
> sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
> sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
> sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
> sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
> sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
> sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
> sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> sigprocmask(SIG_SETMASK,0x0,0x0)                 = 0 (0x0)
> sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|
>
>
>
> And it stops right there after SIGINT.
>

Re: sparc64 hang with zfs v28

Dimitry Andric-4
On 2011-03-10 19:54, Marius Strobl wrote:
...
> Sorry, when porting that fix from another version of binutils to what
> is in head I introduced a bug. I've corrected the patch at the above
> URL accordingly. With that version applied to a r219086 checkout (i.e.
> right before the ZFS v28 import) now both pre-loading modules and
> loading them after boot as well as ZFS works again:
...
> Dimitry, are you ok with that version being committed to head?

Yes, this patch is fine to commit.

Re: sparc64 hang with zfs v28

Marius Strobl
On Thu, Mar 10, 2011 at 08:20:45PM +0100, Dimitry Andric wrote:

> On 2011-03-10 19:54, Marius Strobl wrote:
> ...
> >Sorry, when porting that fix from another version of binutils to what
> >is in head I introduced a bug. I've corrected the patch at the above
> >URL accordingly. With that version applied to a r219086 checkout (i.e.
> >right before the ZFS v28 import) now both pre-loading modules and
> >loading them after boot as well as ZFS works again:
> ...
> >Dimitry, are you ok with that version being committed to head?
>
> Yes, this patch is fine to commit.

Thanks, committed in r219530. Will you handle upstream?

Marius


Re: sparc64 hang with zfs v28

Marius Strobl
In reply to this post by Marius Strobl
On Thu, Mar 10, 2011 at 07:54:23PM +0100, Marius Strobl wrote:

> On Wed, Mar 09, 2011 at 10:03:24AM -0500, Roger Hammerstein wrote:
> >
>
> As for the remaining ZFS hangs you're seeing I can reproduce these
> with a r219092 checkout (i.e. right after the ZFS v28 import plus
> fixes) and the above patch applied, so this problem appears to be
> orthogonal:
> v215# kldload zfs.ko
> ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
>             to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
> ZFS filesystem version 5
> ZFS storage pool version 28
> v215# mdconfig -a -t malloc -s 100m
> md0
> v215# mdconfig -a -t malloc -s 100m
> md1
> v215# zpool create tank mirror /dev/md0 /dev/md1
> ^Z
> <hangs here>
>
> The end of the corresponding ktrace dump is:
> <...>
> v215# kdump | tail
>   1683 initial thread RET   madvise 0
>   1683 initial thread CALL  close(0x5)
>   1683 initial thread RET   close 0
>   1683 initial thread CALL  __sysctl(0x7fdffff8778,0x2,0x7fdffff8840,0x7fdffff88a8,0x40d41168,0x13)
>   1683 initial thread SCTL  "sysctl.name2oid"
>   1683 initial thread RET   __sysctl 0
>   1683 initial thread CALL  __sysctl(0x7fdffff8840,0x4,0x40e4f158,0x7fdffff8978,0,0)
>   1683 initial thread SCTL  "vfs.zfs.version.spa"
>   1683 initial thread RET   __sysctl 0
>   1683 initial thread CALL  ioctl(0x3,0xd5985a00 ,0x7fdffff8a60)
>
> Please work with the ZFS maintainers (CC'ed) to get this fixed.
>

Is there any progress regarding this, or at least some tips on what
debug information to provide?

Marius


Re: sparc64 hang with zfs v28

Michael Moll
Hi,

On Sat, Mar 19, 2011 at 04:28:38PM +0100, Marius Strobl wrote:

> Is there any progress regarding this, or at least some tips on what
> debug information to provide?

Roger created kern/155615 for this, but I do not see anything else
besides that. Personally this bites me quite hard as almost all my
sparc64 machines are using ZFS. :(

Kind Regards
--
Michael Moll

Re: sparc64 hang with zfs v28

Pawel Jakub Dawidek
On Mon, Mar 21, 2011 at 06:56:32PM +0100, Michael Moll wrote:

> Hi,
>
> On Sat, Mar 19, 2011 at 04:28:38PM +0100, Marius Strobl wrote:
>
> > Is there any progress regarding this, or at least some tips on what
> > debug information to provide?
>
> Roger created kern/155615 for this, but I do not see anything else
> besides that. Personally this bites me quite hard as almost all my
> sparc64 machines are using ZFS. :(

Hi.

Sorry for the delay in responding...

Are you able to send me the output of the 'alltrace' command from DDB?

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com


Re: sparc64 hang with zfs v28

Marius Strobl
On Mon, Mar 21, 2011 at 06:59:33PM +0100, Pawel Jakub Dawidek wrote:

> On Mon, Mar 21, 2011 at 06:56:32PM +0100, Michael Moll wrote:
> > Hi,
> >
> > On Sat, Mar 19, 2011 at 04:28:38PM +0100, Marius Strobl wrote:
> >
> > > Is there any progress regarding this, or at least some tips on what
> > > debug information to provide?
> >
> > Roger created kern/155615 for this, but I do not see anything else
> > besides that. Personally this bites me quite hard as almost all my
> > sparc64 machines are using ZFS. :(
>
> Hi.
>
> Sorry for the delay in responding...
>
> Are you able to send me the output of the 'alltrace' command from DDB?
>

available here:
http://people.freebsd.org/~marius/zfs_alltrace.txt

Marius


Re: sparc64 hang with zfs v28

Pawel Jakub Dawidek
On Tue, Mar 22, 2011 at 05:07:31PM +0100, Marius Strobl wrote:

> On Mon, Mar 21, 2011 at 06:59:33PM +0100, Pawel Jakub Dawidek wrote:
> > On Mon, Mar 21, 2011 at 06:56:32PM +0100, Michael Moll wrote:
> > > Hi,
> > >
> > > On Sat, Mar 19, 2011 at 04:28:38PM +0100, Marius Strobl wrote:
> > >
> > > > Is there any progress regarding this, or at least some tips on what
> > > > debug information to provide?
> > >
> > > Roger created kern/155615 for this, but I do not see anything else
> > > besides that. Personally this bites me quite hard as almost all my
> > > sparc64 machines are using ZFS. :(
> >
> > Hi.
> >
> > Sorry for the delay in responding...
> >
> > Are you able to send me the output of the 'alltrace' command from DDB?
> >
>
> available here:
> http://people.freebsd.org/~marius/zfs_alltrace.txt

Are you able to convert zfs_ioc_pool_create+0x3c into a line number?

I use the following script for i386/amd64:

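#!/bin/sh
# Map "function+offset" from a DDB backtrace to a source file and line.

if [ $# -ne 2 ]; then
        echo "usage: `basename $0` kernel function+offset" >&2
        exit 1
fi

kern=$1
# Split "func+0xNN" into the symbol name and the offset.
func=`echo $2 | awk -F+ '{print $1}'`
off=`echo $2 | awk -F+ '{print $2}'`

# Find the function's start address in the disassembly, add the offset
# (printf %u turns the hex values into decimal for bc), convert the sum
# back to hex and hand it to addr2line.
objdump -d ${kern} | \
        egrep '^[0-9a-f]{8,16} <'${func}'>' | \
        awk '{printf("0x%s\n", $1)}' | \
        xargs -J ADDR printf "%u + %u\n" ADDR $off | \
        bc | \
        xargs printf "0x%x\n" | \
        xargs addr2line -e ${kern}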


And:

        fa2line.sh /boot/kernel/zfs.ko.symbols zfs_ioc_pool_create+0x3c
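
The arithmetic in the middle of that pipeline (symbol start address plus
offset, printed back as hex) can be checked on its own. A small sketch
using shell arithmetic instead of bc; the base address is made up purely
for illustration, and addr_plus_off is a hypothetical helper name:

```sh
#!/bin/sh
# Add a hex offset to a hex symbol address and print the sum as hex.
# The real base address would come from the objdump output; the value
# used below is an invented placeholder.
addr_plus_off() {
        printf '0x%x\n' $(( $1 + $2 ))
}

addr_plus_off 0x1a2b40 0x3c
```

The resulting address is what gets passed to addr2line -e.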

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com


RE: sparc64 hang with zfs v28

Roger Hammerstein



> > available here:
> > http://people.freebsd.org/~marius/zfs_alltrace.txt
>
> Are you able to convert zfs_ioc_pool_create+0x3c into a line number?
>
> I use the following script for i386/amd64:
SNIP


Thank you all for your assistance.  

Mine looks like this:

Tracing command zpool pid 990 tid 100073 td 0xfffff80001746000
uart_intr_rxready() at uart_intr_rxready+0xbc
scc_bfe_intr() at scc_bfe_intr+0xbc
intr_event_handle() at intr_event_handle+0x64
intr_execute_handlers() at intr_execute_handlers+0x8
intr_fast() at intr_fast+0x68
-- interrupt level=0xc pil=0 %o7=0xc02af034 --
witness_unlock() at witness_unlock+0x3e4
_mtx_unlock_flags() at _mtx_unlock_flags+0x11c
_vm_map_unlock_read() at _vm_map_unlock_read+0x1c
vm_map_lookup() at vm_map_lookup+0x78
vm_fault_hold() at vm_fault_hold+0x94
vm_fault() at vm_fault+0x14
trap_pfault() at trap_pfault+0x338
trap() at trap+0x3a8
-- fast data access mmu miss tar=0x41446000 %o7=0xc1233134 --
bcopy() at bcopy+0x9c
zfs_ioc_pool_configs() at zfs_ioc_pool_configs+0x24
zfsdev_ioctl() at zfsdev_ioctl+0xe0
devfs_ioctl_f() at devfs_ioctl_f+0xe8
kern_ioctl() at kern_ioctl+0x294
ioctl() at ioctl+0x190
syscallenter() at syscallenter+0x270
syscall() at syscall+0x74
-- syscall (54, FreeBSD ELF64, ioctl) %o7=0x40d15e24 --
userland() at 0x40f75668
user trace: trap %o7=0x40d15e24
pc 0x40f75668, sp 0x7fdffff8651
pc 0x40d3bfb0, sp 0x7fdffff8731
pc 0x40d3c364, sp 0x7fdffff9db1
pc 0x10e588, sp 0x7fdffff9e81
pc 0x10e5d4, sp 0x7fdffff9f41
pc 0x1064e0, sp 0x7fdffffa011
pc 0x107268, sp 0x7fdffffa101
pc 0x103450, sp 0x7fdffffe1d1
pc 0x4021aff4, sp 0x7fdffffe291
done


  990 root        1  89   20 22720K  2976K CPU1    1   0:00 100.00% zpool

So in my case it looks like you would want zfs_ioc_pool_configs+0x24.

Using objdump -d on the symbols file doesn't seem to work; it prints no disassembly:
falcon# objdump -d /boot/kernel/zfs.ko.symbols
/boot/kernel/zfs.ko.symbols:     file format elf64-sparc-freebsd
falcon#




But using objdump -D instead gives me:

falcon# ./fa2line.sh /boot/kernel/zfs.ko.symbols zfs_ioc_pool_configs+0x24
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:1311
falcon#
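
A possible alternative that avoids disassembling altogether is to take
the function's start address from the symbol table with nm(1). This is
only a sketch and untested on sparc64; it assumes nm prints
"address type name" lines, and sym_addr and fa2line_nm are hypothetical
names:

```sh
#!/bin/sh
# Resolve "function+offset" to file:line using nm(1) for the symbol
# address instead of parsing objdump output.

# Read nm output on stdin and print the address of symbol $1 as 0x<hex>.
sym_addr() {
        awk -v f="$1" '$3 == f { print "0x" $1; exit }'
}

# $1 = kernel or module with symbols, $2 = function+offset
fa2line_nm() {
        func=${2%%+*}
        off=${2#*+}
        base=$(nm "$1" | sym_addr "$func")
        [ -n "$base" ] || { echo "symbol not found: $func" >&2; return 1; }
        addr2line -e "$1" "$(printf '0x%x' $(( $base + $off )))"
}

# Example:
#   fa2line_nm /boot/kernel/zfs.ko.symbols zfs_ioc_pool_configs+0x24
```

Whether this yields the same line as the objdump -D variant depends on
the symbol values being the same in both listings, which should hold
since both read the same symbol table.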


which is:
1310:
1311:   error = put_nvlist(zc, configs);
1312:
1313:   nvlist_free(configs);
1314:
1315:   return (error);
1316:}
1317:



Re: sparc64 hang with zfs v28

Marius Strobl
On Tue, Mar 22, 2011 at 01:51:20PM -0400, Roger Hammerstein wrote:

>
> which is:
> 1310:
> 1311:   error = put_nvlist(zc, configs);
> 1312:
> 1313:   nvlist_free(configs);
> 1314:
> 1315:   return (error);
> 1316:}
> 1317:
>

Uhm, looks like r219089 changed some xcopy{in,out}() into
ddi_copy{in,out}(), i.e. copy{in,out}() into bcopy(), which
is just wrong for copying data in from/out to userspace.
However, looking at the other uses of ddi_copy{in,out}() it
generally seems that ddi_copy{in,out}() should be defined to
copy{in,out}(). With the attached patch at least my simple
test cases work again. The one remaining xcopyout() in
zfs_ioctl.c could then also be replaced with a ddi_copyout().
Not sure how any of this manages to work on x86 :)

Marius



Attachment: sunddi.h.diff (895 bytes)

Re: sparc64 hang with zfs v28

Pawel Jakub Dawidek
On Tue, Mar 22, 2011 at 08:11:17PM +0100, Marius Strobl wrote:

> On Tue, Mar 22, 2011 at 01:51:20PM -0400, Roger Hammerstein wrote:
> >
> > which is:
> > 1310:
> > 1311:   error = put_nvlist(zc, configs);
> > 1312:
> > 1313:   nvlist_free(configs);
> > 1314:
> > 1315:   return (error);
> > 1316:}
> > 1317:
> >
>
> Uhm, looks like r219089 changed some xcopy{in,out}() into
> ddi_copy{in,out}(), i.e. copy{in,out}() into bcopy(), which
> is just wrong for copying data in from/out to userspace.
> However, looking at the other uses of ddi_copy{in,out}() it
> generally seems that ddi_copy{in,out}() should be defined to
> copy{in,out}(). With the attached patch at least my simple
> test cases work again. The one remaining xcopyout() in
> zfs_ioctl.c could then also be replaced with a ddi_copyout().
> Not sure how any of this manages to work on x86 :)

Yeah, I found this as well and am waiting for my test machine to be free
to test it. Thanks.

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com


RE: sparc64 hang with zfs v28

Roger Hammerstein


> > Uhm, looks like r219089 changed some xcopy{in,out}() into
> > ddi_copy{in,out}(), i.e. copy{in,out}() into bcopy(), which
> > is just wrong for copying data in from/out to userspace.
> > However, looking at the other uses of ddi_copy{in,out}() it
> > generally seems that ddi_copy{in,out}() should be defined to
> > copy{in,out}(). With the attached patch at least my simple
> > test cases work again. The one remaining xcopyout() in
> > zfs_ioctl.c could then also be replaced with a ddi_copyout().
> > Not sure how any of this manages to work on x86 :)
>
> Yeah, I found this as well and waiting for my test machine to be free to
> test it. Thanks.


This patch worked on my Ultra 60. I rebuilt the kernel:

falcon# kldstat
Id Refs Address            Size     Name
 1    9 0xc0000000 b1f8c0   kernel
 2    1 0xc10a2000 32e000   zfs.ko
 3    1 0xc13d0000 104000   opensolaris.ko
falcon#
falcon# zpool  status
  pool: tank
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da6     ONLINE       0     0     0

errors: No known data errors
falcon#
falcon#

I mounted tank with zfs and can ls it, copy files to it, and delete files.
Looks OK.




Re: sparc64 hang with zfs v28

Michael Moll
In reply to this post by Marius Strobl
Hi All,

On Tue, Mar 22, 2011 at 08:11:17PM +0100, Marius Strobl wrote:

> Uhm, looks like r219089 changed some xcopy{in,out}() into
> ddi_copy{in,out}(), i.e. copy{in,out}() into bcopy(), which
> is just wrong for copying data in from/out to userspace.
> However, looking at the other uses of ddi_copy{in,out}() it
> generally seems that ddi_copy{in,out}() should be defined to
> copy{in,out}(). With the attached patch at least my simple
> test cases work again.

That looks good. I will test more tomorrow, but when netbooting I can
import a zpool now. The only remaining issue is that with an upgraded
kernel and the old world it still hangs:
http://space.kvedulv.de/zfs_v28/at.txt

zfs_ioctl_compat_post() is probably the problematic function.

Regards
--
Michael Moll