
Re: mpd5/Netgraph issues after upgrading to 7.4


Przemyslaw Frasunek
Dear All,

unfortunately, one of my mpd5 PPPoE access servers has started panicking every
few hours.

I'm running a recent 8.3-STABLE (as of 23rd May) with WITNESS, INVARIANTS and
DEBUG_MEMGUARD compiled in. Unfortunately, I'm unable to capture a crash dump;
for some reason, it is not saved to the dumpdev.

The only thing I have is the panic string:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 2; apic id = 02
instruction pointer      = 0x20:0xffffffff804b4e2d
stack pointer            = 0x28:0xffffff8185386560
frame pointer            = 0x28:0xffffff81853865d0
code segment             = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process          = 2832 (mpd5)
trap number              = 9

According to "objdump -d", the faulting instruction pointer lies inside
prelist_remove().

I tried replacing all of the hardware, but it still panics in the same way. I
would be really grateful for any hints.

dmesg output: http://www.frasunek.com/tmp/dmesg.txt
kernel config: http://www.frasunek.com/tmp/kernel.txt
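
Resolving the instruction pointer to prelist_remove(), as described above, amounts to finding the symbol with the greatest start address that does not exceed the faulting address. Below is a minimal sketch in C, assuming a symbol table sorted by address. The table contents, the start addresses and the two neighbouring function names are hypothetical; only the address 0xffffffff804b4e2d and the name prelist_remove come from the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a symbol table sorted by ascending start address. */
struct ksym {
    uint64_t    start;  /* address of the function's first instruction */
    const char *name;
};

/*
 * Return the symbol with the greatest start address <= addr, or NULL if
 * addr lies below the first symbol. Binary search over a sorted table.
 */
static const struct ksym *
ksym_lookup(const struct ksym *syms, size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;  /* search window is [lo, hi) */

    assert(syms != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (syms[mid].start <= addr)
            lo = mid + 1;   /* candidate found, look for a later one */
        else
            hi = mid;
    }
    return (lo == 0) ? NULL : &syms[lo - 1];
}

/*
 * HYPOTHETICAL table for illustration: the start addresses and the two
 * neighbouring names are invented, not taken from the real kernel.
 */
static const struct ksym example_syms[] = {
    { 0xffffffff804b4c00ULL, "hypothetical_prev_fn" },
    { 0xffffffff804b4dc0ULL, "prelist_remove" },
    { 0xffffffff804b4f00ULL, "hypothetical_next_fn" },
};
```

Calling ksym_lookup(example_syms, 3, 0xffffffff804b4e2dULL) selects the prelist_remove entry under these invented addresses; subtracting the symbol's start from the address gives the offset that a "function+0x..." style backtrace prints.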
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[hidden email]"

Eugene Grosbein-7
15.06.2012 18:33, Przemyslaw Frasunek wrote:
> Dear All,
>
> unfortunately, one of my mpd5 PPPoE access servers started panicing every few
> hours.
>
> I'm running recent 8.3-STABLE (as of 23th May) with WITNESS, INVARIANTS and
> DEBUG_MEMGUARD compiled. Unfortunately, I'm unable to catch crashdump. For some
> reason, it is not saved on dumpdev.

The reason is that 8-STABLE fails to stop the scheduler on panic,
which breaks the writing of crash dumps.

8.3-STABLE has a new sysctl, kern.stop_scheduler_on_panic.
You should set it to 1 to get crash dumps saved.

One more thing: does your box have a PS/2 or a USB keyboard? It matters too.
Systems with a USB keyboard need another patch (by Andriy Gapon) to obtain
crash dumps:

http://www.kuzbass.ru/freebsd/patches/stop_scheduler_on_panic.usb.diff

Eugene Grosbein


Przemyslaw Frasunek
> One more: does your box has PS/2 keyboard or USB? It matters too.
> For systems having USB keyboard there is another patch needed to obtain
> crashdumps (by Andriy Gapon):

Thanks a lot. I have a KVM connected via USB. I'll apply this patch.


Eugene Grosbein-7
In reply to this post by Eugene Grosbein-7
15.06.2012 18:50, Eugene Grosbein wrote:

>> unfortunately, one of my mpd5 PPPoE access servers started panicing every few
>> hours.
>>
>> I'm running recent 8.3-STABLE (as of 23th May) with WITNESS, INVARIANTS and
>> DEBUG_MEMGUARD compiled. Unfortunately, I'm unable to catch crashdump. For some
>> reason, it is not saved on dumpdev.
>
> The reason is that 8-STABLE fails to stop scheduler on panic
> that breaks writing of crashdumps.
>
> 8.3-STABLE has new sysctl kern.stop_scheduler_on_panic.
> You should set it to 1 to get crashdumps saved.
>
> One more: does your box has PS/2 keyboard or USB? It matters too.
> For systems having USB keyboard there is another patch needed to obtain
> crashdumps (by Andriy Gapon):
>
> http://www.kuzbass.ru/freebsd/patches/stop_scheduler_on_panic.usb.diff

Sorry, the right URL is: http://www.grosbein.net/freebsd/patches/stop_scheduler_on_panic.usb.diff

Eugene Grosbein

Gleb Smirnoff
In reply to this post by Przemyslaw Frasunek
On Fri, Jun 15, 2012 at 01:33:05PM +0200, Przemyslaw Frasunek wrote:
P> unfortunately, one of my mpd5 PPPoE access servers started panicing every few
P> hours.
P>
P> I'm running recent 8.3-STABLE (as of 23th May) with WITNESS, INVARIANTS and
P> DEBUG_MEMGUARD compiled. Unfortunately, I'm unable to catch crashdump. For some
P> reason, it is not saved on dumpdev.
P>
P> The only thing I have is panic string:
P>
P> Fatal trap 9: general protection fault while in kernel mode
P> cpuid = 2; apic id = 02
P> instruction pointer      = 0x20:0xffffffff804b4e2d
P> stack pointer            = 0x28:0xffffff8185386560
P> frame pointer            = 0x28:0xffffff81853865d0
P> code segment             = base 0x0, limit 0xfffff, type 0x1b
P> = DPL 0, pres 1, long 1, def32 0, gran 1
P> processor eflags = interrupt enabled, resume, IOPL = 0
P> current process          = 2832 (mpd5)
P> trap number              = 9
P>
P> According to "objdump -d", the fault address points to prelist_remove().
P>
P> I tried to replace all of hardware, but it still panics in the same way. I would
P> be really grateful for any hints.
P>
P> dmesg output: http://www.frasunek.com/tmp/dmesg.txt
P> kernel config: http://www.frasunek.com/tmp/kernel.txt

I suspect this isn't related to netgraph but to IPv6, since prelist_remove()
is found in netinet6/nd6_rtr.c.

I have looked into the ND code several times and found lots of race-prone code
there. Maybe some of it was recently fixed by bz@, but the fixes have definitely
not been merged to stable/8.

--
Totus tuus, Glebius.

mdtancsa
On 6/15/2012 4:31 PM, Gleb Smirnoff wrote:
> On Fri, Jun 15, 2012 at 01:33:05PM +0200, Przemyslaw Frasunek wrote:
> P> unfortunately, one of my mpd5 PPPoE access servers started panicing every few
> P> hours.
> P>
> P> I'm running recent 8.3-STABLE (as of 23th May) with WITNESS, INVARIANTS and

> I suspect this isn't related to netgraph, but to IPv6 since prelist_remove()
> is found in netinet6/nd6_rtr.c.
>
> Several times I looked into ND code and found lots of race prone code there.
> May be some was recently fixed by bz@, but definitely not merged to stable/8.

There were a bunch of commits / fixes by BZ on the 5th of June.  Perhaps
try updating to RELENG_8 as of today. If you are not using IPv6, perhaps
disable it for a day to see whether that makes a difference stability-wise?
It did for me back in November, when running with v6 on an LNS was not stable.

http://lists.freebsd.org/pipermail/svn-src-stable-8/2012-June/007555.html

        ---Mike

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, [hidden email]
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/

Przemyslaw Frasunek
In reply to this post by Gleb Smirnoff
> I suspect this isn't related to netgraph, but to IPv6 since prelist_remove()
> is found in netinet6/nd6_rtr.c.
>
> Several times I looked into ND code and found lots of race prone code there.
> May be some was recently fixed by bz@, but definitely not merged to stable/8.

Thanks a lot, guys. For now, I have disabled IPv6 on this BRAS. Let's see if it
helps.


Arnaud Lacombe-6
In reply to this post by Eugene Grosbein-7
Hi,

On Fri, Jun 15, 2012 at 7:50 AM, Eugene Grosbein <[hidden email]> wrote:

> 15.06.2012 18:33, Przemyslaw Frasunek wrote:
>> Dear All,
>>
>> unfortunately, one of my mpd5 PPPoE access servers started panicing every few
>> hours.
>>
>> I'm running recent 8.3-STABLE (as of 23th May) with WITNESS, INVARIANTS and
>> DEBUG_MEMGUARD compiled. Unfortunately, I'm unable to catch crashdump. For some
>> reason, it is not saved on dumpdev.
>
> The reason is that 8-STABLE fails to stop scheduler on panic
> that breaks writing of crashdumps.
>
> 8.3-STABLE has new sysctl kern.stop_scheduler_on_panic.
> You should set it to 1 to get crashdumps saved.
>
Is there a technical reason to keep the scheduler running after panic()?

 - Arnaud

> One more: does your box has PS/2 keyboard or USB? It matters too.
> For systems having USB keyboard there is another patch needed to obtain
> crashdumps (by Andriy Gapon):
>
> http://www.kuzbass.ru/freebsd/patches/stop_scheduler_on_panic.usb.diff
>
> Eugene Grosbein
>

Bjoern A. Zeeb
In reply to this post by Przemyslaw Frasunek

On 15. Jun 2012, at 21:57 , Przemyslaw Frasunek wrote:

>> I suspect this isn't related to netgraph, but to IPv6 since prelist_remove()
>> is found in netinet6/nd6_rtr.c.
>>
>> Several times I looked into ND code and found lots of race prone code there.
>> May be some was recently fixed by bz@, but definitely not merged to stable/8.
>
> Thanks a lot guys. For now, I disabled IPv6 on this BRAS. Let's see if it's
> going to help.

It will, as there are no fixes in the tree yet for this issue.

/bz

--
Bjoern A. Zeeb                                 You have to have visions!
   It does not matter how good you are. It matters what good you do!


Eugene Grosbein-7
In reply to this post by Arnaud Lacombe-6
17.06.2012 01:13, Arnaud Lacombe wrote:

> Hi,
>
> On Fri, Jun 15, 2012 at 7:50 AM, Eugene Grosbein <[hidden email]> wrote:
>> 15.06.2012 18:33, Przemyslaw Frasunek wrote:
>>> Dear All,
>>>
>>> unfortunately, one of my mpd5 PPPoE access servers started panicing every few
>>> hours.
>>>
>>> I'm running recent 8.3-STABLE (as of 23th May) with WITNESS, INVARIANTS and
>>> DEBUG_MEMGUARD compiled. Unfortunately, I'm unable to catch crashdump. For some
>>> reason, it is not saved on dumpdev.
>>
>> The reason is that 8-STABLE fails to stop scheduler on panic
>> that breaks writing of crashdumps.
>>
>> 8.3-STABLE has new sysctl kern.stop_scheduler_on_panic.
>> You should set it to 1 to get crashdumps saved.
>>
> Is there technical reason to have the scheduler still running after panic() ?

It seems not. That's just "a misfeature".

Eugene Grosbein

mdtancsa
In reply to this post by Przemyslaw Frasunek
On 6/15/2012 5:57 PM, Przemyslaw Frasunek wrote:
>> I suspect this isn't related to netgraph, but to IPv6 since prelist_remove()
>> is found in netinet6/nd6_rtr.c.
>>
>> Several times I looked into ND code and found lots of race prone code there.
>> May be some was recently fixed by bz@, but definitely not merged to stable/8.
>
> Thanks a lot guys. For now, I disabled IPv6 on this BRAS. Let's see if it's
> going to help.

Hi,
        Any changes in stability?

        ---Mike

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, [hidden email]
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/

Przemyslaw Frasunek
>> Thanks a lot guys. For now, I disabled IPv6 on this BRAS. Let's see if it's
>> going to help.
> Hi,
> Any changes in stability ?

Hi,

It's way better now. I have had no crashes since IPv6 was disabled.


Adrian Chadd-2
Hi,

Would it be possible for you to set up a test BRAS running 9-STABLE, so
that you can provide feedback on how stable IPv4/IPv6 PPPoE is for you?

It's great that you've solved it for 7.x, and I know that bz and
others know about a variety of fun issues in the networking stack that
may be related to this, but the only way this will get fixed and
validated is if you can help us test it out on something more recent.

Are you able to do any kind of load balancing, or do you just have one BRAS?

Thanks,



Adrian

mdtancsa
On 6/18/2012 6:51 PM, Adrian Chadd wrote:
> Hi,
>
> Is it possible to get you to setup a test BRAS running 9-STABLE, so
> you can provide feedback about how stable ipv4/ipv6 PPPoE is for you?

I have another LNS to deploy soon and I can enable IPv6 and use RELENG9.
I have in the past been able to trigger the panic after a few days of
use with IPv6 enabled.  Should have it up and running in a week or so.

        ---Mike
--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, [hidden email]
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/

Bjoern A. Zeeb
In reply to this post by Adrian Chadd-2

On 18. Jun 2012, at 22:51 , Adrian Chadd wrote:

> Hi,
>
> Is it possible to get you to setup a test BRAS running 9-STABLE, so
> you can provide feedback about how stable ipv4/ipv6 PPPoE is for you?
>
> It's great that you've solved it for 7.x, and I know that bz and
> others know about a variety of fun issues in the networking stack that
> may be related to this, but the only way this will get fixed and
> validated is if you can help us test it out on something more recent.


And bz has already replied that the issue has not been fixed in 8/9/HEAD
yet.  Read your emails please.

BTW, I can reproduce it from a shell script fairly easily if you want to
work on the fix, and I also have crash information for you if needed. That's
not the point; I just need 24 hours.

/bz

--
Bjoern A. Zeeb                                 You have to have visions!
   It does not matter how good you are. It matters what good you do!


Przemyslaw Frasunek
In reply to this post by Przemyslaw Frasunek
> It's way better now. I had no crash since IPv6 is disabled.

After re-enabling IPv6, the crash occurred within 6 hours. This time, the crash
dump was saved properly (thanks to the patch suggested by Eugene).

As bz already stated, the panic is definitely related to races in the IPv6 code:

(kgdb) bt
#0  doadump () at pcpu.h:224
#1  0xffffffff80376ae3 in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:448
#2  0xffffffff80377017 in panic (fmt=0x1 <Address 0x1 out of bounds>)
    at /usr/src/sys/kern/kern_shutdown.c:639
#3  0xffffffff8053c380 in trap_fatal (frame=0x9, eva=Variable "eva" is not available.
)
    at /usr/src/sys/amd64/amd64/trap.c:848
#4  0xffffffff8053c8d1 in trap (frame=0xffffff818562c4a0)
    at /usr/src/sys/amd64/amd64/trap.c:600
#5  0xffffffff80523654 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:228
#6  0xffffffff804b72bd in prelist_remove (pr=0xffffff0006d3d980)
    at /usr/src/sys/netinet6/nd6_rtr.c:966
#7  0xffffffff804b077a in nd6_purge (ifp=0xffffff004c927800)
    at /usr/src/sys/netinet6/nd6.c:802
#8  0xffffffff8049d0d4 in in6_ifdetach (ifp=0xffffff004c927800)
    at /usr/src/sys/netinet6/in6_ifattach.c:792
#9  0xffffffff80420888 in if_detach (ifp=0xffffff004c927800)
    at /usr/src/sys/net/if.c:919
#10 0xffffffff8044345e in ng_iface_shutdown (node=0xffffff004c086700)
    at /usr/src/sys/netgraph/ng_iface.c:803
#11 0xffffffff8043ec65 in ng_rmnode (node=0xffffff004c086700, dummy1=Variable "dummy1" is not available.
)
    at /usr/src/sys/netgraph/ng_base.c:752
#12 0xffffffff8043f66d in ng_apply_item (node=0xffffff004c086700,
    item=0xffffff002407ae00, rw=1) at /usr/src/sys/netgraph/ng_base.c:2453
#13 0xffffffff804404be in ng_snd_item (item=Variable "item" is not available.
)
    at /usr/src/sys/netgraph/ng_base.c:2250
#14 0xffffffff8044df24 in ngc_send (so=Variable "so" is not available.
)
    at /usr/src/sys/netgraph/ng_socket.c:317
#15 0xffffffff803e1a97 in sosend_generic (so=0xffffff0004a71aa0,
    addr=0xffffff004c4d11e0, uio=0xffffff818562ca00, top=0xffffff00046ac700,
    control=0x0, flags=Variable "flags" is not available.
) at /usr/src/sys/kern/uipc_socket.c:1295
#16 0xffffffff803e6535 in kern_sendit (td=0xffffff000428e000, s=5,
    mp=0xffffff818562cad0, flags=0, control=0x0, segflg=Variable "segflg" is not available.
)
    at /usr/src/sys/kern/uipc_syscalls.c:785
#17 0xffffffff803e66fc in sendit (td=0xffffff000428e000, s=5,
    mp=0xffffff818562cad0, flags=0) at /usr/src/sys/kern/uipc_syscalls.c:717
#18 0xffffffff803e67ed in sendto (td=Variable "td" is not available.
) at /usr/src/sys/kern/uipc_syscalls.c:837
#19 0xffffffff8053bb32 in amd64_syscall (td=0xffffff000428e000, traced=0)
    at subr_syscall.c:114
#20 0xffffffff8052394c in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:387
(kgdb) frame 6
#6  0xffffffff804b72bd in prelist_remove (pr=0xffffff0006d3d980)
    at /usr/src/sys/netinet6/nd6_rtr.c:966
966             LIST_REMOVE(pr, ndpr_entry);
(kgdb) list
961                     return;         /* notice here? */
962
963             s = splnet();
964
965             /* unlink ndpr_entry from nd_prefix list */
966             LIST_REMOVE(pr, ndpr_entry);
967
968             /* free list of routers that adversed the prefix */
969             LIST_FOREACH_SAFE(pfr, &pr->ndpr_advrtrs, pfr_entry, next) {
970                     free(pfr, M_IP6NDP);
(kgdb) print *pr
$1 = {ndpr_ifp = 0xdeadc0dedeadc0de, ndpr_entry = {
    le_next = 0xdeadc0dedeadc0de, le_prev = 0xdeadc0dedeadc0de},
  ndpr_prefix = {sin6_len = 222 '\336', sin6_family = 192 '\300', sin6_port = 57005,
    sin6_flowinfo = 3735929054, sin6_addr = {__u6_addr = {
        __u6_addr8 = "\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336", __u6_addr16 = {49374, 57005, 49374,
          57005, 49374, 57005, 49374, 57005}, __u6_addr32 = {3735929054,
          3735929054, 3735929054, 3735929054}}}, sin6_scope_id = 3735929054},
  ndpr_mask = {__u6_addr = {__u6_addr8 = "\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336", __u6_addr16 = {
        49374, 57005, 49374, 57005, 49374, 57005, 49374, 57005},
      __u6_addr32 = {3735929054, 3735929054, 3735929054, 3735929054}}},
  ndpr_vltime = 3735929054, ndpr_pltime = 3735929054,
  ndpr_expire = -2401050962867404578, ndpr_preferred = -2401050962867404578,
  ndpr_lastupdate = -2401050962867404578, ndpr_flags = {onlink = 0 '\0',
    autonomous = 1 '\001', reserved = 55 '7'}, ndpr_stateflags = 3735929054,
  ndpr_advrtrs = {lh_first = 0xdeadc0dedeadc0de}, ndpr_plen = 224 '\340',
  ndpr_refcnt = -1}
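
Nearly every field of this structure decodes to the same 32-bit pattern, 0xdeadc0de. To the best of my knowledge, that is the junk value FreeBSD's INVARIANTS memory debugging writes over freed allocations, which would mean prelist_remove() was handed an nd_prefix that had already been freed. Treat that reading as an interpretation, not as something stated in the thread. The sketch below only shows the arithmetic behind the decimal values kgdb printed. The helper names are made up for illustration, and the 16-bit pairing assumes a little-endian CPU such as amd64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define JUNK32 0xdeadc0deU  /* pattern seen throughout the dump */

/* Reinterpret a signed 64-bit value, as printed by kgdb, as raw bits. */
static uint64_t
bits_of(int64_t v)
{
    uint64_t u;

    memcpy(&u, &v, sizeof(u));
    return u;
}

/* True if both 32-bit halves of a 64-bit word carry the junk pattern. */
static int
is_junk64(uint64_t w)
{
    return (uint32_t)w == JUNK32 && (uint32_t)(w >> 32) == JUNK32;
}

/*
 * Combine two consecutive 16-bit array elements into the 32-bit word they
 * occupy on a little-endian CPU: the first element is the low half.
 */
static uint32_t
word_from_u16_pair(uint16_t first, uint16_t second)
{
    return ((uint32_t)second << 16) | first;
}

/* Self-check against the values printed in the dump. */
static void
check_dump_values(void)
{
    assert(3735929054U == JUNK32);                       /* ndpr_vltime */
    assert(is_junk64(bits_of(-2401050962867404578LL)));  /* ndpr_expire */
    assert(word_from_u16_pair(49374, 57005) == JUNK32);  /* __u6_addr16 */
    assert(is_junk64(0xdeadc0dedeadc0deULL));            /* le_next, le_prev */
}
```

A pointer such as pr itself (0xffffff0006d3d980) does not match the pattern, which fits the picture of a valid-looking address whose contents were overwritten after the object was released.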


Przemyslaw Frasunek
> After reenabling IPv6, the crash occurred within 6 hours. This time, crashdump
> was properly saved (thanks to patch suggested by Eugene).

My PPPoE BRAS was stable for 17 days. This morning, it crashed in another way:

current process         = 2762 (mpd5)
trap number             = 9
panic: general protection fault
cpuid = 5
KDB: stack backtrace:
#0 0xffffffff803a04a6 at kdb_backtrace+0x66
#1 0xffffffff8036dfde at panic+0x1ce
#2 0xffffffff80503300 at trap_fatal+0x290
#3 0xffffffff80503851 at trap+0x111
#4 0xffffffff804ea5d4 at calltrap+0x8
#5 0xffffffff8041d314 at lltable_prefix_free+0x74
#6 0xffffffff8044c014 at in_ifscrub+0x2c4
#7 0xffffffff8044d3f3 at in_control+0x793
#8 0xffffffff80418d2d at ifioctl+0xccd
#9 0xffffffff803b6842 at kern_ioctl+0x92
#10 0xffffffff803b6aa0 at ioctl+0xf0
#11 0xffffffff80502ab2 at amd64_syscall+0x302
#12 0xffffffff804ea8cc at Xfast_syscall+0xfc
Uptime: 17d7h18m38s

(kgdb) bt
#0  doadump () at pcpu.h:224
#1  0xffffffff8036da83 in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:448
#2  0xffffffff8036dfb7 in panic (fmt=0x1 <Address 0x1 out of bounds>)
    at /usr/src/sys/kern/kern_shutdown.c:639
#3  0xffffffff80503300 in trap_fatal (frame=0x9, eva=Variable "eva" is not available.
)
    at /usr/src/sys/amd64/amd64/trap.c:848
#4  0xffffffff80503851 in trap (frame=0xffffff818569a700)
    at /usr/src/sys/amd64/amd64/trap.c:600
#5  0xffffffff804ea5d4 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:228
#6  0xffffffff8044e44a in in_lltable_prefix_free (llt=0xffffff00044af600,
    prefix=0xffffff818569a8a0, mask=0xffffff818569a890, flags=2)
    at /usr/src/sys/netinet/in.c:1402
#7  0xffffffff8041d314 in lltable_prefix_free (af=2,
    prefix=0xffffff818569a8a0, mask=0xffffff818569a890, flags=2)
    at /usr/src/sys/net/if_llatbl.c:242
#8  0xffffffff8044c014 in in_ifscrub (ifp=Variable "ifp" is not available.
) at /usr/src/sys/netinet/in.c:1223
#9  0xffffffff8044d3f3 in in_control (so=Variable "so" is not available.
) at /usr/src/sys/netinet/in.c:588
#10 0xffffffff80418d2d in ifioctl (so=0xffffff001de4f2a8, cmd=2149607705,
    data=0xffffff001ddcdb00 "ng29", td=0xffffff000427f000)
    at /usr/src/sys/net/if.c:2606
#11 0xffffffff803b6842 in kern_ioctl (td=0xffffff000427f000, fd=Variable "fd" is not available.
) at file.h:275
#12 0xffffffff803b6aa0 in ioctl (td=0xffffff000427f000, uap=0xffffff818569abb0)
    at /usr/src/sys/kern/sys_generic.c:679
#13 0xffffffff80502ab2 in amd64_syscall (td=0xffffff000427f000, traced=0)
    at subr_syscall.c:114
#14 0xffffffff804ea8cc in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:387
#15 0x00000008018fbf8c in ?? ()

(kgdb) frame 6
#6  0xffffffff8044e44a in in_lltable_prefix_free (llt=0xffffff00044af600,
    prefix=0xffffff818569a8a0, mask=0xffffff818569a890, flags=2)
    at /usr/src/sys/netinet/in.c:1402
1402                            if (IN_ARE_MASKED_ADDR_EQUAL((struct sockaddr_in *)L3_ADDR(lle),

(kgdb) list
1397
1398                            /*
1399                             * (flags & LLE_STATIC) means deleting all entries
1400                             * including static ARP entries
1401                             */
1402                            if (IN_ARE_MASKED_ADDR_EQUAL((struct sockaddr_in *)L3_ADDR(lle),
1403                                                         pfx, msk) &&
1404                                ((flags & LLE_STATIC) || !(lle->la_flags & LLE_STATIC))) {
1405                                    int canceled;
1406

(kgdb) print lle
$1 = (struct llentry *) 0xdeadc0dedeadc0de

(kgdb) print *llt
$2 = {llt_link = {sle_next = 0xffffff00040b4400}, lle_head = {{
      lh_first = 0xffffff00041d9900}, {lh_first = 0xffffff0004cc3e00}, {
      lh_first = 0x0}, {lh_first = 0xffffff001db9a500}, {
      lh_first = 0xffffff0004251300}, {lh_first = 0x0}, {
      lh_first = 0xffffff001d67f300}, {lh_first = 0xffffff00044a9300}, {
      lh_first = 0xffffff00041dab00}, {lh_first = 0x0}, {lh_first = 0x0}, {
      lh_first = 0xffffff0006950c00}, {lh_first = 0x0}, {lh_first = 0x0}, {
      lh_first = 0xffffff0004cac800}, {lh_first = 0x0}, {lh_first = 0x0}, {
      lh_first = 0xffffff00291da900}, {lh_first = 0x0}, {lh_first = 0x0}, {
      lh_first = 0x0}, {lh_first = 0x0}, {lh_first = 0xffffff00790f6900}, {
      lh_first = 0xffffff001dc68100}, {lh_first = 0xffffff0004185100}, {
      lh_first = 0xffffff0006952b00}, {lh_first = 0x0}, {
      lh_first = 0xffffff001de00700}, {lh_first = 0x0}, {lh_first = 0x0}, {
      lh_first = 0xffffff001ddff100}, {lh_first = 0xffffff0004caf500}},
  llt_af = 2, llt_ifp = 0xffffff00043ef800,
  llt_free = 0xffffffff8044b920 <in_lltable_free>,
  llt_prefix_free = 0xffffffff8044e400 <in_lltable_prefix_free>,
  llt_lookup = 0xffffffff8044b3a0 <in_lltable_lookup>,
  llt_dump = 0xffffffff8044b150 <in_lltable_dump>}

(kgdb) list -
1387                           u_int flags)
1388    {
1389            const struct sockaddr_in *pfx = (const struct sockaddr_in *)prefix;
1390            const struct sockaddr_in *msk = (const struct sockaddr_in *)mask;
1391            struct llentry *lle, *next;
1392            register int i;
1393            size_t pkts_dropped;
1394
1395            for (i=0; i < LLTBL_HASHTBL_SIZE; i++) {
1396                    LIST_FOREACH_SAFE(lle, &llt->lle_head[i], lle_next, next) {
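
The listing above walks each hash bucket with LIST_FOREACH_SAFE. That macro saves the next pointer before the loop body runs, so the body may unlink and free the current entry. It provides no locking, so it cannot help if another thread frees entries during the walk, which is one way a pointer such as lle could end up holding the 0xdeadc0de pattern. The following is a minimal single-threaded sketch of what the macro does guarantee. struct entry and both functions are hypothetical, and the fallback macro definition is supplied only for C libraries whose <sys/queue.h> lacks the _SAFE variants.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Fallback for C libraries whose <sys/queue.h> has no _SAFE variants. */
#ifndef LIST_FOREACH_SAFE
#define LIST_FOREACH_SAFE(var, head, field, tvar)                   \
    for ((var) = (head)->lh_first;                                  \
        (var) != NULL && ((tvar) = (var)->field.le_next, 1);        \
        (var) = (tvar))
#endif

/* Hypothetical list element, standing in for struct llentry. */
struct entry {
    int               key;
    LIST_ENTRY(entry) link;
};
LIST_HEAD(entry_list, entry);

/* Unlink and free every entry with an even key; return how many went. */
static int
purge_even(struct entry_list *head)
{
    struct entry *e, *next;
    int freed = 0;

    LIST_FOREACH_SAFE(e, head, link, next) {
        if (e->key % 2 == 0) {
            LIST_REMOVE(e, link);
            free(e);    /* fine: 'next' was saved before e was freed */
            freed++;
        }
    }
    return freed;
}

/* Build a list with keys 1..n, purge it, and count the survivors. */
static int
demo_survivors(int n)
{
    struct entry_list head;
    struct entry *e, *next;
    int i, left = 0;

    LIST_INIT(&head);
    for (i = 1; i <= n; i++) {
        e = malloc(sizeof(*e));
        assert(e != NULL);
        e->key = i;
        LIST_INSERT_HEAD(&head, e, link);
    }
    (void)purge_even(&head);

    /* Tear down what is left, again using the safe traversal. */
    LIST_FOREACH_SAFE(e, &head, link, next) {
        LIST_REMOVE(e, link);
        free(e);
        left++;
    }
    return left;
}
```

The saved next pointer is only as good as the memory it points to: if a second thread unlinked and freed that element between iterations, the loop would continue into freed memory, so any real fix has to come from serializing access to the list and not from the iteration macro.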

Eugene Grosbein-7
On Sat, Jul 07, 2012 at 10:26:46AM +0200, Przemyslaw Frasunek wrote:

> > After reenabling IPv6, the crash occurred within 6 hours. This time, crashdump
> > was properly saved (thanks to patch suggested by Eugene).
>
> My PPPoE BRAS was stable for 17 days. This morning, it crashed in another way:
>
> current process         = 2762 (mpd5)
> trap number             = 9
> panic: general protection fault
> cpuid = 5
> KDB: stack backtrace:
> #0 0xffffffff803a04a6 at kdb_backtrace+0x66
> #1 0xffffffff8036dfde at panic+0x1ce
> #2 0xffffffff80503300 at trap_fatal+0x290
> #3 0xffffffff80503851 at trap+0x111
> #4 0xffffffff804ea5d4 at calltrap+0x8
> #5 0xffffffff8041d314 at lltable_prefix_free+0x74
> #6 0xffffffff8044c014 at in_ifscrub+0x2c4
> #7 0xffffffff8044d3f3 at in_control+0x793
> #8 0xffffffff80418d2d at ifioctl+0xccd
> #9 0xffffffff803b6842 at kern_ioctl+0x92
> #10 0xffffffff803b6aa0 at ioctl+0xf0
> #11 0xffffffff80502ab2 at amd64_syscall+0x302
> #12 0xffffffff804ea8cc at Xfast_syscall+0xfc
> Uptime: 17d7h18m38s

Did you set net.isr.direct=0 (and/or direct_force)?
If so, don't do that. Go back to the default of 1 for both sysctls.

Eugene Grosbein

Przemyslaw Frasunek
> Did you set net.isr.direct=0 (and/or direct_force)?
> If so, don't do that. Get back to default 1 for these two sysctls.

Both sysctls are set to default values.

This is my /etc/sysctl.conf:

net.inet6.ip6.redirect=0
net.inet.icmp.drop_redirect=1
net.inet6.icmp6.rediraccept=0
hw.acpi.power_button_state=NONE
net.inet.ip.intr_queue_maxlen=800
hw.intr_storm_threshold=10000
kern.stop_scheduler_on_panic=1

Gleb Smirnoff
In reply to this post by Przemyslaw Frasunek
On Sat, Jul 07, 2012 at 10:26:46AM +0200, Przemyslaw Frasunek wrote:
P> > After reenabling IPv6, the crash occurred within 6 hours. This time, crashdump
P> > was properly saved (thanks to patch suggested by Eugene).
P>
P> My PPPoE BRAS was stable for 17 days. This morning, it crashed in another way:
P>
P> current process         = 2762 (mpd5)
P> trap number             = 9
P> panic: general protection fault
P> cpuid = 5
P> KDB: stack backtrace:
P> #0 0xffffffff803a04a6 at kdb_backtrace+0x66
P> #1 0xffffffff8036dfde at panic+0x1ce
P> #2 0xffffffff80503300 at trap_fatal+0x290
P> #3 0xffffffff80503851 at trap+0x111
P> #4 0xffffffff804ea5d4 at calltrap+0x8
P> #5 0xffffffff8041d314 at lltable_prefix_free+0x74
P> #6 0xffffffff8044c014 at in_ifscrub+0x2c4
P> #7 0xffffffff8044d3f3 at in_control+0x793
P> #8 0xffffffff80418d2d at ifioctl+0xccd
P> #9 0xffffffff803b6842 at kern_ioctl+0x92
P> #10 0xffffffff803b6aa0 at ioctl+0xf0
P> #11 0xffffffff80502ab2 at amd64_syscall+0x302
P> #12 0xffffffff804ea8cc at Xfast_syscall+0xfc
P> Uptime: 17d7h18m38s

This looks very much related to a known race in the ARP code.

See this email and the related thread:

http://lists.freebsd.org/pipermail/freebsd-net/2012-March/031865.html

Ryan hasn't checked in any patches since, and I failed to follow up on this
problem due to ENOTIME.

I've added Ryan to Cc. Ryan, what's the status of the problem on your
side? Did you arrive at a solution?

--
Totus tuus, Glebius.