9-stable: cd device gone, ATA_CAM panics

Oliver Fromme
Hi,

I recently updated an amd64 machine from 8-stable to 9-stable,
csupped on June 1st:

$ uname -rsm
FreeBSD 9.0-STABLE-20120601 amd64

When I merged my old kernel configuration, at first I kept
"device atapicam" because this is still mentioned in NOTES.
Config and compiling worked, but linking failed with missing
symbols.  I don't remember which symbols, but it's easy to
reproduce if necessary.

Anyway, I commented atapicam out because it seems that now
"options ATA_CAM" does the same thing.  This time the kernel
linked, but during boot I got the following panic:

atapci0: <Promise PDC20269 UDMA133 controller> port 0xdc00-0xdc07,0xd880-0xd883,0xd800-0xd807,0xcc00-0xcc03,0xc880-0xc88f mem 0xfeaf8000-0xfeafbfff irq 21 at device 6.0 on pci3
ata2: <ATA channel> at channel 0 on atapci0
ata3: <ATA channel> at channel 1 on atapci0
[...]
ata2: reset tp1 mask=03 ostat0=50 ostat1=00
ata2: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
ata2: stat1=0x00 err=0x04 lsb=0x00 msb=0x00
ata2: reset tp2 stat0=00 stat1=00 devices=0x10000
(cd0:ata2:0:0:0): AutoSense failed
(cd0:ata2:0:0:0): Error 5, Unretryable error
(cd0:ata2:0:0:0): got CAM status 0x50
(cd0:ata2:0:0:0): fatal error, failed to attach to device
(cd0:ata2:0:0:0): lost device
ata2: reset tp1 mask=03 ostat0=50 ostat1=00
ata2: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
ata2: stat1=0x00 err=0x04 lsb=0x00 msb=0x00
ata2: reset tp2 stat0=00 stat1=00 devices=0x10000
(cd0:ata2:0:0:0): AutoSense failed
(cd0:ata2:0:0:0): Error 5, Unretryable error
(cd0:ata2:0:0:0): removing device entry
panic: cam_periph_release_locked_buses: release of 0xfffffe0007321700 when refcount is zero

cpuid = 4
KDB: stack backtrace:
#0 0xffffffff807a3c96 at kdb_backtrace+0x66
#1 0xffffffff8076d74e at panic+0x1ce
#2 0xffffffff802a200e at cam_periph_release_locked_buses+0x3e
#3 0xffffffff802a202e at cam_periph_release_locked+0x1e
#4 0xffffffff802a2f52 at cam_periph_release+0x52
#5 0xffffffff802babcd at cdclose+0xbd
#6 0xffffffff806d7332 at g_disk_access+0x242
#7 0xffffffff806db618 at g_access+0x188
#8 0xffffffff807110f8 at g_raid_md_taste_sii+0x188
#9 0xffffffff806e96d6 at g_raid_taste+0x126
#10 0xffffffff806db0cd at g_new_provider_event+0x6d
#11 0xffffffff806d8c08 at g_run_events+0x1e8
#12 0xffffffff8073f00e at fork_exit+0x11e
#13 0xffffffff809a2d4e at fork_trampoline+0xe
Uptime: 48s
Automatic reboot in 15 seconds - press any key on the console to abort

Then I commented ATA_CAM out, too.  This time there's no
panic, and everything works fine, *except* that there are
no cd devices whatsoever.

$ ls /dev | grep cd
$

There's no mention of any cd device in /var/run/dmesg.boot.
Also, various invocations of atacontrol(8) don't change
anything.  "atacontrol list" claims there are no devices
present.

This is a Promise (P)ATA controller (UDMA-133) with a
DVD-ROM/R/RW drive connected as master device to the first
channel (ata2), nothing else.

It worked fine with 8-stable.

Best regards
   Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Registergericht Muenchen, HRA 74606,  Management:
secnetix Verwaltungsgesellsch. mbH, commercial register: Registergericht
München, HRB 125758,  Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more:  http://www.secnetix.de/bsd

"Python is an experiment in how much freedom programmers need.
Too much freedom and nobody can read another's code; too little
and expressiveness is endangered."
        -- Guido van Rossum
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[hidden email]"

Re: 9-stable: cd device gone, ATA_CAM panics

Sergey Kandaurov
On 6 June 2012 23:29, Oliver Fromme <[hidden email]> wrote:
> Hi,
>

Hi, Oliver Fromme.

This is a wild guess, but see below.

> Anyway, I commented atapicam out because it seems that now
> "options ATA_CAM" does the same thing.  This time the kernel
> linked, but during boot I got the following panic:
>
> atapci0: <Promise PDC20269 UDMA133 controller> port 0xdc00-0xdc07,0xd880-0xd883,0xd800-0xd807,0xcc00-0xcc03,0xc880-0xc88f mem 0xfeaf8000-0xfeafbfff irq 21 at device 6.0 on pci3
> ata2: <ATA channel> at channel 0 on atapci0
> ata3: <ATA channel> at channel 1 on atapci0
> [...]
> ata2: reset tp1 mask=03 ostat0=50 ostat1=00
> ata2: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
> ata2: stat1=0x00 err=0x04 lsb=0x00 msb=0x00
> ata2: reset tp2 stat0=00 stat1=00 devices=0x10000

Looks like this is the first ata (re-)init pass.

From here the path is long. xpt_register_async() receives AC_FOUND_DEVICE
and allocates and initializes a cam periph using the cd(4) functions.
cdregister() is one of them; it calls cdstart() via the periph_start()
callback with CD_STATE_PROBE, and then cddone() runs with CD_CCB_PROBE via
an xpt action. There we get a bad CCB status, which is eventually parsed as
CAM_AUTOSENSE_FAIL | CAM_DEV_QFRZN. This indicates that cam got invalid
sense data. Somewhere along the way we seem to gain a reference on the
peripheral in cdregister() (which sets the count to 1?) and drop it in
cddone(). This looks odd, so I am likely wrong here.

> (cd0:ata2:0:0:0): AutoSense failed
> (cd0:ata2:0:0:0): Error 5, Unretryable error

Both messages come from the generic error handler and indicate
CAM_AUTOSENSE_FAIL, which is not retryable, so the error is set to EIO
(error 5).

> (cd0:ata2:0:0:0): got CAM status 0x50

0x50 stands for CAM_AUTOSENSE_FAIL | CAM_DEV_QFRZN (queue is frozen)
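
To make the decoding above concrete, here is a minimal userland sketch that splits a CAM status word into its status code and the queue-frozen flag. The constants are copied from what I believe sys/cam/cam.h defines; treat them as assumptions and check your own tree. The struct and function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Assumed values, as I recall them from sys/cam/cam.h; verify locally. */
#define CAM_AUTOSENSE_FAIL 0x10u  /* status code: autosense request failed */
#define CAM_STATUS_MASK    0x3fu  /* low six bits carry the status code */
#define CAM_DEV_QFRZN      0x40u  /* flag bit: device queue is frozen */

/* Hypothetical helper type, not part of CAM. */
struct cam_status_parts {
    uint32_t code;          /* status code with the flag bits masked off */
    int      queue_frozen;  /* 1 if CAM_DEV_QFRZN is set, else 0 */
};

/* Split a raw status word such as the 0x50 printed in the log. */
static struct cam_status_parts
split_cam_status(uint32_t status)
{
    struct cam_status_parts p;

    p.code = status & CAM_STATUS_MASK;
    p.queue_frozen = (status & CAM_DEV_QFRZN) != 0;
    return p;
}
```

Under those assumptions, split_cam_status(0x50) gives code 0x10 (CAM_AUTOSENSE_FAIL) with the queue-frozen flag set.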

> (cd0:ata2:0:0:0): fatal error, failed to attach to device

The last two messages originate from the cddone() xpt completion function.
At this point the periph is still valid, and the periph refcount is also
valid (it has a non-zero value).

> (cd0:ata2:0:0:0): lost device

This comes from the periph-specific callback invoked by
cam_periph_invalidate(), which in turn is called from cddone(). The periph
is then marked invalid with the CAM_PERIPH_INVALID flag. The periph refcount
should still be valid, so the periph is not freed: at least the last
reference is still held, to be released in cddone(). I don't know what the
state of the refcount is after cddone() releases it.

> ata2: reset tp1 mask=03 ostat0=50 ostat1=00
> ata2: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
> ata2: stat1=0x00 err=0x04 lsb=0x00 msb=0x00
> ata2: reset tp2 stat0=00 stat1=00 devices=0x10000

Looks like this is the second ata (re-)init pass.

> (cd0:ata2:0:0:0): AutoSense failed
> (cd0:ata2:0:0:0): Error 5, Unretryable error

Same as above, but without refcount++ this time, because the cam periph was
invalidated previously; cam_periph_acquire() handles this and refuses to
increment the periph refcount. The messages are a bit shorter now due to
CAM_DEV_QFRZN. Either way, we get to cam_periph_invalidate() from cddone().
At that point the periph is already marked invalid with the
CAM_PERIPH_INVALID flag, and the periph refcount should also be zero because
of the cam_periph_acquire() handling. As such, we go to camperiphfree(),
which removes the periph from the list and destroys it.

> (cd0:ata2:0:0:0): removing device entry

This is from the cdcleanup() periph destructor callback, called from camperiphfree().

> panic: cam_periph_release_locked_buses: release of 0xfffffe0007321700 when refcount is zero

So, on the second pass the periph refcount only seems to be decremented
to zero, and then the periph is destroyed, which results in GEOM tasting
a dead provider.
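
The reference counting described above can be sketched as a toy userland model. This is not the kernel code: every name below is hypothetical, the flag merely stands in for CAM_PERIPH_INVALID, and the sequence in toy_replay() is one guess at an unbalanced acquire/release pairing, in line with the speculation in this message rather than a confirmed diagnosis.

```c
#include <stdbool.h>

#define TOY_PERIPH_INVALID 0x1u  /* stands in for CAM_PERIPH_INVALID */

struct toy_periph {
    unsigned refcount;
    unsigned flags;
    bool     freed;  /* set once the destructor would have run */
};

/* As described above: an invalidated periph hands out no new references. */
static bool
toy_acquire(struct toy_periph *p)
{
    if (p->flags & TOY_PERIPH_INVALID)
        return false;
    p->refcount++;
    return true;
}

/* Returns false where the kernel would panic (release at refcount zero). */
static bool
toy_release(struct toy_periph *p)
{
    if (p->refcount == 0)
        return false;
    p->refcount--;
    if (p->refcount == 0 && (p->flags & TOY_PERIPH_INVALID))
        p->freed = true;  /* last reference gone: destroy the periph */
    return true;
}

/* Returns true if the final release hits the "refcount is zero" case. */
static bool
toy_replay(void)
{
    struct toy_periph p = { 1, 0, false };  /* one reference from registration */

    p.flags |= TOY_PERIPH_INVALID;  /* failed probe invalidates the periph */
    (void)toy_acquire(&p);          /* a later acquire is refused, count stays 1 */
    (void)toy_release(&p);          /* 1 -> 0, periph destroyed */
    return !toy_release(&p);        /* a release with no matching acquire */
}
```

In this model the second release finds the count already at zero, which corresponds to the condition named in the panic message.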

>
> cpuid = 4
> KDB: stack backtrace:
> #0 0xffffffff807a3c96 at kdb_backtrace+0x66
> #1 0xffffffff8076d74e at panic+0x1ce
> #2 0xffffffff802a200e at cam_periph_release_locked_buses+0x3e
> #3 0xffffffff802a202e at cam_periph_release_locked+0x1e
> #4 0xffffffff802a2f52 at cam_periph_release+0x52
> #5 0xffffffff802babcd at cdclose+0xbd
> #6 0xffffffff806d7332 at g_disk_access+0x242
> #7 0xffffffff806db618 at g_access+0x188
> #8 0xffffffff807110f8 at g_raid_md_taste_sii+0x188
> #9 0xffffffff806e96d6 at g_raid_taste+0x126
> #10 0xffffffff806db0cd at g_new_provider_event+0x6d
> #11 0xffffffff806d8c08 at g_run_events+0x1e8
> #12 0xffffffff8073f00e at fork_exit+0x11e
> #13 0xffffffff809a2d4e at fork_trampoline+0xe

Probably the way to fix this is to modify
cam_periph_release_locked_buses() to test CAM_PERIPH_INVALID as well.

How about this patch?
(Beware: it has not been compile-tested, I am just speculating.)

Index: sys/cam/cam_periph.c
===================================================================
--- sys/cam/cam_periph.c        (revision 236694)
+++ sys/cam/cam_periph.c        (working copy)
@@ -374,7 +374,7 @@
 {
        if (periph->refcount != 0) {
                periph->refcount--;
-       } else {
+       } else if ((periph->flags & CAM_PERIPH_INVALID) == 0) {
                panic("%s: release of %p when refcount is zero\n ", __func__,
                      periph);
        }

--
wbr,
pluknet

Re: 9-stable: cd device gone, ATA_CAM panics

Oliver Fromme
Sergey Kandaurov <[hidden email]> wrote:
 > This is a wild guess, but see below.
 > [...]
 > Probably the way to fix this is to modify
 > cam_periph_release_locked_buses() to test CAM_PERIPH_INVALID as well.
 >
 > How about this patch?
 > (Beware: it has not been compile-tested, I am just speculating.)

Thanks for your detailed analysis!

I'm afraid the patch is not correct.  With that patch, the
following page fault occurs (same output as before until
"removing device entry"):

(cd0:ata2:0:0:0): removing device entry
(cd0:nobus:X:X): removing device entry


Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address   = 0x20
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff807a0d03
stack pointer           = 0x28:0xffffff80002a9270
frame pointer           = 0x28:0xffffff80002a9290
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (g_event)
trap number             = 12
panic: page fault
cpuid = 4
KDB: stack backtrace:
#0 0xffffffff807a3ca6 at kdb_backtrace+0x66
#1 0xffffffff8076d75e at panic+0x1ce
#2 0xffffffff809b7400 at trap_fatal+0x290
#3 0xffffffff809b773d at trap_pfault+0x1ed
#4 0xffffffff809b7d5e at trap+0x3ce
#5 0xffffffff809a282f at calltrap+0x8
#6 0xffffffff806d6a16 at disk_destroy+0x26
#7 0xffffffff802ba687 at cdcleanup+0x97
#8 0xffffffff802a1e79 at camperiphfree+0x99
#9 0xffffffff802a203e at cam_periph_release_locked+0x1e
#10 0xffffffff802a2f62 at cam_periph_release+0x52
#11 0xffffffff802babdd at cdclose+0xbd
#12 0xffffffff806d7342 at g_disk_access+0x242
#13 0xffffffff806db628 at g_access+0x188
#14 0xffffffff806fc353 at g_raid_md_taste_ddf+0x1f3
#15 0xffffffff806e96e6 at g_raid_taste+0x126
#16 0xffffffff806db0dd at g_new_provider_event+0x6d
#17 0xffffffff806d8c18 at g_run_events+0x1e8
Uptime: 48s
Automatic reboot in 15 seconds - press any key on the console to abort
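
One possible reading of this backtrace, sketched as a toy userland model: with the panic suppressed for an invalid periph, a release at refcount zero falls through to the free path, so the destructor runs a second time. That would fit the two "removing device entry" lines and the camperiphfree() frame under cam_periph_release_locked(). The names below are hypothetical, and the free-at-zero step after the refcount check is my assumption about the rest of the function, not something shown in the patch context.

```c
#include <stdbool.h>

#define SKETCH_INVALID 0x1u  /* stands in for CAM_PERIPH_INVALID */

struct sketch_periph {
    unsigned refcount;
    unsigned flags;
    int      cleanup_calls;  /* how many times the destructor ran */
};

/* Stand-in for the destructor path (camperiphfree -> cdcleanup). */
static void
sketch_free(struct sketch_periph *p)
{
    p->cleanup_calls++;
}

/* Release with the patched check; returns false where it would still panic. */
static bool
sketch_release_patched(struct sketch_periph *p)
{
    if (p->refcount != 0)
        p->refcount--;
    else if ((p->flags & SKETCH_INVALID) == 0)
        return false;
    /* Assumed: an invalid periph is freed once its count is zero. */
    if (p->refcount == 0 && (p->flags & SKETCH_INVALID))
        sketch_free(p);
    return true;
}

/* Two releases on an invalid periph that holds a single reference. */
static int
sketch_double_release(void)
{
    struct sketch_periph p = { 1, SKETCH_INVALID, 0 };

    (void)sketch_release_patched(&p);  /* 1 -> 0, destructor runs */
    (void)sketch_release_patched(&p);  /* already 0: no panic, destructor again */
    return p.cleanup_calls;
}
```

In the model the destructor runs twice, which in the kernel would mean cleaning up an already destroyed disk.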

By the way, there is a long delay (~20s) before the two
"ata2: reset tp1" lines.  I guess something hangs here and
causes a time-out.  I didn't have such delays with 8.x.

I need a working DVD drive, so I'm now considering
downgrading to 8-stable.  But then again, TMPFS didn't work
as well for me there as it does in 9-stable (which was the main
reason for me to upgrade), so I'm kind of stuck in a
difficult situation.

I'm willing to test more patches, of course.  :-)

Best regards
   Oliver

--
 > Can the denizens of this group enlighten me about what the
 > advantages of Python are, versus Perl ?
"python" is more likely to pass unharmed through your spelling
checker than "perl".
        -- An unknown poster and Fredrik Lundh

Re: 9-stable: cd device gone, ATA_CAM panics

Oliver Fromme
Oliver Fromme <[hidden email]> wrote:
 > [...]
 > I need a working DVD drive, so I'm now considering
 > downgrading to 8-stable.  But then again, TMPFS didn't work
 > as well for me there as it does in 9-stable (which was the main
 > reason for me to upgrade), so I'm kind of stuck in a
 > difficult situation.

Fortunately, 9-stable works with "device atapicam", as I
just found out.  I thought I had already tried that and
got errors during linking, but that was probably with
the ATA_CAM option enabled at the same time which causes
conflicts, obviously.

So, everything's back to normal with "device atapicam"
for now, and without ATA_CAM.

Best regards
   Oliver


--
"In My Egoistical Opinion, most people's C programs should be indented
six feet downward and covered with dirt."
        -- Blair P. Houghton

Re: 9-stable: cd device gone, ATA_CAM panics

Steven Hartland
----- Original Message -----
From: "Oliver Fromme" <[hidden email]>

> > I need a working DVD drive, so I'm now considering
> > downgrading to 8-stable.  But then again, TMPFS didn't work
> > as well for me there as it does in 9-stable (which was the main
> > reason for me to upgrade), so I'm kind of stuck in a
> > difficult situation.
>
> Fortunately, 9-stable works with "device atapicam", as I
> just found out.  I thought I had already tried that and
> got errors during linking, but that was probably with
> the ATA_CAM option enabled at the same time which causes
> conflicts, obviously.
>
> So, everything's back to normal with "device atapicam"
> for now, and without ATA_CAM.

Hi Oliver, I had a similar experience here; you might want to try
the patch in the following PR:
http://www.freebsd.org/cgi/query-pr.cgi?pr=169495
