|
Hi,
I recently updated an amd64 machine from 8-stable to 9-stable,
csupped on June 1st:

$ uname -rsm
FreeBSD 9.0-STABLE-20120601 amd64

When I merged my old kernel configuration, at first I kept
"device atapicam" because this is still mentioned in NOTES.
Config and compiling worked, but linking failed with missing
symbols.  I don't remember which symbols, but it's easy to
reproduce if necessary.

Anyway, I commented atapicam out because it seems that now
"options ATA_CAM" does the same thing.  This time the kernel
linked, but during boot I got the following panic:

atapci0: <Promise PDC20269 UDMA133 controller> port 0xdc00-0xdc07,0xd880-0xd883,0xd800-0xd807,0xcc00-0xcc03,0xc880-0xc88f mem 0xfeaf8000-0xfeafbfff irq 21 at device 6.0 on pci3
ata2: <ATA channel> at channel 0 on atapci0
ata3: <ATA channel> at channel 1 on atapci0
[...]
ata2: reset tp1 mask=03 ostat0=50 ostat1=00
ata2: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
ata2: stat1=0x00 err=0x04 lsb=0x00 msb=0x00
ata2: reset tp2 stat0=00 stat1=00 devices=0x10000
(cd0:ata2:0:0:0): AutoSense failed
(cd0:ata2:0:0:0): Error 5, Unretryable error
(cd0:ata2:0:0:0): got CAM status 0x50
(cd0:ata2:0:0:0): fatal error, failed to attach to device
(cd0:ata2:0:0:0): lost device
ata2: reset tp1 mask=03 ostat0=50 ostat1=00
ata2: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
ata2: stat1=0x00 err=0x04 lsb=0x00 msb=0x00
ata2: reset tp2 stat0=00 stat1=00 devices=0x10000
(cd0:ata2:0:0:0): AutoSense failed
(cd0:ata2:0:0:0): Error 5, Unretryable error
(cd0:ata2:0:0:0): removing device entry
panic: cam_periph_release_locked_buses: release of 0xfffffe0007321700 when refcount is zero
cpuid = 4
KDB: stack backtrace:
#0 0xffffffff807a3c96 at kdb_backtrace+0x66
#1 0xffffffff8076d74e at panic+0x1ce
#2 0xffffffff802a200e at cam_periph_release_locked_buses+0x3e
#3 0xffffffff802a202e at cam_periph_release_locked+0x1e
#4 0xffffffff802a2f52 at cam_periph_release+0x52
#5 0xffffffff802babcd at cdclose+0xbd
#6 0xffffffff806d7332 at g_disk_access+0x242
#7 0xffffffff806db618 at g_access+0x188
#8 0xffffffff807110f8 at g_raid_md_taste_sii+0x188
#9 0xffffffff806e96d6 at g_raid_taste+0x126
#10 0xffffffff806db0cd at g_new_provider_event+0x6d
#11 0xffffffff806d8c08 at g_run_events+0x1e8
#12 0xffffffff8073f00e at fork_exit+0x11e
#13 0xffffffff809a2d4e at fork_trampoline+0xe
Uptime: 48s
Automatic reboot in 15 seconds - press any key on the console to abort

Then I commented ATA_CAM out, too.  This time there's no
panic, and everything works fine, *except* that there are
no cd devices whatsoever.

$ ls /dev | grep cd
$

There's no mention of any cd device in /var/run/dmesg.boot.
Also, various invocations of atacontrol(8) don't change
anything.  "atacontrol list" claims there are no devices
present.

This is a Promise (P)ATA controller (UDMA-133) with a
DVD-ROM/R/RW drive connected as master device to the first
channel (ata2), nothing else.  It worked fine with 8-stable.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Registergericht Muenchen, HRA 74606, Management:
secnetix Verwaltungsgesellsch. mbH, Commercial register: Registergericht
München, HRB 125758, Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more:  http://www.secnetix.de/bsd

"Python is an experiment in how much freedom programmers need.
Too much freedom and nobody can read another's code; too little
and expressiveness is endangered."
        -- Guido van Rossum
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[hidden email]"
|
On 6 June 2012 23:29, Oliver Fromme <[hidden email]> wrote:
> Hi,
>

Hi, Oliver Fromme.
This is a wild guess, but see below.

> Anyway, I commented atapicam out because it seems that now
> "options ATA_CAM" does the same thing.  This time the kernel
> linked, but during boot I got the following panic:
>
> atapci0: <Promise PDC20269 UDMA133 controller> port 0xdc00-0xdc07,0xd880-0xd883,0xd800-0xd807,0xcc00-0xcc03,0xc880-0xc88f mem 0xfeaf8000-0xfeafbfff irq 21 at device 6.0 on pci3
> ata2: <ATA channel> at channel 0 on atapci0
> ata3: <ATA channel> at channel 1 on atapci0
> [...]
> ata2: reset tp1 mask=03 ostat0=50 ostat1=00
> ata2: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
> ata2: stat1=0x00 err=0x04 lsb=0x00 msb=0x00
> ata2: reset tp2 stat0=00 stat1=00 devices=0x10000

Looks like this is the first ata (re-)init pass.
From there it is a long path.
xpt_register_async() receives AC_FOUND_DEVICE and allocates and
initializes a cam periph with the cd(4) functions. cdregister() is
one of them; it calls cdstart() via the periph_start() callback with
CD_STATE_PROBE, and then cddone() with CD_CCB_PROBE via the xpt action.
There we get a bad CCB state and eventually parse it as
CAM_AUTOSENSE_FAIL | CAM_DEV_QFRZN, which indicates that cam got
invalid sense data.
Somewhere along this path we seem to gain a reference on the
peripheral in cdregister() (refcount set to 1?) and drop it in
cddone(). This looks odd, so I am likely wrong there.

> (cd0:ata2:0:0:0): AutoSense failed
> (cd0:ata2:0:0:0): Error 5, Unretryable error

Both messages come from the generic error handler, which reports
CAM_AUTOSENSE_FAIL. That condition is not retryable, so EIO is set
as well.

> (cd0:ata2:0:0:0): got CAM status 0x50

0x50 stands for CAM_AUTOSENSE_FAIL | CAM_DEV_QFRZN (queue is frozen).

> (cd0:ata2:0:0:0): fatal error, failed to attach to device

The last two messages originate from the cddone() xpt completion
function. At this point the periph is still valid, and the periph
refcount is also valid (it has a non-zero value).
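The decoding of status 0x50 above can be checked with a few lines of C. This is only a sketch: the constant values are reproduced from memory of sys/cam/cam.h and should be treated as assumptions to verify against your source tree, and the two helper functions are hypothetical names introduced for illustration, not part of CAM.

```c
#include <stdint.h>

/* Assumed values, as recalled from sys/cam/cam.h; verify before relying on them. */
#define CAM_AUTOSENSE_FAIL 0x10u /* autosense (request sense) failed */
#define CAM_DEV_QFRZN      0x40u /* flag bit: device queue is frozen */
#define CAM_STATUS_MASK    0x3fu /* low bits carry the status code itself */

/* Hypothetical helper: strip the flag bits and return the bare status code. */
uint32_t cam_status_code(uint32_t status)
{
    return status & CAM_STATUS_MASK;
}

/* Hypothetical helper: report whether the "queue frozen" flag is set. */
int cam_queue_frozen(uint32_t status)
{
    return (status & CAM_DEV_QFRZN) != 0;
}
```

Under these assumed values, cam_status_code(0x50) yields the CAM_AUTOSENSE_FAIL code and cam_queue_frozen(0x50) is true, which matches the reading given for the "got CAM status 0x50" message.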
> (cd0:ata2:0:0:0): lost device

This is from the periph-specific callback invoked by
cam_periph_invalidate(), which is called from cddone() to invalidate
the periph. The periph is then marked as invalid with the
CAM_PERIPH_INVALID flag.
The periph refcount should still be valid, so the periph is not freed.
At least the last reference is held, to be released in cddone().
I don't know what the state of the refcount is after cddone()
releases it.

> ata2: reset tp1 mask=03 ostat0=50 ostat1=00
> ata2: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
> ata2: stat1=0x00 err=0x04 lsb=0x00 msb=0x00
> ata2: reset tp2 stat0=00 stat1=00 devices=0x10000

Looks like this is the second ata (re-)init pass.

> (cd0:ata2:0:0:0): AutoSense failed
> (cd0:ata2:0:0:0): Error 5, Unretryable error

Same as above, but without refcount++ this time, because the cam
periph was invalidated previously; cam_periph_acquire() handles this
and refuses to increment the periph refcount.
The messages are a bit shorter now due to CAM_DEV_QFRZN.
Either way we get to cam_periph_invalidate() from cddone().
When we arrive there, the periph is already marked as invalid with
the CAM_PERIPH_INVALID flag. The periph refcount should also be zero
because of the cam_periph_acquire() handling.
As such, we go to camperiphfree(), which removes the periph from the
list and destroys it.

> (cd0:ata2:0:0:0): removing device entry

This is from the cdcleanup() periph destructor callback, called from
camperiphfree().

> panic: cam_periph_release_locked_buses: release of 0xfffffe0007321700 when refcount is zero

So, on the second pass the periph refcount only seems to be
decremented to zero, and then the periph is destroyed, which results
in geom tasting a dead provider.
>
> cpuid = 4
> KDB: stack backtrace:
> #0 0xffffffff807a3c96 at kdb_backtrace+0x66
> #1 0xffffffff8076d74e at panic+0x1ce
> #2 0xffffffff802a200e at cam_periph_release_locked_buses+0x3e
> #3 0xffffffff802a202e at cam_periph_release_locked+0x1e
> #4 0xffffffff802a2f52 at cam_periph_release+0x52
> #5 0xffffffff802babcd at cdclose+0xbd
> #6 0xffffffff806d7332 at g_disk_access+0x242
> #7 0xffffffff806db618 at g_access+0x188
> #8 0xffffffff807110f8 at g_raid_md_taste_sii+0x188
> #9 0xffffffff806e96d6 at g_raid_taste+0x126
> #10 0xffffffff806db0cd at g_new_provider_event+0x6d
> #11 0xffffffff806d8c08 at g_run_events+0x1e8
> #12 0xffffffff8073f00e at fork_exit+0x11e
> #13 0xffffffff809a2d4e at fork_trampoline+0xe

Probably the way to fix this is to modify
cam_periph_release_locked_buses() to test CAM_PERIPH_INVALID as well.

How about this patch?
(Beware: it was not compile tested, just speculating.)

Index: sys/cam/cam_periph.c
===================================================================
--- sys/cam/cam_periph.c	(revision 236694)
+++ sys/cam/cam_periph.c	(working copy)
@@ -374,7 +374,7 @@
 {
 	if (periph->refcount != 0) {
 		periph->refcount--;
-	} else {
+	} else if ((periph->flags & CAM_PERIPH_INVALID) == 0) {
 		panic("%s: release of %p when refcount is zero\n ", __func__,
 			periph);
 	}

-- 
wbr,
pluknet
|
Sergey Kandaurov <[hidden email]> wrote:
> This is a wild guess, but see below.
> [...]
> Probably the way to fix this is to modify
> cam_periph_release_locked_buses() to test CAM_PERIPH_INVALID as well.
>
> how about this patch?
> (beware: it was not compile tested, just speculating)

Thanks for your detailed analysis!

I'm afraid the patch is not correct.  With that patch,
the following page fault occurs (same output as before
until "removing device entry"):

(cd0:ata2:0:0:0): removing device entry
(cd0:nobus:X:X): removing device entry

Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address   = 0x20
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff807a0d03
stack pointer           = 0x28:0xffffff80002a9270
frame pointer           = 0x28:0xffffff80002a9290
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (g_event)
trap number             = 12
panic: page fault
cpuid = 4
KDB: stack backtrace:
#0 0xffffffff807a3ca6 at kdb_backtrace+0x66
#1 0xffffffff8076d75e at panic+0x1ce
#2 0xffffffff809b7400 at trap_fatal+0x290
#3 0xffffffff809b773d at trap_pfault+0x1ed
#4 0xffffffff809b7d5e at trap+0x3ce
#5 0xffffffff809a282f at calltrap+0x8
#6 0xffffffff806d6a16 at disk_destroy+0x26
#7 0xffffffff802ba687 at cdcleanup+0x97
#8 0xffffffff802a1e79 at camperiphfree+0x99
#9 0xffffffff802a203e at cam_periph_release_locked+0x1e
#10 0xffffffff802a2f62 at cam_periph_release+0x52
#11 0xffffffff802babdd at cdclose+0xbd
#12 0xffffffff806d7342 at g_disk_access+0x242
#13 0xffffffff806db628 at g_access+0x188
#14 0xffffffff806fc353 at g_raid_md_taste_ddf+0x1f3
#15 0xffffffff806e96e6 at g_raid_taste+0x126
#16 0xffffffff806db0dd at g_new_provider_event+0x6d
#17 0xffffffff806d8c18 at g_run_events+0x1e8
Uptime: 48s
Automatic reboot in 15 seconds - press any key on the console to abort

By the way, there is a long delay (~20s) before each of the two
"ata2: reset tp1" lines.  I guess something hangs here
and causes a time-out.  I didn't have such delays with 8.x.

I need a working DVD drive, so I'm now considering
downgrading to 8-stable.  But then again, TMPFS didn't work
as well for me as it does in 9-stable (which was the main
reason for me to upgrade), so I'm kind of stuck in a
difficult situation.

I'm willing to test more patches, of course.  :-)

Best regards
   Oliver

-- 
> Can the denizens of this group enlighten me about what the
> advantages of Python are, versus Perl ?
"python" is more likely to pass unharmed through your spelling
checker than "perl".
        -- An unknown poster and Fredrik Lundh
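Why the patched release path can end in a second cdcleanup() can be illustrated with a toy userspace model. This is a sketch under stated assumptions, not the kernel code: struct toy_periph, toy_free() and toy_release_patched() are hypothetical stand-ins, and the final "free when the refcount is zero and the periph is invalid" step is inferred from the backtrace above (camperiphfree reached from cam_periph_release_locked), not copied from cam_periph.c.

```c
#include <stdbool.h>

#define TOY_PERIPH_INVALID 0x1u /* stand-in for CAM_PERIPH_INVALID */

/* Hypothetical miniature of a CAM peripheral; only the fields the model needs. */
struct toy_periph {
    unsigned refcount;
    unsigned flags;
    unsigned destroy_calls; /* how many times the destructor ran */
};

/* Stand-in for camperiphfree(): counts invocations instead of freeing. */
void toy_free(struct toy_periph *p)
{
    p->destroy_calls++;
}

/*
 * Release with the proposed change applied: a zero refcount on an
 * invalidated periph no longer "panics". Returns false where the
 * kernel would panic, true otherwise.
 */
bool toy_release_patched(struct toy_periph *p)
{
    if (p->refcount != 0) {
        p->refcount--;
    } else if ((p->flags & TOY_PERIPH_INVALID) == 0) {
        return false; /* the kernel would panic here */
    }
    /* Assumed tail of the function: free once unreferenced and invalid. */
    if (p->refcount == 0 && (p->flags & TOY_PERIPH_INVALID) != 0)
        toy_free(p);
    return true;
}
```

In this model, releasing an invalidated periph that holds one reference runs the destructor once, and a further release (such as the one from cdclose during GEOM tasting) runs it again rather than panicking. That would be consistent with "removing device entry" being printed twice and the fault in disk_destroy, but it remains a guess about the real code path.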
|
Oliver Fromme <[hidden email]> wrote:
> [...]
> I need a working DVD drive, so I'm now considering
> downgrading to 8-stable.  But then again, TMPFS didn't work
> as well for me as it does in 9-stable (which was the main
> reason for me to upgrade), so I'm kind of stuck in a
> difficult situation.

Fortunately, 9-stable works with "device atapicam", as I
just found out.  I thought I had already tried that and
got errors during linking, but that was probably with
the ATA_CAM option enabled at the same time, which
evidently conflicts with it.

So, everything's back to normal with "device atapicam"
for now, and without ATA_CAM.

Best regards
   Oliver

-- 
"In My Egoistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
        -- Blair P. Houghton
|
----- Original Message -----
From: "Oliver Fromme" <[hidden email]>

> > I need a working DVD drive, so I'm now considering
> > downgrading to 8-stable.  But then again, TMPFS didn't work
> > as well for me as it does in 9-stable (which was the main
> > reason for me to upgrade), so I'm kind of stuck in a
> > difficult situation.
>
> Fortunately, 9-stable works with "device atapicam", as I
> just found out.  I thought I had already tried that and
> got errors during linking, but that was probably with
> the ATA_CAM option enabled at the same time which causes
> conflicts, obviously.
>
> So, everything's back to normal with "device atapicam"
> for now, and without ATA_CAM.

Hi Oliver,

I had a similar experience here; you might want to try the patch
in the following PR:
http://www.freebsd.org/cgi/query-pr.cgi?pr=169495

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to [hidden email].
