Hello,

First of all apologies if this has been fixed in RC3. I set this server up with mfsbsd, which is RC1, and didn't get to update the system yet.

This box has 6 hdds, a 2-mirror zpool was set up as the root pool, with 2 spares.

While testing hot swapping I noticed that while the controller detects disk removal/insertion, the zpool will never recover. The problem seems to be deeper than ZFS, as disklabel/fdisk/etc also fail on the removed-and-reinserted disk.

At the ZFS level, doing a zpool clear yields more errors on the removed disk; rebooting becomes the only option to make the pool healthy again.

Is this normal? Did I miss any step?

Regards,

Hugo
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"

On Dec 14, 2011, at 2:09 PM, Hugo Silva wrote:

> While testing hot swapping I noticed that while the controller detects
> disk removal/insertion, the zpool will never recover. The problem seems
> to be deeper than ZFS, as disklabel/fdisk/etc also fail on the
> removed-and-reinserted disk.
>
> At the ZFS level, doing a zpool clear yields more errors on the removed
> disk; rebooting becomes the only option to make the pool healthy again.
>
> Is this normal? Did I miss any step?

I assume that you have tried to use the H700 as a "JBOD" card, defining a logical volume for each hard disk.

The problem is: that gorgeous, fantastic, masterful, Nobel award candidate card has a wonderful behavior in that case. If you extract one of the disks, the logical volume associated with it is invalidated. So, you insert a replacement disk, and the card refuses to recognize the volume. What is even worse, in order to recover it's mandatory to reboot the complete system *AND* go through the RAID configuration utility.

That's the problem. The card refuses to work as a simple disk controller without frills, and the frills get in the way.

To summarize: it isn't FreeBSD's fault, no matter which version you use. It's a "feature" coming directly from the geniuses who designed the card.

Borja.

On 15/12/2011, at 2:16 AM, Borja Marcos wrote:

> If you extract one of the disks, the logical volume associated with it is invalidated. So, you insert a replacement disk, and the card refuses to recognize the volume. What is even worse, in order to recover it's mandatory to reboot the complete system *AND* go through the RAID configuration utility.
>
> To summarize: it isn't FreeBSD's fault, no matter which version you use. It's a "feature" coming directly from the geniuses who designed the card.

Hugo: You missed a step. Borja: No reboot required.

For the mfi controllers I have been testing recently (MegaRAID 9261-8i), you need to install the sysutils/megacli port, and use that to clear the "foreignness" of the disk you just added. Something like:

    MegaCli -CfgForeign -Clear -a0

You should be able to then recreate it as a JBOD device, and progress through whatever higher level recovery you need to do.

Regards,

Jan Mikkelsen

On Thursday, December 15, 2011 4:19:58 am Jan Mikkelsen wrote:

> Hugo: You missed a step. Borja: No reboot required.
>
> For the mfi controllers I have been testing recently (MegaRAID 9261-8i), you need to install the sysutils/megacli port, and use that to clear the "foreignness" of the disk you just added. Something like:
>
>     MegaCli -CfgForeign -Clear -a0
>
> You should be able to then recreate it as a JBOD device, and progress through whatever higher level recovery you need to do.

Can you do this by marking it as 'good' via mfiutil and then using mfiutil to create a volume?

--
John Baldwin

On 12/15/11 09:19, Jan Mikkelsen wrote:

> Hugo: You missed a step. Borja: No reboot required.
>
> For the mfi controllers I have been testing recently (MegaRAID 9261-8i), you need to install the sysutils/megacli port, and use that to clear the "foreignness" of the disk you just added. Something like:
>
>     MegaCli -CfgForeign -Clear -a0
>
> You should be able to then recreate it as a JBOD device, and progress through whatever higher level recovery you need to do.

Hello,

I tried that and got:

    There is no foreign configuration on controller 0.

    Exit Code: 0x00

Presently one drive is failed, following an 'mfiutil fail 2'. Trying to bring it back with 'mfiutil good 2' results in:

    mfiutil: Command failed: Wrong firmware or drive state
    mfiutil: Failed to set drive 2 to UNCONFIGURED GOOD: Input/output error

    mfi0 Volumes:
      Id     Size    Level   Stripe  State   Cache   Name
     mfid0 (  558G) RAID-0      64k OPTIMAL Disabled
     mfid1 (  558G) RAID-0      64k OPTIMAL Disabled
     mfid2 (  558G) RAID-0      64k OPTIMAL Disabled
     mfid3 (  558G) RAID-0      64k OPTIMAL Disabled
     mfid4 (  558G) RAID-0      64k OFFLINE Disabled
     mfid5 (  558G) RAID-0      64k OPTIMAL Disabled

As Borja said, part of the difficulty is the H700 abstracting a single disk as a RAID-0, I guess. So far I've been unable to find a way to bring the drive back, except by rebooting and recreating.

Any other suggestions?

Thanks,

Hugo

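The volume listing above is easy to scan by eye with six disks, but the same check can be scripted. A minimal sketch, assuming the column layout shown in the message; the helper name non_optimal_volumes is made up for illustration, and on a live system the input would come from mfiutil show volumes rather than from a pasted sample:

```shell
#!/bin/sh
# List volumes whose state is not OPTIMAL, reading volume-listing text on stdin.
# Assumes the layout quoted in the thread: Id, "( size)", Level, Stripe, State, ...
non_optimal_volumes() {
    awk '/^[ \t]*mfid[0-9]+/ {
        # The State column is the first field after the stripe size (e.g. "64k").
        for (i = 1; i < NF; i++) {
            if ($i ~ /^[0-9]+[kKmM]$/ && $(i + 1) != "OPTIMAL") {
                print $1, $(i + 1)
                break
            }
        }
    }'
}

# Sample input copied from the message above.
sample='mfi0 Volumes:
  Id     Size    Level   Stripe  State   Cache   Name
 mfid0 (  558G) RAID-0      64k OPTIMAL Disabled
 mfid1 (  558G) RAID-0      64k OPTIMAL Disabled
 mfid2 (  558G) RAID-0      64k OPTIMAL Disabled
 mfid3 (  558G) RAID-0      64k OPTIMAL Disabled
 mfid4 (  558G) RAID-0      64k OFFLINE Disabled
 mfid5 (  558G) RAID-0      64k OPTIMAL Disabled'

printf '%s\n' "$sample" | non_optimal_volumes
```

Keying on the stripe-size field rather than a fixed column number is a deliberate choice here, because the size column is split into two whitespace-separated fields by the parentheses.
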
On 12/15/11 15:28, Hugo Silva wrote:

> As Borja said, part of the difficulty is the H700 abstracting a single
> disk as a RAID-0, I guess. So far I've been unable to find a way to
> bring the drive back, except by rebooting and recreating.

Turns out no interaction is needed after reboot. It was something else unrelated. The main issue then is convincing the controller to once again accept the hard disk. I'm going through MegaCli "documentation" (ie --help).. it's not a pretty place.

Regards,

Hugo

On Dec 15, 2011, at 4:19 AM, Jan Mikkelsen wrote:

> For the mfi controllers I have been testing recently (MegaRAID 9261-8i), you need to install the sysutils/megacli port, and use that to clear the "foreignness" of the disk you just added. Something like:
>
>     MegaCli -CfgForeign -Clear -a0

I don't think that's what you want. You want to use -Import, not -Clear, to keep your data intact.

On Dec 15, 2011, at 11:03 AM, Hugo Silva wrote:

> Turns out no interaction is needed after reboot. It was something else
> unrelated. The main issue then is convincing the controller to once
> again accept the hard disk. I'm going through MegaCli "documentation"
> (ie --help).. it's not a pretty place.

I'm not sure it would even be possible to come up with a worse interface. It boggles the mind.

I recommend you always run with this configuration:

    # MegaCli -AdpSetProp AutoEnhancedImportEnbl -aALL
    # MegaCli -AdpSetProp MaintainPdFailHistoryEnbl -0 -aALL

AutoEnhancedImportEnbl will bring the foreign disk back in on a reboot. LSI recommends turning off MaintainPdFailHistory when using single-disk RAID0 configurations.

To bring in a foreign disk without rebooting:

    # MegaCli -CfgForeign -Scan -aALL
    # MegaCli -CfgForeign -Import [x] -aN

(where x is the config number listed in the scan, and N is the adapter number)

Adding these capabilities to mfiutil is on my list of things to do, but it's not ready yet.

Has anyone managed to get the real JBOD mode working on this controller? It advertises support in the firmware but doesn't seem to do anything. The documentation only lists JBOD mode as a feature of the lower-end controllers.

Hope this helps.

-Andrew

--------------------------------------------------
Andrew Boyer [hidden email]

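The two-step import Andrew describes can be written down as a small dry-run helper. This is only a sketch: foreign_import_plan is a hypothetical name, it prints the commands instead of running them, and passing the config number as a bare argument after -Import is an assumption about how the [x] placeholder is spelled on the command line.

```shell
#!/bin/sh
# Print (rather than run) the MegaCli steps quoted in the thread for bringing
# a foreign configuration back in without a reboot.
# foreign_import_plan is a hypothetical helper; the MegaCli invocations are
# copied from the message above, with x and N filled in from the arguments.
foreign_import_plan() {
    config="$1"    # config number reported by the -Scan step
    adapter="$2"   # adapter number, e.g. 0
    if [ -z "$config" ] || [ -z "$adapter" ]; then
        echo "usage: foreign_import_plan CONFIG ADAPTER" >&2
        return 1
    fi
    echo "MegaCli -CfgForeign -Scan -aALL"
    echo "MegaCli -CfgForeign -Import $config -a$adapter"
}

# Example: config 0 on adapter 0.
foreign_import_plan 0 0
```

Printing the plan first gives a chance to review the adapter and config numbers before anything touches the controller; the output can be run by hand once it looks right.
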
On 12/15/11 16:40, Andrew Boyer wrote:

> I recommend you always run with this configuration:
>
>     # MegaCli -AdpSetProp AutoEnhancedImportEnbl -aALL
>     # MegaCli -AdpSetProp MaintainPdFailHistoryEnbl -0 -aALL
>
> AutoEnhancedImportEnbl will bring the foreign disk back in on a reboot. LSI recommends turning off MaintainPdFailHistory when using single-disk RAID0 configurations.

Any gotchas with this enabled? I'm thinking of putting a disk from another card, which is part of a RAID, into this server, for instance.

> To bring in a foreign disk without rebooting:
>
>     # MegaCli -CfgForeign -Scan -aALL
>     # MegaCli -CfgForeign -Import [x] -aN
>
> (where x is the config number listed in the scan, and N is the adapter number)
>
> Hope this helps.

It does help - thanks! For the same disk being removed and then reinserted, the provided commands brought the disk/volume back to the mfiutil show drives/volumes output, and after a zpool clear, ZFS has no complaints.

For recovery from a software-induced fail (mfiutil fail eX:sX), I couldn't perform a recovery using just mfiutil. MegaCli -PDOnline -PhysDrv[eX:sX] -aN did it, in that case.

For the still-untested case of an altogether new disk being inserted, I guess mfiutil create jbod N would do the trick.

BTW, mfiutil dumps core when given nonexistent disks (just noticed).

Regards,

Hugo

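Hugo's three cases can be summarized as a lookup from scenario to the commands reported in the thread. A sketch only: recovery_command and the scenario labels are invented here, the placeholders (eX:sX, N, [x], <pool>) are left unfilled on purpose, and the new-disk entry is the untested guess from the message, not a confirmed procedure.

```shell
#!/bin/sh
# Map each hot-swap scenario from the message above to the command(s)
# reported for it. recovery_command and its scenario labels are hypothetical;
# the printed commands keep the thread's placeholders and are not executed.
recovery_command() {
    case "$1" in
        reinserted)    # same disk pulled and put back
            echo "MegaCli -CfgForeign -Scan -aALL"
            echo "MegaCli -CfgForeign -Import [x] -aN"
            echo "zpool clear <pool>"
            ;;
        soft-failed)   # drive failed in software with mfiutil fail
            echo "MegaCli -PDOnline -PhysDrv[eX:sX] -aN"
            ;;
        new-disk)      # untested in the thread
            echo "mfiutil create jbod N"
            ;;
        *)
            echo "unknown scenario: $1" >&2
            return 1
            ;;
    esac
}

recovery_command soft-failed
```
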
On Dec 15, 2011, at 1:10 PM, Hugo Silva wrote:

> Any gotchas with this enabled? I'm thinking of putting a disk from
> another card, which is part of a RAID, into this server, for instance.

My understanding is that it only imports a foreign configuration if it's complete - but I've never tested it.

> BTW, mfiutil dumps core when given nonexistent disks (just noticed).

Can you provide the exact command that produces a core?

-Andrew

--------------------------------------------------
Andrew Boyer [hidden email]

On 16/12/2011, at 1:56 AM, John Baldwin wrote:

> On Thursday, December 15, 2011 4:19:58 am Jan Mikkelsen wrote:
>> You should be able to then recreate it as a JBOD device, and progress
>> through whatever higher level recovery you need to do.
>
> Can you do this by marking it as 'good' via mfiutil and then using mfiutil
> to create a volume?

I was going to reply and say that mfiutil will complain about the drive being in the wrong state, but after reading the other replies I decided to test.

With a blank drive, yes, you can use mfiutil to recreate the jbod device. You don't even need to do an "mfiutil good" first.

If you use a drive that has previously been used by an mfi controller, it shows up as "bad". Doing "mfiutil good" makes it go to the "unconfigured good" state. Then creation of the jbod fails with this error:

    mfiutil: Command failed: Wrong firmware or drive state
    mfiutil: Failed to add volume: Input/output error

At this point you need to reach for "MegaCli -CfgForeign" and deal with the now foreign drive. You can use -Import (as pointed out by Andrew Boyer) or -Clear.

In my previous testing (on which my original reply was based), I used drives that were being moved between machines and so my procedure ended up being -Clear because I did not want the drive to have the same configuration as the last time it was used. That was followed by a dd from /dev/zero and then the higher level steps.

I have just tested -Import for the same slot and it worked fine for me. I have not tested -Import when putting the drive into a different slot.

Regards,

Jan.

On 16/12/2011, at 3:40 AM, Andrew Boyer wrote:

> I don't think that's what you want. You want to use -Import, not -Clear, to keep your data intact.

OK. When I did a -Clear and recreated the drive as a single disk raid0 volume, the data was still there, but I wanted it to go away.

> I'm not sure it would even be possible to come up with a worse interface. It boggles the mind.

I agree. It is insanely bad.

> I recommend you always run with this configuration:
>
>     # MegaCli -AdpSetProp AutoEnhancedImportEnbl -aALL
>     # MegaCli -AdpSetProp MaintainPdFailHistoryEnbl -0 -aALL
>
> AutoEnhancedImportEnbl will bring the foreign disk back in on a reboot. LSI recommends turning off MaintainPdFailHistory when using single-disk RAID0 configurations.

What does PD Fail History actually do?

> Adding these capabilities to mfiutil is on my list of things to do, but it's not ready yet.

Thanks.

> Has anyone managed to get the real JBOD mode working on this controller? It advertises support in the firmware but doesn't seem to do anything. The documentation only lists JBOD mode as a feature of the lower-end controllers.

You mean using "MegaCli -PDMakeJBOD"? No, it doesn't work for me on the 9281-8i. I get "Failed to change PD state". Single disk RAID-0 works fine.

> Hope this helps.

It does, thank you.

Regards,

Jan.

On Dec 16, 2011, at 12:47 AM, Jan Mikkelsen wrote:

> OK. When I did a -Clear and recreated the drive as a single disk raid0 volume, the data was still there, but I wanted it to go away.

The RAID identity is stored on a 512MB partition at the end of the disk. Clearing and recreating it doesn't actually affect your data, as you discovered. You can even plug one of your RAID0 disks into a non-RAID controller and your data will be there.

mfiutil has a 'drive clear' feature to zero disks. Or you could just dd a few megs of zeroes to the beginning of the disk.

> What does PD Fail History actually do?

See: http://kb.lsi.com/KnowledgebaseArticle16570.aspx

-Andrew

--------------------------------------------------
Andrew Boyer [hidden email]

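The dd approach can be tried safely on a scratch file before pointing it at a real device. A minimal sketch, assuming an 8 MB temporary file as a stand-in for the disk and 4 MB as "a few megs"; both sizes are arbitrary choices for illustration.

```shell
#!/bin/sh
# Demonstrate zeroing the first few megabytes of a "disk" with dd.
# A scratch file stands in for the real device so nothing is destroyed; on a
# real system the of= target would be the disk device, and whatever lives at
# the start of that disk would be lost.
disk=$(mktemp)   # stand-in for a device node (assumption)
mb=1048576       # bytes per megabyte, spelled out for dd portability

# Fill 8 MB with non-zero bytes so the effect of the wipe is visible.
dd if=/dev/zero bs=$mb count=8 2>/dev/null | tr '\0' 'x' > "$disk"

# Overwrite the first 4 MB with zeroes; conv=notrunc leaves the rest in place.
dd if=/dev/zero of="$disk" bs=$mb count=4 conv=notrunc 2>/dev/null

# Report total size plus the first and last byte in hex.
size=$(wc -c < "$disk" | tr -d ' ')
first=$(head -c 1 "$disk" | od -An -tx1 | tr -d ' \n')
last=$(tail -c 1 "$disk" | od -An -tx1 | tr -d ' \n')
echo "size=$size first=$first last=$last"

rm -f "$disk"
```

The last byte staying non-zero is the point of the demonstration: only the beginning is wiped, so anything stored at the end of a disk is untouched by this method.
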
Andrew Boyer writes:

[snip]
| Has anyone managed to get the real JBOD mode working on this controller?
| It advertises support in the firmware but doesn't seem to do anything.
| The documentation only lists JBOD mode as a feature of the lower-end
| controllers.
[snip]

The current mfi driver doesn't have JBOD support. The new one that is being worked on does have true JBOD support. Some people have been doing something like this via the CAM passthrough hack. With JBOD, inserting and removing a disk as a hot swap works. There are some rough edges in the new driver that are being worked out by a few people. It supports all current LSI MegaRAID cards, so things should get better in the near future.

Thanks,

Doug A.

