ZFS resilvering strangles IO


ZFS resilvering strangles IO

Michael Gmelin-2
Hello,

I know I'm not the first one to ask this, but I couldn't find a definitive answer in previous threads.

I'm running a FreeBSD 9.0 RELEASE-p1 amd64 system with 8 x 1TB SATA2 drives (not SAS) and an LSI SAS 9211 controller in IT mode (HBAs, da0-da7). Zpool version 28, raidz2 container. The machine has 4GB of RAM, therefore ZFS prefetch is disabled. No manual tuning of ZFS options. The pool contains about 1TB of data right now (so about 25% full). In normal operation the pool shows excellent performance.

Yesterday I had to replace a drive, so resilvering started. The resilver took about 15 hours, which seems a little slow to me, but whatever. What really struck me was how bad pool performance got during resilvering: read performance was acceptable, but write performance dropped to 500kb/s (for almost all of the 15 hours). After resilvering finished, system performance returned to normal.

Fortunately this is a backup server and no full backups were scheduled, so no drama, but I really don't want to have to replace a drive in a database (or other high IO) server this way (I would have been forced to offline the drive somehow and migrate data to another server).

So the question is, is there anything I can do to improve the situation? Is this because of memory constraints? Are there any other knobs to adjust? As far as I know zfs_resilver_delay can't be changed in FreeBSD yet.

I have more drives around, so I could replace another one in the server, just to replicate the exact situation.

Cheers,
Michael

Disk layout:

daXp1  128   boot
daXp2  16G   freebsd-swap
daXp3  915G  freebsd-zfs


Zpool status during resilvering:

[root@backup /tmp]# zpool status -v
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Mon May  7 20:18:34 2012
    249G scanned out of 908G at 18.2M/s, 10h17m to go
    31.2G resilvered, 27.46% done
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            replacing-0             REMOVED      0     0     0
              15364271088212071398  REMOVED      0     0     0  was /dev/da0p3/old
              da0p3                 ONLINE       0     0     0  (resilvering)
            da1p3                   ONLINE       0     0     0
            da2p3                   ONLINE       0     0     0
            da3p3                   ONLINE       0     0     0
            da4p3                   ONLINE       0     0     0
            da5p3                   ONLINE       0     0     0
            da6p3                   ONLINE       0     0     0
            da7p3                   ONLINE       0     0     0

errors: No known data errors

Zpool status later in the process:
[root@backup /tmp]# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Mon May  7 20:18:34 2012
    833G scanned out of 908G at 19.1M/s, 1h7m to go
    104G resilvered, 91.70% done
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            replacing-0             REMOVED      0     0     0
              15364271088212071398  REMOVED      0     0     0  was /dev/da0p3/old
              da0p3                 ONLINE       0     0     0  (resilvering)
            da1p3                   ONLINE       0     0     0
            da2p3                   ONLINE       0     0     0
            da3p3                   ONLINE       0     0     0
            da4p3                   ONLINE       0     0     0
            da5p3                   ONLINE       0     0     0
            da6p3                   ONLINE       0     0     0
            da7p3                   ONLINE       0     0     0

errors: No known data errors


Zpool status after resilvering finished:
[root@backup /]# zpool status
  pool: tank
 state: ONLINE
 scan: resilvered 113G in 14h54m with 0 errors on Tue May  8 11:13:31 2012
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da0p3   ONLINE       0     0     0
            da1p3   ONLINE       0     0     0
            da2p3   ONLINE       0     0     0
            da3p3   ONLINE       0     0     0
            da4p3   ONLINE       0     0     0
            da5p3   ONLINE       0     0     0
            da6p3   ONLINE       0     0     0
            da7p3   ONLINE       0     0     0

errors: No known data errors
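
As a sanity check on the figures in the status output above, the ETA and the average rates can be recomputed from the reported numbers. This is only a sketch: it assumes that zpool's G and M/s suffixes are binary units (GiB, MiB/s), and the helper names eta and avg_mib_per_s are made up for illustration.

```python
# Recompute the resilver ETA and average rates from the zpool status
# figures quoted above. Assumption: "G" means GiB and "M/s" means MiB/s.

def eta(scanned_g, total_g, rate_m_per_s):
    """Return the remaining time as 'XhYm', truncated to whole minutes."""
    remaining_mib = (total_g - scanned_g) * 1024
    seconds = remaining_mib / rate_m_per_s
    hours, rest = divmod(int(seconds), 3600)
    return f"{hours}h{rest // 60}m"

def avg_mib_per_s(size_g, hours, minutes):
    """Average rate in MiB/s for size_g GiB moved in the given time."""
    return size_g * 1024 / (hours * 3600 + minutes * 60)

print(eta(249, 908, 18.2))                    # first status snapshot
print(eta(833, 908, 19.1))                    # second status snapshot
print(round(avg_mib_per_s(908, 14, 54), 1))   # scan rate over the whole run
print(round(avg_mib_per_s(113, 14, 54), 1))   # write rate to the new disk
```

Under these assumptions the two ETAs come out as 10h17m and 1h7m, matching the output, and the replacement disk was written at only about 2 MiB/s on average over the 14h54m run.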

_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "[hidden email]"

Re: ZFS resilvering strangles IO

Tom Evans-3
On Tue, May 8, 2012 at 3:33 PM, Michael Gmelin <[hidden email]> wrote:
> So the question is, is there anything I can do to improve the situation?
> Is this because of memory constraints? Are there any other knobs to
> adjust? As far as I know zfs_resilver_delay can't be changed in FreeBSD yet.
>
> I have more drives around, so I could replace another one in the server,
> just to replicate the exact situation.
>

In general, raidz is pretty fast, but when it's resilvering it is just
too busy. The first thing I would do to speed up writes is to add a
log device, preferably an SSD. Having a log device will allow the pool
to buffer writes much more effectively than it normally can
during a resilver.
Having lots of small writes will kill read speed during the resilver,
which is the critical thing.

If your workload would benefit, you could split the SSD down the
middle, use half for a log device, and half for a cache device to
accelerate reads.

I've never tried using a regular disk as a log device, I wonder if
that would speed up resilvering?

Cheers

Tom

Re: ZFS resilvering strangles IO

Michael Gmelin-2
On May 8, 2012, at 16:58, Tom Evans wrote:

> On Tue, May 8, 2012 at 3:33 PM, Michael Gmelin <[hidden email]> wrote:
>> So the question is, is there anything I can do to improve the situation?
>> Is this because of memory constraints? Are there any other knobs to
>> adjust? As far as I know zfs_resilver_delay can't be changed in FreeBSD yet.
>>
>> I have more drives around, so I could replace another one in the server,
>> just to replicate the exact situation.
>>
>
> In general, raidz is pretty fast, but when it's resilvering it is just
> too busy. The first thing I would do to speed up writes is to add a
> log device, preferably a SSD. Having a log device will allow the pool
> to buffer writes to the pool much more effectively than normally
> during a resilver.
> Having lots of small writes will kill read speed during the resilver,
> which is the critical thing.
>
> If your workload would benefit, you could split the SSD down the
> middle, use half for a log device, and half for a cache device to
> accelerate reads.
>
> I've never tried using a regular disk as a log device, I wonder if
> that would speed up resilvering?
>
> Cheers
>
> Tom

Thanks for your constructive feedback. It would be interesting to see whether adding an SSD actually helps in this case (it would definitely benefit the machine during normal operation as well). Unfortunately it's not an option: the server is maxed out, and there is simply no room to add a log device at the moment.

The general question remains: is there a way to make ZFS perform better during resilvering? Does anybody have experience tuning zfs_resilver_delay on Solaris, and does it make a difference? (The variable is in the FreeBSD source code, but I couldn't find a way to change it without touching the source.) Or is there something I missed that's specific to my setup?

Especially in configurations using raidz2 and raidz3, which can withstand the loss of 2 or even 3 drives, a longer resilver period shouldn't be an issue as long as system performance is not degraded, or only degraded to a certain degree. I could see up to 50% as more or less tolerable; in my case read performance was OKish, but write performance was reduced by more than 90%, so the machine was almost unusable.

Do you think it would make sense to try to play with zfs_resilver_delay directly in the ZFS kernel module?

(We have about 20 servers that could run ZFS around here, which currently run various combinations of UFS2+SU (no SUJ, since snapshots are currently broken), either on hardware RAID1 or some gmirror setup. I would like to standardize these setups on ZFS, but I can't add log devices to all of them, for obvious reasons.)

I somehow feel that simulating this in a virtual machine is probably pointless :)

Cheers,
Michael


Re: ZFS resilvering strangles IO

Bob Friesenhahn
On Tue, 8 May 2012, Michael Gmelin wrote:
>
> Do you think it would make sense to try to play with zfs_resilver_delay directly in the ZFS kernel module?

This may be the wrong approach if the issue is really that there are
too many I/Os queued for the device.  Finding a tunable which reduces
the maximum number of I/Os queued for a disk device may help reduce
write latencies by limiting the backlog.

On my Solaris 10 system, I accomplished this via a tunable in
/etc/system:
set zfs:zfs_vdev_max_pending = 5

What is the equivalent for FreeBSD?

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Re: ZFS resilvering strangles IO

Freddie Cash-8
On Tue, May 8, 2012 at 2:31 PM, Bob Friesenhahn
<[hidden email]> wrote:

> On Tue, 8 May 2012, Michael Gmelin wrote:
>>
>> Do you think it would make sense to try to play with zfs_resilver_delay
>> directly in the ZFS kernel module?
>
> This may be the wrong approach if the issue is really that there are too
> many I/Os queued for the device.  Finding a tunable which reduces the
> maximum number of I/Os queued for a disk device may help reduce write
> latencies by limiting the backlog.
>
> On my Solaris 10 system, I accomplished this via a tunable in /etc/system:
> set zfs:zfs_vdev_max_pending = 5
>
> What is the equivalent for FreeBSD?

Setting vfs.zfs.vdev_max_pending="4" in /boot/loader.conf (or whatever
value you want).  The default is 10.
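
For anyone scripting this change across several machines, a small helper can make the loader.conf edit idempotent. This is a hypothetical sketch, not a FreeBSD tool: set_tunable is a made-up name, it works on the file's text rather than on /boot/loader.conf directly, and the tunable name is copied as written above, so check it against the sysctl output on the target system first.

```python
def set_tunable(conf_text, name, value):
    """Return loader.conf-style text with name="value" set exactly once.

    An existing assignment of the same name is replaced in place and
    later duplicates are dropped; otherwise the assignment is appended.
    All other lines, including comments, are left untouched.
    """
    new_line = f'{name}="{value}"'
    out, replaced = [], False
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == name:
            if not replaced:
                out.append(new_line)
                replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"

# Sample file content for illustration only.
sample = 'zfs_load="YES"\nvfs.zfs.vdev_max_pending="10"\n'
print(set_tunable(sample, "vfs.zfs.vdev_max_pending", 4), end="")
```

Running the function twice with the same arguments gives the same text, so it is safe to apply repeatedly. Loader tunables are read at boot, so a change made this way only takes effect after a reboot.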

--
Freddie Cash
[hidden email]

Re: ZFS resilvering strangles IO

Artem Belevich-3
On Tue, May 8, 2012 at 2:33 PM, Freddie Cash <[hidden email]> wrote:

> On Tue, May 8, 2012 at 2:31 PM, Bob Friesenhahn
> <[hidden email]> wrote:
>> On Tue, 8 May 2012, Michael Gmelin wrote:
>>>
>>> Do you think it would make sense to try to play with zfs_resilver_delay
>>> directly in the ZFS kernel module?
>>
>> This may be the wrong approach if the issue is really that there are too
>> many I/Os queued for the device.  Finding a tunable which reduces the
>> maximum number of I/Os queued for a disk device may help reduce write
>> latencies by limiting the backlog.
>>
>> On my Solaris 10 system, I accomplished this via a tunable in /etc/system:
>> set zfs:zfs_vdev_max_pending = 5
>>
>> What is the equivalent for FreeBSD?
>
> Setting vfs.zfs.vdev_max_pending="4" in /boot/loader.conf (or whatever
> value you want).  The default is 10.

You may also want to look at vfs.zfs.scrub_limit sysctl. According to
description it's "Maximum scrub/resilver I/O queue" which sounds like
something that may help in this case.

--Artem

Re: ZFS resilvering strangles IO

Michael Gmelin-2
On May 9, 2012, at 00:06, Artem Belevich wrote:

> On Tue, May 8, 2012 at 2:33 PM, Freddie Cash <[hidden email]> wrote:
>> On Tue, May 8, 2012 at 2:31 PM, Bob Friesenhahn
>> <[hidden email]> wrote:
>>> On Tue, 8 May 2012, Michael Gmelin wrote:
>>>>
>>>> Do you think it would make sense to try to play with zfs_resilver_delay
>>>> directly in the ZFS kernel module?
>>>
>>> This may be the wrong approach if the issue is really that there are too
>>> many I/Os queued for the device.  Finding a tunable which reduces the
>>> maximum number of I/Os queued for a disk device may help reduce write
>>> latencies by limiting the backlog.
>>>
>>> On my Solaris 10 system, I accomplished this via a tunable in /etc/system:
>>> set zfs:zfs_vdev_max_pending = 5
>>>
>>> What is the equivalent for FreeBSD?
>>
>> Setting vfs.zfs.vdev_max_pending="4" in /boot/loader.conf (or whatever
>> value you want).  The default is 10.
>

Do you think this will actually make a difference? As far as I
understand, my primary problem is not latency but throughput. A simple
example is dd if=/dev/zero of=filename bs=1m, which gave me 500kb/s.
Latency might be an additional problem (or am I misled, and a shorter
queue would raise a process's chance to get data through?).

> You may also want to look at vfs.zfs.scrub_limit sysctl. According to
> description it's "Maximum scrub/resilver I/O queue" which sounds like
> something that may help in this case.
>
> --Artem

Very good point, thank you. I also found this entry in the FreeBSD
forums indicating that this might ease the pain (even though he's also
talking about scrub, not resilver, hopefully the tunable does both as
indicated in the comments):

http://forums.freebsd.org/showthread.php?t=31628

/* maximum scrub/resilver I/O queue per leaf vdev */
int zfs_scrub_limit = 10;

TUNABLE_INT("vfs.zfs.scrub_limit", &zfs_scrub_limit);
SYSCTL_INT(_vfs_zfs, OID_AUTO, scrub_limit, CTLFLAG_RDTUN,
    &zfs_scrub_limit, 0, "Maximum scrub/resilver I/O queue");

I will try lowering zfs_scrub_limit (vfs.zfs.scrub_limit) to 6 in
loader.conf and replace the drive once more later this month.

--
Michael


Re: ZFS resilvering strangles IO

Bob Friesenhahn
On Wed, 9 May 2012, Michael Gmelin wrote:
>>>
>>> Setting vfs.zfs.vdev_max_pending="4" in /boot/loader.conf (or whatever
>>> value you want).  The default is 10.
>
> Do you think this will actually make a difference. As far as I
> understand my primary problem is not latency but throughput. Simple
> example is dd if=/dev/zero of=filename bs=1m, which gave me 500kb/s.
> Latency might be an additional problem (or am I mislead and a shorter
> queue would raise the processes chance to get data through?).

The effect may be observed in real-time on a running system.  Latency
and throughput go hand in hand.  The 'dd' command is not threaded and
is sequential.  It waits for the current I/O to return before it
starts the next one.  If the wait is shorter (fewer pending requests
in line), then throughput does increase. System total throughput
(which includes the resilver operations) may not increase but the
throughput observed by an individual waiter may increase.

The default for vdev_max_pending on Solaris was/is 32.  If FreeBSD
uses a default of 10 then reducing from the default may be less
dramatic.
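
The argument above can be made concrete with a toy model. The sketch below is not how ZFS schedules I/O. It assumes a hypothetical fixed service time per request (8 ms here), that every foreground request waits behind a full queue of pending requests, and that the writer issues one request at a time, as dd does.

```python
# Toy model of a sequential writer behind a device queue.
# All numbers are illustrative assumptions, not measurements.

def writer_throughput_kib_s(block_kib, pending, service_ms):
    """Throughput seen by one synchronous, single-threaded writer.

    Each request waits for `pending` queued requests plus its own
    service, so latency = (pending + 1) * service_ms.
    """
    latency_s = (pending + 1) * service_ms / 1000.0
    return block_kib / latency_s

for pending in (32, 10, 4):
    rate = writer_throughput_kib_s(block_kib=128, pending=pending, service_ms=8)
    print(f"max pending {pending:2d}: {rate:7.0f} KiB/s")
```

In this model the writer's rate scales with 1/(pending + 1), so going from 32 to 10 triples it, while going from 10 to 4 raises it by a factor of 2.2. That is consistent with the expectation that lowering an already smaller default has a less dramatic effect.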

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Re: ZFS resilvering strangles IO

Michael Gmelin-2
On May 9, 2012, at 00:42, Bob Friesenhahn wrote:

> On Wed, 9 May 2012, Michael Gmelin wrote:
>>>>
>>>> Setting vfs.zfs.vdev_max_pending="4" in /boot/loader.conf (or whatever
>>>> value you want).  The default is 10.
>>
>> Do you think this will actually make a difference. As far as I
>> understand my primary problem is not latency but throughput. Simple
>> example is dd if=/dev/zero of=filename bs=1m, which gave me 500kb/s.
>> Latency might be an additional problem (or am I mislead and a shorter
>> queue would raise the processes chance to get data through?).
>
> The effect may be observed in real-time on a running system.  Latency and throughput go hand in hand.  The 'dd' command is not threaded and is sequential.  It waits for the current I/O to return before it starts the next one.  If the wait is shorter (fewer pending requests in line), then throughput does increase. System total throughput (which includes the resilver operations) may not increase but the throughput observed by an individual waiter may increase.
>
> The default for vdev_max_pending on Solaris was/is 32.  If FreeBSD uses a default of 10 then reducing from the default may be less dramatic.
>

That makes sense.

I will run more sophisticated I/O tests next time to get a more
complete picture.

--
Michael


Re: ZFS resilvering strangles IO

Peter Maloney
In reply to this post by Michael Gmelin-2
About the slow performance during resilver,

Are they consumer disks? If so, one guess is that you have a bad disk. Check
by looking at the load and ms per operation on each disk. If one is high and
the others are low, then that disk is probably bad. If a single disk that still
passes as 'good' is failing, the whole pool will run very slowly. Bad consumer
disks become very slow, retrying over and over to read the not-yet-bad
sectors, where enterprise disks would
throw errors and fail.
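
The comparison described above can be automated once per-disk latencies have been collected (for example from gstat or iostat -x output). The sketch below uses made-up sample values, and both the function name slow_disks and the threshold of three times the median are arbitrary assumptions.

```python
from statistics import median

def slow_disks(ms_per_op, factor=3.0):
    """Return the disks whose latency exceeds factor * median latency."""
    typical = median(ms_per_op.values())
    return sorted(d for d, ms in ms_per_op.items() if ms > factor * typical)

# Hypothetical readings in ms per operation, not taken from the thread.
sample = {"da0": 9.1, "da1": 8.7, "da2": 9.4, "da3": 61.0,
          "da4": 8.9, "da5": 9.0, "da6": 9.3, "da7": 8.8}
print(slow_disks(sample))
```

Comparing against the median rather than the mean keeps one very slow disk from dragging the baseline up and hiding itself.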

My other guess is that this is because FreeBSD, unlike Linux and
Solaris, lacks IO scheduling. So there is no way for the zfs code to
truly put the resilver on lower priority than the regular production
applications. I've read that IO scheduling was developed for 8.2, but
never officially adopted. I would love to see it in FreeBSD... I use
"ionice" on Linux all the time (for copying, backups, zipping,
installing a huge batch of packages [noticeable >300 MB], etc. while I
work on other things), so I miss it. IO scheduling on Solaris also helps
with dedup performance.

Does anyone know if there is a movement to add the IO scheduling code
into the base system?


--

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: [hidden email]
Internet: http://www.brockmann-consult.de
--------------------------------------------


Re: ZFS resilvering strangles IO

Johannes Totz-2
On 09/05/2012 07:55, Peter Maloney wrote:

> About the slow performance during resilver,
>
> Are they consumer disks? If so, one guess is you have a bad disk. Check
> by looking at load and ms per x on disks. If one is high and others are
> low, then it's probably bad. If a single 'good' disk is bad, the whole
> thing will run very slow. Bad consumer disks run very slow trying over
> and over to read the not-yet-bad sectors where enterprise disks would
> throw errors and fail.
>
> My other guess is that this is because FreeBSD, unlike Linux and
> Solaris, lacks IO scheduling. So there is no way for the zfs code to
> truly put the resilver on lower priority than the regular production
> applications. I've read that IO scheduling was developed for 8.2, but
> never officially adopted. I would love to see it in FreeBSD... I use
> "ionice" on Linux all the time (for copying, backups, zipping,
> installing a a huge batch of packages [noticeable >300 MB], etc. while I
> work on other things), so I miss it. IO scheduling on Solaris also helps
> with dedup performance.
>
> Does anyone know if there is a movement to add the IO scheduling code
> into the base system?

There was a geom module for io scheduling: gsched(8)
But I've never used it and don't know what the state of it is...

