
ZFS: How to enable cache and logs.


ZFS: How to enable cache and logs.

Dan Carroll
Hello all.

I've been using ZFS for some time now and have never had an issue
(except perhaps the issue of speed...)
When v28 is taken into -STABLE I will most likely upgrade at that
point.  Currently I am running v15, with v4 on disk.

When I move to v28 I will probably wish to enable an L2ARC and also
perhaps dedicated log devices.

I'm curious about a few things however.

1. Can I remove either the L2ARC or the log devices if things don't go
as planned or if I need to free up some resources?
2. What are the best practices for setting these up?  Would a geom
mirror for the log device be the way to go, or can you just let ZFS
mirror the log itself?
3. What happens when one or both of the log devices fail?  Does ZFS
come to a crashing halt and kill all the data, or does it simply
complain that the ZIL is no longer active and continue on its merry way?

In short, what is the best way to set up these two features?

-D

Re: ZFS: How to enable cache and logs.

Jeremy Chadwick
On Wed, May 11, 2011 at 07:25:52PM +1000, Danny Carroll wrote:

> I've been using ZFS for some time now and have never had an issued
> (except perhaps the issue of speed...)
> When v28 is taken into -STABLE I will most likely upgrade to v28 at that
> point.   Currently I am running v15 with v4 on disk.
>
> When I move to v28 I will probably wish to enable a L2Arc and also
> perhaps dedicated log devices.
>
> I'm curious about a few things however.
>
> 1. Can I remove either the L2 ARC or the log devices if things don't go
> as planned or if I need to free up some resources?

You can remove L2ARC ("cache") devices without impact, but you cannot
remove all log devices without the pool needing to be destroyed
(recreated).  Please keep reading for details of log devices.
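
As a rough sketch (the pool and device names below are made up, not from
your setup), adding and later dropping a cache device looks like this;
note that on v15 the equivalent "zpool remove" of a log vdev will not
work:

    # add an SSD as an L2ARC ("cache") device to pool "tank"
    zpool add tank cache ada6
    # cache devices can be removed again at any time
    zpool remove tank ada6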

L2ARC devices should primarily be something with extremely fast read
rates (e.g. SSDs).  USB1.x and 2.x memory sticks do not work well for
this purpose given protocol and bus speed limits + overhead.  (I only
mention them because people often think "Oh, USB flash would work great
for this!"  I disagree.)

Furthermore, something I found out on my own: the L2ARC is completely
lost when the system is rebooted, even cleanly.  This sometimes
surprises people (myself included) since L2ARC uses actual storage
devices; one might think the data is "restored" on reboot, but it isn't
(because the ARC ("layer 1") itself is lost on reboot, obviously).

The only way to see how much disk space a cache device is using -- to my
knowledge -- is via "zpool iostat -v".

> 2. What are the best practices for setting up these?   Would a geom
> mirror for the log device be the way to go.  Or can you just let ZFS
> mirror the log itself?

Let ZFS handle it.  There is no point (in my opinion) in adding
complexity when ZFS can handle it itself.  The KISS principle applies
strongly here.

In the case of ZFS intent logs, you definitely want a mirror.  If you
have a single log device, loss of that device can/will result in full
data loss of the pool which makes use of the log device.
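
A minimal sketch of letting ZFS do the mirroring itself (the disk names
are placeholders):

    # attach a mirrored log to an existing pool called "tank"
    zpool add tank log mirror ada4 ada5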

Furthermore, a log device is limited to a single pool; e.g. you cannot
use the same log device (e.g. ada6) on pool "foo" and pool "bar".  It's
one or the other.

You should read **all** of the data points listed below and pay close
attention to the details:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Log_Devices

> 3. What happens when one or both of the log devices fail.   Does ZFS
> come to a crashing halt and kill all the data?   Or does it simply
> complain that the ZIL is no longer active and continue on it's merry way?

See above.

> In short, what is the best way to set up these two features?

See the zpool(8) man page for details on how to make use of log devices.
Examples are provided, including mirroring of such devices.

--
| Jeremy Chadwick                                   [hidden email] |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


Re: ZFS: How to enable cache and logs.

Daniel Kalchev


On 11.05.11 13:06, Jeremy Chadwick wrote:
> On Wed, May 11, 2011 at 07:25:52PM +1000, Danny Carroll wrote:
>> When I move to v28 I will probably wish to enable a L2Arc and also
>> perhaps dedicated log devices.
>>
> In the case of ZFS intent logs, you definitely want a mirror.  If you
> have a single log device, loss of that device can/will result in full
> data loss of the pool which makes use of the log device.

This is true for v15 pools, but not for v28 pools. In ZFS v28 you can
remove log devices, and in the case of sudden loss of a log device (or
whatever) roll back the pool to a 'good' state. Therefore, for most
installations a single log device might be sufficient. If you value your
data, you will of course use mirrored log devices, possibly in a
hot-swap configuration, and... have a backup :)
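
A rough sketch of what this looks like on a v28 pool (names are
placeholders, and the exact recovery options depend on your zpool
version):

    # remove a no-longer-wanted log device from pool "tank"
    zpool remove tank ada4
    # after an unexpected loss of the log device, a rewind-style import
    # can roll the pool back to the last good transaction group
    zpool import -F tank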

By the way, the SLOG (separate log) does not have to be an SSD at all.
Separate rotating disk(s) will also suffice -- it all depends on the
type of workload. SSDs are better, at the higher end, because of their
low latency (but not all SSDs are low latency when writing!).

The idea of the SLOG is to separate the ZIL records from the main data
pool. ZIL records are small, even smaller in v28, but will cause
unnecessary head movements if kept in the main pool. The SLOG is "write
once, read on failure" media and is written sequentially. Almost all
current HDDs offer reasonable sequential write performance for small to
medium pools.

The L2ARC needs to be a fast-reading SSD. It is populated slowly, a few
MB/sec, so there is no point in having a fast, high-bandwidth
write-optimized SSD. The benefit from the L2ARC is the low latency:
think of it as a slower tier of RAM.
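
On FreeBSD the L2ARC fill rate is governed by tunables along these
lines (check your version for the exact names and defaults):

    # bytes written to the L2ARC per feed interval (the default is
    # small, around 8 MB)
    sysctl vfs.zfs.l2arc_write_max
    # a higher rate used only while the L2ARC is still cold
    sysctl vfs.zfs.l2arc_write_boost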

It is a bad idea to use the same SSD for both SLOG and L2ARC, because
most SSDs behave poorly if you present them with high read and high
write loads at the same time. More expensive units might behave, but
then... if you pay a few k$ for an SSD, you know what you need :)

Daniel

Re: ZFS: How to enable cache and logs.

Jeremy Chadwick
On Wed, May 11, 2011 at 01:37:03PM +0300, Daniel Kalchev wrote:

> On 11.05.11 13:06, Jeremy Chadwick wrote:
> >On Wed, May 11, 2011 at 07:25:52PM +1000, Danny Carroll wrote:
> >>When I move to v28 I will probably wish to enable a L2Arc and also
> >>perhaps dedicated log devices.
> >>
> >In the case of ZFS intent logs, you definitely want a mirror.  If you
> >have a single log device, loss of that device can/will result in full
> >data loss of the pool which makes use of the log device.
>
> This is true for v15 pools, not true for v28 pools. In ZFS v28 you
> can remove log devices and in the case of sudden loss of log device
> (or whatever) roll back the pool to a 'good' state. Therefore, for
> most installations single log device might be sufficient. If you
> value your data, you will of course use mirrored log devices,
> possibly in hot-swap configuration and .. have a backup :)

Has anyone actually *tested* this on FreeBSD?  Set up a single log
device on classic (non-CAM/non-ahci.ko) ATA, then literally yank the
disk out to induce a very bad/rude failure?  Does the kernel panic or
anything weird happen?

I fully acknowledge that in ZFS pool v19 and higher the issue is fixed
(at least on Solaris/OpenSolaris), but at this point in time the RELEASE
and STABLE branches are running pool version 15.

There are numerous ongoing discussions about the ZFS v28 patches right
now with regards to STABLE specifically.  Recent threads:

- Patch did not apply correctly (errors/rejections)
- Patch applied correctly but build failed (use "patch -E" I believe?)
- Discussion about when v28 is *truly* coming to RELENG_8 and if it's
  truly ready for RELENG_8

And finally, there's the one thing that people often forget/miss: if you
upgrade your pool from v15 to v28 (needed to address the log removal
stuff you mention), you cannot roll back without recreating all of your
pools.  Folks considering v28 need to take that into consideration.
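
For anyone who does decide to go ahead, the upgrade itself is only a
couple of commands, which is exactly why it deserves a pause first (the
pool name is a placeholder; there is no way back short of recreating
the pool):

    # list the pool versions this zpool binary knows about
    zpool upgrade -v
    # irreversibly upgrade pool "tank" to the newest supported version
    zpool upgrade tank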

> By the way, the SLOG (separate LOG) does not have to be SSD at all.
> Separate rotating disk(s) will also suffice -- it all depends on the
> type of workload. SSDs are better, for the higher end, because of
> the low latency (but not all SSDs are low latency when writing!).

I didn't state log devices should be SSDs.  I stated cache devices
(L2ARC) should be SSDs.  :-)  A non-high-end SSD for a log device is
probably a very bad idea given the sub-par write speeds, agreed.  A
FusionIO card/setup on the other hand would probably work wonderfully,
but that's much more expensive (you cover that below).

> The idea of the SLOG is to separate the ZIL records from the main
> data pool. ZIL records are small, even smaller in v28, but will
> cause unnecessary head movements if kept in the main pool. The SLOG
> is "write once, read on failure" media and is written sequentially.
> Almost all current HDDs offer reasonable sequential write
> performance for small to medium pools.
>
> The L2ARC needs to be fast reading SSD. It is populated slowly, few
> MB/sec so there is no point to have fast and high-bandwidth
> write-optimized SSD. The benefit from L2ARC is the low latency. Sort
> of slower RAM.

Agreed, and the overall point of the L2ARC is to help improve random
reads, if I remember right.  The concept is that it's a 2nd layer
of caching that shouldn't hurt or hinder performance when used/put in
place, but can greatly help when the "layer 1" ARC lacks an entry.

> It is bad idea to use the same SSD for both SLOG and L2ARC, because
> most SSDs behave poorly if you present them with high read and high
> write loads. More expensive units might behave, but then... if you
> pay few k$ for a SSD, you know what you need :)

Again, agreed.

Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
should also keep that in mind when putting an SSD into use in this
fashion.

--
| Jeremy Chadwick                                   [hidden email] |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


Re: ZFS: How to enable cache and logs.

Daniel Kalchev


On 11.05.11 13:51, Jeremy Chadwick wrote:
>
> Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> should also keep that in mind when putting an SSD into use in this
> fashion.
>
By the way, what would be the use of TRIM for SLOG and L2ARC devices?
I see absolutely no benefit from TRIM for the L2ARC, because it is
written slowly (on purpose). Any current SSD, or one from 1-2
generations back, would handle that write load without TRIM and without
any performance degradation.

Perhaps TRIM helps with the SLOG. But then, it is wise to use an SLC SSD
for the SLOG, for many reasons. The write regions on the SLC NAND should
be smaller (my wild guess, current practice may differ) and the need for
rewriting will be small. If you don't need to rewrite already-written
data, TRIM does not help. Also, as far as I understand, most "serious"
SSDs (typical for SLC, I guess) have twice or more the advertised
size and always write to fresh cells, scheduling a background erase of
the 'overwritten' cell.

Does Solaris have TRIM for ZFS? Where? How does it help? I can imagine
TRIM for the data pool, which would be a good fit for ZFS, but an
SSD-only pool... are we already there?

Daniel

Re: ZFS: How to enable cache and logs.

Jeremy Chadwick
On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:

> On 11.05.11 13:51, Jeremy Chadwick wrote:
> >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> >should also keep that in mind when putting an SSD into use in this
> >fashion.
>
> By the way, what would be the use of TRIM for SLOG and L2ARC devices?
> I see absolutely no benefit from TRIM for the L2ARC, because it is
> written slowly (on purpose).  Any current, or 1-2 generations back SSD
> would handle that write load without TRIM and without any performance
> degradation.
>
> Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC
> SSD for the SLOG, for many reasons. The write regions on the SLC
> NAND should be smaller (my wild guess, current practice may differ)
> and the need for rewriting will be small. If you don't need to
> rewrite already written data, TRIM does not help. Also, as far as I
> understand, most "serious" SSDs (typical for SLC I guess) would have
> twice or more the advertised size and always write to fresh cells,
> scheduling an background erase of the 'overwritten' cell.

AFAIK, drive manufacturers do not disclose just how much reallocation
space they keep available on an SSD.  I'd rather not speculate as to how
much, as I'm certain it varies per vendor.

I can talk a bit about SSD drive performance from a consumer level (that
is to say: low-end consumer Intel SSDs such as the X25-V, X25-M, and
latest 320 and 510 series -- I use them all over the place), both before
and after TRIM operations.  I don't use any of these SSDs on ZFS
however, only UFS (and I have seen the results both before and after
TRIM support was added to UFS; and yeah, all the drives are running the
latest firmware).

What's confusing to me is why someone would say TRIM doesn't really
matter in the case of an intent log device or a cache device; these
devices both see some degree of write operations, correct?  The
drive has to erase the NAND flash block (well, page really) before the
block can be re-used (written to once again), so by not doing TRIM
you're effectively relying 100% on the drives' garbage collection
mechanisms, which isn't that great (at least WRT the above drives).
There are some sites that go over Intel SSD performance out-of-the-box
as well as once it's been used for a bit, and the performance difference
is pretty substantial (50% drop in performance for reads, ~60-70% drop
in performance for writes).  Something to keep in mind.

Furthermore, most people aren't buying SLC given the cost.  Right now
the absolute #1 or #2 focus of any operation is to save money; one
cannot argue with the current economic condition.  I think this is also
why many SSD companies are focusing primarily on MLC right now; they
know the majority of their client base isn't going to spend the money
for SLC.

> Does Solaris have TRIM for ZFS? Where? How does it help? I can
> imagine TRIM for the data pool, that would be good fit for ZFS, but
> SSD-only pool.. are we already there?

The following blog post and mailing list thread provide answers to
all of the above questions, including why TRIM is useful on ZFS (see the
comments section; not referring to slog or cache however).  But it
doesn't look like TRIM is actually made use of in ZFS as of January 2011.
There is a long discussion about ZFS, TRIM, and slog/cache in the 2nd
thread.  There's also a reply from pjd@ in there WRT FreeBSD.

http://www.c0t0d0s0.org/archives/6792-SATA-TRIM-support-in-Opensolaris.html
http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/44855

There's a recommendation to permit TRIM in ZFS, but limit the number of
txgs based on a sysctl, since TRIM is a slow/expensive operation.
SSDs are neat, but man, NAND-based flash sure makes me a sad panda.

--
| Jeremy Chadwick                                   [hidden email] |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


Re: ZFS: How to enable cache and logs.

Alexander Leidinger
In reply to this post by Jeremy Chadwick
Quoting Jeremy Chadwick <[hidden email]> (from Wed, 11 May 2011 03:06:56 -0700):

> L2ARC devices should primarily be something with extremely fast read
> rates (e.g. SSDs).  USB1.x and 2.x memory sticks do not work well for
> this purpose given protocol and bus speed limits + overhead.  (I only
> mention them because people often think "Oh, USB flash would work great
> for this!"  I disagree.)

Using USB flash may work acceptably. It depends upon the rest of the
system. If you have very fast hard disks (or only USB 1 hardware), USB
flash will not give you a faster FS. If you have slow (and low-power)
desktop disks, a fast USB flash stick (attention, there are also slow
ones) connected via USB 2 (or 3) will give you a speed improvement you
will notice.

As a matter of fact, I have this:
  - Pentium 4
  - 1 GB RAM
  - 1 Western Digital Caviar Blue
  - 2 Seagate Barracuda 7200.10
  - an ICH5 controller (no NCQ)
  - a no-name cheap give-away 1 GB USB flash stick (so not a very fast one)

The disks are used in a RAIDZ, with the USB flash as a cache device.

My use case was connecting to a webmail system over a slow line (ADSL
224 kilobit/s). I could tell immediately whether or not the cache was in
use.
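
If you want numbers rather than a feeling, the kernel exports L2ARC
counters; a rough way to watch them on FreeBSD (the exact statistic
names may differ between versions):

    # hit/miss and size counters for the L2ARC
    sysctl kstat.zfs.misc.arcstats | grep l2_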

I also have another system: ICH10 with NCQ, 5 disks (WD RE4) in
RAIDZ2, a 4-core Intel Xeon, 12 GB RAM. There USB flash does not make
sense at all (and an SSD does make sense if you compare the price of the
entire system with the price of a small or medium SSD).

For the first system, it does not make sense to spend 200 units of
money on an SSD; the system itself is not worth much more than that now.
Spending 5-10 units of money on this system is OK, and gives a speed
improvement.

Bye,
Alexander.

--
Even God cannot change the past.
                -- Joseph Stalin

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

Re: ZFS: How to enable cache and logs.

James L. Lauser
In reply to this post by Daniel Kalchev
On Wed, May 11, 2011 at 6:37 AM, Daniel Kalchev <[hidden email]> wrote:

>
>
> On 11.05.11 13:06, Jeremy Chadwick wrote:
>
>> On Wed, May 11, 2011 at 07:25:52PM +1000, Danny Carroll wrote:
>>
>>> When I move to v28 I will probably wish to enable a L2Arc and also
>>> perhaps dedicated log devices.
>>>
>>>  In the case of ZFS intent logs, you definitely want a mirror.  If you
>> have a single log device, loss of that device can/will result in full
>> data loss of the pool which makes use of the log device.
>>
>
> This is true for v15 pools, not true for v28 pools. In ZFS v28 you can
> remove log devices and in the case of sudden loss of log device (or
> whatever) roll back the pool to a 'good' state. Therefore, for most
> installations single log device might be sufficient. If you value your data,
> you will of course use mirrored log devices, possibly in hot-swap
> configuration and .. have a backup :)
>
> By the way, the SLOG (separate LOG) does not have to be SSD at all.
> Separate rotating disk(s) will also suffice -- it all depends on the type of
> workload. SSDs are better, for the higher end, because of the low latency
> (but not all SSDs are low latency when writing!).
>
> The idea of the SLOG is to separate the ZIL records from the main data
> pool. ZIL records are small, even smaller in v28, but will cause unnecessary
> head movements if kept in the main pool. The SLOG is "write once, read on
> failure" media and is written sequentially. Almost all current HDDs offer
> reasonable sequential write performance for small to medium pools.
>
> The L2ARC needs to be fast reading SSD. It is populated slowly, few MB/sec
> so there is no point to have fast and high-bandwidth write-optimized SSD.
> The benefit from L2ARC is the low latency. Sort of slower RAM.
>
> It is bad idea to use the same SSD for both SLOG and L2ARC, because most
> SSDs behave poorly if you present them with high read and high write loads.
> More expensive units might behave, but then... if you pay few k$ for a SSD,
> you know what you need :)
>
> Daniel
>
> _______________________________________________
> [hidden email] mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "[hidden email]"
>


I recently learned the hard way that you need to be very careful what you
choose as your ZIL.  On my personal file server, my pool consists of 4x
500 GB disks in a RAID-Z and 2x 1.5 TB disks in a mirror.  I also had a 1 GB
CompactFlash card plugged into an IDE adapter, running as the ZIL.  For the
longest time, my write performance was capped at about 5 MB/sec.  In an
attempt to figure out why, I ran gstat and saw that the CF device was pegged
at 100% busy.

Having recently upgraded to ZFSv28, I decided to try removing the log
device.  Write performance instantly jumped to 45 MB/sec.  Lesson
learned...  If you're going to have a dedicated ZIL, make sure its write
performance exceeds the performance of the pool itself.

On the other hand, again having upgraded to v28, I attempted to use
deduplication on my pool.  Write performance dropped to an abysmal 1
MB/sec.  Why?  Because, as I found out, my system doesn't have enough memory
to keep the dedup table in RAM, nor can it be upgraded enough to.  But with
the application of a sufficiently large cache device, performance goes right
back up to where it's supposed to be.
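
For anyone tempted to repeat the experiment, it may be worth estimating
the dedup table size before turning dedup on; roughly (the pool name is
a placeholder, and -S support depends on your zdb version):

    # simulate dedup on pool "tank" and print a DDT histogram, from
    # which the in-core table size can be estimated (on the order of a
    # few hundred bytes per unique block)
    zdb -S tank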

--  James L. Lauser
    [hidden email]
    http://jlauser.net/

Re: ZFS: How to enable cache and logs.

Jason Hellenthal
In reply to this post by Jeremy Chadwick

Jeremy,

On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote:

> On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> > On 11.05.11 13:51, Jeremy Chadwick wrote:
> > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> > >should also keep that in mind when putting an SSD into use in this
> > >fashion.
> >
> > By the way, what would be the use of TRIM for SLOG and L2ARC devices?
> > I see absolutely no benefit from TRIM for the L2ARC, because it is
> > written slowly (on purpose).  Any current, or 1-2 generations back SSD
> > would handle that write load without TRIM and without any performance
> > degradation.
> >
> > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC
> > SSD for the SLOG, for many reasons. The write regions on the SLC
> > NAND should be smaller (my wild guess, current practice may differ)
> > and the need for rewriting will be small. If you don't need to
> > rewrite already written data, TRIM does not help. Also, as far as I
> > understand, most "serious" SSDs (typical for SLC I guess) would have
> > twice or more the advertised size and always write to fresh cells,
> > scheduling an background erase of the 'overwritten' cell.
>
> AFAIK, drive manufacturers do not disclose just how much reallocation
> space they keep available on an SSD.  I'd rather not speculate as to how
> much, as I'm certain it varies per vendor.
>
Let's not forget here: the size of the separate log device may be quite
small. A rule of thumb is that you should size the separate log to be able
to handle 10 seconds of your expected synchronous write workload. It would
be rare to need more than 100 MB in a separate log device, but the
separate log must be at least 64 MB.

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide


So, in other words, how effective is TRIM really, given the above?

Even with a high database write load on the disks at the full capacity of
the incoming link, I would find it hard to believe that anyone could get
the ZIL to even come close to 512 MB.


Given that most SSDs come in sizes greater than 32 GB, I hope this comes
as an early reminder that the ZIL you are buying that disk for is only
going to use a small percentage of that disk, and I hope you can justify
the cost against its actual use. If you do happen to justify creating a
dedicated ZIL for your pool, then I hope that you partition the disk
wisely to make use of the rest of the space that would otherwise go
untouched.
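
A sketch of carving out a small slice for the log and leaving the rest
of the disk free (the device name and sizes are made up; adjust to
taste):

    # GPT-label the SSD and create a small partition for the separate log
    gpart create -s gpt ada6
    gpart add -t freebsd-zfs -s 4g -l slog0 ada6
    # the remainder can become a second partition for whatever other use
    # you can justify
    gpart add -t freebsd-zfs -l spare0 ada6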

For all other cases, if you still want to have a dedicated ZIL, I would
recommend considering some sort of PCI->SD card adapter or USB stick,
mirrored.

--

 Regards, (jhell)
 Jason Hellenthal



Re: ZFS: How to enable cache and logs.

Jeremy Chadwick
On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote:

>
> Jeremy,
>
> On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote:
> > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> > > On 11.05.11 13:51, Jeremy Chadwick wrote:
> > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> > > >should also keep that in mind when putting an SSD into use in this
> > > >fashion.
> > >
> > > By the way, what would be the use of TRIM for SLOG and L2ARC devices?
> > > I see absolutely no benefit from TRIM for the L2ARC, because it is
> > > written slowly (on purpose).  Any current, or 1-2 generations back SSD
> > > would handle that write load without TRIM and without any performance
> > > degradation.
> > >
> > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC
> > > SSD for the SLOG, for many reasons. The write regions on the SLC
> > > NAND should be smaller (my wild guess, current practice may differ)
> > > and the need for rewriting will be small. If you don't need to
> > > rewrite already written data, TRIM does not help. Also, as far as I
> > > understand, most "serious" SSDs (typical for SLC I guess) would have
> > > twice or more the advertised size and always write to fresh cells,
> > > scheduling an background erase of the 'overwritten' cell.
> >
> > AFAIK, drive manufacturers do not disclose just how much reallocation
> > space they keep available on an SSD.  I'd rather not speculate as to how
> > much, as I'm certain it varies per vendor.
> >
>
> Lets not forget here: The size of the separate log device may be quite
> small. A rule of thumb is that you should size the separate log to be able
> to handle 10 seconds of your expected synchronous write workload. It would
> be rare to need more than 100 MB in a separate log device, but the
> separate log must be at least 64 MB.
>
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
>
> So in other words how much is TRIM really even effective give the above ?
>
> Even with a high database write load on the disks at full compacity of the
> incoming link I would find it hard to believe that anyone could get the
> ZIL to even come close to 512MB.

In the case of an SSD being used as a log device (ZIL), I imagine it
would only matter the longer the drive was kept in use.  I do not use
log devices anywhere with ZFS, so I can't really comment.

In the case of an SSD being used as a cache device (L2ARC), I imagine it
would matter much more.

In the case of an SSD being used as a pool device, it matters greatly.

Why it matters: there are two methods of "reclaiming" blocks which were
used: internal SSD "garbage collection" and TRIM.  For a NAND block to be
reclaimed, it has to be erased -- SSDs erase things in pages rather
than individual LBAs.  With TRIM, you submit the ATA DATA SET MANAGEMENT
command with a list of LBAs you wish to inform the drive are no longer
used.  The drive aggregates the LBA ranges, determines if an entire
flash page can be erased, and does it.  If it can't, it makes some sort
of mental note that the individual LBA (in some particular page)
shouldn't be used.

The "garbage collection" works when the SSD is idle.  I have no idea
what "idle" actually means operationally, because again, vendors don't
disclose what the idle intervals are.  5 minutes?  24 hours?  It
matters, but they don't tell us.  (What confuses me about the "idle GC"
method is how it determines what it can erase -- if the OS didn't tell
it what it's using, how does it know it can erase the page?)

Anyway, how all this manifests itself performance-wise is intriguing.
It's not speculation: there's hard evidence that not using TRIM results
in SSD performance, bluntly put, sucking badly on some SSDs.

There's this mentality that wear levelling completely solves all of the
**performance** concerns -- that isn't the case at all.  In fact, I'm
under the impression it probably hurts performance, but it depends on
how it's implemented within the drive firmware.

bit-tech did an experiment using Windows 7 -- which supports and uses
TRIM, assuming the device advertises the capability -- with different
models of SSDs.  The testing procedure is documented here, but I'll
summarize it as well:

http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-performance-and-trim/4

Again, remember, this is done on a Windows 7 system which does support
TRIM if the device supports it.  The testing steps, in this order:

1) SSD without TRIM support -- all LBAs are zeroed.
2) Took read/write benchmark readings.
3) SSD without TRIM support -- partitioned and formatted as NTFS
   (cluster size unknown), copied 100GB of data to the drive, deleted all
   the data, and repeated this method 10 times.
4) Step #2 repeated.
5) Upgraded SSD firmware to a version that supports TRIM.
6) SSD with TRIM support -- step #1 repeated.
7) Step #2 repeated.
8) SSD with TRIM support -- step #3 repeated.
9) Step #2 repeated.

Without TRIM, some drives drop their read performance by more than 50%,
and write performance by almost 70%.  I'm focusing on Intel SSDs here,
by the way.  I do not care for OCZ or Corsair products.

So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support
TRIM, effectively the benchmarks shown pre-firmware-upgrade are what ZFS
on FreeBSD will mimic (to some degree).

Therefore, simply put, users should be concerned when using ZFS on
FreeBSD with SSDs.  It doesn't matter to me if you're only using
64MBytes of a 40GB drive or if you're using the entire thing; no TRIM
means degraded performance over time.

Can you refute any of this evidence?

> Given most SSD's come at a size greater than 32GB I hope this comes as a
> early reminder that the ZIL you are buying that disk for is only going to
> be using a small percent of that disk and I hope you justify cost over its
> actual use. If you do happen to justify creating a ZIL for your pool then
> I hope that you partition it wisely to make use of the rest of the space
> that is untouched.
>
> For all other cases I would reccomend if you still want to have a ZIL that
> you take some sort of PCI->SD CARD or USB stick into account with
> mirroring.

Others have pointed out this isn't effective (re: USB sticks).  The read
and write speeds are too slow, and limit the overall performance of ZFS
in a very bad way.  I can absolutely confirm this claim (I've tested it
myself, using a high-end USB flash drive as a cache device (L2ARC)).

Alexander Leidinger pointed out that using a USB stick for cache/L2ARC
*does* improve performance on older systems which have slower disk I/O
(e.g. ICH5-based systems).

--
| Jeremy Chadwick                                   [hidden email] |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


Re: ZFS: How to enable cache and logs.

Jason Hellenthal

Jeremy, as always the quality of your messages is 101% spot on. I
always find some new information that comes in handy more often than I
could say, and there is always something to be learned.

Thanks.

On Wed, May 11, 2011 at 06:04:33PM -0700, Jeremy Chadwick wrote:

> On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote:
> >
> > Jeremy,
> >
> > On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote:
> > > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> > > > On 11.05.11 13:51, Jeremy Chadwick wrote:
> > > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> > > > >should also keep that in mind when putting an SSD into use in this
> > > > >fashion.
> > > >
> > > > By the way, what would be the use of TRIM for SLOG and L2ARC devices?
> > > > I see absolutely no benefit from TRIM for the L2ARC, because it is
> > > > written slowly (on purpose).  Any current, or 1-2 generations back SSD
> > > > would handle that write load without TRIM and without any performance
> > > > degradation.
> > > >
> > > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC
> > > > SSD for the SLOG, for many reasons. The write regions on the SLC
> > > > NAND should be smaller (my wild guess, current practice may differ)
> > > > and the need for rewriting will be small. If you don't need to
> > > > rewrite already written data, TRIM does not help. Also, as far as I
> > > > understand, most "serious" SSDs (typical for SLC I guess) would have
> > > > twice or more the advertised size and always write to fresh cells,
> > > > scheduling an background erase of the 'overwritten' cell.
> > >
> > > AFAIK, drive manufacturers do not disclose just how much reallocation
> > > space they keep available on an SSD.  I'd rather not speculate as to how
> > > much, as I'm certain it varies per vendor.
> > >
> >
> > Lets not forget here: The size of the separate log device may be quite
> > small. A rule of thumb is that you should size the separate log to be able
> > to handle 10 seconds of your expected synchronous write workload. It would
> > be rare to need more than 100 MB in a separate log device, but the
> > separate log must be at least 64 MB.
> >
> > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
> >
> > So in other words how much is TRIM really even effective give the above ?
> >
> > Even with a high database write load on the disks at full compacity of the
> > incoming link I would find it hard to believe that anyone could get the
> > ZIL to even come close to 512MB.
>
> In the case of an SSD being used as a log device (ZIL), I imagine it
> would only matter the longer the drive was kept in use.  I do not use
> log devices anywhere with ZFS, so I can't really comment.
>
> In the case of an SSD being used as a cache device (L2ARC), I imagine it
> would matter much more.
>
> In the case of an SSD being used as a pool device, it matters greatly.
>
> Why it matters: there's two methods of "reclaiming" blocks which were
> used: internal SSD "garbage collection" and TRIM.  For a NAND block to be
> reclaimed, it has to be erased -- SSDs erase things in pages rather
> than individual LBAs.  With TRIM, you submit the data management command
> via ATA with a list of LBAs you wish to inform the drive are no longer
> used.  The drive aggregates the LBA ranges, determines if an entire
> flash page can be erased, and does it.  If it can't, it makes some sort
> of mental note that the individual LBA (in some particular page)
> shouldn't be used.
>
> The "garbage collection" works when the SSD is idle.  I have no idea
> what "idle" actually means operationally, because again, vendors don't
> disclose what the idle intervals are.  5 minutes?  24 hours?  It
> matters, but they don't tell us.  (What confuses me about the "idle GC"
> method is how it determines what it can erase -- if the OS didn't tell
> it what it's using, how does it know it can erase the page?)
>
> Anyway, how all this manifests itself performance-wise is intriguing.
> It's not speculation: there's hard evidence that not using TRIM results
> in SSD performance, bluntly put, sucking badly on some SSDs.
>
> There's this mentality that wear levelling completely solves all of the
> **performance** concerns -- that isn't the case at all.  In fact, I'm
> under the impression it probably hurts performance, but it depends on
> how it's implemented within the drive firmware.
>
> bit-tech did an experiment using Windows 7 -- which supports and uses
> TRIM assuming the device advertises the capability -- with different
> models of SSDs.  The testing procedure is documented here, but I'll
> document it as well:
>
> http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-performance-and-trim/4
>
> Again, remember, this is done on a Windows 7 system which does support
> TRIM if the device supports it.  The testing steps, in this order:
>
> 1) SSD without TRIM support -- all LBAs are zeroed.
> 2) Took read/write benchmark readings.
> 3) SSD without TRIM support -- partitioned and formatted as NTFS
>    (cluster size unknown), copied 100GB of data to the drive, deleted all
>    the data, and repeated this method 10 times.
> 4) Step #2 repeated.
> 5) Upgraded SSD firmware to a version that supports TRIM.
> 6) SSD with TRIM support -- step #1 repeated.
> 7) Step #2 repeated.
> 8) SSD with TRIM support -- step #3 repeated.
> 9) Step #2 repeated.
>
> Without TRIM, some drives drop their read performance by more than 50%,
> and write performance by almost 70%.  I'm focusing on Intel SSDs here,
> by the way.  I do not care for OCZ or Corsair products.
>
> So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support
> TRIM, effectively the benchmarks shown pre-firmware-upgrade are what ZFS
> on FreeBSD will mimic (to some degree).
>
> Therefore, simply put, users should be concerned when using ZFS on
> FreeBSD with SSDs.  It doesn't matter to me if you're only using
> 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM
> means degraded performance over time.
>
> Can you refute any of this evidence?
>
At the moment, no. But I can say that, however widely SSDs were used by
OpenSolaris users before the Oracle reaping, I don't recall seeing any
related bug reports on degradation. But like I said... I haven't seen
them, which may simply mean they weren't used that heavily. Definitely
more to look into, test, benchmark & test again.

> > Given most SSD's come at a size greater than 32GB I hope this comes as a
> > early reminder that the ZIL you are buying that disk for is only going to
> > be using a small percent of that disk and I hope you justify cost over its
> > actual use. If you do happen to justify creating a ZIL for your pool then
> > I hope that you partition it wisely to make use of the rest of the space
> > that is untouched.
> >
> > For all other cases I would reccomend if you still want to have a ZIL that
> > you take some sort of PCI->SD CARD or USB stick into account with
> > mirroring.
>
> Others have pointed out this isn't effective (re: USB sticks).  The read
> and write speeds are too slow, and limit the overall performance of ZFS
> in a very bad way.  I can absolutely confirm this claim (I've tested it
> myself, using a high-end USB flash drive as a cache device (L2ARC)).
>
> Alexander Leidinger pointed out that using a USB stick for cache/L2ARC
> *does* improve performance on older systems which have slower disk I/O
> (e.g. ICH5-based systems).
>
Agreed. As soon as the disks' bus and write speeds are greater than what
USB 2.0 can handle, any USB-based solution is useless. ICH5 and up would
be right about the time you would start to see this happen.

With SD cards/CF cards, mileage may vary depending on the transfer rates.
But the same situation applies: as you said, once your main pool
throughput outweighs the throughput of your ZIL, it's probably not
worth even having a ZIL or a cache device. Emphasis on cache more so than
ZIL.


Anyway, it's all good information for those trying to judge whether they
need a cache or a ZIL.


Thanks again Jeremy. Always appreciated.

--

 Regards, (jhell)
 Jason Hellenthal



Re: ZFS: How to enable cache and logs.

Jeremy Chadwick
On Wed, May 11, 2011 at 09:48:48PM -0400, Jason Hellenthal wrote:

> Jeremy, As always the qaulity of your messages are 101% spot on and I
> always find some new new information that becomes handy more often than I
> could say, and there is always something to be learned.
>
> Thanks.
>
> On Wed, May 11, 2011 at 06:04:33PM -0700, Jeremy Chadwick wrote:
> > On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote:
> > >
> > > Jeremy,
> > >
> > > On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote:
> > > > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> > > > > On 11.05.11 13:51, Jeremy Chadwick wrote:
> > > > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> > > > > >should also keep that in mind when putting an SSD into use in this
> > > > > >fashion.
> > > > >
> > > > > By the way, what would be the use of TRIM for SLOG and L2ARC devices?
> > > > > I see absolutely no benefit from TRIM for the L2ARC, because it is
> > > > > written slowly (on purpose).  Any current, or 1-2 generations back SSD
> > > > > would handle that write load without TRIM and without any performance
> > > > > degradation.
> > > > >
> > > > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC
> > > > > SSD for the SLOG, for many reasons. The write regions on the SLC
> > > > > NAND should be smaller (my wild guess, current practice may differ)
> > > > > and the need for rewriting will be small. If you don't need to
> > > > > rewrite already written data, TRIM does not help. Also, as far as I
> > > > > understand, most "serious" SSDs (typical for SLC I guess) would have
> > > > > twice or more the advertised size and always write to fresh cells,
> > > > > scheduling an background erase of the 'overwritten' cell.
> > > >
> > > > AFAIK, drive manufacturers do not disclose just how much reallocation
> > > > space they keep available on an SSD.  I'd rather not speculate as to how
> > > > much, as I'm certain it varies per vendor.
> > > >
> > >
> > > Lets not forget here: The size of the separate log device may be quite
> > > small. A rule of thumb is that you should size the separate log to be able
> > > to handle 10 seconds of your expected synchronous write workload. It would
> > > be rare to need more than 100 MB in a separate log device, but the
> > > separate log must be at least 64 MB.
> > >
> > > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
> > >
> > > So in other words how much is TRIM really even effective give the above ?
> > >
> > > Even with a high database write load on the disks at full compacity of the
> > > incoming link I would find it hard to believe that anyone could get the
> > > ZIL to even come close to 512MB.
> >
> > In the case of an SSD being used as a log device (ZIL), I imagine it
> > would only matter the longer the drive was kept in use.  I do not use
> > log devices anywhere with ZFS, so I can't really comment.
> >
> > In the case of an SSD being used as a cache device (L2ARC), I imagine it
> > would matter much more.
> >
> > In the case of an SSD being used as a pool device, it matters greatly.
> >
> > Why it matters: there's two methods of "reclaiming" blocks which were
> > used: internal SSD "garbage collection" and TRIM.  For a NAND block to be
> > reclaimed, it has to be erased -- SSDs erase things in pages rather
> > than individual LBAs.  With TRIM, you submit the data management command
> > via ATA with a list of LBAs you wish to inform the drive are no longer
> > used.  The drive aggregates the LBA ranges, determines if an entire
> > flash page can be erased, and does it.  If it can't, it makes some sort
> > of mental note that the individual LBA (in some particular page)
> > shouldn't be used.
> >
> > The "garbage collection" works when the SSD is idle.  I have no idea
> > what "idle" actually means operationally, because again, vendors don't
> > disclose what the idle intervals are.  5 minutes?  24 hours?  It
> > matters, but they don't tell us.  (What confuses me about the "idle GC"
> > method is how it determines what it can erase -- if the OS didn't tell
> > it what it's using, how does it know it can erase the page?)
> >
> > Anyway, how all this manifests itself performance-wise is intriguing.
> > It's not speculation: there's hard evidence that not using TRIM results
> > in SSD performance, bluntly put, sucking badly on some SSDs.
> >
> > There's this mentality that wear levelling completely solves all of the
> > **performance** concerns -- that isn't the case at all.  In fact, I'm
> > under the impression it probably hurts performance, but it depends on
> > how it's implemented within the drive firmware.
> >
> > bit-tech did an experiment using Windows 7 -- which supports and uses
> > TRIM assuming the device advertises the capability -- with different
> > models of SSDs.  The testing procedure is documented here, but I'll
> > document it as well:
> >
> > http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-performance-and-trim/4
> >
> > Again, remember, this is done on a Windows 7 system which does support
> > TRIM if the device supports it.  The testing steps, in this order:
> >
> > 1) SSD without TRIM support -- all LBAs are zeroed.
> > 2) Took read/write benchmark readings.
> > 3) SSD without TRIM support -- partitioned and formatted as NTFS
> >    (cluster size unknown), copied 100GB of data to the drive, deleted all
> >    the data, and repeated this method 10 times.
> > 4) Step #2 repeated.
> > 5) Upgraded SSD firmware to a version that supports TRIM.
> > 6) SSD with TRIM support -- step #1 repeated.
> > 7) Step #2 repeated.
> > 8) SSD with TRIM support -- step #3 repeated.
> > 9) Step #2 repeated.
> >
> > Without TRIM, some drives drop their read performance by more than 50%,
> > and write performance by almost 70%.  I'm focusing on Intel SSDs here,
> > by the way.  I do not care for OCZ or Corsair products.
> >
> > So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support
> > TRIM, effectively the benchmarks shown pre-firmware-upgrade are what ZFS
> > on FreeBSD will mimic (to some degree).
> >
> > Therefore, simply put, users should be concerned when using ZFS on
> > FreeBSD with SSDs.  It doesn't matter to me if you're only using
> > 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM
> > means degraded performance over time.
> >
> > Can you refute any of this evidence?
> >
>
> At least now at the moment NO. But I can say depending on how large of a
> use of SSDs with OpenSolaris users from before the Oracle reaping that I
> didnt recall seeing any relative bug reports on degradation. But like I
> said... I havent seen them but thats not to say there wasnt a lack of use
> either. Definately more to look into, test, benchmark & test again.
>
> > > Given most SSD's come at a size greater than 32GB I hope this comes as a
> > > early reminder that the ZIL you are buying that disk for is only going to
> > > be using a small percent of that disk and I hope you justify cost over its
> > > actual use. If you do happen to justify creating a ZIL for your pool then
> > > I hope that you partition it wisely to make use of the rest of the space
> > > that is untouched.
> > >
> > > For all other cases I would reccomend if you still want to have a ZIL that
> > > you take some sort of PCI->SD CARD or USB stick into account with
> > > mirroring.
> >
> > Others have pointed out this isn't effective (re: USB sticks).  The read
> > and write speeds are too slow, and limit the overall performance of ZFS
> > in a very bad way.  I can absolutely confirm this claim (I've tested it
> > myself, using a high-end USB flash drive as a cache device (L2ARC)).
> >
> > Alexander Leidinger pointed out that using a USB stick for cache/L2ARC
> > *does* improve performance on older systems which have slower disk I/O
> > (e.g. ICH5-based systems).
> >
>
> Agreed. Soon as the bus speed, write speeds are greater than the speeds
> that USB 2.0 can handle, then any USB based solution is useless. ICH5 and
> up would be right about that time you would see this starting to happen.
>
> sdcards/cfcards mileage may vary depending on the transfer rates. But
> still the same situation applies like you said once your main pool
> throughput outweighs the throughput on your ZIL then its probably not
> worth even having a ZIL or a Cache device. Emphasis on Cache moreso than
> ZIL.
>
>
> Anyway all good information for those to make the judgement whether they
> need a cache or a zil.
>
>
> Thanks again Jeremy. Always appreciated.

You're welcome.

It's important to note that much of what I say is stuff I've learned and
read (technical documentation usually) on my own -- which means I almost
certainly misunderstand certain pieces of technology.  There are a *lot*
of people here who understand it much better than I do.  (I'm looking at
you, jhb@  ;-) )

As such, I probably should have CC'd pjd@ on this thread, since he's
talked a bit about how to get ZFS on FreeBSD to work with TRIM, and when
to issue the erasing of said blocks.

--
| Jeremy Chadwick                                   [hidden email] |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


Re: ZFS: How to enable cache and logs.

Dan Carroll
In reply to this post by Dan Carroll
On 11/05/2011 7:25 PM, Danny Carroll wrote:

> Hello all.
>
> I've been using ZFS for some time now and have never had an issued
> (except perhaps the issue of speed...)
> When v28 is taken into -STABLE I will most likely upgrade to v28 at that
> point.   Currently I am running v15 with v4 on disk.
>
> When I move to v28 I will probably wish to enable a L2Arc and also
> perhaps dedicated log devices.
>
> I'm curious about a few things however.
>
> 1. Can I remove either the L2 ARC or the log devices if things don't go
> as planned or if I need to free up some resources?
> 2. What are the best practices for setting up these?   Would a geom
> mirror for the log device be the way to go.  Or can you just let ZFS
> mirror the log itself?
> 3. What happens when one or both of the log devices fail.   Does ZFS
> come to a crashing halt and kill all the data?   Or does it simply
> complain that the ZIL is no longer active and continue on it's merry way?
>
> In short, what is the best way to set up these two features?
>


Replying to myself in order to summarise the recommendations (when using
v28):
 - Don't use an SSD for the log device.  Write speed tends to be a problem.
 - An SSD is OK for cache if the sizing is right, but without TRIM, don't
expect to take full advantage of the SSD.
 - Do use two devices for the log and mirror them with ZFS.  Bad things
*can* happen if *all* the log devices die.
 - Don't colocate L2ARC and log devices.
 - Log devices can be small; the ZFS Best Practices Guide specifies about
50% of RAM as the maximum.  Size for roughly 10 seconds of synchronous
write throughput (1 GB for 100 MB/sec of writes), with 64 MB as the minimum.


Let me know if I got anything wrong or missed something important.

Remaining questions:
- Is there any advantage to using a spare partition on a SCSI or SATA
drive as L2ARC, assuming it was in the machine already but doing nothing?
- If I have 2 pools like this:
# zpool status
  pool: tank
 state: ONLINE
 scrub: scrub completed after 11h7m with 0 errors on Sun May  8 14:17:07
2011
config:

        NAME            STATE     READ WRITE CKSUM
        tank            ONLINE       0     0     0
          raidz1        ONLINE       0     0     0
            gpt/data0   ONLINE       0     0     0
            gpt/data1   ONLINE       0     0     0
            gpt/data2   ONLINE       0     0     0
            gpt/data3   ONLINE       0     0     0
            gpt/data4   ONLINE       0     0     0
            gpt/data5   ONLINE       0     0     0
          raidz1        ONLINE       0     0     0
            gpt/data6   ONLINE       0     0     0
            gpt/data7   ONLINE       0     0     0
            gpt/data8   ONLINE       0     0     0
            gpt/data9   ONLINE       0     0     0
            gpt/data10  ONLINE       0     0     0
            gpt/data11  ONLINE       0     0     0

errors: No known data errors

  pool: system
 state: ONLINE
 scrub: scrub completed after 1h1m with 0 errors on Sun May  8 15:18:23 2011
config:

        NAME             STATE     READ WRITE CKSUM
        system           ONLINE       0     0     0
          mirror         ONLINE       0     0     0
            gpt/system0  ONLINE       0     0     0
            gpt/system1  ONLINE       0     0     0



And I have free space on the "system" disks.  Could I give two new
partitions on the system disks to ZFS as log devices for the "tank"
pool?
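
(If I do go that way, I assume it would look something like the
following, with gpt/tanklog0 and gpt/tanklog1 being new partitions I
would create on the two system disks; the labels are made up.)

    zpool add tank log mirror gpt/tanklog0 gpt/tanklog1
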
If I were worried about performance of my "system" pool, I could also
use spare partitions on (a couple of) the "tank" disks in a similar way.
But it would be silly to use the same disk for the ZIL and pool data; in
that case, why would I bother to alter the default?

Thanks for the info!

-D

Re: ZFS: How to enable cache and logs.

Bob Friesenhahn
In reply to this post by Jeremy Chadwick
On Wed, 11 May 2011, Jeremy Chadwick wrote:
>
> The "garbage collection" works when the SSD is idle.  I have no idea
> what "idle" actually means operationally, because again, vendors don't
> disclose what the idle intervals are.  5 minutes?  24 hours?  It
> matters, but they don't tell us.  (What confuses me about the "idle GC"
> method is how it determines what it can erase -- if the OS didn't tell
> it what it's using, how does it know it can erase the page?)

Garbage collection is not necessarily just when the drive is idle.
Regardless, if one "overwrites" a page (or part of a page), the drive
can implement that by reading any non-overlapped existing content
(which it already has to do), allocating a fresh (already erased)
page, and then writing the composite to that new page.  The
"overwritten" page is then scheduled for erasure.  This sort of
garbage collector works by over-provisioning the actual amount of
flash in the drive, which should be done anyway in a quality product.

This simple recirculating/COW algorithm is a reason why TRIM is not
really needed given sufficiently intelligent SSD design.

> Therefore, simply put, users should be concerned when using ZFS on
> FreeBSD with SSDs.  It doesn't matter to me if you're only using
> 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM
> means degraded performance over time.

This seems unduly harsh.  Even with TRIM, SSDs will suffer in
continually write-heavy (e.g. server) environments.  The reason is
that the blocks still need to be erased and the erasure performance is
limited.  It is not uncommon for servers to be run close to their
limits most of the time.

One should not be ashamed of purchasing a larger SSD than the space
consumption appears to warrant.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Re: ZFS: How to enable cache and logs.

Bob Friesenhahn
In reply to this post by Dan Carroll
On Thu, 12 May 2011, Danny Carroll wrote:
>
> Replying to myself in order to summarise the recommendations (when using
> v28):
> - Don't use SSD for the Log device.  Write speed tends to be a problem.

DO use SSD for the log device.  The log device is only used for
synchronous writes.  Except for certain usages (e.g. database and NFS
servers) most writes will be asynchronous and never be written to the
log.  Huge synchronous writes will also bypass the SSD log device.
The log device is for reducing latency on small synchronous writes.
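
(If you want to gauge the effect on your own workload, and assuming the
v28 import carries over the per-dataset 'sync' property, you can
temporarily force every write through the log and compare; the dataset
name below is just an example:

# zfs set sync=always tank/sometest
  ... run the workload ...
# zfs set sync=standard tank/sometest
)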

> - Is there any advantage to using a spare partition on a SCSI or SATA
> drive as L2Arc?  Assuming it was in the machine already but doing nothing?

The L2ARC is intended to reduce read latency and is randomly accessed.
It is unlikely that rotating media will work well for that.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Re: ZFS: How to enable cache and logs.

Jeremy Chadwick
On Wed, May 11, 2011 at 09:51:58PM -0500, Bob Friesenhahn wrote:

> On Thu, 12 May 2011, Danny Carroll wrote:
> >
> >Replying to myself in order to summarise the recommendations (when using
> >v28):
> >- Don't use SSD for the Log device.  Write speed tends to be a problem.
>
> DO use SSD for the log device.  The log device is only used for
> synchronous writes.  Except for certain usages (E.g. database and
> NFS server) most writes will be asynchronous and never be written to
> the log.  Huge synchronous writes will also bypass the SSD log
> device. The log device is for reducing latency on small synchronous
> writes.

Bob, please correct me if I'm wrong, but as I understand it a log device
(ZIL) effectively limits the overall write speed of the pool itself.
Consumer-level SSDs do not have extremely high write performance (and it
gets worse without TRIM; again a 70% decrease in write speed in some
cases).

I imagine a very high-end SSD (FusionIO, etc. -- the things that cost
$900 and higher) would have extremely high write performance and would
work perfectly for this role.  Or a battery-backed DDR RAM device.

What's amusing (to me anyway) is that when ZFS was originally presented,
the Sun engineers kept focusing on how "you can buy cheap, generic disks
and accomplish goals!", yet if the above statement of mine is accurate,
that goes against the original principle.

Danny might also find this URL useful:

http://constantin.glez.de/blog/2011/02/frequently-asked-questions-about-flash-memory-ssds-and-zfs

> >- Is there any advantage to using a spare partition on a SCSI or SATA
> >drive as L2Arc?  Assuming it was in the machine already but doing nothing?
>
> The L2ARC is intended to reduce read latency and is random accessed.
> It is unlikely that rotating media will work well for that.

Agreed -- this is why I tell folks that an SSD would work very well for
L2ARC, but my opinion is just to buy more RAM for the ARC ("layer 1").
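
For what it's worth, you can see how much memory the ARC is actually
using and cap it if need be -- the 4G value below is only an example:

# sysctl kstat.zfs.misc.arcstats.size
# sysctl vfs.zfs.arc_max

and, to cap it, in /boot/loader.conf:

vfs.zfs.arc_max="4G"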

--
| Jeremy Chadwick                                   [hidden email] |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


Re: ZFS: How to enable cache and logs.

Freddie Cash-8
On Wed, May 11, 2011 at 8:36 PM, Jeremy Chadwick
<[hidden email]> wrote:

> On Wed, May 11, 2011 at 09:51:58PM -0500, Bob Friesenhahn wrote:
>> On Thu, 12 May 2011, Danny Carroll wrote:
>> >
>> >Replying to myself in order to summarise the recommendations (when using
>> >v28):
>> >- Don't use SSD for the Log device.  Write speed tends to be a problem.
>>
>> DO use SSD for the log device.  The log device is only used for
>> synchronous writes.  Except for certain usages (E.g. database and
>> NFS server) most writes will be asynchronous and never be written to
>> the log.  Huge synchronous writes will also bypass the SSD log
>> device. The log device is for reducing latency on small synchronous
>> writes.
>
> Bob, please correct me if I'm wrong, but as I understand it a log device
> (ZIL) effectively limits the overall write speed of the pool itself.

Nope.  Using a separate log device removes sync writes from the I/O
path of the rest of the pool, thus increasing the total write
throughput for the pool.

> Danny might also find this URL useful:
>
> http://constantin.glez.de/blog/2011/02/frequently-asked-questions-about-flash-memory-ssds-and-zfs

Read the linked articles.  For example:
http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-zil-explained

Most sync writes go to the ZIL.  If the ZIL is part of the pool, then
the pool has to issue two separate writes (once to the ZIL, then later
to the pool as part of the normal async txg).  If the ZIL is a
separate device, then there's no write contention with the rest of the
pool.

Not every sync write goes to the ZIL.  Only writes under a certain
size (64 KB or something like that).

Every OpenSolaris, Oracle Solaris, Nexenta admin will recommend
getting an enterprise-grade, write-optimised, SLC-based SSD
(preferably with a supercap) for use as the SLOG device.  Especially
if you're using ZFS for anything database-related, or serving files
over NFS, everyone says the same:  get an SSD for SLOG usage.

Why would it be any different for ZFS on FreeBSD?

There are plenty of benchmarks online and in the zfs-discuss mailing
list that show the benefits of using an SSD-based SLOG.

--
Freddie Cash
[hidden email]

Re: ZFS: How to enable cache and logs.

Daniel Kalchev
In reply to this post by Jeremy Chadwick


On 12.05.11 06:36, Jeremy Chadwick wrote:

> On Wed, May 11, 2011 at 09:51:58PM -0500, Bob Friesenhahn wrote:
>> On Thu, 12 May 2011, Danny Carroll wrote:
>>> Replying to myself in order to summarise the recommendations (when using
>>> v28):
>>> - Don't use SSD for the Log device.  Write speed tends to be a problem.
>> DO use SSD for the log device.  The log device is only used for
>> synchronous writes.  Except for certain usages (E.g. database and
>> NFS server) most writes will be asynchronous and never be written to
>> the log.  Huge synchronous writes will also bypass the SSD log
>> device. The log device is for reducing latency on small synchronous
>> writes.
> Bob, please correct me if I'm wrong, but as I understand it a log device
> (ZIL) effectively limits the overall write speed of the pool itself.
>
Perhaps I misstated it in my first post, but there is nothing wrong with
using SSD for the SLOG.

You can of course create a usage/benchmark scenario where a (cheap) SSD
based SLOG will be worse than a (fast) HDD based SLOG, especially if
you are not concerned about latency.  The SLOG addresses two issues: it
increases the throughput of the pool (primary storage) by removing small
synchronous writes that would otherwise introduce unnecessary head
movement and extra IOPS, and it provides low latency for small
synchronous writes.

The latter is only valid if the SSD is sufficiently write-optimized.
Most consumer SSDs end up saturated by writes.  Sequential write IOPS
is what matters here.

About TRIM: as was already mentioned, you will use only a small portion
of a (for example) 32GB SSD for the SLOG.  If you do not allocate the
entire SSD, then wear leveling will be able to do its job and it is very
likely you will not suffer any performance degradation (see the example
below).
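
For example (device name and sizes made up), give the SLOG only a small
slice of the SSD and deliberately leave the rest unpartitioned, so the
controller has plenty of spare area to shuffle writes into:

# gpart create -s gpt ada4
# gpart add -t freebsd-zfs -s 4G -l slog0 ada4
# zpool add tank log gpt/slog0

(In practice you would do the same on a second SSD and mirror the two,
as discussed earlier.)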

By the way, I do not believe a Windows benchmark has any significance
for our ZFS usage of the SSDs.  How is TRIM implemented in Windows?  How
does it relate to SSD usage as SLOG and L2ARC?

How can TRIM support ever influence reading from the drive?!

TRIM is a slow operation.  How often are these commands issued?  What is
the impact of issuing TRIM to an otherwise loaded SSD?

Daniel

Re: ZFS: How to enable cache and logs.

Daniel Kalchev
In reply to this post by Dan Carroll
On 12.05.11 05:26, Danny Carroll wrote:
>
>   - Don't use SSD for the Log device.  Write speed tends to be a problem.
It all depends on your usage. You need to experiment, unfortunately.

>   - SSD ok for cache if the sizing is right, but without TRIM, don't
> expect to take full advantage of the SSD.
I do not believe TRIM has any effect on L2ARC.

Why?
- TRIM is a technique to optimize future writes;
- L2ARC is written at a controlled, very low rate, I believe something
like 8MB/sec (see the sysctls below).  There is no SSD currently on the
market, with or without TRIM, that has any trouble sustaining that rate.
- TRIM might introduce delays; it is a very 'expensive' command.  But
that will surely vary by drive/manufacturer.
- There is no way TRIM can influence reading from the flash media.
Reading from L2ARC with low latency and high speed is its main purpose
anyway.
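
On FreeBSD you can check (and tune) that fill rate; the values shown
below are the defaults I would expect, in bytes:

# sysctl vfs.zfs.l2arc_write_max
vfs.zfs.l2arc_write_max: 8388608
# sysctl vfs.zfs.l2arc_write_boost
vfs.zfs.l2arc_write_boost: 8388608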

> Remaining questions.
> - Is there any advantage to using a spare partition on a SCSI or SATA
> drive as L2Arc?  Assuming it was in the machine already but doing nothing?
Absolutely no advantage. You want L2ARC to be very low latency and
high-bandwidth for random reading. Especially low-latency. This does not
apply to rotating disks.

Daniel

Re: ZFS: How to enable cache and logs.

Šimun Mikecin

On 12. svi. 2011., at 08:44, Daniel Kalchev wrote:

> On 12.05.11 05:26, Danny Carroll wrote:
>>
>>  - Don't use SSD for the Log device.  Write speed tends to be a problem.
> It all depends on your usage. You need to experiment, unfortunately.

What is the alternative for log devices if you are not using SSD?
Rotating hard drives?

AFAIK, two factors define the speed of a log device: write transfer rate and write latency.
You will not find a rotating hard drive that has a write latency anywhere near that of even the slowest SSD you can find on the market.
On the other hand, only a very few rotating hard drives have a write transfer rate that can be compared to an SSD's.
