|
Hello all.
I've been using ZFS for some time now and have never had an issue (except perhaps the issue of speed...).

When v28 is taken into -STABLE I will most likely upgrade to v28 at that point. Currently I am running v15 with v4 on disk.

When I move to v28 I will probably wish to enable an L2ARC and also perhaps dedicated log devices.

I'm curious about a few things, however.

1. Can I remove either the L2ARC or the log devices if things don't go as planned or if I need to free up some resources?

2. What are the best practices for setting these up? Would a geom mirror for the log device be the way to go, or can you just let ZFS mirror the log itself?

3. What happens when one or both of the log devices fail? Does ZFS come to a crashing halt and kill all the data? Or does it simply complain that the ZIL is no longer active and continue on its merry way?

In short, what is the best way to set up these two features?

-D
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "[hidden email]"
|
|
On Wed, May 11, 2011 at 07:25:52PM +1000, Danny Carroll wrote:
> I've been using ZFS for some time now and have never had an issue
> (except perhaps the issue of speed...)
> When v28 is taken into -STABLE I will most likely upgrade to v28 at that
> point. Currently I am running v15 with v4 on disk.
>
> When I move to v28 I will probably wish to enable a L2Arc and also
> perhaps dedicated log devices.
>
> I'm curious about a few things however.
>
> 1. Can I remove either the L2 ARC or the log devices if things don't go
> as planned or if I need to free up some resources?

You can remove L2ARC ("cache") devices without impact, but you cannot remove all log devices without the pool needing to be destroyed (recreated). Please keep reading for details of log devices.

L2ARC devices should primarily be something with extremely fast read rates (e.g. SSDs). USB1.x and 2.x memory sticks do not work well for this purpose given protocol and bus speed limits + overhead. (I only mention them because people often think "Oh, USB flash would work great for this!" I disagree.)

Furthermore, something I found out on my own: the L2ARC is completely lost in the case the system is cleanly rebooted. This sometimes surprises people (myself included) since L2ARC uses actual storage devices; one might think the data is "restored" on reboot, but it isn't (because the ARC ("layer 1") itself is lost on reboot, obviously).

The only way to see how much disk space a cache device is using -- to my knowledge -- is via "zpool iostat -v".

> 2. What are the best practices for setting up these? Would a geom
> mirror for the log device be the way to go. Or can you just let ZFS
> mirror the log itself?

Let ZFS handle it. There is no purpose (in my opinion) to added complexity when ZFS can handle it itself. The KISS concept applies greatly here.

In the case of ZFS intent logs, you definitely want a mirror. If you have a single log device, loss of that device can/will result in full data loss of the pool which makes use of the log device.

Furthermore, a log device is limited to a single pool; e.g. you cannot use the same log device (e.g. ada6) on pool "foo" and pool "bar". It's one or the other.

You should read **all** of the data points listed below and pay close attention to the details:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Log_Devices

> 3. What happens when one or both of the log devices fail. Does ZFS
> come to a crashing halt and kill all the data? Or does it simply
> complain that the ZIL is no longer active and continue on it's merry way?

See above.

> In short, what is the best way to set up these two features?

See the zpool(1) man page for details on how to make use of log devices. Examples are provided, including mirroring of such devices.

--
| Jeremy Chadwick [hidden email] |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP 4BD6C0CB |
|
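The setup described in the reply above (a ZFS-managed log mirror, a separate cache device, and `zpool iostat -v` to inspect usage) could be sketched roughly as follows. This is a minimal sketch, not a tested procedure: the pool name `tank` and the device names `ada6`, `ada7` and `ada8` are hypothetical, and the script only builds and prints the command lines rather than running them. Whether `zpool remove` works on a log device depends on the pool version, as discussed later in the thread.

```python
# Sketch only: builds zpool command lines, does not execute anything.
# Pool and device names used below are hypothetical placeholders.

def add_mirrored_log(pool, dev_a, dev_b):
    """Command that adds a log vdev mirrored by ZFS itself (no gmirror)."""
    return ["zpool", "add", pool, "log", "mirror", dev_a, dev_b]

def add_cache(pool, dev):
    """Command that adds an L2ARC ("cache") device to the pool."""
    return ["zpool", "add", pool, "cache", dev]

def remove_device(pool, dev):
    """Command that removes a cache device (or, on newer pools, a log device)."""
    return ["zpool", "remove", pool, dev]

def show_usage(pool):
    """Command that lists per-device space and I/O, including cache devices."""
    return ["zpool", "iostat", "-v", pool]

if __name__ == "__main__":
    commands = [
        add_mirrored_log("tank", "ada6", "ada7"),
        add_cache("tank", "ada8"),
        show_usage("tank"),
        remove_device("tank", "ada8"),
    ]
    for cmd in commands:
        print(" ".join(cmd))
```

Since a log device belongs to exactly one pool, a second pool would need its own pair of devices in a separate `zpool add` call.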
|
On 11.05.11 13:06, Jeremy Chadwick wrote:
> On Wed, May 11, 2011 at 07:25:52PM +1000, Danny Carroll wrote:
>> When I move to v28 I will probably wish to enable a L2Arc and also
>> perhaps dedicated log devices.
>>
> In the case of ZFS intent logs, you definitely want a mirror. If you
> have a single log device, loss of that device can/will result in full
> data loss of the pool which makes use of the log device.

This is true for v15 pools, not true for v28 pools. In ZFS v28 you can remove log devices and, in the case of sudden loss of a log device (or whatever), roll back the pool to a 'good' state. Therefore, for most installations a single log device might be sufficient. If you value your data, you will of course use mirrored log devices, possibly in a hot-swap configuration and .. have a backup :)

By the way, the SLOG (separate LOG) does not have to be an SSD at all. Separate rotating disk(s) will also suffice -- it all depends on the type of workload. SSDs are better, for the higher end, because of the low latency (but not all SSDs are low latency when writing!).

The idea of the SLOG is to separate the ZIL records from the main data pool. ZIL records are small, even smaller in v28, but will cause unnecessary head movements if kept in the main pool. The SLOG is "write once, read on failure" media and is written sequentially. Almost all current HDDs offer reasonable sequential write performance for small to medium pools.

The L2ARC needs to be a fast-reading SSD. It is populated slowly, at a few MB/sec, so there is no point in having a fast, high-bandwidth, write-optimized SSD. The benefit from L2ARC is the low latency. Sort of slower RAM.

It is a bad idea to use the same SSD for both SLOG and L2ARC, because most SSDs behave poorly if you present them with high read and high write loads. More expensive units might behave, but then... if you pay a few k$ for an SSD, you know what you need :)

Daniel
|
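The remark above that the L2ARC is populated at only a few MB/sec, combined with the earlier observation that its contents are lost on reboot, implies a warm-up period. A rough back-of-the-envelope sketch follows, assuming a constant fill rate; the 40 GB device size and 8 MB/s rate are hypothetical example numbers, not measurements from the thread.

```python
# Rough estimate of how long an empty L2ARC device takes to fill.
# Assumes a constant fill rate, which is a simplification.

def l2arc_fill_hours(cache_size_gb, fill_rate_mb_per_s):
    """Hours needed to fill cache_size_gb at fill_rate_mb_per_s."""
    if fill_rate_mb_per_s <= 0:
        raise ValueError("fill rate must be positive")
    seconds = cache_size_gb * 1024 / fill_rate_mb_per_s
    return seconds / 3600

if __name__ == "__main__":
    # Hypothetical example: a 40 GB cache device filled at 8 MB/s.
    print(f"{l2arc_fill_hours(40, 8):.1f} hours to warm the cache")
```

Under these assumptions, a device that sustains far more than a few MB/sec of writes would gain nothing during warm-up, which matches the point that read latency is what matters for a cache device.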
|
On Wed, May 11, 2011 at 01:37:03PM +0300, Daniel Kalchev wrote:
> On 11.05.11 13:06, Jeremy Chadwick wrote:
> >On Wed, May 11, 2011 at 07:25:52PM +1000, Danny Carroll wrote:
> >>When I move to v28 I will probably wish to enable a L2Arc and also
> >>perhaps dedicated log devices.
> >>
> >In the case of ZFS intent logs, you definitely want a mirror. If you
> >have a single log device, loss of that device can/will result in full
> >data loss of the pool which makes use of the log device.
>
> This is true for v15 pools, not true for v28 pools. In ZFS v28 you
> can remove log devices and in the case of sudden loss of log device
> (or whatever) roll back the pool to a 'good' state. Therefore, for
> most installations single log device might be sufficient. If you
> value your data, you will of course use mirrored log devices,
> possibly in hot-swap configuration and .. have a backup :)

Has anyone actually *tested* this on FreeBSD? Set up a single log device on classic (non-CAM/non-ahci.ko) ATA, then literally yank the disk out to induce a very bad/rude failure? Does the kernel panic or anything weird happen?

I fully acknowledge that in ZFS pool v19 and higher the issue is fixed (at least on Solaris/OpenSolaris), but at this point in time the RELEASE and STABLE branches are running pool version 15.

There are numerous ongoing discussions about the ZFS v28 patches right now with regards to STABLE specifically. Recent threads:

- Patch did not apply correctly (errors/rejections)
- Patch applied correctly but build failed (use "patch -E" I believe?)
- Discussion about when v28 is *truly* coming to RELENG_8 and if it's truly ready for RELENG_8

And finally, there's the one thing that people often forget/miss: if you upgrade your pool from v15 to v28 (needed to address the log removal stuff you mention), you cannot roll back without recreating all of your pools. Folks considering v28 need to take that into consideration.

> By the way, the SLOG (separate LOG) does not have to be SSD at all.
> Separate rotating disk(s) will also suffice -- it all depends on the
> type of workload. SSDs are better, for the higher end, because of
> the low latency (but not all SSDs are low latency when writing!).

I didn't state log devices should be SSDs. I stated cache devices (L2ARC) should be SSDs. :-) A non-high-end SSD for a log device is probably a very bad idea given the sub-par write speeds, agreed. A FusionIO card/setup on the other hand would probably work wonderfully, but that's much more expensive (you cover that below).

> The idea of the SLOG is to separate the ZIL records from the main
> data pool. ZIL records are small, even smaller in v28, but will
> cause unnecessary head movements if kept in the main pool. The SLOG
> is "write once, read on failure" media and is written sequentially.
> Almost all current HDDs offer reasonable sequential write
> performance for small to medium pools.
>
> The L2ARC needs to be fast reading SSD. It is populated slowly, few
> MB/sec so there is no point to have fast and high-bandwidth
> write-optimized SSD. The benefit from L2ARC is the low latency. Sort
> of slower RAM.

Agreed, and the overall point to L2ARC is to help with improved random reads, if I remember right. The concept is that it's a 2nd layer of caching that shouldn't hurt or hinder performance when used/put in place, but can greatly help when the "layer 1" ARC lacks an entry.

> It is bad idea to use the same SSD for both SLOG and L2ARC, because
> most SSDs behave poorly if you present them with high read and high
> write loads. More expensive units might behave, but then... if you
> pay few k$ for a SSD, you know what you need :)

Again, agreed.

Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks should also keep that in mind when putting an SSD into use in this fashion.

--
| Jeremy Chadwick [hidden email] |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP 4BD6C0CB |
|
|
On 11.05.11 13:51, Jeremy Chadwick wrote:
>
> Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> should also keep that in mind when putting an SSD into use in this
> fashion.
>

By the way, what would be the use of TRIM for SLOG and L2ARC devices? I see absolutely no benefit from TRIM for the L2ARC, because it is written slowly (on purpose). Any current, or 1-2 generations back, SSD would handle that write load without TRIM and without any performance degradation.

Perhaps TRIM helps with the SLOG. But then, it is wise to use an SLC SSD for the SLOG, for many reasons. The write regions on the SLC NAND should be smaller (my wild guess, current practice may differ) and the need for rewriting will be small. If you don't need to rewrite already written data, TRIM does not help. Also, as far as I understand, most "serious" SSDs (typical for SLC I guess) would have twice or more the advertised size and always write to fresh cells, scheduling a background erase of the 'overwritten' cell.

Does Solaris have TRIM for ZFS? Where? How does it help? I can imagine TRIM for the data pool, that would be a good fit for ZFS, but an SSD-only pool.. are we already there?

Daniel
|
|
On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> On 11.05.11 13:51, Jeremy Chadwick wrote:
> >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> >should also keep that in mind when putting an SSD into use in this
> >fashion.
>
> By the way, what would be the use of TRIM for SLOG and L2ARC devices?
> I see absolutely no benefit from TRIM for the L2ARC, because it is
> written slowly (on purpose). Any current, or 1-2 generations back SSD
> would handle that write load without TRIM and without any performance
> degradation.
>
> Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC
> SSD for the SLOG, for many reasons. The write regions on the SLC
> NAND should be smaller (my wild guess, current practice may differ)
> and the need for rewriting will be small. If you don't need to
> rewrite already written data, TRIM does not help. Also, as far as I
> understand, most "serious" SSDs (typical for SLC I guess) would have
> twice or more the advertised size and always write to fresh cells,
> scheduling an background erase of the 'overwritten' cell.

AFAIK, drive manufacturers do not disclose just how much reallocation space they keep available on an SSD. I'd rather not speculate as to how much, as I'm certain it varies per vendor.

I can talk a bit about SSD drive performance from a consumer level (that is to say: low-end consumer Intel SSDs such as the X25-V, X25-M, and latest 320 and 510 series -- I use them all over the place), both before and after TRIM operations. I don't use any of these SSDs on ZFS however, only UFS (and I have seen the results both before and after TRIM support was added to UFS; and yeah, all the drives are running the latest firmware).

What's confusing to me is why someone would say TRIM doesn't really matter in the case of an intent log device or a cache device; these devices both implement some degree of write operations, correct? The drive has to erase the NAND flash block (well, page really) before the block can be re-used (written to once again), so by not doing TRIM effectively you're relying 100% on drives' garbage collection mechanisms, which isn't that great (at least WRT the above drives).

There are some sites that go over Intel SSD performance out-of-the-box as well as once it's been used for a bit, and the performance difference is pretty substantial (50% drop in performance for reads, ~60-70% drop in performance for writes). Something to keep in mind.

Furthermore, most people aren't buying SLC given the cost. Right now the absolute #1 or #2 focus of any operation is to save money; one cannot argue with the current economic condition. I think this is also why many SSD companies are focusing primarily on MLC right now; they know the majority of their client base isn't going to spend the money for SLC.

> Does Solaris have TRIM for ZFS? Where? How does it help? I can
> imagine TRIM for the data pool, that would be good fit for ZFS, but
> SSD-only pool.. are we already there?

The following blog post and mailing list thread provide answers to all of the above questions, including why TRIM is useful on ZFS (see the comments section; not referring to slog or cache however). But it doesn't look like it's actually made use of in ZFS as of January 2011. There is a long discussion about ZFS, TRIM, and slog/cache in the 2nd thread. There's also a reply from pjd@ in there WRT FreeBSD.

http://www.c0t0d0s0.org/archives/6792-SATA-TRIM-support-in-Opensolaris.html
http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/44855

There's a recommendation to permit TRIM in ZFS, but limit the number of txgs based on a sysctl, since TRIM is a slow/expensive operation.

SSDs are neat, but man, NAND-based flash sure makes me a sad panda.

--
| Jeremy Chadwick [hidden email] |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP 4BD6C0CB |
|
|
In reply to this post by Jeremy Chadwick
Quoting Jeremy Chadwick <[hidden email]> (from Wed, 11 May 2011 03:06:56 -0700):

> L2ARC devices should primarily be something with extremely fast read
> rates (e.g. SSDs). USB1.x and 2.x memory sticks do not work well for
> this purpose given protocol and bus speed limits + overhead. (I only
> mention them because people often think "Oh, USB flash would work great
> for this!" I disagree.)

Using USB flash may work acceptably. It depends upon the rest of the system. If you have very fast hard disks (or only USB 1 hardware), USB flash will not give you a faster FS. If you have slow (and low-power) desktop disks, a fast USB flash (attention, there are also slow ones) connected via USB 2 (or 3) will give you a speed improvement you notice.

As a matter of fact, I have this:

- Pentium 4
- 1 GB RAM
- 1 Western Digital Caviar Blue
- 2 Seagate Barracuda 7200.10
- an ICH5 controller (no NCQ)
- no-name cheap give-away 1 GB USB flash (so not a very fast one)

The disks are used in a RAIDZ, with the USB flash as a cache device. My use case was connecting to a webmail system over a slow line (ADSL 224 kilobit/s). I noticed directly whether the cache was in use or not.

I also have another system: ICH 10 with NCQ, 5 disks (WD RE4 RAID) in RAIDZ2, Intel Xeon 4-core, 12 GB RAM. There USB flash does not make sense at all (and the SSD makes sense if you compare the price of the entire system with the price of a small or medium SSD).

For the first system, it does not make sense to spend 200 units of money on an SSD; the system itself is not worth much more now. Spending 5-10 units of money on this system is OK, and gives a speed improvement.

Bye,
Alexander.

--
Even God cannot change the past. -- Joseph Stalin
http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137
|
|
In reply to this post by Daniel Kalchev
I recently learned the hard way that you need to be very careful what you choose as your ZIL. On my personal file server, my pool is comprised of 4x 500 GB disks in a RAID-Z and 2x 1.5 TB disks in a mirror. I also had a 1 GB Compact Flash card plugged into an IDE adapter, running as the ZIL. For the longest time, my write performance was capped at about 5 MB/sec. In an attempt to figure out why, I ran gstat and saw that the CF device was pegged at 100%.

Having recently upgraded to ZFSv28, I decided to try removing the log device. Write performance instantly jumped to 45 MB/sec.

Lesson learned... If you're going to have a dedicated ZIL, make sure its write performance exceeds the performance of the pool itself.

On the other hand, again having upgraded to v28, I attempted to use deduplication on my pool. Write performance dropped to an abysmal 1 MB/sec. Why? Because, as I found out, my system doesn't have enough memory to keep the dedupe table in memory, and it cannot be upgraded to have enough. But with the application of a sufficiently large cache device, performance goes right back up to where it's supposed to be.

--
James L. Lauser
[hidden email]
http://jlauser.net/
|
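The deduplication problem described above comes down to whether the dedupe table fits in memory. A rough way to estimate that is sketched below. The figures are assumptions and do not come from the thread: roughly 320 bytes of memory per dedupe-table entry is a commonly quoted rule of thumb, and the 128 KiB average block size is simply the default ZFS recordsize. Real pools will differ, so treat the result as an order-of-magnitude guess.

```python
# Order-of-magnitude estimate of dedupe table (DDT) memory use.
# ASSUMPTIONS (not from the thread): ~320 bytes per DDT entry and an
# average block size of 128 KiB. Both vary widely in practice.

def ddt_ram_bytes(unique_data_bytes,
                  avg_block_bytes=128 * 1024,
                  bytes_per_entry=320):
    """Approximate bytes of memory needed to hold the whole DDT."""
    if avg_block_bytes <= 0:
        raise ValueError("block size must be positive")
    # One table entry per unique block, rounding up to whole blocks.
    entries = -(-unique_data_bytes // avg_block_bytes)
    return entries * bytes_per_entry

if __name__ == "__main__":
    one_tib = 2 ** 40
    gib = ddt_ram_bytes(one_tib) / 2 ** 30
    print(f"about {gib:.1f} GiB of table per TiB of unique data")
```

Smaller average block sizes raise the estimate proportionally, which is one reason a machine with little RAM can end up reading table entries from disk (or from a cache device) on every write.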
|
In reply to this post by Jeremy Chadwick
Jeremy,

On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote:
> AFAIK, drive manufacturers do not disclose just how much reallocation
> space they keep available on an SSD. I'd rather not speculate as to how
> much, as I'm certain it varies per vendor.
>

Let's not forget here: The size of the separate log device may be quite small. A rule of thumb is that you should size the separate log to be able to handle 10 seconds of your expected synchronous write workload. It would be rare to need more than 100 MB in a separate log device, but the separate log must be at least 64 MB.

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

So in other words, how much is TRIM really even effective given the above?

Even with a high database write load on the disks at full capacity of the incoming link, I would find it hard to believe that anyone could get the ZIL to even come close to 512MB.

Given most SSDs come at a size greater than 32GB, I hope this comes as an early reminder that the ZIL you are buying that disk for is only going to be using a small percentage of that disk, and I hope you justify the cost against its actual use. If you do happen to justify creating a ZIL for your pool, then I hope that you partition it wisely to make use of the rest of the space that is untouched.

For all other cases, if you still want to have a ZIL, I would recommend that you take some sort of PCI->SD card or USB stick into account, with mirroring.

--
Regards, (jhell)
Jason Hellenthal
|
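The sizing rule of thumb quoted above (10 seconds of expected synchronous writes, with a 64 MB floor) is simple enough to write down. The sketch below applies it and also reports what fraction of a 32 GB SSD that would occupy. The 5 MB/s and 20 MB/s workloads are hypothetical inputs chosen for illustration.

```python
# Applies the quoted rule of thumb: size the separate log for about
# 10 seconds of synchronous writes, but never below 64 MB.

def slog_size_mb(sync_write_mb_per_s, seconds=10, minimum_mb=64):
    """Suggested separate-log size in MB for a given sync write rate."""
    if sync_write_mb_per_s < 0:
        raise ValueError("write rate cannot be negative")
    return max(sync_write_mb_per_s * seconds, minimum_mb)

def fraction_of_device(slog_mb, device_gb=32):
    """Share of a device_gb-sized SSD that the log would actually use."""
    return slog_mb / (device_gb * 1024)

if __name__ == "__main__":
    # Hypothetical workloads, in MB/s of synchronous writes.
    for rate in (5, 20):
        size = slog_size_mb(rate)
        share = fraction_of_device(size) * 100
        print(f"{rate} MB/s -> {size} MB log, {share:.2f}% of a 32 GB SSD")
```

The small percentages are the reason the message suggests partitioning the device rather than dedicating all of it to the log.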
|
On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote:
> Lets not forget here: The size of the separate log device may be quite
> small. A rule of thumb is that you should size the separate log to be able
> to handle 10 seconds of your expected synchronous write workload. It would
> be rare to need more than 100 MB in a separate log device, but the
> separate log must be at least 64 MB.
>
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
>
> So in other words how much is TRIM really even effective given the above ?
>
> Even with a high database write load on the disks at full capacity of the
> incoming link I would find it hard to believe that anyone could get the
> ZIL to even come close to 512MB.

In the case of an SSD being used as a log device (ZIL), I imagine it would only matter the longer the drive was kept in use. I do not use log devices anywhere with ZFS, so I can't really comment.

In the case of an SSD being used as a cache device (L2ARC), I imagine it would matter much more.

In the case of an SSD being used as a pool device, it matters greatly.

Why it matters: there are two methods of "reclaiming" blocks which were used: internal SSD "garbage collection" and TRIM. For a NAND block to be reclaimed, it has to be erased -- SSDs erase things in pages rather than individual LBAs. With TRIM, you submit the data management command via ATA with a list of LBAs you wish to inform the drive are no longer used. The drive aggregates the LBA ranges, determines if an entire flash page can be erased, and does it. If it can't, it makes some sort of mental note that the individual LBA (in some particular page) shouldn't be used.

The "garbage collection" works when the SSD is idle. I have no idea what "idle" actually means operationally, because again, vendors don't disclose what the idle intervals are. 5 minutes? 24 hours? It matters, but they don't tell us. (What confuses me about the "idle GC" method is how it determines what it can erase -- if the OS didn't tell it what it's using, how does it know it can erase the page?)

Anyway, how all this manifests itself performance-wise is intriguing. It's not speculation: there's hard evidence that not using TRIM results in SSD performance, bluntly put, sucking badly on some SSDs.

There's this mentality that wear levelling completely solves all of the **performance** concerns -- that isn't the case at all. In fact, I'm under the impression it probably hurts performance, but it depends on how it's implemented within the drive firmware.
bit-tech did an experiment using Windows 7 -- which supports and uses TRIM assuming the device advertises the capability -- with different models of SSDs. The testing procedure is documented here, but I'll document it as well:

http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-performance-and-trim/4

Again, remember, this is done on a Windows 7 system which does support TRIM if the device supports it. The testing steps, in this order:

1) SSD without TRIM support -- all LBAs are zeroed.
2) Took read/write benchmark readings.
3) SSD without TRIM support -- partitioned and formatted as NTFS (cluster size unknown), copied 100GB of data to the drive, deleted all the data, and repeated this method 10 times.
4) Step #2 repeated.
5) Upgraded SSD firmware to a version that supports TRIM.
6) SSD with TRIM support -- step #1 repeated.
7) Step #2 repeated.
8) SSD with TRIM support -- step #3 repeated.
9) Step #2 repeated.

Without TRIM, some drives drop their read performance by more than 50%, and write performance by almost 70%. I'm focusing on Intel SSDs here, by the way. I do not care for OCZ or Corsair products.

So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support TRIM, effectively the benchmarks shown pre-firmware-upgrade are what ZFS on FreeBSD will mimic (to some degree).

Therefore, simply put, users should be concerned when using ZFS on FreeBSD with SSDs. It doesn't matter to me if you're only using 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM means degraded performance over time.

Can you refute any of this evidence?

> Given most SSD's come at a size greater than 32GB I hope this comes as a
> early reminder that the ZIL you are buying that disk for is only going to
> be using a small percent of that disk and I hope you justify cost over its
> actual use. If you do happen to justify creating a ZIL for your pool then
> I hope that you partition it wisely to make use of the rest of the space
> that is untouched.
>
> For all other cases I would recommend if you still want to have a ZIL that
> you take some sort of PCI->SD CARD or USB stick into account with
> mirroring.

Others have pointed out this isn't effective (re: USB sticks). The read and write speeds are too slow, and limit the overall performance of ZFS in a very bad way. I can absolutely confirm this claim (I've tested it myself, using a high-end USB flash drive as a cache device (L2ARC)).

Alexander Leidinger pointed out that using a USB stick for cache/L2ARC *does* improve performance on older systems which have slower disk I/O (e.g. ICH5-based systems).

--
| Jeremy Chadwick [hidden email] |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP 4BD6C0CB |
|
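The TRIM behaviour described in the message above (the drive aggregates the LBA ranges it is told about, erases a flash unit only when all of it is covered, and otherwise just remembers the stale LBAs) can be illustrated with a toy model. This is purely a sketch of that description and not how any particular firmware works; the erase-unit size of 128 LBAs and the sample ranges are hypothetical.

```python
# Toy model of TRIM range aggregation. Not real firmware behaviour:
# the erase-unit size and the sample ranges are made-up values.

def merge_ranges(ranges):
    """Merge overlapping or adjacent (start_lba, count) ranges.

    Returns sorted half-open (start, end) intervals.
    """
    intervals = sorted((start, start + count)
                       for start, count in ranges if count > 0)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]

def erasable_units(ranges, lbas_per_unit):
    """Indexes of erase units that are completely covered by the ranges."""
    units = []
    for start, end in merge_ranges(ranges):
        first = -(-start // lbas_per_unit)  # first unit starting inside
        last = end // lbas_per_unit         # one past the last full unit
        units.extend(range(first, last))
    return units

if __name__ == "__main__":
    trimmed = [(0, 100), (100, 156), (300, 50)]
    print(merge_ranges(trimmed))
    print(erasable_units(trimmed, lbas_per_unit=128))
```

In this model the third range covers only part of a unit, so nothing is erased for it. That corresponds to the "mental note" case in the message, where the drive can only record that those LBAs are no longer in use.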
|
Jeremy,

As always, the quality of your messages is 101% spot on, and I always find some new information that comes in handy more often than I could say; there is always something to be learned. Thanks.

On Wed, May 11, 2011 at 06:04:33PM -0700, Jeremy Chadwick wrote: > On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote: > > > > Jeremy, > > > > On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote: > > > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote: > > > > On 11.05.11 13:51, Jeremy Chadwick wrote: > > > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks > > > > >should also keep that in mind when putting an SSD into use in this > > > > >fashion. > > > > > > > > By the way, what would be the use of TRIM for SLOG and L2ARC devices? > > > > I see absolutely no benefit from TRIM for the L2ARC, because it is > > > > written slowly (on purpose). Any current, or 1-2 generations back SSD > > > > would handle that write load without TRIM and without any performance > > > > degradation. > > > > > > > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC > > > > SSD for the SLOG, for many reasons. The write regions on the SLC > > > > NAND should be smaller (my wild guess, current practice may differ) > > > > and the need for rewriting will be small. If you don't need to > > > > rewrite already written data, TRIM does not help. Also, as far as I > > > > understand, most "serious" SSDs (typical for SLC I guess) would have > > > > twice or more the advertised size and always write to fresh cells, > > > > scheduling an background erase of the 'overwritten' cell. > > > > > > AFAIK, drive manufacturers do not disclose just how much reallocation > > > space they keep available on an SSD. I'd rather not speculate as to how > > > much, as I'm certain it varies per vendor. > > > > > > > Lets not forget here: The size of the separate log device may be quite > > small.
> > A rule of thumb is that you should size the separate log to be able to handle 10 seconds of your expected synchronous write workload. It would be rare to need more than 100 MB in a separate log device, but the separate log must be at least 64 MB.
> >
> > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
> >
> > So in other words, how effective is TRIM really, given the above?
> >
> > Even with a high database write load on the disks at the full capacity of the incoming link, I would find it hard to believe that anyone could get the ZIL to even come close to 512MB.
>
> In the case of an SSD being used as a log device (ZIL), I imagine it would only matter the longer the drive was kept in use. I do not use log devices anywhere with ZFS, so I can't really comment.
>
> In the case of an SSD being used as a cache device (L2ARC), I imagine it would matter much more.
>
> In the case of an SSD being used as a pool device, it matters greatly.
>
> Why it matters: there are two methods of "reclaiming" blocks which were used: internal SSD "garbage collection" and TRIM. For a NAND block to be reclaimed, it has to be erased -- SSDs erase things in pages rather than individual LBAs. With TRIM, you submit the data management command via ATA with a list of LBAs you wish to inform the drive are no longer used. The drive aggregates the LBA ranges, determines if an entire flash page can be erased, and does it. If it can't, it makes some sort of mental note that the individual LBA (in some particular page) shouldn't be used.
>
> The "garbage collection" works when the SSD is idle. I have no idea what "idle" actually means operationally, because again, vendors don't disclose what the idle intervals are. 5 minutes? 24 hours? It matters, but they don't tell us.
> (What confuses me about the "idle GC" method is how it determines what it can erase -- if the OS didn't tell it what it's using, how does it know it can erase the page?)
>
> Anyway, how all this manifests itself performance-wise is intriguing. It's not speculation: there's hard evidence that not using TRIM results in SSD performance, bluntly put, sucking badly on some SSDs.
>
> There's this mentality that wear levelling completely solves all of the **performance** concerns -- that isn't the case at all. In fact, I'm under the impression it probably hurts performance, but it depends on how it's implemented within the drive firmware.
>
> bit-tech did an experiment using Windows 7 -- which supports and uses TRIM assuming the device advertises the capability -- with different models of SSDs. The testing procedure is documented here, but I'll document it as well:
>
> http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-performance-and-trim/4
>
> Again, remember, this is done on a Windows 7 system which does support TRIM if the device supports it. The testing steps, in this order:
>
> 1) SSD without TRIM support -- all LBAs are zeroed.
> 2) Took read/write benchmark readings.
> 3) SSD without TRIM support -- partitioned and formatted as NTFS (cluster size unknown), copied 100GB of data to the drive, deleted all the data, and repeated this method 10 times.
> 4) Step #2 repeated.
> 5) Upgraded SSD firmware to a version that supports TRIM.
> 6) SSD with TRIM support -- step #1 repeated.
> 7) Step #2 repeated.
> 8) SSD with TRIM support -- step #3 repeated.
> 9) Step #2 repeated.
>
> Without TRIM, some drives drop their read performance by more than 50%, and write performance by almost 70%. I'm focusing on Intel SSDs here, by the way. I do not care for OCZ or Corsair products.
>
> So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support TRIM, effectively the benchmarks shown pre-firmware-upgrade are what ZFS on FreeBSD will mimic (to some degree).
>
> Therefore, simply put, users should be concerned when using ZFS on FreeBSD with SSDs. It doesn't matter to me if you're only using 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM means degraded performance over time.
>
> Can you refute any of this evidence?
>

At least at the moment, no. But I can say that, depending on how large the use of SSDs was among OpenSolaris users before the Oracle reaping, I don't recall seeing any relevant bug reports on degradation. But like I said, I haven't seen them, which is not to say there wasn't a lack of use either. Definitely more to look into, test, benchmark and test again.

> > Given most SSDs come at a size greater than 32GB, I hope this comes as an early reminder that the ZIL you are buying that disk for is only going to be using a small percent of that disk, and I hope you justify cost over its actual use. If you do happen to justify creating a ZIL for your pool then I hope that you partition it wisely to make use of the rest of the space that is untouched.
> >
> > For all other cases I would recommend, if you still want to have a ZIL, that you take some sort of PCI->SD card or USB stick into account with mirroring.
>
> Others have pointed out this isn't effective (re: USB sticks). The read and write speeds are too slow, and limit the overall performance of ZFS in a very bad way. I can absolutely confirm this claim (I've tested it myself, using a high-end USB flash drive as a cache device (L2ARC)).
>
> Alexander Leidinger pointed out that using a USB stick for cache/L2ARC *does* improve performance on older systems which have slower disk I/O (e.g. ICH5-based systems).
>

Agreed. As soon as the bus speed and write speeds are greater than the speeds that USB 2.0 can handle, any USB-based solution is useless. ICH5 and up would be right about the time you would see this starting to happen.

With SD cards and CF cards, mileage may vary depending on the transfer rates. But the same situation still applies: like you said, once your main pool throughput outweighs the throughput of your ZIL, it's probably not worth even having a ZIL or a cache device. Emphasis on cache more so than ZIL.

Anyway, all good information for those who have to judge whether they need a cache or a ZIL.

Thanks again Jeremy. Always appreciated.

--
Regards, (jhell)
Jason Hellenthal
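
The rule of thumb quoted in this message (10 seconds of expected synchronous write throughput, never less than 64 MB) is simple arithmetic. A minimal sketch follows; the function name `slog_size_mb` and the example rates are made up for illustration, and the result is only as good as your estimate of the synchronous write rate.

```sh
#!/bin/sh
# slog_size_mb RATE_MB_PER_SEC [SECONDS]
# Prints a suggested log device size in MB: rate * seconds, floored at 64 MB.
# (Hypothetical helper; the 10 s window and 64 MB floor come from the thread.)
slog_size_mb() {
    rate=$1
    secs=${2:-10}
    size=$((rate * secs))
    if [ "$size" -lt 64 ]; then
        size=64
    fi
    echo "$size"
}

slog_size_mb 100    # prints 1000 (100 MB/s of sync writes for 10 s)
slog_size_mb 5      # prints 64 (50 MB would be below the minimum)
```

Even the larger figure is a small fraction of a 32GB SSD, which is the point being made above about partitioning the device sensibly.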
|
On Wed, May 11, 2011 at 09:48:48PM -0400, Jason Hellenthal wrote:
> >
> > So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support TRIM, effectively the benchmarks shown pre-firmware-upgrade are what ZFS on FreeBSD will mimic (to some degree).
> >
> > Therefore, simply put, users should be concerned when using ZFS on FreeBSD with SSDs. It doesn't matter to me if you're only using 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM means degraded performance over time.
> >
> > Can you refute any of this evidence?
> >
>
> At least at the moment, no. But I can say that, depending on how large the use of SSDs was among OpenSolaris users before the Oracle reaping, I don't recall seeing any relevant bug reports on degradation. But like I said, I haven't seen them, which is not to say there wasn't a lack of use either. Definitely more to look into, test, benchmark and test again.
>
> > > Given most SSDs come at a size greater than 32GB, I hope this comes as an early reminder that the ZIL you are buying that disk for is only going to be using a small percent of that disk, and I hope you justify cost over its actual use. If you do happen to justify creating a ZIL for your pool then I hope that you partition it wisely to make use of the rest of the space that is untouched.
> > >
> > > For all other cases I would recommend, if you still want to have a ZIL, that you take some sort of PCI->SD card or USB stick into account with mirroring.
> >
> > Others have pointed out this isn't effective (re: USB sticks). The read and write speeds are too slow, and limit the overall performance of ZFS in a very bad way. I can absolutely confirm this claim (I've tested it myself, using a high-end USB flash drive as a cache device (L2ARC)).
> >
> > Alexander Leidinger pointed out that using a USB stick for cache/L2ARC *does* improve performance on older systems which have slower disk I/O (e.g. ICH5-based systems).
> >
>
> Agreed. As soon as the bus speed and write speeds are greater than the speeds that USB 2.0 can handle, any USB-based solution is useless. ICH5 and up would be right about the time you would see this starting to happen.
>
> With SD cards and CF cards, mileage may vary depending on the transfer rates. But the same situation still applies: like you said, once your main pool throughput outweighs the throughput of your ZIL, it's probably not worth even having a ZIL or a cache device. Emphasis on cache more so than ZIL.
>
> Anyway, all good information for those who have to judge whether they need a cache or a ZIL.
>
> Thanks again Jeremy. Always appreciated.

You're welcome.

It's important to note that much of what I say is stuff I've learned and read (technical documentation usually) on my own -- which means I almost certainly misunderstand certain pieces of technology. There are a *lot* of people here who understand it much better than I do. (I'm looking at you, jhb@ ;-) )

As such, I probably should have CC'd pjd@ on this thread, since he's talked a bit about how to get ZFS on FreeBSD to work with TRIM, and when to issue the erasing of said blocks.

--
Jeremy Chadwick, [hidden email]
Parodius Networking, http://www.parodius.com/
UNIX Systems Administrator, Mountain View, CA, USA
Making life hard for others since 1977. PGP 4BD6C0CB
|
In reply to this post by Dan Carroll
On 11/05/2011 7:25 PM, Danny Carroll wrote:
> Hello all.
>
> I've been using ZFS for some time now and have never had an issue (except perhaps the issue of speed...). When v28 is taken into -STABLE I will most likely upgrade to v28 at that point. Currently I am running v15 with v4 on disk.
>
> When I move to v28 I will probably wish to enable an L2ARC and also perhaps dedicated log devices.
>
> I'm curious about a few things however.
>
> 1. Can I remove either the L2ARC or the log devices if things don't go as planned or if I need to free up some resources?
> 2. What are the best practices for setting up these? Would a geom mirror for the log device be the way to go? Or can you just let ZFS mirror the log itself?
> 3. What happens when one or both of the log devices fail? Does ZFS come to a crashing halt and kill all the data? Or does it simply complain that the ZIL is no longer active and continue on its merry way?
>
> In short, what is the best way to set up these two features?
>

Replying to myself in order to summarise the recommendations (when using v28):

- Don't use SSD for the log device. Write speed tends to be a problem.
- SSD is OK for cache if the sizing is right, but without TRIM, don't expect to take full advantage of the SSD.
- Do use two devices for log and mirror them with ZFS. Bad things *can* happen if *all* the log devices die.
- Don't colocate L2ARC and log devices.
- Log devices can be small; the ZFS Best Practices Guide specifies about 50% of RAM as the maximum. The minimum should be throughput * 10 seconds (1 GB for 100 MB/sec of writes).

Let me know if I got anything wrong or missed something important.

Remaining questions:

- Is there any advantage to using a spare partition on a SCSI or SATA drive as L2ARC? Assuming it was in the machine already but doing nothing?
- If I have 2 pools like this:

# zpool status
  pool: tank
 state: ONLINE
 scrub: scrub completed after 11h7m with 0 errors on Sun May  8 14:17:07 2011
config:

        NAME            STATE     READ WRITE CKSUM
        tank            ONLINE       0     0     0
          raidz1        ONLINE       0     0     0
            gpt/data0   ONLINE       0     0     0
            gpt/data1   ONLINE       0     0     0
            gpt/data2   ONLINE       0     0     0
            gpt/data3   ONLINE       0     0     0
            gpt/data4   ONLINE       0     0     0
            gpt/data5   ONLINE       0     0     0
          raidz1        ONLINE       0     0     0
            gpt/data6   ONLINE       0     0     0
            gpt/data7   ONLINE       0     0     0
            gpt/data8   ONLINE       0     0     0
            gpt/data9   ONLINE       0     0     0
            gpt/data10  ONLINE       0     0     0
            gpt/data11  ONLINE       0     0     0

errors: No known data errors

  pool: system
 state: ONLINE
 scrub: scrub completed after 1h1m with 0 errors on Sun May  8 15:18:23 2011
config:

        NAME             STATE     READ WRITE CKSUM
        system           ONLINE       0     0     0
          mirror         ONLINE       0     0     0
            gpt/system0  ONLINE       0     0     0
            gpt/system1  ONLINE       0     0     0

And I have free space on the "system" disks. Could I give two new partitions on the system disks to ZFS for the log devices of the "tank" pool?

If I were worried about performance of my "system" pool, I could also use spare partitions on (a couple of) the "tank" disks in a similar way. But it would be silly to use the same disk for ZIL and pool data. In that case, why would I bother to alter the default?

Thanks for the info!

-D
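
For the layout asked about here, letting ZFS mirror the log itself would look roughly like the following. This is a dry-run sketch that only prints the commands. The partition labels `gpt/log0`, `gpt/log1` and `gpt/cache0` are hypothetical (they do not exist in the `zpool status` output above and would have to be created first), and the helper function names are made up.

```sh
#!/bin/sh
# Dry run by default: prints the zpool commands instead of executing them.
# Set ZPOOL=zpool in the environment to run them for real (root required).
ZPOOL="${ZPOOL:-echo zpool}"

# Hypothetical GPT labels; substitute partitions that exist on your system.
POOL=tank
LOG_A=gpt/log0
LOG_B=gpt/log1
CACHE=gpt/cache0

add_mirrored_log() {
    # ZFS mirrors the two log partitions itself; no gmirror layer underneath.
    $ZPOOL add "$POOL" log mirror "$LOG_A" "$LOG_B"
}

add_cache() {
    # Cache (L2ARC) devices are added singly; they hold only copies of pool data.
    $ZPOOL add "$POOL" cache "$CACHE"
}

remove_cache() {
    # Cache devices can be taken out again without affecting the pool.
    $ZPOOL remove "$POOL" "$CACHE"
}

add_mirrored_log
add_cache
remove_cache
```

As far as I know, pool versions from 19 onwards also accept `zpool remove` for log devices, so on v28 the log should be removable as well; check `zpool upgrade -v` on your system before relying on that.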
|
In reply to this post by Jeremy Chadwick
On Wed, 11 May 2011, Jeremy Chadwick wrote:
>
> The "garbage collection" works when the SSD is idle. I have no idea what "idle" actually means operationally, because again, vendors don't disclose what the idle intervals are. 5 minutes? 24 hours? It matters, but they don't tell us. (What confuses me about the "idle GC" method is how it determines what it can erase -- if the OS didn't tell it what it's using, how does it know it can erase the page?)

Garbage collection is not necessarily just when the drive is idle. Regardless, if one "overwrites" a page (or part of a page), the drive can implement that by reading any non-overlapped existing content (which it already has to do), allocating a fresh (already erased) page, and then writing the composite to that new page. The "overwritten" page is then scheduled for erasure. This sort of garbage collector works by over-provisioning the actual amount of flash in the drive, which should be done anyway in a quality product.

This simple recirculating/COW algorithm is a reason why TRIM is not really needed given sufficiently intelligent SSD design.

> Therefore, simply put, users should be concerned when using ZFS on FreeBSD with SSDs. It doesn't matter to me if you're only using 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM means degraded performance over time.

This seems unduly harsh. Even with TRIM, SSDs will suffer in continually write-heavy (e.g. server) environments. The reason is that the blocks still need to be erased and the erasure performance is limited. It is not uncommon for servers to be run close to their limits most of the time.

One should not be ashamed of purchasing a larger SSD than the space consumption appears to warrant.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
|
In reply to this post by Dan Carroll
On Thu, 12 May 2011, Danny Carroll wrote:
>
> Replying to myself in order to summarise the recommendations (when using v28):
> - Don't use SSD for the Log device. Write speed tends to be a problem.

DO use SSD for the log device. The log device is only used for synchronous writes. Except for certain usages (e.g. database and NFS server) most writes will be asynchronous and never be written to the log. Huge synchronous writes will also bypass the SSD log device. The log device is for reducing latency on small synchronous writes.

> - Is there any advantage to using a spare partition on a SCSI or SATA drive as L2Arc? Assuming it was in the machine already but doing nothing?

The L2ARC is intended to reduce read latency and is randomly accessed. It is unlikely that rotating media will work well for that.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
|
On Wed, May 11, 2011 at 09:51:58PM -0500, Bob Friesenhahn wrote:
> On Thu, 12 May 2011, Danny Carroll wrote:
> >
> > Replying to myself in order to summarise the recommendations (when using v28):
> > - Don't use SSD for the Log device. Write speed tends to be a problem.
>
> DO use SSD for the log device. The log device is only used for synchronous writes. Except for certain usages (e.g. database and NFS server) most writes will be asynchronous and never be written to the log. Huge synchronous writes will also bypass the SSD log device. The log device is for reducing latency on small synchronous writes.

Bob, please correct me if I'm wrong, but as I understand it a log device (ZIL) effectively limits the overall write speed of the pool itself.

Consumer-level SSDs do not have extremely high write performance (and it gets worse without TRIM; again a 70% decrease in write speed in some cases).

I imagine a very high-end SSD (FusionIO, etc. -- the things that cost $900 and higher) would have extremely high write performance and would work perfectly for this role. Or a battery-backed DDR RAM device.

What's amusing (to me anyway) is that when ZFS was originally presented, folks from Sun kept focusing on how "you can buy cheap, generic disks and accomplish goals!" yet if the above statement of mine is accurate, that goes against the original principle.

Danny might also find this URL useful:

http://constantin.glez.de/blog/2011/02/frequently-asked-questions-about-flash-memory-ssds-and-zfs

> > - Is there any advantage to using a spare partition on a SCSI or SATA drive as L2Arc? Assuming it was in the machine already but doing nothing?
>
> The L2ARC is intended to reduce read latency and is randomly accessed. It is unlikely that rotating media will work well for that.

Agreed -- this is why I tell folks that an SSD would work very well for L2ARC, but my opinion is just to buy more RAM for the ARC ("layer 1").

--
Jeremy Chadwick, [hidden email]
Parodius Networking, http://www.parodius.com/
UNIX Systems Administrator, Mountain View, CA, USA
Making life hard for others since 1977. PGP 4BD6C0CB
|
On Wed, May 11, 2011 at 8:36 PM, Jeremy Chadwick <[hidden email]> wrote:
> On Wed, May 11, 2011 at 09:51:58PM -0500, Bob Friesenhahn wrote:
> > On Thu, 12 May 2011, Danny Carroll wrote:
> > >
> > > Replying to myself in order to summarise the recommendations (when using v28):
> > > - Don't use SSD for the Log device. Write speed tends to be a problem.
> >
> > DO use SSD for the log device. The log device is only used for synchronous writes. Except for certain usages (e.g. database and NFS server) most writes will be asynchronous and never be written to the log. Huge synchronous writes will also bypass the SSD log device. The log device is for reducing latency on small synchronous writes.
>
> Bob, please correct me if I'm wrong, but as I understand it a log device (ZIL) effectively limits the overall write speed of the pool itself.

Nope. Using a separate log device removes sync writes from the I/O path of the rest of the pool, thus increasing the total write throughput for the pool.

> Danny might also find this URL useful:
>
> http://constantin.glez.de/blog/2011/02/frequently-asked-questions-about-flash-memory-ssds-and-zfs

Read the linked articles. For example:

http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-zil-explained

Most sync writes go to the ZIL. If the ZIL is part of the pool, then the pool has to issue two separate writes (once to the ZIL, then later to the pool as part of the normal async txg). If the ZIL is a separate device, then there's no write contention with the rest of the pool.

Not every sync write goes to the ZIL. Only writes under a certain size (64 KB or something like that).

Every OpenSolaris, Oracle Solaris and Nexenta admin will recommend getting an enterprise-grade, write-optimised, SLC-based SSD (preferably with a supercap) for use as the SLOG device. Especially if you're using ZFS for anything database-related, or serving files over NFS, everyone says the same: get an SSD for SLOG usage.

Why would it be any different for ZFS on FreeBSD?

There are plenty of benchmarks online and in the zfs-discuss mailing list that show the benefits of using an SSD-based SLOG.

--
Freddie Cash
[hidden email]
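
The write paths described in this message can be summarised as a small decision function. This is only a toy model of the description above, not ZFS code: the function name `write_path` is made up, and the 64 KB cutoff is the approximate figure quoted in the message ("64 KB or something like that"), not a verified ZFS constant.

```sh
#!/bin/sh
# write_path MODE SIZE_KB HAS_SLOG
#   MODE      sync | async
#   SIZE_KB   size of the write in KB
#   HAS_SLOG  yes | no (does the pool have a separate log device?)
# Prints where the write lands first, following the thread's description.
write_path() {
    mode=$1
    size_kb=$2
    has_slog=$3
    cutoff_kb=64    # assumption: approximate figure taken from the thread

    if [ "$mode" != "sync" ]; then
        echo "txg"            # async: written with the next transaction group
    elif [ "$size_kb" -ge "$cutoff_kb" ]; then
        echo "pool"           # large sync write: bypasses the log device
    elif [ "$has_slog" = "yes" ]; then
        echo "slog"           # small sync write: lands on the separate log
    else
        echo "in-pool-zil"    # small sync write, no SLOG: ZIL blocks in the pool
    fi
}

write_path sync 4 yes     # prints slog
write_path sync 4 no      # prints in-pool-zil
write_path sync 128 yes   # prints pool
write_path async 4 yes    # prints txg
```

Only the `in-pool-zil` case competes with normal pool I/O for small synchronous writes, which is the contention a separate log device is meant to remove.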
|
In reply to this post by Jeremy Chadwick
On 12.05.11 06:36, Jeremy Chadwick wrote:
> On Wed, May 11, 2011 at 09:51:58PM -0500, Bob Friesenhahn wrote:
> > On Thu, 12 May 2011, Danny Carroll wrote:
> > > Replying to myself in order to summarise the recommendations (when using v28):
> > > - Don't use SSD for the Log device. Write speed tends to be a problem.
> > DO use SSD for the log device. The log device is only used for synchronous writes. Except for certain usages (e.g. database and NFS server) most writes will be asynchronous and never be written to the log. Huge synchronous writes will also bypass the SSD log device. The log device is for reducing latency on small synchronous writes.
> Bob, please correct me if I'm wrong, but as I understand it a log device (ZIL) effectively limits the overall write speed of the pool itself.
>

[...] using SSD for the SLOG. You can of course create a usage/benchmark scenario where a (cheap) SSD-based SLOG will be worse than a (fast) HDD-based SLOG, especially if you are not concerned about latency. The SLOG resolves two issues: it increases the pool throughput (primary storage) by removing small synchronous writes from it, which would otherwise introduce unnecessary head movement and more IOPS, and it provides low latency for small synchronous writes.

The latter is only valid if the SSD is sufficiently write-optimized. Most consumer SSDs end up saturated by writes. Sequential write IOPS is what matters here.

About TRIM. As was already mentioned, you will use only a small portion of a (for example) 32GB SSD for the SLOG. If you do not allocate the entire SSD, then wear leveling will be able to play well and it is very likely you will not suffer any performance degradation.

By the way, I do not believe the Windows benchmark has any significance for our ZFS usage of SSDs. How is TRIM implemented in Windows? How does it relate to SSD usage as SLOG and L2ARC? How can TRIM support ever influence reading from the drive?!

TRIM is a slow operation. How often are these issued? What is the impact of issuing TRIM to an otherwise loaded SSD?

Daniel
|
In reply to this post by Dan Carroll
On 12.05.11 05:26, Danny Carroll wrote:
>
> - Don't use SSD for the Log device. Write speed tends to be a problem.

It all depends on your usage. You need to experiment, unfortunately.

> - SSD ok for cache if the sizing is right, but without TRIM, don't expect to take full advantage of the SSD.

I do not believe TRIM has any effect on L2ARC. Why?

- TRIM is a technique to optimize future writes.
- L2ARC is written at a controlled, very low rate, I believe something like 8MB/sec. There is no SSD currently on the market, with or without TRIM, that has any trouble sustaining that rate.
- TRIM might introduce delays; it is a very 'expensive' command. But that will surely vary by drive/manufacturer.
- There is no way TRIM can influence reading from the flash media. Reading from L2ARC with low latency and high speed is its main purpose anyway.

> Remaining questions.
> - Is there any advantage to using a spare partition on a SCSI or SATA drive as L2Arc? Assuming it was in the machine already but doing nothing?

Absolutely no advantage. You want L2ARC to be very low latency and high-bandwidth for random reading. Especially low-latency. This does not apply to rotating disks.

Daniel
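
To put the "something like 8MB/sec" figure in perspective, here is a back-of-the-envelope sketch of how long a cache device takes to fill at a fixed feed rate. The function name `l2arc_fill_seconds` is made up, the 8 MB/s default is the rough number quoted above rather than a measured value, and real fill time also depends on what the ARC actually evicts.

```sh
#!/bin/sh
# l2arc_fill_seconds SIZE_GB [RATE_MB_PER_SEC]
# Prints the time in whole seconds needed to write SIZE_GB at the given rate.
# (Hypothetical helper; the default rate is the thread's rough estimate.)
l2arc_fill_seconds() {
    size_gb=$1
    rate=${2:-8}
    echo $((size_gb * 1024 / rate))
}

l2arc_fill_seconds 32     # prints 4096  (a bit over an hour)
l2arc_fill_seconds 120    # prints 15360 (a bit over four hours)
```

Since the cache starts out empty after every reboot, as noted earlier in the thread, this is also roughly the minimum warm-up time before a freshly booted system can benefit from a full cache device.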
|
On 12 May 2011, at 08:44, Daniel Kalchev wrote:
> On 12.05.11 05:26, Danny Carroll wrote:
> >
> > - Don't use SSD for the Log device. Write speed tends to be a problem.
> It all depends on your usage. You need to experiment, unfortunately.

What is the alternative for log devices if you are not using SSD? Rotating hard drives?

AFAIK, two factors define the speed of a log device: write transfer rate and write latency. You will not find a rotating hard drive with a write latency anywhere near that of even the slowest SSD you can find on the market. On the other hand, only a very few rotating hard drives have a write transfer rate that can be compared to that of SSDs.
