|
Hi,
I've bought an OCZ Vertex2 E (120 GB SSD) and installed FreeBSD i386 stable/8 on it, using UFS (UFS2, to be exact). I've made sure that the partitions are aligned properly, and used newfs with 4k fragsize and 32k blocksize. It works very well so far. Now, the only thing missing would be TRIM support for UFS. I did some research and found the following patch: http://people.freebsd.org/~pjd/patches/ufsdel.patch Apparently that patch is supposed to make UFS use TRIM, among other things. However, the patch seems to be rather old (time stamp says 2007-02-03), and nothing has been comitted to current or stable so far (I've grepped through the past 12 months of SVN logs, after TRIM support was added to ada(4) by mav@). So, my question is, are there plans to add TRIM support to UFS? Is anyone working on it? Or is it already there and I just overlooked it? Best regards Oliver -- Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung: secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün- chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd "A language that doesn't have everything is actually easier to program in than some that do." -- Dennis M. Ritchie _______________________________________________ [hidden email] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "[hidden email]" |
|
On Tue, Dec 07, 2010 at 04:31:14PM +0100, Oliver Fromme wrote:
> Hi, > > I've bought an OCZ Vertex2 E (120 GB SSD) and installed > FreeBSD i386 stable/8 on it, using UFS (UFS2, to be exact). > I've made sure that the partitions are aligned properly, > and used newfs with 4k fragsize and 32k blocksize. > It works very well so far. > > Now, the only thing missing would be TRIM support for UFS. > I did some research and found the following patch: > > http://people.freebsd.org/~pjd/patches/ufsdel.patch > > Apparently that patch is supposed to make UFS use TRIM, > among other things. However, the patch seems to be rather > old (time stamp says 2007-02-03), and nothing has been > comitted to current or stable so far (I've grepped through > the past 12 months of SVN logs, after TRIM support was > added to ada(4) by mav@). > > So, my question is, are there plans to add TRIM support > to UFS? Is anyone working on it? Or is it already there > and I just overlooked it? on file remove space can be returned to the system. I think you should ask Kirk what to do about that, but I'm afraid my patch can break SU - what if we TRIM, but then panic and on fsck decide to actually use the block? BTW. Have you actually observed any performance degradation without TRIM? I've similar SSDs and from what I tested it somehow can handle wear leveling internally. You can to TRIM entire disk using this simple program below, newfs it and test it. Then fill it with random data, newfs it again, test it and compare results. === Makefile: PROG= trim LDADD= -lgeom NO_MAN= WARNS?= 6 .include <bsd.prog.mk> === trim.c: #include <err.h> #include <stdlib.h> #include <libgeom.h> int main(int argc, char *argv[]) { off_t mediasize; int fd; if (argc != 2) errx(1, "usage: %s disk", getprogname()); if ((fd = g_open(argv[1], 1)) < 0) err(2, "g_open(%s)", argv[1]); if ((mediasize = g_mediasize(fd)) < 0) err(3, "g_mediasize()"); if (g_delete(fd, 0, mediasize) < 0) err(4, "g_delete()"); g_close(fd); exit(0); } -- Pawel Jakub Dawidek http://www.wheelsystems.com [hidden email] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! |
|
> BTW. Have you actually observed any performance degradation without > TRIM? I've similar SSDs and from what I tested it somehow can handle > wear leveling internally. You can to TRIM entire disk using this simple > program below, newfs it and test it. Then fill it with random data, > newfs it again, test it and compare results. Yes. I've seen performance degradation. It's not clear that TRIM is the most useful in all cases, partly because it's generally a very slow command and it's not quite clear how well the wear levelling algorithms incorporate TRIM changes. One thing is certainly true is that if SSD drives are for caching only, a SECURE ERASE at each 'mount' time along with the appropriate newfs helps substantially. _______________________________________________ [hidden email] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "[hidden email]" |
|
In reply to this post by Pawel Jakub Dawidek
Pawel Jakub Dawidek wrote:
> On Tue, Dec 07, 2010 at 04:31:14PM +0100, Oliver Fromme wrote: > > I've bought an OCZ Vertex2 E (120 GB SSD) and installed > > FreeBSD i386 stable/8 on it, using UFS (UFS2, to be exact). > > I've made sure that the partitions are aligned properly, > > and used newfs with 4k fragsize and 32k blocksize. > > It works very well so far. (I should also mention that I mounted all filesystems from the SSD with the "noatime" option, to reduce writes during normal operation.) > > So, my question is, are there plans to add TRIM support > > to UFS? Is anyone working on it? Or is it already there > > and I just overlooked it? > > I hacked up this patch mostly for Kris and md(4) memory-backed UFS, so > on file remove space can be returned to the system. I see. > I think you should ask Kirk what to do about that, but I'm afraid my > patch can break SU - what if we TRIM, but then panic and on fsck decide > to actually use the block? Oh, you're right. That could be a problem. Maybe it would be better to write a separate tool that performs TRIM commands on areas of the file system that are unused for a while. I also remember that mav@ wrote that the TRIM command is very slow. So, it's probably not feasible to execute it each time some blocks are freed, because it would make the file system much slower and nullify all advantages of the SSD. Just found his comment from r201139: "I have no idea whether it is normal, but for some reason it takes 200ms to handle any TRIM command on this drive, that was making delete extremely slow. But TRIM command is able to accept long list of LBAs and the length of that list seems doesn't affect it's execution time. Implemented request clusting algorithm allowed me to rise delete rate up to reasonable numbers, when many parallel DELETE requests running." > BTW. Have you actually observed any performance degradation without > TRIM? Not yet. My SSD is still very new. It carries only the base system (/home is on a normal 1TB disk), so not many writes happened so far. But as soon as I start doing more write access (buildworld + installworld, updating ports and so on), I expect that performance will degrade over time. I've also heard from several people on various mailing lists that the performance of their SSD drives got worse after some time. That performance degradation is caused by so-called "static wear leveling". The drive will have to move the contents of blocks that are never (or rarely) written to to other blocks, so they can be overwritten, in order to distribute wear equally over all blocks. If a block is known to be unused (which is the case when the drive is new, or after a TRIM command), the contents don't have to be moved, so the write operation is much faster. I think all modern SSD drives use static wear leveling. Without TRIM support in the file system, a work-around is to "newfs -E" the file system when the performance gets too bad. This requires a backup-restore cycle, of course, so it's a somewhat annoying. Another work-around is to leave some space unused, i.e. don't use 20% at the end of the SSD for any file systems, for example. Since those 20% are never written to, they are known to be unused to the SSD's firmware, so it can use them for wear leveling. This will postpone the performance degradation somewhat, but it won't completely avoid it, ultimately. And wasting some space is not a very satisfying solution either. > I've similar SSDs and from what I tested it somehow can handle > wear leveling internally. You can to TRIM entire disk using this simple > program below, newfs it and test it. It does basically the same as "newfs -E", right? > Then fill it with random data, newfs it again, test it and compare > results. Filling it just once will probably not have much of an effect. In fact, wear leveling will probably not kick in if you just fill the whole disk, because all blocks are used equally anyway. The performance degradation will only start to occur after a while (weeks or months) when some blocks are written much more often than others. In this situation, (static) wear leveling will kick in and start moving data in order to re-use seldom-written-to blocks. Best regards Oliver -- Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung: secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün- chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd "C is quirky, flawed, and an enormous success." -- Dennis M. Ritchie. _______________________________________________ [hidden email] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "[hidden email]" |
|
kirk does have some trim patches
which he sent to me once.. let me look... hmmm ah here it is.. this may or may not be out of date. I'll let Kirk chime in if he thinks it 's worth it.. I include the email from him as an attachment, hopefully it wont get stripped by the list, but you should both get it.. julian On 12/8/10 8:58 AM, Oliver Fromme wrote: > Pawel Jakub Dawidek wrote: > > On Tue, Dec 07, 2010 at 04:31:14PM +0100, Oliver Fromme wrote: > > > I've bought an OCZ Vertex2 E (120 GB SSD) and installed > > > FreeBSD i386 stable/8 on it, using UFS (UFS2, to be exact). > > > I've made sure that the partitions are aligned properly, > > > and used newfs with 4k fragsize and 32k blocksize. > > > It works very well so far. > > (I should also mention that I mounted all filesystems from > the SSD with the "noatime" option, to reduce writes during > normal operation.) > > > > So, my question is, are there plans to add TRIM support > > > to UFS? Is anyone working on it? Or is it already there > > > and I just overlooked it? > > > > I hacked up this patch mostly for Kris and md(4) memory-backed UFS, so > > on file remove space can be returned to the system. > > I see. > > > I think you should ask Kirk what to do about that, but I'm afraid my > > patch can break SU - what if we TRIM, but then panic and on fsck decide > > to actually use the block? > > Oh, you're right. That could be a problem. > > Maybe it would be better to write a separate tool that > performs TRIM commands on areas of the file system that > are unused for a while. > > I also remember that mav@ wrote that the TRIM command is > very slow. So, it's probably not feasible to execute it > each time some blocks are freed, because it would make the > file system much slower and nullify all advantages of the > SSD. > > Just found his comment from r201139: > "I have no idea whether it is normal, but for some reason it takes 200ms > to handle any TRIM command on this drive, that was making delete extremely > slow. But TRIM command is able to accept long list of LBAs and the length of > that list seems doesn't affect it's execution time. Implemented request > clusting algorithm allowed me to rise delete rate up to reasonable numbers, > when many parallel DELETE requests running." > > > BTW. Have you actually observed any performance degradation without > > TRIM? > > Not yet. My SSD is still very new. It carries only the > base system (/home is on a normal 1TB disk), so not many > writes happened so far. But as soon as I start doing more > write access (buildworld + installworld, updating ports > and so on), I expect that performance will degrade over > time. > > I've also heard from several people on various mailing lists > that the performance of their SSD drives got worse after > some time. > > That performance degradation is caused by so-called "static > wear leveling". The drive will have to move the contents > of blocks that are never (or rarely) written to to other > blocks, so they can be overwritten, in order to distribute > wear equally over all blocks. If a block is known to be > unused (which is the case when the drive is new, or after > a TRIM command), the contents don't have to be moved, so > the write operation is much faster. I think all modern > SSD drives use static wear leveling. > > Without TRIM support in the file system, a work-around is > to "newfs -E" the file system when the performance gets > too bad. This requires a backup-restore cycle, of course, > so it's a somewhat annoying. > > Another work-around is to leave some space unused, i.e. > don't use 20% at the end of the SSD for any file systems, > for example. Since those 20% are never written to, they > are known to be unused to the SSD's firmware, so it can > use them for wear leveling. This will postpone the > performance degradation somewhat, but it won't completely > avoid it, ultimately. And wasting some space is not a > very satisfying solution either. > > > I've similar SSDs and from what I tested it somehow can handle > > wear leveling internally. You can to TRIM entire disk using this simple > > program below, newfs it and test it. > > It does basically the same as "newfs -E", right? > > > Then fill it with random data, newfs it again, test it and compare > > results. > > Filling it just once will probably not have much of an > effect. In fact, wear leveling will probably not kick > in if you just fill the whole disk, because all blocks > are used equally anyway. > > The performance degradation will only start to occur > after a while (weeks or months) when some blocks are > written much more often than others. In this situation, > (static) wear leveling will kick in and start moving > data in order to re-use seldom-written-to blocks. > > Best regards > Oliver > > Date: Tue, 03 Nov 2009 14:03:14 -0800
Enclosed is my proposed patch that I sent to Poul_henning Kamp.
> From: Julian Elischer <[hidden email]> > To: Kirk McKusick <[hidden email]> > CC: Scott Long <[hidden email]> > Subject: UFS2 and TRIM command > > Kirk, > > You mentioned at BSDCan that you had done some work with the 'trim' > command to tell a drive that a filesystem was no longer intersted > in a particular block. > > can you let us know what the state of that work was and whether you > have done anything more on it? > > thanks, > > Julian He was working with flash media at the time and wanted notification when blocks were released so he could pre-clear them so they could be used directly when next allocated. i never heard back from him and consequently never followed up on it. ~Kirk =-=-=-= From: Kirk McKusick <[hidden email]> Date: Wed, 21 May 2008 13:19:18 -0700 To: Poul-Henning Kamp <[hidden email]> Subject: UFS and BIO_DELETE X-URL: http://WWW.McKusick.COM/ Reply-To: Kirk McKusick <[hidden email]> I enclose below my proposed patch to add BIO_DELETE to UFS (goes at the end of ffs_blkfree). As I have no way to test it, I am wondering if you could let me know if it works. Also, I am thinking of only enabling it for filesystems mounted with a new flag requesting the behavior since the geteblk is a rather expensive call for the usual no-op case. I did look at just allocating a `struct bio' as a local variable and using that, but it looked like I needed to also come up with a `producer' and/or `consumer' if I wanted to pass it to GEOM directly, so in the end I went with this more expensive solution. If there is an easy way to just pass a bio structure to GEOM, I would much prefer that approach. ~Kirk *** ffs_alloc.c Wed May 21 20:11:04 2008 --- ffs_alloc.c.new Wed May 21 20:10:50 2008 *************** *** 1945,1950 **** --- 1945,1962 ---- ACTIVECLEAR(fs, cg); UFS_UNLOCK(ump); bdwrite(bp); + /* + * Request that the block be cleared. + */ + bp = geteblk(size); + bp->b_iocmd = BIO_DELETE; + bp->b_vp = devvp; + bp->b_blkno = fsbtodb(fs, bno); + bp->b_offset = dbtob(bp->b_blkno); + bp->b_iooffset = bp->b_offset; + bp->b_bcount = size; + BUF_KERNPROC(bp); + BO_STRATEGY(&devvp->v_bufobj, bp); } #ifdef INVARIANTS _______________________________________________ [hidden email] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "[hidden email]" |
|
In reply to this post by Oliver Fromme
Perhaps a better solution here is a different filesystem? Since seek
time is not an issue for SSDs, establishing large read-only errors with persistent large write journals (a la sprite) in one (large block) locale would be a better fit. _______________________________________________ [hidden email] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "[hidden email]" |
|
In reply to this post by Julian Elischer-5
On Wed, Dec 08, 2010 at 09:07:56AM -0800, Julian Elischer wrote:
> > From: Kirk McKusick <[hidden email]> > Date: Wed, 21 May 2008 13:19:18 -0700 > To: Poul-Henning Kamp <[hidden email]> > Subject: UFS and BIO_DELETE > X-URL: http://WWW.McKusick.COM/ > Reply-To: Kirk McKusick <[hidden email]> > > I enclose below my proposed patch to add BIO_DELETE to UFS > (goes at the end of ffs_blkfree). As I have no way to test > it, I am wondering if you could let me know if it works. > > Also, I am thinking of only enabling it for filesystems mounted > with a new flag requesting the behavior since the geteblk is a > rather expensive call for the usual no-op case. > > I did look at just allocating a `struct bio' as a local variable > and using that, but it looked like I needed to also come up with a > `producer' and/or `consumer' if I wanted to pass it to GEOM directly, > so in the end I went with this more expensive solution. If there > is an easy way to just pass a bio structure to GEOM, I would much > prefer that approach. > > ~Kirk > > > *** ffs_alloc.c Wed May 21 20:11:04 2008 > --- ffs_alloc.c.new Wed May 21 20:10:50 2008 > *************** > *** 1945,1950 **** > --- 1945,1962 ---- > ACTIVECLEAR(fs, cg); > UFS_UNLOCK(ump); > bdwrite(bp); > + /* > + * Request that the block be cleared. > + */ > + bp = geteblk(size); > + bp->b_iocmd = BIO_DELETE; > + bp->b_vp = devvp; > + bp->b_blkno = fsbtodb(fs, bno); > + bp->b_offset = dbtob(bp->b_blkno); > + bp->b_iooffset = bp->b_offset; > + bp->b_bcount = size; > + BUF_KERNPROC(bp); > + BO_STRATEGY(&devvp->v_bufobj, bp); > } > > #ifdef INVARIANTS > used to start BIO_DELETE command without fake bufer allocation. g_io_strategy() may be used as the sample. But, I have a question about the patch. The bitmap block in ffs_blkfree() is handled with delayed write, by bdwrite() call. I think that it is quite possible for BIO_DELETE to be executed before the bitmap block is written and on-disk bit is set in bitmap. Would not this allow for the blocks to be deleted before the bitmap is updated, and potentially before on-disk pointers are cleared that points to the blocks ? SU just rolls back the unfinished dependencies on write, but BIO_DELETE cannot. Also, shouldn't the BIO_DELETE only be issued for real devices, and not the snapshots ? diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c index b740bbb..4ff57ae 100644 --- a/sys/ufs/ffs/ffs_alloc.c +++ b/sys/ufs/ffs/ffs_alloc.c @@ -83,6 +83,8 @@ __FBSDID("$FreeBSD$"); #include <security/audit/audit.h> +#include <geom/geom.h> + #include <ufs/ufs/dir.h> #include <ufs/ufs/extattr.h> #include <ufs/ufs/quota.h> @@ -1850,6 +1852,7 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd) u_int cg; u_int8_t *blksfree; struct cdev *dev; + struct bio *bip; cg = dtog(fs, bno); if (devvp->v_type == VREG) { @@ -1962,6 +1965,16 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd) softdep_setup_blkfree(UFSTOVFS(ump), bp, bno, numfrags(fs, size), dephd); bdwrite(bp); + + if (devvp->v_type != VREG) { + bip = g_alloc_bio(); + bip->bio_cmd = BIO_DELETE; + bip->bio_offset = dbtob(fsbtodb(fs, bno)); + bip->bio_done = g_destroy_bio; + bip->bio_length = size; + g_io_request(bip, + (struct g_consumer *)devvp->v_bufobj.bo_private); + } } #ifdef INVARIANTS |
|
> Date: Wed, 8 Dec 2010 23:49:09 +0200
> From: Kostik Belousov <[hidden email]> > To: Julian Elischer <[hidden email]> > Cc: [hidden email], [hidden email], > Oliver Fromme <[hidden email]>, > Kirk McKusick <[hidden email]> > Subject: Re: TRIM support for UFS? > > On Wed, Dec 08, 2010 at 09:07:56AM -0800, Julian Elischer wrote: > > > > From: Kirk McKusick <[hidden email]> > > Date: Wed, 21 May 2008 13:19:18 -0700 > > To: Poul-Henning Kamp <[hidden email]> > > Subject: UFS and BIO_DELETE > > X-URL: http://WWW.McKusick.COM/ > > Reply-To: Kirk McKusick <[hidden email]> > > > > I enclose below my proposed patch to add BIO_DELETE to UFS > > (goes at the end of ffs_blkfree). As I have no way to test > > it, I am wondering if you could let me know if it works. > > > > Also, I am thinking of only enabling it for filesystems mounted > > with a new flag requesting the behavior since the geteblk is a > > rather expensive call for the usual no-op case. > > > > I did look at just allocating a `struct bio' as a local variable > > and using that, but it looked like I needed to also come up with a > > `producer' and/or `consumer' if I wanted to pass it to GEOM directly, > > so in the end I went with this more expensive solution. If there > > is an easy way to just pass a bio structure to GEOM, I would much > > prefer that approach. > > > > ~Kirk > > > > > > *** ffs_alloc.c Wed May 21 20:11:04 2008 > > --- ffs_alloc.c.new Wed May 21 20:10:50 2008 > > *************** > > *** 1945,1950 **** > > --- 1945,1962 ---- > > ACTIVECLEAR(fs, cg); > > UFS_UNLOCK(ump); > > bdwrite(bp); > > + /* > > + * Request that the block be cleared. > > + */ > > + bp = geteblk(size); > > + bp->b_iocmd = BIO_DELETE; > > + bp->b_vp = devvp; > > + bp->b_blkno = fsbtodb(fs, bno); > > + bp->b_offset = dbtob(bp->b_blkno); > > + bp->b_iooffset = bp->b_offset; > > + bp->b_bcount = size; > > + BUF_KERNPROC(bp); > > + BO_STRATEGY(&devvp->v_bufobj, bp); > > } > > > > #ifdef INVARIANTS > > > The UFS devvp contains the pointer to struct g_consumer in > devpp->v_bufobj->bo_private, so g_alloc_bio() and g_io_request() can be > used to start BIO_DELETE command without fake bufer allocation. > g_io_strategy() may be used as the sample. Thanks for that update to moderize the patch! > But, I have a question about the patch. The bitmap block in ffs_blkfree() > is handled with delayed write, by bdwrite() call. I think that it is > quite possible for BIO_DELETE to be executed before the bitmap block > is written and on-disk bit is set in bitmap. Would not this allow > for the blocks to be deleted before the bitmap is updated, > and potentially before on-disk pointers are cleared that points to > the blocks ? SU just rolls back the unfinished dependencies on write, > but BIO_DELETE cannot. It is safe to do the BIO_DELETE here because at this point we have progressed through the file delete to the point where the inode that claimed this block has been updated on the disk with a NULL pointer. The only step that remains is to update the on-disk bitmap to reflect that the block is available. If the bitmap write is not done before a system crash the only action that will be taken with respect to the freed block is that either background fsck will restore it to the bitmap (SU) or the journal will restore it to the bitmap (SUJ). There is no case where either SU or SUJ will put it back into the file once we have reached this point. > Also, shouldn't the BIO_DELETE only be issued for real devices, > and not the snapshots ? They should also be issued for snapshots. The only time that snapshots give up blocks is when they are deleted. The blocks that they give up at that time are all copies of blocks that were later changed in the filesystem. With the snapshot gone there will be no remaining references to those blocks and they will be made available for reallocation. As such, it would be helpful if they had been TRIMMED. > diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c > index b740bbb..4ff57ae 100644 > --- a/sys/ufs/ffs/ffs_alloc.c > +++ b/sys/ufs/ffs/ffs_alloc.c > @@ -83,6 +83,8 @@ __FBSDID("$FreeBSD$"); > > #include <security/audit/audit.h> > > +#include <geom/geom.h> > + > #include <ufs/ufs/dir.h> > #include <ufs/ufs/extattr.h> > #include <ufs/ufs/quota.h> > @@ -1850,6 +1852,7 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd) > u_int cg; > u_int8_t *blksfree; > struct cdev *dev; > + struct bio *bip; > > cg = dtog(fs, bno); > if (devvp->v_type == VREG) { > @@ -1962,6 +1965,16 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd) > softdep_setup_blkfree(UFSTOVFS(ump), bp, bno, > numfrags(fs, size), dephd); > bdwrite(bp); > + > + if (devvp->v_type != VREG) { > + bip = g_alloc_bio(); > + bip->bio_cmd = BIO_DELETE; > + bip->bio_offset = dbtob(fsbtodb(fs, bno)); > + bip->bio_done = g_destroy_bio; > + bip->bio_length = size; > + g_io_request(bip, > + (struct g_consumer *)devvp->v_bufobj.bo_private); > + } > } > > #ifdef INVARIANTS The above patch looks good though I would do it unconditionally (e.g., also for snapshots). It seems sensible to make it conditional on a sysctl variable so that folks could experiment with it more easily. And I would leave it off by default as non-SSD disks are unlikely to benefit from it. If it does prove useful for SSD disks then I would make it conditional on a filesystem flag so that it is possible to enable it on a filesystem-by-filesystem basis (e.g., on for filesystems on the SSD and off for the rest). Kirk McKusick _______________________________________________ [hidden email] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "[hidden email]" |
|
On Wed, Dec 08, 2010 at 02:23:52PM -0800, Kirk McKusick wrote:
> The above patch looks good though I would do it unconditionally > (e.g., also for snapshots). It seems sensible to make it conditional > on a sysctl variable so that folks could experiment with it more > easily. And I would leave it off by default as non-SSD disks are > unlikely to benefit from it. If it does prove useful for SSD disks > then I would make it conditional on a filesystem flag so that it > is possible to enable it on a filesystem-by-filesystem basis (e.g., > on for filesystems on the SSD and off for the rest). We already have a flag for disks: DISKFLAG_CANDELETE, which tells if the disk support TRIM or not. Next we should add BIO_GETATTR attribute for DISK class to return true if TRIM is supported. This way UFS can ask if TRIM is supported on mount and don't bother sending BIO_DELETE if it is not supported. -- Pawel Jakub Dawidek http://www.wheelsystems.com [hidden email] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! |
|
In reply to this post by Kirk McKusick-2
On Wed, Dec 08, 2010 at 02:23:52PM -0800, Kirk McKusick wrote:
> > Date: Wed, 8 Dec 2010 23:49:09 +0200 > > From: Kostik Belousov <[hidden email]> > > To: Julian Elischer <[hidden email]> > > Cc: [hidden email], [hidden email], > > Oliver Fromme <[hidden email]>, > > Kirk McKusick <[hidden email]> > > Subject: Re: TRIM support for UFS? > > > > On Wed, Dec 08, 2010 at 09:07:56AM -0800, Julian Elischer wrote: > > > > > > From: Kirk McKusick <[hidden email]> > > > Date: Wed, 21 May 2008 13:19:18 -0700 > > > To: Poul-Henning Kamp <[hidden email]> > > > Subject: UFS and BIO_DELETE > > > X-URL: http://WWW.McKusick.COM/ > > > Reply-To: Kirk McKusick <[hidden email]> > > > > > > I enclose below my proposed patch to add BIO_DELETE to UFS > > > (goes at the end of ffs_blkfree). As I have no way to test > > > it, I am wondering if you could let me know if it works. > > > > > > Also, I am thinking of only enabling it for filesystems mounted > > > with a new flag requesting the behavior since the geteblk is a > > > rather expensive call for the usual no-op case. > > > > > > I did look at just allocating a `struct bio' as a local variable > > > and using that, but it looked like I needed to also come up with a > > > `producer' and/or `consumer' if I wanted to pass it to GEOM directly, > > > so in the end I went with this more expensive solution. If there > > > is an easy way to just pass a bio structure to GEOM, I would much > > > prefer that approach. > > > > > > ~Kirk > > > > > > > > > *** ffs_alloc.c Wed May 21 20:11:04 2008 > > > --- ffs_alloc.c.new Wed May 21 20:10:50 2008 > > > *************** > > > *** 1945,1950 **** > > > --- 1945,1962 ---- > > > ACTIVECLEAR(fs, cg); > > > UFS_UNLOCK(ump); > > > bdwrite(bp); > > > + /* > > > + * Request that the block be cleared. > > > + */ > > > + bp = geteblk(size); > > > + bp->b_iocmd = BIO_DELETE; > > > + bp->b_vp = devvp; > > > + bp->b_blkno = fsbtodb(fs, bno); > > > + bp->b_offset = dbtob(bp->b_blkno); > > > + bp->b_iooffset = bp->b_offset; > > > + bp->b_bcount = size; > > > + BUF_KERNPROC(bp); > > > + BO_STRATEGY(&devvp->v_bufobj, bp); > > > } > > > > > > #ifdef INVARIANTS > > > > > The UFS devvp contains the pointer to struct g_consumer in > > devpp->v_bufobj->bo_private, so g_alloc_bio() and g_io_request() can be > > used to start BIO_DELETE command without fake bufer allocation. > > g_io_strategy() may be used as the sample. > > Thanks for that update to moderize the patch! > > > But, I have a question about the patch. The bitmap block in ffs_blkfree() > > is handled with delayed write, by bdwrite() call. I think that it is > > quite possible for BIO_DELETE to be executed before the bitmap block > > is written and on-disk bit is set in bitmap. Would not this allow > > for the blocks to be deleted before the bitmap is updated, > > and potentially before on-disk pointers are cleared that points to > > the blocks ? SU just rolls back the unfinished dependencies on write, > > but BIO_DELETE cannot. > > It is safe to do the BIO_DELETE here because at this point we have > progressed through the file delete to the point where the inode that > claimed this block has been updated on the disk with a NULL pointer. > The only step that remains is to update the on-disk bitmap to reflect > that the block is available. If the bitmap write is not done before > a system crash the only action that will be taken with respect to > the freed block is that either background fsck will restore it to > the bitmap (SU) or the journal will restore it to the bitmap (SUJ). > There is no case where either SU or SUJ will put it back into the > file once we have reached this point. > > > Also, shouldn't the BIO_DELETE only be issued for real devices, > > and not the snapshots ? > > They should also be issued for snapshots. The only time that snapshots > give up blocks is when they are deleted. The blocks that they give up > at that time are all copies of blocks that were later changed in the > filesystem. With the snapshot gone there will be no remaining references > to those blocks and they will be made available for reallocation. As > such, it would be helpful if they had been TRIMMED. > > > diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c > > index b740bbb..4ff57ae 100644 > > --- a/sys/ufs/ffs/ffs_alloc.c > > +++ b/sys/ufs/ffs/ffs_alloc.c > > @@ -83,6 +83,8 @@ __FBSDID("$FreeBSD$"); > > > > #include <security/audit/audit.h> > > > > +#include <geom/geom.h> > > + > > #include <ufs/ufs/dir.h> > > #include <ufs/ufs/extattr.h> > > #include <ufs/ufs/quota.h> > > @@ -1850,6 +1852,7 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd) > > u_int cg; > > u_int8_t *blksfree; > > struct cdev *dev; > > + struct bio *bip; > > > > cg = dtog(fs, bno); > > if (devvp->v_type == VREG) { > > @@ -1962,6 +1965,16 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd) > > softdep_setup_blkfree(UFSTOVFS(ump), bp, bno, > > numfrags(fs, size), dephd); > > bdwrite(bp); > > + > > + if (devvp->v_type != VREG) { > > + bip = g_alloc_bio(); > > + bip->bio_cmd = BIO_DELETE; > > + bip->bio_offset = dbtob(fsbtodb(fs, bno)); > > + bip->bio_done = g_destroy_bio; > > + bip->bio_length = size; > > + g_io_request(bip, > > + (struct g_consumer *)devvp->v_bufobj.bo_private); > > + } > > } > > > > #ifdef INVARIANTS > > The above patch looks good though I would do it unconditionally > (e.g., also for snapshots). It seems sensible to make it conditional > on a sysctl variable so that folks could experiment with it more > easily. And I would leave it off by default as non-SSD disks are > unlikely to benefit from it. If it does prove useful for SSD disks > then I would make it conditional on a filesystem flag so that it > is possible to enable it on a filesystem-by-filesystem basis (e.g., > on for filesystems on the SSD and off for the rest). option. diff --git a/sbin/dumpfs/dumpfs.c b/sbin/dumpfs/dumpfs.c index 38c05f6..fa562dc 100644 --- a/sbin/dumpfs/dumpfs.c +++ b/sbin/dumpfs/dumpfs.c @@ -253,9 +253,11 @@ dumpfs(const char *name) printf("fs_flags expanded "); if (fsflags & FS_NFS4ACLS) printf("nfsv4acls "); + if (fsflags & FS_TRIM) + printf("trim "); fsflags &= ~(FS_UNCLEAN | FS_DOSOFTDEP | FS_NEEDSFSCK | FS_INDEXDIRS | FS_ACLS | FS_MULTILABEL | FS_GJOURNAL | FS_FLAGS_UPDATED | - FS_NFS4ACLS | FS_SUJ); + FS_NFS4ACLS | FS_SUJ | FS_TRIM); if (fsflags != 0) printf("unknown flags (%#x)", fsflags); putchar('\n'); diff --git a/sbin/tunefs/tunefs.c b/sbin/tunefs/tunefs.c index b2ea602..0ed713d 100644 --- a/sbin/tunefs/tunefs.c +++ b/sbin/tunefs/tunefs.c @@ -82,11 +82,13 @@ int main(int argc, char *argv[]) { char *avalue, *jvalue, *Jvalue, *Lvalue, *lvalue, *Nvalue, *nvalue; + char *tvalue; const char *special, *on; const char *name; int active; int Aflag, aflag, eflag, evalue, fflag, fvalue, jflag, Jflag, Lflag; int lflag, mflag, mvalue, Nflag, nflag, oflag, ovalue, pflag, sflag; + int tflag; int svalue, Sflag, Svalue; int ch, found_arg, i; const char *chg[2]; @@ -101,7 +103,8 @@ main(int argc, char *argv[]) evalue = fvalue = mvalue = ovalue = svalue = Svalue = 0; active = 0; found_arg = 0; /* At least one arg is required. */ - while ((ch = getopt(argc, argv, "Aa:e:f:j:J:L:l:m:N:n:o:ps:S:")) != -1) + while ((ch = getopt(argc, argv, "Aa:e:f:j:J:L:l:m:N:n:o:ps:S:t:")) + != -1) switch (ch) { case 'A': @@ -268,6 +271,18 @@ main(int argc, char *argv[]) Sflag = 1; break; + case 't': + found_arg = 1; + name = "trim"; + tvalue = optarg; + if (strcmp(tvalue, "enable") != 0 && + strcmp(tvalue, "disable") != 0) { + errx(10, "bad %s (options are %s)", + name, "`enable' or `disable'"); + } + tflag = 1; + break; + default: usage(); } @@ -493,6 +508,24 @@ main(int argc, char *argv[]) sblock.fs_avgfpdir = svalue; } } + if (tflag) { + name = "issue TRIM to the disk"; + if (strcmp(tvalue, "enable") == 0) { + if (sblock.fs_flags & FS_TRIM) + warnx("%s remains unchanged as enabled", name); + else { + sblock.fs_flags |= FS_TRIM; + warnx("%s set", name); + } + } else if (strcmp(tvalue, "disable") == 0) { + if ((~sblock.fs_flags & FS_TRIM) == FS_TRIM) + warnx("%s remains unchanged as disabled", name); + else { + sblock.fs_flags &= ~FS_TRIM; + warnx("%s cleared", name); + } + } + } if (sbwrite(&disk, Aflag) == -1) goto err; @@ -1011,12 +1044,13 @@ out: void usage(void) { - fprintf(stderr, "%s\n%s\n%s\n%s\n%s\n", + fprintf(stderr, "%s\n%s\n%s\n%s\n%s\n%s\n", "usage: tunefs [-A] [-a enable | disable] [-e maxbpg] [-f avgfilesize]", " [-J enable | disable] [-j enable | disable]", " [-L volname] [-l enable | disable] [-m minfree]", " [-N enable | disable] [-n enable | disable]", -" [-o space | time] [-p] [-s avgfpdir] special | filesystem"); +" [-o space | time] [-p] [-s avgfpdir] [-t enable | disable]", +" special | filesystem"); exit(2); } @@ -1035,6 +1069,8 @@ printfs(void) (sblock.fs_flags & FS_SUJ)? "enabled" : "disabled"); warnx("gjournal: (-J) %s", (sblock.fs_flags & FS_GJOURNAL)? "enabled" : "disabled"); + warnx("trim: (-p) %s", + (sblock.fs_flags & FS_TRIM)? "enabled" : "disabled"); warnx("maximum blocks per file in a cylinder group: (-e) %d", sblock.fs_maxbpg); warnx("average file size: (-f) %d", diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c index b740bbb..ed39aa3 100644 --- a/sys/ufs/ffs/ffs_alloc.c +++ b/sys/ufs/ffs/ffs_alloc.c @@ -83,6 +83,8 @@ __FBSDID("$FreeBSD$"); #include <security/audit/audit.h> +#include <geom/geom.h> + #include <ufs/ufs/dir.h> #include <ufs/ufs/extattr.h> #include <ufs/ufs/quota.h> @@ -1850,6 +1852,7 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd) u_int cg; u_int8_t *blksfree; struct cdev *dev; + struct bio *bip; cg = dtog(fs, bno); if (devvp->v_type == VREG) { @@ -1962,6 +1965,16 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd) softdep_setup_blkfree(UFSTOVFS(ump), bp, bno, numfrags(fs, size), dephd); bdwrite(bp); + + if (fs->fs_flags & FS_TRIM) { + bip = g_alloc_bio(); + bip->bio_cmd = BIO_DELETE; + bip->bio_offset = dbtob(fsbtodb(fs, bno)); + bip->bio_done = g_destroy_bio; + bip->bio_length = size; + g_io_request(bip, + (struct g_consumer *)devvp->v_bufobj.bo_private); + } } #ifdef INVARIANTS diff --git a/sys/ufs/ffs/fs.h b/sys/ufs/ffs/fs.h index ba98ed3..13d9ede 100644 --- a/sys/ufs/ffs/fs.h +++ b/sys/ufs/ffs/fs.h @@ -417,6 +417,7 @@ CTASSERT(sizeof(struct fs) == 1376); #define FS_FLAGS_UPDATED 0x0080 /* flags have been moved to new location */ #define FS_NFS4ACLS 0x0100 /* file system has NFSv4 ACLs enabled */ #define FS_INDEXDIRS 0x0200 /* kernel supports indexed directories */ +#define FS_TRIM 0x0400 /* issue BIO_DELETE for deleted blocks */ /* * Macros to access bits in the fs_active array. |
|
> Date: Thu, 9 Dec 2010 00:53:52 +0200
> From: Kostik Belousov <[hidden email]> > To: Kirk McKusick <[hidden email]> > Cc: Julian Elischer <[hidden email]>, [hidden email], > [hidden email], Oliver Fromme <[hidden email]> > Subject: Re: TRIM support for UFS? > > On Wed, Dec 08, 2010 at 02:23:52PM -0800, Kirk McKusick wrote: > > > > It is safe to do the BIO_DELETE here because at this point we have > > progressed through the file delete to the point where the inode that > > claimed this block has been updated on the disk with a NULL pointer. > > Thanks for the explanation. On the other hand, can my scenario be real > for async mounts ? Async mounts are not compatible with SU or SUJ. You are running the filesystem in a mode where there is no guarantee of recovery after a crash. So yes, you may TRIM blocks that are still claimed by on-disk inodes. But that is likely the least of your worries. Since async makes no claims about filesystem consistency after a crash, adding one more way for it to break does not seem like a big deal :-) > > The above patch looks good though I would do it unconditionally > > (e.g., also for snapshots). It seems sensible to make it conditional > > on a sysctl variable so that folks could experiment with it more > > easily. And I would leave it off by default as non-SSD disks are > > unlikely to benefit from it. If it does prove useful for SSD disks > > then I would make it conditional on a filesystem flag so that it > > is possible to enable it on a filesystem-by-filesystem basis (e.g., > > on for filesystems on the SSD and off for the rest). > > I think it is better to have a flag in superblock instead of the mount > option. > > Most of diff deleted... > > diff --git a/sys/ufs/ffs/fs.h b/sys/ufs/ffs/fs.h > index ba98ed3..13d9ede 100644 > --- a/sys/ufs/ffs/fs.h > +++ b/sys/ufs/ffs/fs.h > @@ -417,6 +417,7 @@ CTASSERT(sizeof(struct fs) == 1376); > #define FS_FLAGS_UPDATED 0x0080 /* flags have been moved to new location */ > #define FS_NFS4ACLS 0x0100 /* file system has NFSv4 ACLs enabled */ > #define FS_INDEXDIRS 0x0200 /* kernel supports indexed directories */ > +#define FS_TRIM 0x0400 /* issue BIO_DELETE for deleted blocks */ > > /* > * Macros to access bits in the fs_active array. I approve of this change and am happy to see it (finally) get added. > On Wed, 8 Dec 2010 at 23:33:27 +0100, Pawel Jakub Dawidek wrote: >> On Wed, Dec 08, 2010 at 02:23:52PM -0800, Kirk McKusick wrote: >> The above patch looks good though I would do it unconditionally >> (e.g., also for snapshots). It seems sensible to make it conditional >> on a sysctl variable so that folks could experiment with it more >> easily. And I would leave it off by default as non-SSD disks are >> unlikely to benefit from it. If it does prove useful for SSD disks >> then I would make it conditional on a filesystem flag so that it >> is possible to enable it on a filesystem-by-filesystem basis (e.g., >> on for filesystems on the SSD and off for the rest). > > We already have a flag for disks: DISKFLAG_CANDELETE, which tells if the > disk support TRIM or not. Next we should add BIO_GETATTR attribute for > DISK class to return true if TRIM is supported. This way UFS can ask if > TRIM is supported on mount and don't bother sending BIO_DELETE if it is > not supported. Clearly asking the underlying device is the most automated way to decide whether to do TRIM. Once it is possible to ask the underlying disk if it supports TRIM, your change could be modified to do the querry to the disk and set the FS_TRIM flag accordingly each time that the filesystem is mounted. If it is easy to add a BIO_GETATTR attribute for the DISK class, then we can skip the intermediate stage of using tunefs to enable/disable it. Kirk McKusick _______________________________________________ [hidden email] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "[hidden email]" |
|
On Wed, Dec 08, 2010 at 03:56:59PM -0800, Kirk McKusick wrote:
> > Date: Thu, 9 Dec 2010 00:53:52 +0200 > > From: Kostik Belousov <[hidden email]> > > To: Kirk McKusick <[hidden email]> > > Cc: Julian Elischer <[hidden email]>, [hidden email], > > [hidden email], Oliver Fromme <[hidden email]> > > Subject: Re: TRIM support for UFS? > > > > On Wed, Dec 08, 2010 at 02:23:52PM -0800, Kirk McKusick wrote: > > > > > > It is safe to do the BIO_DELETE here because at this point we have > > > progressed through the file delete to the point where the inode that > > > claimed this block has been updated on the disk with a NULL pointer. > > > > Thanks for the explanation. On the other hand, can my scenario be real > > for async mounts ? > > Async mounts are not compatible with SU or SUJ. You are running the > filesystem in a mode where there is no guarantee of recovery after > a crash. So yes, you may TRIM blocks that are still claimed by on-disk > inodes. But that is likely the least of your worries. Since async makes > no claims about filesystem consistency after a crash, adding one more > way for it to break does not seem like a big deal :-) > > > > The above patch looks good though I would do it unconditionally > > > (e.g., also for snapshots). It seems sensible to make it conditional > > > on a sysctl variable so that folks could experiment with it more > > > easily. And I would leave it off by default as non-SSD disks are > > > unlikely to benefit from it. If it does prove useful for SSD disks > > > then I would make it conditional on a filesystem flag so that it > > > is possible to enable it on a filesystem-by-filesystem basis (e.g., > > > on for filesystems on the SSD and off for the rest). > > > > I think it is better to have a flag in superblock instead of the mount > > option. > > > > Most of diff deleted... > > > > diff --git a/sys/ufs/ffs/fs.h b/sys/ufs/ffs/fs.h > > index ba98ed3..13d9ede 100644 > > --- a/sys/ufs/ffs/fs.h > > +++ b/sys/ufs/ffs/fs.h > > @@ -417,6 +417,7 @@ CTASSERT(sizeof(struct fs) == 1376); > > #define FS_FLAGS_UPDATED 0x0080 /* flags have been moved to new location */ > > #define FS_NFS4ACLS 0x0100 /* file system has NFSv4 ACLs enabled */ > > #define FS_INDEXDIRS 0x0200 /* kernel supports indexed directories */ > > +#define FS_TRIM 0x0400 /* issue BIO_DELETE for deleted blocks */ > > > > /* > > * Macros to access bits in the fs_active array. > > I approve of this change and am happy to see it (finally) get added. > > > On Wed, 8 Dec 2010 at 23:33:27 +0100, Pawel Jakub Dawidek wrote: > >> On Wed, Dec 08, 2010 at 02:23:52PM -0800, Kirk McKusick wrote: > >> The above patch looks good though I would do it unconditionally > >> (e.g., also for snapshots). It seems sensible to make it conditional > >> on a sysctl variable so that folks could experiment with it more > >> easily. And I would leave it off by default as non-SSD disks are > >> unlikely to benefit from it. If it does prove useful for SSD disks > >> then I would make it conditional on a filesystem flag so that it > >> is possible to enable it on a filesystem-by-filesystem basis (e.g., > >> on for filesystems on the SSD and off for the rest). > > > > We already have a flag for disks: DISKFLAG_CANDELETE, which tells if the > > disk support TRIM or not. Next we should add BIO_GETATTR attribute for > > DISK class to return true if TRIM is supported. This way UFS can ask if > > TRIM is supported on mount and don't bother sending BIO_DELETE if it is > > not supported. > > Clearly asking the underlying device is the most automated way to decide > whether to do TRIM. Once it is possible to ask the underlying disk if it > supports TRIM, your change could be modified to do the querry to the disk > and set the FS_TRIM flag accordingly each time that the filesystem is > mounted. If it is easy to add a BIO_GETATTR attribute for the DISK class, > then we can skip the intermediate stage of using tunefs to enable/disable it. Below is the BIO_GETATTR change integrated. Now, both FS_TRIM should be set and device shall report the GEOM::candelete attribute as true to have BIO_DELETE issued. geom_disk and md(4) report GEOM::candelete. Also, I updated tunefs(8) manpage, missed in the previous version. diff --git a/sbin/dumpfs/dumpfs.c b/sbin/dumpfs/dumpfs.c index 38c05f6..fa562dc 100644 --- a/sbin/dumpfs/dumpfs.c +++ b/sbin/dumpfs/dumpfs.c @@ -253,9 +253,11 @@ dumpfs(const char *name) printf("fs_flags expanded "); if (fsflags & FS_NFS4ACLS) printf("nfsv4acls "); + if (fsflags & FS_TRIM) + printf("trim "); fsflags &= ~(FS_UNCLEAN | FS_DOSOFTDEP | FS_NEEDSFSCK | FS_INDEXDIRS | FS_ACLS | FS_MULTILABEL | FS_GJOURNAL | FS_FLAGS_UPDATED | - FS_NFS4ACLS | FS_SUJ); + FS_NFS4ACLS | FS_SUJ | FS_TRIM); if (fsflags != 0) printf("unknown flags (%#x)", fsflags); putchar('\n'); diff --git a/sbin/tunefs/tunefs.8 b/sbin/tunefs/tunefs.8 index a883cd4..c62af94 100644 --- a/sbin/tunefs/tunefs.8 +++ b/sbin/tunefs/tunefs.8 @@ -28,7 +28,7 @@ .\" @(#)tunefs.8 8.2 (Berkeley) 12/11/93 .\" $FreeBSD$ .\" -.Dd March 6, 2010 +.Dd December 9, 2010 .Dt TUNEFS 8 .Os .Sh NAME @@ -51,6 +51,7 @@ .Op Fl p .Op Fl s Ar avgfpdir .Op Fl S Ar size +.Op Fl t Cm enable | disable .Ar special | filesystem .Sh DESCRIPTION The @@ -143,6 +144,10 @@ Specify the expected number of files per directory. .It Fl S Ar size Specify the softdep journal size in bytes. The minimum is 4M. +.It Fl t Cm enable | disable +Turn on/off the TRIM enable flag. +If enabled, and if the underlaying device supports BIO_DELETE +command, kernel will delete the freed blocks. .El .Pp At least one of the above flags is required. diff --git a/sbin/tunefs/tunefs.c b/sbin/tunefs/tunefs.c index b2ea602..0ed713d 100644 --- a/sbin/tunefs/tunefs.c +++ b/sbin/tunefs/tunefs.c @@ -82,11 +82,13 @@ int main(int argc, char *argv[]) { char *avalue, *jvalue, *Jvalue, *Lvalue, *lvalue, *Nvalue, *nvalue; + char *tvalue; const char *special, *on; const char *name; int active; int Aflag, aflag, eflag, evalue, fflag, fvalue, jflag, Jflag, Lflag; int lflag, mflag, mvalue, Nflag, nflag, oflag, ovalue, pflag, sflag; + int tflag; int svalue, Sflag, Svalue; int ch, found_arg, i; const char *chg[2]; @@ -101,7 +103,8 @@ main(int argc, char *argv[]) evalue = fvalue = mvalue = ovalue = svalue = Svalue = 0; active = 0; found_arg = 0; /* At least one arg is required. */ - while ((ch = getopt(argc, argv, "Aa:e:f:j:J:L:l:m:N:n:o:ps:S:")) != -1) + while ((ch = getopt(argc, argv, "Aa:e:f:j:J:L:l:m:N:n:o:ps:S:t:")) + != -1) switch (ch) { case 'A': @@ -268,6 +271,18 @@ main(int argc, char *argv[]) Sflag = 1; break; + case 't': + found_arg = 1; + name = "trim"; + tvalue = optarg; + if (strcmp(tvalue, "enable") != 0 && + strcmp(tvalue, "disable") != 0) { + errx(10, "bad %s (options are %s)", + name, "`enable' or `disable'"); + } + tflag = 1; + break; + default: usage(); } @@ -493,6 +508,24 @@ main(int argc, char *argv[]) sblock.fs_avgfpdir = svalue; } } + if (tflag) { + name = "issue TRIM to the disk"; + if (strcmp(tvalue, "enable") == 0) { + if (sblock.fs_flags & FS_TRIM) + warnx("%s remains unchanged as enabled", name); + else { + sblock.fs_flags |= FS_TRIM; + warnx("%s set", name); + } + } else if (strcmp(tvalue, "disable") == 0) { + if ((~sblock.fs_flags & FS_TRIM) == FS_TRIM) + warnx("%s remains unchanged as disabled", name); + else { + sblock.fs_flags &= ~FS_TRIM; + warnx("%s cleared", name); + } + } + } if (sbwrite(&disk, Aflag) == -1) goto err; @@ -1011,12 +1044,13 @@ out: void usage(void) { - fprintf(stderr, "%s\n%s\n%s\n%s\n%s\n", + fprintf(stderr, "%s\n%s\n%s\n%s\n%s\n%s\n", "usage: tunefs [-A] [-a enable | disable] [-e maxbpg] [-f avgfilesize]", " [-J enable | disable] [-j enable | disable]", " [-L volname] [-l enable | disable] [-m minfree]", " [-N enable | disable] [-n enable | disable]", -" [-o space | time] [-p] [-s avgfpdir] special | filesystem"); +" [-o space | time] [-p] [-s avgfpdir] [-t enable | disable]", +" special | filesystem"); exit(2); } @@ -1035,6 +1069,8 @@ printfs(void) (sblock.fs_flags & FS_SUJ)? "enabled" : "disabled"); warnx("gjournal: (-J) %s", (sblock.fs_flags & FS_GJOURNAL)? "enabled" : "disabled"); + warnx("trim: (-p) %s", + (sblock.fs_flags & FS_TRIM)? "enabled" : "disabled"); warnx("maximum blocks per file in a cylinder group: (-e) %d", sblock.fs_maxbpg); warnx("average file size: (-f) %d", diff --git a/sys/dev/md/md.c b/sys/dev/md/md.c index 6c3484e..c294d1b 100644 --- a/sys/dev/md/md.c +++ b/sys/dev/md/md.c @@ -713,11 +713,12 @@ md_kthread(void *arg) } mtx_unlock(&sc->queue_mtx); if (bp->bio_cmd == BIO_GETATTR) { - if (sc->fwsectors && sc->fwheads && + if ((sc->fwsectors && sc->fwheads && (g_handleattr_int(bp, "GEOM::fwsectors", sc->fwsectors) || g_handleattr_int(bp, "GEOM::fwheads", - sc->fwheads))) + sc->fwheads))) || + g_handleattr_int(bp, "GEOM::candelete", 1)) error = -1; else error = EOPNOTSUPP; diff --git a/sys/geom/geom_disk.c b/sys/geom/geom_disk.c index 25d2e3b..eda702c 100644 --- a/sys/geom/geom_disk.c +++ b/sys/geom/geom_disk.c @@ -297,6 +297,9 @@ g_disk_start(struct bio *bp) } while (bp2 != NULL); break; case BIO_GETATTR: + if (g_handleattr_int(bp, "GEOM::candelete", + (dp->d_flags & DISKFLAG_CANDELETE) != 0)) + break; if (g_handleattr_int(bp, "GEOM::fwsectors", dp->d_fwsectors)) break; else if (g_handleattr_int(bp, "GEOM::fwheads", dp->d_fwheads)) diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c index b740bbb..926229c 100644 --- a/sys/ufs/ffs/ffs_alloc.c +++ b/sys/ufs/ffs/ffs_alloc.c @@ -83,6 +83,8 @@ __FBSDID("$FreeBSD$"); #include <security/audit/audit.h> +#include <geom/geom.h> + #include <ufs/ufs/dir.h> #include <ufs/ufs/extattr.h> #include <ufs/ufs/quota.h> @@ -1850,6 +1852,7 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd) u_int cg; u_int8_t *blksfree; struct cdev *dev; + struct bio *bip; cg = dtog(fs, bno); if (devvp->v_type == VREG) { @@ -1962,6 +1965,16 @@ ffs_blkfree(ump, fs, devvp, bno, size, inum, dephd) softdep_setup_blkfree(UFSTOVFS(ump), bp, bno, numfrags(fs, size), dephd); bdwrite(bp); + + if (ump->um_candelete) { + bip = g_alloc_bio(); + bip->bio_cmd = BIO_DELETE; + bip->bio_offset = dbtob(fsbtodb(fs, bno)); + bip->bio_done = g_destroy_bio; + bip->bio_length = size; + g_io_request(bip, + (struct g_consumer *)devvp->v_bufobj.bo_private); + } } #ifdef INVARIANTS diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c index 72f40da..f578382 100644 --- a/sys/ufs/ffs/ffs_vfsops.c +++ b/sys/ufs/ffs/ffs_vfsops.c @@ -895,6 +895,21 @@ ffs_mountfs(devvp, mp, td) mp->mnt_stat.f_mntonname); #endif } + if ((fs->fs_flags & FS_TRIM) != 0) { + size = sizeof(int); + if (g_io_getattr("GEOM::candelete", cp, &size, + &ump->um_candelete) == 0) { + if (!ump->um_candelete) + printf( +"WARNING: %s: TRIM flag on fs but disk does not support TRIM\n", + mp->mnt_stat.f_mntonname); + } else { + printf( +"WARNING: %s: TRIM flag on fs but cannot get whether disk supports TRIM\n", + mp->mnt_stat.f_mntonname); + ump->um_candelete = 0; + } + } ump->um_mountp = mp; ump->um_dev = dev; diff --git a/sys/ufs/ffs/fs.h b/sys/ufs/ffs/fs.h index ba98ed3..13d9ede 100644 --- a/sys/ufs/ffs/fs.h +++ b/sys/ufs/ffs/fs.h @@ -417,6 +417,7 @@ CTASSERT(sizeof(struct fs) == 1376); #define FS_FLAGS_UPDATED 0x0080 /* flags have been moved to new location */ #define FS_NFS4ACLS 0x0100 /* file system has NFSv4 ACLs enabled */ #define FS_INDEXDIRS 0x0200 /* kernel supports indexed directories */ +#define FS_TRIM 0x0400 /* issue BIO_DELETE for deleted blocks */ /* * Macros to access bits in the fs_active array. diff --git a/sys/ufs/ufs/ufsmount.h b/sys/ufs/ufs/ufsmount.h index 03edd73..b13db40 100644 --- a/sys/ufs/ufs/ufsmount.h +++ b/sys/ufs/ufs/ufsmount.h @@ -95,6 +95,7 @@ struct ufsmount { time_t um_itime[MAXQUOTAS]; /* inode quota time limit */ char um_qflags[MAXQUOTAS]; /* quota specific flags */ int64_t um_savedmaxfilesize; /* XXX - limit maxfilesize */ + int um_candelete; /* devvp supports TRIM */ int (*um_balloc)(struct vnode *, off_t, int, struct ucred *, int, struct buf **); int (*um_blkatoff)(struct vnode *, off_t, char **, struct buf **); int (*um_truncate)(struct vnode *, off_t, int, struct ucred *, struct thread *); |
|
On Thu, Dec 09, 2010 at 12:34:11PM +0200, Kostik Belousov wrote:
> I still think that TRIM should be opt-in, at least for the first time. > Below is the BIO_GETATTR change integrated. > > Now, both FS_TRIM should be set and device shall report the > GEOM::candelete attribute as true to have BIO_DELETE issued. geom_disk > and md(4) report GEOM::candelete. Also, I updated tunefs(8) manpage, > missed in the previous version. [...] > --- a/sbin/tunefs/tunefs.c > +++ b/sbin/tunefs/tunefs.c [...] > @@ -1035,6 +1069,8 @@ printfs(void) > (sblock.fs_flags & FS_SUJ)? "enabled" : "disabled"); > warnx("gjournal: (-J) %s", > (sblock.fs_flags & FS_GJOURNAL)? "enabled" : "disabled"); > + warnx("trim: (-p) %s", s/-p/-t/ Other than that looks nice and clean. -- Pawel Jakub Dawidek http://www.wheelsystems.com [hidden email] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! |
|
In reply to this post by Konstantin Belousov
Other than the nit pointed out by Pawel, the diffs look good to me.
You should consider adding the -t option to newfs so that the TRIM option can be specified at the time the filesystem is created (as a general rule, anything you can do with tunefs should also be possible with newfs). I agree with your decision to let administrators opt-out of doing TRIM. If experience shows it to be generally useful to have it on, we can change the default to enabled later. If we do change the default to enabled, then we will want to delete the warning about TRIM not being supported by the underlying disk that you added at mount time as we would start getting a lot of them for all the non-SSD disks. Kirk McKusick _______________________________________________ [hidden email] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "[hidden email]" |
|
On 12/9/10 10:13 AM, Kirk McKusick wrote:
> Other than the nit pointed out by Pawel, the diffs look good to me. > You should consider adding the -t option to newfs so that the TRIM > option can be specified at the time the filesystem is created (as > a general rule, anything you can do with tunefs should also be > possible with newfs). > > I agree with your decision to let administrators opt-out of doing > TRIM. If experience shows it to be generally useful to have it on, > we can change the default to enabled later. If we do change the > default to enabled, then we will want to delete the warning about TRIM > not being supported by the underlying disk that you added at mount > time as we would start getting a lot of them for all the non-SSD disks. > > Kirk McKusick > I am very happy to see this happen, and if it goes into 8.x I'll try see that we (fusion-io) support it in our FreeBSD driver as soon as we can. Several people have asked why trim is important for flash based devices.. The answer is that 'trimmed' packets get added back to the 'free pool of pages available for a flash device to use and that helps in two ways. firstly the average free space recoverable when you wring out a used block of sectors increases, meaning that you have to garbage collect fewer of them, which means that you have less overhead eating into your performance, and secondly, when you start off after a period of low utilisation, you have more headroom to write into before you need to start garbage collection. it's a win-win.. _______________________________________________ [hidden email] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "[hidden email]" |
|
In reply to this post by Konstantin Belousov
On 12/9/10 2:34 AM, Kostik Belousov wrote:
[patch] commit when ready.. I'm looking forwards to being able to try it out. _______________________________________________ [hidden email] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "[hidden email]" |
|
In reply to this post by Kirk McKusick-2
On Thu, 09 Dec 2010 10:13:39 PST Kirk McKusick <[hidden email]> wrote:
> Other than the nit pointed out by Pawel, the diffs look good to me. > You should consider adding the -t option to newfs so that the TRIM > option can be specified at the time the filesystem is created (as > a general rule, anything you can do with tunefs should also be > possible with newfs). > > I agree with your decision to let administrators opt-out of doing > TRIM. If experience shows it to be generally useful to have it on, > we can change the default to enabled later. If we do change the > default to enabled, then we will want to delete the warning about TRIM > not being supported by the underlying disk that you added at mount > time as we would start getting a lot of them for all the non-SSD disks. Would be nice if something like ftrim(fd, offset, size) or trim(path, offsetm size) or TRIM file ioctl is added, to free up blocks undelying a given range in a file. ftruncate can delete blocks at the end but there is no facility to lose blocks in the middle. Mainly handy for virtual disks and databases (and would work nicely with SEEK_DATA, SEEK_HOLE). _______________________________________________ [hidden email] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "[hidden email]" |
|
In reply to this post by Julian Elischer-5
On Thu, Dec 09, 2010 at 04:23:56PM -0800, Julian Elischer wrote:
> On 12/9/10 2:34 AM, Kostik Belousov wrote: > > [patch] > > commit when ready.. I'm looking forwards to being able to try it out. And how does one actually verify TRIM is working? Most SSDs I've used do not have a SMART attribute that can help determine this. I would rather not make a bunch of assumptions about Attributes 226, 232, or 233 to help give some indication of it working. For folks to test this, there needs to be some indicator/verification mechanism that it's working. -- | Jeremy Chadwick [hidden email] | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | _______________________________________________ [hidden email] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "[hidden email]" |
|
In reply to this post by Bakul Shah
> To: Kirk McKusick <[hidden email]>
> cc: Kostik Belousov <[hidden email]>, [hidden email], > [hidden email], Oliver Fromme <[hidden email]> > Subject: Re: TRIM support for UFS? > Date: Thu, 09 Dec 2010 16:37:48 -0800 > From: Bakul Shah <[hidden email]> > X-ASK-Info: Message Queued (2010/12/09 16:37:54) > X-ASK-Info: Confirmed by User (2010/12/09 16:39:17) > > Would be nice if something like ftrim(fd, offset, size) or > trim(path, offsetm size) or TRIM file ioctl is added, to free > up blocks undelying a given range in a file. ftruncate can > delete blocks at the end but there is no facility to lose > blocks in the middle. Mainly handy for virtual disks and > databases (and would work nicely with SEEK_DATA, SEEK_HOLE). We have discussed adding such a system call over the years (actually we called it `release' rather than `trim'). If such a call ever does get added, any blocks that it actually manages to release will get passed through to BIO_DELETE by the changes that Kostik is soon to add to the system. Kirk McKusick _______________________________________________ [hidden email] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "[hidden email]" |
|
In reply to this post by Bakul Shah
On Thu, Dec 09, 2010 at 04:37:48PM -0800, Bakul Shah wrote:
> On Thu, 09 Dec 2010 10:13:39 PST Kirk McKusick <[hidden email]> wrote: > > Other than the nit pointed out by Pawel, the diffs look good to me. > > You should consider adding the -t option to newfs so that the TRIM > > option can be specified at the time the filesystem is created (as > > a general rule, anything you can do with tunefs should also be > > possible with newfs). > > > > I agree with your decision to let administrators opt-out of doing > > TRIM. If experience shows it to be generally useful to have it on, > > we can change the default to enabled later. If we do change the > > default to enabled, then we will want to delete the warning about TRIM > > not being supported by the underlying disk that you added at mount > > time as we would start getting a lot of them for all the non-SSD disks. > > Would be nice if something like ftrim(fd, offset, size) or > trim(path, offsetm size) or TRIM file ioctl is added, to free > up blocks undelying a given range in a file. ftruncate can > delete blocks at the end but there is no facility to lose > blocks in the middle. Mainly handy for virtual disks and > databases (and would work nicely with SEEK_DATA, SEEK_HOLE). int g_delete(int fd, off_t offset, off_t length); -- Pawel Jakub Dawidek http://www.wheelsystems.com [hidden email] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! |
| Powered by Nabble | Edit this page |
