I did not expect my problems to have vanished, but I wanted to try again.

Should I use the git based port
http://stuff.mit.edu/afs/sipb.mit.edu/user/kaduk/freebsd/openafs/openafs-devel.shar.txt
you pointed me to earlier for testing? Or should I always use
http://web.mit.edu/freebsd/openafs/openafs.shar that you posted to the
Quarterly Status Report?

With both, I run into the same problem compiling on FreeBSD 8.1.
http://svn.freebsd.org/viewvc/base?view=revision&revision=209524 changed
the definition of ifa_ifwithnet. In rx/rx_kernel.h, FreeBSD 8.1 needs
the same definition of rx_ifaddr_withnet as AFS_OBSD46_ENV (while
FreeBSD 8.0 needs the generic one). Should FreeBSD 8.0 still be supported?

With the git based port, I get an error on "kldload libafs": "can't load
libafs: Exec format error" (missing symbol?) -- openafs-1.5.75 (the
other port) does not seem to have this problem.

Starting afsd, I realized that I had not updated my CellServDB and thus
tried to shutdown afsd, which complained about afs still being mounted.
Trying to umount /afs, I got a segfault in the kernel. (I had not
actually accessed /afs before doing that.) I guess restarting the afsd
is not possible for now. (No big deal.)

I listed a few directories without blocks for longer periods of time as
with my last testing. Good. Copying a huge file from AFS was terribly
slow (even for my DSL connection), but it steadily progressed and I was
able to abort it without deadlocking or crashing. Copying a 16MB file to
AFS blocked a parallel "ls -l" on the same directory I was copying to,
but it eventually finished. The file was not corrupted. Great.

The main differences besides being on FreeBSD 8.1 now and using a newer
version of the OpenAFS port are that this time I was testing from a slow
DSL connection (over a WLAN) and not the LAN connection in my university
and I was testing against the AFS of a different department. I will try
to repeat under the same conditions as the last tests (aside from the
software versions) later.

pagsh does not immediately crash anymore -- another improvement, even if
it is minor compared to FreeBSD not crashing anymore using AFS.

BTW: Thanks for all your work!

Cheers,
Jan Henrik
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-afs
To unsubscribe, send any mail to "[hidden email]"
|
On 07/23/2010 12:30, Jan Henrik Sylvester wrote:
> I listed a few directories without blocks for longer periods of time as
> with my last testing. Good. Copying a huge file from AFS was terribly
> slow (even for my DSL connection), but it steadily progressed and I was
> able to abort it without deadlocking or crashing. Copying a 16MB file to
> AFS blocked a parallel "ls -l" on the same directory I was copying to,
> but it eventually finished. The file was not corrupted. Great.

I did more testing from University to both of the AFS' I had been testing
before. Copying a few MB from AFS and copying a 16MB file to AFS was both
fine (showing 6MB/s while copying).

Trying to copy a 512MB file to AFS locked all AFS after two seconds that it
was showing copy rates of 40MB/s (while the network is only 100Mbit/s).
After increasing the AFS cache size to 512MB, almost all of the file got
copied before AFS would lock. With a cache of 1GB, the file got copied
without a deadlock or corruption. (All this is on MP, I have not tried to
disable all but one core.)

Rebooting the machine after having done nothing but the successful copy of
the 512MB file, I got:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 05
fault virtual address   = 0x290
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff805959ae
stack pointer           = 0x28:0xffffff807500c6c0
frame pointer           = 0x28:0xffffff807500c6e0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1944 (afsd)
trap number             = 12
panic: page fault
cpuid = 3

Overall, the only problems I got during my tests were copying files larger
than the cache size and shutting down afsd. So far, AFS seems to become
usable for me (even on MP).

Thanks again,
Jan Henrik
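A quick check of the numbers in the message above: a 100Mbit/s link carries at most 12.5MB/s (ignoring protocol overhead), so the 40MB/s shown during the first two seconds cannot be network throughput. Presumably the data was going into the local cache faster than it could be sent, though the thread does not confirm that. The sketch below only does the arithmetic; the function names are made up for illustration.

```shell
#!/bin/sh
# Upper bound on payload rate for a link, in MB/s.
# Assumes 1 MB/s = 8 Mbit/s and ignores protocol overhead.
link_ceiling_mb_s() {
    awk -v mbit="$1" 'BEGIN { printf "%.1f\n", mbit / 8 }'
}

# Minimum time in seconds (rounded) to move a file of $1 MB
# over a link of $2 Mbit/s, under the same assumptions.
min_transfer_seconds() {
    awk -v mb="$1" -v mbit="$2" 'BEGIN { printf "%.0f\n", mb / (mbit / 8) }'
}

link_ceiling_mb_s 100          # the 100Mbit/s network from the message
min_transfer_seconds 512 100   # the 512MB file over that network
```

With the values from the message this prints 12.5 and 41, so a 512MB copy needs at least about 41 seconds on that network regardless of what the progress display shows at the start.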
|
Hi Jan,

Sorry for the long delay in responding -- mail piled up a bit during a
busy week.

On Fri, 23 Jul 2010, Jan Henrik Sylvester wrote:

> On 07/23/2010 12:30, Jan Henrik Sylvester wrote:
>> I listed a few directories without blocks for longer periods of time as
>> with my last testing. Good. Copying a huge file from AFS was terribly
>> slow (even for my DSL connection), but it steadily progressed and I was
>> able to abort it without deadlocking or crashing. Copying a 16MB file to
>> AFS blocked a parallel "ls -l" on the same directory I was copying to,

I'm pretty sure that we're holding an exclusive vnode lock when we're not
supposed to, but haven't looked into why the lock diagnostics don't
complain about it.

>> but it eventually finished. The file was not corrupted. Great.
>
> I did more testing from University to both of the AFS' I had been testing
> before. Copying a few MB from AFS and copying a 16MB file to AFS was both
> fine (showing 6MB/s while copying).
>
> Trying to copy a 512MB file to AFS locked all AFS after two seconds that it
> was showing copy rates of 40MB/s (while the network is only 100Mbit/s). After
> increasing the AFS cache size to 512MB, almost all of the file got copied
> before AFS would lock. With a cache of 1GB, the file got copied without a
> deadlock or corruption. (All this is on MP, I have not tried to disable all
> but one core.)

Do you remember if this was with the git-based port or the 1.5.75 linked
from the status report? The latter has an extra patch which band-aids
around a reference-counting bug when we need to reclaim used vnodes due
to a space crunch.

>
> Rebooting the machine after having done nothing but the successful copy of
> the 512MB file, I got:
> Fatal trap 12: page fault while in kernel mode

Hm, hard to do much about that without a backtrace. I've seen occasional
errors when shutting down afsd (various manifestations), but I'd say it
completes successfully at least half the time (umount -f, that is).

>
> Overall, the only problems I got during my tests were copying files larger
> than the cache size and shutting down afsd. So far, AFS seems to become
> usable for me (even on MP).

Glad to hear things are getting better.

On Fri, 23 Jul 2010, Jan Henrik Sylvester wrote:

>
> I did not expect my problems to have vanished, but I wanted to try again.
>
> Should I use the git based port
> http://stuff.mit.edu/afs/sipb.mit.edu/user/kaduk/freebsd/openafs/openafs-devel.shar.txt
> you pointed me to earlier for testing? Or should I always use
> http://web.mit.edu/freebsd/openafs/openafs.shar that you posted to the
> Quarterly Status Report?

I would probably stick to the git-based port, as that will give more
useful reports when things break (such as the one you mention below).
As I mentioned above, there is one patch in the latter shar which is not
in git; it's http://gerrit.openafs.org/2321 . You can add it to the
git-based port by stopping after the 'make patch' stage, going into the
work directory and running:

    git pull git://git.openafs.org/openafs refs/changes/21/2321/1

and then proceeding with the configure, build, and install stages.

>
> With both, I run into the same problem compiling on FreeBSD 8.1.
> http://svn.freebsd.org/viewvc/base?view=revision&revision=209524 changed
> the definition of ifa_ifwithnet. In rx/rx_kernel.h, FreeBSD 8.1 needs
> the same definition of rx_ifaddr_withnet as AFS_OBSD46_ENV (while
> FreeBSD 8.0 needs the generic one). Should FreeBSD 8.0 still be supported?
>

I'll try to get that fix in this weekend (if not sooner). I only have
9-current test boxes, and I think Derrick only has 8.0, so 8.1-specific
things would otherwise rely on me noticing relevant changes in the commit
emails that go by; this doesn't work very well when I don't have much
time to read them :)

> With the git based port, I get an error on "kldload libafs": "can't load
> libafs: Exec format error" (missing symbol?) -- openafs-1.5.75 (the
> other port) does not seem to have this problem.
>

Sounds like someone introduced a regression since then; thanks for the
report.

> Starting afsd, I realized that I had not updated my CellServDB and thus
> tried to shutdown afsd, which complained about afs still being mounted.
> Trying to umount /afs, I got a segfault in the kernel. (I had not
> actually accessed /afs before doing that.) I guess restarting the afsd
> is not possible for now. (No big deal.)
>

It ... should be possible, though it is not fully reliable. Be sure to
unload and reload the kernel module between unmounting /afs and
restarting afsd, though.

-Ben Kaduk
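The two procedures described in the reply above (pulling gerrit change 2321 into the git-based port, and the unmount / unload / reload / restart order) can be sketched as a small script. This is only an illustration under stated assumptions: the port and work-directory paths are placeholders not given in the thread, `kldunload libafs` is assumed to be the counterpart of the `kldload libafs` mentioned above, afsd is started without its site-specific options, and whether an explicit afsd shutdown step is needed in between is not stated in the thread. By default the script only prints what it would run.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# $1: port directory, $2: extracted source tree inside the port's work
# directory. Both paths are placeholders; the thread does not give them.
add_gerrit_2321() {
    run make -C "$1" patch || return 1
    # The pull command itself is quoted from the reply above.
    run git -C "$2" pull git://git.openafs.org/openafs refs/changes/21/2321/1 || return 1
    run make -C "$1" configure build install
}

# Order from the reply: unmount /afs, unload and reload the module, restart afsd.
restart_afs_client() {
    run umount -f /afs || return 1
    run kldunload libafs || return 1   # assumed counterpart of "kldload libafs"
    run kldload libafs || return 1
    run afsd                           # site-specific afsd options omitted
}

add_gerrit_2321 /path/to/openafs-port /path/to/openafs-port/work/openafs
restart_afs_client
```

Printing the commands first seemed the safer default here, since both the unmount and the module unload are reported in this thread as occasionally panicking the kernel.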
|
On Wed, 28 Jul 2010, Benjamin Kaduk wrote:
> Hi Jan,
>
> Sorry for the long delay in responding -- mail piled up a bit during a busy
> week.
>
> On Fri, 23 Jul 2010, Jan Henrik Sylvester wrote:
>
>> With both, I run into the same problem compiling on FreeBSD 8.1.
>> http://svn.freebsd.org/viewvc/base?view=revision&revision=209524 changed
>> the definition of ifa_ifwithnet. In rx/rx_kernel.h, FreeBSD 8.1 needs the
>> same definition of rx_ifaddr_withnet as AFS_OBSD46_ENV (while FreeBSD 8.0
>> needs the generic one). Should FreeBSD 8.0 still be supported?
>>
>
> I'll try to get that fix in this weekend (if not sooner). I only have
> 9-current test boxes, and I think Derrick only has 8.0, so 8.1-specific
> things would otherwise rely on me noticing relevant changes in the commit
> emails that go by; this doesn't work very well when I don't have much time to
> read them :)

That fix is in the tree -- thanks!

>
>> With the git based port, I get an error on "kldload libafs": "can't load
>> libafs: Exec format error" (missing symbol?) -- openafs-1.5.75 (the other
>> port) does not seem to have this problem.
>>
>
> Sounds like someone introduced a regression since then; thanks for the
> report.
>

This one proves to be quite a bit more difficult; if you look at the
console when you try to load the module (or in dmesg), it complains about
a particular undefined symbol, afs_FlushVS. This function is supposed to
be called when we are short on cache space (and other things) and need to
reclaim space. However ... it isn't implemented anywhere. This codepath
changed with commit d29643b0553011cbe60dd127fd31c1847fe02ddb, which
enabled disconnected mode always for unix clients -- the old version (for
non-disconnected mode) used a different function. It will probably be a
while before we rewrite things properly (as it is not exactly clear what
"properly" means, at least to me).

-Ben
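The dmesg check mentioned above can be reduced to a small filter. This is a minimal sketch: the function name is made up, and the match is deliberately loose because the exact wording of the kernel linker's message is an assumption here.

```shell
#!/bin/sh
# Read kernel messages on stdin and print the lines that mention an
# undefined symbol. The case-insensitive match on "undefined" is loose
# on purpose; the exact message format is not quoted in the thread.
undefined_symbol_lines() {
    grep -i 'undefined' || true   # no match is not treated as an error
}

# Typical use on the affected machine (not run here):
#   kldload libafs; dmesg | tail -n 20 | undefined_symbol_lines
```

On a module that fails to load with "Exec format error", the filtered output should name the missing symbol, which in the case described above is afs_FlushVS.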
