|
The program fio (an I/O test in ports) uses pthreads.

The following code (from fio-2.0.3, but it's in earlier versions too) has suddenly started misbehaving.

    clock_gettime(CLOCK_REALTIME, &t);
    t.tv_sec += seconds + 10;

    pthread_mutex_lock(&mutex->lock);

    while (!mutex->value && !ret) {
        mutex->waiters++;
        ret = pthread_cond_timedwait(&mutex->cond, &mutex->lock, &t);
        mutex->waiters--;
    }

    if (!ret) {
        mutex->value--;
        pthread_mutex_unlock(&mutex->lock);
    }

It turns out that 'ret' sometimes comes back instantly (on my machine) with a value of 60 (ETIMEDOUT), despite the fact that we set the timeout 10 seconds into the future.

Has anyone else seen anything like this?
(And yes, the condition variable attribute has been set to use the REALTIME clock.)

_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-threads
To unsubscribe, send any mail to "[hidden email]"
|
on 15/02/2012 23:41 Julian Elischer said the following:
> (and yes the condition variable attribute have been set to use the REALTIME clock).

But why?

Just a hypothesis that maybe there is some issue with time keeping on that system.
How would that code work out for you with MONOTONIC?

--
Andriy Gapon
|
On 2/16/12 9:34 AM, Andriy Gapon wrote:
> Just a hypothesis that maybe there is some issue with time keeping on that system.
> How would that code work out for you with MONOTONIC?

Jens Axboe (CC'd) tried both CLOCK_REALTIME and CLOCK_MONOTONIC, and they both had the same problem, i.e. random early returns with ETIMEDOUT.

I think we will try to move our machine forward to a newer -stable to see if that resolves it.
|
On 2012-02-16 22:06, Julian Elischer wrote:
> Jens Axboe, (CC'd) tried both CLOCK_REALTIME and CLOCK_MONOTONIC, and
> they both had the same problem..
> i.e. random early returns with ETIMEDOUT.

Yep indeed, using either MONOTONIC or REALTIME (and having set both with pthread_condattr_setclock()), no change in behaviour.

--
Jens Axboe
|
On 2/16/12 1:06 PM, Julian Elischer wrote:
> I think we will try move out machine forward to a newer -stable to
> see if it resolves.

Kan upgraded the machine today to today's 9.x branch tip and the problem still occurs.
8.x does not have this problem.

I have not got a 9-RELEASE machine to test on, so I cannot tell if this came in with the burst of stuff that came in after the 9.x branch was unfrozen after the release of 9.0.
|
Adding David Xu for his thoughts, since he rewrote the code in question in revision 213098.
|
kern.timecounter.tick: 1
kern.timecounter.choice: TSC-low(1000) i8254(0) HPET(950) ACPI-fast(900) dummy(-1000000)
kern.timecounter.hardware: ACPI-fast
kern.timecounter.stepwarnings: 0

Switching the machine from TSC-low to ACPI-fast fixes the problem.

In 8.x it used to default to ACPI, but I used to switch it to "TSC" to get better performance.

I wonder why TSC-low is now bad to use.
Maybe the TSCs are not as well synchronised as they were in 8.x?
Maybe the pthreads code didn't get the memo about changing timers?
|
On 2012/2/17 8:42, Julian Elischer wrote:
> Adding David Xu for his thoughts since he rewrote the code in question
> in revision 213098

I am trying to reproduce the problem. Do you have complete sample code to test?

Regards,
David Xu
|
On 2012/2/17 9:55, Julian Elischer wrote:
> switching the machine from TSC_low to ACPI-fast fixes the problem.
>
> I wonder why TSC-low is now bad to use..
> maybe the TSCs are not as well sychronised as they were in 8.x?
> maybe the pthreads code didn't get the memo about changing timers?
|
On 2/16/12 5:56 PM, David Xu wrote:
> I am trying to reproduce the problem, do you have complete sample
> code to test ?

I'm still looking for the exact set of conditions, but on my machine (4 CPUs) the program from ports, sysutils/fio, exhibits the problem when used with kern.timecounter.hardware=TSC-low and with the following config file:

pu05 # cat config.fio

[global]
#clocksource=cpu
direct=1
rw=randread
bs=4096
fill_device=1
numjobs=16
iodepth=16
#ioengine=posixaio
#ioengine=psync
ioengine=psync
group_reporting
norandommap
time_based
runtime=60000
randrepeat=0

[file1]
filename=/dev/ada0

pu05 # fio config.fio
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
...
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
fio 2.0.3
Starting 15 threads and 1 process
fio: job startup hung? exiting.
fio: 5 jobs failed to start
Segmentation fault (core dumped)
pu05#

The reason 5 jobs failed to start is that the parent timed out on them immediately.
It apparently didn't time out on 10 of them.

If I set the timer to ACPI-fast it works as expected.
|
On 2012/2/17 10:19, Julian Elischer wrote:
> if I set the timer to ACPI-fast it works as expected..

Maybe the following code can check whether TSC-low works, by letting the thread run on each CPU.

    gettimeofday(&prev, NULL);
    int cpu = 0;
    for (;;) {
        cpuset_t set;
        cpu = (cpu + 1) % 4;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        gettimeofday(&cur, NULL);
        if (timercmp(&prev, &cur, >=)) {
            abort();
        }
    }
|
On 2012/2/17 10:42, David Xu wrote:
> maybe following code can check to see if TSC-LOW works by let the
> thread run
> on each cpu.

Refreshed version:

    gettimeofday(&prev, NULL);
    int cpu = 0;
    for (;;) {
        cpuset_t set;
        cpu = (cpu + 1) % 4;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        gettimeofday(&cur, NULL);
        if (timercmp(&prev, &cur, >)) {
            abort();
        }
        prev = cur;
    }
|
adding jkim as he seems to be the last person working with TSC.
|
On 2/16/12 11:41 PM, Julian Elischer wrote:
> adding jkim as he seems to be the last person working with TSC.
>
> On 2/16/12 6:42 PM, David Xu wrote:
>> maybe following code can check to see if TSC-LOW works by let the
>> thread run
>> on each cpu.

pu05# sysctl kern.timecounter.hardware=TSC-low
kern.timecounter.hardware: ACPI-fast -> TSC-low
pu05# ./test
^C
pu05# cat test.c
#include <stdlib.h>
#include <sys/param.h>
#include <sys/cpuset.h>
#include <pthread.h>
#include <pthread_np.h>
#include <sys/time.h>

int
main(void)
{
    int cpu = 0;
    struct timeval prev, cur;

    gettimeofday(&prev, NULL);
    for (;;) {
        cpuset_t set;

        /* Pin the thread to the next of the 4 CPUs in turn. */
        cpu = (cpu + 1) % 4;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

        /* Abort if time went backwards after the move. */
        gettimeofday(&cur, NULL);
        if (timercmp(&prev, &cur, >)) {
            abort();
        }
        prev = cur;
    }
}
pu05# ./test
minutes pass.......
^C
pu05#

So it looks as if the TSC is working OK.
I'm just going to check that the program is actually moving between CPUs... yes, according to top it is moving around, but I can't tell at what speed.

So we are still left with the question of "where is the problem?"
kernel TSC driver?
generic gettimeofday() code?
pthreads cond code?
the application?
|
on 17/02/2012 03:55 Julian Elischer said the following:
> I wonder why TSC-low is now bad to use..
> maybe the TSCs are not as well sychronised as they were in 8.x?
> maybe the pthreads code didn't get the memo about changing timers?

More useful information that you can provide:
- C-states configuration
- CPU identification

I see that you've already contacted jkim, that's useful too.

--
Andriy Gapon
|
In reply to this post by Julian Elischer-5
On 2012/2/17 16:06, Julian Elischer wrote:
> so it looks as if the TSC is working ok..
> I'm just going to check that the program is actually moving CPU...
> yes it is moving around but I can't tell at what speed. (according to
> top).
>
> so we are still left with a question of "where is the problem?"
>
> kernel TSC driver?
> generic gettimeofday() code?
> pthreads cond code?
> the application?

I am running the fio test on my notebook, which is using TSC-low.
It is on 9.0-RC3. I cannot reproduce the problem after running for
minutes, so I interrupted it with ctrl-c:

http://people.freebsd.org/~davidxu/tsc_pthread/dmesg.txt
http://people.freebsd.org/~davidxu/tsc_pthread/tc.txt
http://people.freebsd.org/~davidxu/tsc_pthread/fio.txt
|
On Friday 17 February 2012 06:28 am, David Xu wrote:
> I am running the fio test on my notebook which is using TSC-low,
> it is on 9.0-RC3, I can not reproduce the problem for
> minutes, then I interrupt it with ctrl-c:
>
> http://people.freebsd.org/~davidxu/tsc_pthread/dmesg.txt
> http://people.freebsd.org/~davidxu/tsc_pthread/tc.txt
> http://people.freebsd.org/~davidxu/tsc_pthread/fio.txt

Your CPU is single-package, dual-core, and SMT-enabled. All cores
should be in perfect sync.

Jung-uk Kim
|
In reply to this post by Julian Elischer-5
On Thursday 16 February 2012 08:55 pm, Julian Elischer wrote:
> kern.timecounter.tick: 1
> kern.timecounter.choice: TSC-low(1000) i8254(0) HPET(950)
> ACPI-fast(900) dummy(-1000000)
> kern.timecounter.hardware: ACPI-fast
> kern.timecounter.stepwarnings: 0
>
> switching the machine from TSC_low to ACPI-fast fixes the problem.
>
> in 8.x it used to default to ACPI
> but I used to switch it to "TSC" to get better performance.
>
> I wonder why TSC-low is now bad to use..
> maybe the TSCs are not as well synchronised as they were in 8.x?

Can you please show us verbose dmesg output?

FYI, TSC and TSC-low are not very different. TSC-low is just a
lower-resolution version of TSC for SMP. The only difference is that we
have automated your timecounter choice, i.e., if the TSCs seem reasonably
well synchronized, TSC is selected by default but at a lower resolution.
In other words, if your TSC timecounter never went backwards previously,
the TSC-low timecounter won't either, guaranteed. So the root cause
should be somewhere else.

Jung-uk Kim
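The argument that a lower-resolution counter cannot go backwards if the full-resolution one never does can be illustrated with a few lines of C. This is a sketch under an explicit assumption: it models the reduced-resolution counter as the raw value with its low bits dropped by a right shift, and the helper names low_res(), is_nondecreasing() and shifted_is_nondecreasing() are invented here; it is not the kernel's timecounter code. Because x <= y implies (x >> k) <= (y >> k) for unsigned values, a non-decreasing raw sequence stays non-decreasing after the shift.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of a reduced-resolution counter: drop the low
 * 'shift' bits of the raw value.  'shift' must be less than 64.
 */
static uint64_t low_res(uint64_t raw, unsigned shift)
{
    return raw >> shift;
}

/* Returns 1 if the sequence never decreases, 0 otherwise. */
static int is_nondecreasing(const uint64_t *v, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (v[i] < v[i - 1])
            return 0;
    return 1;
}

/* Returns 1 if the sequence is non-decreasing after dropping the low bits. */
static int shifted_is_nondecreasing(const uint64_t *raw, size_t n,
                                    unsigned shift)
{
    for (size_t i = 1; i < n; i++)
        if (low_res(raw[i], shift) < low_res(raw[i - 1], shift))
            return 0;
    return 1;
}
```

The converse does not hold: a small backwards step in the raw values, such as 1000 followed by 900, becomes invisible once seven low bits are dropped, because both values map to 7. Under this model the lower resolution can only hide small discrepancies between cores, never create them.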
|
In reply to this post by David Xu
On 2/17/12 3:28 AM, David Xu wrote:
> I am running the fio test on my notebook which is using TSC-low,
> it is on 9.0-RC3, I can not reproduce the problem for
> minutes, then I interrupt it with ctrl-c:

looks mot

> http://people.freebsd.org/~davidxu/tsc_pthread/dmesg.txt
> http://people.freebsd.org/~davidxu/tsc_pthread/tc.txt
> http://people.freebsd.org/~davidxu/tsc_pthread/fio.txt

I have not been able to test this on a 9-RELEASE machine, just 9-stable.
|
In reply to this post by Jung-uk Kim
> On Friday 17 February 2012 06:28 am, David Xu wrote:
>> I am running the fio test on my notebook which is using TSC-low,
>> it is on 9.0-RC3, I can not reproduce the problem for
>> minutes, then I interrupt it with ctrl-c:
>>
>> http://people.freebsd.org/~davidxu/tsc_pthread/dmesg.txt
>> http://people.freebsd.org/~davidxu/tsc_pthread/tc.txt
>> http://people.freebsd.org/~davidxu/tsc_pthread/fio.txt
> Your CPU is single-package, dual-core, and SMT-enabled. All cores
> should be in perfect sync.
>
> Jung-uk Kim

mine is too, yet it still has problems..

CPU: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (2500.14-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x10676  Family = 6  Model = 17  Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant, performance statistics
real memory = 8589934592 (8192 MB)
avail memory = 8214368256 (7833 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <PTLTD APIC >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 2
 cpu3 (AP): APIC ID: 3
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
