pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

Julian Elischer-5
The program fio (an I/O test tool in ports) uses pthreads.

The following code (from fio-2.0.3, but it's in earlier versions too)
has suddenly started misbehaving.

         clock_gettime(CLOCK_REALTIME, &t);
         t.tv_sec += seconds + 10;

         pthread_mutex_lock(&mutex->lock);

         while (!mutex->value && !ret) {
                 mutex->waiters++;
                 ret = pthread_cond_timedwait(&mutex->cond, &mutex->lock, &t);
                 mutex->waiters--;
         }

         if (!ret) {
                 mutex->value--;
                 pthread_mutex_unlock(&mutex->lock);
         }


It turns out that 'ret' sometimes comes back instantly (on my machine)
with a value of 60 (ETIMEDOUT),
despite the fact that we set the timeout at least 10 seconds into the future.

Has anyone else seen anything like this?
(And yes, the condition variable attribute has been set to use the
REALTIME clock.)


_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-threads
To unsubscribe, send any mail to "[hidden email]"

Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

Andriy Gapon
on 15/02/2012 23:41 Julian Elischer said the following:

> The program fio (an IO test in ports) uses pthreads
>
> the following code (from fio-2.0.3, but its in earlier code too)
> has suddenly started misbehaving.
>
>         clock_gettime(CLOCK_REALTIME, &t);
>         t.tv_sec += seconds + 10;
>
>         pthread_mutex_lock(&mutex->lock);
>
>         while (!mutex->value && !ret) {
>                 mutex->waiters++;
>                 ret = pthread_cond_timedwait(&mutex->cond, &mutex->lock, &t);
>                 mutex->waiters--;
>         }
>
>         if (!ret) {
>                 mutex->value--;
>                 pthread_mutex_unlock(&mutex->lock);
>         }
>
>
> It turns out that 'ret' sometimes comes back instantly (on my machine) with a
> value of 60 (ETIMEDOUT)
> despite the fact that we set the timeout 10 seconds into the future.
>
> Has anyone else seen anything like this?
> (and yes the condition variable attribute have been set to use the REALTIME clock).

But why?

Just a hypothesis: maybe there is some issue with timekeeping on that system.
How does that code behave for you with CLOCK_MONOTONIC?

--
Andriy Gapon

Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

Julian Elischer-5
On 2/16/12 9:34 AM, Andriy Gapon wrote:

> on 15/02/2012 23:41 Julian Elischer said the following:
>> (and yes the condition variable attribute have been set to use the REALTIME clock).
> But why?
>
> Just a hypothesis that maybe there is some issue with time keeping on that system.
> How would that code work out for you with MONOTONIC?

Jens Axboe (CC'd) tried both CLOCK_REALTIME and CLOCK_MONOTONIC, and
they both had the same problem,
i.e. random early returns with ETIMEDOUT.

I think we will try moving our machine forward to a newer -stable to see
whether that resolves it.


Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

Jens Axboe
On 2012-02-16 22:06, Julian Elischer wrote:

> On 2/16/12 9:34 AM, Andriy Gapon wrote:
>> on 15/02/2012 23:41 Julian Elischer said the following:
>>> (and yes the condition variable attribute have been set to use the REALTIME clock).
>> But why?
>>
>> Just a hypothesis that maybe there is some issue with time keeping on that system.
>> How would that code work out for you with MONOTONIC?
>
> Jens Axboe, (CC'd) tried both CLOCK_REALTIME and CLOCK_MONOTONIC, and
> they both had the same problem..
> i.e. random early returns with ETIMEDOUT.

Yep indeed, using either MONOTONIC or REALTIME (and having set both with
pthread_condattr_setclock()), no change in behaviour.

--
Jens Axboe

Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

Julian Elischer-5
On 2/16/12 1:06 PM, Julian Elischer wrote:

> Jens Axboe, (CC'd) tried both CLOCK_REALTIME and CLOCK_MONOTONIC,
> and they both had the same problem..
> i.e. random early returns with ETIMEDOUT.
>
> I think we will try move out machine forward to a newer -stable to
> see if it resolves.
Kan upgraded the machine today to today's 9.x branch tip and the
problem still occurs.
8.x does not have this problem.

I do not have a 9-RELEASE machine to test on, so I cannot tell whether
this came in with the burst of changes
that went in after the 9.x branch was unfrozen following the release of 9.0.

Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

Julian Elischer-5
Adding David Xu for his thoughts, since he rewrote the code in question
in revision 213098.

Re: pthread_cond_timedwait() broken in 9-stable? [possible answer]

Julian Elischer-5

kern.timecounter.tick: 1
kern.timecounter.choice: TSC-low(1000) i8254(0) HPET(950) ACPI-fast(900) dummy(-1000000)
kern.timecounter.hardware: ACPI-fast
kern.timecounter.stepwarnings: 0

Switching the machine from TSC-low to ACPI-fast fixes the problem.

In 8.x it used to default to ACPI,
but I used to switch it to "TSC" to get better performance.

I wonder why TSC-low is now bad to use.
Maybe the TSCs are not as well synchronised as they were in 8.x?
Maybe the pthreads code didn't get the memo about changing timers?

Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

David Xu
On 2012/2/17 8:42, Julian Elischer wrote:

> Adding David Xu for his thoughts, since he rewrote the code in question
> in revision 213098.
I am trying to reproduce the problem. Do you have complete sample code
to test?

Regards,
David Xu

Re: pthread_cond_timedwait() broken in 9-stable? [possible answer]

David Xu
On 2012/2/17 9:55, Julian Elischer wrote:

>
> kern.timecounter.tick: 1
> kern.timecounter.choice: TSC-low(1000) i8254(0) HPET(950)
> ACPI-fast(900) dummy(-1000000)
> kern.timecounter.hardware: ACPI-fast
> kern.timecounter.stepwarnings: 0
>
> switching the machine from TSC_low to ACPI-fast  fixes the problem.
>
> in 8.x it used to default to ACPI
> but I used to switch it to "TSC" to get better performance.
>
> I wonder why TSC-low is now bad to use..
> maybe the TSCs are not as well sychronised as they were in 8.x?
> maybe the pthreads code didn't get the memo about changing timers?
>
The pthread code does not know the timer setting, same as other code in the kernel. ;-)


Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

Julian Elischer-5
On 2/16/12 5:56 PM, David Xu wrote:

> I am trying to reproduce the problem,  do you have complete sample
> code to test ?

I'm still working out the exact set of conditions,
but on my machine (4 CPUs) the program from ports, sysutils/fio,
exhibits the problem when run with
kern.timecounter.hardware=TSC-low and the following config file:

pu05 # cat config.fio

[global]
#clocksource=cpu
direct=1
rw=randread
bs=4096
fill_device=1
numjobs=16
iodepth=16
#ioengine=posixaio
#ioengine=psync
ioengine=psync
group_reporting
norandommap
time_based
runtime=60000
randrepeat=0

[file1]
filename=/dev/ada0

pu05 #
pu05 # fio config.fio
fio: this platform does not support process shared mutexes, forcing
use of threads. Use the 'thread' option to get rid of this warning.
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
...
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
fio 2.0.3
Starting 15 threads and 1 process
fio: job startup hung? exiting.
fio: 5 jobs failed to start
Segmentation fault (core dumped)
pu05#


The reason 5 jobs failed to start is that the parent timed out on
them immediately.
Apparently it didn't time out on 10 of them.


If I set the timecounter to ACPI-fast it works as expected.

Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

David Xu
On 2012/2/17 10:19, Julian Elischer wrote:

> if I set the timer to ACPI-fast it works as expected..
Maybe the following code can check whether TSC-low works, by letting the thread run
on each CPU in turn.

struct timeval prev, cur;
int cpu = 0;

gettimeofday(&prev, NULL);
for (;;) {
      cpuset_t set;

      /* Move to the next CPU (4 CPUs, as on your machine). */
      cpu = (cpu + 1) % 4;
      CPU_ZERO(&set);
      CPU_SET(cpu, &set);
      pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
      gettimeofday(&cur, NULL);
      /* Abort if this CPU reads a time at or before the starting time. */
      if (timercmp(&prev, &cur, >=))
            abort();
}

Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

David Xu
On 2012/2/17 10:42, David Xu wrote:
> maybe following code can check to see if TSC-LOW works by let the thread run
> on each cpu.
>
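Revised version:

struct timeval prev, cur;
int cpu = 0;

gettimeofday(&prev, NULL);
for (;;) {
      cpuset_t set;

      /* Move to the next CPU (4 CPUs, as on the reporter's machine). */
      cpu = (cpu + 1) % 4;
      CPU_ZERO(&set);
      CPU_SET(cpu, &set);
      pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
      gettimeofday(&cur, NULL);
      /* Time must never go backwards from one CPU to the next. */
      if (timercmp(&prev, &cur, >))
            abort();
      prev = cur;
}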

Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

Julian Elischer-5
Adding jkim, as he seems to be the last person working with TSC.


Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

Julian Elischer-5
On 2/16/12 11:41 PM, Julian Elischer wrote:

> adding jkim as he seems to be the last person working with TSC.
>
>
> On 2/16/12 6:42 PM, David Xu wrote:
>> On 2012/2/17 10:19, Julian Elischer wrote:
>>> On 2/16/12 5:56 PM, David Xu wrote:
>>>> On 2012/2/17 8:42, Julian Elischer wrote:
>>>>> Adding David Xu for his thoughts since he reqrote the code in
>>>>> quesiton in revision 213098
>>>>>
>>>>> On 2/16/12 2:57 PM, Julian Elischer wrote:
>>>>>> On 2/16/12 1:06 PM, Julian Elischer wrote:
>>>>>>> On 2/16/12 9:34 AM, Andriy Gapon wrote:
>>>>>>>> on 15/02/2012 23:41 Julian Elischer said the following:
>>>>>>>>> The program fio (an IO test in ports) uses pthreads
>>>>>>>>>
>>>>>>>>> the following code (from fio-2.0.3, but its in earlier code
>>>>>>>>> too)
>>>>>>>>> has suddenly started misbehaving.
>>>>>>>>>
>>>>>>>>>          clock_gettime(CLOCK_REALTIME,&t);
>>>>>>>>>          t.tv_sec += seconds + 10;
>>>>>>>>>
>>>>>>>>>          pthread_mutex_lock(&mutex->lock);
>>>>>>>>>
>>>>>>>>>          while (!mutex->value&&  !ret) {
>>>>>>>>>                  mutex->waiters++;
>>>>>>>>>                  ret =
>>>>>>>>> pthread_cond_timedwait(&mutex->cond,&mutex->lock,&t);
>>>>>>>>>                  mutex->waiters--;
>>>>>>>>>          }
>>>>>>>>>
>>>>>>>>>          if (!ret) {
>>>>>>>>>                  mutex->value--;
>>>>>>>>>                  pthread_mutex_unlock(&mutex->lock);
>>>>>>>>>          }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It turns out that 'ret' sometimes comes back instantly (on
>>>>>>>>> my machine) with a
>>>>>>>>> value of 60 (ETIMEDOUT)
>>>>>>>>> despite the fact that we set the timeout 10 seconds into the
>>>>>>>>> future.
>>>>>>>>>
>>>>>>>>> Has anyone else seen anything like this?
>>>>>>>>> (and yes the condition variable attribute have been set to
>>>>>>>>> use the REALTIME clock).
>>>>>>>> But why?
>>>>>>>>
>>>>>>>> Just a hypothesis that maybe there is some issue with time
>>>>>>>> keeping on that system.
>>>>>>>> How would that code work out for you with MONOTONIC?
>>>>>>>
>>>>>>> Jens Axboe, (CC'd) tried both CLOCK_REALTIME and
>>>>>>> CLOCK_MONOTONIC, and they both had the same problem..
>>>>>>> i.e. random early returns with ETIMEDOUT.
>>>>>>>
>>>>>>> I think we will try moving our machine forward to a newer
>>>>>>> -stable to see if it resolves.
>>>>>> Kan upgraded the machine today to today's 9.x branch tip and
>>>>>> the problem still occurs.
>>>>>> 8.x does not have this problem.
>>>>>>
>>>>>> I have not got a 9-RELEASE machine to test on.. so I can not
>>>>>> tell if this came in with the burst of stuff
>>>>>> that came in after the 9.x branch was unfrozen after the
>>>>>> release of 9.0.
>>>>>>
>>>>>>
>>>>>
>>>> I am trying to reproduce the problem,  do you have complete
>>>> sample code to test ?
>>>
>>> I'm still looking for the exact set
>>> but on my machine (4 cpus) the program from ports sysutils/fio
>>> exhibits the problem when used with
>>> kern.timecounter.hardware=TSC-low and with the following config file:
>>>
>>> pu05 # cat config.fio
>>>
>>> [global]
>>> #clocksource=cpu
>>> direct=1
>>> rw=randread
>>> bs=4096
>>> fill_device=1
>>> numjobs=16
>>> iodepth=16
>>> #ioengine=posixaio
>>> #ioengine=psync
>>> ioengine=psync
>>> group_reporting
>>> norandommap
>>> time_based
>>> runtime=60000
>>> randrepeat=0
>>>
>>> [file1]
>>> filename=/dev/ada0
>>>
>>> pu05 #
>>> pu05 # fio config.fio
>>> fio: this platform does not support process shared mutexes,
>>> forcing use of threads. Use the 'thread' option to get rid of this
>>> warning.
>>> file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
>>> ...
>>> file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync, iodepth=16
>>> fio 2.0.3
>>> Starting 15 threads and 1 process
>>> fio: job startup hung? exiting.
>>> fio: 5 jobs failed to start
>>> Segmentation fault (core dumped)
>>> pu05#
>>>
>>>
>>> The reason 5 jobs failed to start is because the parent timed out
>>> on them immediately.
>>> It didn't time out on 10 of them apparently.
>>>
>>>
>>> if I set the timer to ACPI-fast it works as expected..
>> maybe the following code can check whether TSC-low works by letting
>> the thread run on each cpu.
>>
>> gettimeofday(&prev, NULL);
>> int cpu = 0;
>> for (;;) {
>>      cpuset_t set;
>>      cpu = ++cpu % 4;
>>      CPU_ZERO(&set);
>>      CPU_SET(cpu, &set);
>>      pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
>>      gettimeofday(&cur, NULL);
>>      if ( timercmp(&prev, &cur, >=)) {
>>         abort();
>>    }
>> }
>>
>>

pu05# sysctl kern.timecounter.hardware=TSC-low
kern.timecounter.hardware: ACPI-fast -> TSC-low
pu05# ./test
^C
pu05# cat test.c

#include <stdlib.h>
#include <sys/param.h>
#include <sys/cpuset.h>
#include <pthread.h>
#include <pthread_np.h>

#include <sys/time.h>

int
main(void)
{
     int cpu = 0;
     struct timeval prev, cur;

     gettimeofday(&prev, NULL);
     for (;;) {
          cpuset_t set;

          /* rotate over the 4 CPUs of this machine */
          cpu = (cpu + 1) % 4;
          CPU_ZERO(&set);
          CPU_SET(cpu, &set);
          pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
          gettimeofday(&cur, NULL);
          /* abort if time went backwards after moving to another CPU */
          if (timercmp(&prev, &cur, >)) {
               abort();
          }
          prev = cur;
     }
}

pu05# ./test

minutes pass.......

^C
pu05#

So it looks as if the TSC is working OK.
I also checked that the program is actually moving between CPUs.
According to top it is moving around, but I can't tell how fast.

so we are still left with a question of "where is the problem?"

kernel TSC driver?
generic gettimeofday() code?
pthreads cond code?
the application?






Re: pthread_cond_timedwait() broken in 9-stable? [possible answer]

Andriy Gapon
In reply to this post by Julian Elischer-5
on 17/02/2012 03:55 Julian Elischer said the following:

>
> kern.timecounter.tick: 1
> kern.timecounter.choice: TSC-low(1000) i8254(0) HPET(950) ACPI-fast(900)
> dummy(-1000000)
> kern.timecounter.hardware: ACPI-fast
> kern.timecounter.stepwarnings: 0
>
> switching the machine from TSC-low to ACPI-fast fixes the problem.
>
> in 8.x it used to default to ACPI
> but I used to switch it to "TSC" to get better performance.
>
> I wonder why TSC-low is now bad to use..
> maybe the TSCs are not as well synchronised as they were in 8.x?
> maybe the pthreads code didn't get the memo about changing timers?

More useful information that you can provide:
- C-states configuration
- CPU identification

I see that you've already contacted jkim, that's useful too.

--
Andriy Gapon

Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

David Xu
In reply to this post by Julian Elischer-5
On 2012/2/17 16:06, Julian Elischer wrote:

> so it looks as if the TSC is working ok..
> I'm just going to check that the program is actually moving CPU...
> yes it is moving around but I can't tell at what speed. (according to
> top).
>
> so we are still left with a question of "where is the problem?"
>
> kernel TSC driver?
> generic gettimeofday() code?
> pthreads cond code?
> the application?
>
>
I ran the fio test on my notebook, which is using TSC-low and is
on 9.0-RC3. I could not reproduce the problem after several
minutes, so I interrupted it with ctrl-c:

http://people.freebsd.org/~davidxu/tsc_pthread/dmesg.txt
http://people.freebsd.org/~davidxu/tsc_pthread/tc.txt
http://people.freebsd.org/~davidxu/tsc_pthread/fio.txt




Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

Jung-uk Kim
On Friday 17 February 2012 06:28 am, David Xu wrote:

> I am running the fio test on my notebook which is using TSC-low,
> it is on 9.0-RC3, I can not reproduce the problem for
> minutes, then I interrupt it with ctrl-c:
>
> http://people.freebsd.org/~davidxu/tsc_pthread/dmesg.txt
> http://people.freebsd.org/~davidxu/tsc_pthread/tc.txt
> http://people.freebsd.org/~davidxu/tsc_pthread/fio.txt

Your CPU is single-package, dual-core, and SMT-enabled.  All cores
should be in perfect sync.

Jung-uk Kim

Re: pthread_cond_timedwait() broken in 9-stable? [possible answer]

Jung-uk Kim
In reply to this post by Julian Elischer-5
On Thursday 16 February 2012 08:55 pm, Julian Elischer wrote:

> kern.timecounter.tick: 1
> kern.timecounter.choice: TSC-low(1000) i8254(0) HPET(950)
> ACPI-fast(900) dummy(-1000000)
> kern.timecounter.hardware: ACPI-fast
> kern.timecounter.stepwarnings: 0
>
> switching the machine from TSC-low to ACPI-fast fixes the problem.
>
> in 8.x it used to default to ACPI
> but I used to switch it to "TSC" to get better performance.
>
> I wonder why TSC-low is now bad to use..
> maybe the TSCs are not as well synchronised as they were in 8.x?

Can you please show us verbose dmesg output?

FYI, TSC and TSC-low are not very different.  TSC-low is just a
lower-resolution version of TSC for SMP.  The only difference is that
we have automated your timecounter choice, i.e., if the TSCs seem
reasonably well-synchronized, TSC-low is selected by default, but with
lower resolution.  In other words, if your TSC timecounter never went
backwards previously, the TSC-low timecounter won't either,
guaranteed.  So, the root cause should be somewhere else.

Jung-uk Kim
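
The guarantee described above can be illustrated with a small sketch. It assumes that the lower-resolution counter is simply the raw counter with its low-order bits shifted out. The shift amount TSC_LOW_SHIFT and both function names are illustrative, not taken from the kernel source.

```c
#include <stdint.h>

/* Assumed for illustration: drop the 8 low-order bits of the counter. */
#define TSC_LOW_SHIFT 8

/* Lower-resolution view of a raw counter value. */
uint64_t tsc_low(uint64_t tsc)
{
    return tsc >> TSC_LOW_SHIFT;
}

/*
 * Return 1 if the lower-resolution view of the n samples never steps
 * backwards from one sample to the next, 0 otherwise.
 */
int low_series_never_decreases(const uint64_t *tsc, int n)
{
    for (int i = 1; i < n; i++) {
        if (tsc_low(tsc[i]) < tsc_low(tsc[i - 1]))
            return 0;
    }
    return 1;
}
```

Because a right shift preserves ordering (a <= b implies a >> k <= b >> k), any series that is non-decreasing at full resolution stays non-decreasing after the shift. In this sketch a backward step smaller than 2^TSC_LOW_SHIFT ticks can even disappear, for example 300 followed by 260 both map to 1.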

Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

Julian Elischer-5
In reply to this post by David Xu
On 2/17/12 3:28 AM, David Xu wrote:

> I am running the fio test on my notebook which is using TSC-low,
> it is on 9.0-RC3, I can not reproduce the problem for
> minutes, then I interrupt it with ctrl-c:
>
> http://people.freebsd.org/~davidxu/tsc_pthread/dmesg.txt
> http://people.freebsd.org/~davidxu/tsc_pthread/tc.txt
> http://people.freebsd.org/~davidxu/tsc_pthread/fio.txt
>
>
looks normal to me.
I have not been able to test this on a 9-RELEASE machine, just 9-stable.



Re: pthread_cond_timedwait() broken in 9-stable? (from JAN 10)

Julian Elischer-5
In reply to this post by Jung-uk Kim
> On Friday 17 February 2012 06:28 am, David Xu wrote:
>> On 2012/2/17 16:06, Julian Elischer wrote:
>>> On 2/16/12 11:41 PM, Julian Elischer wrote:
>>>> adding jkim as he seems to be the last person working with TSC.
>>>>
>>>> On 2/16/12 6:42 PM, David Xu wrote:
>>>>> On 2012/2/17 10:19, Julian Elischer wrote:
>>>>>> On 2/16/12 5:56 PM, David Xu wrote:
>>>>>>> On 2012/2/17 8:42, Julian Elischer wrote:
>>>>>>>> Adding David Xu for his thoughts since he reqrote the code
>>>>>>>> in quesiton in revision 213098
>>>>>>>>
>>>>>>>> On 2/16/12 2:57 PM, Julian Elischer wrote:
>>>>>>>>> On 2/16/12 1:06 PM, Julian Elischer wrote:
>>>>>>>>>> On 2/16/12 9:34 AM, Andriy Gapon wrote:
>>>>>>>>>>> on 15/02/2012 23:41 Julian Elischer said the following:
>>>>>>>>>>>> The program fio (an IO test in ports) uses pthreads
>>>>>>>>>>>>
>>>>>>>>>>>> the following code (from fio-2.0.3, but its in earlier
>>>>>>>>>>>> code too) has suddenly started misbehaving.
>>>>>>>>>>>>
>>>>>>>>>>>>           clock_gettime(CLOCK_REALTIME,&t);
>>>>>>>>>>>>           t.tv_sec += seconds + 10;
>>>>>>>>>>>>
>>>>>>>>>>>>           pthread_mutex_lock(&mutex->lock);
>>>>>>>>>>>>
>>>>>>>>>>>>           while (!mutex->value&&   !ret) {
>>>>>>>>>>>>                   mutex->waiters++;
>>>>>>>>>>>>                   ret =
>>>>>>>>>>>> pthread_cond_timedwait(&mutex->cond,&mutex->lock,&t);
>>>>>>>>>>>>                   mutex->waiters--;
>>>>>>>>>>>>           }
>>>>>>>>>>>>
>>>>>>>>>>>>           if (!ret) {
>>>>>>>>>>>>                   mutex->value--;
>>>>>>>>>>>>                   pthread_mutex_unlock(&mutex->lock);
>>>>>>>>>>>>           }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> It turns out that 'ret' sometimes comes back instantly
>>>>>>>>>>>> (on my machine) with a
>>>>>>>>>>>> value of 60 (ETIMEDOUT)
>>>>>>>>>>>> despite the fact that we set the timeout 10 seconds into
>>>>>>>>>>>> the future.
>>>>>>>>>>>>
>>>>>>>>>>>> Has anyone else seen anything like this?
>>>>>>>>>>>> (and yes the condition variable attribute have been set
>>>>>>>>>>>> to use the REALTIME clock).
>>>>>>>>>>> But why?
>>>>>>>>>>>
>>>>>>>>>>> Just a hypothesis that maybe there is some issue with
>>>>>>>>>>> time keeping on that system.
>>>>>>>>>>> How would that code work out for you with MONOTONIC?
>>>>>>>>>> Jens Axboe, (CC'd) tried both CLOCK_REALTIME and
>>>>>>>>>> CLOCK_MONOTONIC, and they both had the same problem..
>>>>>>>>>> i.e. random early returns with ETIMEDOUT.
>>>>>>>>>>
>>>>>>>>>> I think we will try move out machine forward to a newer
>>>>>>>>>> -stable to see if it resolves.
>>>>>>>>> Kan upgraded the machine today to today's 9.x branch tip
>>>>>>>>> and the problem still occurs.
>>>>>>>>> 8.x does not have this problem.
>>>>>>>>>
>>>>>>>>> I have not got a 9-RELEASE machine to test on.. so I can
>>>>>>>>> not tell if this came in with the burst of stuff
>>>>>>>>> that came in after the 9.x branch was unfrozen after the
>>>>>>>>> release of 9.0.
>>>>>>> I am trying to reproduce the problem,  do you have complete
>>>>>>> sample code to test ?
>>>>>> I'm still looking the exact set
>>>>>> but on my machine (4 cpus) the program from ports sysutils/fio
>>>>>> exhibits the problem when used with
>>>>>> kern.timecounter.hardware=TSC-low and with the following
>>>>>> config file:
>>>>>>
>>>>>> pu05 # cat config.fio
>>>>>>
>>>>>> [global]
>>>>>> #clocksource=cpu
>>>>>> direct=1
>>>>>> rw=randread
>>>>>> bs=4096
>>>>>> fill_device=1
>>>>>> numjobs=16
>>>>>> iodepth=16
>>>>>> #ioengine=posixaio
>>>>>> #ioengine=psync
>>>>>> ioengine=psync
>>>>>> group_reporting
>>>>>> norandommap
>>>>>> time_based
>>>>>> runtime=60000
>>>>>> randrepeat=0
>>>>>>
>>>>>> [file1]
>>>>>> filename=/dev/ada0
>>>>>>
>>>>>> pu05 #
>>>>>> pu05 # fio config.fio
>>>>>> fio: this platform does not support process shared mutexes,
>>>>>> forcing use of threads. Use the 'thread' option to get rid of
>>>>>> this warning. file1: (g=0): rw=randread, bs=4K-4K/4K-4K,
>>>>>> ioengine=psync, iodepth=16 ...
>>>>>> file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync,
>>>>>> iodepth=16 fio 2.0.3
>>>>>> Starting 15 threads and 1 process
>>>>>> fio: job startup hung? exiting.
>>>>>> fio: 5 jobs failed to start
>>>>>> Segmentation fault (core dumped)
>>>>>> pu05#
>>>>>>
>>>>>>
>>>>>> The reason 5 jobs failed to start is because the parent timed
>>>>>> out on them immediately.
>>>>>> It didn't time out on 10 of them apparently.
>>>>>>
>>>>>>
>>>>>> if I set the timer to ACPI-fast it works as expected..
>>>>> maybe following code can check to see if TSC-LOW works by let
>>>>> the thread run
>>>>> on each cpu.
>>>>>
>>>>> gettimeofday(&prev, NULL);
>>>>> int cpu = 0;
>>>>> for (;;) {
>>>>>       cpuset_t set;
>>>>>       cpu = ++cpu % 4;
>>>>>       CPU_ZERO(&set);
>>>>>       CPU_SET(cpu,&set);
>>>>>       pthread_setaffinity_np(pthread_self(), sizeof(set),&set);
>>>>>       gettimeofday(&cur, NULL);
>>>>>       if ( timercmp(&prev,&cur,>=)) {
>>>>>          abort();
>>>>>     }
>>>>> }
>>> pu05# sysctl kern.timecounter.hardware=TSC-low
>>> kern.timecounter.hardware: ACPI-fast ->  TSC-low
>>> pu05# ./test
>>> ^C
>>> pu05# cat test.c
>>>
>>> #include <sys/param.h>
>>> #include <sys/cpuset.h>
>>> #include <sys/time.h>
>>>
>>> #include <pthread.h>
>>> #include <pthread_np.h>
>>> #include <stdlib.h>
>>>
>>> int
>>> main(void)
>>> {
>>>      int cpu = 0;
>>>      struct timeval prev, cur;
>>>
>>>      gettimeofday(&prev, NULL);
>>>      for (;;) {
>>>           cpuset_t set;
>>>
>>>           /* Hop to the next of the 4 CPUs before each reading. */
>>>           cpu = (cpu + 1) % 4;
>>>           CPU_ZERO(&set);
>>>           CPU_SET(cpu, &set);
>>>           pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
>>>           gettimeofday(&cur, NULL);
>>>           /* Abort if time went backwards across the CPU switch. */
>>>           if (timercmp(&prev, &cur, >))
>>>                abort();
>>>           prev = cur;
>>>      }
>>> }
>>>
>>> pu05# ./test
>>>
>>> minutes pass.......
>>>
>>> ^C
>>> pu05#
>>>
>>> So it looks as if the TSC is working OK.
>>> I just checked that the program is actually moving between
>>> CPUs: according to top it is, but I can't tell at what rate.
>>>
>>> so we are still left with a question of "where is the problem?"
>>>
>>> kernel TSC driver?
>>> generic gettimeofday() code?
>>> pthreads cond code?
>>> the application?
>> I ran the fio test on my notebook, which uses TSC-low and runs
>> 9.0-RC3. I could not reproduce the problem after several
>> minutes, so I interrupted it with ctrl-c:
>>
>> http://people.freebsd.org/~davidxu/tsc_pthread/dmesg.txt
>> http://people.freebsd.org/~davidxu/tsc_pthread/tc.txt
>> http://people.freebsd.org/~davidxu/tsc_pthread/fio.txt
> Your CPU is single-package, dual-core, and SMT-enabled.  All cores
> should be in perfect sync.
>
> Jung-uk Kim
>

Mine is too, yet it still has problems.
CPU: Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz (2500.14-MHz K8-class CPU)
   Origin = "GenuineIntel"  Id = 0x10676  Family = 6  Model = 17  Stepping = 6
   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
   Features2=0xce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1>
   AMD Features=0x20100800<SYSCALL,NX,LM>
   AMD Features2=0x1<LAHF>
   TSC: P-state invariant, performance statistics
real memory  = 8589934592 (8192 MB)
avail memory = 8214368256 (7833 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <PTLTD          APIC >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
  cpu0 (BSP): APIC ID:  0
  cpu1 (AP): APIC ID:  1
  cpu2 (AP): APIC ID:  2
  cpu3 (AP): APIC ID:  3
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
