|
Hi,

the KDE team is seeing some strange problems with the new version (4.8.1) of devel/dbus-qt4 on current. It does work on stable. I also suspect that the problem described below is affecting the experimental cinnamon port (an alternative to gnome3, possible replacement of gnome2).

The problem happens on both i386 and amd64 with an empty /etc/malloc.conf and a simple /etc/make.conf. Everything was compiled with base gcc (no clang). The kernel was compiled with no debug support, but I can enable it if needed. There are reports from [hidden email] of the same behavior with a clang-compiled world and kernel and with MALLOC_PRODUCTION=yes.

When qdbus starts, it segfaults. The backtrace of the problem with r234769 can be found here: http://pastebin.com/ryBXtqGF. When starting the qdbus daemon by hand in an X+twm session, we see it call calloc many times and segfault after a fixed number of calls. We see it segfault at rb_gen (a quite large macro defined at $SRC_BASE/contrib/jemalloc/include/jemalloc/internal/rb.h).

If the daemon is started by hand, I'm able to skip all the calls qdbus makes to calloc until the one causing the segfault. At that point, at rb_gen, we don't know exactly what is going on or how to debug the macro. Ktrace dumps are available, but we were unable to find anything new in them.

With old versions of current before the jemalloc imports (as of March 30th) the daemon segfaulted at malloc.c:2426. With revisions from April 20th to 24th (I can be more precise; it was during the jemalloc imports) the daemon segfaulted at malloc_init. Backtraces are available if needed, and if necessary I can go back to those revisions and recompile world+kernel to see its behavior.

Any help from freebsd-current@ (perhaps Jason Evans can help us) will be appreciated. Any additional info, like source revisions, can be provided. I would like to stress that the experimental devel/dbus-qt4 works fine with recent stable.
_______________________________________________ [hidden email] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "[hidden email]" |
|
Hi,
Please install valgrind and run the program inside valgrind. See what kind of errors it generates.

Adrian |
|
In reply to this post by Gustau Pérez i Querol-2
On Apr 30, 2012, at 7:13 AM, Gustau Pérez i Querol wrote:
> the KDE team is seeing some strange problems with the new version (4.8.1) of devel/dbus-qt4 on current. It does work on stable. I also suspect that the problem described below is affecting the experimental cinnamon port (an alternative to gnome3, possible replacement of gnome2).
>
> The problem happens on both i386 and amd64 with an empty /etc/malloc.conf and a simple /etc/make.conf. Everything was compiled with base gcc (no clang). The kernel was compiled with no debug support, but I can enable it if needed. There are reports from [hidden email] of the same behavior with a clang-compiled world and kernel and with MALLOC_PRODUCTION=yes.
>
> When qdbus starts, it segfaults. The backtrace of the problem with r234769 can be found here: http://pastebin.com/ryBXtqGF. When starting the qdbus daemon by hand in an X+twm session, we see it call calloc many times and segfault after a fixed number of calls. We see it segfault at rb_gen (a quite large macro defined at $SRC_BASE/contrib/jemalloc/include/jemalloc/internal/rb.h).
>
> If the daemon is started by hand, I'm able to skip all the calls qdbus makes to calloc until the one causing the segfault. At that point, at rb_gen, we don't know exactly what is going on or how to debug the macro. Ktrace dumps are available, but we were unable to find anything new in them.
>
> With old versions of current before the jemalloc imports (as of March 30th) the daemon segfaulted at malloc.c:2426. With revisions from April 20th to 24th (I can be more precise; it was during the jemalloc imports) the daemon segfaulted at malloc_init. Backtraces are available if needed, and if necessary I can go back to those revisions and recompile world+kernel to see its behavior.
>
> Any help from freebsd-current@ (perhaps Jason Evans can help us) will be appreciated. Any additional info, like source revisions, can be provided. I would like to stress that the experimental devel/dbus-qt4 works fine with recent stable.

The crash is happening in page run management, so there is some pretty bad memory corruption going on by the time of the crash. If I understand you correctly, you have reproduced the crash on a system that does *not* have MALLOC_PRODUCTION defined, which means that none of the assertions in jemalloc caught the problem.

Adrian Chadd made the excellent suggestion of trying valgrind; it's likely to point out the problem almost immediately. If that doesn't work, the utrace functionality in malloc may help you figure out what activity has occurred by the time of the crash, and give you a better understanding of what happened to memory around the address that is involved in the crash.

Jason |
|
Al 30/04/2012 21:34, En/na Jason Evans ha escrit:
> On Apr 30, 2012, at 7:13 AM, Gustau Pérez i Querol wrote:
>> the KDE team is seeing some strange problems with the new version (4.8.1) of devel/dbus-qt4 on current. It does work on stable. I also suspect that the problem described below is affecting the experimental cinnamon port (an alternative to gnome3, possible replacement of gnome2).
>>
>> The problem happens on both i386 and amd64 with an empty /etc/malloc.conf and a simple /etc/make.conf. Everything was compiled with base gcc (no clang). The kernel was compiled with no debug support, but I can enable it if needed. There are reports from [hidden email] of the same behavior with a clang-compiled world and kernel and with MALLOC_PRODUCTION=yes.
>>
>> When qdbus starts, it segfaults. The backtrace of the problem with r234769 can be found here: http://pastebin.com/ryBXtqGF. When starting the qdbus daemon by hand in an X+twm session, we see it call calloc many times and segfault after a fixed number of calls. We see it segfault at rb_gen (a quite large macro defined at $SRC_BASE/contrib/jemalloc/include/jemalloc/internal/rb.h).
>>
>> If the daemon is started by hand, I'm able to skip all the calls qdbus makes to calloc until the one causing the segfault. At that point, at rb_gen, we don't know exactly what is going on or how to debug the macro. Ktrace dumps are available, but we were unable to find anything new in them.
>>
>> With old versions of current before the jemalloc imports (as of March 30th) the daemon segfaulted at malloc.c:2426. With revisions from April 20th to 24th (I can be more precise; it was during the jemalloc imports) the daemon segfaulted at malloc_init. Backtraces are available if needed, and if necessary I can go back to those revisions and recompile world+kernel to see its behavior.
>>
>> Any help from freebsd-current@ (perhaps Jason Evans can help us) will be appreciated. Any additional info, like source revisions, can be provided. I would like to stress that the experimental devel/dbus-qt4 works fine with recent stable.
>
> The crash is happening in page run management, so there is some pretty bad memory corruption going on by the time of the crash. If I understand you correctly, you have reproduced the crash on a system that does *not* have MALLOC_PRODUCTION defined, which means that none of the assertions in jemalloc caught the problem.
>
> Adrian Chadd made the excellent suggestion of trying valgrind; it's likely to point out the problem almost immediately. If that doesn't work, the utrace functionality in malloc may help you figure out what activity has occurred by the time of the crash, and give you a better understanding of what happened to memory around the address that is involved in the crash.

Thanks all for your suggestions.

It would appear devel/dbus-qt4 has some problems with thread management: the daemon starts a lot of threads, which ends with it being killed due to stack exhaustion. Valgrind suggested increasing the stack size; doing so made things even worse. The qdbus daemon was able to spawn even more threads, causing the machine to need more memory than is physically available (that is, it started to use swap).

So the problem seems not to be related to jemalloc or malloc. As the experimental 4.8.1 devel/dbus-qt4 port works fine in stable, the problem has to do with some differences between head and stable. When we get more hints about where the problem is, I will post them in a new thread on freebsd-current@.

Anyhow, thanks again for your suggestions!

Gus |
|
On Tue, May 1, 2012 at 8:18 PM, Gustau Pérez i Querol
<[hidden email]> wrote:
> So the problem seems to be not related to jemalloc or malloc. As the
> experimental 4.8.1 devel/dbus-qt4 port works fine in stable, the problem has
> to do with some differences between head and stable. When we get more hints
> where the problem is, I will post them in a new thread in freebsd-current@.

Gus has been away for a while, but before disappearing he found a workaround: building devel/dbus-qt4 with -fno-use-cxa-atexit. So I had a look around, and found this NetBSD bug report:
http://www.archivum.info/fa.netbsd.bugs/2007-12/00070/lib-37654-libc's-atexit_mutex-should-be-fully-recursive.html

Since qdbus crashes after exit(3) here too, that might be an explanation. Or, at least, something related.

kib@ and kan@ are CCed as per avg@'s suggestion.

--
Alberto Villa, FreeBSD committer <[hidden email]>
http://people.FreeBSD.org/~avilla |
|
On Fri, May 18, 2012 at 07:01:25PM +0200, Alberto Villa wrote:
> On Tue, May 1, 2012 at 8:18 PM, Gustau Pérez i Querol
> <[hidden email]> wrote:
> > So the problem seems to be not related to jemalloc or malloc. As the
> > experimental 4.8.1 devel/dbus-qt4 port works fine in stable, the problem has
> > to do with some differences between head and stable. When we get more hints
> > where the problem is, I will post them in a new thread in freebsd-current@.
>
> Gus has been away for a while, but before disappearing he found a
> workaround to be building devel/dbus-qt4 with -fno-use-cxa-atexit. So
> I had a look around, and found this NetBSD bug report:
> http://www.archivum.info/fa.netbsd.bugs/2007-12/00070/lib-37654-libc's-atexit_mutex-should-be-fully-recursive.html
>
> Since qdbus crashes after exit(3) here too, that might be an
> explanation. Or, at least, something related.
>
> kib@ and kan@ are CCed as per avg@ suggestion.

The reference to NetBSD is completely meaningless: we drop atexit_mutex when calling registered atexit handlers.

At least bother to provide a useful bug report if you suspect a bug in the base system and want it fixed. |
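To illustrate the point about dropping the mutex, here is a minimal sketch of a LIFO handler stack that releases its (non-recursive) lock while a handler runs, so a handler can register another handler without deadlocking. This is not the libc code: `HandlerStack` and `demo_call_order` are hypothetical names made up for the illustration, and the real atexit.c uses fixed-size chunks rather than a vector.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an atexit-style registry (not the libc code).
class HandlerStack {
public:
    // Register a handler; takes the same mutex that run_all() uses.
    void add(std::function<void()> fn) {
        std::lock_guard<std::mutex> guard(mutex_);
        handlers_.push_back(std::move(fn));
    }

    // Call handlers in LIFO order, dropping the lock around each call so
    // that a handler may call add() even though the mutex is not recursive.
    void run_all() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!handlers_.empty()) {
            std::function<void()> fn = std::move(handlers_.back());
            handlers_.pop_back();
            lock.unlock();  // the handler runs without the registry lock held
            fn();
            lock.lock();
        }
    }

private:
    std::mutex mutex_;
    std::vector<std::function<void()>> handlers_;
};

// Registers B, then A; A registers C while it is running.
std::vector<std::string> demo_call_order() {
    HandlerStack stack;
    std::vector<std::string> order;
    stack.add([&] { order.push_back("B"); });
    stack.add([&] {
        order.push_back("A");
        stack.add([&] { order.push_back("C"); });  // re-entrant registration
    });
    stack.run_all();
    assert(order.size() == 3);
    return order;
}
```

In this sketch the handler registered during the run (C) is called right after the handler that registered it, before the older handler B.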
|
In reply to this post by Alberto Villa-3
on 18/05/2012 20:01 Alberto Villa said the following:
> On Tue, May 1, 2012 at 8:18 PM, Gustau Pérez i Querol
> <[hidden email]> wrote:
>> So the problem seems to be not related to jemalloc or malloc. As the
>> experimental 4.8.1 devel/dbus-qt4 port works fine in stable, the problem has
>> to do with some differences between head and stable. When we get more hints
>> where the problem is, I will post them in a new thread in freebsd-current@.
>
> Gus has been away for a while, but before disappearing he found a
> workaround to be building devel/dbus-qt4 with -fno-use-cxa-atexit. So
> I had a look around, and found this NetBSD bug report:
> http://www.archivum.info/fa.netbsd.bugs/2007-12/00070/lib-37654-libc's-atexit_mutex-should-be-fully-recursive.html
>
> Since qdbus crashes after exit(3) here too, that might be an
> explanation. Or, at least, something related.
>
> kib@ and kan@ are CCed as per avg@ suggestion.

Alberto,

you have added new people to the discussion, but unfortunately too little of the original context is present here... That is, this email doesn't even include a description of the actual problem.
Could you please provide the useful context either as a link to a mailing list archive or in some other equally useful way?
Thank you!

--
Andriy Gapon |
|
On Fri, May 18, 2012 at 11:28 PM, Andriy Gapon <[hidden email]> wrote:
> you have added new people to the discussion, but unfortunately too little of the
> original context is present here... That is, this email doesn't even include a
> description of an actual problem.
> Could you please provide the useful context either as a link to a mailing list
> archive or in some other equally useful way?

Sorry, Gmail showed the thread with all the history, but I see that in the archives it's considered as two different conversations.

Here's the original thread:
http://lists.freebsd.org/pipermail/freebsd-current/2012-April/033547.html

I think I understand that the NetBSD problem is not related to our case. Also, Gustau told me that he narrowed the problem down to __pthread_cxa_finalize. He will add new information very soon, anyway.

--
Alberto Villa, FreeBSD committer <[hidden email]>
http://people.FreeBSD.org/~avilla |
|
On Sat, May 19, 2012 at 12:16:59AM +0200, Alberto Villa wrote:
> On Fri, May 18, 2012 at 11:28 PM, Andriy Gapon <[hidden email]> wrote:
> > you have added new people to the discussion, but unfortunately too little of the
> > original context is present here... That is, this email doesn't even include a
> > description of an actual problem.
> > Could you please provide the useful context either as a link to a mailing list
> > archive or in some other equally useful way?
>
> Sorry, Gmail showed the thread with all the history, but I see that in
> the archives it's considered as two different conversations.
>
> Here's the original thread:
> http://lists.freebsd.org/pipermail/freebsd-current/2012-April/033547.html
>
> I think I understand that the NetBSD problem is not related to our
> case. Also, Gustau told me that he narrowed the problem down to
> __pthread_cxa_finalize. He will add new information very soon, anyway.

Well, there is still not much to read. And http://pastebin.com/ryBXtqGF shows 'Unknown Paste ID!'.

That said, why do you think that the problem is in the system and not in the application? The fact that the issue does not manifest itself under stable/9 is not enough to arrive at this conclusion. |
|
On Sat, May 19, 2012 at 12:37 AM, Konstantin Belousov
<[hidden email]> wrote:
> Well, there is still not much to read. And, http://pastebin.com/ryBXtqGF.
> shows 'Unknown Paste ID!'.

Eh, sorry, Gus will provide updated data.

> That said, why do you think that the problem is in system and not in the
> application ? The fact that the issue does not manifest itself under
> stable/9 is not enough to arrive at this conclusion.

We thought so because it suddenly appeared, but neither Gus nor I are sure of this. We asked for help because this is affecting the whole Qt update, and as a kde@ member this is a major concern for me (and many others, I guess). Whether the issue will be found in the system or in the application is mostly of no interest.

That said, if there is no information to examine at the moment, let's just wait for Gus's mail. Sorry for the noise, then.

--
Alberto Villa, FreeBSD committer <[hidden email]>
http://people.FreeBSD.org/~avilla |
|
On Sat, May 19, 2012 at 12:49:02AM +0200, Alberto Villa wrote:
> On Sat, May 19, 2012 at 12:37 AM, Konstantin Belousov
> <[hidden email]> wrote:
> > Well, there is still not much to read. And, http://pastebin.com/ryBXtqGF.
> > shows 'Unknown Paste ID!'.
>
> Eh, sorry, Gus will provide updated data.
>
> > That said, why do you think that the problem is in system and not in the
> > application ? The fact that the issue does not manifest itself under
> > stable/9 is not enough to arrive at this conclusion.
>
> We thought it because it suddenly appeared, but neither me nor Gus are
> sure of this. We asked for help because this is affecting the whole Qt
> update, and as a kde@ member this is a major concern for me (and many
> others, I guess). Whether the issue will be found in the system or in
> the application is mostly of no interest.
>
> That said, if there is no information to examine at the moment, let's
> just wait for Gus mail. Sorry for the noise, then.

How to reproduce the issue locally? (I do not want to install all KDE to my test box). |
|
On Sat, May 19, 2012 at 12:52 AM, Konstantin Belousov
<[hidden email]> wrote:
> How to reproduce the issue locally ? (I do not want to install all KDE
> to my test box).

Just build devel/dbus-qt4 on 10-CURRENT and run qdbus. It should crash (should you have D-Bus running, which you probably don't have, it would first print all D-Bus connections and then crash on exit).

--
Alberto Villa, FreeBSD committer <[hidden email]>
http://people.FreeBSD.org/~avilla |
|
In reply to this post by Gustau Pérez i Querol-2
On 2012/4/30 22:13, Gustau Pérez i Querol wrote:
> Hi,
>
> the KDE team is seeing some strange problems with the new version (4.8.1) of devel/dbus-qt4 on current. It does work on stable. I also suspect that the problem described below is affecting the experimental cinnamon port (an alternative to gnome3, possible replacement of gnome2).
>
> The problem happens on both i386 and amd64 with an empty /etc/malloc.conf and a simple /etc/make.conf. Everything was compiled with base gcc (no clang). The kernel was compiled with no debug support, but I can enable it if needed. There are reports from [hidden email] of the same behavior with a clang-compiled world and kernel and with MALLOC_PRODUCTION=yes.
>
> When qdbus starts, it segfaults. The backtrace of the problem with r234769 can be found here: http://pastebin.com/ryBXtqGF. When starting the qdbus daemon by hand in an X+twm session, we see it call calloc many times and segfault after a fixed number of calls. We see it segfault at rb_gen (a quite large macro defined at $SRC_BASE/contrib/jemalloc/include/jemalloc/internal/rb.h).
>
> If the daemon is started by hand, I'm able to skip all the calls qdbus makes to calloc until the one causing the segfault. At that point, at rb_gen, we don't know exactly what is going on or how to debug the macro. Ktrace dumps are available, but we were unable to find anything new in them.
>
> With old versions of current before the jemalloc imports (as of March 30th) the daemon segfaulted at malloc.c:2426. With revisions from April 20th to 24th (I can be more precise; it was during the jemalloc imports) the daemon segfaulted at malloc_init. Backtraces are available if needed, and if necessary I can go back to those revisions and recompile world+kernel to see its behavior.
>
> Any help from freebsd-current@ (perhaps Jason Evans can help us) will be appreciated. Any additional info, like source revisions, can be provided. I would like to stress that the experimental devel/dbus-qt4 works fine with recent stable.

qdbus segfaults on my machine too, I tracked it down, and found the problem is in QT: it deleted current_thread_data_key, but it still uses it in some cxa hooks. I applied the following patch, and it works fine.

--- qthread_unix.cpp 2012-05-20 13:23:09.000000000 +0800
+++ qthread_unix_new.cpp 2012-05-20 13:22:45.000000000 +0800
@@ -156,7 +156,7 @@
 {
     pthread_key_delete(current_thread_data_key);
 }
-Q_DESTRUCTOR_FUNCTION(destroy_current_thread_data_key)
+//Q_DESTRUCTOR_FUNCTION(destroy_current_thread_data_key)

 // Utility functions for getting, setting and clearing thread specific data.
---

The Q_DESTRUCTOR_FUNCTION macro defines a global C++ object whose destructor deletes current_thread_data_key, but the key is still needed in other cxa hooks. So, in the end, the QT library crashes.

I think the bug depends on linking order in the QT library? If qthread_unix.cpp is linked as the last module, the key will be deleted after all cxa hooks have run, and then it will be fine; otherwise, it will crash.

This sounds like a bug in QT.

Regards,
David Xu |
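As a rough illustration of the mechanism described above, a destructor-function macro can be sketched as a file-scope object whose destructor calls the given function during static destruction. `DEMO_DESTRUCTOR_FUNCTION`, `demo_cleanup` and `run_cleanup_in_scope` are hypothetical names for this sketch; it is not Qt's actual definition of Q_DESTRUCTOR_FUNCTION.

```cpp
#include <cassert>

// Hypothetical stand-in for a macro such as Q_DESTRUCTOR_FUNCTION: it declares
// a file-scope object whose destructor calls `fn` during static destruction,
// which for a shared object means from a __cxa_atexit-registered hook.
#define DEMO_DESTRUCTOR_FUNCTION(fn)        \
    namespace {                             \
    struct fn##_dest_class {                \
        ~fn##_dest_class() { fn(); }        \
    } fn##_dest_instance;                   \
    }

static int cleanup_calls = 0;

// In the thread above this is where pthread_key_delete() would be called.
static void demo_cleanup() { ++cleanup_calls; }

DEMO_DESTRUCTOR_FUNCTION(demo_cleanup)

// Shows the destructor firing by creating a second, scoped instance of the
// generated class; the file-scope instance fires later, at program exit.
int run_cleanup_in_scope() {
    const int before = cleanup_calls;
    {
        demo_cleanup_dest_class scoped;
        (void)scoped;
    }  // destructor runs here and calls demo_cleanup()
    assert(cleanup_calls == before + 1);
    return cleanup_calls - before;
}
```

Because the cleanup is tied to the lifetime of an ordinary static object, when it runs relative to the destructors of other static objects follows from the order in which those objects were constructed.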
|
On Sun, May 20, 2012 at 8:03 AM, David Xu <[hidden email]> wrote:
> qdbus segfaults on my machine too, I tracked it down, and found the problem
> is in QT,
> it deleted current_thread_data_key, but it still uses it in some cxa hooks,
> I applied the
> following patch, and it works fine.

Thanks for the analysis David!

> I think the bug depends on linking order in QT library ? if the
> qthread_unix.cpp is linked
> as the last module, the key will be deleted after all cxa hooks run, then it
> will be fine,
> otherwise, it would crash.

Is this really possible?

--
Alberto Villa, FreeBSD committer <[hidden email]>
http://people.FreeBSD.org/~avilla |
|
On Sun, May 20, 2012 at 06:42:35PM +0200, Alberto Villa wrote:
> On Sun, May 20, 2012 at 8:03 AM, David Xu <[hidden email]> wrote:
> > qdbus segfaults on my machine too, I tracked it down, and found the problem
> > is in QT,
> > it deleted current_thread_data_key, but it still uses it in some cxa hooks,
> > I applied the
> > following patch, and it works fine.
>
> Thanks for the analysis David!
>
> > I think the bug depends on linking order in QT library ? if the
> > qthread_unix.cpp is linked
> > as the last module, the key will be deleted after all cxa hooks run, then it
> > will be fine,
> > otherwise, it would crash.
>
> Is this really possible?

No, I do not think it is possible.

The only possibility for something weird to happen is for atexit/__cxa_atexit functions to be registered from another atexit function, and then we indeed could call the newly registered function too late.

I wonder if the following hack makes any change in the observed behaviour.

diff --git a/lib/libc/stdlib/atexit.c b/lib/libc/stdlib/atexit.c
index 511172a..bab850c 100644
--- a/lib/libc/stdlib/atexit.c
+++ b/lib/libc/stdlib/atexit.c
@@ -72,6 +72,7 @@ struct atexit {
 };
 
 static struct atexit *__atexit;	/* points to head of LIFO stack */
+static int atexit_gen;
 
 /*
  * Register the function described by 'fptr' to be called at application
@@ -107,6 +108,7 @@ atexit_register(struct atexit_fn *fptr)
 		__atexit = p;
 	}
 	p->fns[p->ind++] = *fptr;
+	atexit_gen++;
 	_MUTEX_UNLOCK(&atexit_mutex);
 	return 0;
 }
@@ -162,7 +164,7 @@ __cxa_finalize(void *dso)
 	struct dl_phdr_info phdr_info;
 	struct atexit *p;
 	struct atexit_fn fn;
-	int n, has_phdr;
+	int atexit_gen_prev, n, has_phdr;
 
 	if (dso != NULL)
 		has_phdr = _rtld_addr_phdr(dso, &phdr_info);
@@ -170,6 +172,8 @@ __cxa_finalize(void *dso)
 		has_phdr = 0;
 
 	_MUTEX_LOCK(&atexit_mutex);
+retry:
+	atexit_gen_prev = atexit_gen;
 	for (p = __atexit; p; p = p->next) {
 		for (n = p->ind; --n >= 0;) {
 			if (p->fns[n].fn_type == ATEXIT_FN_EMPTY)
@@ -196,6 +200,8 @@ __cxa_finalize(void *dso)
 			_MUTEX_LOCK(&atexit_mutex);
 		}
 	}
+	if (atexit_gen_prev != atexit_gen)
+		goto retry;
 	_MUTEX_UNLOCK(&atexit_mutex);
 	if (dso == NULL)
 		_MUTEX_DESTROY(&atexit_mutex);
|
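The idea behind the hack can be sketched in isolation: count registrations in a generation counter and restart the walk whenever the counter changed during a pass, so a handler registered by another handler is not missed. The sketch below is a single-threaded C++ model with hypothetical names (`GenTable`, `demo_retry_order`); it leaves out the mutex and the per-DSO filtering of the real `__cxa_finalize`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded model of a handler table with a generation
// counter (no locking, no per-DSO matching).
class GenTable {
public:
    void add(std::function<void()> fn) {
        slots_.push_back(std::move(fn));
        ++gen_;  // every registration bumps the generation
    }

    void finalize() {
        int prev_gen;
        do {
            prev_gen = gen_;
            // Walk from the newest slot down, as a LIFO table would.
            for (std::size_t n = slots_.size(); n-- > 0;) {
                if (!slots_[n])
                    continue;  // slot already called (the "empty" marker)
                std::function<void()> fn = std::move(slots_[n]);
                slots_[n] = nullptr;
                fn();  // may call add(), appending above index n
            }
            // A registration during the pass landed in a slot the walk had
            // already gone past, so run another pass to pick it up.
        } while (prev_gen != gen_);
    }

private:
    std::vector<std::function<void()>> slots_;
    int gen_ = 0;
};

// Registers B, then A; A registers C while it is running.
std::vector<std::string> demo_retry_order() {
    GenTable table;
    std::vector<std::string> order;
    table.add([&] { order.push_back("B"); });
    table.add([&] {
        order.push_back("A");
        table.add([&] { order.push_back("C"); });
    });
    table.finalize();
    assert(order.size() == 3);
    return order;
}
```

Without the outer retry loop, handler C in this model would never be called, because it is appended above the index the walk has already passed.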
|
On 2012/5/21 1:24, Konstantin Belousov wrote:
> On Sun, May 20, 2012 at 06:42:35PM +0200, Alberto Villa wrote:
>> On Sun, May 20, 2012 at 8:03 AM, David Xu <[hidden email]> wrote:
>>> qdbus segfaults on my machine too, I tracked it down, and found the problem
>>> is in QT,
>>> it deleted current_thread_data_key, but it still uses it in some cxa hooks,
>>> I applied the
>>> following patch, and it works fine.
>> Thanks for the analysis David!
>>
>>> I think the bug depends on linking order in QT library ? if the
>>> qthread_unix.cpp is linked
>>> as the last module, the key will be deleted after all cxa hooks run, then it
>>> will be fine,
>>> otherwise, it would crash.
>> Is this really possible?
> No, I do not think it is possible.
>
> The only possibility for something weird happen is for atexit/__cxa_atexit
> functions to be registered from another atexit function, and then we
> indeed could call the newly registered function too late.
>
> I wonder if the following hack makes any change in the observed behaviour.
>
> diff --git a/lib/libc/stdlib/atexit.c b/lib/libc/stdlib/atexit.c
> index 511172a..bab850c 100644
> --- a/lib/libc/stdlib/atexit.c
> +++ b/lib/libc/stdlib/atexit.c
> @@ -72,6 +72,7 @@ struct atexit {
>  };
>
>  static struct atexit *__atexit;	/* points to head of LIFO stack */
> +static int atexit_gen;
>
>  /*
>   * Register the function described by 'fptr' to be called at application
> @@ -107,6 +108,7 @@ atexit_register(struct atexit_fn *fptr)
>  		__atexit = p;
>  	}
>  	p->fns[p->ind++] = *fptr;
> +	atexit_gen++;
>  	_MUTEX_UNLOCK(&atexit_mutex);
>  	return 0;
>  }
> @@ -162,7 +164,7 @@ __cxa_finalize(void *dso)
>  	struct dl_phdr_info phdr_info;
>  	struct atexit *p;
>  	struct atexit_fn fn;
> -	int n, has_phdr;
> +	int atexit_gen_prev, n, has_phdr;
>
>  	if (dso != NULL)
>  		has_phdr = _rtld_addr_phdr(dso, &phdr_info);
> @@ -170,6 +172,8 @@ __cxa_finalize(void *dso)
>  		has_phdr = 0;
>
>  	_MUTEX_LOCK(&atexit_mutex);
> +retry:
> +	atexit_gen_prev = atexit_gen;
>  	for (p = __atexit; p; p = p->next) {
>  		for (n = p->ind; --n >= 0;) {
>  			if (p->fns[n].fn_type == ATEXIT_FN_EMPTY)
> @@ -196,6 +200,8 @@ __cxa_finalize(void *dso)
>  			_MUTEX_LOCK(&atexit_mutex);
>  		}
>  	}
> +	if (atexit_gen_prev != atexit_gen)
> +		goto retry;
>  	_MUTEX_UNLOCK(&atexit_mutex);
>  	if (dso == NULL)
>  		_MUTEX_DESTROY(&atexit_mutex);

a bug in QT, the bug is that the pthread key current_thread_data_key is deleted by a global C++ object too early, while other C++ global objects still need this pthread key. The following procedure shows how I found the problem:

davidxu@xyf:~%gdb qdbus
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...(no debugging symbols found)...
(gdb) break __cxa_finalize
Function "__cxa_finalize" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (__cxa_finalize) pending.
(gdb) run
Starting program: /usr/local/bin/qdbus
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...[New LWP 100077]
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...Breakpoint 2 at 0x2864ac26
Pending breakpoint "__cxa_finalize" resolved
(no debugging symbols found)...[New Thread 29007300 (LWP 100077/qdbus)]
(no debugging symbols found)...:1.0
 org.gnome.SessionManager
:1.11
:1.111
:1.12
:1.13
 org.gtk.vfs.Daemon
:1.143
:1.15
 org.pulseaudio.Server
:1.17
 org.gnome.Panel
:1.18
:1.19
:1.20
 org.gtk.Private.HalVolumeMonitor
:1.21
 org.gtk.Private.GPhoto2VolumeMonitor
:1.22
:1.24
 org.gnome.ScreenSaver
:1.25
:1.27
:1.28
:1.29
:1.30
:1.31
 org.gnome.panel.applet.WnckletFactory
:1.32
:1.33
:1.34
:1.35
 org.gnome.panel.applet.CPUFreqAppletFactory
:1.36
 org.gnome.panel.applet.NotificationAreaAppletFactory
:1.37
 org.gnome.panel.applet.MultiLoadAppletFactory
:1.38
:1.39
:1.4
 org.gnome.GConf
:1.41
 org.gnome.panel.applet.ClockAppletFactory
:1.49
:1.5
 org.gnome.SettingsDaemon
:1.50
:1.53
:1.64
:1.7
 org.freedesktop.secrets
 org.gnome.keyring
:1.75
 org.gtk.vfs.Metadata
:1.76
 org.gnome.Terminal.Display_0_0
:1.77
org.freedesktop.DBus
[Switching to Thread 29007300 (LWP 100077/qdbus)]

Breakpoint 2, 0x2864ac26 in __cxa_finalize () from /lib/libc.so.7
(gdb) print current_thread_data_key
$1 = 0
(gdb) thread tsd
Key 0, destructor 0x281d77f0 <_Z27destroy_current_thread_dataPv>
Key 1, destructor 0x28732dc0 <g_thread_create_full>
Key 2, destructor 0x28726a00 <g_slice_get_config>
Key 3, destructor 0x0 <???>

Here you can see that the function destroy_current_thread_data() is registered.

(gdb) bt
#0 0x2864ac26 in __cxa_finalize () from /lib/libc.so.7
#1 0x285efe1a in exit () from /lib/libc.so.7
#2 0x08051db5 in main ()
(gdb) break QThreadData::current()
Breakpoint 3 at 0x281d7856
(gdb) info breakpoints
Num Type           Disp Enb Address    What
2   breakpoint     keep y   0x2864ac26 <__cxa_finalize+6>
        breakpoint already hit 1 time
3   breakpoint     keep y   0x281d7856 <QThreadData::current()+6>
(gdb) delete 2
(gdb) cont
Continuing.

Breakpoint 3, 0x281d7856 in QThreadData::current () from /usr/local/lib/qt4/libQtCore.so.4
(gdb) bt
#0 0x281d7856 in QThreadData::current () from /usr/local/lib/qt4/libQtCore.so.4
#1 0x281d4747 in QThread::currentThread () from /usr/local/lib/qt4/libQtCore.so.4
#2 0x28097248 in QDBusConnectionPrivate::deleteYourself () from /usr/local/lib/qt4/libQtDBus.so.4
#3 0x2808f2ea in QDBusConnection::~QDBusConnection () from /usr/local/lib/qt4/libQtDBus.so.4
#4 0x2864ad8f in __cxa_finalize () from /lib/libc.so.7
#5 0x285efe1a in exit () from /lib/libc.so.7
#6 0x08051db5 in main ()
(gdb) thread tsd
Key 1, destructor 0x0 <???>
Key 2, destructor 0x0 <???>
Key 3, destructor 0x0 <???>

Here you can see that destroy_current_thread_data() was executed and unregistered. The key current_thread_data_key, which is index 0, is deleted.

(gdb) cont
Continuing.

Breakpoint 3, 0x281d7856 in QThreadData::current () from /usr/local/lib/qt4/libQtCore.so.4
(gdb) bt
#0 0x281d7856 in QThreadData::current () from /usr/local/lib/qt4/libQtCore.so.4
#1 0x282f58e3 in QObject::QObject () from /usr/local/lib/qt4/libQtCore.so.4
#2 0x281d4710 in QThread::QThread () from /usr/local/lib/qt4/libQtCore.so.4
#3 0x281d5a9e in QAdoptedThread::QAdoptedThread () from /usr/local/lib/qt4/libQtCore.so.4
#4 0x281d7934 in QThreadData::current () from /usr/local/lib/qt4/libQtCore.so.4
#5 0x281d4747 in QThread::currentThread () from /usr/local/lib/qt4/libQtCore.so.4
#6 0x28097248 in QDBusConnectionPrivate::deleteYourself () from /usr/local/lib/qt4/libQtDBus.so.4
#7 0x2808f2ea in QDBusConnection::~QDBusConnection () from /usr/local/lib/qt4/libQtDBus.so.4
#8 0x2864ad8f in __cxa_finalize () from /lib/libc.so.7
#9 0x285efe1a in exit () from /lib/libc.so.7
#10 0x08051db5 in main ()

now the stupid code starts to create a new thread...

(gdb) cont
Continuing.

Breakpoint 3, 0x281d7856 in QThreadData::current () from /usr/local/lib/qt4/libQtCore.so.4
(gdb) bt
#0 0x281d7856 in QThreadData::current () from /usr/local/lib/qt4/libQtCore.so.4
#1 0x282f58e3 in QObject::QObject () from /usr/local/lib/qt4/libQtCore.so.4
#2 0x281d4710 in QThread::QThread () from /usr/local/lib/qt4/libQtCore.so.4
#3 0x281d5a9e in QAdoptedThread::QAdoptedThread () from /usr/local/lib/qt4/libQtCore.so.4
#4 0x281d7934 in QThreadData::current () from /usr/local/lib/qt4/libQtCore.so.4
#5 0x282f58e3 in QObject::QObject () from /usr/local/lib/qt4/libQtCore.so.4
#6 0x281d4710 in QThread::QThread () from /usr/local/lib/qt4/libQtCore.so.4
#7 0x281d5a9e in QAdoptedThread::QAdoptedThread () from /usr/local/lib/qt4/libQtCore.so.4
#8 0x281d7934 in QThreadData::current () from /usr/local/lib/qt4/libQtCore.so.4
#9 0x281d4747 in QThread::currentThread () from /usr/local/lib/qt4/libQtCore.so.4
#10 0x28097248 in QDBusConnectionPrivate::deleteYourself () from /usr/local/lib/qt4/libQtDBus.so.4
#11 0x2808f2ea in QDBusConnection::~QDBusConnection () from /usr/local/lib/qt4/libQtDBus.so.4
#12 0x2864ad8f in __cxa_finalize () from /lib/libc.so.7
#13 0x285efe1a in exit () from /lib/libc.so.7
#14 0x08051db5 in main ()
(gdb)
#0 0x281d7856 in QThreadData::current () from /usr/local/lib/qt4/libQtCore.so.4
#1 0x282f58e3 in QObject::QObject () from /usr/local/lib/qt4/libQtCore.so.4
#2 0x281d4710 in QThread::QThread () from /usr/local/lib/qt4/libQtCore.so.4
#3 0x281d5a9e in QAdoptedThread::QAdoptedThread () from /usr/local/lib/qt4/libQtCore.so.4
#4 0x281d7934 in QThreadData::current () from /usr/local/lib/qt4/libQtCore.so.4
#5 0x282f58e3 in QObject::QObject () from /usr/local/lib/qt4/libQtCore.so.4
#6 0x281d4710 in QThread::QThread () from /usr/local/lib/qt4/libQtCore.so.4
#7 0x281d5a9e in QAdoptedThread::QAdoptedThread () from /usr/local/lib/qt4/libQtCore.so.4
#8 0x281d7934 in QThreadData::current () from /usr/local/lib/qt4/libQtCore.so.4
#9 0x281d4747 in QThread::currentThread () from /usr/local/lib/qt4/libQtCore.so.4
#10 0x28097248 in QDBusConnectionPrivate::deleteYourself () from /usr/local/lib/qt4/libQtDBus.so.4
#11 0x2808f2ea in QDBusConnection::~QDBusConnection () from /usr/local/lib/qt4/libQtDBus.so.4
#12 0x2864ad8f in __cxa_finalize () from /lib/libc.so.7
#13 0x285efe1a in exit () from /lib/libc.so.7
#14 0x08051db5 in main ()
(gdb)

dead-loop in QT library until the stack overflows.

As I said, it depends on the order in which the global objects are destructed: if the object that deletes current_thread_data_key is destructed last, the problem won't happen, but now it is destructed too early. I believe no specification says which C++ object should be destructed first when the objects are in different compiled modules that are then linked together to generate a shared object (.so file). |
On 2012/5/21 10:54, David Xu wrote:
...
> dead-loop in QT library until the stack overflow.
>
> As I said, it depends on ordering the global objects are destructed,
> if the object which deleting the current_thread_data_key is destructed
> lastly, the problem wont happen, but now it is destructed too early.
> I believe there is no specification said that which C++ object should
> be destructed first if they are in different compiled module and then
> are linked together to generated a shared object, .so file.

Now let me dig into qthread_unix.cpp to see how QThreadData::current() works:

QThreadData *QThreadData::current()
{
    QThreadData *data = get_thread_data();
    if (!data) {
        void *a;
        if (QInternal::activateCallbacks(QInternal::AdoptCurrentThread, &a)) {
            QThread *adopted = static_cast<QThread*>(a);
            Q_ASSERT(adopted);
            data = QThreadData::get2(adopted);
            set_thread_data(data);
            adopted->d_func()->running = true;
            adopted->d_func()->finished = false;
            static_cast<QAdoptedThread *>(adopted)->init();
        } else {
            data = new QThreadData;
            QT_TRY {
                set_thread_data(data);
                data->thread = new QAdoptedThread(data);
            } QT_CATCH(...) {
                clear_thread_data();
                data->deref();
                data = 0;
                QT_RETHROW;
            }
            data->deref();
        }
        if (!QCoreApplicationPrivate::theMainThread)
            QCoreApplicationPrivate::theMainThread = data->thread;
    }
    return data;
}

It calls get_thread_data(). If that returns NULL, it creates a new thread and tries to set the new thread as the "current thread data" by calling set_thread_data().

Let's see how get_thread_data() and set_thread_data() work:

static QThreadData *get_thread_data()
{
#ifdef Q_OS_SYMBIAN
    return reinterpret_cast<QThreadData *>(Dll::Tls());
#else
    pthread_once(&current_thread_data_once, create_current_thread_data_key);
    return reinterpret_cast<QThreadData *>(pthread_getspecific(current_thread_data_key));
#endif
}

static void set_thread_data(QThreadData *data)
{
#ifdef Q_OS_SYMBIAN
    qt_symbian_throwIfError(Dll::SetTls(data));
#endif
    pthread_once(&current_thread_data_once, create_current_thread_data_key);
    pthread_setspecific(current_thread_data_key, data);
}

They just use pthread_getspecific and pthread_setspecific. current_thread_data_key is created only once, guarded by pthread_once(), but as you know, the key has already been deleted by Q_DESTRUCTOR_FUNCTION(destroy_current_thread_data_key), a global object that was destructed early. The key is never recreated, so it is a stale key. pthread_setspecific should therefore fail and pthread_getspecific would return NULL, which causes QThreadData::current() to create a new thread. It seems QAdoptedThread also calls QThreadData::current(), so this is a recursion, and it never ends until the stack overflows.
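The recursion described above can be modelled in a few lines without touching pthreads. This is only a toy sketch, not Qt or libthr code: `Model`, `get`, `set`, `current` and `calls_until_settled` are hypothetical names, and it assumes, as stated in the analysis, that a lookup on a deleted key returns NULL and a store on it is lost. The `limit` parameter exists only so the model stops instead of overflowing the stack.

```cpp
#include <cassert>

// Toy model of the stale-key recursion; every name here is hypothetical.
struct Model {
    bool key_alive = true;       // false once the "key" has been deleted
    const void *slot = nullptr;  // value stored under the key
    int calls = 0;               // how many times current() was entered

    // Stand-in for pthread_getspecific(): assumed to yield NULL on a deleted key.
    const void *get() const { return key_alive ? slot : nullptr; }

    // Stand-in for pthread_setspecific(): assumed to fail on a deleted key.
    bool set(const void *value) {
        if (!key_alive)
            return false;
        slot = value;
        return true;
    }

    // Same shape as the lookup-or-create path: look up, and if nothing is
    // stored, store a value and then re-enter. The nested call plays the role
    // of the adopted-thread constructor calling back into the lookup.
    void current(int limit) {
        ++calls;
        if (get() != nullptr || calls >= limit)
            return;
        static const int marker = 0;
        set(&marker);
        current(limit);
    }
};

// Number of current() calls before the model settles or hits the limit.
int calls_until_settled(bool key_alive, int limit) {
    Model m;
    m.key_alive = key_alive;
    m.current(limit);
    return m.calls;
}
```

With a live key the nested call finds the stored value, so `calls_until_settled(true, 1000)` returns 2. With a deleted key every lookup misses and only the artificial limit stops the loop, so `calls_until_settled(false, 1000)` returns 1000.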
In reply to this post by David Xu
On 2012-05-21 04:54, David Xu wrote:
...
> As I said, it depends on ordering the global objects are destructed,
> if the object which deleting the current_thread_data_key is destructed
> lastly, the problem wont happen, but now it is destructed too early.
> I believe there is no specification said that which C++ object should
> be destructed first if they are in different compiled module and then
> are linked together to generated a shared object, .so file.

Indeed, the order in which global constructors or destructors in different translation units are called is undefined. Depending on that order is a bug (a.k.a. the "static initialization order fiasco").
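One common way to avoid depending on that order is the construct-on-first-use idiom: the shared object lives in a function-local static, so it is fully constructed before any global that touches it and is therefore destroyed after it. The sketch below is an illustration only, not how Qt manages its key; `event_log`, `Tracer`, `first_global` and `second_global` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical shared log, constructed on first use inside the function.
std::vector<std::string> &event_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical global-object type that uses the log in both ctor and dtor.
struct Tracer {
    std::string name;

    explicit Tracer(const std::string &n) : name(n) {
        event_log().push_back("ctor " + name);
    }

    ~Tracer() {
        // Safe at exit: the log finished construction before this object did,
        // so the log is destroyed after this object.
        event_log().push_back("dtor " + name);
    }
};

// Two globals in one translation unit: constructed in declaration order
// and destroyed in reverse order.
Tracer first_global("a");
Tracer second_global("b");
```

By the time `main()` runs, `event_log()` holds "ctor a" followed by "ctor b". The destructors run after `main()` returns and can still append to the log, because the function-local static outlives both globals.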
In reply to this post by David Xu
On Mon, May 21, 2012 at 10:54:54AM +0800, David Xu wrote:
> On 2012/5/21 1:24, Konstantin Belousov wrote:
> >On Sun, May 20, 2012 at 06:42:35PM +0200, Alberto Villa wrote:
> >>On Sun, May 20, 2012 at 8:03 AM, David Xu<[hidden email]> wrote:
> >>>qdbus segfaults on my machine too, I tracked it down, and found the
> >>>problem is in QT, it deleted current_thread_data_key, but it still
> >>>uses it in some cxa hooks, I applied the following patch, and it
> >>>works fine.
> >>Thanks for the analysis David!
> >>
> >>>I think the bug depends on linking order in QT library ? if the
> >>>qthread_unix.cpp is linked as lastest module, the key will be deleted
> >>>after all cxa hooks run, then it will be fine, otherwise, it would
> >>>crash.
> >>Is this really possible?
> >No, I do not think it is possible.
> >
> >The only possibility for something weird happen is for atexit/__cxa_atexit
> >functions to be registered from another atexit function, and then we
> >indeed could call the newly registered function too late.
> >
> >I wonder if the following hack makes any change in the observed behaviour.
> >
> >diff --git a/lib/libc/stdlib/atexit.c b/lib/libc/stdlib/atexit.c
> >index 511172a..bab850c 100644
> >--- a/lib/libc/stdlib/atexit.c
> >+++ b/lib/libc/stdlib/atexit.c
> >@@ -72,6 +72,7 @@ struct atexit {
> > };
> >
> > static struct atexit *__atexit; /* points to head of LIFO stack */
> >+static int atexit_gen;
> >
> > /*
> >  * Register the function described by 'fptr' to be called at application
> >@@ -107,6 +108,7 @@ atexit_register(struct atexit_fn *fptr)
> >                 __atexit = p;
> >         }
> >         p->fns[p->ind++] = *fptr;
> >+        atexit_gen++;
> >         _MUTEX_UNLOCK(&atexit_mutex);
> >         return 0;
> > }
> >@@ -162,7 +164,7 @@ __cxa_finalize(void *dso)
> >         struct dl_phdr_info phdr_info;
> >         struct atexit *p;
> >         struct atexit_fn fn;
> >-        int n, has_phdr;
> >+        int atexit_gen_prev, n, has_phdr;
> >
> >         if (dso != NULL)
> >                 has_phdr = _rtld_addr_phdr(dso, &phdr_info);
> >@@ -170,6 +172,8 @@ __cxa_finalize(void *dso)
> >                 has_phdr = 0;
> >
> >         _MUTEX_LOCK(&atexit_mutex);
> >+retry:
> >+        atexit_gen_prev = atexit_gen;
> >         for (p = __atexit; p; p = p->next) {
> >                 for (n = p->ind; --n >= 0;) {
> >                         if (p->fns[n].fn_type == ATEXIT_FN_EMPTY)
> >@@ -196,6 +200,8 @@ __cxa_finalize(void *dso)
> >                         _MUTEX_LOCK(&atexit_mutex);
> >                 }
> >         }
> >+        if (atexit_gen_prev != atexit_gen)
> >+                goto retry;
> >         _MUTEX_UNLOCK(&atexit_mutex);
> >         if (dso == NULL)
> >                 _MUTEX_DESTROY(&atexit_mutex);
> I have tried your patch, it does not fix the problem. As I said, it is
> a bug in QT, the bug is pthread key current_thread_data_key is deleted
> by a global C++ object too early, other C++ global objects still need
> this pthread key. The following procedure shows how I found the problem:
...
> dead-loop in QT library until the stack overflow.
>
> As I said, it depends on ordering the global objects are destructed, if
> the object which deleting the current_thread_data_key is destructed
> lastly, the problem wont happen, but now it is destructed too early.
> I believe there is no specification said that which C++ object should
> be destructed first if they are in different compiled module and then
> are linked together to generated a shared object, .so file.

My previous 'cannot happen' note was about __pthread_cxa_finalize() being called before all atexit hooks for the given DSO are executed. The patch should fix one corner case there, which is apparently not relevant to the problem at hand.

The C++ standard only mandates the order of constructor/destructor calls for objects defined in a single compilation unit. There is a gcc extension that allows specifying the order of constructor/destructor calls for objects contained in a given DSO regardless of their location in the source code; see the init_priority() object attribute.
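As a rough illustration of the init_priority() attribute mentioned above, the sketch below declares two namespace-scope objects in one order and asks for them to be constructed in the other. It assumes a compiler that implements the GCC extension (GCC, or Clang on ELF targets); `Step`, `construction_order`, `declared_first` and `declared_second` are hypothetical names, and nothing here is taken from Qt.

```cpp
#include <cassert>
#include <string>

// Hypothetical log of construction order, built on first use.
std::string &construction_order() {
    static std::string order;
    return order;
}

// Hypothetical class that records a tag when it is constructed.
template <char Tag>
struct Step {
    Step() { construction_order().push_back(Tag); }
};

// Declared in the "wrong" order on purpose. With init_priority, the lower
// number is constructed first; GCC documents 101 to 65535 as the valid range.
Step<'B'> declared_first __attribute__((init_priority(2000)));
Step<'A'> declared_second __attribute__((init_priority(1000)));
```

Under that assumption `construction_order()` reads "AB" once `main()` starts, even though the 'B' object is declared first; destruction runs in the reverse order. The attribute only orders objects that carry it, so code that has to stay portable usually cannot rely on it.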
In reply to this post by David Xu
> Now let me dig into qthread_unix.cpp, see how QThreadData::current()
> works:
...
> They just use pthread_getspecific and pthread_setspecific, the
> current_thread_data_key was only created once which is guarded by
> pthread_once(), but as you know, the key has already been deleted by
> Q_DESTRUCTOR_FUNCTION(destroy_current_thread_data_key) which is a
> global object which has been destructed early, the key is no longer
> recreated, it is a stale key.

I was able to debug up to the point where qthread_unix.cpp spawns a new thread because the get_thread_data() call returns 0. I could not reach the full analysis myself, but now I get it. The explanation seems fine to me, thanks.

What I don't get is why it works in stable. The functions registered to be executed at exit (atexit_register hasn't changed) get registered in the same order in both branches (at least, I checked them by printing the two atexit structures when calling exit in both stable and head). Wouldn't that mean that the problem of deleting current_thread_data_key should happen in both branches?

Gus
