New condvar implementation that provides stronger ordering guarantees.

This is a new implementation for condition variables, required after http://austingroupbugs.net/view.php?id=609 to fix bug 13165. In essence, we need to be stricter about which waiters a signal or broadcast is required to wake up; this couldn't be solved using the old algorithm. ISO C++ made a similar clarification, so this also fixes a bug in current libstdc++, for example.

We can't use the old algorithm anymore because futexes do not guarantee to wake in FIFO order. Thus, when we wake, we can't simply let any waiter grab a signal, but we need to ensure that one of the waiters happening before the signal is woken up. This is something the previous algorithm violated (see bug 13165).

There's another issue specific to condvars: ABA issues on the underlying futexes. Unlike mutexes that have just three states, or semaphores that have no tokens or a limited number of them, the state of a condvar is the *order* of the waiters. A waiter on a semaphore can grab a token whenever one is available; a condvar waiter must only consume a signal if it is eligible to do so as determined by the relative order of the waiter and the signal.

Therefore, this new algorithm maintains two groups of waiters: those eligible to consume signals (G1), and those that have to wait until previous waiters have consumed signals (G2). Once G1 is empty, G2 becomes the new G1. 64-bit counters are used to avoid ABA issues.

This condvar doesn't yet use a requeue optimization (i.e., on a broadcast, waking just one thread and requeueing all others on the futex of the mutex supplied by the program). I don't think doing the requeue is necessarily the right approach (but I haven't done real measurements yet):

* If a program expects to wake many threads at the same time and make that scalable, a condvar isn't great anyway because it requires waiters to operate in mutual exclusion (due to the mutex usage). Thus, a thundering herd problem is a scalability problem with or without the optimization. Using something like a semaphore might be more appropriate in such a case.
* The scalability problem is actually on the mutex side; the condvar could help (and it tries to with the requeue optimization), but it should be the mutex that decides how that is done, and whether it is done at all.
* Forcing all but one waiter into the kernel-side wait queue of the mutex prevents the use of lock elision on the mutex. Thus, it prevents the only cure against the underlying scalability problem inherent to condvars.
* If condvars use short critical sections (i.e., hold the mutex just to check a binary flag or such), which they should do ideally, then forcing all those waiters to proceed serially with kernel-based hand-off (i.e., futex ops in the mutex's contended state, via the futex wait queues) will be less efficient than just letting a scalable mutex implementation take care of it. Our current mutex implementation doesn't employ spinning at all, but if critical sections are short, spinning can be much better.
* Doing the requeue requires all waiters to always drive the mutex into the contended state. This leads to each waiter having to call futex_wake after lock release, even if this wouldn't be necessary.

[BZ #13165]
* nptl/pthread_cond_broadcast.c (__pthread_cond_broadcast): Rewrite to use new algorithm.
* nptl/pthread_cond_destroy.c (__pthread_cond_destroy): Likewise.
* nptl/pthread_cond_init.c (__pthread_cond_init): Likewise.
* nptl/pthread_cond_signal.c (__pthread_cond_signal): Likewise.
* nptl/pthread_cond_wait.c (__pthread_cond_wait): Likewise.
(__pthread_cond_timedwait): Move here from pthread_cond_timedwait.c.
(__condvar_confirm_wakeup, __condvar_cancel_waiting, __condvar_cleanup_waiting, __condvar_dec_grefs, __pthread_cond_wait_common): New.
(__condvar_cleanup): Remove.
* nptl/pthread_condattr_getclock.c (pthread_condattr_getclock): Adapt.
* nptl/pthread_condattr_setclock.c (pthread_condattr_setclock): Likewise.
* nptl/pthread_condattr_getpshared.c (pthread_condattr_getpshared): Likewise.
* nptl/pthread_condattr_init.c (pthread_condattr_init): Likewise.
* nptl/tst-cond1.c: Add comment.
* nptl/tst-cond20.c (do_test): Adapt.
* nptl/tst-cond22.c (do_test): Likewise.
* sysdeps/aarch64/nptl/bits/pthreadtypes.h (pthread_cond_t): Adapt structure.
* sysdeps/arm/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise.
* sysdeps/ia64/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise.
* sysdeps/m68k/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise.
* sysdeps/microblaze/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise.
* sysdeps/mips/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise.
* sysdeps/nios2/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise.
* sysdeps/s390/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise.
* sysdeps/sh/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise.
* sysdeps/tile/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise.
* sysdeps/unix/sysv/linux/alpha/bits/pthreadtypes.h (pthread_cond_t): Likewise.
* sysdeps/unix/sysv/linux/powerpc/bits/pthreadtypes.h (pthread_cond_t): Likewise.
* sysdeps/x86/bits/pthreadtypes.h (pthread_cond_t): Likewise.
* sysdeps/nptl/internaltypes.h (COND_NWAITERS_SHIFT): Remove.
(COND_CLOCK_BITS): Adapt.
* sysdeps/nptl/pthread.h (PTHREAD_COND_INITIALIZER): Adapt.
* nptl/pthreadP.h (__PTHREAD_COND_CLOCK_MONOTONIC_MASK, __PTHREAD_COND_SHARED_MASK): New.
* nptl/nptl-printers.py (CLOCK_IDS): Remove.
(ConditionVariablePrinter, ConditionVariableAttributesPrinter): Adapt.
* nptl/nptl_lock_constants.pysym: Adapt.
* nptl/test-cond-printers.py: Adapt.
* sysdeps/unix/sysv/linux/hppa/internaltypes.h (cond_compat_clear, cond_compat_check_and_clear): Adapt.
* sysdeps/unix/sysv/linux/hppa/pthread_cond_timedwait.c: Remove file ...
* sysdeps/unix/sysv/linux/hppa/pthread_cond_wait.c (__pthread_cond_timedwait): ... and move here.
* nptl/DESIGN-condvar.txt: Remove file.
* nptl/lowlevelcond.sym: Likewise.
* nptl/pthread_cond_timedwait.c: Likewise.
* sysdeps/unix/sysv/linux/i386/i486/pthread_cond_broadcast.S: Likewise.
* sysdeps/unix/sysv/linux/i386/i486/pthread_cond_signal.S: Likewise.
* sysdeps/unix/sysv/linux/i386/i486/pthread_cond_timedwait.S: Likewise.
* sysdeps/unix/sysv/linux/i386/i486/pthread_cond_wait.S: Likewise.
* sysdeps/unix/sysv/linux/i386/i586/pthread_cond_broadcast.S: Likewise.
* sysdeps/unix/sysv/linux/i386/i586/pthread_cond_signal.S: Likewise.
* sysdeps/unix/sysv/linux/i386/i586/pthread_cond_timedwait.S: Likewise.
* sysdeps/unix/sysv/linux/i386/i586/pthread_cond_wait.S: Likewise.
* sysdeps/unix/sysv/linux/i386/i686/pthread_cond_broadcast.S: Likewise.
* sysdeps/unix/sysv/linux/i386/i686/pthread_cond_signal.S: Likewise.
* sysdeps/unix/sysv/linux/i386/i686/pthread_cond_timedwait.S: Likewise.
* sysdeps/unix/sysv/linux/i386/i686/pthread_cond_wait.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/pthread_cond_broadcast.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/pthread_cond_signal.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: Likewise.
author: Torvald Riegel <triegel@redhat.com> 2016-05-25 23:43:36 +0200
committer: Torvald Riegel <triegel@redhat.com> 2016-12-31 14:56:47 +0100
commit: ed19993b5b0d05d62cc883571519a67dae481a14 (patch)
tree: 8956d8320ba5bb051cfdf76ba8f8d2f6e1907898 /nptl/pthread_cond_broadcast.c
parent: c0ff3befa9861171498dd29666d32899e9d8145b (diff)
1 files changed, 49 insertions, 50 deletions
diff --git a/nptl/pthread_cond_broadcast.c b/nptl/pthread_cond_broadcast.c
index 552fd42f60..87c07552cf 100644
--- a/nptl/pthread_cond_broadcast.c
+++ b/nptl/pthread_cond_broadcast.c
@@ -19,72 +19,71 @@
 #include <endian.h>
 #include <errno.h>
 #include <sysdep.h>
-#include <lowlevellock.h>
+#include <futex-internal.h>
 #include <pthread.h>
 #include <pthreadP.h>
 #include <stap-probe.h>
+#include <atomic.h>
 
 #include <shlib-compat.h>
-#include <kernel-features.h>
 
+#include "pthread_cond_common.c"
 
+
+/* We do the following steps from __pthread_cond_signal in one critical
+   section: (1) signal all waiters in G1, (2) close G1 so that it can become
+   the new G2 and make G2 the new G1, and (3) signal all waiters in the new
+   G1.  We don't need to do all these steps if there are no waiters in G1
+   and/or G2.  See __pthread_cond_signal for further details.  */
 int
 __pthread_cond_broadcast (pthread_cond_t *cond)
 {
   LIBC_PROBE (cond_broadcast, 1, cond);
 
-  int pshared = (cond->__data.__mutex == (void *) ~0l)
-		? LLL_SHARED : LLL_PRIVATE;
-  /* Make sure we are alone.  */
-  lll_lock (cond->__data.__lock, pshared);
+  unsigned int wrefs = atomic_load_relaxed (&cond->__data.__wrefs);
+  if (wrefs >> 3 == 0)
+    return 0;
+  int private = __condvar_get_private (wrefs);
+
+  __condvar_acquire_lock (cond, private);
 
-  /* Are there any waiters to be woken?  */
-  if (cond->__data.__total_seq > cond->__data.__wakeup_seq)
+  unsigned long long int wseq = __condvar_load_wseq_relaxed (cond);
+  unsigned int g2 = wseq & 1;
+  unsigned int g1 = g2 ^ 1;
+  wseq >>= 1;
+  bool do_futex_wake = false;
+
+  /* Step (1): signal all waiters remaining in G1.  */
+  if (cond->__data.__g_size[g1] != 0)
     {
-      /* Yes.  Mark them all as woken.  */
-      cond->__data.__wakeup_seq = cond->__data.__total_seq;
-      cond->__data.__woken_seq = cond->__data.__total_seq;
-      cond->__data.__futex = (unsigned int) cond->__data.__total_seq * 2;
-      int futex_val = cond->__data.__futex;
-      /* Signal that a broadcast happened.  */
-      ++cond->__data.__broadcast_seq;
-
-      /* We are done.  */
-      lll_unlock (cond->__data.__lock, pshared);
-
-      /* Wake everybody.  */
-      pthread_mutex_t *mut = (pthread_mutex_t *) cond->__data.__mutex;
-
-      /* Do not use requeue for pshared condvars.  */
-      if (mut == (void *) ~0l
-	  || PTHREAD_MUTEX_PSHARED (mut) & PTHREAD_MUTEX_PSHARED_BIT)
-	goto wake_all;
-
-#if (defined lll_futex_cmp_requeue_pi \
-     && defined __ASSUME_REQUEUE_PI)
-      if (USE_REQUEUE_PI (mut))
-	{
-	  if (lll_futex_cmp_requeue_pi (&cond->__data.__futex, 1, INT_MAX,
-					&mut->__data.__lock, futex_val,
-					LLL_PRIVATE) == 0)
-	    return 0;
-	}
-      else
-#endif
-	/* lll_futex_requeue returns 0 for success and non-zero
-	   for errors.  */
-	if (!__builtin_expect (lll_futex_requeue (&cond->__data.__futex, 1,
-						  INT_MAX, &mut->__data.__lock,
-						  futex_val, LLL_PRIVATE), 0))
-	  return 0;
-
-wake_all:
-      lll_futex_wake (&cond->__data.__futex, INT_MAX, pshared);
-      return 0;
+      /* Add as many signals as the remaining size of the group.  */
+      atomic_fetch_add_relaxed (cond->__data.__g_signals + g1,
+				cond->__data.__g_size[g1] << 1);
+      cond->__data.__g_size[g1] = 0;
+
+      /* We need to wake G1 waiters before we quiesce G1 below.  */
+      /* TODO Only set it if there are indeed futex waiters.  We could
+	 also try to move this out of the critical section in cases when
+	 G2 is empty (and we don't need to quiesce).  */
+      futex_wake (cond->__data.__g_signals + g1, INT_MAX, private);
     }
 
-  /* We are done.  */
-  lll_unlock (cond->__data.__lock, pshared);
+  /* G1 is complete.  Step (2) is next unless there are no waiters in G2, in
+     which case we can stop.  */
+  if (__condvar_quiesce_and_switch_g1 (cond, wseq, &g1, private))
+    {
+      /* Step (3): Send signals to all waiters in the old G2 / new G1.  */
+      atomic_fetch_add_relaxed (cond->__data.__g_signals + g1,
+				cond->__data.__g_size[g1] << 1);
+      cond->__data.__g_size[g1] = 0;
+      /* TODO Only set it if there are indeed futex waiters.  */
+      do_futex_wake = true;
+    }
+
+  __condvar_release_lock (cond, private);
+
+  if (do_futex_wake)
+    futex_wake (cond->__data.__g_signals + g1, INT_MAX, private);
 
   return 0;
 }