summaryrefslogtreecommitdiff
path: root/sysdeps/x86
AgeCommit message (Collapse)Author
2017-01-10New pthread rwlock that is more scalable.Torvald Riegel
This replaces the pthread rwlock with a new implementation that uses a more scalable algorithm (primarily through not using a critical section anymore to make state changes). The fast path for rdlock acquisition and release is now basically a single atomic read-modify write or CAS and a few branches. See nptl/pthread_rwlock_common.c for details. * nptl/DESIGN-rwlock.txt: Remove. * nptl/lowlevelrwlock.sym: Remove. * nptl/Makefile: Add new tests. * nptl/pthread_rwlock_common.c: New file. Contains the new rwlock. * nptl/pthreadP.h (PTHREAD_RWLOCK_PREFER_READER_P): Remove. (PTHREAD_RWLOCK_WRPHASE, PTHREAD_RWLOCK_WRLOCKED, PTHREAD_RWLOCK_RWAITING, PTHREAD_RWLOCK_READER_SHIFT, PTHREAD_RWLOCK_READER_OVERFLOW, PTHREAD_RWLOCK_WRHANDOVER, PTHREAD_RWLOCK_FUTEX_USED): New. * nptl/pthread_rwlock_init.c (__pthread_rwlock_init): Adapt to new implementation. * nptl/pthread_rwlock_rdlock.c (__pthread_rwlock_rdlock_slow): Remove. (__pthread_rwlock_rdlock): Adapt. * nptl/pthread_rwlock_timedrdlock.c (pthread_rwlock_timedrdlock): Adapt. * nptl/pthread_rwlock_timedwrlock.c (pthread_rwlock_timedwrlock): Adapt. * nptl/pthread_rwlock_trywrlock.c (pthread_rwlock_trywrlock): Adapt. * nptl/pthread_rwlock_tryrdlock.c (pthread_rwlock_tryrdlock): Adapt. * nptl/pthread_rwlock_unlock.c (pthread_rwlock_unlock): Adapt. * nptl/pthread_rwlock_wrlock.c (__pthread_rwlock_wrlock_slow): Remove. (__pthread_rwlock_wrlock): Adapt. * nptl/tst-rwlock10.c: Adapt. * nptl/tst-rwlock11.c: Adapt. * nptl/tst-rwlock17.c: New file. * nptl/tst-rwlock18.c: New file. * nptl/tst-rwlock19.c: New file. * nptl/tst-rwlock2b.c: New file. * nptl/tst-rwlock8.c: Adapt. * nptl/tst-rwlock9.c: Adapt. * sysdeps/aarch64/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/arm/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/hppa/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/ia64/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/m68k/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/microblaze/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/mips/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/nios2/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/s390/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/sh/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/sparc/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/tile/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/unix/sysv/linux/alpha/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/unix/sysv/linux/powerpc/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/x86/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * nptl/nptl-printers.py (): Adapt. * nptl/nptl_lock_constants.pysym: Adapt. * nptl/test-rwlock-printers.py: Adapt. * nptl/test-rwlockattr-printers.c: Adapt. * nptl/test-rwlockattr-printers.py: Adapt.
2017-01-01Update copyright dates with scripts/update-copyrights.Joseph Myers
2016-12-31New condvar implementation that provides stronger ordering guarantees.Torvald Riegel
This is a new implementation for condition variables, required after http://austingroupbugs.net/view.php?id=609 to fix bug 13165. In essence, we need to be stricter in which waiters a signal or broadcast is required to wake up; this couldn't be solved using the old algorithm. ISO C++ made a similar clarification, so this also fixes a bug in current libstdc++, for example. We can't use the old algorithm anymore because futexes do not guarantee to wake in FIFO order. Thus, when we wake, we can't simply let any waiter grab a signal, but we need to ensure that one of the waiters happening before the signal is woken up. This is something the previous algorithm violated (see bug 13165). There's another issue specific to condvars: ABA issues on the underlying futexes. Unlike mutexes that have just three states, or semaphores that have no tokens or a limited number of them, the state of a condvar is the *order* of the waiters. A waiter on a semaphore can grab a token whenever one is available; a condvar waiter must only consume a signal if it is eligible to do so as determined by the relative order of the waiter and the signal. Therefore, this new algorithm maintains two groups of waiters: Those eligible to consume signals (G1), and those that have to wait until previous waiters have consumed signals (G2). Once G1 is empty, G2 becomes the new G1. 64b counters are used to avoid ABA issues. This condvar doesn't yet use a requeue optimization (ie, on a broadcast, waking just one thread and requeueing all others on the futex of the mutex supplied by the program). I don't think doing the requeue is necessarily the right approach (but I haven't done real measurements yet): * If a program expects to wake many threads at the same time and make that scalable, a condvar isn't great anyway because of how it requires waiters to operate mutually exclusive (due to the mutex usage). Thus, a thundering herd problem is a scalability problem with or without the optimization. Using something like a semaphore might be more appropriate in such a case. * The scalability problem is actually at the mutex side; the condvar could help (and it tries to with the requeue optimization), but it should be the mutex who decides how that is done, and whether it is done at all. * Forcing all but one waiter into the kernel-side wait queue of the mutex prevents/avoids the use of lock elision on the mutex. Thus, it prevents the only cure against the underlying scalability problem inherent to condvars. * If condvars use short critical sections (ie, hold the mutex just to check a binary flag or such), which they should do ideally, then forcing all those waiter to proceed serially with kernel-based hand-off (ie, futex ops in the mutex' contended state, via the futex wait queues) will be less efficient than just letting a scalable mutex implementation take care of it. Our current mutex impl doesn't employ spinning at all, but if critical sections are short, spinning can be much better. * Doing the requeue stuff requires all waiters to always drive the mutex into the contended state. This leads to each waiter having to call futex_wake after lock release, even if this wouldn't be necessary. [BZ #13165] * nptl/pthread_cond_broadcast.c (__pthread_cond_broadcast): Rewrite to use new algorithm. * nptl/pthread_cond_destroy.c (__pthread_cond_destroy): Likewise. * nptl/pthread_cond_init.c (__pthread_cond_init): Likewise. * nptl/pthread_cond_signal.c (__pthread_cond_signal): Likewise. * nptl/pthread_cond_wait.c (__pthread_cond_wait): Likewise. (__pthread_cond_timedwait): Move here from pthread_cond_timedwait.c. (__condvar_confirm_wakeup, __condvar_cancel_waiting, __condvar_cleanup_waiting, __condvar_dec_grefs, __pthread_cond_wait_common): New. (__condvar_cleanup): Remove. * npt/pthread_condattr_getclock.c (pthread_condattr_getclock): Adapt. * npt/pthread_condattr_setclock.c (pthread_condattr_setclock): Likewise. * npt/pthread_condattr_getpshared.c (pthread_condattr_getpshared): Likewise. * npt/pthread_condattr_init.c (pthread_condattr_init): Likewise. * nptl/tst-cond1.c: Add comment. * nptl/tst-cond20.c (do_test): Adapt. * nptl/tst-cond22.c (do_test): Likewise. * sysdeps/aarch64/nptl/bits/pthreadtypes.h (pthread_cond_t): Adapt structure. * sysdeps/arm/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/ia64/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/m68k/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/microblaze/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/mips/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/nios2/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/s390/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/sh/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/tile/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/unix/sysv/linux/alpha/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/unix/sysv/linux/powerpc/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/x86/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/nptl/internaltypes.h (COND_NWAITERS_SHIFT): Remove. (COND_CLOCK_BITS): Adapt. * sysdeps/nptl/pthread.h (PTHREAD_COND_INITIALIZER): Adapt. * nptl/pthreadP.h (__PTHREAD_COND_CLOCK_MONOTONIC_MASK, __PTHREAD_COND_SHARED_MASK): New. * nptl/nptl-printers.py (CLOCK_IDS): Remove. (ConditionVariablePrinter, ConditionVariableAttributesPrinter): Adapt. * nptl/nptl_lock_constants.pysym: Adapt. * nptl/test-cond-printers.py: Adapt. * sysdeps/unix/sysv/linux/hppa/internaltypes.h (cond_compat_clear, cond_compat_check_and_clear): Adapt. * sysdeps/unix/sysv/linux/hppa/pthread_cond_timedwait.c: Remove file ... * sysdeps/unix/sysv/linux/hppa/pthread_cond_wait.c (__pthread_cond_timedwait): ... and move here. * nptl/DESIGN-condvar.txt: Remove file. * nptl/lowlevelcond.sym: Likewise. * nptl/pthread_cond_timedwait.c: Likewise. * sysdeps/unix/sysv/linux/i386/i486/pthread_cond_broadcast.S: Likewise. * sysdeps/unix/sysv/linux/i386/i486/pthread_cond_signal.S: Likewise. * sysdeps/unix/sysv/linux/i386/i486/pthread_cond_timedwait.S: Likewise. * sysdeps/unix/sysv/linux/i386/i486/pthread_cond_wait.S: Likewise. * sysdeps/unix/sysv/linux/i386/i586/pthread_cond_broadcast.S: Likewise. * sysdeps/unix/sysv/linux/i386/i586/pthread_cond_signal.S: Likewise. * sysdeps/unix/sysv/linux/i386/i586/pthread_cond_timedwait.S: Likewise. * sysdeps/unix/sysv/linux/i386/i586/pthread_cond_wait.S: Likewise. * sysdeps/unix/sysv/linux/i386/i686/pthread_cond_broadcast.S: Likewise. * sysdeps/unix/sysv/linux/i386/i686/pthread_cond_signal.S: Likewise. * sysdeps/unix/sysv/linux/i386/i686/pthread_cond_timedwait.S: Likewise. * sysdeps/unix/sysv/linux/i386/i686/pthread_cond_wait.S: Likewise. * sysdeps/unix/sysv/linux/x86_64/pthread_cond_broadcast.S: Likewise. * sysdeps/unix/sysv/linux/x86_64/pthread_cond_signal.S: Likewise. * sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S: Likewise. * sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: Likewise.
2016-12-19Disable TSX on some Haswell processors.Andrew Senkevich
Patch disables Intel TSX on some Haswell processors to avoid TSX on kernels that weren't updated with the latest microcode package (which disables broken feature by default). * sysdeps/x86/cpu-features.c (get_common_indeces): Add stepping identification. (init_cpu_features): Add handle of Haswell.
2016-12-14Refactor long double information into bits/long-double.h.Joseph Myers
Information about whether the ABI of long double is the same as that of double is split between bits/mathdef.h and bits/wordsize.h. When the ABIs are the same, bits/mathdef.h defines __NO_LONG_DOUBLE_MATH. In addition, in the case where the same glibc binary supports both -mlong-double-64 and -mlong-double-128, bits/wordsize.h defines __LONG_DOUBLE_MATH_OPTIONAL, along with __NO_LONG_DOUBLE_MATH if this particular compilation is with -mlong-double-64. As part of the refactoring I proposed in <https://sourceware.org/ml/libc-alpha/2016-11/msg00745.html>, this patch puts all that information in a single header, bits/long-double.h. It is included from sys/cdefs.h alongside the include of bits/wordsize.h, so other headers generally do not need to include bits/long-double.h directly. Previously, various bits/mathdef.h headers and bits/wordsize.h headers had this long double information (including implicitly in some bits/mathdef.h headers through not having the defines present in the default version). After the patch, it's all in six bits/long-double.h headers. Furthermore, most of those new headers are not architecture-specific. Architectures with optional long double all use the ldbl-opt sysdeps directory, either in the order (ldbl-64-128, ldbl-opt, ldbl-128) or (ldbl-128ibm, ldbl-opt). Thus a generic header for the case where long double = double, and headers in ldbl-128, ldbl-96 and ldbl-opt, suffices to cover every architecture except for cases where long double properties vary between different ABIs sharing a set of installed headers; fortunately all the ldbl-opt cases share a single compiler-predefined macro __LONG_DOUBLE_128__ that can be used to tell whether this compilation is -mlong-double-64 or -mlong-double-128. The two cases where a set of headers is shared between ABIs with different long double properties, MIPS (o32 has long double = double, other ABIs use ldbl-128) and SPARC (32-bit has optional long double, 64-bit has required long double), need their own bits/long-double.h headers. As with bits/wordsize.h, multiple-include protection for this header is generally implicit through the include guards on sys/cdefs.h, and multiple inclusion is harmless in any case. There is one subtlety: the header must not define __LONG_DOUBLE_MATH_OPTIONAL if __NO_LONG_DOUBLE_MATH was defined before its inclusion, because doing so breaks how sysdeps/ieee754/ldbl-opt/nldbl-compat.h defines __NO_LONG_DOUBLE_MATH itself before including system headers. Subject to keeping that working, it would be reasonable to move these macros from defined/undefined #ifdef to always-defined 1/0 #if semantics, but this patch does not attempt to do so, just rearranges where the macros are defined. After this patch, the only use of bits/mathdef.h is the alpha one for modifying complex function ABIs for old GCC. Thus, all versions of the header other than the default and alpha versions are removed, as is the include from math.h. Tested for x86_64 and x86. Also did compilation-only testing with build-many-glibcs.py. * bits/long-double.h: New file. * sysdeps/ieee754/ldbl-128/bits/long-double.h: Likewise. * sysdeps/ieee754/ldbl-96/bits/long-double.h: Likewise. * sysdeps/ieee754/ldbl-opt/bits/long-double.h: Likewise. * sysdeps/mips/bits/long-double.h: Likewise. * sysdeps/unix/sysv/linux/sparc/bits/long-double.h: Likewise. * math/Makefile (headers): Add bits/long-double.h. * misc/sys/cdefs.h: Include <bits/long-double.h>. * stdlib/strtold.c: Include <bits/long-double.h> instead of <bits/wordsize.h>. * bits/mathdef.h [!_COMPLEX_H]: Do not allow inclusion. [!__NO_LONG_DOUBLE_MATH]: Remove conditional code. * math/math.h: Do not include <bits/mathdef.h>. * sysdeps/aarch64/bits/mathdef.h: Remove file. * sysdeps/alpha/bits/mathdef.h [!_COMPLEX_H]: Do not allow inclusion. * sysdeps/ia64/bits/mathdef.h: Remove file. * sysdeps/m68k/m680x0/bits/mathdef.h: Likewise. * sysdeps/mips/bits/mathdef.h: Likewise. * sysdeps/powerpc/bits/mathdef.h: Likewise. * sysdeps/s390/bits/mathdef.h: Likewise. * sysdeps/sparc/bits/mathdef.h: Likewise. * sysdeps/x86/bits/mathdef.h: Likewise. * sysdeps/s390/s390-32/bits/wordsize.h [!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]: Remove conditional code. * sysdeps/s390/s390-64/bits/wordsize.h [!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]: Likewise. * sysdeps/unix/sysv/linux/alpha/bits/wordsize.h [!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]: Likewise. * sysdeps/unix/sysv/linux/powerpc/bits/wordsize.h [!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]: Likewise. * sysdeps/unix/sysv/linux/sparc/bits/wordsize.h [!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]: Likewise.
2016-12-05Use C11-like atomics instead of plain memory accesses in x86 lock elision.Torvald Riegel
This uses atomic operations to access lock elision metadata that is accessed concurrently (ie, adapt_count fields). The size of the data is less than a word but accessed only with atomic loads and stores; therefore, we add support for shorter-size atomic load and stores too. * include/atomic.h (__atomic_check_size_ls): New. (atomic_load_relaxed, atomic_load_acquire, atomic_store_relaxed, atomic_store_release): Use it. * sysdeps/x86/elide.h (ACCESS_ONCE): Remove. (elision_adapt, ELIDE_LOCK): Use atomics. * sysdeps/unix/sysv/linux/x86/elision-lock.c (__lll_lock_elision): Use atomics and improve code comments. * sysdeps/unix/sysv/linux/x86/elision-trylock.c (__lll_trylock_elision): Likewise.
2016-12-01Refactor FP_ILOGB* out of bits/mathdef.h.Joseph Myers
Continuing the refactoring of bits/mathdef.h, this patch stops it defining FP_ILOGB0 and FP_ILOGBNAN, moving the required information to a new header bits/fp-logb.h. There are only two possible values of each of those macros permitted by ISO C. TS 18661-1 adds corresponding macros for llogb, and their values are required to correspond to those of the ilogb macros in the obvious way. Thus two boolean values - for which the same choices are correct for most architectures - suffice to determine the value of all these macros, and by defining macros for those boolean values in bits/fp-logb.h we can then define the public FP_* macros in math.h and avoid the present duplication of the associated feature test macro logic. This patch duly moves to bits/fp-logb.h defining __FP_LOGB0_IS_MIN and __FP_LOGBNAN_IS_MIN. Default definitions of those to 0 are correct for both architectures, while ia64, m68k and x86 get their own versions of bits/fp-logb.h to reflect their use of values different from the defaults. The patch renders many copies of bits/mathdef.h trivial (needed only to avoid the default __NO_LONG_DOUBLE_MATH). I'll revise <https://sourceware.org/ml/libc-alpha/2016-11/msg00865.html> accordingly so that it removes all bits/mathdef.h headers except the default one and the alpha one, and arranges for the header to be included only by complex.h as the only remaining use at that point will be for the alpha ABI issues there. Tested for x86_64 and x86. Also did compile-only testing with build-many-glibcs.py (using glibc sources from before the commit that introduced many build failures with undefined __GI___sigsetjmp). * bits/fp-logb.h: New file. * sysdeps/ia64/bits/fp-logb.h: Likewise. * sysdeps/m68k/m680x0/bits/fp-logb.h: Likewise. * sysdeps/x86/bits/fp-logb.h: Likewise. * math/Makefile (headers): Add bits/fp-logb.h. * math/math.h: Include <bits/fp-logb.h>. [__USE_ISOC99] (FP_ILOGB0): Define based on __FP_LOGB0_IS_MIN. [__USE_ISOC99] (FP_ILOGBNAN): Define based on __FP_LOGBNAN_IS_MIN. * bits/mathdef.h (FP_ILOGB0): Remove. (FP_ILOGBNAN): Likewise. * sysdeps/aarch64/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise. * sysdeps/alpha/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise. * sysdeps/ia64/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise. * sysdeps/m68k/m680x0/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise. * sysdeps/mips/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise. * sysdeps/powerpc/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise. * sysdeps/s390/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise. * sysdeps/sparc/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise. * sysdeps/x86/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise.
2016-11-29Refactor FP_FAST_* into bits/fp-fast.h.Joseph Myers
Continuing the refactoring of bits/mathdef.h, this patch moves the FP_FAST_* definitions into a new bits/fp-fast.h header. Currently this is only for FP_FAST_FMA*, but in future it would be the appropriate place for the FP_FAST_* macros from TS 18661-1 as well. The generic bits/mathdef.h header defines these macros based on whether the compiler defines __FP_FAST_*. Most architecture-specific headers, however, fail to do so, meaning that if the architecture (or some particular processors) does in fact have fused operations, and GCC knows to use them inline, the FP_FAST_* macros will still not be defined. By refactoring, this patch causes the generic version (based on __FP_FAST_*) to be used in more cases, and so the macro definitions to be more accurate. Architectures that already defined some or all of these macros other than based on the predefines have their own versions of fp-fast.h, which are arranged so they define FP_FAST_* if either the architecture-specific conditions are true or __FP_FAST_* are defined. After this refactoring, various bits/mathdef.h headers for architectures with long double = double are semantically identical to the generic version. The patch removes those headers that are redundant. (In fact two of the four removed were already redundant before this patch because they did use __FP_FAST_*.) Tested for x86_64 and x86, and compilation-only with build-many-glibcs.py. * bits/fp-fast.h: New file. * sysdeps/aarch64/bits/fp-fast.h: Likewise. * sysdeps/powerpc/bits/fp-fast.h: Likewise. * math/Makefile (headers): Add bits/fp-fast.h. * math/math.h: Include <bits/fp-fast.h>. * bits/mathdef.h (FP_FAST_FMA): Remove. (FP_FAST_FMAF): Likewise. (FP_FAST_FMAL): Likewise. * sysdeps/aarch64/bits/mathdef.h (FP_FAST_FMA): Likewise. (FP_FAST_FMAF): Likewise. * sysdeps/powerpc/bits/mathdef.h (FP_FAST_FMA): Likewise. (FP_FAST_FMAF): Likewise. * sysdeps/x86/bits/mathdef.h (FP_FAST_FMA): Likewise. (FP_FAST_FMAF): Likewise. (FP_FAST_FMAL): Likewise. * sysdeps/arm/bits/mathdef.h: Remove file. * sysdeps/hppa/fpu/bits/mathdef.h: Likewise. * sysdeps/sh/sh4/bits/mathdef.h: Likewise. * sysdeps/tile/bits/mathdef.h: Likewise.
2016-11-24Refactor float_t, double_t information into bits/flt-eval-method.h.Joseph Myers
At present, definitions of float_t and double_t are split among many bits/mathdef.h headers. For all but three architectures, these types are float and double. Furthermore, if you assume __FLT_EVAL_METHOD__ to be defined, that provides a more generic way of determining the correct values of these typedefs. Defining these typedefs more generally based on __FLT_EVAL_METHOD__ was previously proposed by Paul Eggert in <https://sourceware.org/ml/libc-alpha/2012-02/msg00002.html>. This patch refactors things in the way I proposed in <https://sourceware.org/ml/libc-alpha/2016-11/msg00745.html>. A new header bits/flt-eval-method.h defines a single macro, __GLIBC_FLT_EVAL_METHOD, which is then used by math.h to define float_t and double_t. The default is based on __FLT_EVAL_METHOD__ (although actually a default to 0 would have the same effect for current ports, because ports where values other than 0 or 16 are possible all have their own headers). To avoid changing the existing semantics in any case, including for compilers not defining __FLT_EVAL_METHOD__, architecture-specific files are then added for m68k, s390, x86 which replicate the existing semantics. At least with __FLT_EVAL_METHOD__ values possible with GCC, there should be no change to the choices of float_t and double_t for any supported configuration. Architecture maintainer notes: * m68k: sysdeps/m68k/m680x0/bits/flt-eval-method.h always defines __GLIBC_FLT_EVAL_METHOD to 2 to replicate the existing logic. But actually GCC defines __FLT_EVAL_METHOD__ to 0 if TARGET_68040. It might make sense to make the header prefer to base things on __FLT_EVAL_METHOD__ if defined, like the x86 version, and so make the choices of these types more accurate (with a NEWS entry as for the other changes to these types on particular architectures). * s390: sysdeps/s390/bits/flt-eval-method.h always defines __GLIBC_FLT_EVAL_METHOD to 1 to replicate the existing logic. As previously discussed, it might make sense in coordination with GCC to eliminate the historic mistake, avoid excess precision in the -fexcess-precision=standard case and make the typedefs match (with a NEWS entry, again). Tested for x86-64 and x86. Also did compilation-only testing with build-many-glibcs.py. * bits/flt-eval-method.h: New file. * sysdeps/m68k/m680x0/bits/flt-eval-method.h: Likewise. * sysdeps/s390/bits/flt-eval-method.h: Likewise. * sysdeps/x86/bits/flt-eval-method.h: Likewise. * math/Makefile (headers): Add bits/flt-eval-method.h. * math/math.h: Include <bits/flt-eval-method.h>. [__USE_ISOC99] (float_t): Define based on __GLIBC_FLT_EVAL_METHOD. [__USE_ISOC99] (double_t): Likewise. * bits/mathdef.h (float_t): Remove. (double_t): Likewise. * sysdeps/aarch64/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/alpha/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/arm/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/hppa/fpu/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/ia64/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/m68k/m680x0/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/mips/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/powerpc/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/s390/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/sh/sh4/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/sparc/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/tile/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/x86/bits/mathdef.h (float_t): Likewise. (double_t): Likewise.
2016-11-23Fix x86_64 -mfpmath=387 float_t, double_t (bug 20787).Joseph Myers
Bug 20787 reports that, while float_t and double_t for 32-bit x86 properly respect -mfpmath=sse, for x86_64 they fail to reflect -mfpmath=387, which is valid if unusual and results in FLT_EVAL_METHOD being 2. This patch fixes the definitions to respect __FLT_EVAL_METHOD__ in that case, arranging for the test that the types correspond with FLT_EVAL_METHOD to be run with both -mfpmath=387 and -mfpmath=sse. Note: this patch will also have the effect of making float_t and double_t be long double for x86_64 with -mfpmath=sse+387, when FLT_EVAL_METHOD is -1. It seems reasonable for x86_64 to be consistent with 32-bit x86 in this case (and that definition is conservatively safe, in that it makes the types correspond to the widest evaluation format that might be used). Tested for x86-64 and x86. [BZ #20787] * sysdeps/x86/bits/mathdef.h (float_t): Do not define to float if [__x86_64__] when __FLT_EVAL_METHOD__ is nonzero. (double_t): Do not define to double if [__x86_64__] when __FLT_EVAL_METHOD__ is nonzero. * sysdeps/x86/fpu/test-flt-eval-method-387.c: New file. * sysdeps/x86/fpu/test-flt-eval-method-sse.c: Likewise. * sysdeps/x86/fpu/Makefile [$(subdir) = math] (tests): Add test-flt-eval-method-387 and test-flt-eval-method-sse. [$(subdir) = math] (CFLAGS-test-flt-eval-method-387.c): New variable. [$(subdir) = math] (CFLAGS-test-flt-eval-method-sse.c): Likewise.
2016-11-07nptl: Document the reason why __kind in pthread_mutex_t is part of the ABIFlorian Weimer
2016-11-04Define wordsize.h macros everywhereSteve Ellcey
* bits/wordsize.h: Add documentation. * sysdeps/aarch64/bits/wordsize.h : New file * sysdeps/generic/stdint.h (PTRDIFF_MIN, PTRDIFF_MAX): Update definitions. (SIZE_MAX): Change ifdef to if in __WORDSIZE32_SIZE_ULONG check. * sysdeps/gnu/bits/utmp.h (__WORDSIZE_TIME64_COMPAT32): Check with #if instead of #ifdef. * sysdeps/gnu/bits/utmpx.h (__WORDSIZE_TIME64_COMPAT32): Ditto. * sysdeps/mips/bits/wordsize.h (__WORDSIZE32_SIZE_ULONG, __WORDSIZE32_PTRDIFF_LONG, __WORDSIZE_TIME64_COMPAT32): Add or change defines. * sysdeps/powerpc/powerpc32/bits/wordsize.h: Likewise. * sysdeps/powerpc/powerpc64/bits/wordsize.h: Likewise. * sysdeps/s390/s390-32/bits/wordsize.h: Likewise. * sysdeps/s390/s390-64/bits/wordsize.h: Likewise. * sysdeps/sparc/sparc32/bits/wordsize.h: Likewise. * sysdeps/sparc/sparc64/bits/wordsize.h: Likewise. * sysdeps/tile/tilegx/bits/wordsize.h: Likewise. * sysdeps/tile/tilepro/bits/wordsize.h: Likewise. * sysdeps/unix/sysv/linux/alpha/bits/wordsize.h: Likewise. * sysdeps/unix/sysv/linux/powerpc/bits/wordsize.h: Likewise. * sysdeps/unix/sysv/linux/sparc/bits/wordsize.h: Likewise. * sysdeps/wordsize-32/bits/wordsize.h: Likewise. * sysdeps/wordsize-64/bits/wordsize.h: Likewise. * sysdeps/x86/bits/wordsize.h: Likewise.
2016-10-17Bug 20689: Fix FMA and AVX2 detection on IntelCarlos O'Donell
In the Intel Architecture Instruction Set Extensions Programming reference the recommended way to test for FMA in section '2.2.1 Detection of FMA' is: "Application Software must identify that hardware supports AVX as explained in ... after that it must also detect support for FMA..." We don't do that in glibc. We use osxsave to detect the use of xgetbv, and after that we check for AVX and FMA orthogonally. It is conceivable that you could have the AVX bit clear and the FMA bit in an undefined state. This commit fixes FMA and AVX2 detection to depend on usable AVX as required by the recommended Intel sequences. v1: https://www.sourceware.org/ml/libc-alpha/2016-10/msg00241.html v2: https://www.sourceware.org/ml/libc-alpha/2016-10/msg00265.html
2016-10-12X86: Don't assert on older Intel CPUs [BZ #20647]H.J. Lu
Since the maximum CPUID level of older Intel CPUs is 1, change handle_intel to return -1, instead of assert, when the maximum CPUID level is less than 2. [BZ #20647] * sysdeps/x86/cacheinfo.c (handle_intel): Return -1 if the maximum CPUID level is less than 2.
2016-10-06Add iseqsig.Joseph Myers
TS 18661-1 adds an iseqsig type-generic comparison macro to <math.h>. This macro is like the == operator except that unordered operands result in the "invalid" exception and errno being set to EDOM. This patch implements this macro for glibc. Given the need to set errno, this is implemented with out-of-line functions __iseqsigf, __iseqsig and __iseqsigl (of which the last only exists at all if long double is ABI-distinct from double, so no function aliases or compat support are needed). The present patch ignores excess precision issues; I intend to deal with those in a followup patch. (Like comparison operators, type-generic comparison macros should *not* convert operands to their semantic types but should preserve excess range and precision, meaning that for some argument types and values of FLT_EVAL_METHOD, an underlying function should be called for a wider type than that of the arguments.) The underlying functions are implemented with the type-generic template machinery. Comparing x <= y && x >= y is sufficient in ISO C to achieve an equality comparison with "invalid" raised for unordered operands (and the results of those two comparisons can also be used to tell whether errno needs to be set). However, some architectures have GCC bugs meaning that unordered comparison instructions are used instead of ordered ones. Thus, a mechanism is provided for architectures to use an explicit call to feraiseexcept to raise exceptions if required. If your architecture has such a bug you should add a fix-fp-int-compare-invalid.h header for it, with a comment pointing to the relevant GCC bug report; if such a GCC bug is fixed, that header's contents should have a __GNUC_PREREQ conditional added so that the workaround can eventually be removed for that architecture. Tested for x86_64, x86, mips64, arm and powerpc. * math/math.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (iseqsig): New macro. * math/bits/mathcalls.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (__iseqsig): New declaration. * math/s_iseqsig_template.c: New file. * math/Versions (__iseqsigf): New libm symbol at version GLIBC_2.25. (__iseqsig): Likewise. (__iseqsigl): Likewise. * math/libm-test.inc (iseqsig_test_data): New array. (iseqsig_test): New function. (main): Call iseqsig_test. * math/Makefile (gen-libm-calls): Add s_iseqsigF. * manual/arith.texi (FP Comparison Functions): Document iseqsig. * manual/libm-err-tab.pl: Update comment on interfaces without ulps tabulated. * sysdeps/generic/fix-fp-int-compare-invalid.h: New file. * sysdeps/powerpc/fpu/fix-fp-int-compare-invalid.h: Likewise. * sysdeps/x86/fpu/fix-fp-int-compare-invalid.h: Likewise. * sysdeps/nacl/libm.abilist: Update. * sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/tile/tilepro/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
2016-09-23Installed header hygiene (BZ#20366): Test of installed headers.Zack Weinberg
This adds a test to ensure that the problems fixed in the last several patches do not recur. Each directory checks the headers that it installs for two properties: first, each header must be compilable in isolation, as both C and C++, under a representative combination of language and library conformance levels; second, there is a blacklist of identifiers that may not appear in any installed header, currently consisting of the legacy BSD typedefs. (There is an exemption for the headers that define those typedefs, and for the RPC headers. It may be necessary to make this more sophisticated if we add more stuff to the blacklist in the future.) In order for this test to work correctly, every wrapper header that actually defines something must guard those definitions with #ifndef _ISOMAC. This is the existing mechanism used by the conform/ tests to tell wrapper headers not to define anything that the public header wouldn't, and not to use anything from libc-symbols.h. conform/ only cares for headers that we need to check for standards conformance, whereas this test applies to *every* header. (Headers in include/ that are either installed directly, or are internal-use-only and do *not* correspond to any installed header, are not affected.) * scripts/check-installed-headers.sh: New script. * Rules: In each directory that defines header files to be installed, run check-installed-headers.sh on them as a special test. * Makefile: Likewise for the headers installed at top level. * include/aliases.h, include/alloca.h, include/argz.h * include/arpa/nameser.h, include/arpa/nameser_compat.h * include/elf.h, include/envz.h, include/err.h * include/execinfo.h, include/fpu_control.h, include/getopt.h * include/gshadow.h, include/ifaddrs.h, include/libintl.h * include/link.h, include/malloc.h, include/mcheck.h * include/mntent.h, include/netinet/ether.h * include/nss.h, include/obstack.h, include/printf.h * include/pty.h, include/resolv.h, include/rpc/auth.h * include/rpc/auth_des.h, include/rpc/auth_unix.h * include/rpc/clnt.h, include/rpc/des_crypt.h * include/rpc/key_prot.h, include/rpc/netdb.h * include/rpc/pmap_clnt.h, include/rpc/pmap_prot.h * include/rpc/pmap_rmt.h, include/rpc/rpc.h * include/rpc/rpc_msg.h, include/rpc/svc.h * include/rpc/svc_auth.h, include/rpc/xdr.h * include/rpcsvc/nis_callback.h, include/rpcsvc/nislib.h * include/rpcsvc/yp.h, include/rpcsvc/ypclnt.h * include/rpcsvc/ypupd.h, include/shadow.h * include/stdio_ext.h, include/sys/epoll.h * include/sys/file.h, include/sys/gmon.h, include/sys/ioctl.h * include/sys/prctl.h, include/sys/profil.h * include/sys/statfs.h, include/sys/sysctl.h * include/sys/sysinfo.h, include/ttyent.h, include/utmp.h * sysdeps/arm/nacl/include/bits/setjmp.h * sysdeps/mips/include/sys/asm.h * sysdeps/unix/sysv/linux/include/sys/sysinfo.h * sysdeps/unix/sysv/linux/include/sys/timex.h * sysdeps/x86/fpu/include/bits/fenv.h: Add #ifndef _ISOMAC guard around internal declarations. Add multiple-inclusion guard if not already present.
2016-09-07Add femode_t functions.Joseph Myers
TS 18661-1 defines a type femode_t to represent the set of dynamic floating-point control modes (such as the rounding mode and trap enablement modes), and functions fegetmode and fesetmode to manipulate those modes (without affecting other state such as the raised exception flags) and a corresponding macro FE_DFL_MODE. This patch series implements those interfaces for glibc. This first patch adds the architecture-independent pieces, the x86 and x86_64 implementations, and the <bits/fenv.h> and ABI baseline updates for all architectures so glibc keeps building and passing the ABI tests on all architectures. Subsequent patches add the fegetmode and fesetmode implementations for other architectures. femode_t is generally an integer type - the same type as fenv_t, or as the single element of fenv_t where fenv_t is a structure containing a single integer (or the single relevant element, where it has elements for both status and control registers) - except where architecture properties or consistency with the fenv_t implementation indicate otherwise. FE_DFL_MODE follows FE_DFL_ENV in whether it's a magic pointer value (-1 cast to const femode_t *), a value that can be distinguished from valid pointers by its high bits but otherwise contains a representation of the desired register contents, or a pointer to a constant variable (the powerpc case; __fe_dfl_mode is added as an exported constant object, an alias to __fe_dfl_env). Note that where architectures (that share a register between control and status bits) gain definitions of new floating-point control or status bits in future, the implementations of fesetmode for those architectures may need updating (depending on whether the new bits are control or status bits and what the implementation does with previously unknown bits), just like existing implementations of <fenv.h> functions that take care not to touch reserved bits may need updating when the set of reserved bits changes. (As any new bits are outside the scope of ISO C, that's just a quality-of-implementation issue for supporting them, not a conformance issue.) As with fenv_t, femode_t should properly include any software DFP rounding mode (and for both fenv_t and femode_t I'd consider that fragment of DFP support appropriate for inclusion in glibc even in the absence of the rest of libdfp; hardware DFP rounding modes should already be included if the definitions of which bits are status / control bits are correct). Tested for x86_64, x86, mips64 (hard float, and soft float to test the fallback version), arm (hard float) and powerpc (hard float, soft float and e500). Other architecture versions are untested. * math/fegetmode.c: New file. * math/fesetmode.c: Likewise. * sysdeps/i386/fpu/fegetmode.c: Likewise. * sysdeps/i386/fpu/fesetmode.c: Likewise. * sysdeps/x86_64/fpu/fegetmode.c: Likewise. * sysdeps/x86_64/fpu/fesetmode.c: Likewise. * math/fenv.h: Update comment on inclusion of <bits/fenv.h>. [__GLIBC_USE (IEC_60559_BFP_EXT)] (fegetmode): New function declaration. [__GLIBC_USE (IEC_60559_BFP_EXT)] (fesetmode): Likewise. * bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/aarch64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/alpha/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/arm/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/hppa/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/ia64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/m68k/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/microblaze/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/mips/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/nios2/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/powerpc/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (__fe_dfl_mode): New variable declaration. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/s390/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/sh/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/sparc/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/tile/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/x86/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * manual/arith.texi (FE_DFL_MODE): Document macro. (fegetmode): Document function. (fesetmode): Likewise. * math/Versions (fegetmode): New libm symbol at version GLIBC_2.25. (fesetmode): Likewise. * math/Makefile (libm-support): Add fegetmode and fesetmode. (tests): Add test-femode and test-femode-traps. * math/test-femode-traps.c: New file. * math/test-femode.c: Likewise. * sysdeps/powerpc/fpu/fenv_const.c (__fe_dfl_mode): Declare as alias for __fe_dfl_env. * sysdeps/powerpc/nofpu/fenv_const.c (__fe_dfl_mode): Likewise. * sysdeps/powerpc/powerpc32/e500/nofpu/fenv_const.c (__fe_dfl_mode): Likewise. * sysdeps/powerpc/Versions (__fe_dfl_mode): New libm symbol at version GLIBC_2.25. * sysdeps/nacl/libm.abilist: Update. * sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/tile/tilepro/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
2016-09-06X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508]H.J. Lu
There is transition penalty when SSE instructions are mixed with 256-bit AVX or 512-bit AVX512 load instructions. Since _dl_runtime_resolve_avx and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM registers, there is transition penalty when SSE instructions are used with lazy binding on AVX and AVX512 processors. To avoid SSE transition penalty, if only the lower 128 bits of the first 8 vector registers are non-zero, we can preserve %xmm0 - %xmm7 registers with the zero upper bits. For AVX and AVX512 processors which support XGETBV with ECX == 1, we can use XGETBV with ECX == 1 to check if the upper 128 bits of YMM registers or the upper 256 bits of ZMM registers are zero. We can restore only the non-zero portion of vector registers with AVX/AVX512 load instructions which will zero-extend upper bits of vector registers. This patch adds _dl_runtime_resolve_sse_vex which saves and restores XMM registers with 128-bit AVX store/load instructions. It is used to preserve YMM/ZMM registers when only the lower 128 bits are non-zero. _dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added and used on AVX/AVX512 processors supporting XGETBV with ECX == 1 so that we store and load only the non-zero portion of vector registers. This avoids SSE transition penalty caused by _dl_runtime_resolve_avx and _dl_runtime_profile_avx512 when only the lower 128 bits of vector registers are used. _dl_runtime_resolve_avx_slow is added and used for AVX processors which don't support XGETBV with ECX == 1. Since there is no SSE transition penalty on AVX512 processors which don't support XGETBV with ECX == 1, _dl_runtime_resolve_avx512_slow isn't provided. [BZ #20495] [BZ #20508] * sysdeps/x86/cpu-features.c (init_cpu_features): For Intel processors, set Use_dl_runtime_resolve_slow and set Use_dl_runtime_resolve_opt if XGETBV suports ECX == 1. * sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt): New. (bit_arch_Use_dl_runtime_resolve_slow): Likewise. (index_arch_Use_dl_runtime_resolve_opt): Likewise. (index_arch_Use_dl_runtime_resolve_slow): Likewise. * sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use _dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt if Use_dl_runtime_resolve_opt is set. Use _dl_runtime_resolve_slow if Use_dl_runtime_resolve_slow is set. * sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>. (_dl_runtime_resolve_opt): New. Defined for AVX and AVX512. (_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex. * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow): New. (_dl_runtime_resolve_opt): Likewise. (_dl_runtime_profile): Define only if _dl_runtime_profile is defined.
2016-08-19X86: Change bit_YMM_state to (1 << 2)H.J. Lu
All other state bits, except for bit_YMM_state, are defined as (1 << N). This patch changes bit_YMM_state from (2 << 1) to (1 << 2). * sysdeps/x86/cpu-features.h (bit_YMM_state): Set to (1 << 2).
2016-07-01Fixed wrong vector sincos/sincosf ABI to have it compatible withAndrew Senkevich
current vector function declaration "#pragma omp declare simd notinbranch", according to which vector sincos should have vector of pointers for second and third parameters. It is fixed with implementation as wrapper to version having second and third parameters as pointers. [BZ #20024] * sysdeps/x86/fpu/test-math-vector-sincos.h: New. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S: Fixed ABI of this implementation of vector function. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S: Likewise. * sysdeps/x86_64/fpu/svml_d_sincos2_core.S: Likewise. * sysdeps/x86_64/fpu/svml_d_sincos4_core.S: Likewise. * sysdeps/x86_64/fpu/svml_d_sincos4_core_avx.S: Likewise. * sysdeps/x86_64/fpu/svml_d_sincos8_core.S: Likewise. * sysdeps/x86_64/fpu/svml_s_sincosf16_core.S: Likewise. * sysdeps/x86_64/fpu/svml_s_sincosf4_core.S: Likewise. * sysdeps/x86_64/fpu/svml_s_sincosf8_core.S: Likewise. * sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S: Likewise. * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Use another wrapper for testing vector sincos with fixed ABI. * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx.c: New test. * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx2.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx512.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-sincos.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx2.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx512.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-sincosf.c: Likewise. * sysdeps/x86_64/fpu/Makefile: Added new tests.
2016-06-30Check Prefer_ERMS in memmove/memcpy/mempcpy/memsetH.J. Lu
Although the Enhanced REP MOVSB/STOSB (ERMS) implementations of memmove, memcpy, mempcpy and memset aren't used by the current processors, this patch adds Prefer_ERMS check in memmove, memcpy, mempcpy and memset so that they can be used in the future. * sysdeps/x86/cpu-features.h (bit_arch_Prefer_ERMS): New. (index_arch_Prefer_ERMS): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Return __memcpy_erms for Prefer_ERMS. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (__memmove_erms): Enabled for libc.a. * ysdeps/x86_64/multiarch/memmove.S (__libc_memmove): Return __memmove_erms or Prefer_ERMS. * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Return __mempcpy_erms for Prefer_ERMS. * sysdeps/x86_64/multiarch/memset.S (memset): Return __memset_erms for Prefer_ERMS.
2016-06-29Avoid array-bounds warning for strncat on i586 (bug 20260)Andreas Schwab
2016-06-07Check FMA after COMMON_CPUID_INDEX_80000001H.J. Lu
Since the FMA4 bit is in COMMON_CPUID_INDEX_80000001 and FMA4 requires AVX, determine if FMA4 is usable after COMMON_CPUID_INDEX_80000001 is available and if AVX is usable. [BZ #20195] * sysdeps/x86/cpu-features.c (get_common_indeces): Move FMA4 check to ... (init_cpu_features): Here.
2016-05-27Count number of logical processors sharing L2 cacheH.J. Lu
For Intel processors, when there are both L2 and L3 caches, SMT level type should be ued to count number of available logical processors sharing L2 cache. If there is only L2 cache, core level type should be used to count number of available logical processors sharing L2 cache. Number of available logical processors sharing L2 cache should be used for non-inclusive L2 and L3 caches. * sysdeps/x86/cacheinfo.c (init_cacheinfo): Count number of available logical processors with SMT level type sharing L2 cache for Intel processors.
2016-05-20Remove special L2 cache case for Knights LandingH.J. Lu
L2 cache is shared by 2 cores on Knights Landing, which has 4 threads per core: https://en.wikipedia.org/wiki/Xeon_Phi#Knights_Landing So L2 cache is shared by 8 threads on Knights Landing as reported by CPUID. We should remove special L2 cache case for Knights Landing. [BZ #18185] * sysdeps/x86/cacheinfo.c (init_cacheinfo): Don't limit threads sharing L2 cache to 2 for Knights Landing.
2016-05-19Correct Intel processor level type mask from CPUIDH.J. Lu
Intel CPUID with EAX == 11 returns: ECX Bits 07 - 00: Level number. Same value in ECX input. Bits 15 - 08: Level type. ^^^^^^^^^^^^^^^^^^^^^^^^ This is level type. Bits 31 - 16: Reserved. Intel processor level type mask should be 0xff00, not 0xff0. [BZ #20119] * sysdeps/x86/cacheinfo.c (init_cacheinfo): Correct Intel processor level type mask for CPUID with EAX == 11.
2016-05-19Check the HTT bit before counting logical threadsH.J. Lu
Skip counting logical threads for Intel processors if the HTT bit is 0 which indicates there is only a single logical processor. * sysdeps/x86/cacheinfo.c (init_cacheinfo): Skip counting logical threads if the HTT bit is 0. * sysdeps/x86/cpu-features.h (bit_cpu_HTT): New. (index_cpu_HTT): Likewise. (reg_HTT): Likewise.
2016-05-13Support non-inclusive caches on Intel processorsH.J. Lu
* sysdeps/x86/cacheinfo.c (init_cacheinfo): Check and support non-inclusive caches on Intel processors.
2016-05-11Remove x86 ifunc-defines.sym and rtld-global-offsets.symH.J. Lu
Merge x86 ifunc-defines.sym with x86 cpu-features-offsets.sym. Remove x86 ifunc-defines.sym and rtld-global-offsets.sym. No code changes on i686 and x86-64. * sysdeps/i386/i686/multiarch/Makefile (gen-as-const-headers): Remove ifunc-defines.sym. * sysdeps/x86_64/multiarch/Makefile (gen-as-const-headers): Likewise. * sysdeps/i386/i686/multiarch/ifunc-defines.sym: Removed. * sysdeps/x86/rtld-global-offsets.sym: Likewise. * sysdeps/x86_64/multiarch/ifunc-defines.sym: Likewise. * sysdeps/x86/Makefile (gen-as-const-headers): Remove rtld-global-offsets.sym. * sysdeps/x86_64/multiarch/ifunc-defines.sym: Merged with ... * sysdeps/x86/cpu-features-offsets.sym: This. * sysdeps/x86/cpu-features.h: Include <cpu-features-offsets.h> instead of <ifunc-defines.h> and <rtld-global-offsets.h>.
2016-05-08Move sysdeps/x86_64/cacheinfo.c to sysdeps/x86H.J. Lu
Move sysdeps/x86_64/cacheinfo.c to sysdeps/x86. No code changes on x86 and x86_64. * sysdeps/i386/cacheinfo.c: Include <sysdeps/x86/cacheinfo.c> instead of <sysdeps/x86_64/cacheinfo.c>. * sysdeps/x86_64/cacheinfo.c: Moved to ... * sysdeps/x86/cacheinfo.c: Here.
2016-04-15Detect Intel Goldmont and Airmont processorsH.J. Lu
Updated from the model numbers of Goldmont and Airmont processors in Intel64 And IA-32 Processor Architectures Software Developer's Manual Volume 3 Revision 058. * sysdeps/x86/cpu-features.c (init_cpu_features): Detect Intel Goldmont and Airmont processors.
2016-04-01Remove Fast_Copy_Backward from Intel Core processorsH.J. Lu
Intel Core i3, i5 and i7 processors have fast unaligned copy and copy backward is ignored. Remove Fast_Copy_Backward from Intel Core processors to avoid confusion. * sysdeps/x86/cpu-features.c (init_cpu_features): Don't set bit_arch_Fast_Copy_Backward for Intel Core proessors.
2016-03-28Initial Enhanced REP MOVSB/STOSB (ERMS) supportH.J. Lu
The newer Intel processors support Enhanced REP MOVSB/STOSB (ERMS) which has a feature bit in CPUID. This patch adds the Enhanced REP MOVSB/STOSB (ERMS) bit to x86 cpu-features. * sysdeps/x86/cpu-features.h (bit_cpu_ERMS): New. (index_cpu_ERMS): Likewise. (reg_ERMS): Likewise.
2016-03-28[x86] Add a feature bit: Fast_Unaligned_CopyH.J. Lu
On AMD processors, memcpy optimized with unaligned SSE load is slower than emcpy optimized with aligned SSSE3 while other string functions are faster with unaligned SSE load. A feature bit, Fast_Unaligned_Copy, is added to select memcpy optimized with unaligned SSE load. [BZ #19583] * sysdeps/x86/cpu-features.c (init_cpu_features): Set Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel processors. Set Fast_Copy_Backward for AMD Excavator processors. * sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy): New. (index_arch_Fast_Unaligned_Copy): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check Fast_Unaligned_Copy instead of Fast_Unaligned_Load.
2016-03-22Set index_arch_AVX_Fast_Unaligned_Load only for Intel processorsH.J. Lu
Since only Intel processors with AVX2 have fast unaligned load, we should set index_arch_AVX_Fast_Unaligned_Load only for Intel processors. Move AVX, AVX2, AVX512, FMA and FMA4 detection into get_common_indeces and call get_common_indeces for other processors. Add CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P to aoid loading GLRO(dl_x86_cpu_features) in cpu-features.c. [BZ #19583] * sysdeps/x86/cpu-features.c (get_common_indeces): Remove inline. Check family before setting family, model and extended_model. Set AVX, AVX2, AVX512, FMA and FMA4 usable bits here. (init_cpu_features): Replace HAS_CPU_FEATURE and HAS_ARCH_FEATURE with CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P. Set index_arch_AVX_Fast_Unaligned_Load for Intel processors with usable AVX2. Call get_common_indeces for other processors with family == NULL. * sysdeps/x86/cpu-features.h (CPU_FEATURES_CPU_P): New macro. (CPU_FEATURES_ARCH_P): Likewise. (HAS_CPU_FEATURE): Use CPU_FEATURES_CPU_P. (HAS_ARCH_FEATURE): Use CPU_FEATURES_ARCH_P.
2016-03-10Add _arch_/_cpu_ to index_*/bit_* in x86 cpu-features.hH.J. Lu
index_* and bit_* macros are used to access cpuid and feature arrays o struct cpu_features. It is very easy to use bits and indices of cpuid array on feature array, especially in assembly codes. For example, sysdeps/i386/i686/multiarch/bcopy.S has HAS_CPU_FEATURE (Fast_Rep_String) which should be HAS_ARCH_FEATURE (Fast_Rep_String) We change index_* and bit_* to index_cpu_*/index_arch_* and bit_cpu_*/bit_arch_* so that we can catch such error at build time. [BZ #19762] * sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h (EXTRA_LD_ENVVARS): Add _arch_ to index_*/bit_*. * sysdeps/x86/cpu-features.c (init_cpu_features): Likewise. * sysdeps/x86/cpu-features.h (bit_*): Renamed to ... (bit_arch_*): This for feature array. (bit_*): Renamed to ... (bit_cpu_*): This for cpu array. (index_*): Renamed to ... (index_arch_*): This for feature array. (index_*): Renamed to ... (index_cpu_*): This for cpu array. [__ASSEMBLER__] (HAS_FEATURE): Add and use field. [__ASSEMBLER__] (HAS_CPU_FEATURE)): Pass cpu to HAS_FEATURE. [__ASSEMBLER__] (HAS_ARCH_FEATURE)): Pass arch to HAS_FEATURE. [!__ASSEMBLER__] (HAS_CPU_FEATURE): Replace index_##name and bit_##name with index_cpu_##name and bit_cpu_##name. [!__ASSEMBLER__] (HAS_ARCH_FEATURE): Replace index_##name and bit_##name with index_arch_##name and bit_arch_##name.
2016-03-08Define _HAVE_STRING_ARCH_mempcpy to 1 for x86H.J. Lu
Since x86 has an optimized mempcpy and GCC can inline mempcpy on x86, define _HAVE_STRING_ARCH_mempcpy to 1 for x86. [BZ #19759] * sysdeps/x86/bits/string.h (_HAVE_STRING_ARCH_mempcpy): New.
2016-02-18Add _STRING_INLINE_unaligned and string_private.hH.J. Lu
As discussed in https://sourceware.org/ml/libc-alpha/2015-10/msg00403.html the setting of _STRING_ARCH_unaligned currently controls the external GLIBC ABI as well as selecting the use of unaligned accesses withing GLIBC. Since _STRING_ARCH_unaligned was recently changed for AArch64, this would potentially break the ABI in GLIBC 2.23, so split the uses and add _STRING_INLINE_unaligned to select the string ABI. This setting must be fixed for each target, while _STRING_ARCH_unaligned may be changed from release to release. _STRING_ARCH_unaligned is used unconditionally in glibc. But <bits/string.h>, which defines _STRING_ARCH_unaligned, isn't included with -Os. Since _STRING_ARCH_unaligned is internal to glibc and may change between glibc releases, it should be made private to glibc. _STRING_ARCH_unaligned should defined in the new string_private.h heade file which is included unconditionally from internal <string.h> for glibc build. [BZ #19462] * bits/string.h (_STRING_ARCH_unaligned): Renamed to ... (_STRING_INLINE_unaligned): This. * include/string.h: Include <string_private.h>. * string/bits/string2.h: Replace _STRING_ARCH_unaligned with _STRING_INLINE_unaligned. * sysdeps/aarch64/bits/string.h (_STRING_ARCH_unaligned): Removed. (_STRING_INLINE_unaligned): New. * sysdeps/aarch64/string_private.h: New file. * sysdeps/generic/string_private.h: Likewise. * sysdeps/m68k/m680x0/m68020/string_private.h: Likewise. * sysdeps/s390/string_private.h: Likewise. * sysdeps/x86/string_private.h: Likewise. * sysdeps/m68k/m680x0/m68020/bits/string.h (_STRING_ARCH_unaligned): Renamed to ... (_STRING_INLINE_unaligned): This. * sysdeps/s390/bits/string.h (_STRING_ARCH_unaligned): Renamed to ... (_STRING_INLINE_unaligned): This. * sysdeps/sparc/bits/string.h (_STRING_ARCH_unaligned): Renamed to ... (_STRING_INLINE_unaligned): This. * sysdeps/x86/bits/string.h (_STRING_ARCH_unaligned): Renamed to ... (_STRING_INLINE_unaligned): This.
2016-01-14Set index_Fast_Unaligned_Load for Excavator family CPUsAmit Pawar
GLIBC benchtest testcases shows SSE2_Unaligned based implementations are performing faster compare to SSE2 based implementations for routines: strcmp, strcat, strncat, stpcpy, stpncpy, strcpy, strncpy and strstr. Flag index_Fast_Unaligned_Load is set for Excavator family 0x15h CPU's. This makes SSE2_Unaligned based implementations as default for these routines. [BZ #19467] * sysdeps/x86/cpu-features.c (init_cpu_features): Set index_Fast_Unaligned_Load flag for Excavator family CPUs.
2016-01-04Update copyright dates with scripts/update-copyrights.Joseph Myers
2015-12-19Added memset optimized with AVX512 for KNL hardware.Andrew Senkevich
It shows improvement up to 28% over AVX2 memset (performance results attached at <https://sourceware.org/ml/libc-alpha/2015-12/msg00052.html>). * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: New file. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Added new file. * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Added new tests. * sysdeps/x86_64/multiarch/memset.S: Added new IFUNC branch. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise. * sysdeps/x86/cpu-features.h (bit_Prefer_No_VZEROUPPER, index_Prefer_No_VZEROUPPER): New. * sysdeps/x86/cpu-features.c (init_cpu_features): Set the Prefer_No_VZEROUPPER for Knights Landing.
2015-12-15Add Prefer_MAP_32BIT_EXEC to map executable pages with MAP_32BITH.J. Lu
According to Silvermont software optimization guide, for 64-bit applications, branch prediction performance can be negatively impacted when the target of a branch is more than 4GB away from the branch. Add the Prefer_MAP_32BIT_EXEC bit so that mmap will try to map executable pages with MAP_32BIT first. NB: MAP_32BIT will map to lower 2GB, not lower 4GB, address. Prefer_MAP_32BIT_EXEC reduces bits available for address space layout randomization (ASLR), which is always disabled for SUID programs and can only be enabled by setting environment variable, LD_PREFER_MAP_32BIT_EXEC. On Fedora 23, this patch speeds up GCC 5 testsuite by 3% on Silvermont. [BZ #19367] * sysdeps/unix/sysv/linux/wordsize-64/mmap.c: New file. * sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h: Likewise. * sysdeps/unix/sysv/linux/x86_64/64/mmap.c: Likewise. * sysdeps/x86/cpu-features.h (bit_Prefer_MAP_32BIT_EXEC): New. (index_Prefer_MAP_32BIT_EXEC): Likewise.
2015-12-15Enable Silvermont optimizations for Knights LandingH.J. Lu
Knights Landing processor is based on Silvermont. This patch enables Silvermont optimizations for Knights Landing. * sysdeps/x86/cpu-features.c (init_cpu_features): Enable Silvermont optimizations for Knights Landing.
2015-12-07Utilize x86_64 vector math functions w/o -fopenmp.Andrew Senkevich
This patch allows to use x86_64 vector math functions with GCC 6.* without OpenMP SIMD constructs. For additional details please visit <https://sourceware.org/glibc/wiki/libmvec#Example_2>. * sysdeps/x86/fpu/bits/math-vector.h: W/o -fopenmp declare vector math functions with GCC 6.* __attribute__ ((__simd__)).
2015-11-30Update family and model detection for AMD CPUsH.J. Lu
AMD CPUs uses the similar encoding scheme for extended family and model as Intel CPUs as shown in: http://support.amd.com/TechDocs/25481.pdf This patch updates get_common_indeces to get family and model for both Intel and AMD CPUs when family == 0x0f. [BZ #19214] * sysdeps/x86/cpu-features.c (get_common_indeces): Add an argument to return extended model. Update family and model with extended family and model when family == 0x0f. (init_cpu_features): Updated.
2015-11-27Better workaround for aliases of *_finite symbols in vector math library.Andrew Senkevich
Old workaround based on assembly aliases can lead to link fail (bug 19058). This patch makes workaround in another way to avoid it. [BZ #19058] * math/Makefile ($(inst_libdir)/libm.so): Added libmvec_nonshared.a to AS_NEEDED. * sysdeps/x86/fpu/bits/math-vector.h: Removed code with old workaround. * sysdeps/x86_64/fpu/Makefile (libmvec-support, libmvec-static-only-routines): Added new file. * sysdeps/x86_64/fpu/svml_finite_alias.S: New file.
2015-11-14Run tst-prelink test for GLOB_DAT relocH.J. Lu
Run tst-prelink test on targets with GLOB_DAT relocaton. * config.make.in (have-glob-dat-reloc): New. * configure.ac (libc_cv_has_glob_dat): New. Set to yes if target supports GLOB_DAT relocaton. AC_SUBST. * configure: Regenerated. * elf/Makefile (tests): Add tst-prelink. (tests-special): Add $(objpfx)tst-prelink-cmp.out. (tst-prelink-ENV): New. ($(objpfx)tst-prelink-conflict.out): Likewise. ($(objpfx)tst-prelink-cmp.out): Likewise. * sysdeps/x86/tst-prelink.c: Moved to ... * elf/tst-prelink.c: Here. * sysdeps/x86/tst-prelink.exp: Moved to ... * elf/tst-prelink.exp: Here. * sysdeps/x86/Makefile (tests): Don't add tst-prelink. (tst-prelink-ENV): Removed. ($(objpfx)tst-prelink-conflict.out): Likewise. ($(objpfx)tst-prelink-cmp.out): Likewise. (tests-special): Don't add $(objpfx)tst-prelink-cmp.out.
2015-11-10Add a test for prelink outputH.J. Lu
This test applies to i386 and x86_64 which set R_386_GLOB_DAT and R_X86_64_GLOB_DAT to ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA. [BZ #19178] * sysdeps/x86/Makefile (tests): Add tst-prelink. (tst-prelink-ENV): New. ($(objpfx)tst-prelink-conflict.out): Likewise. ($(objpfx)tst-prelink-cmp.out): Likewise. (tests-special): Add $(objpfx)tst-prelink-cmp.out. * sysdeps/x86/tst-prelink.c: New file. * sysdeps/x86/tst-prelink.exp: Likewise.
2015-10-28Handle more state in i386/x86_64 fesetenv (bug 16068).Joseph Myers
fenv_t should include architecture-specific floating-point modes and status flags. i386 and x86_64 fesetenv limit which bits they use from the x87 status and control words, when using saved state, and limit which parts of the state they set to fixed values, when using FE_DFL_ENV / FE_NOMASK_ENV. The following should be included but are excluded in at least some cases: status and masking for the "denormal operand" exception (which isn't part of FE_ALL_EXCEPT); precision control (explicitly mentioned in Annex F as something that counts as part of the floating-point environment); MXCSR FZ and DAZ bits (for FE_DFL_ENV and FE_NOMASK_ENV). This patch arranges for this extra state to be handled by fesetenv (and thereby by feupdateenv, which calls fesetenv). (Note that glibc functions using floating point are not generally expected to work correctly with non-default values of this state, especially precision control, but it is still logically part of the floating-point environment and should be handled as such by fesetenv. Changes to the state relating to subnormals ought generally to work with libm functions when the arguments aren't subnormal and neither are the expected results; that's a consequence of functions avoiding spurious internal underflows.) A question arising from this is whether FE_NOMASK_ENV should or should not mask the "denormal operand" exception. I decided it should mask that exception. This is the status quo - previously that exception could only be unmasked by direct manipulation of control registers (possibly via <fpu_control.h>). In addition, it means that use of FE_NOMASK_ENV leaves a floating-point environment the same as could be obtained by fesetenv (FE_DFL_ENV); feenableexcept (FE_ALL_EXCEPT);, rather than an environment in which an exception is unmasked that could only be masked again by using fesetenv with FE_DFL_ENV (or a previously saved environment) - this exception not being usable with other <fenv.h> functions because it's outside FE_ALL_EXCEPT. Tested for x86_64 and x86. [BZ #16068] * sysdeps/i386/fpu/fesetenv.c: Include <fpu_control.h>. (FE_ALL_EXCEPT_X86): New macro. (__fesetenv): Use FE_ALL_EXCEPT_X86 in most places instead of FE_ALL_EXCEPT. Ensure precision control is included in floating-point state. Ensure that FE_DFL_ENV and FE_NOMASK_ENV handle "denormal operand exception" and clear FZ and DAZ bits. * sysdeps/x86_64/fpu/fesetenv.c: Include <fpu_control.h>. (FE_ALL_EXCEPT_X86): New macro. (__fesetenv): Use FE_ALL_EXCEPT_X86 in most places instead of FE_ALL_EXCEPT. Ensure precision control is included in floating-point state. Ensure that FE_DFL_ENV and FE_NOMASK_ENV handle "denormal operand exception" and clear FZ and DAZ bits. * sysdeps/x86/fpu/test-fenv-sse-2.c: New file. * sysdeps/x86/fpu/test-fenv-x87.c: Likewise. * sysdeps/x86/fpu/Makefile [$(subdir) = math] (tests): Add test-fenv-x87 and test-fenv-sse-2. [$(subdir) = math] (CFLAGS-test-fenv-sse-2.c): New variable.
2015-10-28Fix i386/x86_64 fesetenv SSE exception clearing (bug 19181).Joseph Myers
The i386 and x86_64 versions of fesetenv, when called with FE_DFL_ENV or FE_NOMASK_ENV as argument, do not clear SSE exceptions raised in MXCSR. These arguments should, like other fenv_t values, represent the whole of the floating-point state, so such exceptions should be cleared; this patch adds the required clearing. (Discovered while working on bug 16068.) Tested for x86_64 and x86. [BZ #19181] * sysdeps/i386/fpu/fesetenv.c (__fesetenv): Clear already-raised SSE exceptions when argument is FE_DFL_ENV or FE_NOMASK_ENV. * sysdeps/x86_64/fpu/fesetenv.c (__fesetenv): Likewise. * math/test-fenv-clear-main.c: New file. * math/test-fenv-clear.c: Likewise. * math/Makefile (tests): Add test-fenv-clear. * sysdeps/x86/fpu/test-fenv-clear-sse.c: New file. * sysdeps/x86/fpu/Makefile [$(subdir) = math] (tests): Add test-fenv-clear-sse. [$(subdir) = math] (CFLAGS-test-fenv-clear-sse.c): New variable.