Age | Commit message | Author
2018-10-17 | noloopalias: Replace noalias with noloopalias. [HEAD, gcc-7_3_0-amp4, gcc-7_3_0-amp-branch] | Christoph Muellner
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
2018-10-17 | cfgloopmanip: Allow forced creation of loop preheaders. | Christoph Muellner
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
2018-10-17 | cfgloop: Fix FOR_EACH_LOOP_FN() macro. | Christoph Muellner
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
2018-10-17 | Introduce new command line flag -funroll-more. | Christoph Muellner
This flag triggers more aggressive loop unrolling in tree-predcom.
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
2018-10-17 | Revert "tree-predcom: Do more unrolling if -fnoalias flag is set." | Christoph Muellner
This reverts commit 3df45c222875f9edb98342cf1fcdf92edb2850b2.
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
2018-10-17 | tree-vect-slp: Enable vectorization when -fvectorize-more is given. | Dominik Inführ
GCC 6+ detects more SLP instances than earlier versions did. Cost analysis is done for all instances together rather than for individual combinations of them, so the additional instances make vectorization unprofitable. Fix this by ignoring all but the first instance when -fvectorize-more is given.
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
2018-10-15 | Revert "tree-vect-slp: Enable vectorization when -fnoalias is given." | Christoph Muellner
This reverts commit a3ba4a0362e6707ca4ac86ee8590faecc10ae5d4.
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
2018-10-15 | funcse: Improve helper string for -funcse. | Christoph Muellner
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
2018-10-04 | [aarch64] Add CPU support for Ampere Computing's eMAG. | Christoph Muellner
This patch adds support for Ampere Computing's eMAG processor. Tested on aarch64 (no regressions seen).
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
2018-06-13 | Revert "[nomainline] Turn on -frename-registers for -O2." | Christoph Muellner
Register renaming did not show improvements on SPEC CPU2006, so we drop the patch.
This reverts commit e8214e9512d568fa85cdd5660544649de75c8486.
2018-06-13 | Revert "tree-ssa-cpp: Initialize loop optimizer." | Christoph Muellner
The loop optimizer initialization patch has been confirmed to be unnecessary on GCC 7.3.0.
This reverts commit 8ab36d8103543b2572c34c8824c0c81ef26696a2.
2018-05-31 | aarch64: Fixing code style issues of retpoline changes. | Christoph Muellner
This patch contains only whitespace changes. Most of these changes address issues found by check_GNU_style.sh.
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
2018-05-24 | aarch64: Optimize retpoline (Spectre-V2 mitigation) for aarch64. | Christoph Muellner
Merge the adrp/add instruction pair into a single adr instruction. The retpoline instruction sequence, which prevents speculative indirect branches, now looks like this:
        str x30, [sp, #-16]!
        bl 101f
    100: //speculation trap
        wfe
        b 100b
    101: //do ROP
        adr x30, 102f
        ret
    102: //non-spec code
        ldr x30, [sp], #16
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
2018-04-27 | Revert "2017-10-24 Richard Biener <rguenther@suse.de>" | Christoph Muellner
This reverts commit 20496f0e660347d8d2d885dff2702182853602bd. The commit is reverted because it causes a significant performance slowdown on some benchmarks. After the revert, the testcase (pr82697.c) still passes.
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
2018-04-27 | Avoid assembler warnings from AArch64 constructor/destructor priorities. | jsm28
Many GCC tests fail for AArch64 with current binutils because of assembler warnings of the form "Warning: ignoring incorrect section type for .init_array.00100". The same issue was fixed for ARM in r247015 by using SECTION_NOTYPE when creating those sections; this patch applies the same fix to AArch64.
Tested with no regressions with cross to aarch64-linux-gnu.
    * config/aarch64/aarch64.c (aarch64_elf_asm_constructor)
    (aarch64_elf_asm_destructor): Pass SECTION_NOTYPE to get_section when creating .init_array and .fini_array sections with priority specified.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253252 138bc75d-0d04-0410-961f-82ee72b054a4
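As a rough illustration of where these priority sections come from, the C sketch below uses GCC's `constructor(priority)` attribute. It is not part of the patch, and the function and variable names are hypothetical. On ELF targets that use .init_array, each prioritized constructor is placed in a section such as .init_array.00101, which is the kind of section the assembler warned about. GCC documents that constructors with smaller priority numbers run first.

```c
/* Records the order in which the prioritized constructors run. */
static int ctor_order[2];
static int ctor_count;

/* Priorities 0..100 are reserved for the implementation, so user code
   starts at 101. On ELF/.init_array targets GCC emits these into
   sections named .init_array.00101 and .init_array.00102. */
static void __attribute__((constructor(102))) init_late(void)
{
    ctor_order[ctor_count++] = 102;
}

static void __attribute__((constructor(101))) init_early(void)
{
    ctor_order[ctor_count++] = 101;
}
```

Both constructors run before `main`. The lower priority runs first regardless of the order in which the functions appear in the source.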
2018-04-27 | tree-cfg: Store gimple_build_nop () instead of NULL. | Dominik Inführ
2018-04-27 | tree-ssa-cpp: Initialize loop optimizer. | Dominik Inführ
2018-04-27 | Added clone functions to duplicated passes. | Benedikt Huber
2018-04-27 | [nomainline] Turn on -frename-registers for -O2. | Benedikt Huber
2018-04-27 | uncse: Added pass to undo common subexpression elimination. | Dominik Inführ
When a value is a common subexpression, GCC always keeps it as a separate instruction instead of fusing it into each of its users, even when fusing would have the lowest overall cost. Consider the following example, a multiply that could be synthesized as (b + (b << 2)) << 2, i.e. 4 * (b + 4b):
    signed long mult_20 ( signed int b )
    {
      return (signed long)b*20;
    }
Because of a common subexpression (the sign-extended argument b), this results in:
    mult_20:
        sxtw x0, w0
        add x0, x0, x0, lsl 2
        lsl x0, x0, 2
        ret
At 4 cycles latency, this is in stark contrast with the optimal solution, which takes just 2 cycles and fuses the sign extension into two dependent instructions:
    mult_20:
        sbfiz x1, x0, 4, 32
        add x0, x1, x0, sxtw 2
        ret
To resolve this, a separate un-combine pass has been proposed (originally by Philipp Tomsich, with additional improvements by Chris Nelson). The algorithm for this pass is:
  1. For an RTX expression A, identify all dependent RTX expressions B[0..n].
  2. For each combination A -> B[i], check whether the combined RTX expression is a valid instruction with the same cost as B[i], and if so perform the following RTX changes:
     - replace B[i] with a new RTX expression B'[i], which is the combination A -> B[i];
     - link each B'[i] to depend on the dependencies of A;
     - replace B[i] in the dependency list of its dependent operations with a dependency on B'[i] (i.e. unlink B[i]);
     - remove B[i], as it is now dead code (or leave it dangling for the DCE pass to clean up behind us).
  3. If all combinations are valid, remove A, as it is now dead code (or leave it dangling for the DCE pass to clean up behind us).
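The two listings above can be checked with a small model. The sketch below simulates each instruction sequence on 64-bit unsigned values, so shifts wrap like registers and avoid signed-overflow undefined behaviour in C. The helper names are hypothetical. The sketch only verifies that both sequences compute 20 * b; it is not the uncse pass itself.

```c
#include <stdint.h>

/* sxtw: sign-extend a 32-bit value into a 64-bit register. */
static uint64_t sxtw(int32_t w) { return (uint64_t)(int64_t)w; }

/* Reference semantics of mult_20: (signed long)b * 20. */
static uint64_t mult_20_ref(int32_t b) { return (uint64_t)((int64_t)b * 20); }

/* sxtw x0, w0; add x0, x0, x0, lsl 2; lsl x0, x0, 2 (three dependent ops) */
static uint64_t mult_20_cse(int32_t b)
{
    uint64_t x0 = sxtw(b);      /* x0 = b      */
    x0 = x0 + (x0 << 2);        /* x0 = 5 * b  */
    x0 = x0 << 2;               /* x0 = 20 * b */
    return x0;
}

/* sbfiz x1, x0, 4, 32; add x0, x1, x0, sxtw 2 (sign extension fused) */
static uint64_t mult_20_fused(int32_t b)
{
    uint64_t x1 = sxtw(b) << 4;   /* sbfiz: sign-extend low 32 bits, shift by 4 -> 16 * b */
    return x1 + (sxtw(b) << 2);   /* add with sxtw #2: 16 * b + 4 * b */
}
```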
2018-04-27 | tree-ssa-list-find-pipeline: Add pipelining loads for list finds. | Benedikt Huber
This pass can be activated with -flist-find-pipeline.
This patch optimizes a common linked-list idiom (example from Coremark's 'core_list_find') of the following form:
    while (list && (list->info->idx != info->idx))
      list=list->next;
    return list;
This idiom introduces a number of dependent loads along the code path. However, the dereference of list and the assignment list_{i+1} = list_{i}->next (i being the iteration) depend only on the first condition (i.e. “list != NULL”) and can be moved earlier.
The [list-find pass] is an experimental pass (to be generalised in a next step). It provides a targeted software pipeliner for loops that iterate over a linked list, hoisting the list = list->next dereference for the next iteration above the comparison of the index field. In SSA form, the loop should thus become (all conditions need to be inverted):
    if (!list_{i}) return NULL;   // forward-propagate from the
                                  // if-condition
    list_{i+1} = list_{i}->next;
    if (list_{i}->info->idx == info->idx) return list_{i};
The loop should be unrolled at least once, so that two distinct registers can be used for list_{i} and list_{i+1} and no additional register moves are needed.
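The intended result of the transformation is easier to see when written out in C. The sketch below is the pipelined loop written by hand, not the pass's implementation. It assumes simplified stand-ins for Coremark's list types; `struct info` and `struct list_node` are hypothetical names, and Coremark's real types differ.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Coremark's list types. */
struct info { int idx; };
struct list_node { struct info *info; struct list_node *next; };

/* The idiom from the commit message. */
static struct list_node *find_orig(struct list_node *list, const struct info *info)
{
    while (list && (list->info->idx != info->idx))
        list = list->next;
    return list;
}

/* Pipelined by hand: list->next for the next iteration is loaded before
   the idx comparison, because it depends only on list != NULL. */
static struct list_node *find_pipelined(struct list_node *list, const struct info *info)
{
    for (;;) {
        if (!list)
            return NULL;
        struct list_node *next = list->next;   /* hoisted above the compare */
        if (list->info->idx == info->idx)
            return list;
        list = next;
    }
}
```

Both functions return the same node. The pipelined form lets the load of `list->next` overlap with the loads feeding the comparison instead of waiting for the comparison to resolve.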
2018-04-27 | tree-predcom: Do more unrolling if -fnoalias flag is set. | Dominik Inführ
2018-04-27 | tree-vect-slp: Adapt calculation of scalar cost. | Dominik Inführ
Each subtree needs its own life array.
2018-04-27 | tree-vect-slp: Enable vectorization when -fnoalias is given. | Dominik Inführ
GCC 6+ detects more SLP instances than earlier versions did. Cost analysis is done for all instances together rather than for individual combinations of them, so the additional instances make vectorization unprofitable. Fix this by ignoring all but the first instance when -fnoalias is given.
2018-04-27 | Adding -fnoalias to declare non-aliasing on function level. | Philipp Tomsich
This enables optimizations by assuring the compiler that pointers within the given function do not alias. The optimizations in this patch are:
  * pass_uninline_innermost_loops (new lowering pass): if noalias is enabled, uninline the innermost loop by moving it into a separate function.
  * pass_peel_last_iteration (new optimization pass): if noalias is enabled, peel the last loop iteration.
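Peeling the last iteration usually pays off when the final trip through a loop needs special handling. The sketch below is a generic example of the technique, not code from pass_peel_last_iteration. The `restrict` qualifiers stand in for the no-alias promise that -fnoalias is described as giving, and the function names are hypothetical.

```c
#include <stddef.h>

/* Before: every iteration pays for a guard that only matters for the
   last one, where in[i + 1] would be out of bounds. */
static void sum_adjacent(int *restrict out, const int *restrict in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] + (i + 1 < n ? in[i + 1] : 0);
}

/* After peeling the last iteration: the loop body is branch-free. */
static void sum_adjacent_peeled(int *restrict out, const int *restrict in, size_t n)
{
    if (n == 0)
        return;
    for (size_t i = 0; i + 1 < n; i++)
        out[i] = in[i] + in[i + 1];
    out[n - 1] = in[n - 1];   /* peeled final iteration */
}
```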
2018-04-27 | aarch64: X-Gene: Adapt tuning struct for GCC 7. | Dominik Inführ
  * use generic_[addrcost/regmove]_cost
  * use generic_approx_mode
  * decrease memmove_cost to 4 (was 6)
  * decrease issues_rate to 2 (was 4); this should actually be 4, and we should use TARGET_SCHED_VARIABLE_ISSUE()
  * relax function alignment to 8 (was 16)
  * relax loop alignment to 4 (was 16)
  * set max_case_values (switch-case) to 17 (was 0)
2018-04-27 | aarch64: X-Gene: Updated scheduling model for GCC 7. | Philipp Tomsich
This patch updates the X-Gene scheduling model for GCC 7:
  * Improve scheduling by modeling the IXB dependency of shift/rotate.
  * Adjust latencies of branches/calls.
  * Remove the pattern for prefetch, as there is no type for it.
  * Add bypasses for alu and alus.
  * Remove the store reservation to improve performance.
  * Use multiple automatons.
  * Define ALU latency as 1, with a bypass for conditional ops.
  * Add the IXB completion unit.
  * Add a bfx pattern to the scheduling model.
  * Reduce latency on store ops.
2018-04-27 | aarch64: Xgene: Procedural cost-model for X-Gene processors. | Dominik Inführ
Bypass the table-based cost model with a procedural one to model the X-Gene microarchitecture more closely.
2018-04-27 | aarch64: Fix and<mode>3_zeroextend case (20040709-1.s regression). | Philipp Tomsich
With the unguarded, HImode/QImode-optimized and-pattern, an additional guard for the immediate bitmask is required to exclude cases where an inverted bitmask is used (i.e. all-ones in those bits that should be zero-extended).
2018-04-27 | aarch64: Optimize and(s) patterns for HI/QI operands. | Philipp Tomsich
HImode and QImode operands can be handled more efficiently for logical AND than for logical OR operations. An AND never sets bits that are not already set in its operands, so the precision of the result is at most that of its narrowest operand, with an implicit zero-extension to any larger precision. These patterns help avoid unnecessary zero-extension operations on benchmarks, including some SPEC workloads.
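The claim about AND can be checked directly in C. In the sketch below, one operand is a zero-extended 16-bit value, and the function names are hypothetical. Masking the AND result back to 16 bits never changes it, which is why an extra zero-extension after the AND is redundant. For OR, bits from the wider operand survive, so the same shortcut does not apply.

```c
#include <stdint.h>

/* AND with an explicit re-zero-extension from 16 bits. */
static uint32_t and_then_zext(uint16_t a, uint32_t b) { return (uint16_t)((uint32_t)a & b); }

/* AND without it: already fits in 16 bits because a has no upper bits. */
static uint32_t and_plain(uint16_t a, uint32_t b) { return (uint32_t)a & b; }

/* The OR counterparts: here the zero-extension is not redundant. */
static uint32_t or_then_zext(uint16_t a, uint32_t b) { return (uint16_t)((uint32_t)a | b); }
static uint32_t or_plain(uint16_t a, uint32_t b) { return (uint32_t)a | b; }
```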
2018-04-27 | aarch64: Extend '*tb<optab><mode>1'. | Philipp Tomsich
The '*tb<optab><mode>1' pattern can safely be extended to match operands of any size, as long as the immediate operand (i.e. the bit tested) lies within the size of the register operand. This removes unnecessary zero-extension operations from the generated instruction stream.
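Here is a small C model of why the zero-extension is unnecessary. TBZ/TBNZ test a single bit. For a bit position inside the narrow operand, the result is the same whether or not the register was zero-extended first, even if the upper register bits hold garbage. The helpers are hypothetical and model registers as `uint64_t`. The last assertion shows why the bit position must stay within the narrow operand.

```c
#include <stdbool.h>
#include <stdint.h>

/* TBZ/TBNZ-style test of bit k of a 64-bit register. */
static bool test_bit(uint64_t reg, unsigned k) { return (reg >> k) & 1u; }

/* The same test after explicitly zero-extending the low 8 bits (uxtb). */
static bool test_after_zext8(uint64_t reg, unsigned k)
{
    return test_bit((uint64_t)(uint8_t)reg, k);
}
```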
2018-04-27 | aarch64: Correct the maximum shift amount for shifted operands. | Philipp Tomsich
2018-04-27 | aarch64: Retpoline (Spectre-V2 mitigation) for aarch64. | Christoph Muellner
The compiler option -mindirect-branch=<value> converts indirect branch-and-link-register and branch-register instructions according to <value>:
  * ``keep`` (the default) keeps indirect branch-and-link-register and branch-register instructions unmodified.
  * ``thunk`` converts indirect branch-and-link-register/branch-register instructions to a branch-and-link/branch to a function containing a retpoline (to stop speculative execution) followed by a branch-register to the target.
  * ``thunk-inline`` is similar to ``thunk``, but inlines the retpoline before the branch-and-link-register/branch-register instruction.
  * ``thunk-extern`` is also similar to ``thunk``, but does not insert the functions containing the retpoline. When using this option, these functions need to be provided in a separate object file.
The retpoline functions exist for each register and are named ``__aarch64_indirect_thunk_xN`` (N being the register number). It is also possible to override the indirect-branch setting for individual functions using the function attribute ``indirect_branch``.
The actual retpoline instruction sequence, which prevents speculative indirect branches, looks like this::
        str x30, [sp, #-16]!
        bl 101f
    100: //speculation trap
        wfe
        b 100b
    101: //do ROP
        adrp x30, 102f
        add x30, x30, :lo12:102f
        ret
    102: //non-spec code
        ldr x30, [sp], #16
This patch has been tested with the included testcases and various other source bases (benchmarks, retpoline-patched arm64 kernel, etc.).
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
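For orientation, the C sketch below shows the kind of construct the option acts on: a call through a function pointer. AArch64 compilers normally emit such a call as an indirect branch-and-link-register (blr). This is plain portable C with hypothetical names. The `noinline` attribute only keeps the call indirect so that it stays visible in the generated assembly.

```c
/* A call through a function pointer compiles to an indirect
   branch-and-link-register (blr xN) on AArch64. */
typedef int (*unary_op)(int);

static int add_one(int x) { return x + 1; }
static int twice(int x) { return 2 * x; }

/* noinline prevents the compiler from resolving op at the call site. */
static __attribute__((noinline)) int apply(unary_op op, int x)
{
    return op(x);   /* indirect call */
}
```

Compiling `apply` with `-O2 -S` for aarch64 should show a `blr` through a register. Based on the description above, with this patch and `-mindirect-branch=thunk` that call would instead become a `bl` to the matching `__aarch64_indirect_thunk_xN`; this expectation is not verified here.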
2018-01-25 | Update ChangeLog and version files for release | rguenth
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@257041 138bc75d-0d04-0410-961f-82ee72b054a4
2018-01-25 | Daily bump. | gccadmin
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@257035 138bc75d-0d04-0410-961f-82ee72b054a4
2018-01-24 | Daily bump. | gccadmin
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@257007 138bc75d-0d04-0410-961f-82ee72b054a4
2018-01-23 | libgcc: xtensa: fix NaN return from add/sub/mul/div helpers | jcmvbkbc
libgcc/
2018-01-23  Max Filippov  <jcmvbkbc@gmail.com>
    Backport from mainline
    2018-01-23  Max Filippov  <jcmvbkbc@gmail.com>
    * config/xtensa/ieee754-df.S (__adddf3, __subdf3, __muldf3)
    (__divdf3): Make NaN return value quiet.
    * config/xtensa/ieee754-sf.S (__addsf3, __subsf3, __mulsf3)
    (__divsf3): Make NaN return value quiet.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@257003 138bc75d-0d04-0410-961f-82ee72b054a4
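For context on what "quiet" means here: in the IEEE 754 binary formats, a NaN has an all-ones exponent and a non-zero fraction. Under the IEEE 754-2008 recommended encoding, a NaN is quiet when the most significant fraction bit is set (bit 22 for single precision, bit 51 for double precision). The sketch below shows only the single-precision bit manipulation, with hypothetical helper names. It does not reproduce the Xtensa assembly.

```c
#include <stdbool.h>
#include <stdint.h>

/* binary32 layout: sign(1) | exponent(8) | fraction(23) */
#define F32_EXP_MASK  0x7F800000u
#define F32_FRAC_MASK 0x007FFFFFu
#define F32_QUIET_BIT 0x00400000u   /* most significant fraction bit */

static bool f32_is_nan(uint32_t bits)
{
    return (bits & F32_EXP_MASK) == F32_EXP_MASK && (bits & F32_FRAC_MASK) != 0;
}

static bool f32_is_quiet_nan(uint32_t bits)
{
    return f32_is_nan(bits) && (bits & F32_QUIET_BIT) != 0;
}

/* Return NaNs in quiet form; leave every other value unchanged. */
static uint32_t f32_quieten(uint32_t bits)
{
    return f32_is_nan(bits) ? (bits | F32_QUIET_BIT) : bits;
}
```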
2018-01-23 | PR rtl-optimization/81443 | ebotcazou
* rtlanal.c (num_sign_bit_copies1) <SUBREG>: Do not propagate results from inner REGs to paradoxical SUBREGs. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256998 138bc75d-0d04-0410-961f-82ee72b054a4
2018-01-23 | Daily bump. | gccadmin
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256969 138bc75d-0d04-0410-961f-82ee72b054a4
2018-01-22 | * es.po: Update. | jsm28
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256956 138bc75d-0d04-0410-961f-82ee72b054a4
2018-01-22 | Daily bump. | gccadmin
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256937 138bc75d-0d04-0410-961f-82ee72b054a4
2018-01-21 | [gcc] | wschmidt
2018-01-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
    Backport from mainline
    2018-01-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
                David Edelsohn  <dje.gcc@gmail.com>
    PR target/83946
    * config/rs6000/rs6000.md (*call_indirect_nonlocal_sysv<mode>):
    Change "crset eq" to "crset 2".
    (*call_value_indirect_nonlocal_sysv<mode>): Likewise.
    (*call_indirect_aix<mode>_nospec): Likewise.
    (*call_value_indirect_aix<mode>_nospec): Likewise.
    (*call_indirect_elfv2<mode>_nospec): Likewise.
    (*call_value_indirect_elfv2<mode>_nospec): Likewise.
    (*sibcall_nonlocal_sysv<mode>): Change "crset eq" to "crset 2";
    change assembly output from . to $.
    (*sibcall_value_nonlocal_sysv<mode>): Likewise.
    (indirect_jump<mode>_nospec): Change assembly output from . to $.
    (*tablejump<mode>_internal1_nospec): Likewise.
[gcc/testsuite]
2018-01-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
    Backport from mainline
    2018-01-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
                David Edelsohn  <dje.gcc@gmail.com>
    PR target/83946
    * gcc.target/powerpc/safe-indirect-jump-1.c: Change expected
    assembly output from "crset eq" to "crset 2".
    * gcc.target/powerpc/safe-indirect-jump-2.c: Change expected
    assembly output from . to $.
    * gcc.target/powerpc/safe-indirect-jump-3.c: Likewise.
    * gcc.target/powerpc/safe-indirect-jump-1.c: Change expected
    assembly output from "crset eq" to "crset 2".
    * gcc.target/powerpc/safe-indirect-jump-8.c: Change expected
    assembly output from "crset eq" to "crset 2", and from . to $.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256932 138bc75d-0d04-0410-961f-82ee72b054a4
2018-01-21 | Backport from mainline | olegendo
2018-01-21 Oleg Endo <olegendo@gcc.gnu.org> PR target/80870 * config/sh/sh_optimize_sett_clrt.cc: Use INCLUDE_ALGORITHM and INCLUDE_VECTOR instead of direct includes. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256928 138bc75d-0d04-0410-961f-82ee72b054a4
2018-01-21 | Daily bump. | gccadmin
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256923 138bc75d-0d04-0410-961f-82ee72b054a4
2018-01-20 | 2018-01-20 Steven G. Kargl <kargl@gcc.gnu.org> | kargl
PR fortran/83900 * simplify.c (gfc_simplify_matmul): Set return type correctly. 2018-01-20 Steven G. Kargl <kargl@gcc.gnu.org> PR fortran/83900 * gfortran.dg/matmul_18.f90: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256920 138bc75d-0d04-0410-961f-82ee72b054a4
2018-01-20 | 2018-01-19 Steven G. Kargl <kargl@gcc.gnu.org> | kargl
PR fortran/83900 * simplify.c (gfc_simplify_matmul): Delete bogus assertion. 2018-01-19 Steven G. Kargl <kargl@gcc.gnu.org> PR fortran/83900 * gfortran.dg/matmul_17.f90: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256913 138bc75d-0d04-0410-961f-82ee72b054a4
2018-01-20 | Daily bump. | gccadmin
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256910 138bc75d-0d04-0410-961f-82ee72b054a4
2018-01-19 | 2018-01-19 Steven G. Kargl <kargl@gcc.gnu.org> | kargl
Backport of r250734 from mainline PR fortran/80768 * check.c (gfc_check_num_images): Fix typo. 2018-01-19 Steven G. Kargl <kargl@gcc.gnu.org> PR fortran/80768 * gfortran.dg/num_images_1.f90: New test that tests fix in r250734. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256907 138bc75d-0d04-0410-961f-82ee72b054a4
2018-01-19 | PR libstdc++/83834 replace wildcard pattern in linker script | redi
Backport from mainline 2018-01-16 Jonathan Wakely <jwakely@redhat.com> PR libstdc++/83834 * config/abi/pre/gnu.ver (GLIBCXX_3.4): Replace std::c[a-g]* wildcard pattern with exact match for std::cerr. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256882 138bc75d-0d04-0410-961f-82ee72b054a4
2018-01-19 | Daily bump. | gccadmin
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256869 138bc75d-0d04-0410-961f-82ee72b054a4