Age | Commit message | Author |
|
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
|
|
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
|
|
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
|
|
This flag triggers more aggressive loop unrolling in tree-predcom.
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
|
|
This reverts commit 3df45c222875f9edb98342cf1fcdf92edb2850b2.
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
|
|
GCC 6+ detects more SLP instances compared to earlier versions.
Cost analysis is done for all instances together and not all
combinations. Additional instances make vectorization unprofitable;
fix it by ignoring all but the first instance when -fvectorize-more is given.
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
|
|
This reverts commit a3ba4a0362e6707ca4ac86ee8590faecc10ae5d4.
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
|
|
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
|
|
This patch adds support for Ampere Computing's eMAG processor.
Tested on aarch64 (no regressions seen).
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
|
|
Register renaming did not show improvements on SPEC CPU2006.
Therefore we drop the patch.
This reverts commit e8214e9512d568fa85cdd5660544649de75c8486.
|
|
The initialize loop optimizer patch is confirmed
not to be needed on GCC 7.3.0.
This reverts commit 8ab36d8103543b2572c34c8824c0c81ef26696a2.
|
|
This patch contains just whitespace changes.
Most of these changes were made to address issues
found by check_GNU_style.sh.
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
|
|
Merge the adrp/add instruction pair into a single adr instruction.
The actual retpoline instruction sequence, which prevents speculative
indirect branches, now looks like this:
    str x30, [sp, #-16]!
    bl 101f
100: //speculation trap
    wfe
    b 100b
101: //do ROP
    adr x30, 102f
    ret
102: //non-spec code
    ldr x30, [sp], #16
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
|
|
This reverts commit 20496f0e660347d8d2d885dff2702182853602bd.
The reason for the revert is that it causes a significant
slowdown on some benchmarks.
After reverting the commit, the testcase (pr82697.c) still passes.
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
|
|
Many GCC tests fail for AArch64 with current binutils because of
assembler warnings of the form "Warning: ignoring incorrect section
type for .init_array.00100". The same issue was fixed for ARM in
r247015 by using SECTION_NOTYPE when creating those sections; this
patch applies the same fix to AArch64.
Tested with no regressions with cross to aarch64-linux-gnu.
* config/aarch64/aarch64.c (aarch64_elf_asm_constructor)
(aarch64_elf_asm_destructor): Pass SECTION_NOTYPE to get_section
when creating .init_array and .fini_array sections with priority
specified.
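For context, sections such as .init_array.00100 are emitted for constructors
that are declared with an explicit priority. The following is a minimal sketch,
assuming a GCC-compatible compiler on an ELF target; the names ctor_order,
ctor_count, init_first and init_second are made up for illustration and do not
come from the patch:

```c
/* Records the order in which the constructors below ran.
   These names are hypothetical and only serve the illustration. */
int ctor_order[2];
int ctor_count;

/* Priority 201: runs after priority 200 (lower numbers run first).
   Each priority typically gets its own .init_array.NNNNN section. */
__attribute__((constructor(201)))
static void init_second(void)
{
  ctor_order[ctor_count++] = 201;
}

/* Priority 200: declared later in the file, but runs earlier. */
__attribute__((constructor(200)))
static void init_first(void)
{
  ctor_order[ctor_count++] = 200;
}
```

Both functions run before main, so by the time main starts the array holds the
priorities in execution order.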
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253252 138bc75d-0d04-0410-961f-82ee72b054a4
|
|
When an expression contains a common subexpression, GCC will always keep
it as a separate instruction instead of fusing it into each of the two
instructions that use it (even when this would have the lowest overall cost).
Consider the following code example (a multiply that could be synthesized
as (b + (b << 2)) << 2, i.e. 4 * (b + 4b)):
    signed long mult_20 ( signed int b )
    {
      return (signed long)b*20;
    }
Due to a common subexpression (i.e. the sign-extended argument b) this
will result in:
mult_20:
    sxtw x0, w0
    add x0, x0, x0, lsl 2
    lsl x0, x0, 2
    ret
At 4 cycles latency, this is in stark contrast with the optimal solution
(which takes just 2 cycles and fuses the sign-extension into two
dependent instructions):
mult_20:
    sbfiz x1, x0, 4, 32
    add x0, x1, x0, sxtw 2
    ret
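The equivalence of the two sequences can be checked with a small C model.
This is only a sketch: the helpers sxtw and lsl are hypothetical stand-ins for
the instruction semantics, and the shift goes through unsigned arithmetic to
avoid undefined behaviour on negative values (the conversion back to a signed
type assumes the usual two's-complement behaviour, as GCC provides).

```c
#include <stdint.h>

/* Models "sxtw": sign-extend a 32-bit value to 64 bits. */
static int64_t sxtw(int32_t w)
{
  return (int64_t)w;
}

/* Models "lsl": shift left by n bits, computed on the unsigned
   representation so that negative inputs are well defined. */
static int64_t lsl(int64_t x, unsigned n)
{
  return (int64_t)((uint64_t)x << n);
}

/* Three dependent instructions: sxtw; add x0, x0, x0, lsl 2; lsl x0, x0, 2 */
int64_t mult_20_separate(int32_t b)
{
  int64_t x0 = sxtw(b);      /* sign-extension kept separate   */
  x0 = x0 + lsl(x0, 2);      /* b + 4b = 5b                    */
  return lsl(x0, 2);         /* 4 * 5b = 20b                   */
}

/* Two instructions: sbfiz x1, x0, 4, 32; add x0, x1, x0, sxtw 2 */
int64_t mult_20_fused(int32_t b)
{
  int64_t x1 = lsl(sxtw(b), 4);   /* sbfiz: sign-extend and shift, 16b */
  return x1 + lsl(sxtw(b), 2);    /* add with sxtw #2: 16b + 4b = 20b  */
}
```

In the fused form the sign-extension is performed twice, once inside each
instruction, which is what removes the separate sxtw from the dependency chain.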
To resolve this, a separate un-combine pass has been proposed
(originally by Philipp Tomsich with additional improvements by Chris Nelson).
The algorithm for this pass is:
* for RTX expression A, identify all dependent RTX expressions B[0..n]
* for each combination A -> B[i], check whether the combined RTX
  expression is a valid instruction with the same cost as B[i], and,
  if so, perform the following RTX changes:
  * replace B[i] with a new RTX expression B'[i], which is the
    combination A -> B[i]
  * link each B'[i] to depend on the dependencies of A
  * replace B[i] in the dependency-list of its dependent operations
    with a dependency on B'[i] (i.e. unlink B[i])
  * remove B[i] (as this is dead code now), or leave it dangling for
    the DCE-pass to clean up behind us.
* if all combinations are valid, remove A (as this is dead code now),
  or leave it dangling for the DCE-pass to clean up behind us.
|
|
This pass can be activated with -flist-find-pipeline.
This patch optimizes a common linked list idiom (example from
Coremark's 'core_list_find') of the following form:
    while (list && (list->info->idx != info->idx))
      list=list->next;
    return list;
This idiom introduces a number of dependent-loads across the code path.
However, the dereference of list and the assignment of
list_{i+1} = list_{i}->next (i ... iteration)
only depends on the first condition (i.e. “list != NULL”) and can be
moved earlier.
The [list-find pass] is an experimental pass (to be generalised in a
next step) that provides a targeted implementation of a software pipeliner
for loops iterating over linked lists, hoisting the list = list->next
dereference (for the next iteration) above the comparison of the index field.
In SSA form, the loop should thus become (all conditions need to be
inverted):
    if (!list_{i})
      return NULL;   // forward-propagate from the
                     // if-condition
    list_{i+1} = list_{i}->next;
    if (list_{i}->info->idx == info->idx)
      return list_{i};
which should be unrolled at least once to allow using two distinct registers
for list_{i} and list_{i+1}, avoiding any additional register moves.
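The transformation can be illustrated at source level. The sketch below is a
hand-written approximation, not output of the pass: the struct layout is
modelled loosely on CoreMark and is an assumption, as are the function names
list_find_baseline and list_find_pipelined.

```c
#include <stddef.h>

/* Hypothetical types, loosely following CoreMark's list layout. */
typedef struct list_data {
  short data16;
  short idx;
} list_data;

typedef struct list_head {
  struct list_head *next;
  list_data *info;
} list_head;

/* The original idiom: the load of list->next only starts after the
   index comparison of the current node has been resolved. */
list_head *list_find_baseline(list_head *list, const list_data *info)
{
  while (list && (list->info->idx != info->idx))
    list = list->next;
  return list;
}

/* Software-pipelined form: the load of the next pointer depends only on
   list != NULL, so it is issued before the index comparison. */
list_head *list_find_pipelined(list_head *list, const list_data *info)
{
  while (list) {
    list_head *next = list->next;        /* hoisted load for iteration i+1 */
    if (list->info->idx == info->idx)
      return list;                       /* found in iteration i */
    list = next;
  }
  return NULL;
}
```

Both functions return the same node for any input; the second only changes
where the next-pointer load sits relative to the comparison.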
|
|
needs own life-array for every subtree
|
|
GCC 6+ detects more SLP instances compared to earlier versions.
Cost analysis is done for all instances together and not all
combinations. Additional instances make vectorization unprofitable;
fix it by ignoring all but the first instance when -fnoalias is given.
|
|
This allows optimization by assuring the compiler that
pointers within the given function don't alias.
The optimizations within this patch are:
* pass_uninline_innermost_loops (new lowering pass)
  Uninline the innermost loop if noalias is enabled, by moving it
  into a separate function.
* pass_peel_last_iteration (new optimization pass)
  Peel the last loop iteration if noalias is enabled.
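As a rough source-level analogue of the uninlining step, the sketch below moves
an innermost loop into its own function and marks its pointer parameters with
the standard C restrict qualifier. This is an assumption-laden illustration:
the patch operates inside the compiler and is not described as using restrict,
and the names saxpy_inner and saxpy_rows are invented here.

```c
#include <stddef.h>

/* Innermost loop in a separate function. The restrict qualifiers promise
   that dst and src do not overlap, similar in spirit to a no-alias
   assurance for this function. */
static void saxpy_inner(float *restrict dst, const float *restrict src,
                        float a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] += a * src[i];
}

/* Outer loop: calls the uninlined inner loop once per row. */
void saxpy_rows(float *dst, const float *src, float a,
                size_t rows, size_t cols)
{
  for (size_t r = 0; r < rows; r++)
    saxpy_inner(dst + r * cols, src + r * cols, a, cols);
}
```

The caller is responsible for keeping the promise: passing overlapping buffers
to saxpy_rows would make the behaviour undefined.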
|
|
* use generic_[addrcost/regmove]_cost
* use generic_approx_mode
* decrease memmove_cost to 4 (was 6)
* decrease issues_rate to 2 (was 4)
This should actually be 4 and we should use TARGET_SCHED_VARIABLE_ISSUE()
* relax function alignment to 8 (was 16)
* relax loop alignment to 4 (was 16)
* set max_case_values (switch-case) to 17 (was 0)
|
|
This patch updates the X-Gene scheduling model for GCC 7:
* Improve scheduling by modeling IXB-dependency of shift/rotate.
* Adjust latencies of branches/calls.
* Remove pattern for prefetch as there is no type for it.
* Added bypasses for alu and alus.
* Remove the store reservation to improve performance.
* Use multiple automatons.
* Define ALUs latency to 1 with bypass for conditional ops.
* Add IXB completion unit.
* Add bfx-pattern to scheduling model.
* Reduce latency on store-ops.
|
|
Bypass the table-based cost model with a procedural one to more closely
model the X-Gene microarchitecture.
|
|
With the unguarded, HImode/QImode-optimized and-pattern, an additional
guard for the immediate bitmask is required to exclude cases where an
inverted bitmask is used (i.e. all-ones in those bits that should be
zero-extended).
|
|
HImode and QImode operands can be handled in a more optimal way for
logical AND than for logical OR operations. An AND will never set
bits that are not already set in its operands, so the resulting
mode/precision depends on the least precision of its operands with
an implicit zero-extension to any larger precision.
These patterns help to avoid unnecessary zero-extension operations
on benchmarks, including some SPEC workloads.
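The property relied on here can be shown with a short C sketch (the function
names and_narrow and or_narrow are illustrative only): an AND with an 8-bit
operand can never produce bits above bit 7, whereas an OR can.

```c
#include <stdint.h>

/* AND with an 8-bit operand: every result bit above bit 7 is zero,
   so the result is already zero-extended. */
uint32_t and_narrow(uint8_t a, uint32_t b)
{
  return a & b;
}

/* OR gives no such guarantee: high bits of b pass through unchanged. */
uint32_t or_narrow(uint8_t a, uint32_t b)
{
  return a | b;
}
```

For the AND case a following zero-extension from 8 bits would therefore be a
no-op, which is the redundancy the patterns are meant to avoid.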
|
|
The '*tb<optab><mode>1' pattern can safely be extended to match operands
of any size, as long as the immediate operand (i.e. the bits tested)
matches the size of the register operand.
This removes unnecessary zero-extension operations from the
generated instruction stream.
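A small C sketch of why no zero-extension is needed: when the tested bit lies
within the width of the narrow operand, the outcome does not depend on whatever
the upper bits of the containing register hold. The function names below are
hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Tests bit 3 of an 8-bit value. */
bool bit3_set_narrow(uint8_t x)
{
  return (x & (1u << 3)) != 0;
}

/* Tests bit 3 of a full 64-bit register image whose upper bits may
   contain arbitrary data. */
bool bit3_set_wide(uint64_t reg)
{
  return (reg & (1u << 3)) != 0;
}
```

For any register image, testing the truncated value and testing the full
register give the same answer, so a bit test can read the register directly.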
|
|
The compiler option -mindirect-branch=<value> converts indirect
branch-and-link-register and branch-register instructions according to <value>.
The default is ``keep``, which keeps indirect branch-and-link-register and
branch-register instructions unmodified.
``thunk`` converts indirect branch-and-link-register/branch-register
instructions to a branch-and-link/branch to a function containing a retpoline
(to stop speculative execution) followed by a branch-register to the target.
``thunk-inline`` is similar to ``thunk``, but inlines the retpoline
before the branch-and-link-register/branch-register instruction.
``thunk-extern`` is also similar to ``thunk``, but does not insert the
functions containing the retpoline. When using this option, these functions
need to be provided in a separate object file. The retpoline functions exist
for each register and are named ``__aarch64_indirect_thunk_xN`` (N being the
register number).
It is also possible to override the indirect-branch setting for
individual functions using the function attribute ``indirect_branch``.
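A minimal sketch of code affected by this option is a call through a function
pointer. The attribute argument syntax shown (a string naming one of the values
above) is an assumption, and the attribute is guarded so that the example also
builds with compilers that lack this patch; KEEP_INDIRECT, twice and dispatch
are names invented for the illustration.

```c
/* Assumption: the attribute accepts the same value names as the option.
   The guard makes the macro expand to nothing when the attribute is not
   available on an AArch64 compiler, or on any other target. */
#if defined(__aarch64__) && defined(__has_attribute)
# if __has_attribute(indirect_branch)
#  define KEEP_INDIRECT __attribute__((indirect_branch("keep")))
# endif
#endif
#ifndef KEEP_INDIRECT
# define KEEP_INDIRECT
#endif

typedef int (*handler_fn)(int);

/* An ordinary function used as the indirect-call target. */
int twice(int x)
{
  return 2 * x;
}

/* The call through fn is an indirect branch-and-link-register on AArch64.
   With the attribute in effect, this one function would keep the plain
   indirect call even in a build that converts the others. */
KEEP_INDIRECT
int dispatch(handler_fn fn, int arg)
{
  return fn(arg);
}
```

A per-function override like this would typically be reserved for hot paths
where the cost of the retpoline is not acceptable.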
The actual retpoline instruction sequence, which prevents speculative
indirect branches, looks like this::
    str x30, [sp, #-16]!
    bl 101f
100: //speculation trap
    wfe
    b 100b
101: //do ROP
    adrp x30, 102f
    add x30, x30, :lo12:102f
    ret
102: //non-spec code
    ldr x30, [sp], #16
This patch has been tested with the included testcases and various other
source bases (benchmarks, retpoline-patched arm64 kernel, etc.).
Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
|
|
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@257041 138bc75d-0d04-0410-961f-82ee72b054a4
|
|
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@257035 138bc75d-0d04-0410-961f-82ee72b054a4
|
|
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@257007 138bc75d-0d04-0410-961f-82ee72b054a4
|
|
libgcc/
2018-01-23 Max Filippov <jcmvbkbc@gmail.com>
Backport from mainline
2018-01-23 Max Filippov <jcmvbkbc@gmail.com>
* config/xtensa/ieee754-df.S (__addsf3, __subsf3, __mulsf3)
(__divsf3): Make NaN return value quiet.
* config/xtensa/ieee754-sf.S (__adddf3, __subdf3, __muldf3)
(__divdf3): Make NaN return value quiet.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@257003 138bc75d-0d04-0410-961f-82ee72b054a4
|
|
* rtlanal.c (num_sign_bit_copies1) <SUBREG>: Do not propagate results
from inner REGs to paradoxical SUBREGs.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256998 138bc75d-0d04-0410-961f-82ee72b054a4
|
|
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256969 138bc75d-0d04-0410-961f-82ee72b054a4
|
|
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256956 138bc75d-0d04-0410-961f-82ee72b054a4
|
|
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256937 138bc75d-0d04-0410-961f-82ee72b054a4
|
|
2018-01-21 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
Backport from mainline
2018-01-21 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
David Edelsohn <dje.gcc@gmail.com>
PR target/83946
* config/rs6000/rs6000.md (*call_indirect_nonlocal_sysv<mode>):
Change "crset eq" to "crset 2".
(*call_value_indirect_nonlocal_sysv<mode>): Likewise.
(*call_indirect_aix<mode>_nospec): Likewise.
(*call_value_indirect_aix<mode>_nospec): Likewise.
(*call_indirect_elfv2<mode>_nospec): Likewise.
(*call_value_indirect_elfv2<mode>_nospec): Likewise.
(*sibcall_nonlocal_sysv<mode>): Change "crset eq" to "crset 2";
change assembly output from . to $.
(*sibcall_value_nonlocal_sysv<mode>): Likewise.
(indirect_jump<mode>_nospec): Change assembly output from . to $.
(*tablejump<mode>_internal1_nospec): Likewise.
[gcc/testsuite]
2018-01-21 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
Backport from mainline
2018-01-21 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
David Edelsohn <dje.gcc@gmail.com>
PR target/83946
* gcc.target/powerpc/safe-indirect-jump-1.c: Change expected
assembly output from "crset eq" to "crset 2".
* gcc.target/powerpc/safe-indirect-jump-2.c: Change expected
assembly output from . to $.
* gcc.target/powerpc/safe-indirect-jump-3.c: Likewise.
* gcc.target/powerpc/safe-indirect-jump-1.c: Change expected
assembly output from "crset eq" to "crset 2".
* gcc.target/powerpc/safe-indirect-jump-8.c: Change expected
assembly output from "crset eq" to "crset 2", and from . to $.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256932 138bc75d-0d04-0410-961f-82ee72b054a4
|
|
2018-01-21 Oleg Endo <olegendo@gcc.gnu.org>
PR target/80870
* config/sh/sh_optimize_sett_clrt.cc:
Use INCLUDE_ALGORITHM and INCLUDE_VECTOR instead of direct includes.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256928 138bc75d-0d04-0410-961f-82ee72b054a4
|
|
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256923 138bc75d-0d04-0410-961f-82ee72b054a4
|
|
PR fortran/83900
* simplify.c (gfc_simplify_matmul): Set return type correctly.
2018-01-20 Steven G. Kargl <kargl@gcc.gnu.org>
PR fortran/83900
* gfortran.dg/matmul_18.f90: New test.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256920 138bc75d-0d04-0410-961f-82ee72b054a4
|
|
PR fortran/83900
* simplify.c (gfc_simplify_matmul): Delete bogus assertion.
2018-01-19 Steven G. Kargl <kargl@gcc.gnu.org>
PR fortran/83900
* gfortran.dg/matmul_17.f90: New test.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256913 138bc75d-0d04-0410-961f-82ee72b054a4
|
|
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256910 138bc75d-0d04-0410-961f-82ee72b054a4
|
|
Backport of r250734 from mainline
PR fortran/80768
* check.c (gfc_check_num_images): Fix typo.
2018-01-19 Steven G. Kargl <kargl@gcc.gnu.org>
PR fortran/80768
* gfortran.dg/num_images_1.f90: New test that tests fix in r250734.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256907 138bc75d-0d04-0410-961f-82ee72b054a4
|
|
Backport from mainline
2018-01-16 Jonathan Wakely <jwakely@redhat.com>
PR libstdc++/83834
* config/abi/pre/gnu.ver (GLIBCXX_3.4): Replace std::c[a-g]* wildcard
pattern with exact match for std::cerr.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256882 138bc75d-0d04-0410-961f-82ee72b054a4
|
|
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-7-branch@256869 138bc75d-0d04-0410-961f-82ee72b054a4
|