Age | Commit message (Collapse) | Author |
|
This delays the SLP permutation check to vectorizable_load and optimizes
permutations only after all SLP instances have been generated and the
vectorization factor is determined.
2020-05-08 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vec_info::slp_loads): New.
(vect_optimize_slp): Declare.
* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Do
nothing when there are no loads.
(vect_gather_slp_loads): Gather loads into a vector.
(vect_supported_load_permutation_p): Remove.
(vect_analyze_slp_instance): Do not verify permutation
validity here.
(vect_analyze_slp): Optimize permutations of reductions
after all SLP instances have been gathered and gather
all loads.
(vect_optimize_slp): New function split out from
vect_supported_load_permutation_p. Elide some permutations.
(vect_slp_analyze_bb_1): Call vect_optimize_slp.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
* tree-vect-stmts.c (vectorizable_load): Check whether
the load can be permuted. When generating code assert we can.
* gcc.dg/vect/bb-slp-pr68892.c: Adjust for not supported
SLP permutations becoming builds from scalars.
* gcc.dg/vect/bb-slp-pr78205.c: Likewise.
* gcc.dg/vect/bb-slp-34.c: Likewise.
|
|
This removes trivial instances of SLP_INSTANCE_GROUP_SIZE and refrains
from using a "SLP instance" which nowadays is just one of the possibly
many entries into the SLP graph.
2020-05-06 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_transform_slp_perm_load): Adjust.
* tree-vect-data-refs.c (vect_slp_analyze_node_dependences):
Remove slp_instance parameter, just iterate over all scalar stmts.
(vect_slp_analyze_instance_dependence): Adjust and likewise.
* tree-vect-slp.c (vect_bb_slp_scalar_cost): Remove unused BB
parameter.
(vect_schedule_slp): Just iterate over all scalar stmts.
(vect_supported_load_permutation_p): Adjust.
(vect_transform_slp_perm_load): Remove slp_instance parameter,
instead use the number of lanes in the node as group size.
* tree-vect-stmts.c (vect_model_load_cost): Get vectorization
factor instead of slp_instance as parameter.
(vectorizable_load): Adjust.
|
|
This rewrites hybrid SLP detection to be simpler and cope with
group size changes in the SLP graph. In particular detection
works starting from non-SLP stmts following use->def chains
rather than walking the SLP graph and following def->use chains.
It's all temporary of course since non-SLP and thus hybrid SLP
will go away.
2020-05-05 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (struct vdhs_data): New.
(vect_detect_hybrid_slp): New walker.
(vect_detect_hybrid_slp): Rewrite.
|
|
Soonish we'll get SLP nodes which have no corresponding scalar
stmt and thus not stmt_vec_info and thus no way to get back to
the associated vec_info. This patch makes the vec_info available
as part of the APIs instead of putting in that back-pointer into
the leaf data structures.
2020-05-05 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_stmt_vec_info::vinfo): Remove.
(STMT_VINFO_LOOP_VINFO): Likewise.
(STMT_VINFO_BB_VINFO): Likewise.
* tree-vect-data-refs.c: Adjust for the above, adding vec_info *
parameters and adjusting calls.
* tree-vect-loop-manip.c: Likewise.
* tree-vect-loop.c: Likewise.
* tree-vect-patterns.c: Likewise.
* tree-vect-slp.c: Likewise.
* tree-vect-stmts.c: Likewise.
* tree-vectorizer.c: Likewise.
* target.def (add_stmt_cost): Add vec_info * parameter.
* target.h (stmt_in_inner_loop_p): Likewise.
* targhooks.c (default_add_stmt_cost): Adjust.
* doc/tm.texi: Re-generate.
* config/aarch64/aarch64.c (aarch64_extending_load_p): Add
vec_info * parameter and adjust.
(aarch64_sve_adjust_stmt_cost): Likewise.
(aarch64_add_stmt_cost): Likewise.
* config/arm/arm.c (arm_add_stmt_cost): Likewise.
* config/i386/i386.c (ix86_add_stmt_cost): Likewise.
* config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
|
|
The remaining IL adjustment done by SLP analysis turns out harmful
since we share them in the now multiple analyses states. It turns
out we do not actually need those apart from the case where we
reorg scalar stmts during re-arrangement when optimizing load
permutations in SLP reductions. But that isn't needed either now
since we only need to permute non-isomorphic parts which now
reside in separate SLP nodes who are all leafs.
2020-03-23 Richard Biener <rguenther@suse.de>
PR tree-optimization/94261
* tree-vect-slp.c (vect_get_and_check_slp_defs): Remove
IL operand swapping code.
(vect_slp_rearrange_stmts): Do not arrange isomorphic
nodes that would need operation code adjustments.
|
|
This also dumps the root node we eventually smuggle in.
2020-03-20 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_analyze_slp_instance): Dump SLP tree
from the possibly modified root.
|
|
We failed to handle pattern stmts appropriately.
2020-03-20 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_analyze_slp_instance): Push the stmts
to vectorize for CTOR defs.
|
|
When possibly expanding a hash-map avoid keeping a reference to an
entry.
2020-02-27 Richard Biener <rguenther@suse.de>
PR tree-optimization/93953
* tree-vect-slp.c (slp_copy_subtree): Avoid keeping a reference
to the hash-map entry.
* gcc.dg/pr93953.c: New testcase.
|
|
This adjusts dumping as proved useful in debugging.
2020-02-26 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_print_slp_tree): Also dump ref count
and load permutation.
|
|
This avoids altering possibly shared SLP subtrees when attempting
to get rid of permutations in SLP reductions by copying the SLP
subtree before re-arranging stmts in it.
2020-02-25 Richard Biener <rguenther@suse.de>
PR tree-optimization/93868
* tree-vect-slp.c (slp_copy_subtree): New function.
(vect_attempt_slp_rearrange_stmts): Copy the SLP tree before
re-arranging stmts in it.
* gcc.dg/torture/pr93868.c: New testcase.
|
|
With SLP now being a graph with shared nodes across instances we have
to make sure to compute the load permutation of nodes once, not
overwriting the result of earlier analysis.
2020-01-28 Richard Biener <rguenther@suse.de>
PR tree-optimization/93428
* tree-vect-slp.c (vect_build_slp_tree_2): Compute the load
permutation when the load node is created.
(vect_analyze_slp_instance): Re-use it here.
* gcc.dg/torture/pr93428.c: New testcase.
|
|
The following delays adjusting the SLP graph for converted reduction
chains to a point where the SLP build no longer can fail since we
otherwise fail to undo marking the conversion as a group.
2020-01-27 Richard Biener <rguenther@suse.de>
PR tree-optimization/93397
* tree-vect-slp.c (vect_analyze_slp_instance): Delay
converted reduction chain SLP graph adjustment.
* gcc.dg/torture/pr93397.c: New testcase.
|
|
Having the "same" vector types with different modes means that we can
end up vectorising a constructor with a different mode from the lhs.
This patch adds a VIEW_CONVERT_EXPR in that case.
This showed up on existing tests when testing with fixed-length
-msve-vector-bits=128.
2020-01-15 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-slp.c (vectorize_slp_instance_root_stmt): Use a
VIEW_CONVERT_EXPR if the vectorized constructor has a diffeent
type from the lhs.
|
|
gcc/cp/ChangeLog:
* cp-gimplify.c (source_location_table_entry_hash::empty_zero_p):
New static constant.
* cp-tree.h (named_decl_hash::empty_zero_p): Likewise.
(struct named_label_hash::empty_zero_p): Likewise.
* decl2.c (mangled_decl_hash::empty_zero_p): Likewise.
gcc/ChangeLog:
* attribs.c (excl_hash_traits::empty_zero_p): New static constant.
* gcov.c (function_start_pair_hash::empty_zero_p): Likewise.
* graphite.c (struct sese_scev_hash::empty_zero_p): Likewise.
* hash-map-tests.c (selftest::test_nonzero_empty_key): New selftest.
(selftest::hash_map_tests_c_tests): Call it.
* hash-map-traits.h (simple_hashmap_traits::empty_zero_p):
New static constant, using the value of = H::empty_zero_p.
(unbounded_hashmap_traits::empty_zero_p): Likewise, using the value
from default_hash_traits <Value>.
* hash-map.h (hash_map::empty_zero_p): Likewise, using the value
from Traits.
* hash-set-tests.c (value_hash_traits::empty_zero_p): Likewise.
* hash-table.h (hash_table::alloc_entries): Guard the loop of
calls to mark_empty with !Descriptor::empty_zero_p.
(hash_table::empty_slow): Conditionalize the memset call with a
check that Descriptor::empty_zero_p; otherwise, loop through the
entries calling mark_empty on them.
* hash-traits.h (int_hash::empty_zero_p): New static constant.
(pointer_hash::empty_zero_p): Likewise.
(pair_hash::empty_zero_p): Likewise.
* ipa-devirt.c (default_hash_traits <type_pair>::empty_zero_p):
Likewise.
* ipa-prop.c (ipa_bit_ggc_hash_traits::empty_zero_p): Likewise.
(ipa_vr_ggc_hash_traits::empty_zero_p): Likewise.
* profile.c (location_triplet_hash::empty_zero_p): Likewise.
* sanopt.c (sanopt_tree_triplet_hash::empty_zero_p): Likewise.
(sanopt_tree_couple_hash::empty_zero_p): Likewise.
* tree-hasher.h (int_tree_hasher::empty_zero_p): Likewise.
* tree-ssa-sccvn.c (vn_ssa_aux_hasher::empty_zero_p): Likewise.
* tree-vect-slp.c (bst_traits::empty_zero_p): Likewise.
* tree-vectorizer.h
(default_hash_traits<scalar_cond_masked_key>::empty_zero_p):
Likewise.
|
|
IFN_DIV_POW2 currently requires all elements to be shifted by the
same amount, in a similar way as for WIDEN_LSHIFT_EXPR. This patch
enforces that when building the SLP tree.
If in future targets want to support IFN_DIV_POW2 without this
restriction, we'll probably need the kind of vector-vector/
vector-scalar split that we already have for normal shifts.
2020-01-06 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-slp.c (vect_build_slp_tree_1): Require all shifts
in an IFN_DIV_POW2 node to be equal.
gcc/testsuite/
* gcc.target/aarch64/sve/asrdiv_1.c: Remove trailing %s.
* gcc.target/aarch64/sve/asrdiv_2.c: New test.
* gcc.target/aarch64/sve/asrdiv_3.c: Likewise.
From-SVN: r279908
|
|
From-SVN: r279813
|
|
2019-11-29 Richard Biener <rguenther@suse.de>
PR tree-optimization/91003
* tree-vect-slp.c (vect_mask_constant_operand_p): Pass in the
operand number, avoid handling the non-condition operands of
COND_EXPRs as comparisons.
(vect_get_constant_vectors): Pass down the operand number.
(vect_get_slp_defs): Likewise.
* gfortran.dg/pr91003.f90: New testcase.
From-SVN: r278860
|
|
Now that stmt_vec_info records the choice between vector mask
types and normal nonmask types, we can use that information in
vect_get_vector_types_for_stmt instead of deferring the choice
of vector type till later.
vect_get_mask_type_for_stmt used to check whether the boolean inputs
to an operation:
(a) consistently used mask types or consistently used nonmask types; and
(b) agreed on the number of elements.
(b) shouldn't be a problem when (a) is met. If the operation
consistently uses mask types, tree-vect-patterns.c will have corrected
any mismatches in mask precision. (This is because we only use mask
types for a small well-known set of operations and tree-vect-patterns.c
knows how to handle any that could have different mask precisions.)
And if the operation consistently uses normal nonmask types, there's
no reason why booleans should need extra vector compatibility checks
compared to ordinary integers.
So the potential difficulties all seem to come from (a). Now that
we've chosen the result type ahead of time, we also have to consider
whether the outputs and inputs consistently use mask types.
Taking each vectorizable_* routine in turn:
- vectorizable_call
vect_get_vector_types_for_stmt only handled booleans specially
for gassigns, so vect_get_mask_type_for_stmt never had chance to
handle calls. I'm not sure we support any calls that operate on
booleans, but as things stand, a boolean result would always have
a nonmask type. Presumably any vector argument would also need to
use nonmask types, unless it corresponds to internal_fn_mask_index
(which is already a special case).
For safety, I've added a check for mask/nonmask combinations here
even though we didn't check this previously.
- vectorizable_simd_clone_call
Again, vect_get_mask_type_for_stmt never had chance to handle calls.
The result of the call will always be a nonmask type and the patch
for PR 92710 rejects mask arguments. So all booleans should
consistently use nonmask types here.
- vectorizable_conversion
The function already rejects any conversion between booleans in which
one type isn't a mask type.
- vectorizable_operation
This function definitely needs a consistency check, e.g. to handle
& and | in which one operand is loaded from memory and the other is
a comparison result. Ideally we'd handle this via pattern stmts
instead (like we do for the all-mask case), but that's future work.
- vectorizable_assignment
VECT_SCALAR_BOOLEAN_TYPE_P requires single-bit precision, so the
current code already rejects problematic cases.
- vectorizable_load
Loads always produce nonmask types and there are no relevant inputs
to check against.
- vectorizable_store
vect_check_store_rhs already rejects mask/nonmask combinations
via useless_type_conversion_p.
- vectorizable_reduction
- vectorizable_lc_phi
PHIs always have nonmask types. After the change above, attempts
to combine the PHI result with a mask type would be rejected by
vectorizable_operation. (Again, it would be better to handle
this using pattern stmts.)
- vectorizable_induction
We don't generate inductions for booleans.
- vectorizable_shift
The function already rejects boolean shifts via type_has_mode_precision_p.
- vectorizable_condition
The function already rejects mismatches via useless_type_conversion_p.
- vectorizable_comparison
The function already rejects comparisons between mask and nonmask types.
The result is always a mask type.
2019-11-29 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/92596
* tree-vect-stmts.c (vectorizable_call): Punt on hybrid mask/nonmask
operations.
(vectorizable_operation): Likewise, instead of relying on
vect_get_mask_type_for_stmt to do this.
(vect_get_vector_types_for_stmt): Always return a vector type
immediately, rather than deferring the choice for boolean results.
Use a vector mask type instead of a normal vector if
vect_use_mask_type_p.
(vect_get_mask_type_for_stmt): Delete.
* tree-vect-loop.c (vect_determine_vf_for_stmt_1): Remove
mask_producers argument and special boolean_type_node handling.
(vect_determine_vf_for_stmt): Remove mask_producers argument and
update calls to vect_determine_vf_for_stmt_1. Remove doubled call.
(vect_determine_vectorization_factor): Update call accordingly.
* tree-vect-slp.c (vect_build_slp_tree_1): Remove special
boolean_type_node handling.
(vect_slp_analyze_node_operations_1): Likewise.
gcc/testsuite/
PR tree-optimization/92596
* gcc.dg/vect/bb-slp-pr92596.c: New test.
* gcc.dg/vect/bb-slp-43.c: Likewise.
From-SVN: r278851
|
|
This patch makes vect_get_mask_type_for_stmt and
get_mask_type_for_scalar_type take a group size instead of
the SLP node, so that later patches can call it before an
SLP node has been built.
2019-11-29 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (get_mask_type_for_scalar_type): Replace
the slp_tree parameter with a group size parameter.
(vect_get_mask_type_for_stmt): Likewise.
* tree-vect-stmts.c (get_mask_type_for_scalar_type): Likewise.
(vect_get_mask_type_for_stmt): Likewise.
* tree-vect-slp.c (vect_slp_analyze_node_operations_1): Update
call accordingly.
From-SVN: r278849
|
|
when compiled with GCC compared to Clang)
2019-11-26 Richard Biener <rguenther@suse.de>
PR tree-optimization/92645
* tree-vect-slp.c (vect_build_slp_tree_2): For unary ops
do not build the operation from scalars if the operand is.
* gcc.target/i386/pr92645.c: New testcase.
From-SVN: r278719
|
|
2019-11-25 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_detect_hybrid_slp_stmts): Add assertion.
(vect_detect_hybrid_slp): Swap lane and instance iteration,
properly re-building the visited hash-map for each lane.
From-SVN: r278679
|
|
r278406)
2019-11-21 Richard Biener <rguenther@suse.de>
PR tree-optimization/92596
* tree-vect-slp.c (vect_build_slp_tree): Fix pasto.
* gcc.dg/torture/pr92596-1.c: New testcase.
From-SVN: r278555
|
|
2019-11-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-slp.c (vect_schedule_slp_instance): Restore stmt
def types for two-operation SLP.
From-SVN: r278533
|
|
actually analyzing.
2019-11-20 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_analyze_slp_instance): Dump
constructors we are actually analyzing.
(vect_slp_check_for_constructors): Do not vectorize uniform
constuctors, do not dump here.
* gcc.dg/vect/bb-slp-42.c: Adjust.
* gcc.dg/vect/bb-slp-40.c: Likewise.
From-SVN: r278495
|
|
tree-vect-slp.c:2775)
2019-11-20 Richard Biener <rguenther@suse.de>
PR tree-optimization/92537
* tree-vect-slp.c (vect_analyze_slp_instance): Move CTOR
vectorization validity check...
(vect_slp_analyze_operations): ... here.
* gfortran.dg/pr92537.f90: New testcase.
From-SVN: r278494
|
|
tree-vect-slp.c:4095 since r278246)
2019-11-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/92516
* tree-vect-slp.c (vect_analyze_slp_instance): Add bst_map
argument, hoist bst_map creation/destruction to ...
(vect_analyze_slp): ... here, forming a true graph with
SLP instances being the entries.
(vect_detect_hybrid_slp_stmts): Remove wrapper.
(vect_detect_hybrid_slp): Use one visited set for all
graph entries.
(vect_slp_analyze_node_operations): Simplify visited/lvisited
to hash-sets of slp_tree.
(vect_slp_analyze_operations): Likewise.
(vect_bb_slp_scalar_cost): Remove wrapper.
(vect_bb_vectorization_profitable_p): Use one visited set for
all graph entries.
(vect_schedule_slp_instance): Elide bst_map use.
(vect_schedule_slp): Likewise.
* g++.dg/vect/slp-pr92516.cc: New testcase.
2019-11-18 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_analyze_slp_instance): When a CTOR
was vectorized with just external refs fail.
* gcc.dg/vect/vect-ctor-1.c: New testcase.
From-SVN: r278406
|
|
This patch makes can_duplicate_and_interleave_p cope with mixtures of
vector sizes, by using queries based on get_vectype_for_scalar_type
instead of directly querying GET_MODE_SIZE (vinfo->vector_mode).
int_mode_for_size is now the first check we do for a candidate mode,
so it seemed better to restrict it to MAX_FIXED_MODE_SIZE. This avoids
unnecessary work and avoids trying to create scalar types that the
target might not support.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (can_duplicate_and_interleave_p): Take an
element type rather than an element mode.
* tree-vect-slp.c (can_duplicate_and_interleave_p): Likewise.
Use get_vectype_for_scalar_type to query the natural types
for a given element type rather than basing everything on
GET_MODE_SIZE (vinfo->vector_mode). Limit int_mode_for_size
query to MAX_FIXED_MODE_SIZE.
(duplicate_and_interleave): Update call accordingly.
* tree-vect-loop.c (vectorizable_reduction): Likewise.
From-SVN: r278335
|
|
The BB vectoriser picked vector types in the same way as the loop
vectoriser: it picked a vector mode/size for the region and then
based all the vector types off that choice. This meant we could
end up trying to use vector types that had too many elements for
the group size.
The main part of this patch is therefore about passing the SLP
group size down to routines like get_vectype_for_scalar_type and
ensuring that each vector type in the SLP tree is chosen wrt the
group size. That part in itself is pretty easy and mechanical.
The main warts are:
(1) We normally pick a STMT_VINFO_VECTYPE for data references at an
early stage (vect_analyze_data_refs). However, nothing in the
BB vectoriser relied on this, or on the min_vf calculated from it.
I couldn't see anything other than vect_recog_bool_pattern that
tried to access the vector type before the SLP tree is built.
(2) It's possible for the same statement to be used in groups of
different sizes. Taking the group size into account meant that
we could try to pick different vector types for the same statement.
This problem should go away with the move to doing everything on
SLP trees, where presumably we would attach the vector type to the
SLP node rather than the stmt_vec_info. Until then, the patch just
uses a first-come, first-served approach.
(3) A similar problem exists for grouped data references, where
different statements in the same dataref group could be used
in SLP nodes that have different group sizes. The patch copes
with that by making sure that all vector types in a dataref
group remain consistent.
The patch means that:
void
f (int *x, short *y)
{
x[0] += y[0];
x[1] += y[1];
x[2] += y[2];
x[3] += y[3];
}
now produces:
ldr q0, [x0]
ldr d1, [x1]
saddw v0.4s, v0.4s, v1.4h
str q0, [x0]
ret
instead of:
ldrsh w2, [x1]
ldrsh w3, [x1, 2]
fmov s0, w2
ldrsh w2, [x1, 4]
ldrsh w1, [x1, 6]
ins v0.s[1], w3
ldr q1, [x0]
ins v0.s[2], w2
ins v0.s[3], w1
add v0.4s, v0.4s, v1.4s
str q0, [x0]
ret
Unfortunately it also means we start to vectorise
gcc.target/i386/pr84101.c for -m32. That seems like a target
cost issue though; see PR92265 for details.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (vect_get_vector_types_for_stmt): Take an
optional maximum nunits.
(get_vectype_for_scalar_type): Likewise. Also declare a form that
takes an slp_tree.
(get_mask_type_for_scalar_type): Take an optional slp_tree.
(vect_get_mask_type_for_stmt): Likewise.
* tree-vect-data-refs.c (vect_analyze_data_refs): Don't store
the vector type in STMT_VINFO_VECTYPE for BB vectorization.
* tree-vect-patterns.c (vect_recog_bool_pattern): Use
vect_get_vector_types_for_stmt instead of STMT_VINFO_VECTYPE
to get an assumed vector type for data references.
* tree-vect-slp.c (vect_update_shared_vectype): New function.
(vect_update_all_shared_vectypes): Likewise.
(vect_build_slp_tree_1): Pass the group size to
vect_get_vector_types_for_stmt. Use vect_update_shared_vectype
for BB vectorization.
(vect_build_slp_tree_2): Call vect_update_all_shared_vectypes
before building the vectof from scalars.
(vect_analyze_slp_instance): Pass the group size to
get_vectype_for_scalar_type.
(vect_slp_analyze_node_operations_1): Don't recompute the vector
types for BB vectorization here; just handle the case in which
we deferred the choice for booleans.
(vect_get_constant_vectors): Pass the slp_tree to
get_vectype_for_scalar_type.
* tree-vect-stmts.c (vect_prologue_cost_for_slp_op): Likewise.
(vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_comparison): Likewise.
(vect_is_simple_cond): Take the slp_tree as argument and
pass it to get_vectype_for_scalar_type.
(vectorizable_condition): Update call accordingly.
(get_vectype_for_scalar_type): Take a group_size argument.
For BB vectorization, limit the the vector to that number
of elements. Also define an overload that takes an slp_tree.
(get_mask_type_for_scalar_type): Add an slp_tree argument and
pass it to get_vectype_for_scalar_type.
(vect_get_vector_types_for_stmt): Add a group_size argument
and pass it to get_vectype_for_scalar_type. Don't use the
cached vector type for BB vectorization if a group size is given.
Handle data references in that case.
(vect_get_mask_type_for_stmt): Take an slp_tree argument and
pass it to get_mask_type_for_scalar_type.
gcc/testsuite/
* gcc.dg/vect/bb-slp-4.c: Expect the block to be vectorized
with -fno-vect-cost-model.
* gcc.dg/vect/bb-slp-bool-1.c: New test.
* gcc.target/aarch64/vect_mixed_sizes_14.c: Likewise.
* gcc.target/i386/pr84101.c: XFAIL for -m32.
From-SVN: r278334
|
|
If the statements in an SLP node aren't similar enough to be vectorised,
or aren't something the vectoriser has code to handle, the BB vectoriser
tries building the vector from scalars instead. This patch does the
same thing if we're able to build a viable-looking tree but fail later
during the analysis phase, e.g. because the target doesn't support a
particular vector operation.
This is needed to avoid regressions with a later patch.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-slp.c (vect_contains_pattern_stmt_p): New function.
(vect_slp_convert_to_external): Likewise.
(vect_slp_analyze_node_operations): If analysis fails, try building
the node from scalars instead.
gcc/testsuite/
* gcc.dg/vect/bb-slp-div-2.c: New test.
From-SVN: r278246
|
|
A later patch makes the AArch64 port add four entries to
autovectorize_vector_modes. Each entry describes a different
vector mode assignment for vector code that mixes 8-bit, 16-bit,
32-bit and 64-bit elements. But if (as usual) the vector code has
fewer element sizes than that, we could end up trying the same
combination of vector modes multiple times. This patch adds a
check to prevent that.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (vec_info::mode_set): New typedef.
(vec_info::used_vector_mode): New member variable.
(vect_chooses_same_modes_p): Declare.
* tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
chosen vector mode in vec_info::used_vector_mode.
(vect_chooses_same_modes_p): New function.
* tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
the same vector statements multiple times.
* tree-vect-slp.c (vect_slp_bb_region): Likewise.
From-SVN: r278242
|
|
After previous patches, it's now possible to make the vectoriser
support multiple vector sizes in the same vector region, using
related_vector_mode to pick the right vector mode for a given
element mode. No port yet takes advantage of this, but I have
a follow-on patch for AArch64.
This patch also seemed like a good opportunity to add some more dump
messages: one to make it clear which vector size/mode was being used
when analysis passed or failed, and another to say when we've decided
to skip a redundant vector size/mode.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* machmode.h (opt_machine_mode::operator==): New function.
(opt_machine_mode::operator!=): Likewise.
* tree-vectorizer.h (vec_info::vector_mode): Update comment.
(get_related_vectype_for_scalar_type): Delete.
(get_vectype_for_scalar_type_and_size): Declare.
* tree-vect-slp.c (vect_slp_bb_region): Print dump messages to say
whether analysis passed or failed, and with what vector modes.
Use related_vector_mode to check whether trying a particular
vector mode would be redundant with the autodetected mode,
and print a dump message if we decide to skip it.
* tree-vect-loop.c (vect_analyze_loop): Likewise.
(vect_create_epilog_for_reduction): Use
get_related_vectype_for_scalar_type instead of
get_vectype_for_scalar_type_and_size.
* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Replace
with...
(get_related_vectype_for_scalar_type): ...this new function.
Take a starting/"prevailing" vector mode rather than a vector size.
Take an optional nunits argument, with the same meaning as for
related_vector_mode. Use related_vector_mode when not
auto-detecting a mode, falling back to mode_for_vector if no
target mode exists.
(get_vectype_for_scalar_type): Update accordingly.
(get_same_sized_vectype): Likewise.
* tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise.
From-SVN: r278240
|
|
This patch replaces vec_info::vector_size with vec_info::vector_mode,
but for now continues to use it as a way of specifying a single
vector size. This makes it easier for later patches to use
related_vector_mode instead.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (vec_info::vector_size): Replace with...
(vec_info::vector_mode): ...this new field.
* tree-vect-loop.c (vect_update_vf_for_slp): Update accordingly.
(vect_analyze_loop, vect_transform_loop): Likewise.
* tree-vect-loop-manip.c (vect_do_peeling): Likewise.
* tree-vect-slp.c (can_duplicate_and_interleave_p): Likewise.
(vect_make_slp_decision, vect_slp_bb_region): Likewise.
* tree-vect-stmts.c (get_vectype_for_scalar_type): Likewise.
* tree-vectorizer.c (try_vectorize_loop_1): Likewise.
gcc/testsuite/
* gcc.dg/vect/vect-tail-nomask-1.c: Update expected epilogue
vectorization message.
From-SVN: r278237
|
|
This is another patch in the series to remove the assumption that
all modes involved in vectorisation have to be the same size.
Rather than have the target provide a list of vector sizes,
it makes the target provide a list of vector "approaches",
with each approach represented by a mode.
A later patch will pass this mode to targetm.vectorize.related_mode
to get the vector mode for a given element mode. Until then, the modes
simply act as an alternative way of specifying the vector size.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* target.h (vector_sizes, auto_vector_sizes): Delete.
(vector_modes, auto_vector_modes): New typedefs.
* target.def (autovectorize_vector_sizes): Replace with...
(autovectorize_vector_modes): ...this new hook.
* doc/tm.texi.in (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES):
Replace with...
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): ...this new hook.
* doc/tm.texi: Regenerate.
* targhooks.h (default_autovectorize_vector_sizes): Delete.
(default_autovectorize_vector_modes): New function.
* targhooks.c (default_autovectorize_vector_sizes): Delete.
(default_autovectorize_vector_modes): New function.
* omp-general.c (omp_max_vf): Use autovectorize_vector_modes instead
of autovectorize_vector_sizes. Use the number of units in the mode
to calculate the maximum VF.
* omp-low.c (omp_clause_aligned_alignment): Use
autovectorize_vector_modes instead of autovectorize_vector_sizes.
Use a loop based on related_mode to iterate through all supported
vector modes for a given scalar mode.
* optabs-query.c (can_vec_mask_load_store_p): Use
autovectorize_vector_modes instead of autovectorize_vector_sizes.
* tree-vect-loop.c (vect_analyze_loop, vect_transform_loop): Likewise.
* tree-vect-slp.c (vect_slp_bb_region): Likewise.
* config/aarch64/aarch64.c (aarch64_autovectorize_vector_sizes):
Replace with...
(aarch64_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
* config/arc/arc.c (arc_autovectorize_vector_sizes): Replace with...
(arc_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
* config/arm/arm.c (arm_autovectorize_vector_sizes): Replace with...
(arm_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
* config/i386/i386.c (ix86_autovectorize_vector_sizes): Replace with...
(ix86_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
* config/mips/mips.c (mips_autovectorize_vector_sizes): Replace with...
(mips_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
From-SVN: r278236
|
|
build_same_sized_truth_vector_type was confusingly named, since for
SVE and AVX512 the returned vector isn't the same byte size (although
it does have the same number of elements). What it really returns
is the "truth" vector type for a given data vector type.
The more general truth_type_for provides the same thing when passed
a vector and IMO has a more descriptive name, so this patch replaces
all uses of build_same_sized_truth_vector_type with that. It does
the same for a call to build_truth_vector_type, leaving truth_type_for
itself as the only remaining caller.
It's then more natural to pass build_truth_vector_type the original
vector type rather than its size and nunits, especially since the
given size isn't the size of the returned vector. This in turn allows
a future patch to simplify the interface of get_mask_mode. Doing this
also fixes a bug in which truth_type_for would pass a size of zero for
BLKmode vector types.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree.h (build_truth_vector_type): Delete.
(build_same_sized_truth_vector_type): Likewise.
* tree.c (build_truth_vector_type): Rename to...
(build_truth_vector_type_for): ...this. Make static and take
a vector type as argument.
(truth_type_for): Update accordingly.
(build_same_sized_truth_vector_type): Delete.
* tree-vect-generic.c (expand_vector_divmod): Use truth_type_for
instead of build_same_sized_truth_vector_type.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise.
(vect_record_loop_mask, vect_get_loop_mask): Likewise.
* tree-vect-patterns.c (build_mask_conversion): Likeise.
* tree-vect-slp.c (vect_get_constant_vectors): Likewise.
* tree-vect-stmts.c (vect_get_vec_def_for_operand): Likewise.
(vect_build_gather_load_calls, vectorizable_call): Likewise.
(scan_store_can_perm_p, vectorizable_scan_store): Likewise.
(vectorizable_store, vectorizable_condition): Likewise.
(get_mask_type_for_scalar_type, get_same_sized_vectype): Likewise.
(vect_get_mask_type_for_stmt): Use truth_type_for instead of
build_truth_vector_type.
* config/aarch64/aarch64-sve-builtins.cc (gimple_folder::convert_pred):
Use truth_type_for instead of build_same_sized_truth_vector_type.
* config/rs6000/rs6000-call.c (fold_build_vec_cmp): Likewise.
gcc/c/
* c-typeck.c (build_conditional_expr): Use truth_type_for instead
of build_same_sized_truth_vector_type.
(build_vec_cmp): Likewise.
gcc/cp/
* call.c (build_conditional_expr_1): Use truth_type_for instead
of build_same_sized_truth_vector_type.
* typeck.c (build_vec_cmp): Likewise.
gcc/d/
* d-codegen.cc (build_boolop): Use truth_type_for instead of
build_same_sized_truth_vector_type.
From-SVN: r278232
|
|
vectorizable_assignment handles true SSA-to-SSA copies (which hopefully
we don't see in practice) and no-op conversions that are required
to maintain correct gimple, such as changes between signed and
unsigned types. These cases shouldn't generate any code and so
shouldn't count against either the scalar or vector costs.
Later patches test this, but it seemed worth splitting out.
2019-11-13 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (vect_nop_conversion_p): Declare.
* tree-vect-stmts.c (vect_nop_conversion_p): New function.
(vectorizable_assignment): Don't add a cost for nop conversions.
* tree-vect-loop.c (vect_compute_single_scalar_iteration_cost):
Likewise.
* tree-vect-slp.c (vect_bb_slp_scalar_cost): Likewise.
From-SVN: r278122
|
|
2019-11-12 Martin Liska <mliska@suse.cz>
* Makefile.in: Remove PARAMS_H and params.list
and params.options.
* params-enum.h: Remove.
* params-list.h: Remove.
* params-options.h: Remove.
* params.c: Remove.
* params.def: Remove.
* params.h: Remove.
* asan.c: Do not include params.h.
* auto-profile.c: Likewise.
* bb-reorder.c: Likewise.
* builtins.c: Likewise.
* cfgcleanup.c: Likewise.
* cfgexpand.c: Likewise.
* cfgloopanal.c: Likewise.
* cgraph.c: Likewise.
* combine.c: Likewise.
* common/config/aarch64/aarch64-common.c: Likewise.
* common/config/gcn/gcn-common.c: Likewise.
* common/config/ia64/ia64-common.c: Likewise.
* common/config/powerpcspe/powerpcspe-common.c: Likewise.
* common/config/rs6000/rs6000-common.c: Likewise.
* common/config/sh/sh-common.c: Likewise.
* config/aarch64/aarch64.c: Likewise.
* config/alpha/alpha.c: Likewise.
* config/arm/arm.c: Likewise.
* config/avr/avr.c: Likewise.
* config/csky/csky.c: Likewise.
* config/i386/i386-builtins.c: Likewise.
* config/i386/i386-expand.c: Likewise.
* config/i386/i386-features.c: Likewise.
* config/i386/i386-options.c: Likewise.
* config/i386/i386.c: Likewise.
* config/ia64/ia64.c: Likewise.
* config/rs6000/rs6000-logue.c: Likewise.
* config/rs6000/rs6000.c: Likewise.
* config/s390/s390.c: Likewise.
* config/sparc/sparc.c: Likewise.
* config/visium/visium.c: Likewise.
* coverage.c: Likewise.
* cprop.c: Likewise.
* cse.c: Likewise.
* cselib.c: Likewise.
* dse.c: Likewise.
* emit-rtl.c: Likewise.
* explow.c: Likewise.
* final.c: Likewise.
* fold-const.c: Likewise.
* gcc.c: Likewise.
* gcse.c: Likewise.
* ggc-common.c: Likewise.
* ggc-page.c: Likewise.
* gimple-loop-interchange.cc: Likewise.
* gimple-loop-jam.c: Likewise.
* gimple-loop-versioning.cc: Likewise.
* gimple-ssa-split-paths.c: Likewise.
* gimple-ssa-sprintf.c: Likewise.
* gimple-ssa-store-merging.c: Likewise.
* gimple-ssa-strength-reduction.c: Likewise.
* gimple-ssa-warn-alloca.c: Likewise.
* gimple-ssa-warn-restrict.c: Likewise.
* graphite-isl-ast-to-gimple.c: Likewise.
* graphite-optimize-isl.c: Likewise.
* graphite-scop-detection.c: Likewise.
* graphite-sese-to-poly.c: Likewise.
* graphite.c: Likewise.
* haifa-sched.c: Likewise.
* hsa-gen.c: Likewise.
* ifcvt.c: Likewise.
* ipa-cp.c: Likewise.
* ipa-fnsummary.c: Likewise.
* ipa-inline-analysis.c: Likewise.
* ipa-inline.c: Likewise.
* ipa-polymorphic-call.c: Likewise.
* ipa-profile.c: Likewise.
* ipa-prop.c: Likewise.
* ipa-split.c: Likewise.
* ipa-sra.c: Likewise.
* ira-build.c: Likewise.
* ira-conflicts.c: Likewise.
* loop-doloop.c: Likewise.
* loop-invariant.c: Likewise.
* loop-unroll.c: Likewise.
* lra-assigns.c: Likewise.
* lra-constraints.c: Likewise.
* modulo-sched.c: Likewise.
* opt-suggestions.c: Likewise.
* opts.c: Likewise.
* postreload-gcse.c: Likewise.
* predict.c: Likewise.
* reload.c: Likewise.
* reorg.c: Likewise.
* resource.c: Likewise.
* sanopt.c: Likewise.
* sched-deps.c: Likewise.
* sched-ebb.c: Likewise.
* sched-rgn.c: Likewise.
* sel-sched-ir.c: Likewise.
* sel-sched.c: Likewise.
* shrink-wrap.c: Likewise.
* stmt.c: Likewise.
* targhooks.c: Likewise.
* toplev.c: Likewise.
* tracer.c: Likewise.
* trans-mem.c: Likewise.
* tree-chrec.c: Likewise.
* tree-data-ref.c: Likewise.
* tree-if-conv.c: Likewise.
* tree-inline.c: Likewise.
* tree-loop-distribution.c: Likewise.
* tree-parloops.c: Likewise.
* tree-predcom.c: Likewise.
* tree-profile.c: Likewise.
* tree-scalar-evolution.c: Likewise.
* tree-sra.c: Likewise.
* tree-ssa-ccp.c: Likewise.
* tree-ssa-dom.c: Likewise.
* tree-ssa-dse.c: Likewise.
* tree-ssa-ifcombine.c: Likewise.
* tree-ssa-loop-ch.c: Likewise.
* tree-ssa-loop-im.c: Likewise.
* tree-ssa-loop-ivcanon.c: Likewise.
* tree-ssa-loop-ivopts.c: Likewise.
* tree-ssa-loop-manip.c: Likewise.
* tree-ssa-loop-niter.c: Likewise.
* tree-ssa-loop-prefetch.c: Likewise.
* tree-ssa-loop-unswitch.c: Likewise.
* tree-ssa-math-opts.c: Likewise.
* tree-ssa-phiopt.c: Likewise.
* tree-ssa-pre.c: Likewise.
* tree-ssa-reassoc.c: Likewise.
* tree-ssa-sccvn.c: Likewise.
* tree-ssa-scopedtables.c: Likewise.
* tree-ssa-sink.c: Likewise.
* tree-ssa-strlen.c: Likewise.
* tree-ssa-structalias.c: Likewise.
* tree-ssa-tail-merge.c: Likewise.
* tree-ssa-threadbackward.c: Likewise.
* tree-ssa-threadedge.c: Likewise.
* tree-ssa-uninit.c: Likewise.
* tree-switch-conversion.c: Likewise.
* tree-vect-data-refs.c: Likewise.
* tree-vect-loop.c: Likewise.
* tree-vect-slp.c: Likewise.
* tree-vrp.c: Likewise.
* tree.c: Likewise.
* value-prof.c: Likewise.
* var-tracking.c: Likewise.
2019-11-12 Martin Liska <mliska@suse.cz>
* gimple-parser.c: Do not include params.h.
2019-11-12 Martin Liska <mliska@suse.cz>
* name-lookup.c: Do not include params.h.
* typeck.c: Likewise.
2019-11-12 Martin Liska <mliska@suse.cz>
* lto-common.c: Do not include params.h.
* lto-partition.c: Likewise.
* lto.c: Likewise.
From-SVN: r278086
|
|
2019-11-12 Martin Liska <mliska@suse.cz>
* asan.c (asan_sanitize_stack_p): Replace old parameter syntax
with the new one, include opts.h if needed. Use SET_OPTION_IF_UNSET
macro.
(asan_sanitize_allocas_p): Likewise.
(asan_emit_stack_protection): Likewise.
(asan_protect_global): Likewise.
(instrument_derefs): Likewise.
(instrument_builtin_call): Likewise.
(asan_expand_mark_ifn): Likewise.
* auto-profile.c (auto_profile): Likewise.
* bb-reorder.c (copy_bb_p): Likewise.
(duplicate_computed_gotos): Likewise.
* builtins.c (inline_expand_builtin_string_cmp): Likewise.
* cfgcleanup.c (try_crossjump_to_edge): Likewise.
(try_crossjump_bb): Likewise.
* cfgexpand.c (defer_stack_allocation): Likewise.
(stack_protect_classify_type): Likewise.
(pass_expand::execute): Likewise.
* cfgloopanal.c (expected_loop_iterations_unbounded): Likewise.
(estimate_reg_pressure_cost): Likewise.
* cgraph.c (cgraph_edge::maybe_hot_p): Likewise.
* combine.c (combine_instructions): Likewise.
(record_value_for_reg): Likewise.
* common/config/aarch64/aarch64-common.c (aarch64_option_validate_param): Likewise.
(aarch64_option_default_params): Likewise.
* common/config/ia64/ia64-common.c (ia64_option_default_params): Likewise.
* common/config/powerpcspe/powerpcspe-common.c (rs6000_option_default_params): Likewise.
* common/config/rs6000/rs6000-common.c (rs6000_option_default_params): Likewise.
* common/config/sh/sh-common.c (sh_option_default_params): Likewise.
* config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Likewise.
(aarch64_allocate_and_probe_stack_space): Likewise.
(aarch64_expand_epilogue): Likewise.
(aarch64_override_options_internal): Likewise.
* config/alpha/alpha.c (alpha_option_override): Likewise.
* config/arm/arm.c (arm_option_override): Likewise.
(arm_valid_target_attribute_p): Likewise.
* config/i386/i386-options.c (ix86_option_override_internal): Likewise.
* config/i386/i386.c (get_probe_interval): Likewise.
(ix86_adjust_stack_and_probe_stack_clash): Likewise.
(ix86_max_noce_ifcvt_seq_cost): Likewise.
* config/ia64/ia64.c (ia64_adjust_cost): Likewise.
* config/rs6000/rs6000-logue.c (get_stack_clash_protection_probe_interval): Likewise.
(get_stack_clash_protection_guard_size): Likewise.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Likewise.
* config/s390/s390.c (allocate_stack_space): Likewise.
(s390_emit_prologue): Likewise.
(s390_option_override_internal): Likewise.
* config/sparc/sparc.c (sparc_option_override): Likewise.
* config/visium/visium.c (visium_option_override): Likewise.
* coverage.c (get_coverage_counts): Likewise.
(coverage_compute_profile_id): Likewise.
(coverage_begin_function): Likewise.
(coverage_end_function): Likewise.
* cse.c (cse_find_path): Likewise.
(cse_extended_basic_block): Likewise.
(cse_main): Likewise.
* cselib.c (cselib_invalidate_mem): Likewise.
* dse.c (dse_step1): Likewise.
* emit-rtl.c (set_new_first_and_last_insn): Likewise.
(get_max_insn_count): Likewise.
(make_debug_insn_raw): Likewise.
(init_emit): Likewise.
* explow.c (compute_stack_clash_protection_loop_data): Likewise.
* final.c (compute_alignments): Likewise.
* fold-const.c (fold_range_test): Likewise.
(fold_truth_andor): Likewise.
(tree_single_nonnegative_warnv_p): Likewise.
(integer_valued_real_single_p): Likewise.
* gcse.c (want_to_gcse_p): Likewise.
(prune_insertions_deletions): Likewise.
(hoist_code): Likewise.
(gcse_or_cprop_is_too_expensive): Likewise.
* ggc-common.c: Likewise.
* ggc-page.c (ggc_collect): Likewise.
* gimple-loop-interchange.cc (MAX_NUM_STMT): Likewise.
(MAX_DATAREFS): Likewise.
(OUTER_STRIDE_RATIO): Likewise.
* gimple-loop-jam.c (tree_loop_unroll_and_jam): Likewise.
* gimple-loop-versioning.cc (loop_versioning::max_insns_for_loop): Likewise.
* gimple-ssa-split-paths.c (is_feasible_trace): Likewise.
* gimple-ssa-store-merging.c (imm_store_chain_info::try_coalesce_bswap): Likewise.
(imm_store_chain_info::coalesce_immediate_stores): Likewise.
(imm_store_chain_info::output_merged_store): Likewise.
(pass_store_merging::process_store): Likewise.
* gimple-ssa-strength-reduction.c (find_basis_for_base_expr): Likewise.
* graphite-isl-ast-to-gimple.c (class translate_isl_ast_to_gimple): Likewise.
(scop_to_isl_ast): Likewise.
* graphite-optimize-isl.c (get_schedule_for_node_st): Likewise.
(optimize_isl): Likewise.
* graphite-scop-detection.c (build_scops): Likewise.
* haifa-sched.c (set_modulo_params): Likewise.
(rank_for_schedule): Likewise.
(model_add_to_worklist): Likewise.
(model_promote_insn): Likewise.
(model_choose_insn): Likewise.
(queue_to_ready): Likewise.
(autopref_multipass_dfa_lookahead_guard): Likewise.
(schedule_block): Likewise.
(sched_init): Likewise.
* hsa-gen.c (init_prologue): Likewise.
* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): Likewise.
(cond_move_process_if_block): Likewise.
* ipa-cp.c (ipcp_lattice::add_value): Likewise.
(merge_agg_lats_step): Likewise.
(devirtualization_time_bonus): Likewise.
(hint_time_bonus): Likewise.
(incorporate_penalties): Likewise.
(good_cloning_opportunity_p): Likewise.
(ipcp_propagate_stage): Likewise.
* ipa-fnsummary.c (decompose_param_expr): Likewise.
(set_switch_stmt_execution_predicate): Likewise.
(analyze_function_body): Likewise.
(compute_fn_summary): Likewise.
* ipa-inline-analysis.c (estimate_growth): Likewise.
* ipa-inline.c (caller_growth_limits): Likewise.
(inline_insns_single): Likewise.
(inline_insns_auto): Likewise.
(can_inline_edge_by_limits_p): Likewise.
(want_early_inline_function_p): Likewise.
(big_speedup_p): Likewise.
(want_inline_small_function_p): Likewise.
(want_inline_self_recursive_call_p): Likewise.
(edge_badness): Likewise.
(recursive_inlining): Likewise.
(compute_max_insns): Likewise.
(early_inliner): Likewise.
* ipa-polymorphic-call.c (csftc_abort_walking_p): Likewise.
* ipa-profile.c (ipa_profile): Likewise.
* ipa-prop.c (determine_known_aggregate_parts): Likewise.
(ipa_analyze_node): Likewise.
(ipcp_transform_function): Likewise.
* ipa-split.c (consider_split): Likewise.
* ipa-sra.c (allocate_access): Likewise.
(process_scan_results): Likewise.
(ipa_sra_summarize_function): Likewise.
(pull_accesses_from_callee): Likewise.
* ira-build.c (loop_compare_func): Likewise.
(mark_loops_for_removal): Likewise.
* ira-conflicts.c (build_conflict_bit_table): Likewise.
* loop-doloop.c (doloop_optimize): Likewise.
* loop-invariant.c (gain_for_invariant): Likewise.
(move_loop_invariants): Likewise.
* loop-unroll.c (decide_unroll_constant_iterations): Likewise.
(decide_unroll_runtime_iterations): Likewise.
(decide_unroll_stupid): Likewise.
(expand_var_during_unrolling): Likewise.
* lra-assigns.c (spill_for): Likewise.
* lra-constraints.c (EBB_PROBABILITY_CUTOFF): Likewise.
* modulo-sched.c (sms_schedule): Likewise.
(DFA_HISTORY): Likewise.
* opts.c (default_options_optimization): Likewise.
(finish_options): Likewise.
(common_handle_option): Likewise.
* postreload-gcse.c (eliminate_partially_redundant_load): Likewise.
(if): Likewise.
* predict.c (get_hot_bb_threshold): Likewise.
(maybe_hot_count_p): Likewise.
(probably_never_executed): Likewise.
(predictable_edge_p): Likewise.
(predict_loops): Likewise.
(expr_expected_value_1): Likewise.
(tree_predict_by_opcode): Likewise.
(handle_missing_profiles): Likewise.
* reload.c (find_equiv_reg): Likewise.
* reorg.c (redundant_insn): Likewise.
* resource.c (mark_target_live_regs): Likewise.
(incr_ticks_for_insn): Likewise.
* sanopt.c (pass_sanopt::execute): Likewise.
* sched-deps.c (sched_analyze_1): Likewise.
(sched_analyze_2): Likewise.
(sched_analyze_insn): Likewise.
(deps_analyze_insn): Likewise.
* sched-ebb.c (schedule_ebbs): Likewise.
* sched-rgn.c (find_single_block_region): Likewise.
(too_large): Likewise.
(haifa_find_rgns): Likewise.
(extend_rgns): Likewise.
(new_ready): Likewise.
(schedule_region): Likewise.
(sched_rgn_init): Likewise.
* sel-sched-ir.c (make_region_from_loop): Likewise.
* sel-sched-ir.h (MAX_WS): Likewise.
* sel-sched.c (process_pipelined_exprs): Likewise.
(sel_setup_region_sched_flags): Likewise.
* shrink-wrap.c (try_shrink_wrapping): Likewise.
* targhooks.c (default_max_noce_ifcvt_seq_cost): Likewise.
* toplev.c (print_version): Likewise.
(process_options): Likewise.
* tracer.c (tail_duplicate): Likewise.
* trans-mem.c (tm_log_add): Likewise.
* tree-chrec.c (chrec_fold_plus_1): Likewise.
* tree-data-ref.c (split_constant_offset): Likewise.
(compute_all_dependences): Likewise.
* tree-if-conv.c (MAX_PHI_ARG_NUM): Likewise.
* tree-inline.c (remap_gimple_stmt): Likewise.
* tree-loop-distribution.c (MAX_DATAREFS_NUM): Likewise.
* tree-parloops.c (MIN_PER_THREAD): Likewise.
(create_parallel_loop): Likewise.
* tree-predcom.c (determine_unroll_factor): Likewise.
* tree-scalar-evolution.c (instantiate_scev_r): Likewise.
* tree-sra.c (analyze_all_variable_accesses): Likewise.
* tree-ssa-ccp.c (fold_builtin_alloca_with_align): Likewise.
* tree-ssa-dse.c (setup_live_bytes_from_ref): Likewise.
(dse_optimize_redundant_stores): Likewise.
(dse_classify_store): Likewise.
* tree-ssa-ifcombine.c (ifcombine_ifandif): Likewise.
* tree-ssa-loop-ch.c (ch_base::copy_headers): Likewise.
* tree-ssa-loop-im.c (LIM_EXPENSIVE): Likewise.
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Likewise.
(try_peel_loop): Likewise.
(tree_unroll_loops_completely): Likewise.
* tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise.
(CONSIDER_ALL_CANDIDATES_BOUND): Likewise.
(MAX_CONSIDERED_GROUPS): Likewise.
(ALWAYS_PRUNE_CAND_SET_BOUND): Likewise.
* tree-ssa-loop-manip.c (can_unroll_loop_p): Likewise.
* tree-ssa-loop-niter.c (MAX_ITERATIONS_TO_TRACK): Likewise.
* tree-ssa-loop-prefetch.c (PREFETCH_BLOCK): Likewise.
(L1_CACHE_SIZE_BYTES): Likewise.
(L2_CACHE_SIZE_BYTES): Likewise.
(should_issue_prefetch_p): Likewise.
(schedule_prefetches): Likewise.
(determine_unroll_factor): Likewise.
(volume_of_references): Likewise.
(add_subscript_strides): Likewise.
(self_reuse_distance): Likewise.
(mem_ref_count_reasonable_p): Likewise.
(insn_to_prefetch_ratio_too_small_p): Likewise.
(loop_prefetch_arrays): Likewise.
(tree_ssa_prefetch_arrays): Likewise.
* tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Likewise.
* tree-ssa-math-opts.c (gimple_expand_builtin_pow): Likewise.
(convert_mult_to_fma): Likewise.
(math_opts_dom_walker::after_dom_children): Likewise.
* tree-ssa-phiopt.c (cond_if_else_store_replacement): Likewise.
(hoist_adjacent_loads): Likewise.
(gate_hoist_loads): Likewise.
* tree-ssa-pre.c (translate_vuse_through_block): Likewise.
(compute_partial_antic_aux): Likewise.
* tree-ssa-reassoc.c (get_reassociation_width): Likewise.
* tree-ssa-sccvn.c (vn_reference_lookup_pieces): Likewise.
(vn_reference_lookup): Likewise.
(do_rpo_vn): Likewise.
* tree-ssa-scopedtables.c (avail_exprs_stack::lookup_avail_expr): Likewise.
* tree-ssa-sink.c (select_best_block): Likewise.
* tree-ssa-strlen.c (new_stridx): Likewise.
(new_addr_stridx): Likewise.
(get_range_strlen_dynamic): Likewise.
(class ssa_name_limit_t): Likewise.
* tree-ssa-structalias.c (push_fields_onto_fieldstack): Likewise.
(create_variable_info_for_1): Likewise.
(init_alias_vars): Likewise.
* tree-ssa-tail-merge.c (find_clusters_1): Likewise.
(tail_merge_optimize): Likewise.
* tree-ssa-threadbackward.c (thread_jumps::profitable_jump_thread_path): Likewise.
(thread_jumps::fsm_find_control_statement_thread_paths): Likewise.
(thread_jumps::find_jump_threads_backwards): Likewise.
* tree-ssa-threadedge.c (record_temporary_equivalences_from_stmts_at_dest): Likewise.
* tree-ssa-uninit.c (compute_control_dep_chain): Likewise.
* tree-switch-conversion.c (switch_conversion::check_range): Likewise.
(jump_table_cluster::can_be_handled): Likewise.
* tree-switch-conversion.h (jump_table_cluster::case_values_threshold): Likewise.
(SWITCH_CONVERSION_BRANCH_RATIO): Likewise.
(param_switch_conversion_branch_ratio): Likewise.
* tree-vect-data-refs.c (vect_mark_for_runtime_alias_test): Likewise.
(vect_enhance_data_refs_alignment): Likewise.
(vect_prune_runtime_alias_test_list): Likewise.
* tree-vect-loop.c (vect_analyze_loop_costing): Likewise.
(vect_get_datarefs_in_loop): Likewise.
(vect_analyze_loop): Likewise.
* tree-vect-slp.c (vect_slp_bb): Likewise.
* tree-vectorizer.h: Likewise.
* tree-vrp.c (find_switch_asserts): Likewise.
(vrp_prop::check_mem_ref): Likewise.
* tree.c (wide_int_to_tree_1): Likewise.
(cache_integer_cst): Likewise.
* var-tracking.c (EXPR_USE_DEPTH): Likewise.
(reverse_op): Likewise.
(vt_find_locations): Likewise.
2019-11-12 Martin Liska <mliska@suse.cz>
* gimple-parser.c (c_parser_parse_gimple_body): Replace old parameter syntax
with the new one, include opts.h if needed. Use SET_OPTION_IF_UNSET
macro.
2019-11-12 Martin Liska <mliska@suse.cz>
* name-lookup.c (namespace_hints::namespace_hints): Replace old parameter syntax
with the new one, include opts.h if needed. Use SET_OPTION_IF_UNSET
macro.
* typeck.c (comptypes): Likewise.
2019-11-12 Martin Liska <mliska@suse.cz>
* lto-partition.c (lto_balanced_map): Replace old parameter syntax
with the new one, include opts.h if needed. Use SET_OPTION_IF_UNSET
macro.
* lto.c (do_whole_program_analysis): Likewise.
From-SVN: r278085
|
|
This initializes the rstmt variable with NULL and adds an assert to
check that a value has been given by one of the if cases before use.
This fixes the bootstrap failure due to -Werror.
Committed under the gcc obvious rule.
gcc/ChangeLog:
* tree-vect-slp.c (vectorize_slp_instance_root_stmt): Initialize rstmt.
From-SVN: r277788
|
|
gcc/ChangeLog:
2019-11-04 Joel Hutton <Joel.Hutton@arm.com>
* expr.c (store_constructor): Modify to handle single element vectors.
* tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector
constructors.
(vect_slp_check_for_constructors): New function.
(vect_slp_analyze_bb_1): Call new function to check for vector
constructors.
(vectorize_slp_instance_root_stmt): New function.
(vect_schedule_slp): Call new function to vectorize root stmt of vector
constructors.
* tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.
gcc/testsuite/ChangeLog:
2019-11-04 Joel Hutton <Joel.Hutton@arm.com>
* gcc.dg/vect/bb-slp-40.c: New test.
* gcc.dg/vect/bb-slp-41.c: New test.
From-SVN: r277784
|
|
2019-10-30 Richard Biener <rguenther@suse.de>
PR tree-optimization/65930
* tree-vect-loop.c (vect_is_simple_reduction): For reduction
chains also allow a leading and trailing conversion.
* tree-vect-slp.c (vect_get_and_check_slp_defs): Handle
intermediate reduction chains.
(vect_analyze_slp_instance): Likewise. Build a SLP
node for a trailing conversion manually.
* gcc.dg/vect/pr65930-2.c: New testcase.
From-SVN: r277603
|
|
2019-10-29 Richard Biener <rguenther@suse.de>
PR tree-optimization/92260
* tree-vect-slp.c (vect_get_constant_vectors): Special-case
lane-reducing ops.
* gcc.dg/pr92260.c: New testcase.
From-SVN: r277571
|
|
vect_stmt_to_vectorize))
2019-10-28 Richard Biener <rguenther@suse.de>
PR tree-optimization/92252
* tree-vect-slp.c (vect_get_and_check_slp_defs): Adjust
STMT_VINFO_REDUC_IDX when swapping operands.
* gcc.dg/torture/pr92252.c: New testcase.
From-SVN: r277517
|
|
gimple-expr.c:86)
2019-10-25 Richard Biener <rguenther@suse.de>
PR tree-optimization/92222
* tree-vect-slp.c (_slp_oprnd_info::first_pattern): Remove.
(_slp_oprnd_info::second_pattern): Likewise.
(_slp_oprnd_info::any_pattern): New.
(vect_create_oprnd_info): Adjust.
(vect_get_and_check_slp_defs): Compute whether any stmt is
in a pattern.
(vect_build_slp_tree_2): Avoid building up a node from scalars
if any of the operand defs, not just the first, is in a pattern.
* gcc.dg/torture/pr92222.c: New testcase.
From-SVN: r277448
|
|
actually have to modify the IL on a shared stmt.
2019-10-25 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_get_and_check_slp_defs): Only fail
swapping if we actually have to modify the IL on a shared stmt.
(vect_build_slp_tree_2): Never fail swapping on shared stmts
because we no longer modify the IL.
From-SVN: r277446
|
|
harder with operand swapping and instead of putting a...
2019-10-24 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_get_and_check_slp_defs): For reduction
chains try harder with operand swapping and instead of
putting a shifted chain into the reduction operands put
a repetition of the final reduction op there as if we'd
reassociate the expression.
* gcc.dg/vect/slp-reduc-10a.c: New testcase.
* gcc.dg/vect/slp-reduc-10b.c: Likewise.
* gcc.dg/vect/slp-reduc-10c.c: Likewise.
* gcc.dg/vect/slp-reduc-10d.c: Likewise.
* gcc.dg/vect/slp-reduc-10e.c: Likewise.
From-SVN: r277406
|
|
try to handle the reduction as part of...
2019-10-24 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_analyze_slp): When reduction group
SLP discovery fails try to handle the reduction as part
of SLP reduction discovery.
* gcc.dg/vect/slp-reduc-9.c: New testcase.
From-SVN: r277366
|
|
case there's a constant operand in its definition.
2019-10-23 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_build_slp_tree_2): Do not build
op from scalars in case there's a constant operand in its
definition.
From-SVN: r277308
|
|
2019-10-22 Richard Biener <rguenther@suse.de>
PR tree-optimization/92173
* tree-vect-loop.c (vectorizable_reduction): If
vect_transform_reduction cannot handle code-generation try without
the single-def-use-cycle optimization. Pass optab_vector to
optab_for_tree_code to get vector shifts as that's what we'd
generate.
* gcc.dg/torture/pr92173.c: New testcase.
From-SVN: r277288
|
|
r277235 was a bit too mechanical and ended up introducing use
after free bugs in both loop and SLP vectorisation.
2019-10-22 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-slp.c (vect_slp_bb_region): Check whether
autodetected_vector_size rather than vector_size is zero.
* tree-vect-loop.c (vect_analyze_loop): Likewise.
Set autodetected_vector_size immediately after calling
vect_analyze_loop_2. Check for a fatal error before advancing
next_size.
From-SVN: r277282
|
|
2019-10-21 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_slp_tree::ops): New member.
(SLP_TREE_SCALAR_OPS): New.
(vect_get_slp_defs): Adjust prototype.
* tree-vect-slp.c (vect_free_slp_tree): Release
SLP_TREE_SCALAR_OPS.
(vect_create_new_slp_node): Initialize it. New overload for
initializing by an operands array.
(_slp_oprnd_info::ops): New member.
(vect_create_oprnd_info): Initialize it.
(vect_free_oprnd_info): Release it.
(vect_get_and_check_slp_defs): Populate the operands array.
Do not swap operands in the IL when not necessary.
(vect_build_slp_tree_2): Build SLP nodes for invariant operands.
Record SLP_TREE_SCALAR_OPS for all invariant nodes. Also
swap operands in the operands array. Do not swap operands in
the IL.
(vect_slp_rearrange_stmts): Re-arrange SLP_TREE_SCALAR_OPS as well.
(vect_gather_slp_loads): Fix.
(vect_detect_hybrid_slp_stmts): Likewise.
(vect_slp_analyze_node_operations_1): Search for a internal
def child for computing reduction SLP_TREE_NUMBER_OF_VEC_STMTS.
(vect_slp_analyze_node_operations): Skip ops-only stmts for
the def-type push/pop dance.
(vect_get_constant_vectors): Compute number_of_vectors here.
Use SLP_TREE_SCALAR_OPS and simplify greatly.
(vect_get_slp_vect_defs): Use gimple_get_lhs also for PHIs.
(vect_get_slp_defs): Simplify greatly.
* tree-vect-loop.c (vectorize_fold_left_reduction): Simplify.
(vect_transform_reduction): Likewise.
* tree-vect-stmts.c (vect_get_vec_defs): Simplify.
(vectorizable_call): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_comparison): Likewise.
From-SVN: r277241
|