diff options
author | Amara Emerson <amara.emerson@arm.com> | 2017-05-09 10:43:25 +0000 |
---|---|---|
committer | Amara Emerson <amara.emerson@arm.com> | 2017-05-09 10:43:25 +0000 |
commit | 8f1f7ce9d1817231ea4a3d7ecd625bb06ca69d21 (patch) | |
tree | e161247c9b8cd5e01963c2ca8f77c726de4f3dcc /docs | |
parent | 64dbd2ef25d5f73decd5114501f7fa5c69be6b21 (diff) |
Introduce experimental generic intrinsics for horizontal vector reductions.
- This change allows targets to opt-in to using them instead of the log2
shufflevector algorithm.
- The SLP and Loop vectorizers have the common code to do shuffle reductions
factored out into LoopUtils, and now have a unified interface for generating
reductions regardless of the preference of the target. LoopUtils now uses TTI
to determine what kind of reductions the target wants to handle.
- For CodeGen, basic legalization support is added.
Differential Revision: https://reviews.llvm.org/D30086
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302514 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs')
-rw-r--r-- | docs/LangRef.rst | 332 |
1 files changed, 332 insertions, 0 deletions
diff --git a/docs/LangRef.rst b/docs/LangRef.rst index dad99e3352d..dc6350c5520 100644 --- a/docs/LangRef.rst +++ b/docs/LangRef.rst @@ -11687,6 +11687,338 @@ Examples: %r2 = call float @llvm.fmuladd.f32(float %a, float %b, float %c) ; yields float:r2 = (a * b) + c + +Experimental Vector Reduction Intrinsics +---------------------------------------- + +Horizontal reductions of vectors can be expressed using the following +intrinsics. Each one takes a vector operand as an input and applies its +respective operation across all elements of the vector, returning a single +scalar result of the same element type. + + +'``llvm.experimental.vector.reduce.add.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i32 @llvm.experimental.vector.reduce.add.i32.v4i32(<4 x i32> %a) + declare i64 @llvm.experimental.vector.reduce.add.i64.v2i64(<2 x i64> %a) + +Overview: +""""""""" + +The '``llvm.experimental.vector.reduce.add.*``' intrinsics do an integer ``ADD`` +reduction of a vector, returning the result as a scalar. The return type matches +the element-type of the vector input. + +Arguments: +"""""""""" +The argument to this intrinsic must be a vector of integer values. + +'``llvm.experimental.vector.reduce.fadd.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float %acc, <4 x float> %a) + declare double @llvm.experimental.vector.reduce.fadd.f64.v2f64(double %acc, <2 x double> %a) + +Overview: +""""""""" + +The '``llvm.experimental.vector.reduce.fadd.*``' intrinsics do a floating point +``ADD`` reduction of a vector, returning the result as a scalar. The return type +matches the element-type of the vector input. + +If the intrinsic call has fast-math flags, then the reduction will not preserve +the associativity of an equivalent scalarized counterpart. If it does not have +fast-math flags, then the reduction will be *ordered*, implying that the +operation respects the associativity of a scalarized reduction. + + +Arguments: +"""""""""" +The first argument to this intrinsic is a scalar accumulator value, which is +only used when there are no fast-math flags attached. This argument may be undef +when fast-math flags are used. + +The second argument must be a vector of floating point values. + +Examples: +""""""""" + +.. code-block:: llvm + + %fast = call fast float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float undef, <4 x float> %input) ; fast reduction + %ord = call float @llvm.experimental.vector.reduce.fadd.f32.v4f32(float %acc, <4 x float> %input) ; ordered reduction + + +'``llvm.experimental.vector.reduce.mul.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i32 @llvm.experimental.vector.reduce.mul.i32.v4i32(<4 x i32> %a) + declare i64 @llvm.experimental.vector.reduce.mul.i64.v2i64(<2 x i64> %a) + +Overview: +""""""""" + +The '``llvm.experimental.vector.reduce.mul.*``' intrinsics do an integer ``MUL`` +reduction of a vector, returning the result as a scalar. The return type matches +the element-type of the vector input. + +Arguments: +"""""""""" +The argument to this intrinsic must be a vector of integer values. + +'``llvm.experimental.vector.reduce.fmul.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare float @llvm.experimental.vector.reduce.fmul.f32.v4f32(float %acc, <4 x float> %a) + declare double @llvm.experimental.vector.reduce.fmul.f64.v2f64(double %acc, <2 x double> %a) + +Overview: +""""""""" + +The '``llvm.experimental.vector.reduce.fmul.*``' intrinsics do a floating point +``MUL`` reduction of a vector, returning the result as a scalar. The return type +matches the element-type of the vector input. + +If the intrinsic call has fast-math flags, then the reduction will not preserve +the associativity of an equivalent scalarized counterpart. If it does not have +fast-math flags, then the reduction will be *ordered*, implying that the +operation respects the associativity of a scalarized reduction. + + +Arguments: +"""""""""" +The first argument to this intrinsic is a scalar accumulator value, which is +only used when there are no fast-math flags attached. This argument may be undef +when fast-math flags are used. + +The second argument must be a vector of floating point values. + +Examples: +""""""""" + +.. code-block:: llvm + + %fast = call fast float @llvm.experimental.vector.reduce.fmul.f32.v4f32(float undef, <4 x float> %input) ; fast reduction + %ord = call float @llvm.experimental.vector.reduce.fmul.f32.v4f32(float %acc, <4 x float> %input) ; ordered reduction + +'``llvm.experimental.vector.reduce.and.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i32 @llvm.experimental.vector.reduce.and.i32.v4i32(<4 x i32> %a) + +Overview: +""""""""" + +The '``llvm.experimental.vector.reduce.and.*``' intrinsics do a bitwise ``AND`` +reduction of a vector, returning the result as a scalar. The return type matches +the element-type of the vector input. + +Arguments: +"""""""""" +The argument to this intrinsic must be a vector of integer values. + +'``llvm.experimental.vector.reduce.or.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i32 @llvm.experimental.vector.reduce.or.i32.v4i32(<4 x i32> %a) + +Overview: +""""""""" + +The '``llvm.experimental.vector.reduce.or.*``' intrinsics do a bitwise ``OR`` reduction +of a vector, returning the result as a scalar. The return type matches the +element-type of the vector input. + +Arguments: +"""""""""" +The argument to this intrinsic must be a vector of integer values. + +'``llvm.experimental.vector.reduce.xor.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i32 @llvm.experimental.vector.reduce.xor.i32.v4i32(<4 x i32> %a) + +Overview: +""""""""" + +The '``llvm.experimental.vector.reduce.xor.*``' intrinsics do a bitwise ``XOR`` +reduction of a vector, returning the result as a scalar. The return type matches +the element-type of the vector input. + +Arguments: +"""""""""" +The argument to this intrinsic must be a vector of integer values. + +'``llvm.experimental.vector.reduce.smax.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i32 @llvm.experimental.vector.reduce.smax.i32.v4i32(<4 x i32> %a) + +Overview: +""""""""" + +The '``llvm.experimental.vector.reduce.smax.*``' intrinsics do a signed integer +``MAX`` reduction of a vector, returning the result as a scalar. The return type +matches the element-type of the vector input. + +Arguments: +"""""""""" +The argument to this intrinsic must be a vector of integer values. + +'``llvm.experimental.vector.reduce.smin.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i32 @llvm.experimental.vector.reduce.smin.i32.v4i32(<4 x i32> %a) + +Overview: +""""""""" + +The '``llvm.experimental.vector.reduce.smin.*``' intrinsics do a signed integer +``MIN`` reduction of a vector, returning the result as a scalar. The return type +matches the element-type of the vector input. + +Arguments: +"""""""""" +The argument to this intrinsic must be a vector of integer values. + +'``llvm.experimental.vector.reduce.umax.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i32 @llvm.experimental.vector.reduce.umax.i32.v4i32(<4 x i32> %a) + +Overview: +""""""""" + +The '``llvm.experimental.vector.reduce.umax.*``' intrinsics do an unsigned +integer ``MAX`` reduction of a vector, returning the result as a scalar. The +return type matches the element-type of the vector input. + +Arguments: +"""""""""" +The argument to this intrinsic must be a vector of integer values. + +'``llvm.experimental.vector.reduce.umin.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i32 @llvm.experimental.vector.reduce.umin.i32.v4i32(<4 x i32> %a) + +Overview: +""""""""" + +The '``llvm.experimental.vector.reduce.umin.*``' intrinsics do an unsigned +integer ``MIN`` reduction of a vector, returning the result as a scalar. The +return type matches the element-type of the vector input. + +Arguments: +"""""""""" +The argument to this intrinsic must be a vector of integer values. + +'``llvm.experimental.vector.reduce.fmax.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare float @llvm.experimental.vector.reduce.fmax.f32.v4f32(<4 x float> %a) + declare double @llvm.experimental.vector.reduce.fmax.f64.v2f64(<2 x double> %a) + +Overview: +""""""""" + +The '``llvm.experimental.vector.reduce.fmax.*``' intrinsics do a floating point +``MAX`` reduction of a vector, returning the result as a scalar. The return type +matches the element-type of the vector input. + +If the intrinsic call has the ``nnan`` fast-math flag then the operation can +assume that NaNs are not present in the input vector. + +Arguments: +"""""""""" +The argument to this intrinsic must be a vector of floating point values. + +'``llvm.experimental.vector.reduce.fmin.*``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare float @llvm.experimental.vector.reduce.fmin.f32.v4f32(<4 x float> %a) + declare double @llvm.experimental.vector.reduce.fmin.f64.v2f64(<2 x double> %a) + +Overview: +""""""""" + +The '``llvm.experimental.vector.reduce.fmin.*``' intrinsics do a floating point +``MIN`` reduction of a vector, returning the result as a scalar. The return type +matches the element-type of the vector input. + +If the intrinsic call has the ``nnan`` fast-math flag then the operation can +assume that NaNs are not present in the input vector. + +Arguments: +"""""""""" +The argument to this intrinsic must be a vector of floating point values. + Half Precision Floating Point Intrinsics ---------------------------------------- |