[SCEV] Teach SCEVExpander to expand BinPow

Current implementation of SCEVExpander demonstrates a very naive behavior when it deals with power calculation. For example, a SCEV for x^8 looks like (x * x * x * x * x * x * x * x) If we try to expand it, it generates a very straightforward sequence of muls, like: x2 = mul x, x x3 = mul x2, x x4 = mul x3, x ... x8 = mul x7, x This is a non-efficient way of doing that. A better way is to generate a sequence of binary power calculation. In this case the expanded calculation will look like: x2 = mul x, x x4 = mul x2, x2 x8 = mul x4, x4 In some cases the code size reduction for such SCEVs is dramatic. If we had a loop: x = a; for (int i = 0; i < 3; i++) x = x * x; And this loop have been fully unrolled, we have something like: x = a; x2 = x * x; x4 = x2 * x2; x8 = x4 * x4; The SCEV for x8 is the same as in example above, and if we for some reason want to expand it, we will generate naively 7 multiplications instead of 3. The BinPow expansion algorithm here allows to keep code size reasonable. This patch teaches SCEV Expander to generate a sequence of BinPow multiplications if we have repeating arguments in SCEVMulExpressions. Differential Revision: https://reviews.llvm.org/D34025 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305663 91177308-0d34-0410-b5e6-96231b3b80d8
author: Max Kazantsev <max.kazantsev@azul.com> 2017-06-19 06:24:53 +0000
committer: Max Kazantsev <max.kazantsev@azul.com> 2017-06-19 06:24:53 +0000
commit: 9026c3d3fa0f2087ab47736c931d8ae7470d9e36 (patch)
tree: ea60c801098f30c335fe1c2df85794710c5911a0 /lib/Analysis/ScalarEvolutionExpander.cpp
parent: 4039181fbec25e2827d656aa2c8454b0f981dfe0 (diff)
1 files changed, 43 insertions, 5 deletions
diff --git a/lib/Analysis/ScalarEvolutionExpander.cpp b/lib/Analysis/ScalarEvolutionExpander.cpp
index f9b9df2bc70..47bdac00ae1 100644
--- a/lib/Analysis/ScalarEvolutionExpander.cpp
+++ b/lib/Analysis/ScalarEvolutionExpander.cpp
@@ -748,18 +748,56 @@ Value *SCEVExpander::visitMulExpr(const SCEVMulExpr *S) {
   // Emit instructions to mul all the operands. Hoist as much as possible
   // out of loops.
   Value *Prod = nullptr;
-  for (const auto &I : OpsAndLoops) {
-    const SCEV *Op = I.second;
+  auto I = OpsAndLoops.begin();
+
+  // Expand the calculation of X pow N in the following manner:
+  // Let N = P1 + P2 + ... + PK, where all P are powers of 2. Then:
+  // X pow N = (X pow P1) * (X pow P2) * ... * (X pow PK).
+  const auto ExpandOpBinPowN = [this, &I, &OpsAndLoops, &Ty]() {
+    auto E = I;
+    // Calculate how many times the same operand from the same loop is included
+    // into this power.
+    uint64_t Exponent = 0;
+    const uint64_t MaxExponent = UINT64_MAX >> 1;
+    // No one sane will ever try to calculate such huge exponents, but if we
+    // need this, we stop on UINT64_MAX / 2 because we need to exit the loop
+    // below when the power of 2 exceeds our Exponent, and we want it to be
+    // 1u << 31 at most to not deal with unsigned overflow.
+    while (E != OpsAndLoops.end() && *I == *E && Exponent != MaxExponent) {
+      ++Exponent;
+      ++E;
+    }
+    assert(Exponent > 0 && "Trying to calculate a zeroth exponent of operand?");
+
+    // Calculate powers with exponents 1, 2, 4, 8 etc. and include those of them
+    // that are needed into the result.
+    Value *P = expandCodeFor(I->second, Ty);
+    Value *Result = nullptr;
+    if (Exponent & 1)
+      Result = P;
+    for (uint64_t BinExp = 2; BinExp <= Exponent; BinExp <<= 1) {
+      P = InsertBinop(Instruction::Mul, P, P);
+      if (Exponent & BinExp)
+        Result = Result ? InsertBinop(Instruction::Mul, Result, P) : P;
+    }
+
+    I = E;
+    assert(Result && "Nothing was expanded?");
+    return Result;
+  };
+
+  while (I != OpsAndLoops.end()) {
     if (!Prod) {
       // This is the first operand. Just expand it.
-      Prod = expand(Op);
-    } else if (Op->isAllOnesValue()) {
+      Prod = ExpandOpBinPowN();
+    } else if (I->second->isAllOnesValue()) {
       // Instead of doing a multiply by negative one, just do a negate.
       Prod = InsertNoopCastOfTo(Prod, Ty);
       Prod = InsertBinop(Instruction::Sub, Constant::getNullValue(Ty), Prod);
+      ++I;
     } else {
       // A simple mul.
-      Value *W = expandCodeFor(Op, Ty);
+      Value *W = ExpandOpBinPowN();
       Prod = InsertNoopCastOfTo(Prod, Ty);
       // Canonicalize a constant to the RHS.
       if (isa<Constant>(Prod)) std::swap(Prod, W);
author	Max Kazantsev <max.kazantsev@azul.com>	2017-06-19 06:24:53 +0000
committer	Max Kazantsev <max.kazantsev@azul.com>	2017-06-19 06:24:53 +0000
commit	9026c3d3fa0f2087ab47736c931d8ae7470d9e36 (patch)
tree	ea60c801098f30c335fe1c2df85794710c5911a0 /lib/Analysis/ScalarEvolutionExpander.cpp
parent	4039181fbec25e2827d656aa2c8454b0f981dfe0 (diff)