ampere-computing/llvm.git - LLVM including Ampere Computing toolchain specific patches

Age	Commit message (Collapse)	Author
2017-12-19	[ARM] Register the Thumb2SizeReducePass. NFC	David Green
	Also adds a simple test case. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321072 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-07	[CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register.	Francis Visoiu Mistrih
	Work towards the unification of MIR and debug output by refactoring the interfaces. For MachineOperand::print, keep a simple version that can be easily called from `dump()`, and a more complex one which will be called from both the MIRPrinter and MachineInstr::print. Add extra checks inside MachineOperand for detached operands (operands with getParent() == nullptr). https://reviews.llvm.org/D40836 * find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/<def>//g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<def[ ],[ ]dead>/dead \1/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ],[ ]dead>/implicit-def dead \1/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name "*.s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g' git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320022 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-04	[CodeGen] Unify MBB reference format in both MIR and debug output	Francis Visoiu Mistrih
	As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" \) -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(\1)/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" \) -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g' * find . \( -name ".txt" -o -name ".s" -o -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" \) -type f -print0 \| xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319665 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-30	[CodeGen] Always use `printReg` to print registers in both MIR and debug	Francis Visoiu Mistrih
	output As part of the unification of the debug format and the MIR format, always use `printReg` to print all kinds of registers. Updated the tests using '_' instead of '%noreg' until we decide which one we want to be the default one. Differential Revision: https://reviews.llvm.org/D40421 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319445 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-13	[arm] Fix Unnecessary reloads from GOT.	Evgeniy Stepanov
	Summary: This fixes PR35221. Use pseudo-instructions to let MachineCSE hoist global address computation. Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D39871 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318081 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-26	[ARM] Honor -mfloat-abi for libcall calling convention	Eli Friedman
	As far as I can tell, this matches gcc: -mfloat-abi determines the calling convention for all functions except those explicitly defined as soft-float in the ARM RTABI. This change only affects cases where the user specifies -mfloat-abi to override the default calling convention derived from the target triple. Fixes https://bugs.llvm.org//show_bug.cgi?id=34530. Differential Revision: https://reviews.llvm.org/D38299 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316708 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-25	[ARM] Swap cmp operands for automatic shifts	Sam Parker
	Swap the compare operands if the lhs is a shift and the rhs isn't, as in arm and T2 the shift can be performed by the compare for its second operand. Differential Revision: https://reviews.llvm.org/D39004 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316562 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-22	[ARM] Call setBooleanContents(ZeroOrOneBooleanContent)	Renato Golin
	The ARM backend should call setBooleanContents so that it can use known bits to make some optimizations. Review: D35821 Patch by Joel Galenson <jgalenson@google.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311446 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-15	[Dominators] Include infinite loops in PostDominatorTree	Jakub Kuderski
	Summary: This patch teaches PostDominatorTree about infinite loops. It is built on top of D29705 by @dberlin which includes a very detailed motivation for this change. What's new is that the patch also teaches the incremental updater how to deal with reverse-unreachable regions and how to properly maintain and verify tree roots. Before that, the incremental algorithm sometimes ended up preserving reverse-unreachable regions after updates that wouldn't appear in the tree if it was constructed from scratch on the same CFG. This patch makes the following assumptions: - A sequence of updates should produce the same tree as a recalculating it. - Any sequence of the same updates should lead to the same tree. - Siblings and roots are unordered. The last two properties are essential to efficiently perform batch updates in the future. When it comes to the first one, we can decide later that the consistency between freshly built tree and an updated one doesn't matter match, as there are many correct ways to pick roots in infinite loops, and to relax this assumption. That should enable us to recalculate postdominators less frequently. This patch is pretty conservative when it comes to incremental updates on reverse-unreachable regions and ends up recalculating the whole tree in many cases. It should be possible to improve the performance in many cases, if we decide that it's important enough. That being said, my experiments showed that reverse-unreachable are very rare in the IR emitted by clang when bootstrapping clang. Here are the statistics I collected by analyzing IR between passes and after each removePredecessor call: ``` # functions: 52283 # samples: 337609 # reverse unreachable BBs: 216022 # BBs: 247840796 Percent reverse-unreachable: 0.08716159869015269 % Max(PercRevUnreachable) in a function: 87.58620689655172 % # > 25 % samples: 471 ( 0.1395104988314885 % samples ) ... in 145 ( 0.27733680163724345 % functions ) ``` Most of the reverse-unreachable regions come from invalid IR where it wouldn't be possible to construct a PostDomTree anyway. I would like to commit this patch in the next week in order to be able to complete the work that depends on it before the end of my internship, so please don't wait long to voice your concerns :). Reviewers: dberlin, sanjoy, grosser, brzycki, davide, chandlerc, hfinkel Reviewed By: dberlin Subscribers: nhaehnle, javed.absar, kparzysz, uabelho, jlebar, hiraditya, llvm-commits, dberlin, david2050 Differential Revision: https://reviews.llvm.org/D35851 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310940 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-01	ARM: Do not use llc -march in tests.	Matthias Braun
	`llc -march` is problematic because it only switches the target architecture, but leaves the operating system unchanged. This occasionally leads to indeterministic tests because the OS from LLVM_DEFAULT_TARGET_TRIPLE is used. However we can simply always use `llc -mtriple` instead. This changes all the tests to do this to avoid people using -march when they copy and paste parts of tests. See also the discussion in https://reviews.llvm.org/D35287 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309755 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-12	[ARM] Adjust ifcvt heuristic for the diamond ifcvt case	John Brawn
	When we have a diamond ifcvt the fallthough block will have a branch at the end of it that disappears when predicated, so discount it from the predication cost. Differential Revision: https://reviews.llvm.org/D34952 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307788 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-28	[ARM] Improve if-conversion for M-class CPUs without branch predictors	John Brawn
	The current heuristic in isProfitableToIfCvt assumes we have a branch predictor, and so gives the wrong answer in some cases when we don't. This patch adds a subtarget feature to indicate that a subtarget has no branch predictor, and changes the heuristic in isProfitableToiIfCvt when it's present. This gives a slight overall improvement in a set of embedded benchmarks on Cortex-M4 and Cortex-M33. Differential Revision: https://reviews.llvm.org/D34398 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306547 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-28	[ARM] Make -mcpu=generic schedule for an in-order core (Cortex-A8).	Kristof Beyls
	The benchmarking summarized in http://lists.llvm.org/pipermail/llvm-dev/2017-May/113525.html showed this is beneficial for a wide range of cores. As is to be expected, quite a few small adaptations are needed to the regressions tests, as the difference in scheduling results in: - Quite a few small instruction schedule differences. - A few changes in register allocation decisions caused by different instruction schedules. - A few changes in IfConversion decisions, due to a difference in instruction schedule and/or the estimated cost of a branch mispredict. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306514 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-22	Don't conditionalize Neon instructions, even in IT blocks.	Kristof Beyls
	This has been deprecated since ARMARM v7-AR, release C.b, published back in 2012. This also removes test/CodeGen/Thumb2/ifcvt-neon.ll that originally was introduced to check that conditionalization of Neon instructions did happen when generating Thumb2. However, the test had evolved and was no longer testing that. Rather than trying to adapt that test, this commit introduces test/CodeGen/Thumb2/ifcvt-neon-deprecated.mir, since we can now use the MIR framework to write nicer/more maintainable tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305998 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-30	MIR: remove explicit "noVRegs" property.	Tim Northover
	We can infer this from the incoming MIR, so there's no reason to represent it with a special flag. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304246 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-16	Elide stores which are overwritten without being observed.	Nirav Dave
	Summary: In SelectionDAG, when a store is immediately chained to another store to the same address, elide the first store as it has no observable effects. This is causes small improvements dealing with intrinsics lowered to stores. Test notes: * Many testcases overwrite store addresses multiple times and needed minor changes, mainly making stores volatile to prevent the optimization from optimizing the test away. * Many X86 test cases optimized out instructions associated with associated with va_start. * Note that test_splat in CodeGen/AArch64/misched-stp.ll no longer has dependencies to check and can probably be removed and potentially replaced with another test. Reviewers: rnk, john.brawn Subscribers: aemerson, rengolin, qcolombet, jyknight, nemanjai, nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33206 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303198 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15	[ShrinkWrapping] Handle restores on no-return paths	Francis Visoiu Mistrih
	Shrink-wrapping uses post-dominators to find a restore point that post-dominates all the uses of CSR / stack. The way dominator trees are modeled in LLVM today is that unreachable blocks are not present in a generic dominator tree, so, an unreachable node is dominated by anything: include/llvm/Support/GenericDomTree.h:467. Since for post-dominators, a no-return block is considered "unreachable", calling findNearestCommonDominator on an unreachable node A and a non-unreachable node B, will return B, which can be false. If we find such node, we bail out since there is no good restore point available. rdar://problem/30186931 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303130 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-10	Add address space mangling to lifetime intrinsics	Matt Arsenault
	In preparation for allowing allocas to have non-0 addrspace. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299876 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-06	[ARM] Remove a dead ADD during the creation of TBBs	David Green
	During the optimisation of jump tables in the constant island pass, an extra ADD could be left over, now dead but not removed. Differential Revision: https://reviews.llvm.org/D31389 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299634 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-09	[ARM] Remove t2xtpk feature from tests	Sam Parker
	I previously removed the T2XtPk feature from the ARM backend, but it looks like I missed some of the tests that were using the feature. Differential Revision: https://reviews.llvm.org/D30778 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297386 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-17	[ARM] Replace HasT2ExtractPack with HasDSP	Sam Parker
	Removed the HasT2ExtractPack feature and replaced its references with HasDSP. This then allows the Thumb2 extend instructions to be selected for ARMv8M +dsp. These instruction descriptions have also been refactored and more target tests have been added for their isel. Differential Revision: https://reviews.llvm.org/D29623 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295452 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13	[ARM] Use VCMP, not VCMPE, for floating point equality comparisons	James Molloy
	When generating a floating point comparison we currently unconditionally generate VCMPE. This has the sideeffect of setting the cumulative Invalid bit in FPSCR if any of the operands are QNaN. It is expected that use of a relational predicate on a QNaN value should raise Invalid. Quoting from the C standard: The relational and equality operators support the usual mathematical relationships between numeric values. For any ordered pair of numeric values exactly one of relationships the less, greater, equal and is true. Relational operators may raise the floating-point exception when argument values are NaNs. The standard doesn't explicitly state the expectation for equality operators, but the implication and obvious expectation is that equality operators should not raise Invalid on a QNaN input, as those predicates are wholly defined on unordered inputs (to return not equal). Therefore, add a new operand to ARMISD::FPCMP and FPCMPZ indicating if QNaN should raise Invalid, and pipe that through to TableGen. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294945 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-31	CodeGen: Allow small copyable blocks to "break" the CFG.	Kyle Butt
	When choosing the best successor for a block, ordinarily we would have preferred a block that preserves the CFG unless there is a strong probability the other direction. For small blocks that can be duplicated we now skip that requirement as well, subject to some simple frequency calculations. Differential Revision: https://reviews.llvm.org/D28583 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293716 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-31	[ARM] Avoid using ARM instructions in Thumb mode	Sam Parker
	The Requires class overrides the target requirements of an instruction, rather than adding to them, so all ARM instructions need to include the IsARM predicate when they have overwitten requirements. This caused the swp and swpb instructions to be allowed in thumb mode assembly, and the ARM encoding of CDP to be selected in codegen (which is different for conditional instructions). Differential Revision: https://reviews.llvm.org/D29283 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293634 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-13	ARM: match GCC's behaviour for builtins	Saleem Abdulrasool
	GCC changes the CC between the user-code and the builtins based on the value of `-target` rather than `-mfloat-abi`. When a HF target is used, the VFP variant of the AAPCS CC is used. Otherwise, the AAPCS variant is used. In all cases, the AEABI functions use the AAPCS CC. Adjust the calling convention based on the target. Resolves PR30543! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291909 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-11	Revert "CodeGen: Allow small copyable blocks to "break" the CFG."	Kyle Butt
	This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded. This needs a simple probability check because there are some cases where it is not profitable. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291695 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-10	CodeGen: Allow small copyable blocks to "break" the CFG.	Kyle Butt
	When choosing the best successor for a block, ordinarily we would have preferred a block that preserves the CFG unless there is a strong probability the other direction. For small blocks that can be duplicated we now skip that requirement as well. Differential revision: https://reviews.llvm.org/D27742 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291609 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-23	Make the canonicalisation on shifts benifit to more case.	Zijiao Ma
	1.Fix pessimized case in FIXME. 2.Add tests for it. 3.The canonicalisation on shifts results in different sequence for tests of machine-licm.Correct some check lines. Differential Revision: https://reviews.llvm.org/D27916 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290410 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-15	[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently	Sjoerd Meijer
	This is essentially a recommit of r285893, but with a correctness fix. The problem of the original commit was that this: bic r5, r7, #31 cbz r5, .LBB2_10 got rewritten into: lsrs r5, r7, #5 beq .LBB2_10 The result in destination register r5 is not the same and this is incorrect when r5 is not dead. So this fix includes checking the uses of the AND destination register. And also, compared to the original commit, some regression tests didn't need changing anymore because of this extra check. For completeness, this was the original commit message: For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)). 1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS. 2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS. 3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS). 4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask. 1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win. Differential Revision: https://reviews.llvm.org/D27761 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289794 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-03	Revert "[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently"	James Molloy
	This reverts commit r285893. It caused (probably) http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/83 . git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285912 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-03	[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently	James Molloy
	This recommits r281323, which was backed out for two reasons. One, a selfhost failure, and two, it apparently caused Chromium failures. Actually, the latter was a red herring. The log has expired from the former, but I suspect that was a red herring too (actually caused by another problematic patch of mine). Therefore reapplying, and will watch the bots like a hawk. For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)). 1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS. 2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS. 3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS). 4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask. 1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285893 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01	[Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables	James Molloy
	[Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment] The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions. It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size. TBB example: Before: lsls r0, r0, #2 After: add r0, pc adr r1, .LJTI0_0 ldrb r0, [r0, #6] ldr r0, [r0, r1] lsls r0, r0, #1 mov pc, r0 add pc, r0 => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4. The only case that can increase dynamic instruction count is the TBH case: Before: lsls r0, r4, #2 After: lsls r4, r4, #1 adr r1, .LJTI0_0 add r4, pc ldr r0, [r0, r1] ldrh r4, [r4, #6] mov pc, r0 lsls r4, r4, #1 add pc, r4 => 1 more instruction in prologue. Jump table shrunk by a factor of 2. So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285690 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-24	Revert r284580+r284917. ("Synthesize TBB/TBH instructions")	Eli Friedman
	The optimization has correctness issues, so reverting for now to fix tests on thumb1 targets. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284993 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-19	[Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables	James Molloy
	The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions. It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size. TBB example: Before: lsls r0, r0, #2 After: add r0, pc adr r1, .LJTI0_0 ldrb r0, [r0, #6] ldr r0, [r0, r1] lsls r0, r0, #1 mov pc, r0 add pc, r0 => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4. The only case that can increase dynamic instruction count is the TBH case: Before: lsls r0, r4, #2 After: lsls r4, r4, #1 adr r1, .LJTI0_0 add r4, pc ldr r0, [r0, r1] ldrh r4, [r4, #6] mov pc, r0 lsls r4, r4, #1 add pc, r4 => 1 more instruction in prologue. Jump table shrunk by a factor of 2. So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284580 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-11	Re-land "[Thumb] Save/restore high registers in Thumb1 pro/epilogues"	Reid Kleckner
	Reverts r283938 to reinstate r283867 with a fix. The original change had an ArrayRef referring to a destroyed temporary initializer list. Use plain C arrays instead. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283942 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-11	Revert "[Thumb] Save/restore high registers in Thumb1 pro/epilogues"	Reid Kleckner
	This reverts r283867. This appears to be an infinite loop: while (HiRegToSave != AllHighRegs.end() && CopyReg != AllCopyRegs.end()) { if (HiRegsToSave.count(*HiRegToSave)) { ... CopyReg = findNextOrderedReg(++CopyReg, CopyRegs, AllCopyRegs.end()); HiRegToSave = findNextOrderedReg(++HiRegToSave, HiRegsToSave, AllHighRegs.end()); } } git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283938 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-11	[Thumb] Save/restore high registers in Thumb1 pro/epilogues	Oliver Stannard
	The high registers are not allocatable in Thumb1 functions, but they could still be used by inline assembly, so we need to save and restore the callee-saved high registers (r8-r11) in the prologue and epilogue. This is complicated by the fact that the Thumb1 push and pop instructions cannot access these registers. Therefore, we have to move them down into low registers before pushing, and move them back after popping into low registers. In most functions, we will have low registers that are also being pushed/popped, which we can use as the temporary registers for saving/restoring the high registers. However, this is not guaranteed, so we may need to push some extra low registers to ensure that the high registers can be saved/restored. For correctness, it would be sufficient to use just one low register, but if we have enough low registers available then we only need one push/pop instruction, rather than one per high register. We can also use the argument/return registers when they are not live, and the link register when saving (but not restoring), reducing the number of extra registers we need to push. There are still a few extreme edge cases where we need two push/pop instructions, because not enough low registers can be made live in the prologue or epilogue. In addition to the regression tests included here, I've also tested this using a script to generate functions which clobber different combinations of registers, have different numbers of argument and return registers (including variadic arguments), allocate different fixed sized objects on the stack, and do or don't use variable sized allocas and the __builtin_return_address intrinsic (all of which affect the available registers in the prologue and epilogue). I ran these functions in a test harness which verifies that all of the callee-saved registers are correctly preserved. Differential Revision: https://reviews.llvm.org/D24228 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283867 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-14	Revert "[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently"	James Molloy
	This reverts commit r281323. It caused chromium test failures and a selfhost failure. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281451 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-13	[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently	James Molloy
	For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)). 1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS. 2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS. 3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS). 4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask. 1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281323 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-12	Revert r281215, it caused PR30358.	Nico Weber
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281263 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-12	[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently	James Molloy
	For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)). 1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS. 2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS. 3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS). 4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask. 1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281215 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-09	[Thumb] Select (CMPZ X, -C) -> (CMPZ (ADDS X, C), 0)	James Molloy
	The CMPZ #0 disappears during peepholing, leaving just a tADDi3, tADDi8 or t2ADDri. This avoids having to materialize the expensive negative constant in Thumb-1, and allows a shrinking from a 32-bit CMN to a 16-bit ADDS in Thumb-2. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281040 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-07	CodeGen: ensure that libcalls are always AAPCS CC	Saleem Abdulrasool
	The original commit was too aggressive about marking LibCalls as AAPCS. The libcalls contain libc/libm/libunwind calls which are not AAPCS, but C. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280833 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-07	Revert "CodeGen: ensure that libcalls are always AAPCS CC"	Saleem Abdulrasool
	This reverts SVN r280683. Revert until I figure out why this is breaking lli tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280778 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-06	CodeGen: ensure that libcalls are always AAPCS CC	Saleem Abdulrasool
	All of the builtins are designed to be invoked with ARM AAPCS CC even on ARM AAPCS VFP CC hosts. Tweak the default initialisation to ARM AAPCS CC rather than C CC for ARM/thumb targets. The changes to the tests are necessary to ensure that the calling convention for the lowered library calls are honoured. Furthermore, these adjustments cause certain branch invocations to change to branch-and-link since the returned value needs to be moved across registers (d0 -> r0, r1). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280683 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-02	IfConversion: Fix bug introduced by rescanning diamonds.	Kyle Butt
	Passing the wrong values for predicate-clobbering. Simple to miss. Added an assert to make this easier to catch in the future. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280517 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-24	CodeGen: If Convert blocks that would form a diamond when tail-merged.	Kyle Butt
	The following function currently relies on tail-merging for if conversion to succeed. The common tail of cond_true and cond_false is extracted, and this then forms a diamond pattern that can be successfully if converted. If this block does not get extracted, either because tail-merging is disabled or the threshold is higher, we should still recognize this pattern and if-convert it. Fixed a regression in the original commit. Need to un-reverse branches after reversing them, or other conversions go awry. define i32 @t2(i32 %a, i32 %b) nounwind { entry: %tmp1434 = icmp eq i32 %a, %b ; <i1> [#uses=1] br i1 %tmp1434, label %bb17, label %bb.outer bb.outer: ; preds = %cond_false, %entry %b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ] %a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ] br label %bb bb: ; preds = %cond_true, %bb.outer %indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ] %tmp. = sub i32 0, %b_addr.021.0.ph %tmp.40 = mul i32 %indvar, %tmp. %a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph %tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph br i1 %tmp3, label %cond_true, label %cond_false cond_true: ; preds = %bb %tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph %tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph %indvar.next = add i32 %indvar, 1 br i1 %tmp1437, label %bb17, label %bb cond_false: ; preds = %bb %tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0 %tmp14 = icmp eq i32 %a_addr.026.0, %tmp10 br i1 %tmp14, label %bb17, label %bb.outer bb17: ; preds = %cond_false, %cond_true, %entry %a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ] ret i32 %a_addr.026.1 } Without tail-merging or diamond-tail if conversion: LBB1_1: @ %bb @ =>This Inner Loop Header: Depth=1 cmp r0, r1 ble LBB1_3 @ BB#2: @ %cond_true @ in Loop: Header=BB1_1 Depth=1 subs r0, r0, r1 cmp r1, r0 it ne cmpne r0, r1 bgt LBB1_4 LBB1_3: @ %cond_false @ in Loop: Header=BB1_1 Depth=1 subs r1, r1, r0 cmp r1, r0 bne LBB1_1 LBB1_4: @ %bb17 bx lr With diamond-tail if conversion, but without tail-merging: @ BB#0: @ %entry cmp r0, r1 it eq bxeq lr LBB1_1: @ %bb @ =>This Inner Loop Header: Depth=1 cmp r0, r1 ite le suble r1, r1, r0 subgt r0, r0, r1 cmp r1, r0 bne LBB1_1 @ BB#2: @ %bb17 bx lr git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279671 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-24	IfConversion: Rescan diamonds.	Kyle Butt
	The cost of predicating a diamond is only the instructions that are not shared between the two branches. Additionally If a predicate clobbering instruction occurs in the shared portion of the branches (e.g. a cond move), it may still be possible to if convert the sub-cfg. This change handles these two facts by rescanning the non-shared portion of a diamond sub-cfg to recalculate both the predication cost and whether both blocks are pred-clobbering. Fixed 2 bugs before recommitting. Branch instructions must be compared and found identical before diamond conversion. Also, predicate-clobbering instructions in the shared prefix disqualifies a potential diamond conversion. Includes tests for both. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279670 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-23	[ARM] Generate consistent frame records for Thumb2	Oliver Stannard
	There is not an official documented ABI for frame pointers in Thumb2, but we should try to emit something which is useful. We use r7 as the frame pointer for Thumb code, which currently means that if a function needs to save a high register (r8-r11), it will get pushed to the stack between the frame pointer (r7) and link register (r14). This means that while a stack unwinder can follow the chain of frame pointers up the stack, it cannot know the offset to lr, so does not know which functions correspond to the stack frames. To fix this, we need to push the callee-saved registers in two batches, with the first push saving the low registers, fp and lr, and the second push saving the high registers. This is already implemented, but previously only used for iOS. This patch turns it on for all Thumb2 targets when frame pointers are required by the ABI, and the frame pointer is r7 (Windows uses r11, so this isn't a problem there). If frame pointer elimination is enabled we still emit a single push/pop even if we need a frame pointer for other reasons, to avoid increasing code size. We must also ensure that lr is pushed to the stack when using a frame pointer, so that we end up with a complete frame record. Situations that could cause this were rare, because we already push lr in most situations so that we can return using the pop instruction. Differential Revision: https://reviews.llvm.org/D23516 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279506 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-19	Revert "CodeGen: If Convert blocks that would form a diamond when tail-merged."	Kyle Butt
	This reverts commit 0fda93481c4231c06b838ef476c0c404c51ff875. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279288 91177308-0d34-0410-b5e6-96231b3b80d8