ampere-computing/llvm.git - LLVM including Ampere Computing toolchain specific patches

Age	Commit message (Collapse)	Author
2018-06-15	Added XGene model	Martin Elshuber

2018-04-09	Merging r322319:	Tom Stellard
	------------------------------------------------------------------------ r322319 \| matze \| 2018-01-11 14:30:43 -0800 (Thu, 11 Jan 2018) \| 7 lines PeepholeOptimizer: Fix for vregs without defs The PeepholeOptimizer would fail for vregs without a definition. If this was caused by an undef operand abort to keep the code simple (so we don't need to add logic everywhere to replicate the undef flag). Differential Revision: https://reviews.llvm.org/D40763 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@329619 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-20	Merging r325525:	Hans Wennborg
	------------------------------------------------------------------------ r325525 \| steven_wu \| 2018-02-19 20:22:28 +0100 (Mon, 19 Feb 2018) \| 13 lines bitcode support change for fast flags compatibility Summary: The discussion and as per need, each vendor needs a way to keep the old fast flags and the new fast flags in the auto upgrade path of the IR upgrader. This revision addresses that issue. Patched by Michael Berg Reviewers: qcolombet, hans, steven_wu Reviewed By: qcolombet, steven_wu Subscribers: dexonsmith, vsk, mehdi_amini, andrewrk, MatzeB, wristow, spatel Differential Revision: https://reviews.llvm.org/D43253 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325592 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-19	Merging r324195:	Hans Wennborg
	------------------------------------------------------------------------ r324195 \| mcrosier \| 2018-02-04 16:42:24 +0100 (Sun, 04 Feb 2018) \| 12 lines [LV] Use Demanded Bits and ValueTracking for reduction type-shrinking The type-shrinking logic in reduction detection, although narrow in scope, is also rather ad-hoc, which has led to bugs (e.g., PR35734). This patch modifies the approach to rely on the demanded bits and value tracking analyses, if available. We currently perform type-shrinking separately for reductions and other instructions in the loop. Long-term, we should probably think about computing minimal bit widths in a more complete way for the loops we want to vectorize. PR35734 Differential Revision: https://reviews.llvm.org/D42309 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325508 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-19	Merging r325168:	Hans Wennborg
	------------------------------------------------------------------------ r325168 \| rksimon \| 2018-02-14 21:43:47 +0100 (Wed, 14 Feb 2018) \| 1 line Removed superfluous semicolon to fix -Wpedantic gcc warning. NFCI. ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325498 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-16	Merging r325139:	Hans Wennborg
	------------------------------------------------------------------------ r325139 \| rafael \| 2018-02-14 17:34:27 +0100 (Wed, 14 Feb 2018) \| 12 lines Store defined macros in MCContext. So that macros defined in inline assembly blocks are available to the whole file. This provides a consistent behavior with other assembly directives, since equations for example are already preserved between inline assembly blocks. PR: 36110 Patch by Roger! ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325330 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-14	Revert r319778 (and r319911) due to PR36357	Hans Wennborg
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325112 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-14	Merging r324962: (only the first hunk; see PR36375)	Hans Wennborg
	------------------------------------------------------------------------ r324962 \| kuhar \| 2018-02-13 00:37:27 +0100 (Tue, 13 Feb 2018) \| 16 lines [Dominators] Always recalculate postdominators when update yields different roots Summary: This patch makes postdominators always recalculate the tree when an update causes to change the tree roots. As @dmgreen noticed in [[ https://reviews.llvm.org/D41298 \| D41298 ]], the previous implementation was not conservative enough and it was possible to end up with a PostDomTree that was different than a freshly computed one. The patch also compares postdominators with a freshly computed tree at the end of full verification to make sure we don't hit similar issues in the future. This should (ideally) be also backported to 6.0 before the release, although I don't have any reports of this causing an observable error. It should be safe to do it even if it's late in the release, as the change only makes the current behavior more conservative. Reviewers: dmgreen, dberlin, davide, brzycki, grosser Reviewed By: brzycki, grosser Subscribers: llvm-commits, dmgreen Differential Revision: https://reviews.llvm.org/D43140 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325108 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-02	Merging r323908:	Hans Wennborg
	------------------------------------------------------------------------ r323908 \| mareko \| 2018-01-31 21:18:04 +0100 (Wed, 31 Jan 2018) \| 7 lines AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16} Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D41663 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324103 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-02	Merging r323857:	Hans Wennborg
	------------------------------------------------------------------------ r323857 \| rogfer01 \| 2018-01-31 10:23:43 +0100 (Wed, 31 Jan 2018) \| 19 lines [ARM] Allow the scheduler to clone a node with glue to avoid a copy CPSR ↔ GPR. In Thumb 1, with the new ADDCARRY / SUBCARRY the scheduler may need to do copies CPSR ↔ GPR but not all Thumb1 targets implement them. The schedule can attempt, before attempting a copy, to clone the instructions but it does not currently do that for nodes with input glue. In this patch we introduce a target-hook to let the hook decide if a glued machinenode is still eligible for copying. In this case these are ARM::tADCS and ARM::tSBCS . As a follow-up of this change we should actually implement the copies for the Thumb1 targets that do implement them and restrict the hook to the targets that can't really do such copy as these clones are not ideal. This change fixes PR35836. Differential Revision: https://reviews.llvm.org/D42051 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324082 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-02	Merging r323155:	Hans Wennborg
	------------------------------------------------------------------------ r323155 \| chandlerc \| 2018-01-22 23:05:25 +0100 (Mon, 22 Jan 2018) \| 133 lines Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took in the speculative execution, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` in addition to `-mretpoline`. In this case, on x86-64 thu thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile all libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually apply similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we strongly recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324067 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-30	Merging r323331:	Hans Wennborg
	------------------------------------------------------------------------ r323331 \| spatel \| 2018-01-24 16:20:37 +0100 (Wed, 24 Jan 2018) \| 21 lines [ValueTracking] add recursion depth param to matchSelectPattern We're getting bug reports: https://bugs.llvm.org/show_bug.cgi?id=35807 https://bugs.llvm.org/show_bug.cgi?id=35840 https://bugs.llvm.org/show_bug.cgi?id=36045 ...where we blow up the stack in value tracking because other passes are sending in selects that have an operand that is itself the select. We don't currently have a reliable way to avoid analyzing dead code that may take non-standard forms, so bail out when things go too far. This mimics the recursion depth limitations in other parts of value tracking. Unfortunately, this pushes the underlying problems for other passes (jump-threading, simplifycfg, correlated-propagation) into hiding. If someone wants to uncover those again, the first draft of this patch on Phab would do that (it would assert rather than bail out). Differential Revision: https://reviews.llvm.org/D42442 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323737 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-30	Merging r322108, r322123 and r322131:	Hans Wennborg
	------------------------------------------------------------------------ r322108 \| rafael \| 2018-01-09 20:29:33 +0100 (Tue, 09 Jan 2018) \| 3 lines Make one of the emitFill methods non virtual. NFC. This is just preparatory work to fix PR35858. ------------------------------------------------------------------------ ------------------------------------------------------------------------ r322123 \| rafael \| 2018-01-09 22:55:10 +0100 (Tue, 09 Jan 2018) \| 3 lines Don't create MCFillFragment directly. Instead use higher level APIs that take care of most bookkeeping. ------------------------------------------------------------------------ ------------------------------------------------------------------------ r322131 \| rafael \| 2018-01-09 23:48:37 +0100 (Tue, 09 Jan 2018) \| 4 lines Use a MCExpr for the size of MCFillFragment. This allows the size to be found during ralaxation. This fixes pr35858. ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323735 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-22	Merging r323034:	Hans Wennborg
	------------------------------------------------------------------------ r323034 \| dmgreen \| 2018-01-20 11:29:37 +0100 (Sat, 20 Jan 2018) \| 9 lines [Dominators] Fix some edge cases for PostDomTree updating These fix some odd cfg cases where batch-updating the post dom tree fails. Usually around infinite loops and roots ending up being different. Differential Revision: https://reviews.llvm.org/D42247 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323121 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-22	Merging r322993:	Hans Wennborg
	------------------------------------------------------------------------ r322993 \| kuhar \| 2018-01-19 22:27:24 +0100 (Fri, 19 Jan 2018) \| 16 lines [Dominators] Visit affected node candidates found at different root levels Summary: This patch attempts to fix the DomTree incremental insertion bug found here [[ https://bugs.llvm.org/show_bug.cgi?id=35969 \| PR35969 ]] . When performing an insertion into a piece of unreachable CFG, we may find the same not at different levels. When this happens, the node can turn out to be affected when we find it starting from a node with a lower level in the tree. The level at which we start visitation affects if we consider a node affected or not. This patch tracks the lowest level at which each node was visited during insertion and allows it to be visited multiple times, if it can cause it to be considered affected. Reviewers: brzycki, davide, dberlin, grosser Reviewed By: brzycki Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42231 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323110 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-17	Merging r322003:	Hans Wennborg
	------------------------------------------------------------------------ r322003 \| niravd \| 2018-01-08 08:21:35 -0800 (Mon, 08 Jan 2018) \| 11 lines [DAG] Teach BaseIndexOffset to correctly handle with indexed operations BaseIndexOffset address analysis incorrectly ignores offsets folded into indexed memory operations causing potential errors in alias analysis of pre-indexed operations. Reviewers: efriedma, RKSimon, hfinkel, jyknight Subscribers: hiraditya, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D41701 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322693 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-17	Merging r321751, r321806, and r321878:	Hans Wennborg
	------------------------------------------------------------------------ r321751 \| arsenm \| 2018-01-03 10:45:37 -0800 (Wed, 03 Jan 2018) \| 25 lines StructurizeCFG: Fix broken backedge detection The work order was changed in r228186 from SCC order to RPO with an arbitrary sorting function. The sorting function attempted to move inner loop nodes earlier. This was was apparently relying on an assumption that every block in a given loop / the same loop depth would be seen before visiting another loop. In the broken testcase, a block outside of the loop was encountered before moving onto another block in the same loop. The testcase would then structurize such that one blocks unconditional successor could never be reached. Revert to plain RPO for the analysis phase. This fixes detecting edges as backedges that aren't really. The processing phase does use another visited set, and I'm unclear on whether the order there is as important. An arbitrary order doesn't work, and triggers some infinite loops. The reversed RPO list seems to work and is closer to the order that was used before, minus the arbitary custom sorting. A few of the changed tests now produce smaller code, and a few are slightly worse looking. ------------------------------------------------------------------------ ------------------------------------------------------------------------ r321806 \| arsenm \| 2018-01-04 09:23:24 -0800 (Thu, 04 Jan 2018) \| 4 lines StructurizeCFG: xfail one of the testcases from r321751 It fails with -verify-region-info. This seems to be a issue with RegionInfo itself which existed before. ------------------------------------------------------------------------ ------------------------------------------------------------------------ r321878 \| arsenm \| 2018-01-05 09:51:36 -0800 (Fri, 05 Jan 2018) \| 4 lines RegionInfo: Use report_fatal_error instead of llvm_unreachable Otherwise when using -verify-region-info in a release build the error won't be emitted. ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322686 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-17	Merging r321870, r321872, and r321994:	Hans Wennborg
	------------------------------------------------------------------------ r321870 \| abataev \| 2018-01-05 07:20:40 -0800 (Fri, 05 Jan 2018) \| 1 line [SLP] Update test checks, NFC. ------------------------------------------------------------------------ ------------------------------------------------------------------------ r321872 \| abataev \| 2018-01-05 08:15:17 -0800 (Fri, 05 Jan 2018) \| 1 line [SLP] Update more test checks, NFC. ------------------------------------------------------------------------ ------------------------------------------------------------------------ r321994 \| abataev \| 2018-01-08 06:43:06 -0800 (Mon, 08 Jan 2018) \| 13 lines [SLP] Fix PR35777: Incorrect handling of aggregate values. Summary: Fixes the bug with incorrect handling of InsertValue\|InsertElement instrucions in SLP vectorizer. Currently, we may use incorrect ExtractElement instructions as the operands of the original InsertValue\|InsertElement instructions. Reviewers: mkuper, hfinkel, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41767 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322675 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-03	Fix incorrect documentation comment left after r321692	Alex Bradbury
	TargetRegistryInfo::createMCAsmBackend no longer takes a TheTriple parameter. The majory of the TargetRegistryInfo::create* functions have no or very limitied per-parameter doc comments, and adding a comment for the MCSubtargetInfo, MCRegisterInfo and MCTargetOptions parameters seems like it would add no real value beyond reading the function signature. As such, I've just deleted the doc comment for TheTriple. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321694 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-03	Thread MCSubtargetInfo through Target::createMCAsmBackend	Alex Bradbury
	Currently it's not possible to access MCSubtargetInfo from a TgtMCAsmBackend. D20830 threaded an MCSubtargetInfo reference through MCAsmBackend::relaxInstruction, but this isn't the only function that would benefit from access. This patch removes the Triple and CPUString arguments from createMCAsmBackend and replaces them with MCSubtargetInfo. This patch just changes the interface without making any intentional functional changes. Once in, several cleanups are possible: * Get rid of the awkward MCSubtargetInfo handling in ARMAsmBackend * Support 16-bit instructions when valid in MipsAsmBackend::writeNopData * Get rid of the CPU string parsing in X86AsmBackend and just use a SubtargetFeature for HasNopl * Emit 16-bit nops in RISCVAsmBackend::writeNopData if the compressed instruction set extension is enabled (see D41221) This change initially exposed PR35686, which has since been resolved in r321026. Differential Revision: https://reviews.llvm.org/D41349 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321692 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-02	[AArch64][GlobalISel] Enable GlobalISel at -O0 by default	Amara Emerson
	Tests updated to explicitly use fast-isel at -O0 instead of implicitly. This change also allows an explicit -fast-isel option to override an implicitly enabled global-isel. Otherwise -fast-isel would have no effect at -O0. Differential Revision: https://reviews.llvm.org/D41362 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321655 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-02	NFC. Add description comments to Function header	Dmitry Venikov
	Reviewers: ruiu, davidxl, silvas, brzycki Reviewed By: brzycki Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41609 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321648 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-30	Added support for reading configuration files	Serge Pavlov
	Configuration file is read as a response file in which file names in the nested constructs `@file` are resolved relative to the directory where the including file resides. Lines in which the first non-whitespace character is '#' are considered as comments and are skipped. Trailing backslashes are used to concatenate lines in the same way as they are used in shell scripts. Differential Revision: https://reviews.llvm.org/D24926 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321586 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-30	Reverted 321580: Added support for reading configuration files	Serge Pavlov
	It caused buildbot fails. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321582 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-30	Added support for reading configuration files	Serge Pavlov
	Configuration file is read as a response file in which file names in the nested constructs `@file` are resolved relative to the directory where the including file resides. Lines in which the first non-whitespace character is '#' are considered as comments and are skipped. Trailing backslashes are used to concatenate lines in the same way as they are used in shell scripts. Differential Revision: https://reviews.llvm.org/D24926 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321580 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-29	AMDGPU: Use unique PSVs for buffer resources	Matt Arsenault
	Also fixes using the wrong memory type for some intrinsics when custom lowering them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321557 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-29	AMDGPU: Implement getTgtMemIntrinsic for images	Matt Arsenault
	Currently all images are lowered to have a single image PseudoSourceValue. Image stores happen to have overly strict mayLoad/mayStore/hasSideEffects flags set on them, so this happens to work. When these are fixed to be correct, the scheduler breaks this because the identical PSVs are assumed to be the same address. These need to be unique to the image resource value. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321555 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-29	[PM] pass -debug-pass-manager flag into FunctionToLoopPassAdaptor's ↵	Fedor Sergeev
	canonicalization PM Summary: New pass manager driver passes DebugPM (-debug-pass-manager) flag into individual PassManager constructors in order to enable debug logging. FunctionToLoopPassAdaptor has its own internal LoopCanonicalizationPM which never gets its debug logging enabled and that means canonicalization passes like LoopSimplify are never present in -debug-pass-manager output. Extending FunctionToLoopPassAdaptor's constructor and createFunctionToLoopPassAdaptor wrapper with an optional boolean DebugLogging argument. Passing debug-logging flags there as appropriate. Reviewers: chandlerc, davide Reviewed By: davide Subscribers: mehdi_amini, eraman, llvm-commits, JDevlieghere Differential Revision: https://reviews.llvm.org/D41586 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321548 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28	[KnownBits] Remove asserts from KnownBits::makeNegative/makeNonNegative	Craig Topper
	Many of the callers don't guarantee there is no conflict before calling these and instead check for conflicts later. The makeNegative/makeNonNegative methods replaced Known.One.setSignBit() and Known.Zero.setSignBit() calls that didn't have asserts originally. So removing the asserts is no worse than the original code. Fixes PR35769 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321539 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28	Remove superfluous copies in sample profiling.	Benjamin Kramer
	No functionliaty change intended. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321530 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28	Revert r321377, it causes regression to https://reviews.llvm.org/P8055.	Guozhi Wei
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321528 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28	Revert "[memcpyopt] Teach memcpyopt to optimize across basic blocks"	Reid Kleckner
	This reverts r321138. It seems there are still underlying issues with memdep. PR35519 seems to still be present if debug info is enabled. We end up losing a memcpy. Somehow during store to memset merging, we insert the memset after the memcpy or fail to update the memdep analysis to account for the newly inserted memset of a pair. Reduced test case: #include <assert.h> #include <stdio.h> #include <string> #include <utility> #include <vector> void do_push_back( std::vector<std::pair<std::string, std::vector<std::string>>>* crls) { crls->push_back(std::make_pair(std::string(), std::vector<std::string>())); } int __attribute__((optnone)) main() { // Put some data in the vector and then remove it so we take the push_back // fast path. std::vector<std::pair<std::string, std::vector<std::string>>> crl_set; crl_set.push_back({"asdf", {}}); crl_set.pop_back(); printf("first word in vector storage: %p\n", (void)crl_set.data()); // Do the push_back which may fail to initialize the data. do_push_back(&crl_set); auto first = &crl_set.back().first; printf("first word in vector storage (should be zero): %p\n", (void*)crl_set.data()); assert(first->empty()); puts("ok"); } Compile with libc++, enable optimizations, and enable debug info: $ clang++ -stdlib=libc++ -g -O2 t.cpp -o t.exe -Wl,-rpath=llvm/build/lib This program will assert with this change. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321510 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28	AMDGPU: Add MMO to atomic_inc/dec	Matt Arsenault
	This doesn't really change anything because these already had custom node wrappers. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321508 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-27	[NFC] Extract out a helper function for SimplifyCall(CS, Q)	Philip Reames
	This simplifies code, but the real motivation is that it lets me clean up some downstream code. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321466 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-25	COFF: fix IMAGE_FILE_MACHINE_AM33	Martell Malone
	PE COFF spec value is 0x1D3 not 0x13 https://msdn.microsoft.com/en-us/library/windows/desktop/ms680547(v=vs.85).aspx git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321447 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-22	[SimplifyCFG] Don't do if-conversion if there is a long dependence chain	Guozhi Wei
	If after if-conversion, most of the instructions in this new BB construct a long and slow dependence chain, it may be slower than cmp/branch, even if the branch has a high miss rate, because the control dependence is transformed into data dependence, and control dependence can be speculated, and thus, the second part can execute in parallel with the first part on modern OOO processor. This patch checks for the long dependence chain, and give up if-conversion if find one. Differential Revision: https://reviews.llvm.org/D39352 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321377 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-22	[ThinLTO][CachePruning] explicitly disable pruning	Ben Dunbobbin
	In https://reviews.llvm.org/rL321077 and https://reviews.llvm.org/D41231 I fixed a regression in the c-api which prevented the pruning from being effectively disabled. However this approach, helpfully recommended by @labath, is cleaner. It is also nice to remove the weasel words about effectively disabling from the api comments. Differential Revision: https://reviews.llvm.org/D41497 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321376 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-22	(Re-landing) Expose a TargetMachine::getTargetTransformInfo function	Sanjoy Das
	Re-land r321234. It had to be reverted because it broke the shared library build. The shared library build broke because there was a missing LLVMBuild dependency from lib/Passes (which calls TargetMachine::getTargetIRAnalysis) to lib/Target. As far as I can tell, this problem was always there but was somehow masked before (perhaps because TargetMachine::getTargetIRAnalysis was a virtual function). Original commit message: This makes the TargetMachine interface a bit simpler. We still need the std::function in TargetIRAnalysis to avoid having to add a dependency from Analysis to Target. See discussion: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html I avoided adding all of the backend owners to this review since the change is simple, but let me know if you feel differently about this. Reviewers: echristo, MatzeB, hfinkel Reviewed By: hfinkel Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D41464 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321375 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-22	Rewrite the cached map used for locating the most precise DIE among	Chandler Carruth
	inlined subroutines for a given address. This is essentially the hot path of llvm-symbolizer when extracting inlined frames during symbolization. Previously, we would read every subprogram and every inlined subroutine, building a std::map across the entire PC space to the best DIE, and then do only a handful of queries as we symbolized a backtrace. A huge fraction of the time was spent building the map itself. This patch changes it two a two-level system. First, we just build a map from PC-interval to DWARF subprograms. These are required to be disjoint and so constructing this is pretty easy. Second, we build a map just for the inlined subroutines within the subprogram containing the query address. This allows us to look at far fewer DIEs and build a much smaller set of cached maps in the llvm-symbolizer case where only a few address get symbolized during the entire run. It also builds both interval maps in a very different way. It constructs a single flat vector of pairs that maps from offset -> index. The indices point into collections of DIE objects, but can also be "tombstones" (-1) to mark gaps. In the case of subprograms, this mostly just simplifies the data structure a bit. For inlined subroutines, because we carefully split them as we build the map, we end up in many cases having no holes and not having to store both start and stop offsets. Finally, the PC ranges for the inlined subroutines are compressed into 32-bits by making them relative to the base PC of the outer subprogram. This means that if you have a single function body with over 2gb of executable code in it, we will stop mapping address past the first 2gb of that function into inlined subroutines and just give you the subprogram. This doesn't seem like a problem. ;] All of this combines to make llvm-symbolizer well over 2x faster for symbolizing backtraces out of LLVM's unittests. Death-test heavy unit tests are running >2x faster. I'm still going to look at completely disabling symbolization there, but figured while I had a good benchmark we should make symbolization a bit better. Sadly, the logic to build the flat interval map for the inlined subroutines is fairly complex. I'm not super happy about this and welcome any simplifying suggestions. Huge thanks to Dave Blaikie who helped walk me through what the various things I needed to do in DWARF to make this work. Differential Revision: https://reviews.llvm.org/D40987 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321345 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-22	[Inliner] Restrict soft-float inlining penalty.	Eli Friedman
	The penalty is currently getting applied in a bunch of places where it doesn't make sense, like bitcasts (which are free) and calls (which were getting the call penalty applied twice). Instead, just apply the penalty to binary operators and floating-point casts. While I'm here, also fix getFPOpCost() to do the right thing in more cases, so we don't have to dig into function attributes. Differential Revision: https://reviews.llvm.org/D41522 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321332 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-22	Add hasProfileData() to check if a function has profile data. NFC.	Easwaran Raman
	Summary: This replaces calls to getEntryCount().hasValue() with hasProfileData that does the same thing. This refactoring is useful to do before adding synthetic function entry counts but also a useful cleanup IMO even otherwise. I have used hasProfileData instead of hasRealProfileData as David had earlier suggested since I think profile implies "real" and I use the phrase "synthetic entry count" and not "synthetic profile count" but I am fine calling it hasRealProfileData if you prefer. Reviewers: davidxl, silvas Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41461 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321331 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-21	[ModRefInfo] Add must alias info to ModRefInfo.	Alina Sbirlea
	Summary: Add an additional bit to ModRefInfo, ModRefInfo::Must, to be cleared for known must aliases. Shift existing Mod/Ref/ModRef values to include an additional most significant bit. Update wrappers that modify ModRefInfo values to reflect the change. Notes: * ModRefInfo::Must is almost entirely cleared in the AAResults methods, the remaining changes are trying to preserve it. * Only some small changes to make custom AA passes set ModRefInfo::Must (BasicAA). * GlobalsModRef already declares a bit, who's meaning overlaps with the most significant bit in ModRefInfo (MayReadAnyGlobal). No changes to shift the value of MayReadAnyGlobal (see AlignedMap). FunctionInfo.getModRef() ajusts most significant bit so correctness is preserved, but the Must info is lost. * There are cases where the ModRefInfo::Must is not set, e.g. 2 calls that only read will return ModRefInfo::NoModRef, though they may read from exactly the same location. Reviewers: dberlin, hfinkel, george.burgess.iv Subscribers: llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D38862 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321309 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-21	[DWARF v5] Rework of string offsets table reader	Wolfgang Pieb
	Reorganizes the DWARF consumer to derive the string offsets table contribution's format from the contribution header instead of (incorrectly) from the unit's format. Reviewers: JDevliegehere, aprantl Differential Revision: https://reviews.llvm.org/D41146 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321295 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-21	[YAML] Refactor escaping unittests	Francis Visoiu Mistrih
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321284 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-21	[Support] Remove MemoryBuffer::getNewUninitMemBuffer	Pavel Labath
	There is nothing useful that can be done with a read-only uninitialized buffer without const_casting its contents to initialize it. A better solution is to obtain a writable buffer (WritableMemoryBuffer::getNewUninitMemBuffer), and then convert it to a read-only buffer after initialization. All callers of this function have already been updated to do this, so this function is now unused. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321257 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-21	[WebAssembly] Remove unneeded sub-directory	Sam Clegg
	This is the only wasm def (and likely likely will be for the foreseeable) file so no need for a sub-directory Differential Revision: https://reviews.llvm.org/D41476 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321246 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-21	Revert "Expose a TargetMachine::getTargetTransformInfo function"	Sanjoy Das
	This reverts commit r321234. It breaks the -DBUILD_SHARED_LIBS=ON build. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321243 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-21	[WebAssembly] Fix local references to weak aliases	Sam Clegg
	When weak aliases are used with in same translation unit we need to be able to directly reference to alias and not just the thing it is aliases. We do this by defining both a wasm import and a wasm export in this case that result in a single Symbol. This change is a partial revert of rL314245. A corresponding lld change address the previous issues we had with this. See: https://github.com/WebAssembly/tool-conventions/issues/34 Differential Revision: https://reviews.llvm.org/D41472 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321242 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-21	Expose a TargetMachine::getTargetTransformInfo function	Sanjoy Das
	Summary: This makes the TargetMachine interface a bit simpler. We still need the std::function in TargetIRAnalysis to avoid having to add a dependency from Analysis to Target. See discussion: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html I avoided adding all of the backend owners to this review since the change is simple, but let me know if you feel differently about this. Reviewers: echristo, MatzeB, hfinkel Reviewed By: hfinkel Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D41464 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321234 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-20	TableGen: Allow setting SDNodeProperties on intrinsics	Matt Arsenault
	Allows preserving MachineMemOperands on intrinsics through selection. For reasons I don't understand, this is a static property of the pattern and the selector deliberately goes out of its way to drop if not present. Intrinsics already inherit from SDPatternOperator allowing them to be used directly in instruction patterns. SDPatternOperator has a list of SDNodeProperty, but you currently can't set them on the intrinsic. Without SDNPMemOperand, when the node is selected any memory operands are always dropped. Allowing setting this on the intrinsics avoids needing to introduce another equivalent target node just to have SDNPMemOperand set. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321212 91177308-0d34-0410-b5e6-96231b3b80d8