ampere-computing/llvm.git - LLVM including Ampere Computing toolchain specific patches

Age	Commit message (Collapse)	Author
2017-10-25	AMDGPU/NFC: Rename memory legalizer tests:	Konstantin Zhuravlyov
	- memory-legalizer-atomic-load.ll -> memory-legalizer-load.ll - memory-legalizer-atomic-store.ll -> memory-legalizer-store.ll git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316586 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-25	[inlineasm] Fix crash when number of matched input constraint operands ↵	Daniil Fukalov
	overflows signed char In a case when number of output constraint operands that has matched input operands doesn't fit to signed char, TargetLowering::ParseConstraints() can try to access ConstraintOperands (that is std::vector) with negative index. Reviewers: rampitec, arsenm Differential Review: https://reviews.llvm.org/D39125 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316574 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-25	DAG: Fix creating select with wrong condition type	Matt Arsenault
	This code added in r297930 assumed that it could create a select with a condition type that is just an integer bitcast of the selected type. For AMDGPU any vselect is going to be scalarized (although the vector types are legal), and all select conditions must be i1 (the same as getSetCCResultType). This logic doesn't really make sense to me, but there's never really been a consistent policy in what the select condition mask type is supposed to be. Try to extend the logic for skipping the transform for condition types that aren't setccs. It doesn't seem quite right to me though, but checking conditions that seem more sensible (like whether the vselect is going to be expanded) doesn't work since this seems to depend on that also. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316554 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-24	MIR: Print the register class or bank in vreg defs	Justin Bogner
	This updates the MIRPrinter to include the regclass when printing virtual register defs, which is already valid syntax for the parser. That is, given 64 bit %0 and %1 in a "gpr" regbank, %1(s64) = COPY %0(s64) would now be written as %1:gpr(s64) = COPY %0(s64) While this change alone introduces a bit of redundancy with the registers block, it allows us to update the tests to be more concise and understandable and brings us closer to being able to remove the registers block completely. Note: We generally only print the class in defs, but there is one exception. If there are uses without any defs whatsoever, we'll print the class on all uses. I'm not completely convinced this comes up in meaningful machine IR, but for now the MIRParser and MachineVerifier both accept that kind of stuff, so we don't want to have a situation where we can print something we can't parse. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316479 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-24	AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)	Marek Olsak
	Summary: Kill the thread if operand 0 == false. llvm.amdgcn.wqm.vote can be applied to the operand. Also allow kill in all shader stages. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38544 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316427 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-24	AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic	Marek Olsak
	Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D38543 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316426 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-23	AMDGPU: Fix default range in non-kernel functions	Matt Arsenault
	The range should be assumed to be the hardware maximum if a workitem intrinsic is used in a callable function which does not know the restricted limit of the calling kernel. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316346 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-18	Canonicalize a large number of mir tests using update_mir_test_checks	Justin Bogner
	This converts a large and somewhat arbitrary set of tests to use update_mir_test_checks. I ran the script on all of the tests I expect to need to modify for an upcoming mir syntax change and kept the ones that obviously didn't change the tests in ways that might make it harder to understand. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316137 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-18	AMDGPU: Rename MaxFlatWorkgroupSize to MaxFlatWorkGroupSize for consistency	Konstantin Zhuravlyov
	Differential Revision: https://reviews.llvm.org/D38957 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316097 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-17	AMDGPU : Fix an error for the llvm.cttz implementation.	Wei Ding
	Differential Revision: http://reviews.llvm.org/D39014 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316037 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-17	AMDGPU: Start generating metadata for MaxFlatWorkGroupSize	Konstantin Zhuravlyov
	Differential Revision: https://reviews.llvm.org/D38958 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316024 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-16	Use the return value of UpdateNodeOperands(); in some cases, ↵	Mark Searles
	UpdateNodeOperands() modifies the node in-place and using the return value isn’t strictly necessary. However, it does not necessarily modify the node, but may return a resultant node if it already exists in the DAG. See comments in UpdateNodeOperands(). In that case, the return value must be used to avoid such scenarios as an infinite loop (node is assumed to have been updated, so added back to the worklist, and re-processed; however, node hasn’t changed so it is once again passed to UpdateNodeOperands(), assumed modified, added back to worklist; cycle infinitely repeats). Differential Revision: https://reviews.llvm.org/D38466 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315957 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-16	[AMDGPU] : revert r315908	Alexander Timofeev
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315916 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-16	[AMDGPU] Prevent Machine Copy Propagation from replacing live copy with the ↵	Alexander Timofeev
	dead one Differential revision: https://reviews.llvm.org/D38754 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315908 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-14	AMDGPU: Temporary disable pal metadata check line in llvm-readobj test	Konstantin Zhuravlyov
	It fails on mips git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315837 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-14	AMDGPU: Bring HSA metadata on par with the specification	Konstantin Zhuravlyov
	Differential Revision: https://reviews.llvm.org/D38753 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315821 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-14	llvm-readobj: Print AMDGPU note contents	Konstantin Zhuravlyov
	Differential Revision: https://reviews.llvm.org/D38752 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315819 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-14	AMDGPU: Cleanup elf-notes.ll test	Konstantin Zhuravlyov
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315816 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-14	llvm-readobj: Print AMDGPU note type names	Konstantin Zhuravlyov
	Differential Revision: https://reviews.llvm.org/D38751 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315813 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-14	AMDGPU: Do not emit deprecated notes for code object v3	Konstantin Zhuravlyov
	Differential Revision: https://reviews.llvm.org/D38749 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315810 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-14	AMDGPU: Add support for isa version note	Konstantin Zhuravlyov
	- Emit NT_AMD_AMDGPU_ISA - Add assembler parsing for isa version directive - If isa version directive does not match command line arguments, then return error Differential Revision: https://reviews.llvm.org/D38748 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315808 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-13	AMDGPU: Implement hasBitPreservingFPLogic	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315754 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-13	AMDGPU: Look for src mods before fp_extend	Matt Arsenault
	When selecting modifiers for mad_mix instructions, look at fneg/fabs that occur before the conversion. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315748 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-13	AMDGPU: Implement isFPExtFoldable	Matt Arsenault
	This helps match v_mad_mix* in some cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315744 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-12	Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.	Wei Ding
	Differential Revision: http://reviews.llvm.org/D37348 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315610 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-12	[AMDGPU] For amdpal, widen interpolation mode workaround	Tim Renouf
	Summary: The interpolation mode workaround ensures that at least one interpolation mode is enabled in PSInputAddr. It does not also check PSInputEna on the basis that the user might enable bits in that depending on run-time state. However, for amdpal os type, the user does not enable some bits after compilation based on run-time states; the register values being generated here are the final ones set in the hardware. Therefore, apply the workaround to PSInputAddr and PSInputEnable together. (The case where a bit is set in PSInputAddr but not in PSInputEnable is where the frontend set up an input arg for a particular interpolation mode, but nothing uses that input arg. Really we should have an earlier pass that removes such an arg.) Reviewers: arsenm, nhaehnle, dstuttard Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37758 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315591 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-11	AMDGPU/NFC: Minor clean ups in PAL metadata	Konstantin Zhuravlyov
	- Move PAL metadata definitions to AMDGPUMetadata - Make naming consistent with HSA metadata Differential Revision: https://reviews.llvm.org/D38745 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315523 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-11	AMDGPU/NFC: Rename code object metadata as HSA metadata	Konstantin Zhuravlyov
	- Rename AMDGPUCodeObjectMetadata to AMDGPUMetadata (PAL metadata will be included in this file in the follow up change) - Rename AMDGPUCodeObjectMetadataStreamer to AMDGPUHSAMetadataStreamer - Introduce HSAMD namespace - Other minor name changes in function and test names git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315522 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-10	AMDGPU: Fix missing skipFunction calls	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315361 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-10	AMDGPU: Fix failure to select branch with optnone	Matt Arsenault
	opt-bisect/optnone disable the AMDGPUUniformAnnotateValues pass. The heuristic in the custom selector for brcond deferred the branch uniformity check to the pattern, which would fail. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315360 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-10	[AMDGPU] Lower enqueued blocks and generate runtime metadata	Yaxun Liu
	This patch adds a post-linking pass which replaces the function pointer of enqueued block kernel with a global variable (runtime handle) and adds runtime-handle attribute to the enqueued block kernel. In LLVM CodeGen the runtime-handle metadata will be translated to RuntimeHandle metadata in code object. Runtime allocates a global buffer for each kernel with RuntimeHandel metadata and saves the kernel address required for the AQL packet into the buffer. __enqueue_kernel function in device library knows that the invoke function pointer in the block literal is actually runtime handle and loads the kernel address from it and puts it into AQL packet for dispatching. This cannot be done in FE since FE cannot create a unique global variable with external linkage across LLVM modules. The global variable with internal linkage does not work since optimization passes will try to replace loads of the global variable with its initialization value. Differential Revision: https://reviews.llvm.org/D38610 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315352 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-10	[DAGCombine] Fix for shuffle to vector extend for non power 2 vectors	David Stuttard
	Summary: See https://llvm.org/PR33743 for more details It seems that for non-power of 2 vector sizes, the algorithm can produce non-matching sizes for input and result causing an assert. This usually isn't a problem as the isAnyExtend check will weed these out, but in some cases (most often with lots of undefined values for the mask indices) it can pass this check for non power of 2 vectors. Adding in an extra check that ensures that bit size will match for the result and input (as required) Subscribers: nhaehnle Differential Revision: https://reviews.llvm.org/D35241 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315307 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-10	AMDGPU: Split MUBUF offset into aligned components	Nicolai Haehnle
	Summary: Atomic buffer operations do not work (and trap on gfx9) when the components are unaligned, even if their sum is aligned. Previously, we generated an offset of 4156 without an SGPR by splitting it as 4095 + 61 (immediate + inline constant). The highest offset for which we can do this correctly is 4156 = 4092 + 64. Fixes dEQP-GLES31.functional.ssbo.atomic.* Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37850 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315302 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-06	[AMDGPU] New 64 bit div/rem expansion	Stanislav Mekhanoshin
	Old expansion was 20 VGPRs, 78 SGPRs and ~380 instructions. This expansion is 11 VGPRs, 12 SGPRs and ~120 instructions. Passes OpenCL conformance test_integer_ops quick_[u]long_math Differential Revision: https://reviews.llvm.org/D38607 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315081 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-05	[PassManager] Run global optimizations after the inliner.	Davide Italiano
	The inliner performs some kind of dead code elimination as it goes, but there are cases that are not really caught by it. We might at some point consider teaching the inliner about them, but it is OK for now to run GlobalOpt + GlobalDCE in tandem as their benefits generally outweight the cost, making the whole pipeline faster. This fixes PR34652. Differential Revision: https://reviews.llvm.org/D38154 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314997 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-05	AMDGPU: Set v2i32 any_extend to expand	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314993 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-05	AMDGPU: Add and set AMDGPU-specific e_flags	Konstantin Zhuravlyov
	Differential Revision: https://reviews.llvm.org/D38556 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314987 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-05	AMDGPU: Do not fold clamp instructions when sources are different	Matt Arsenault
	Patch by hakzsam (Samuel Pitoiset) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314951 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-04	AMDGPU: Fix not accounting for instruction size in bundles	Matt Arsenault
	These were counted as 0. Fixes branch limit exceeded errors in some large programs. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314944 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-04	AMDGPU: Correctly set EI_OSABI based on the os	Konstantin Zhuravlyov
	Differential Revision: https://reviews.llvm.org/D38555 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314943 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-03	AMDGPU: Expand setcc for v2i32 and v4i32	Konstantin Zhuravlyov
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314852 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-03	[AMDGPU] implemented pal metadata	Tim Renouf
	Summary: For the amdpal OS type: We write an AMDGPU_PAL_METADATA record in the .note section in the ELF (or as an assembler directive). It contains key=value pairs of 32 bit ints. It is a merge of metadata from codegen of the shaders, and metadata provided by the frontend as _amdgpu_pal_metadata IR metadata. Where both sources have a key=value with the same key, the two values are ORed together. This .note record is part of the amdpal ABI and will be documented in docs/AMDGPUUsage.rst in a future commit. Eventually the amdpal OS type will stop generating the .AMDGPU.config section once the frontend has safely moved over to using the .note records above instead of .AMDGPU.config. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37753 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314829 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-03	[AMDGPU] Avoid predicated execution of the basic blocks containing scalar	Alexander Timofeev
	instructions. Differential revision: https://reviews.llvm.org/D38293 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314828 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-03	Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source ↵	Geoff Berry
	forwarding"" This reverts commit r314729. Another bug has been encountered in an out-of-tree target reported by Quentin. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314814 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-02	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"	Geoff Berry
	Issues addressed since original review: - Avoid bug in regalloc greedy/machine verifier when forwarding to use in an instruction that re-defines the same virtual register. - Fixed bug when forwarding to use in EarlyClobber instruction slot. - Fixed incorrect forwarding to register definitions that showed up in explicit_uses() iterator (e.g. in INLINEASM). - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314729 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-02	AMDGPU: Fix potentially incorrectly matching check lines	Matt Arsenault
	These check lines are supposed to make sure the new d16 load instructions aren't used, but the expected instruction name is a prefix of the incorrect instruction name. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314714 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-02	Eliminate ftrunc if source is know to be rounded	Stanislav Mekhanoshin
	Differential Revision: https://reviews.llvm.org/D38421 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314688 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-29	[AMDGPU] Set fast-math flags on functions given the options	Stanislav Mekhanoshin
	We have a single library build without relaxation options. When inlined library functions remove fast math attributes from the functions they are integrated into. This patch sets relaxation attributes on the functions after linking provided corresponding relaxation options are given. Math instructions inside the inlined functions remain to have no fast flags, but inlining does not prevent fast math transformations of a surrounding caller code anymore. Differential Revision: https://reviews.llvm.org/D38325 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314568 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-29	CodeGen: Fix pointer info in expandUnalignedLoad/Store	Yaxun Liu
	Currently expandUnalignedLoad/Store uses place holder pointer info for temporary memory operand in stack, which does not have correct address space. This causes unaligned private double16 load/store to be lowered to flat_load instead of buffer_load for amdgcn target. This fixes failures of OpenCL conformance test basic/vload_private/vstore_private on target amdgcn---amdgizcl. Differential Revision: https://reviews.llvm.org/D35361 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314566 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-29	AMDGPU: VALU carry-in and v_cndmask condition cannot be EXEC	Nicolai Haehnle
	The hardware will only forward EXEC_LO; the high 32 bits will be zero. Additionally, inline constants do not work. At least, v_addc_u32_e64 v0, vcc, v0, v1, -1 which could conceivably be used to combine (v0 + v1 + 1) into a single instruction, acts as if all carry-in bits are zero. The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine s_mov_b64 s[0:1], exec v_cndmask_b32_e64 v0, v1, v2, s[0:1] into v_mov_b32 v0, v3 but it's not particularly high priority. Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.* git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314522 91177308-0d34-0410-b5e6-96231b3b80d8