ampere-computing/llvm.git - LLVM including Ampere Computing toolchain specific patches

Age	Commit message	Author
2018-04-09	Merging r326535:	Tom Stellard
	------------------------------------------------------------------------ r326535 | jvesely | 2018-03-01 18:50:22 -0800 (Thu, 01 Mar 2018) | 6 lines AMDGPU/GCN: Promote i16 ctpop. i16-capable ASICs do not support i16 operands for this instruction. Add a TableGen pattern to merge chained i16 additions. Differential Revision: https://reviews.llvm.org/D43985 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@329589 91177308-0d34-0410-b5e6-96231b3b80d8
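As a rough model of what "promote" means here (not the TableGen pattern from the patch), an i16 ctpop can be computed by zero-extending the operand to i32, using the 32-bit population count, and truncating the result. Zero-extension only adds zero bits, so the count is unchanged, and a count of at most 16 always fits back into 16 bits. The function names below are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Stand-in for a native 32-bit population count.
uint32_t popcount32(uint32_t v) {
  return static_cast<uint32_t>(std::bitset<32>(v).count());
}

// Promoted i16 ctpop: zext i16 -> i32, i32 ctpop, trunc i32 -> i16.
uint16_t ctpop16_promoted(uint16_t x) {
  uint32_t widened = static_cast<uint32_t>(x);  // zero-extension adds only 0 bits
  uint32_t count = popcount32(widened);         // same count as on the 16-bit value
  assert(count <= 16);                          // always fits in 16 bits
  return static_cast<uint16_t>(count);
}
```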
2018-02-13	Merging r324746:	Hans Wennborg
	------------------------------------------------------------------------ r324746 | arsenm | 2018-02-09 17:57:48 +0100 (Fri, 09 Feb 2018) | 4 lines AMDGPU: Fix layering issue. Move utility function that depends on codegen. Fixes build with r324487 reapplied. ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325007 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-02	Merging r323908:	Hans Wennborg
	------------------------------------------------------------------------ r323908 | mareko | 2018-01-31 21:18:04 +0100 (Wed, 31 Jan 2018) | 7 lines AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D41663 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324103 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-29	AMDGPU: Use unique PSVs for buffer resources	Matt Arsenault
	Also fixes using the wrong memory type for some intrinsics when custom lowering them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321557 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-29	AMDGPU: Implement getTgtMemIntrinsic for images	Matt Arsenault
	Currently all images are lowered to have a single image PseudoSourceValue. Image stores happen to have overly strict mayLoad/mayStore/hasSideEffects flags set on them, so this happens to work. When these are fixed to be correct, the scheduler breaks this because the identical PSVs are assumed to be the same address. These need to be unique to the image resource value. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321555 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-15	MachineFunction: Return reference from getFunction(); NFC	Matthias Braun
	The Function can never be nullptr so we can return a reference. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320884 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-14	TLI: Allow using PSV for intrinsic mem operands	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320756 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-14	DAG: Expose all MMO flags in getTgtMemIntrinsic	Matt Arsenault
	Rather than adding more bits to express every MMO flag you could want, just directly use the MMO flags. Also fixes using a bunch of bool arguments to getMemIntrinsicNode. On AMDGPU, buffer and image intrinsics should always have MODereferenceable set, but currently there is no way to do that directly during the initial intrinsic lowering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320746 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-13	AMDGPU: Partially fix disassembly of MIMG instructions	Matt Arsenault
	Stores failed to decode at all since they didn't have a DecoderNamespace set. Loads worked, but did not change the register width displayed to match the number of enabled channels. The number of printed registers for vaddr is still wrong, but I don't think that's encoded in the instruction so there's not much we can do about that. Image atomics are still broken. MIMG is the same encoding for SI/VI, but the image atomic classes are split up into encoding specific versions unlike every other MIMG instruction. They have isAsmParserOnly set on them for some reason. dmask is also special for these, so we probably should not have it as an explicit operand as it is now. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320614 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-08	AMDGPU: image_getlod and image_getresinfo do not read memory	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320187 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-08	AMDGPU: Preserve MMO in adjustWritemask	Matt Arsenault
	Follow up to r319705. Currently the MMO is produced after this in the custom inserter, so this doesn't change anything yet. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320186 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-04	AMDGPU: Fix creating invalid copy when adjusting dmask	Matt Arsenault
	Move the entire optimization to one place. Before it was possible to adjust dmask without changing the register class of the output instruction, since they were done in separate places. Fix all lane sizes and move all of the optimization into the DAG folding. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319705 91177308-0d34-0410-b5e6-96231b3b80d8
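A simplified model of the dmask adjustment, assuming the 4-bit dmask selects the x/y/z/w channels and the returned dwords are packed in channel order: dropping unused channels must shrink the dmask and the result width together, and the surviving extracts must be renumbered. ShrunkImage and shrinkDmask are hypothetical names, not LLVM's adjustWritemask.

```cpp
#include <cassert>
#include <vector>

// Result of shrinking an image load's dmask to the channels actually used.
struct ShrunkImage {
  unsigned newDmask = 0;   // channels (x = bit 0 .. w = bit 3) still loaded
  unsigned numDwords = 0;  // width of the result register tuple
  std::vector<int> remap;  // old packed index -> new packed index, -1 if dropped
};

// oldDmask: 4-bit channel mask of the original instruction.
// usedLanes: bit i is set if packed result element i has a user.
ShrunkImage shrinkDmask(unsigned oldDmask, unsigned usedLanes) {
  assert(oldDmask != 0 && oldDmask <= 0xF);
  ShrunkImage r;
  unsigned oldLane = 0;  // index into the old, packed result
  for (unsigned chan = 0; chan < 4; ++chan) {
    if (!(oldDmask & (1u << chan)))
      continue;  // channel was never loaded, so it has no packed slot
    if (usedLanes & (1u << oldLane)) {
      r.newDmask |= 1u << chan;
      r.remap.push_back(static_cast<int>(r.numDwords++));
    } else {
      r.remap.push_back(-1);
    }
    ++oldLane;
  }
  // The result register class must be derived from numDwords, not from the
  // old dmask; changing one without the other is the mismatch this commit fixes.
  return r;
}
```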
2017-11-30	AMDGPU: Use gfx9 carry-less add/sub instructions	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319491 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-29	DAG: Add nuw when splitting loads and stores	Matt Arsenault
	The object can't straddle the address space wrap around, so I think it's OK to assume any offsets added to the base object pointer can't overflow. Similar logic already appears to be applied in SelectionDAGBuilder when lowering aggregate returns. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319272 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-28	[CodeGen] Print register names in lowercase in both MIR and debug output	Francis Visoiu Mistrih
	As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319187 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-22	[AMDGPU] Fix SITargetLowering::LowerCall for pointer info of byval argument	Yaxun Liu
	SITargetLowering::LowerCall uses dummy pointer info for byval argument, which causes flat load instead of buffer load. This patch fixes that. Differential Revision: https://reviews.llvm.org/D40040 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318844 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-17	Fix a bunch more layering of CodeGen headers that are in Target	David Blaikie
	All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318490 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-15	AMDGPU: Replace i64 add/sub lowering	Matt Arsenault
	Use VOP3 add/addc like usual. This has some tradeoffs. Inline immediates fold a little better, but other constants are worse off. SIShrinkInstructions could be made smarter to handle these cases. This allows us to avoid selecting scalar adds where we need to track the carry in scc and replace its users. This makes it easier to use the carryless VALU adds. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318340 91177308-0d34-0410-b5e6-96231b3b80d8
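To illustrate what an add/addc pair computes (a model only, not the selection code; the names are hypothetical), a 64-bit add is split into a low 32-bit add whose carry-out feeds the high 32-bit add. On the scalar unit that carry lives in SCC, which is the tracking this change avoids; on the VALU it is a per-lane bit.

```cpp
#include <cassert>
#include <cstdint>

// A 32-bit add that also produces a carry-out bit.
struct AddResult {
  uint32_t sum;
  bool carry;
};

AddResult addWithCarry(uint32_t a, uint32_t b, bool carryIn) {
  uint64_t wide = uint64_t(a) + uint64_t(b) + (carryIn ? 1u : 0u);
  return {static_cast<uint32_t>(wide), (wide >> 32) != 0};
}

// i64 add as two 32-bit halves: the low half's carry feeds the high half.
uint64_t add64(uint64_t x, uint64_t y) {
  AddResult lo = addWithCarry(uint32_t(x), uint32_t(y), false);            // like add
  AddResult hi = addWithCarry(uint32_t(x >> 32), uint32_t(y >> 32), lo.carry);  // like addc
  return (uint64_t(hi.sum) << 32) | lo.sum;  // final carry-out is discarded
}
```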
2017-11-15	AMDGPU: Don't use MUBUF vaddr if address may overflow	Matt Arsenault
	Effectively revert r263964. Before r263964, we would not allow this if vaddr was not known to be positive. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318240 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-14	AMDGPU: Handle or in multi-use shl ptr combine	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318223 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-13	AMDGPU: Preserve nuw in shl add ptr combine	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318017 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-13	AMDGPU: Fix multi-use shl/add combine	Matt Arsenault
	This was using a custom function that didn't handle the addressing modes properly for private. Use isLegalAddressingMode to avoid duplicating this. Additionally, skip the combine if there is only one use since the standard combine will handle it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318013 91177308-0d34-0410-b5e6-96231b3b80d8
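The reassociation behind these shl/add pointer combines can be checked with plain integer arithmetic (a sketch assuming 32-bit wrap-around addresses; the function names are hypothetical): (x + c1) << c2 equals (x << c2) + (c1 << c2), and the second form exposes a constant that can become an immediate offset if isLegalAddressingMode accepts it. An or can only be rewritten the same way when it behaves like an add, that is, when x and c1 have no set bits in common.

```cpp
#include <cassert>
#include <cstdint>

// Original form: (shl (add x, c1), c2)
uint32_t shlOfAdd(uint32_t x, uint32_t c1, unsigned c2) {
  assert(c2 < 32);
  return (x + c1) << c2;
}

// Combined form: (add (shl x, c2), c1 << c2). Shifting left multiplies by
// 2^c2, which distributes over addition modulo 2^32, so both forms agree.
uint32_t addOfShl(uint32_t x, uint32_t c1, unsigned c2) {
  assert(c2 < 32);
  return (x << c2) + (c1 << c2);
}
```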
2017-11-09	AMDGPU: Lower buffer store and atomic intrinsics manually	Marek Olsak
	Summary: Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every buffer store and atomic instruction. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D39060 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317754 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-06	AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317492 91177308-0d34-0410-b5e6-96231b3b80d8
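Reference semantics for the two selected instructions, written as plain C++ (a model, not LLVM's selection patterns; the struct and function names are hypothetical): a 32x32-bit multiply widened to 64 bits plus a 64-bit addend. Modeling the unsigned form's carry-out as a bool is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

struct Mad64 {
  uint64_t value;
  bool carry;  // set if the 64-bit addition wrapped
};

// v_mad_u64_u32-style: zext(a) * zext(b) + c.
Mad64 madU64U32(uint32_t a, uint32_t b, uint64_t c) {
  uint64_t product = uint64_t(a) * uint64_t(b);  // fits in 64 bits
  uint64_t sum = product + c;                    // may wrap
  return {sum, sum < product};                   // wrap detected by comparison
}

// v_mad_i64_i32-style: sext(a) * sext(b) + c, with wrap-around on overflow.
int64_t madI64I32(int32_t a, int32_t b, int64_t c) {
  uint64_t product = static_cast<uint64_t>(int64_t(a) * int64_t(b));  // |a*b| <= 2^62
  return static_cast<int64_t>(product + static_cast<uint64_t>(c));    // unsigned add avoids UB
}
```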
2017-11-06	[AMDGPU] Fix assertion due to assuming pointers in default addr space are 32 bit	Yaxun Liu
	The backend assumes pointers in the default address space are 32-bit, which is not true for the new address space mapping and causes an assertion failure for unresolved functions. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39643 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317476 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-24	AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)	Marek Olsak
	Summary: Kill the thread if operand 0 == false. llvm.amdgcn.wqm.vote can be applied to the operand. Also allow kill in all shader stages. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38544 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316427 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-16	Use the return value of UpdateNodeOperands(); in some cases,	Mark Searles
	UpdateNodeOperands() modifies the node in-place and using the return value isn’t strictly necessary. However, it does not necessarily modify the node; it may instead return a different node if one with the new operands already exists in the DAG. See comments in UpdateNodeOperands(). In that case, the return value must be used, otherwise the caller can loop forever: the node is assumed to have been updated, so it is added back to the worklist and re-processed, but since it has not changed, it is passed to UpdateNodeOperands() again, assumed modified, added back to the worklist, and the cycle repeats indefinitely. Differential Revision: https://reviews.llvm.org/D38466 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315957 91177308-0d34-0410-b5e6-96231b3b80d8
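The hazard can be reproduced with a toy CSE'd graph (ToyDAG and its methods are hypothetical, not SelectionDAG's API): when the requested operands match an existing node, the update returns that node and leaves the original untouched, so a caller that ignores the return value keeps working on an unchanged node.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Node {
  std::string opcode;
  std::vector<int> operands;  // ids of operand nodes
};

// Nodes with the same opcode and operands are uniqued (CSE), as in a DAG.
class ToyDAG {
 public:
  int getNode(const std::string &op, const std::vector<int> &ops) {
    auto key = std::make_pair(op, ops);
    auto it = cse_.find(key);
    if (it != cse_.end()) return it->second;
    nodes_.push_back({op, ops});
    int id = static_cast<int>(nodes_.size()) - 1;
    cse_[key] = id;
    return id;
  }

  // Returns the node that has the requested operands. If an identical node
  // already exists, it is returned and node `n` is NOT modified.
  int updateNodeOperands(int n, const std::vector<int> &newOps) {
    Node &node = nodes_.at(n);
    auto key = std::make_pair(node.opcode, newOps);
    auto it = cse_.find(key);
    if (it != cse_.end()) return it->second;  // existing node; n unchanged
    cse_.erase(std::make_pair(node.opcode, node.operands));
    node.operands = newOps;  // otherwise mutate in place
    cse_[key] = n;
    return n;
  }

  const Node &node(int id) const { return nodes_.at(id); }

 private:
  std::vector<Node> nodes_;
  std::map<std::pair<std::string, std::vector<int>>, int> cse_;
};
```

A caller that wrote `dag.updateNodeOperands(n, ops);` and then re-queued `n` would reprocess a node whose operands never changed.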
2017-10-13	AMDGPU: Implement hasBitPreservingFPLogic	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315754 91177308-0d34-0410-b5e6-96231b3b80d8
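hasBitPreservingFPLogic is a TargetLowering hook; my understanding is that it tells the DAG combiner it may treat integer bitwise operations on a float's sign bit as the corresponding FP operations (fneg, fabs). The identities involved can be checked with bit casts; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable bit casts (memcpy avoids strict-aliasing problems).
uint32_t toBits(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
float fromBits(uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

constexpr uint32_t kSignMask = 0x80000000u;

// Integer-domain equivalents of the sign-bit FP operations.
float negViaXor(float x) { return fromBits(toBits(x) ^ kSignMask); }   // fneg
float absViaAnd(float x) { return fromBits(toBits(x) & ~kSignMask); }  // fabs
float nabsViaOr(float x) { return fromBits(toBits(x) | kSignMask); }   // fneg(fabs)
```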
2017-10-12	[AMDGPU] For amdpal, widen interpolation mode workaround	Tim Renouf
	Summary: The interpolation mode workaround ensures that at least one interpolation mode is enabled in PSInputAddr. It does not also check PSInputEnable on the basis that the user might enable bits in that depending on run-time state. However, for amdpal os type, the user does not enable some bits after compilation based on run-time states; the register values being generated here are the final ones set in the hardware. Therefore, apply the workaround to PSInputAddr and PSInputEnable together. (The case where a bit is set in PSInputAddr but not in PSInputEnable is where the frontend set up an input arg for a particular interpolation mode, but nothing uses that input arg. Really we should have an earlier pass that removes such an arg.) Reviewers: arsenm, nhaehnle, dstuttard Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37758 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315591 91177308-0d34-0410-b5e6-96231b3b80d8
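A simplified model of the two checks, assuming (for this sketch only) that bits 0 to 6 of both masks are the interpolation modes and ignoring any other conditions the real code may test. PSInputs and applyInterpWorkaround are hypothetical names. Without amdpal only PSInputAddr is checked; with amdpal the intersection of both masks must contain a mode, because these values are final.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: bits 0..6 hold the interpolation modes.
constexpr uint32_t kInterpModeMask = 0x7F;

struct PSInputs {
  uint32_t addr;    // models PSInputAddr
  uint32_t enable;  // models PSInputEnable
};

void applyInterpWorkaround(PSInputs &in, bool isAmdPal) {
  if ((in.addr & kInterpModeMask) == 0) {
    // No mode allocated at all: force mode 0 in both masks.
    in.addr |= 1u;
    in.enable |= 1u;
    return;
  }
  if (isAmdPal && (in.addr & in.enable & kInterpModeMask) == 0) {
    // Modes allocated but none enabled, and nobody will enable one later:
    // enable the lowest allocated mode.
    uint32_t allocated = in.addr & kInterpModeMask;
    in.enable |= allocated & (~allocated + 1u);  // isolate lowest set bit
  }
}
```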
2017-10-05	AMDGPU: Set v2i32 any_extend to expand	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314993 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-29	AMDGPU: VALU carry-in and v_cndmask condition cannot be EXEC	Nicolai Haehnle
	The hardware will only forward EXEC_LO; the high 32 bits will be zero. Additionally, inline constants do not work. At least, v_addc_u32_e64 v0, vcc, v0, v1, -1, which could conceivably be used to combine (v0 + v1 + 1) into a single instruction, acts as if all carry-in bits are zero. The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine s_mov_b64 s[0:1], exec followed by v_cndmask_b32_e64 v0, v1, v2, s[0:1] into v_mov_b32 v0, v3, but it's not particularly high priority. Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.* git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314522 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-20	AMDGPU: Start selecting v_mad_mixhi_f16	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313814 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-14	AMDGPU: Stop modifying SP in call sequences	Matt Arsenault
	Because stack growth and stack addressing go in the same direction, modifying SP at the beginning of the call sequence was incorrect. If we had a stack passed argument, we would end up skipping that number of bytes before pushing arguments, leaving unused/inconsistent space. The callee creates fixed stack objects in its frame, so the space necessary for these is already logically allocated in the callee, so we just let the callee increment SP if it really requires it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313279 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-14	AMDGPU: Make frame register caller preserved	Matt Arsenault
	Using SplitCSR for the frame register was very broken. Often the copies in the prolog and epilog were optimized out, in addition to them being inserted after the true prolog where the FP was clobbered. I have a hacky solution which works that continues to use split CSR, but for now this is simpler and will get to working programs. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313274 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-07	AMDGPU: Don't legalize i16 extloads to i32 with legal i16	Matt Arsenault
	Keeping non-i16 extloads makes it easier to match some new gfx9 load instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312699 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-30	AMDGPU: Select clamp pattern with v2f16	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312087 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-11	AMDGPU: Start adding tail call support	Matt Arsenault
	Handle the sibling call cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310753 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-04	[AMDGPU] Add support for Whole Wavefront Mode	Connor Abbott
	Summary: Whole Wavefront Mode (WWM) is similar to WQM, except that all of the lanes are always enabled, regardless of control flow. This is required for implementing wavefront reductions in non-uniform control flow, where we need to use the inactive lanes to propagate intermediate results, so they need to be enabled. We need to propagate WWM to uses (unless they're explicitly marked as exact) so that they also propagate intermediate results correctly. We do the analysis and exec mask munging during the WQM pass, since there are interactions with WQM for things that require both WQM and WWM. For simplicity, WWM is entirely block-local -- blocks are never WWM on entry or exit of a block, and WWM is not propagated to the block level. This means that computations involving WWM cannot involve control flow, but we only ever plan to use WWM for a few limited purposes (none of which involve control flow) anyway. Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There isn't yet a way to turn WWM off -- that will be added in a future change. Finally, it turns out that turning on inactive lanes causes a number of problems with register allocation. While the best long-term solution seems like teaching LLVM's register allocator about predication, for now we need to add some hacks to prevent ourselves from getting into trouble due to constraints that aren't currently expressed in LLVM. For the gory details, see the comments at the top of SIFixWWMLiveness.cpp. Reviewers: arsenm, nhaehnle, tpr Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D35524 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310087 91177308-0d34-0410-b5e6-96231b3b80d8
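A lane-level model of why a reduction needs every lane enabled (a sketch, not the backend's implementation; wwmReduceAdd is hypothetical and assumes a 64-lane wavefront). The tree steps read partial sums from other lanes, including lanes that are inactive under the current control flow, so those lanes are first seeded with the identity value (0 for addition) and then all 64 lanes participate.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kWaveSize = 64;
using Wave = std::array<uint32_t, kWaveSize>;

// Sum of `values` over the lanes that are set in `exec`.
uint32_t wwmReduceAdd(Wave values, uint64_t exec) {
  // Inactive lanes must not contribute, so give them the identity value.
  for (int lane = 0; lane < kWaveSize; ++lane)
    if (!((exec >> lane) & 1)) values[lane] = 0;
  // Tree reduction with every lane enabled (the "whole wavefront" part):
  // each step reads a lane that may have been inactive in `exec`.
  for (int stride = kWaveSize / 2; stride > 0; stride /= 2)
    for (int lane = 0; lane < stride; ++lane)
      values[lane] += values[lane + stride];
  return values[0];
}
```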
2017-08-04	[AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM	Connor Abbott
	Summary: Previously, we assumed that certain types of instructions needed WQM in pixel shaders, particularly DS instructions and image sampling instructions. This was ok because with OpenGL, the assumption was correct. But we want to start using DPP instructions for derivatives as well as other things, so the assumption that we can infer whether to use WQM based on the instruction won't continue to hold. This intrinsic lets frontends like Mesa indicate what things need WQM based on their knowledge of the API, rather than second-guessing them in the backend. We need to keep around the old method of enabling WQM, but eventually we should remove it once Mesa catches up. For now, this will let us use DPP instructions for computing derivatives correctly. Reviewers: arsenm, tpr, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D35167 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310085 91177308-0d34-0410-b5e6-96231b3b80d8
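Whole quad mode itself is simple to state: lanes are grouped into quads of four (one 2x2 pixel block), and if any lane of a quad is live, the whole quad is enabled so neighbouring pixels are available for derivatives. A sketch of computing such a mask from an exec mask, assuming a 64-lane wavefront; wholeQuadMask is a hypothetical name, and to my knowledge s_wqm_b64 is the scalar instruction with this behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Enable all four lanes of every quad that has at least one live lane.
uint64_t wholeQuadMask(uint64_t exec) {
  uint64_t result = 0;
  for (int quad = 0; quad < 16; ++quad) {
    uint64_t quadBits = 0xFull << (quad * 4);  // lanes 4*quad .. 4*quad+3
    if (exec & quadBits) result |= quadBits;
  }
  return result;
}
```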
2017-08-04	AMDGPU: Remove pointless asserts	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310007 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-03	AMDGPU: Don't use report_fatal_error for unsupported call types	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310004 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-03	AMDGPU: Remove error on calls for amdgcn	Matt Arsenault
	Repurpose the -amdgpu-function-calls flag. Rather than require it to emit a call, only use it to run the always inline path or not. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310003 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-03	AMDGPU: Fix implicitarg.ptr handling special inputs	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310002 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-03	AMDGPU: Pass special input registers to functions	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309998 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-02	AMDGPU: Analyze callee resource usage in AsmPrinter	Matt Arsenault
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309781 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-02	AMDGPU: Don't place arguments in emergency stack slot	Matt Arsenault
	When finding the fixed offsets for function arguments, this needs to skip over the 4 bytes reserved for the emergency stack slot. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309776 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-01	AMDGPU: Fix handling of div_scale with undef inputs	Matt Arsenault
	The src0 register must match src1 or src2, but if these were undefined they could end up using different implicit_defed virtual registers. Force these to use one undef vreg or pick the defined other register. Also fixes producing invalid nodes without the right number of inputs when src2 is undef. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309743 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-01	AMDGPU: Initial implementation of calls	Matt Arsenault
	Includes a hack to fix the type selected for the GlobalAddress of the function, which will be fixed by changing the default datalayout to use generic pointers for 0. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309732 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-29	AMDGPU: Teach isLegalAddressingMode about global_* instructions	Matt Arsenault
	Also refine the flat check to respect flat-for-global feature, and constant fallback should check global handling, not specifically MUBUF. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309471 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-28	AMDGPU: Annotate implicitarg.ptr usage	Matt Arsenault
	We need to pass something to functions for this to work. It isn't derivable just from the kernarg segment pointer because the implicit arguments are placed after the kernel arguments. Also fixes missing test for the intrinsic. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309398 91177308-0d34-0410-b5e6-96231b3b80d8