ampere-computing/llvm.git - LLVM including Ampere Computing toolchain specific patches

Age	Commit message (Collapse)	Author
2018-01-03	Fix build of WebAssembly and AVR backends after r321692	Alex Bradbury
	As experimental backends, I didn't have them configured to build in my local build config. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321696 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-03	Thread MCSubtargetInfo through Target::createMCAsmBackend	Alex Bradbury
	Currently it's not possible to access MCSubtargetInfo from a TgtMCAsmBackend. D20830 threaded an MCSubtargetInfo reference through MCAsmBackend::relaxInstruction, but this isn't the only function that would benefit from access. This patch removes the Triple and CPUString arguments from createMCAsmBackend and replaces them with MCSubtargetInfo. This patch just changes the interface without making any intentional functional changes. Once in, several cleanups are possible: * Get rid of the awkward MCSubtargetInfo handling in ARMAsmBackend * Support 16-bit instructions when valid in MipsAsmBackend::writeNopData * Get rid of the CPU string parsing in X86AsmBackend and just use a SubtargetFeature for HasNopl * Emit 16-bit nops in RISCVAsmBackend::writeNopData if the compressed instruction set extension is enabled (see D41221) This change initially exposed PR35686, which has since been resolved in r321026. Differential Revision: https://reviews.llvm.org/D41349 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321692 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-02	Handle the case of live 16-bit subregisters in X86FixupBWInsts	Andrew Kaylor
	Differential Revision: https://reviews.llvm.org/D40524 Change-Id: Ie3a405b28503ceae999f5f3ba07a68fa733a2400 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321674 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-02	[x86] allow pairs of PCMPEQ for vector-sized integer equality comparisons ↵	Sanjay Patel
	(PR33325) This is an extension of D31156 with the goal that we'll allow memcmp() == 0 expansion for x86 to use 2 pairs of loads per block. The memcmp expansion pass (formerly part of CGP) will generate this kind of pattern with oversized integer compares, so we want to transform these into x86-specific vector nodes before legalization splits things into scalar chunks. See PR33325 for more details: https://bugs.llvm.org/show_bug.cgi?id=33325 Differential Revision: https://reviews.llvm.org/D41618 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321656 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-02	[AArch64][GlobalISel] Enable GlobalISel at -O0 by default	Amara Emerson
	Tests updated to explicitly use fast-isel at -O0 instead of implicitly. This change also allows an explicit -fast-isel option to override an implicitly enabled global-isel. Otherwise -fast-isel would have no effect at -O0. Differential Revision: https://reviews.llvm.org/D41362 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321655 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-02	[Hexagon] Fix generation of vector sign extensions	Krzysztof Parzyszek
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321650 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-02	[AArch64][AsmParser] Add isScalarReg() and repurpose isReg()	Sander de Smalen
	Summary: isReg() in AArch64AsmParser.cpp is a bit of a misnomer, and would be better named 'isScalarReg()' instead. Patch [1/3] in a series to add operand constraint checks for SVE's predicated ADD/SUB. Reviewers: rengolin, mcrosier, evandro, fhahn, echristo Reviewed By: fhahn Subscribers: aemerson, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D41445 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321646 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-02	Strip trailing whitespace. NFCI	Simon Pilgrim
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321644 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-02	[RISCV] Add Defs Uses information for c.jal and c.addi4spn	Alex Bradbury
	Differential Revision: https://reviews.llvm.org/D41339 Patch by Shiva Chen. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321643 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-02	[RISCV][NFC] Resolve unused variable warning in RISCVISelLowering	Alex Bradbury
	XLenVT in LowerFormalArguments is used only in an assert. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321642 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-01	[X86] Promote vXi1 fp_to_uint/fp_to_sint to vXi32 to avoid scalarization.	Craig Topper
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321632 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-01	[X86] Replace custom lowering of vXi1 SINT_TO_FP/UINT_TO_FP with promotion.	Craig Topper
	The custom lowering was just doing the same thing promotion would do. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321630 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-01	[SelectionDAG][X86][AArch64] Require targets to specify the promotion type ↵	Craig Topper
	when using setOperationAction Promote for INT_TO_FP and FP_TO_INT Currently the promotion for these ignores the normal getTypeToPromoteTo and instead just tries to double the element width. This is because the default behavior of getTypeToPromote to just adds 1 to the SimpleVT, which has the affect of increasing the element count while keeping the scalar size the same. If multiple steps are required to get to a legal operation type, int_to_fp will be promoted multiple times. And fp_to_int will keep trying wider types in a loop until it finds one that works. getTypeToPromoteTo does have the ability to query a promotion map to get the type and not do the increasing behavior. It seems better to just let the target specify the promotion type in the map explicitly instead of letting the legalizer iterate via widening. FWIW, it's worth I think for any other vector operations that need to be promoted, we have to specify the type explicitly because the default behavior of getTypeToPromote isn't useful for vectors. The other types of promotion already require either the element count is constant or the total vector width is constant, but neither happens by incrementing the SimpleVT enum. Differential Revision: https://reviews.llvm.org/D40664 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321629 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-01	[X86] In LowerTruncateVecI1, don't add SHL if the input is known to be all ↵	Craig Topper
	sign bits. If the input is all sign bits then the LSB through MSB are all the same so we don't need to be move the LSB to the MSB. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321617 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-01	[X86] Add missing NoVLX predicate around some patterns that use zmm ↵	Craig Topper
	registers to implement 128/256-bit operations without VLX. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321613 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-01	[X86] Add patterns for using zmm registers for v8i32/v8f32 vselect with the ↵	Craig Topper
	false input being zero. We can use zmm move with zero masking for this. We already had patterns for using a masked move, but we didn't check for the zero masking case separately. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321612 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86] Use CONCAT_VECTORS instead of INSERT_SUBVECTOR for padding v4i1/v2i1 ↵	Craig Topper
	vector to v8i1 pre-legalize. The CONCAT_VECTORS will be lowered to INSERT_SUBVECTOR later. In the modified cases this seems to be enough to trick a later DAG combine into running in a different order than allows the ANDs to be removed. I'll admit this is a bit of a hack that happens to work, but using CONCAT_VECTORS is more consistent with other legalization code anyway. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321611 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86][AVX2] Combine extract(broadcast(scalar_value)) --> scalar_value	Simon Pilgrim
	As it has a scalar source we don't treat it as a target shuffle so needs special handling. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321610 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86][SSE] Don't vectorize splat buildvector of binops (PR30780)	Simon Pilgrim
	Don't combine buildvector(binop(),binop(),binop(),binop()) -> binop(buildvector(), buildvector()) if its a splat - keep the binop scalar and just splat the result to avoid large vector constants. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321607 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86] Add a DAG combine to widen (i4 (bitcast (v4i1))) before type ↵	Craig Topper
	legalization sees the i4 and changes to load/store. Same for v2i1 and i2. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321602 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86] Add a DAG combine to fix (v4i1 (bitcast (i4))) before type ↵	Craig Topper
	legalization sees the i4 and changes to load/store. Same for i2 and v2i1. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321601 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86] Prevent combining (v8i1 (bitconvert (i8 load)))->(v8i1 load) if we ↵	Craig Topper
	don't have DQI. We end up using an i8 load via an isel pattern from v8i1 anyway. This just makes it more explicit. This seems to improve codgen in some cases and I'd like to kill off some of the load patterns. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321598 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86] Remove patterns for load/store of vXi with bitcasts to/from integer.	Craig Topper
	This is better handled by a DAG combine if its not already being done. No lit tests fail from the removal of these patterns. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321597 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86] Remove AND32ri8 from pattern for v1i1 load.	Craig Topper
	I don't think anything would actually expect the other bits to be zero. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321596 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86] Fix a crash when returning a <1 x i1> value>	Craig Topper
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321595 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86] Cleanup store splitting in LowerTruncatingStore	Craig Topper
	Use getMemBasePlusOffset and calculate proper pointer info and alignment for the second store. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321594 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-30	Use phi ranges to simplify code. No functionality change intended.	Benjamin Kramer
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321585 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-30	[PowerPC] fix a bug in TCO eligibility check	Hiroshi Inoue
	If the callee and caller use different calling convensions, we cannot apply TCO if the callee requires arguments on stack; e.g. C calling convention and Fast CC use the same registers for parameter passing, but the stack offset is not necessarily same. This patch also recommit r319218 "[PowerPC] Allow tail calls of fastcc functions from C CallingConv functions." by @sfertile since the problem reported in r320106 should be fixed. Differential Revision: https://reviews.llvm.org/D40893 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321579 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-30	[X86] Remove isel patterns for kshifts with types that don't support kshift ↵	Craig Topper
	natively. We should only be creating natively supported kshifts now. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321577 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-30	[X86] Custom legalize vXi1 extract_subvector with KSHIFTR.	Craig Topper
	This allows us to remove some isel patterns. This is mostly NFC, but we now use KSHIFTB instead of KSHIFTW with DQI. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321576 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-29	[mips] Provide correct descriptions of asm constraints in the comments. NFC	Simon Atanasyan
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321566 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-29	[mips] Replace assert by an error message	Simon Atanasyan
	Initially, if the `c` constraint applied to the wrong data type that causes LLVM to assert. This commit replaces the assert by an error message. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321565 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-29	AMDGPU: Use unique PSVs for buffer resources	Matt Arsenault
	Also fixes using the wrong memory type for some intrinsics when custom lowering them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321557 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-29	AMDGPU: Remove mayLoad/hasSideEffects from MIMG stores	Matt Arsenault
	Atomics still have hasSideEffects set on them because of the mess that is the memory properties. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321556 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-29	AMDGPU: Implement getTgtMemIntrinsic for images	Matt Arsenault
	Currently all images are lowered to have a single image PseudoSourceValue. Image stores happen to have overly strict mayLoad/mayStore/hasSideEffects flags set on them, so this happens to work. When these are fixed to be correct, the scheduler breaks this because the identical PSVs are assumed to be the same address. These need to be unique to the image resource value. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321555 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-29	[X86][SSE] Match PSHUFLW/PSHUFHW + PSHUFD vXi16 shuffle patterns (PR34686)	Simon Pilgrim
	As noted in PR34686, we are relying on a PSHUFD+PSHUFLW+PSHUFHW shuffle chain for most general vXi16 unary shuffles. This patch checks for simpler PSHUFLW+PSHUFD and PSHUFHW+PSHUFD cases beforehand, building on some existing code that just handled splat shuffles. By doing so we also prevent premature use of PSHUFB shuffles which can be slower and require the creation/loading of constant shuffle masks. We now have the 'fast-variable-shuffle' option for hardware that prefers combining 2 or more shuffles to VPSHUFB etc. Differential Revision: https://reviews.llvm.org/D38318 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321553 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-29	[AMDGPU][MC] Incorrect parsing of flat/global atomic modifiers	Dmitry Preobrazhensky
	See bug 35730: https://bugs.llvm.org/show_bug.cgi?id=35730 Differential Revision: https://reviews.llvm.org/D41598 Reviewers: vpykhtin, artem.tamazov, arsenm git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321552 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-29	[PowerPC] Fix for PR35688 - handle out-of-range values for r+r to r+i conversion	Nemanja Ivanovic
	Revision 320791 introduced a pass that transforms reg+reg instructions to reg+imm if they're fed by "load immediate". However, it didn't handle out-of-range shifts correctly as reported in PR35688. This patch fixes that and therefore the PR. Furthermore, there was undefined behaviour in the patch where the RHS of an initialization expression was 32 bits and constant `1` was shifted left 32 bits. This was fixed by ensuring the RHS is 64 bits just like the LHS. Differential Revision: https://reviews.llvm.org/D41369 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321551 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-29	Fix incorrect operand sizes for some MMX instructions: punpcklwd, punpcklbw ↵	Andrew V. Tischenko
	and punpckldq. Differential Revision: https://reviews.llvm.org/D41595 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321549 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28	[X86] When lowering extending loads from v2i1/v4i1, if we have VLX, use a ↵	Craig Topper
	narrower extend. Previously we used an extend from v8i1 to v8i32/v8i64. Then extracted to the final width. But if we have VLX we should extract first. This way we don't end up with an overly large extend. This allows us to use vcmpeq to make all ones for the sign extend when DQI isn't available. Otherwise we get a VPTERNLOG. If we make v2i1/v4i1 legal like proposed in D41560, we could always do this and rely on the lowering of the extend to widen when necessary. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321538 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28	[X86] Use ISD::CONCAT_VECTORS when splitting 256-bit loads in combineLoad.	Craig Topper
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321537 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28	[X86] Fix inconsistencies in different places where we split loads/stores.	Craig Topper
	-Use MinAlign instead of std::min. -Use SelectionDAG::getMemBasePlusOffset. -Apply offset to the pointer info for the second load/store created. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321536 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28	[X86] Emit ISD::TRUNCATE instead of X86ISD::VTRUNC from ↵	Craig Topper
	LowerZERO_EXTEND_Mask/LowerSIGN_EXTEND_Mask. The truncate will be lowered X86ISD::VTRUNC later. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321534 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28	[X86] Remove unnecessary patterns for sign extending vXi1 without VLX.	Craig Topper
	The custom lowering already widens the result type to 512-bits if VLX isn't supported. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321533 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28	[WinEH] Don't emit state stores or EH thunks for available_externally functions	Reid Kleckner
	The exception handler thunk needs to reference the LSDA of the parent function, which won't be emitted if it's available_externally. Fixes PR35736. ThinLTO ends up producing available_externally functions that use _CxxFrameHandler3. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321532 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28	Avoid int to string conversion in Twine or raw_ostream contexts.	Benjamin Kramer
	Some output changes from uppercase hex to lowercase hex, no other functionality change intended. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321526 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28	[X86][SSE] Use PMADDWD for v4i32 multiplies with 17 or more leading zeros	Simon Pilgrim
	If there are 17 or more leading zeros to the v4i32 elements, then we can use PMADD for the integer multiply when PMULLD is unavailable or slow. The 17 bits need to be zero as the PMADDWD performs a v8i16 signed-mul-extend + pairwise-add - the upper 16 so we're adding a zero pair and the 17th bit so we don't incorrectly sign extend. Differential Revision: https://reviews.llvm.org/D41484 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321516 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-27	[X86] Add CLWB to icelake.	Craig Topper
	Per Table 1-1 in October 2017 edition of Intel® Architecture Instruction Set Extensions and Future Features git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321501 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-27	[X86] Reimplement r321437 using custom lowering instead of as a DAG combine.	Craig Topper
	My original implementation ran as a DAG combine post type legalization, but it turns out we don't run that DAG combine step if type legalization didn't change anything. Attempts to make the combine run before type legalization as well hit other issues. So just do it in LowerMUL where we can catch more cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321496 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-27	[AArch64] Change order of candidate FMLS patterns	Matthew Simpson
	r319980 added new patterns to the machine combiner for transforming (fsub (fmul x y) z) into (fmla (fneg z) x y). That is, fsub's where the first source operand is an fmul are transformed. We previously only matched the case where the second source operand of an fsub was an fmul, transforming (fsub z (fmul x y)) into (fmls z x y). Now, if we have an fsub where both source operands are fmuls, both of the above patterns are applicable. However, the order in which we add the patterns to the list of candidates determines the transformation that takes place, since only the first pattern that matches will be used. This patch changes the order these two patterns are added to the list of candidates such that we prefer the case where the second source operand is an fmul (the fmls case), rather than the other one (the fmla/fneg case). When both source operands are fmuls, this ordering results in fewer instructions. Differential Revision: https://reviews.llvm.org/D41587 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321491 91177308-0d34-0410-b5e6-96231b3b80d8