ampere-computing/llvm.git - LLVM including Ampere Computing toolchain specific patches

Age	Commit message (Collapse)	Author
2018-03-01	Merging r326393:	Hans Wennborg
	------------------------------------------------------------------------ r326393 \| ctopper \| 2018-03-01 01:08:38 +0100 (Thu, 01 Mar 2018) \| 5 lines [X86] Make sure we don't combine (fneg (fma X, Y, Z)) to a target specific node when there are no FMA instructions. This would cause a 'cannot select' error at isel when we should have emitted a lib call and an xor. Fixes PR36553. ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@326423 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-21	Merging r325654:	Hans Wennborg
	------------------------------------------------------------------------ r325654 \| ctopper \| 2018-02-21 01:15:48 +0100 (Wed, 21 Feb 2018) \| 10 lines [X86] Disable CLWB for Cannon Lake Cannon Lake does not support CLWB, therefore it does not include all features listed under SKX anymore. Instead, enumerate all SKX features with the exception of CLWB. Patch by Gabor Buella Differential Revision: https://reviews.llvm.org/D43380 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325671 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-14	Revert r319778 (and r319911) due to PR36357	Hans Wennborg
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325112 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-14	Merging r324576:	Hans Wennborg
	------------------------------------------------------------------------ r324576 \| ctopper \| 2018-02-08 08:45:55 +0100 (Thu, 08 Feb 2018) \| 20 lines [X86] Don't emit KTEST instructions unless only the Z flag is being used Summary: KTEST has weird flag behavior. The Z flag is set for all bits in the AND of the k-registers being 0, and the C flag is set for all bits being 1. All other flags are cleared. We currently emit this instruction in EmitTEST and don't check the condition code. This can lead to strange things like using the S flag after a KTEST for a signed compare. The domain reassignment pass can also transform TEST instructions into KTEST and is not protected against the flag usage either. For now I've disabled this part of the domain reassignment pass. I tried to comment out the checks in the mir test so that we could recover them later, but I couldn't figure out how to get that to work. This patch moves the KTEST handling into LowerSETCC and now creates a ktest+x86setcc. I've chosen this approach because I'd like to add support for the C flag for all ones in a followup patch. To do that requires that I can rewrite the condition code going in the x86setcc to be different than the original SETCC condition code. This fixes PR36182. I'll file a PR to fix domain reassignment once this goes in. Should this be merged to 6.0? Reviewers: spatel, guyblank, RKSimon, zvi Reviewed By: guyblank Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42770 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325106 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-14	Merging r324497:	Hans Wennborg
	------------------------------------------------------------------------ r324497 \| ctopper \| 2018-02-07 19:32:15 +0100 (Wed, 07 Feb 2018) \| 1 line [X86] Regenerate test using update_mir_test_checks.py. NFC ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325105 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-14	Merging r325049:	Reid Kleckner
	------------------------------------------------------------------------ r325049 \| rnk \| 2018-02-13 12:47:49 -0800 (Tue, 13 Feb 2018) \| 17 lines [X86] Use EDI for retpoline when no scratch regs are left Summary: Instead of solving the hard problem of how to pass the callee to the indirect jump thunk without a register, just use a CSR. At a call boundary, there's nothing stopping us from using a CSR to hold the callee as long as we save and restore it in the prologue. Also, add tests for this mregparm=3 case. I wrote execution tests for __llvm_retpoline_push, but they never got committed as lit tests, either because I never rewrote them or because they got lost in merge conflicts. Reviewers: chandlerc, dwmw2 Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D43214 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325084 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-14	Merging r324645:	Reid Kleckner
	------------------------------------------------------------------------ r324645 \| dwmw2 \| 2018-02-08 12:06:05 -0800 (Thu, 08 Feb 2018) \| 5 lines [X86] Support 'V' register operand modifier This allows the register name to be printed without the leading '%'. This can be used for emitting calls to the retpoline thunks from inline asm. ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325083 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-14	Merging r324449:	Reid Kleckner
	------------------------------------------------------------------------ r324449 \| chandlerc \| 2018-02-06 22:16:24 -0800 (Tue, 06 Feb 2018) \| 15 lines [x86/retpoline] Make the external thunk names exactly match the names that happened to end up in GCC. This is really unfortunate, as the names don't have much rhyme or reason to them. Originally in the discussions it seemed fine to rely on aliases to map different names to whatever external thunk code developers wished to use but there are practical problems with that in the kernel it turns out. And since we're discovering this practical problems late and since GCC has already shipped a release with one set of names, we are forced, yet again, to blindly match what is there. Somewhat rushing this patch out for the Linux kernel folks to test and so we can get it patched into our releases. Differential Revision: https://reviews.llvm.org/D42998 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325082 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-05	Merging r324002:	Hans Wennborg
	------------------------------------------------------------------------ r324002 \| ctopper \| 2018-02-01 21:48:50 +0100 (Thu, 01 Feb 2018) \| 7 lines [DAGCombiner] When folding (insert_subvector undef, (bitcast (extract_subvector N1, Idx)), Idx) -> (bitcast N1) make sure that N1 has the same total size as the original output We were only checking the element count, but not the total width. This could cause illegal bitcasts to be created if for example the output was 512-bits, but N1 is 256 bits, and the extraction size was 128-bits. Fixes PR36199 Differential Revision: https://reviews.llvm.org/D42809 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324216 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-02	Merging r323915:	Hans Wennborg
	------------------------------------------------------------------------ r323915 \| chandlerc \| 2018-01-31 21:56:37 +0100 (Wed, 31 Jan 2018) \| 17 lines [x86] Make the retpoline thunk insertion a machine function pass. Summary: This removes the need for a machine module pass using some deeply questionable hacks. This should address PR36123 which is a case where in full LTO the memory usage of a machine module pass actually ended up being significant. We should revert this on trunk as soon as we understand and fix the memory usage issue, but we should include this in any backports of retpolines themselves. Reviewers: echristo, MatzeB Subscribers: sanjoy, mcrosier, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D42726 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324071 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-02	Merging r323155:	Hans Wennborg
	------------------------------------------------------------------------ r323155 \| chandlerc \| 2018-01-22 23:05:25 +0100 (Mon, 22 Jan 2018) \| 133 lines Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took in the speculative execution, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` in addition to `-mretpoline`. In this case, on x86-64 thu thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile all libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually apply similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we strongly recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324067 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-30	Merging r323672: (test-case re-generated)	Hans Wennborg
	------------------------------------------------------------------------ r323672 \| ctopper \| 2018-01-29 18:56:57 +0100 (Mon, 29 Jan 2018) \| 5 lines [X86] Don't create SHRUNKBLEND when the condition is used by the true or false operand of the vselect. Fixes PR34592. Differential Revision: https://reviews.llvm.org/D42628 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323743 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-24	Merging r323190:	Hans Wennborg
	------------------------------------------------------------------------ r323190 \| rksimon \| 2018-01-23 12:39:06 +0100 (Tue, 23 Jan 2018) \| 5 lines [X86][SSE] LowerBUILD_VECTORAsVariablePermute - fix PSHUFB source/index operand ordering As detailed in rL317463, PSHUFB (like most variable shuffle instructions) uses Op[0] for the source vector and Op[1] for the shuffle index vector, VPERMV works in reverse which is probably where the confusion comes from. Differential Revision: https://reviews.llvm.org/D42380 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323335 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-18	Merging r322644:	Hans Wennborg
	------------------------------------------------------------------------ r322644 \| d0k \| 2018-01-17 05:01:06 -0800 (Wed, 17 Jan 2018) \| 7 lines [X86] Don't mutate shuffle arguments after early-out for AVX512 The match* functions have the annoying behavior of modifying its inputs. Save and restore the inputs, just in case the early out for AVX512 is hit. This is still not great and its only a matter of time this kind of bug happens again, but I couldn't come up with a better pattern without rewriting significant chunks of this code. Fixes PR35977. ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322840 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-18	Merging r322724:	Hans Wennborg
	------------------------------------------------------------------------ r322724 \| ctopper \| 2018-01-17 10:46:01 -0800 (Wed, 17 Jan 2018) \| 7 lines [X86] When legalizing (v64i1 select i8, v64i1, v64i1) make sure not to introduce bitcasts to i64 in 32-bit mode We legalize selects of masks with scalar conditions using a bitcast to an integer type. But if we are in 32-bit mode we can't convert v64i1 to i64. So instead split the v64i1 to v32i1 and concat it back together. Each half will then be legalized by bitcasting to i32 which is fine. The test case is a little indirect. If we have the v64i1 select in IR it will get legalized by legalize vector ops which has a run of type legalization after it. That type legalization run is able to fix this i64 bitcast. So in order to avoid that we need a build_vector of a splat which legalize vector ops will ignore. Legalize DAG will then turn that into a select via LowerBUILD_VECTORvXi1. And the select will get legalized. In this case there is no type legalizer run to cleanup the bitcast. This fixes pr35972. ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322835 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-17	Merging r322223:	Hans Wennborg
	------------------------------------------------------------------------ r322223 \| matze \| 2018-01-10 12:49:57 -0800 (Wed, 10 Jan 2018) \| 5 lines TargetLoweringBase: The ios simulator has no bzero function. Make sure I really get back to the beahvior before my rewrite in r321035 which turned out not to be completely NFC as I changed the behavior for the ios simulator environment. ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322681 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-17	Merging r322272:	Hans Wennborg
	------------------------------------------------------------------------ r322272 \| zvi \| 2018-01-11 04:26:52 -0800 (Thu, 11 Jan 2018) \| 15 lines X86: Fix LowerBUILD_VECTORAsVariablePermute for case Src is smaller than Indices Summary: As RKSimon suggested in pr35820, in the case that Src is smaller in bit-size than Indices, need to widen Src to avoid type mismatch. Fixes pr35820 Reviewers: RKSimon, craig.topper Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41865 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322679 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-17	Merging r321791 and r321862:	Hans Wennborg
	------------------------------------------------------------------------ r321791 \| sam_parker \| 2018-01-04 01:42:27 -0800 (Thu, 04 Jan 2018) \| 4 lines [X86] Codegen test for PR37563 Adding test to ease review of D41628. ------------------------------------------------------------------------ ------------------------------------------------------------------------ r321862 \| sam_parker \| 2018-01-05 00:47:23 -0800 (Fri, 05 Jan 2018) \| 10 lines [DAGCombine] Fix for PR37563 While searching for loads to be narrowed, equal sized loads were not added to the list, resulting in anyext loads not being converted to zext loads. https://bugs.llvm.org/show_bug.cgi?id=35763 Differential Revision: https://reviews.llvm.org/D41628 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322671 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-17	Merging r321991:	Hans Wennborg
	------------------------------------------------------------------------ r321991 \| sam_parker \| 2018-01-08 05:21:24 -0800 (Mon, 08 Jan 2018) \| 9 lines [DAGCombine] Fix for PR35761 I had falsely assumed that constant operands would be operand(1) of the bin ops that may need their constant operand to be masked. Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=35761 Differential Revision: https://reviews.llvm.org/D41667 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322670 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-17	Merging r322623:	Hans Wennborg
	------------------------------------------------------------------------ r322623 \| avt77 \| 2018-01-17 02:12:06 -0800 (Wed, 17 Jan 2018) \| 3 lines Allow usage of X86-prefixes as separate instrs. Differential Revision: https://reviews.llvm.org/D42102 ------------------------------------------------------------------------ git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322654 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-02	Handle the case of live 16-bit subregisters in X86FixupBWInsts	Andrew Kaylor
	Differential Revision: https://reviews.llvm.org/D40524 Change-Id: Ie3a405b28503ceae999f5f3ba07a68fa733a2400 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321674 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-02	[x86] allow pairs of PCMPEQ for vector-sized integer equality comparisons ↵	Sanjay Patel
	(PR33325) This is an extension of D31156 with the goal that we'll allow memcmp() == 0 expansion for x86 to use 2 pairs of loads per block. The memcmp expansion pass (formerly part of CGP) will generate this kind of pattern with oversized integer compares, so we want to transform these into x86-specific vector nodes before legalization splits things into scalar chunks. See PR33325 for more details: https://bugs.llvm.org/show_bug.cgi?id=33325 Differential Revision: https://reviews.llvm.org/D41618 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321656 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-02	[DAGCombine] Fix for PR35765	Sam Parker
	Remove the acceptance of ANY_EXTEND nodes while trying to move and nodes back to loads. Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=35765 Differential Revision: https://reviews.llvm.org/D41625 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321641 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-02	[X86] Codegen test for pr35765	Sam Parker
	Committing reproducer test for pr35765, fix to follow. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321640 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-02	[SelectionDAG] Teach WidenVecOp_Convert to widen the operation if a widened ↵	Craig Topper
	result type would still be legal. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321638 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-01	[X86] Promote vXi1 fp_to_uint/fp_to_sint to vXi32 to avoid scalarization.	Craig Topper
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321632 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-01	[X86] Add test cases for vXi1 fptosi/fptoui.	Craig Topper
	Currently we do a lot of scalarization in these test cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321631 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-01	[x86] add runs for more vector variants; NFC	Sanjay Patel
	Preliminary step to see what the effects of D41618 look like. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321624 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-01	[X86][SSE] Add test case from PR32160	Simon Pilgrim
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321620 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-01	[X86] Regenerate test checks in sse-intrinsics-x86-upgrade with update-llc	Uriel Korach
	Removing outdated checks. NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321619 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-01	[X86] Regenerate test checks in sse2-intrinsics-x86-upgrade with update-llc	Uriel Korach
	Removing outdated checks. NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321618 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-01	[X86] In LowerTruncateVecI1, don't add SHL if the input is known to be all ↵	Craig Topper
	sign bits. If the input is all sign bits then the LSB through MSB are all the same so we don't need to be move the LSB to the MSB. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321617 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-01	[X86] Add patterns for using zmm registers for v8i32/v8f32 vselect with the ↵	Craig Topper
	false input being zero. We can use zmm move with zero masking for this. We already had patterns for using a masked move, but we didn't check for the zero masking case separately. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321612 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86] Use CONCAT_VECTORS instead of INSERT_SUBVECTOR for padding v4i1/v2i1 ↵	Craig Topper
	vector to v8i1 pre-legalize. The CONCAT_VECTORS will be lowered to INSERT_SUBVECTOR later. In the modified cases this seems to be enough to trick a later DAG combine into running in a different order than allows the ANDs to be removed. I'll admit this is a bit of a hack that happens to work, but using CONCAT_VECTORS is more consistent with other legalization code anyway. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321611 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86][AVX2] Combine extract(broadcast(scalar_value)) --> scalar_value	Simon Pilgrim
	As it has a scalar source we don't treat it as a target shuffle so needs special handling. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321610 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86][AVX] Add test case from PR33740	Simon Pilgrim
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321608 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86][SSE] Don't vectorize splat buildvector of binops (PR30780)	Simon Pilgrim
	Don't combine buildvector(binop(),binop(),binop(),binop()) -> binop(buildvector(), buildvector()) if its a splat - keep the binop scalar and just splat the result to avoid large vector constants. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321607 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86] Add a DAG combine to widen (i4 (bitcast (v4i1))) before type ↵	Craig Topper
	legalization sees the i4 and changes to load/store. Same for v2i1 and i2. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321602 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86] Add a DAG combine to fix (v4i1 (bitcast (i4))) before type ↵	Craig Topper
	legalization sees the i4 and changes to load/store. Same for i2 and v2i1. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321601 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86] Prevent combining (v8i1 (bitconvert (i8 load)))->(v8i1 load) if we ↵	Craig Topper
	don't have DQI. We end up using an i8 load via an isel pattern from v8i1 anyway. This just makes it more explicit. This seems to improve codgen in some cases and I'd like to kill off some of the load patterns. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321598 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86] Remove AND32ri8 from pattern for v1i1 load.	Craig Topper
	I don't think anything would actually expect the other bits to be zero. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321596 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-31	[X86] Fix a crash when returning a <1 x i1> value>	Craig Topper
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321595 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-30	[X86][SSE] Add PR30780 test cases	Simon Pilgrim
	Broadcast of sign/zero extended scalars resulting in unnecessary vector constants git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321584 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-30	[X86][SSE] Add test for (v2f32 uitofp(build_vector(i32, i32))) (PR35732)	Simon Pilgrim
	To compare against (v2f32 build_vector(f32 uitofp(i32), f32 uitofp(i32))) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321583 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-30	[X86] Custom legalize vXi1 extract_subvector with KSHIFTR.	Craig Topper
	This allows us to remove some isel patterns. This is mostly NFC, but we now use KSHIFTB instead of KSHIFTW with DQI. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321576 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-29	[X86][SSE] Match PSHUFLW/PSHUFHW + PSHUFD vXi16 shuffle patterns (PR34686)	Simon Pilgrim
	As noted in PR34686, we are relying on a PSHUFD+PSHUFLW+PSHUFHW shuffle chain for most general vXi16 unary shuffles. This patch checks for simpler PSHUFLW+PSHUFD and PSHUFHW+PSHUFD cases beforehand, building on some existing code that just handled splat shuffles. By doing so we also prevent premature use of PSHUFB shuffles which can be slower and require the creation/loading of constant shuffle masks. We now have the 'fast-variable-shuffle' option for hardware that prefers combining 2 or more shuffles to VPSHUFB etc. Differential Revision: https://reviews.llvm.org/D38318 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321553 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28	[x86] add tests for potential memcmp expansion (PR33325); NFC	Sanjay Patel
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321542 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28	[X86] When lowering extending loads from v2i1/v4i1, if we have VLX, use a ↵	Craig Topper
	narrower extend. Previously we used an extend from v8i1 to v8i32/v8i64. Then extracted to the final width. But if we have VLX we should extract first. This way we don't end up with an overly large extend. This allows us to use vcmpeq to make all ones for the sign extend when DQI isn't available. Otherwise we get a VPTERNLOG. If we make v2i1/v4i1 legal like proposed in D41560, we could always do this and rely on the lowering of the extend to widen when necessary. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321538 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28	[WinEH] Don't emit state stores or EH thunks for available_externally functions	Reid Kleckner
	The exception handler thunk needs to reference the LSDA of the parent function, which won't be emitted if it's available_externally. Fixes PR35736. ThinLTO ends up producing available_externally functions that use _CxxFrameHandler3. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321532 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28	Fix tests after move to utohexstr.	Benjamin Kramer
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321527 91177308-0d34-0410-b5e6-96231b3b80d8