Age | Commit message (Collapse) | Author |
|
------------------------------------------------------------------------
r326393 | ctopper | 2018-03-01 01:08:38 +0100 (Thu, 01 Mar 2018) | 5 lines
[X86] Make sure we don't combine (fneg (fma X, Y, Z)) to a target specific node when there are no FMA instructions.
This would cause a 'cannot select' error at isel when we should have emitted a lib call and an xor.
Fixes PR36553.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@326423 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r325654 | ctopper | 2018-02-21 01:15:48 +0100 (Wed, 21 Feb 2018) | 10 lines
[X86] Disable CLWB for Cannon Lake
Cannon Lake does not support CLWB, therefore it
does not include all features listed under SKX anymore.
Instead, enumerate all SKX features with the exception of CLWB.
Patch by Gabor Buella
Differential Revision: https://reviews.llvm.org/D43380
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325671 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325112 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r324576 | ctopper | 2018-02-08 08:45:55 +0100 (Thu, 08 Feb 2018) | 20 lines
[X86] Don't emit KTEST instructions unless only the Z flag is being used
Summary:
KTEST has weird flag behavior. The Z flag is set for all bits in the AND of the k-registers being 0, and the C flag is set for all bits being 1. All other flags are cleared.
We currently emit this instruction in EmitTEST and don't check the condition code. This can lead to strange things like using the S flag after a KTEST for a signed compare.
The domain reassignment pass can also transform TEST instructions into KTEST and is not protected against the flag usage either. For now I've disabled this part of the domain reassignment pass. I tried to comment out the checks in the mir test so that we could recover them later, but I couldn't figure out how to get that to work.
This patch moves the KTEST handling into LowerSETCC and now creates a ktest+x86setcc. I've chosen this approach because I'd like to add support for the C flag for all ones in a followup patch. To do that requires that I can rewrite the condition code going in the x86setcc to be different than the original SETCC condition code.
This fixes PR36182. I'll file a PR to fix domain reassignment once this goes in. Should this be merged to 6.0?
Reviewers: spatel, guyblank, RKSimon, zvi
Reviewed By: guyblank
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42770
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325106 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r324497 | ctopper | 2018-02-07 19:32:15 +0100 (Wed, 07 Feb 2018) | 1 line
[X86] Regenerate test using update_mir_test_checks.py. NFC
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325105 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r325049 | rnk | 2018-02-13 12:47:49 -0800 (Tue, 13 Feb 2018) | 17 lines
[X86] Use EDI for retpoline when no scratch regs are left
Summary:
Instead of solving the hard problem of how to pass the callee to the indirect
jump thunk without a register, just use a CSR. At a call boundary, there's
nothing stopping us from using a CSR to hold the callee as long as we save and
restore it in the prologue.
Also, add tests for this mregparm=3 case. I wrote execution tests for
__llvm_retpoline_push, but they never got committed as lit tests, either
because I never rewrote them or because they got lost in merge conflicts.
Reviewers: chandlerc, dwmw2
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D43214
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325084 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r324645 | dwmw2 | 2018-02-08 12:06:05 -0800 (Thu, 08 Feb 2018) | 5 lines
[X86] Support 'V' register operand modifier
This allows the register name to be printed without the leading '%'.
This can be used for emitting calls to the retpoline thunks from inline
asm.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325083 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r324449 | chandlerc | 2018-02-06 22:16:24 -0800 (Tue, 06 Feb 2018) | 15 lines
[x86/retpoline] Make the external thunk names exactly match the names
that happened to end up in GCC.
This is really unfortunate, as the names don't have much rhyme or reason
to them. Originally in the discussions it seemed fine to rely on aliases
to map different names to whatever external thunk code developers wished
to use but there are practical problems with that in the kernel it turns
out. And since we're discovering this practical problems late and since
GCC has already shipped a release with one set of names, we are forced,
yet again, to blindly match what is there.
Somewhat rushing this patch out for the Linux kernel folks to test and
so we can get it patched into our releases.
Differential Revision: https://reviews.llvm.org/D42998
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325082 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r324002 | ctopper | 2018-02-01 21:48:50 +0100 (Thu, 01 Feb 2018) | 7 lines
[DAGCombiner] When folding (insert_subvector undef, (bitcast (extract_subvector N1, Idx)), Idx) -> (bitcast N1) make sure that N1 has the same total size as the original output
We were only checking the element count, but not the total width. This could cause illegal bitcasts to be created if for example the output was 512-bits, but N1 is 256 bits, and the extraction size was 128-bits.
Fixes PR36199
Differential Revision: https://reviews.llvm.org/D42809
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324216 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r323915 | chandlerc | 2018-01-31 21:56:37 +0100 (Wed, 31 Jan 2018) | 17 lines
[x86] Make the retpoline thunk insertion a machine function pass.
Summary:
This removes the need for a machine module pass using some deeply
questionable hacks. This should address PR36123 which is a case where in
full LTO the memory usage of a machine module pass actually ended up
being significant.
We should revert this on trunk as soon as we understand and fix the
memory usage issue, but we should include this in any backports of
retpolines themselves.
Reviewers: echristo, MatzeB
Subscribers: sanjoy, mcrosier, mehdi_amini, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D42726
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324071 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r323155 | chandlerc | 2018-01-22 23:05:25 +0100 (Mon, 22 Jan 2018) | 133 lines
Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre..
Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processors
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.
The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.
However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.
On 64-bit x86 ABIs, this is especially easily done in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.
This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886
We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 thu thunk names must be:
```
__llvm_external_retpoline_r11
```
or on 32-bit:
```
__llvm_external_retpoline_eax
__llvm_external_retpoline_ecx
__llvm_external_retpoline_edx
__llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.
There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.
The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.
For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.
When manually apply similar transformations to `-mretpoline` to the
Linux kernel we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.
When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch, indirect-, or
virtual-call heavy we have seen overheads ranging from 10% to 50%.
However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.
We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.
This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.
Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D41723
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@324067 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r323672 | ctopper | 2018-01-29 18:56:57 +0100 (Mon, 29 Jan 2018) | 5 lines
[X86] Don't create SHRUNKBLEND when the condition is used by the true or false operand of the vselect.
Fixes PR34592.
Differential Revision: https://reviews.llvm.org/D42628
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323743 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r323190 | rksimon | 2018-01-23 12:39:06 +0100 (Tue, 23 Jan 2018) | 5 lines
[X86][SSE] LowerBUILD_VECTORAsVariablePermute - fix PSHUFB source/index operand ordering
As detailed in rL317463, PSHUFB (like most variable shuffle instructions) uses Op[0] for the source vector and Op[1] for the shuffle index vector, VPERMV works in reverse which is probably where the confusion comes from.
Differential Revision: https://reviews.llvm.org/D42380
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@323335 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r322644 | d0k | 2018-01-17 05:01:06 -0800 (Wed, 17 Jan 2018) | 7 lines
[X86] Don't mutate shuffle arguments after early-out for AVX512
The match* functions have the annoying behavior of modifying its inputs.
Save and restore the inputs, just in case the early out for AVX512 is
hit. This is still not great and its only a matter of time this kind of
bug happens again, but I couldn't come up with a better pattern without
rewriting significant chunks of this code. Fixes PR35977.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322840 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r322724 | ctopper | 2018-01-17 10:46:01 -0800 (Wed, 17 Jan 2018) | 7 lines
[X86] When legalizing (v64i1 select i8, v64i1, v64i1) make sure not to introduce bitcasts to i64 in 32-bit mode
We legalize selects of masks with scalar conditions using a bitcast to an integer type. But if we are in 32-bit mode we can't convert v64i1 to i64. So instead split the v64i1 to v32i1 and concat it back together. Each half will then be legalized by bitcasting to i32 which is fine.
The test case is a little indirect. If we have the v64i1 select in IR it will get legalized by legalize vector ops which has a run of type legalization after it. That type legalization run is able to fix this i64 bitcast. So in order to avoid that we need a build_vector of a splat which legalize vector ops will ignore. Legalize DAG will then turn that into a select via LowerBUILD_VECTORvXi1. And the select will get legalized. In this case there is no type legalizer run to cleanup the bitcast.
This fixes pr35972.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322835 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r322223 | matze | 2018-01-10 12:49:57 -0800 (Wed, 10 Jan 2018) | 5 lines
TargetLoweringBase: The ios simulator has no bzero function.
Make sure I really get back to the beahvior before my rewrite in r321035
which turned out not to be completely NFC as I changed the behavior for
the ios simulator environment.
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322681 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r322272 | zvi | 2018-01-11 04:26:52 -0800 (Thu, 11 Jan 2018) | 15 lines
X86: Fix LowerBUILD_VECTORAsVariablePermute for case Src is smaller than Indices
Summary:
As RKSimon suggested in pr35820, in the case that Src is smaller in
bit-size than Indices, need to widen Src to avoid type mismatch.
Fixes pr35820
Reviewers: RKSimon, craig.topper
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41865
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322679 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r321791 | sam_parker | 2018-01-04 01:42:27 -0800 (Thu, 04 Jan 2018) | 4 lines
[X86] Codegen test for PR37563
Adding test to ease review of D41628.
------------------------------------------------------------------------
------------------------------------------------------------------------
r321862 | sam_parker | 2018-01-05 00:47:23 -0800 (Fri, 05 Jan 2018) | 10 lines
[DAGCombine] Fix for PR37563
While searching for loads to be narrowed, equal sized loads were not
added to the list, resulting in anyext loads not being converted to
zext loads.
https://bugs.llvm.org/show_bug.cgi?id=35763
Differential Revision: https://reviews.llvm.org/D41628
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322671 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r321991 | sam_parker | 2018-01-08 05:21:24 -0800 (Mon, 08 Jan 2018) | 9 lines
[DAGCombine] Fix for PR35761
I had falsely assumed that constant operands would be operand(1) of
the bin ops that may need their constant operand to be masked.
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=35761
Differential Revision: https://reviews.llvm.org/D41667
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322670 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
------------------------------------------------------------------------
r322623 | avt77 | 2018-01-17 02:12:06 -0800 (Wed, 17 Jan 2018) | 3 lines
Allow usage of X86-prefixes as separate instrs.
Differential Revision: https://reviews.llvm.org/D42102
------------------------------------------------------------------------
git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322654 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Differential Revision: https://reviews.llvm.org/D40524
Change-Id: Ie3a405b28503ceae999f5f3ba07a68fa733a2400
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321674 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
(PR33325)
This is an extension of D31156 with the goal that we'll allow memcmp() == 0 expansion
for x86 to use 2 pairs of loads per block.
The memcmp expansion pass (formerly part of CGP) will generate this kind of pattern
with oversized integer compares, so we want to transform these into x86-specific vector
nodes before legalization splits things into scalar chunks.
See PR33325 for more details:
https://bugs.llvm.org/show_bug.cgi?id=33325
Differential Revision: https://reviews.llvm.org/D41618
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321656 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Remove the acceptance of ANY_EXTEND nodes while trying to move and
nodes back to loads.
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=35765
Differential Revision: https://reviews.llvm.org/D41625
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321641 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Committing reproducer test for pr35765, fix to follow.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321640 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
result type would still be legal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321638 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321632 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Currently we do a lot of scalarization in these test cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321631 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Preliminary step to see what the effects of D41618 look like.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321624 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321620 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Removing outdated checks.
NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321619 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Removing outdated checks.
NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321618 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
sign bits.
If the input is all sign bits then the LSB through MSB are all the same so we don't need to be move the LSB to the MSB.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321617 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
false input being zero.
We can use zmm move with zero masking for this. We already had patterns for using a masked move, but we didn't check for the zero masking case separately.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321612 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
vector to v8i1 pre-legalize.
The CONCAT_VECTORS will be lowered to INSERT_SUBVECTOR later. In the modified cases this seems to be enough to trick a later DAG combine into running in a different order than allows the ANDs to be removed.
I'll admit this is a bit of a hack that happens to work, but using CONCAT_VECTORS is more consistent with other legalization code anyway.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321611 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
As it has a scalar source we don't treat it as a target shuffle so needs special handling.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321610 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321608 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Don't combine buildvector(binop(),binop(),binop(),binop()) -> binop(buildvector(), buildvector()) if its a splat - keep the binop scalar and just splat the result to avoid large vector constants.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321607 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
legalization sees the i4 and changes to load/store.
Same for v2i1 and i2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321602 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
legalization sees the i4 and changes to load/store.
Same for i2 and v2i1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321601 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
don't have DQI.
We end up using an i8 load via an isel pattern from v8i1 anyway. This just makes it more explicit. This seems to improve codgen in some cases and I'd like to kill off some of the load patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321598 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
I don't think anything would actually expect the other bits to be zero.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321596 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321595 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Broadcast of sign/zero extended scalars resulting in unnecessary vector constants
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321584 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
To compare against (v2f32 build_vector(f32 uitofp(i32), f32 uitofp(i32)))
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321583 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
This allows us to remove some isel patterns.
This is mostly NFC, but we now use KSHIFTB instead of KSHIFTW with DQI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321576 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
As noted in PR34686, we are relying on a PSHUFD+PSHUFLW+PSHUFHW shuffle chain for most general vXi16 unary shuffles.
This patch checks for simpler PSHUFLW+PSHUFD and PSHUFHW+PSHUFD cases beforehand, building on some existing code that just handled splat shuffles.
By doing so we also prevent premature use of PSHUFB shuffles which can be slower and require the creation/loading of constant shuffle masks.
We now have the 'fast-variable-shuffle' option for hardware that prefers combining 2 or more shuffles to VPSHUFB etc.
Differential Revision: https://reviews.llvm.org/D38318
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321553 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321542 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
narrower extend.
Previously we used an extend from v8i1 to v8i32/v8i64. Then extracted to the final width. But if we have VLX we should extract first. This way we don't end up with an overly large extend.
This allows us to use vcmpeq to make all ones for the sign extend when DQI isn't available. Otherwise we get a VPTERNLOG.
If we make v2i1/v4i1 legal like proposed in D41560, we could always do this and rely on the lowering of the extend to widen when necessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321538 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
The exception handler thunk needs to reference the LSDA of the parent
function, which won't be emitted if it's available_externally.
Fixes PR35736. ThinLTO ends up producing available_externally functions
that use _CxxFrameHandler3.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321532 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321527 91177308-0d34-0410-b5e6-96231b3b80d8
|