ampere-computing/compiler-rt.git - compiler-rt including Ampere Computing toolchain specific patches

Age	Commit message (Collapse)	Author
2017-12-08	[scudo] Minor code generation improvement	Kostya Kortchinsky
	Summary: It looks like clang was generating somewhat weird assembly with the current code. `FromPrimary`, even though `const`, was replaced every time with the code generated for `size <= SizeClassMap::kMaxSize` instead of using a variable or register, and `FromPrimary` didn't induce `ClassId != 0` for the compiler, so a dead branch was generated for `getActuallyAllocatedSize(Ptr, ClassId)` since it's never called for `ClassId = 0` (Secondary backed allocations) [this one was more wishful thinking on my side than anything else]. I rearranged the code bit so that the generated assembly is less clunky. Also changed 2 whitespace inconsistencies that were bothering me. Reviewers: alekseyshl, flowerhack Reviewed By: flowerhack Subscribers: llvm-commits, #sanitizers Differential Revision: https://reviews.llvm.org/D40976 git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@320160 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-05	[scudo] Get rid of the thread local PRNG & header salt	Kostya Kortchinsky
	Summary: It was deemed that the salt in the chunk header didn't improve security significantly (and could actually decrease it). The initial idea was that the same chunk would different headers on different allocations, allowing for less predictability. The issue is that gathering the same chunk header with different salts can give information about the other "secrets" (cookie, pointer), and that if an attacker leaks a header, they can reuse it anyway for that same chunk anyway since we don't enforce the salt value. So we get rid of the salt in the header. This means we also get rid of the thread local Prng, and that we don't need a global Prng anymore as well. This makes everything faster. We reuse those 8 bits to store the `ClassId` of a chunk now (0 for a secondary based allocation). This way, we get some additional speed gains: - `ClassId` is computed outside of the locked block; - `getActuallyAllocatedSize` doesn't need the `GetSizeClass` call; - same for `deallocatePrimary`; We add a sanity check at init for this new field (all sanity checks are moved in their own function, `init` was getting crowded). Reviewers: alekseyshl, flowerhack Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40796 git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@319791 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-22	[scudo] Overhaul hardware CRC32 feature detection	Kostya Kortchinsky
	Summary: This patch aims at condensing the hardware CRC32 feature detection and making it slightly more effective on Android. The following changes are included: - remove the `CPUFeature` enum, and get rid of one level of nesting of functions: we only used CRC32, so we just implement and use `hasHardwareCRC32`; - allow for a weak `getauxval`: the Android toolchain is compiled at API level 14 for Android ARM, meaning no `getauxval` at compile time, yet we will run on API level 27+ devices. The `/proc/self/auxv` fallback can work but is worthless for a process like `init` where the proc filesystem doesn't exist yet. If a weak `getauxval` doesn't exist, then fallback. - couple of extra corrections. Reviewers: alekseyshl Reviewed By: alekseyshl Subscribers: kubamracek, aemerson, srhines, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D40322 git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@318859 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-03	[scudo] Rearrange #include order	Kostya Kortchinsky
	Summary: To be compliant with https://llvm.org/docs/CodingStandards.html#include-style, system headers have to come after local headers. Reviewers: alekseyshl Reviewed By: alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39623 git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@317390 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-14	[sanitizers] Add a blocking boolean to GetRandom prototype	Kostya Kortchinsky
	Summary: On platforms with `getrandom`, the system call defaults to blocking. This becomes an issue in the very early stage of the boot for Scudo, when the RNG source is not set-up yet: the syscall will block and we'll stall. Introduce a parameter to specify that the function should not block, defaulting to blocking as the underlying syscall does. Update Scudo to use the non-blocking version. Reviewers: alekseyshl Reviewed By: alekseyshl Subscribers: llvm-commits, kubamracek Differential Revision: https://reviews.llvm.org/D36399 git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@310839 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-12	[scudo] PRNG makeover	Kostya Kortchinsky
	Summary: This follows the addition of `GetRandom` with D34412. We remove our `/dev/urandom` code and use the new function. Additionally, change the PRNG for a slightly faster version. One of the issues with the old code is that we have 64 full bits of randomness per "next", using only 8 of those for the Salt and discarding the rest. So we add a cached u64 in the PRNG that can serve up to 8 u8 before having to call the "next" function again. During some integration work, I also realized that some very early processes (like `init`) do not benefit from `/dev/urandom` yet. So if there is no `getrandom` syscall as well, we have to fallback to some sort of initialization of the PRNG. Now a few words on why XoRoShiRo and not something else. I have played a while with various PRNGs on 32 & 64 bit platforms. Some results are below. LCG 32 & 64 are usually faster but produce respectively 15 & 31 bits of entropy, meaning that to get a full 64-bit, you would need to call them several times. The simple XorShift is fast, produces 32 bits but is mediocre with regard to PRNG test suites, PCG is slower overall, and XoRoShiRo is faster than XorShift128+ and produces full 64 bits. %%% root@tulip-chiphd:/data # ./randtest.arm [+] starting xs32... [?] xs32 duration: 22431833053ns [+] starting lcg32... [?] lcg32 duration: 14941402090ns [+] starting pcg32... [?] pcg32 duration: 44941973771ns [+] starting xs128p... [?] xs128p duration: 48889786981ns [+] starting lcg64... [?] lcg64 duration: 33831042391ns [+] starting xos128p... [?] xos128p duration: 44850878605ns root@tulip-chiphd:/data # ./randtest.aarch64 [+] starting xs32... [?] xs32 duration: 22425151678ns [+] starting lcg32... [?] lcg32 duration: 14954255257ns [+] starting pcg32... [?] pcg32 duration: 37346265726ns [+] starting xs128p... [?] xs128p duration: 22523807219ns [+] starting lcg64... [?] lcg64 duration: 26141304679ns [+] starting xos128p... [?] xos128p duration: 14937033215ns %%% Reviewers: alekseyshl Reviewed By: alekseyshl Subscribers: aemerson, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35221 git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@307798 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-09	[scudo] CRC32 optimizations	Kostya Kortchinsky
	Summary: This change optimizes several aspects of the checksum used for chunk headers. First, there is no point in checking the weak symbol `computeHardwareCRC32` everytime, it will either be there or not when we start, so check it once during initialization and set the checksum type accordingly. Then, the loading of `HashAlgorithm` for SSE versions (and ARM equivalent) was not optimized out, while not necessary. So I reshuffled that part of the code, which duplicates a tiny bit of code, but ends up in a much cleaner assembly (and faster as we avoid an extraneous load and some calls). The following code is the checksum at the end of `scudoMalloc` for x86_64 with full SSE 4.2, before: ``` mov rax, 0FFFFFFFFFFFFFFh shl r10, 38h mov edi, dword ptr cs:_ZN7__scudoL6CookieE ; __scudo::Cookie and r14, rax lea rsi, [r13-10h] movzx eax, cs:_ZN7__scudoL13HashAlgorithmE ; __scudo::HashAlgorithm or r14, r10 mov rbx, r14 xor bx, bx call _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong) mov rsi, rbx mov edi, eax call _ZN7__scudo20computeHardwareCRC32Ejm ; __scudo::computeHardwareCRC32(uint,ulong) mov r14w, ax mov rax, r13 mov [r13-10h], r14 ``` After: ``` mov rax, cs:_ZN7__scudoL6CookieE ; __scudo::Cookie lea rcx, [rbx-10h] mov rdx, 0FFFFFFFFFFFFFFh and r14, rdx shl r9, 38h or r14, r9 crc32 eax, rcx mov rdx, r14 xor dx, dx mov eax, eax crc32 eax, rdx mov r14w, ax mov rax, rbx mov [rbx-10h], r14 ``` Reviewers: dvyukov, alekseyshl, kcc Reviewed By: alekseyshl Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D32971 git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@302538 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-27	[scudo] Move thread local variables into their own files	Kostya Kortchinsky
	Summary: This change introduces scudo_tls.h & scudo_tls_linux.cpp, where we move the thread local variables used by the allocator, namely the cache, quarantine cache & prng. `ScudoThreadContext` will hold those. This patch doesn't introduce any new platform support yet, this will be the object of a later patch. This also changes the PRNG so that the structure can be POD. Reviewers: kcc, dvyukov, alekseyshl Reviewed By: dvyukov, alekseyshl Subscribers: llvm-commits, mgorny Differential Revision: https://reviews.llvm.org/D32440 git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@301584 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-20	[scudo] Minor changes and refactoring	Kostya Kortchinsky
	Summary: This is part of D31947 that is being split into several smaller changes. This one deals with all the minor changes, more specifically: - Rename some variables and functions to make their purpose clearer; - Reorder some code; - Mark the hot termination incurring checks as `UNLIKELY`; if they happen, the program will die anyway; - Add a `getScudoChunk` method; - Add an `eraseHeader` method to ScudoChunk that will clear a header with 0s; - Add a parameter to `allocate` to know if the allocated chunk should be filled with zeros. This allows `calloc` to not have to call `GetActuallyAllocatedSize`; more changes to get rid of this function on the hot paths will follow; - reallocate was missing a check to verify that the pointer is properly aligned on `MinAlignment`; - The `Stats` in the secondary have to be protected by a mutex as the `Add` and `Sub` methods are actually not atomic; - The software CRC32 function was moved to the header to allow for inlining. Reviewers: dvyukov, alekseyshl, kcc Reviewed By: dvyukov Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32242 git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@300846 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-18	[scudo] Refactor of CRC32 and ARM runtime CRC32 detection	Kostya Kortchinsky
	Summary: ARM & AArch64 runtime detection for hardware support of CRC32 has been added via check of the AT_HWVAL auxiliary vector. Following Michal's suggestions in D28417, the CRC32 code has been further changed and looks better now. When compiled with full relro (which is strongly suggested to benefit from additional hardening), the weak symbol for computeHardwareCRC32 is read-only and the assembly generated is fairly clean and straight forward. As suggested, an additional optimization is to skip the runtime check if SSE 4.2 has been enabled globally, as opposed to only for scudo_crc32.cpp. scudo_crc32.h has no purpose anymore and was removed. Reviewers: alekseyshl, kcc, rengolin, mgorny, phosek Reviewed By: rengolin, mgorny Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D28574 git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@292409 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-10	[scudo] Separate hardware CRC32 routines	Kostya Kortchinsky
	Summary: As raised in D28304, enabling SSE 4.2 for the whole Scudo tree leads to the emission of SSE 4.2 instructions everywhere, while the runtime checks only applied to the CRC32 computing function. This patch separates the CRC32 function taking advantage of the hardware into its own file, and only enabled -msse4.2 for that file, if detected to be supported by the compiler. Another consequence of removing SSE4.2 globally is realizing that memcpy were not being optimized, which turned out to be due to the -fno-builtin in SANITIZER_COMMON_CFLAGS. So we now explicitely enable builtins for Scudo. The resulting assembly looks good, with some CALLs are introduced instead of the CRC32 code being inlined. Reviewers: kcc, mgorny, alekseyshl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28417 git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@291570 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-30	[scudo] 32-bit and hardware agnostic support	Kostya Kortchinsky
	Summary: This update introduces i386 support for the Scudo Hardened Allocator, and offers software alternatives for functions that used to require hardware specific instruction sets. This should make porting to new architectures easier. Among the changes: - The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b). - Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit. - Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation. - Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks. Reviewers: alekseyshl, kcc Subscribers: filcab, llvm-commits, tberghammer, danalbert, srhines, mgorny, modocache Differential Revision: https://reviews.llvm.org/D26358 git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@288255 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-02	[scudo] add NORETURN to the declaration of dieWithMessage; this should fix a ↵	Kostya Serebryany
	warning in lib/scudo/scudo_termination.cpp git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@277546 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-07	[sanitizer] Initial implementation of a Hardened Allocator	Kostya Serebryany
	Summary: This is an initial implementation of a Hardened Allocator based on Sanitizer Common's CombinedAllocator. It aims at mitigating heap based vulnerabilities by adding several features to the base allocator, while staying relatively fast. The following were implemented: - additional consistency checks on the allocation function parameters and on the heap chunks; - use of checksum protected chunk header, to detect corruption; - randomness to the allocator base; - delayed freelist (quarantine), to mitigate use after free and overall determinism. Additional mitigations are in the works. Reviewers: eugenis, aizatsky, pcc, krasin, vitalybuka, glider, dvyukov, kcc Subscribers: kubabrecka, filcab, llvm-commits Differential Revision: http://reviews.llvm.org/D20084 git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@271968 91177308-0d34-0410-b5e6-96231b3b80d8