ampere-computing/compiler-rt.git - compiler-rt including Ampere Computing toolchain specific patches

author	Dmitry Vyukov <dvyukov@google.com>	2017-07-14 11:30:06 +0000
committer	Dmitry Vyukov <dvyukov@google.com>	2017-07-14 11:30:06 +0000
commit	12d16901ebba5bc7185298e9efa3c4d265fa52ce (patch)
tree	99bb72b221ea02d6103e09b30010b77785156512 /test/scudo
parent	8a5e425a68de4d2c80ff00a97bbcb3722a4716da (diff)

tsan: optimize sync clock memory consumption

This change implements 2 optimizations of sync clocks that reduce memory consumption:

1. Use previously unused first level block space to store clock elements.
Currently a clock for 100 threads consumes 3 512-byte blocks:
2 64-bit second level blocks to store clock elements
+1 32-bit first level block to store indices to second level blocks
Only 8 bytes of the first level block are actually used.
With this change such clock consumes only 2 blocks.

2. Share similar clocks differing only by a single clock entry for the current thread.
When a thread does several release operations on fresh sync objects without intervening
acquire operations in between (e.g. initialization of several fields in ctor),
the resulting clocks differ only by a single entry for the current thread.
This change reuses a single clock for such release operations. The current thread time
(which is different for different clocks) is stored in dirty entries.

We are experiencing issues with a large program that eats all 64M clock blocks
(32GB of non-flushable memory) and crashes with dense allocator overflow.
Max number of threads in the program is ~170 which is currently quite unfortunate
(consume 4 blocks per clock). Currently it crashes after consuming 60+ GB of memory.
The first optimization brings clock block consumption down to ~40M and allows the program to work.
The second optimization further reduces block consumption to "modest" 16M blocks (~8GB of RAM)
and reduces overall RAM consumption to ~30GB.

Measurements on another real world C++ RPC benchmark show RSS reduction
from 3.491G to 3.186G and a modest speedup of ~5%.

Go parallel client/server HTTP benchmark:
https://github.com/golang/benchmarks/blob/master/http/http.go
shows RSS reduction from 320MB to 240MB and a few percent speedup.

Reviewed in https://reviews.llvm.org/D35323

git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@308018 91177308-0d34-0410-b5e6-96231b3b80d8
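
The block counts quoted in the message above can be reproduced with a small back-of-the-envelope sketch. It is not the tsan implementation. The block size (512 bytes), element size (64 bits) and index size (32 bits) come from the message. The packing rule in BlocksAfter (elements fill whatever part of the first level block the indices leave free) is an assumption made for illustration, and all names (BlocksBefore, BlocksAfter, BlocksToBytes, SelfCheck) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sizes stated in the commit message.
constexpr std::size_t kBlockSize = 512;  // bytes per clock block
constexpr std::size_t kElemSize = 8;     // one 64-bit clock element
constexpr std::size_t kIndexSize = 4;    // one 32-bit second level block index
constexpr std::size_t kElemsPerBlock = kBlockSize / kElemSize;  // 64

// Old layout: the first level block holds only indices, so every clock
// pays for one extra block on top of its second level blocks.
constexpr std::size_t BlocksBefore(std::size_t threads) {
  return (threads + kElemsPerBlock - 1) / kElemsPerBlock + 1;
}

// ASSUMED new layout: the first level block keeps its indices and uses
// the remaining bytes for clock elements. Finds the smallest number of
// second level blocks whose combined capacity covers all threads.
constexpr std::size_t BlocksAfter(std::size_t threads) {
  for (std::size_t second = 0;; ++second) {
    const std::size_t index_bytes = second * kIndexSize;
    const std::size_t first_elems =
        index_bytes < kBlockSize ? (kBlockSize - index_bytes) / kElemSize : 0;
    if (first_elems + second * kElemsPerBlock >= threads) return second + 1;
  }
}

// Memory held by a given number of clock blocks, in bytes.
constexpr std::uint64_t BlocksToBytes(std::uint64_t blocks) {
  return blocks * kBlockSize;
}

// Checks the sketch against the figures quoted in the commit message.
inline void SelfCheck() {
  assert(BlocksBefore(100) == 3);  // "consumes 3 512-byte blocks"
  assert(BlocksAfter(100) == 2);   // "consumes only 2 blocks"
  assert(BlocksBefore(170) == 4);  // "consume 4 blocks per clock"
  assert(BlocksToBytes(64ull << 20) == (32ull << 30));  // 64M blocks = 32GB
  assert(BlocksToBytes(16ull << 20) == (8ull << 30));   // 16M blocks = 8GB
}
```

Under the assumed packing rule a ~170-thread clock would need 3 blocks instead of 4. The message does not state that per-clock figure, so treat it as an output of the sketch rather than a property of the patch.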

Diffstat (limited to 'test/scudo')

0 files changed, 0 insertions, 0 deletions

