author Linus Torvalds <torvalds@linux-foundation.org> 2022-03-25 12:34:53 -0700
committer Linus Torvalds <torvalds@linux-foundation.org> 2022-03-25 12:34:53 -0700
commit 636f64db07f33a18630248b4c57e182cd315b0da
tree b27478715e415b5324924e0f6fccc47f28899c0a /Documentation
parent ebcb577aee1448fd60904fc4126cbf7ec012bd0b
parent 7f1b8e0d6360178e3527d4f14e6921c254a86035
Merge tag 'ras_core_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

 - More noinstr fixes

 - Add an erratum workaround for Intel CPUs which, in certain
   circumstances, end up consuming an unrelated uncorrectable memory
   error when using fast string copy insns

 - Remove the MCE tolerance level control as it is not really needed or
   used anymore

* tag 'ras_core_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Remove the tolerance level control
  x86/mce: Work around an erratum on fast string copy instructions
  x86/mce: Use arch atomic and bit helpers

Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/ABI/removed/sysfs-mce | 37
-rw-r--r-- Documentation/ABI/testing/sysfs-mce | 32
-rw-r--r-- Documentation/vm/hwpoison.rst | 2
-rw-r--r-- Documentation/x86/x86_64/boot-options.rst | 9
4 files changed, 38 insertions, 42 deletions
diff --git a/Documentation/ABI/removed/sysfs-mce b/Documentation/ABI/removed/sysfs-mce
new file mode 100644
index 000000000000..ef5dd2a80918
--- /dev/null
+++ b/Documentation/ABI/removed/sysfs-mce
@@ -0,0 +1,37 @@
+What: /sys/devices/system/machinecheck/machinecheckX/tolerant
+Contact: Borislav Petkov <bp@suse.de>
+Date: Dec, 2021
+Description:
+ Unused and obsolete after the advent of recoverable machine
+ checks (see last sentence below) and those are present since
+ 2010 (Nehalem).
+
+ Original description:
+
+ The entries appear for each CPU, but they are truly shared
+ between all CPUs.
+
+ Tolerance level. When a machine check exception occurs for a
+ non corrected machine check the kernel can take different
+ actions.
+
+ Since machine check exceptions can happen any time it is
+ sometimes risky for the kernel to kill a process because it
+ defies normal kernel locking rules. The tolerance level
+ configures how hard the kernel tries to recover even at some
+ risk of deadlock. Higher tolerant values trade potentially
+ better uptime with the risk of a crash or even corruption
+ (for tolerant >= 3).
+
+ == ===========================================================
+ 0 always panic on uncorrected errors, log corrected errors
+ 1 panic or SIGBUS on uncorrected errors, log corrected errors
+ 2 SIGBUS or log uncorrected errors, log corrected errors
+ 3 never panic or SIGBUS, log all errors (for testing only)
+ == ===========================================================
+
+ Default: 1
+
+ Note this only makes a difference if the CPU allows recovery
+ from a machine check exception. Current x86 CPUs generally
+ do not.
diff --git a/Documentation/ABI/testing/sysfs-mce b/Documentation/ABI/testing/sysfs-mce
index c8cd989034b4..83172f50e27c 100644
--- a/Documentation/ABI/testing/sysfs-mce
+++ b/Documentation/ABI/testing/sysfs-mce
@@ -53,38 +53,6 @@ Description:
(but some corrected errors might be still reported
in other ways)
-What: /sys/devices/system/machinecheck/machinecheckX/tolerant
-Contact: Andi Kleen <ak@linux.intel.com>
-Date: Feb, 2007
-Description:
- The entries appear for each CPU, but they are truly shared
- between all CPUs.
-
- Tolerance level. When a machine check exception occurs for a
- non corrected machine check the kernel can take different
- actions.
-
- Since machine check exceptions can happen any time it is
- sometimes risky for the kernel to kill a process because it
- defies normal kernel locking rules. The tolerance level
- configures how hard the kernel tries to recover even at some
- risk of deadlock. Higher tolerant values trade potentially
- better uptime with the risk of a crash or even corruption
- (for tolerant >= 3).
-
- == ===========================================================
- 0 always panic on uncorrected errors, log corrected errors
- 1 panic or SIGBUS on uncorrected errors, log corrected errors
- 2 SIGBUS or log uncorrected errors, log corrected errors
- 3 never panic or SIGBUS, log all errors (for testing only)
- == ===========================================================
-
- Default: 1
-
- Note this only makes a difference if the CPU allows recovery
- from a machine check exception. Current x86 CPUs generally
- do not.
-
What: /sys/devices/system/machinecheck/machinecheckX/trigger
Contact: Andi Kleen <ak@linux.intel.com>
Date: Feb, 2007
diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
index 89b5f7a52077..c742de1769d1 100644
--- a/Documentation/vm/hwpoison.rst
+++ b/Documentation/vm/hwpoison.rst
@@ -60,8 +60,6 @@ There are two (actually three) modes memory failure recovery can be in:
vm.memory_failure_recovery sysctl set to zero:
All memory failures cause a panic. Do not attempt recovery.
- (on x86 this can be also affected by the tolerant level of the
- MCE subsystem)
early kill
(can be controlled globally and per process)
diff --git a/Documentation/x86/x86_64/boot-options.rst b/Documentation/x86/x86_64/boot-options.rst
index ccb7e86bf8d9..07aa0007f346 100644
--- a/Documentation/x86/x86_64/boot-options.rst
+++ b/Documentation/x86/x86_64/boot-options.rst
@@ -47,14 +47,7 @@ Please see Documentation/x86/x86_64/machinecheck.rst for sysfs runtime tunables.
in a reboot. On Intel systems it is enabled by default.
mce=nobootlog
Disable boot machine check logging.
- mce=tolerancelevel[,monarchtimeout] (number,number)
- tolerance levels:
- 0: always panic on uncorrected errors, log corrected errors
- 1: panic or SIGBUS on uncorrected errors, log corrected errors
- 2: SIGBUS or log uncorrected errors, log corrected errors
- 3: never panic or SIGBUS, log all errors (for testing only)
- Default is 1
- Can be also set using sysfs which is preferable.
+ mce=monarchtimeout (number)
monarchtimeout:
Sets the time in us to wait for other CPUs on machine checks. 0
to disable.
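
Since this merge moves the `tolerant` attribute to Documentation/ABI/removed, a script that used to read or write it may want to check whether the file is still there before touching it. The following is a minimal sketch of such a check, not part of the commit. It assumes only the sysfs path given in the "What:" line of the ABI entry; the function name `cpus_with_tolerant` and the constant `MACHINECHECK_DIR` are hypothetical names chosen for illustration.

```python
from pathlib import Path
from typing import List

# Directory taken from the "What:" path in the ABI entry:
# /sys/devices/system/machinecheck/machinecheckX/tolerant
MACHINECHECK_DIR = Path("/sys/devices/system/machinecheck")


def cpus_with_tolerant(base: Path = MACHINECHECK_DIR) -> List[str]:
    """Return the machinecheckX directory names that expose a 'tolerant' file.

    An empty list means the attribute is absent, which is what one would
    expect on a kernel that includes this removal (assumption: the sysfs
    file disappears together with its documentation).
    """
    if not base.is_dir():
        # No machinecheck sysfs directory at all (non-x86 or MCE disabled).
        return []
    return sorted(
        entry.name
        for entry in base.glob("machinecheck*")
        if (entry / "tolerant").is_file()
    )


if __name__ == "__main__":
    found = cpus_with_tolerant()
    if found:
        print("tolerant attribute present on:", ", ".join(found))
    else:
        print("no tolerant attribute found")
```

Passing the directory as a parameter keeps the check testable against a temporary directory tree instead of the real sysfs.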