Commit 069486b6 authored by Yu Liao, committed by yanhaitao

cpuinspect: add CPU-inspect infrastructure

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I7ZBQB

----------------------------------

This adds the CPU-inspect infrastructure. CPU-inspect is designed to
provide a framework for early detection of SDC by proactively executing
CPU inspection test cases.

Silent Data Corruption (SDC), sometimes referred to as Silent Data Error
(SDE), is an industry-wide issue impacting not only long-protected memory,
storage, and networking, but also computer CPUs. As with software issues,
hardware-induced SDC can contribute to data loss and corruption. An SDC
occurs when an impacted CPU inadvertently causes errors in the data it
processes. For example, an impacted CPU might miscalculate data (e.g.,
computing 1+1=3). There may be no indication of these computational
errors unless the software systematically checks for errors [1].

SDC issues have been around for many years, but as chips have become more
advanced and compact in size, the transistors and lines have become so tiny
that small electrical fluctuations can cause errors. Most of these errors
are caused by defects during manufacturing and are screened out by the
vendors; others are caught by hardware error detection or correction.
However, some errors go undetected by hardware; therefore only detection
software can protect against such errors [1].

[1] https://support.google.com/cloud/answer/10759085

To use CPU-inspect, at least one inspector (the driver that actually
executes the CPU inspection code) must be loaded.

Here is an example using CPU-inspect:

	# Set the cpumask of CPU-inspect to CPUs 10-20
	echo 10-20 > /sys/devices/system/cpu/cpuinspect/cpumask
	# Set the maximum CPU utilization of the inspection threads to 50%
	echo 50 > /sys/devices/system/cpu/cpuinspect/cpu_utility
	# Start the CPU inspection task
	echo 1 > /sys/devices/system/cpu/cpuinspect/start_patrol
	# Check the result to see whether any faulty CPUs were found
	cat /sys/devices/system/cpu/cpuinspect/result

In addition to being readable, the 'result' file in cpuinspect can also be
polled. A poll() call on 'result' returns when a faulty CPU is found or
when the inspection task completes.
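The commit does not include a userspace consumer for 'result'. The following C code is a minimal sketch of how a program could wait on the attribute. It assumes that 'result' follows the usual sysfs notification convention: the kernel side signals a change with sysfs_notify(), poll() reports the change as POLLPRI | POLLERR, and the file must then be re-read from offset 0. The helpers wait_for_result() and read_attr() are hypothetical and are not part of this patch.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Re-read the attribute from offset 0 into buf and NUL-terminate it.
 * sysfs attributes are re-read from the start after every change.
 */
static ssize_t read_attr(int fd, char *buf, size_t len)
{
	ssize_t n;

	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	n = read(fd, buf, len - 1);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}

/*
 * Hypothetical helper (assumes 'result' is signalled via sysfs_notify()):
 * wait until the attribute at 'path' reports a change through
 * POLLPRI/POLLERR, then store its new contents in buf.
 * Returns 1 on a change, 0 on timeout, -1 on error.
 * timeout_ms follows poll(): -1 waits indefinitely.
 */
int wait_for_result(const char *path, char *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd;
	int fd, ret;

	assert(path != NULL && buf != NULL && len > 0);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Read once first so that poll() waits for the next change. */
	if (read_attr(fd, buf, len) < 0) {
		close(fd);
		return -1;
	}

	memset(&pfd, 0, sizeof(pfd));
	pfd.fd = fd;
	pfd.events = POLLPRI | POLLERR;

	/* Retry if interrupted by a signal (the full timeout restarts). */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	/* On a change, fetch the updated contents. */
	if (ret > 0 && read_attr(fd, buf, len) < 0)
		ret = -1;

	close(fd);
	return ret > 0 ? 1 : ret;
}
```

Under that assumption, wait_for_result("/sys/devices/system/cpu/cpuinspect/result", buf, sizeof(buf), -1) would block until the kernel signals a change and would leave the new contents in buf. The initial read matters because sysfs compares the event count it recorded at the last read. If that read is skipped, poll() may return immediately for a change that happened earlier.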

Signed-off-by: Yu Liao <liaoyu15@huawei.com>
parent 5a39e8f3