Skip to content
Commit bc6ba808 authored by Dan Williams's avatar Dan Williams
Browse files

nfit, address-range-scrub: rework and simplify ARS state machine



ARS is an operation that can take 10s to 100s of seconds to find media
errors that should rarely be present. If the platform crashes due to
media errors in persistent memory, the expectation is that the BIOS will
report those known errors in a 'short' ARS request.

A 'short' ARS request asks platform firmware to return an ARS payload
with all known errors, but without issuing a 'long' scrub. At driver
init a short request is issued to all PMEM ranges before registering
regions. Then, in the background, a long ARS is scheduled for each
region.

The ARS implementation is simplified to centralize ARS completion work
in the ars_complete() helper. The timeout is removed since there is no
facility to cancel ARS, and this otherwise arranges for system init to
never be blocked waiting for a 'long' ARS. The ars_state flags are used
to coordinate ARS requests from driver init, ARS requests from
userspace, and ARS requests in response to media error notifications.

Given that there is no notification of ARS completion the implementation
still needs to poll. It backs off exponentially to a maximum poll period
of 30 minutes.

Suggested-by: default avatarToshi Kani <toshi.kani@hpe.com>
Co-developed-by: default avatarDave Jiang <dave.jiang@intel.com>
Signed-off-by: default avatarDave Jiang <dave.jiang@intel.com>
Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
parent 459d0ddb
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment