Unverified Commit eb04e72b authored by Palmer Dabbelt's avatar Palmer Dabbelt
Browse files

Merge patch series "RISC-V Hardware Probing User Interface"

Evan Green <evan@rivosinc.com> says:

There's been a bunch of off-list discussions about this, including at
Plumbers.  The original plan was to do something involving providing an
ISA string to userspace, but ISA strings just aren't sufficient for a
stable ABI any more: in order to parse an ISA string users need the
version of the specifications that the string is written to, the version
of each extension (sometimes at a finer granularity than the RISC-V
releases/versions encode), and the expected use case for the ISA string
(ie, is it a U-mode or M-mode string).  That's a lot of complexity to
try and keep ABI compatible and it's probably going to continue to grow,
as even if there's no more complexity in the specifications we'll have
to deal with the various ISA string parsing oddities that end up all
over userspace.

Instead this patch set takes a very different approach and provides a set
of key/value pairs that encode various bits about the system.  The big
advantage here is that we can clearly define what these mean so we can
ensure ABI stability, but it also allows us to encode information that's
unlikely to ever appear in an ISA string (see the misaligned access
performance, for example).  The resulting interface looks a lot like
what arm64 and x86 do, and will hopefully fit well into something like
ACPI in the future.

The actual user interface is a syscall, with a vDSO function in front of
it. The vDSO function can answer some queries without a syscall at all,
and falls back to the syscall for cases it doesn't have answers to.
Currently we prepopulate it with an array of answers for all keys and
a CPU set of "all CPUs". This can be adjusted as necessary to provide
fast answers to the most common queries.

An example series in glibc exposing this syscall and using it in an
ifunc selector for memcpy can be found at [1].

I was asked about the performance delta between this and something like
sysfs. I created a small test program and ran it on a Nezha D1
Allwinner board. Doing each operation 100000 times and dividing, these
operations take the following amount of time:
 - open()+read()+close() of /sys/kernel/cpu_byteorder: 3.8us
 - access("/sys/kernel/cpu_byteorder", R_OK): 1.3us
 - riscv_hwprobe() vDSO and syscall: .0094us
 - riscv_hwprobe() vDSO with no syscall: 0.0091us

These numbers get farther apart if we query multiple keys, as sysfs will
scale linearly with the number of keys, where the dedicated syscall
stays the same. To frame these numbers, I also did a tight
fork/exec/wait loop, which I measured as 4.8ms. So doing 4
open/read/close operations is a delta of about 0.3%, versus a single vDSO
call is a delta of essentially zero.

[1] https://patchwork.ozlabs.org/project/glibc/list/?series=343050

* b4-shazam-merge:
  RISC-V: Add hwprobe vDSO function and data
  selftests: Test the new RISC-V hwprobe interface
  RISC-V: hwprobe: Support probing of misaligned access performance
  RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
  RISC-V: Add a syscall for HW probing
  RISC-V: Move struct riscv_cpuinfo to new header

Link: https://lore.kernel.org/r/20230407231103.2622178-1-evan@rivosinc.com


Signed-off-by: default avatarPalmer Dabbelt <palmer@rivosinc.com>
parents 6a249151 aa5af0aa
Loading
Loading
Loading
Loading
+86 −0
Original line number Diff line number Diff line
.. SPDX-License-Identifier: GPL-2.0

RISC-V Hardware Probing Interface
---------------------------------

The RISC-V hardware probing interface is based around a single syscall, which
is defined in <asm/hwprobe.h>::

    struct riscv_hwprobe {
        __s64 key;
        __u64 value;
    };

    long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
                           size_t cpu_count, cpu_set_t *cpus,
                           unsigned int flags);

The arguments are split into three groups: an array of key-value pairs, a CPU
set, and some flags. The key-value pairs are supplied with a count. Userspace
must prepopulate the key field for each element, and the kernel will fill in the
value if the key is recognized. If a key is unknown to the kernel, its key field
will be cleared to -1, and its value set to 0. The CPU set is defined by
CPU_SET(3). For value-like keys (eg. vendor/arch/impl), the returned value will
be only be valid if all CPUs in the given set have the same value. Otherwise -1
will be returned. For boolean-like keys, the value returned will be a logical
AND of the values for the specified CPUs. Usermode can supply NULL for cpus and
0 for cpu_count as a shortcut for all online CPUs. There are currently no flags,
this value must be zero for future compatibility.

On success 0 is returned, on failure a negative error code is returned.

The following keys are defined:

* :c:macro:`RISCV_HWPROBE_KEY_MVENDORID`: Contains the value of ``mvendorid``,
  as defined by the RISC-V privileged architecture specification.

* :c:macro:`RISCV_HWPROBE_KEY_MARCHID`: Contains the value of ``marchid``, as
  defined by the RISC-V privileged architecture specification.

* :c:macro:`RISCV_HWPROBE_KEY_MIMPLID`: Contains the value of ``mimplid``, as
  defined by the RISC-V privileged architecture specification.

* :c:macro:`RISCV_HWPROBE_KEY_BASE_BEHAVIOR`: A bitmask containing the base
  user-visible behavior that this kernel supports.  The following base user ABIs
  are defined:

  * :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`: Support for rv32ima or
    rv64ima, as defined by version 2.2 of the user ISA and version 1.10 of the
    privileged ISA, with the following known exceptions (more exceptions may be
    added, but only if it can be demonstrated that the user ABI is not broken):

    * The :fence.i: instruction cannot be directly executed by userspace
      programs (it may still be executed in userspace via a
      kernel-controlled mechanism such as the vDSO).

* :c:macro:`RISCV_HWPROBE_KEY_IMA_EXT_0`: A bitmask containing the extensions
  that are compatible with the :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`:
  base system behavior.

  * :c:macro:`RISCV_HWPROBE_IMA_FD`: The F and D extensions are supported, as
    defined by commit cd20cee ("FMIN/FMAX now implement
    minimumNumber/maximumNumber, not minNum/maxNum") of the RISC-V ISA manual.

  * :c:macro:`RISCV_HWPROBE_IMA_C`: The C extension is supported, as defined
    by version 2.2 of the RISC-V ISA manual.

* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: A bitmask that contains performance
  information about the selected set of processors.

  * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNKNOWN`: The performance of misaligned
    accesses is unknown.

  * :c:macro:`RISCV_HWPROBE_MISALIGNED_EMULATED`: Misaligned accesses are
    emulated via software, either in or below the kernel.  These accesses are
    always extremely slow.

  * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned accesses are supported
    in hardware, but are slower than the cooresponding aligned accesses
    sequences.

  * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned accesses are supported
    in hardware and are faster than the cooresponding aligned accesses
    sequences.

  * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNSUPPORTED`: Misaligned accesses are
    not supported at all and will generate a misaligned address fault.
+1 −0
Original line number Diff line number Diff line
@@ -7,6 +7,7 @@ RISC-V architecture

    boot-image-header
    vm-layout
    hwprobe
    patch-acceptance
    uabi

+1 −0
Original line number Diff line number Diff line
@@ -33,6 +33,7 @@ config RISCV
	select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL
	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
	select ARCH_HAS_UBSAN_SANITIZE_ALL
	select ARCH_HAS_VDSO_DATA
	select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
	select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
	select ARCH_STACKWALK
+10 −0
Original line number Diff line number Diff line
@@ -11,7 +11,9 @@
#include <linux/uaccess.h>
#include <asm/alternative.h>
#include <asm/cacheflush.h>
#include <asm/cpufeature.h>
#include <asm/errata_list.h>
#include <asm/hwprobe.h>
#include <asm/patch.h>
#include <asm/vendorid_list.h>

@@ -115,3 +117,11 @@ void __init_or_module thead_errata_patch_func(struct alt_entry *begin, struct al
	if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
		local_flush_icache_all();
}

void __init_or_module thead_feature_probe_func(unsigned int cpu,
					       unsigned long archid,
					       unsigned long impid)
{
	if ((archid == 0) && (impid == 0))
		per_cpu(misaligned_access_speed, cpu) = RISCV_HWPROBE_MISALIGNED_FAST;
}
+5 −0
Original line number Diff line number Diff line
@@ -30,6 +30,7 @@
#define ALT_OLD_PTR(a)			__ALT_PTR(a, old_offset)
#define ALT_ALT_PTR(a)			__ALT_PTR(a, alt_offset)

void __init probe_vendor_features(unsigned int cpu);
void __init apply_boot_alternatives(void);
void __init apply_early_boot_alternatives(void);
void apply_module_alternatives(void *start, size_t length);
@@ -52,11 +53,15 @@ void thead_errata_patch_func(struct alt_entry *begin, struct alt_entry *end,
			     unsigned long archid, unsigned long impid,
			     unsigned int stage);

void thead_feature_probe_func(unsigned int cpu, unsigned long archid,
			      unsigned long impid);

void riscv_cpufeature_patch_func(struct alt_entry *begin, struct alt_entry *end,
				 unsigned int stage);

#else /* CONFIG_RISCV_ALTERNATIVE */

static inline void probe_vendor_features(unsigned int cpu) { }
static inline void apply_boot_alternatives(void) { }
static inline void apply_early_boot_alternatives(void) { }
static inline void apply_module_alternatives(void *start, size_t length) { }
Loading