  Nov 09, 2023
  Nov 08, 2023
    • Merge patch series "riscv: Fix set_memory_XX() and set_direct_map_XX()" · 05942f78
      Palmer Dabbelt authored
      Alexandre Ghiti <alexghiti@rivosinc.com> says:
      
      Those 2 patches fix the set_memory_XX() and set_direct_map_XX() APIs, which
      in turn fixes STRICT_KERNEL_RWX and memfd_secret(). Those were broken because
      the permission changes were not applied to the linear mapping: the linear
      mapping is mapped using hugepages and walk_page_range_novma() does not split
      such mappings.
      
      To fix that, patch 1 disables PGD mappings in the linear mapping, as it is
      hard to propagate changes at this level in *all* the page tables. This has the
      downside of disabling PMD mapping for sv32 and PUD (1GB) mapping for sv39 in
      the linear mapping (for specific kernels, we could add a Kconfig to enable
      ARCH_HAS_SET_DIRECT_MAP and STRICT_KERNEL_RWX only if needed; I'm pretty sure
      we'll discuss that).
      
      Patch 2 implements the split of the huge linear mappings so that
      walk_page_range_novma() can properly apply the permissions. The whole split is
      protected with mmap_sem in write mode, but I'm wondering if that's enough;
      any opinion on that is appreciated.
      
      * b4-shazam-merge:
        riscv: Fix set_memory_XX() and set_direct_map_XX() by splitting huge linear mappings
        riscv: Don't use PGD entries for the linear mapping
      
      Link: https://lore.kernel.org/r/20231108075930.7157-1-alexghiti@rivosinc.com
      
      
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    • riscv: Fix set_memory_XX() and set_direct_map_XX() by splitting huge linear mappings · 311cd2f6
      Alexandre Ghiti authored
      
      
      When STRICT_KERNEL_RWX is set, any change of permissions on any kernel
      mapping (vmalloc/modules/kernel text...etc) should be applied on its
      linear mapping alias. The problem is that the riscv kernel uses huge
      mappings for the linear mapping and walk_page_range_novma() does not
      split those huge mappings.
      
      So this patch implements such a split in order to apply fine-grained
      permissions on the linear mapping.
      
      Below is the difference before and after (the first PUD mapping is split
      into PTE/PMD mappings):
      
      Before:
      
      ---[ Linear mapping ]---
      0xffffaf8000080000-0xffffaf8000200000    0x0000000080080000      1536K PTE     D A G . . W R V
      0xffffaf8000200000-0xffffaf8077c00000    0x0000000080200000      1914M PMD     D A G . . W R V
      0xffffaf8077c00000-0xffffaf8078800000    0x00000000f7c00000        12M PMD     D A G . . . R V
      0xffffaf8078800000-0xffffaf8078c00000    0x00000000f8800000         4M PMD     D A G . . W R V
      0xffffaf8078c00000-0xffffaf8079200000    0x00000000f8c00000         6M PMD     D A G . . . R V
      0xffffaf8079200000-0xffffaf807e600000    0x00000000f9200000        84M PMD     D A G . . W R V
      0xffffaf807e600000-0xffffaf807e716000    0x00000000fe600000      1112K PTE     D A G . . W R V
      0xffffaf807e717000-0xffffaf807e71a000    0x00000000fe717000        12K PTE     D A G . . W R V
      0xffffaf807e71d000-0xffffaf807e71e000    0x00000000fe71d000         4K PTE     D A G . . W R V
      0xffffaf807e722000-0xffffaf807e800000    0x00000000fe722000       888K PTE     D A G . . W R V
      0xffffaf807e800000-0xffffaf807fe00000    0x00000000fe800000        22M PMD     D A G . . W R V
      0xffffaf807fe00000-0xffffaf807ff54000    0x00000000ffe00000      1360K PTE     D A G . . W R V
      0xffffaf807ff55000-0xffffaf8080000000    0x00000000fff55000       684K PTE     D A G . . W R V
      0xffffaf8080000000-0xffffaf8400000000    0x0000000100000000        14G PUD     D A G . . W R V
      
      After:
      
      ---[ Linear mapping ]---
      0xffffaf8000080000-0xffffaf8000200000    0x0000000080080000      1536K PTE     D A G . . W R V
      0xffffaf8000200000-0xffffaf8077c00000    0x0000000080200000      1914M PMD     D A G . . W R V
      0xffffaf8077c00000-0xffffaf8078800000    0x00000000f7c00000        12M PMD     D A G . . . R V
      0xffffaf8078800000-0xffffaf8078a00000    0x00000000f8800000         2M PMD     D A G . . W R V
      0xffffaf8078a00000-0xffffaf8078c00000    0x00000000f8a00000         2M PTE     D A G . . W R V
      0xffffaf8078c00000-0xffffaf8079200000    0x00000000f8c00000         6M PMD     D A G . . . R V
      0xffffaf8079200000-0xffffaf807e600000    0x00000000f9200000        84M PMD     D A G . . W R V
      0xffffaf807e600000-0xffffaf807e716000    0x00000000fe600000      1112K PTE     D A G . . W R V
      0xffffaf807e717000-0xffffaf807e71a000    0x00000000fe717000        12K PTE     D A G . . W R V
      0xffffaf807e71d000-0xffffaf807e71e000    0x00000000fe71d000         4K PTE     D A G . . W R V
      0xffffaf807e722000-0xffffaf807e800000    0x00000000fe722000       888K PTE     D A G . . W R V
      0xffffaf807e800000-0xffffaf807fe00000    0x00000000fe800000        22M PMD     D A G . . W R V
      0xffffaf807fe00000-0xffffaf807ff54000    0x00000000ffe00000      1360K PTE     D A G . . W R V
      0xffffaf807ff55000-0xffffaf8080000000    0x00000000fff55000       684K PTE     D A G . . W R V
      0xffffaf8080000000-0xffffaf8080800000    0x0000000100000000         8M PMD     D A G . . W R V
      0xffffaf8080800000-0xffffaf8080af6000    0x0000000100800000      3032K PTE     D A G . . W R V
      0xffffaf8080af6000-0xffffaf8080af8000    0x0000000100af6000         8K PTE     D A G . X . R V
      0xffffaf8080af8000-0xffffaf8080c00000    0x0000000100af8000      1056K PTE     D A G . . W R V
      0xffffaf8080c00000-0xffffaf8081a00000    0x0000000100c00000        14M PMD     D A G . . W R V
      0xffffaf8081a00000-0xffffaf8081a40000    0x0000000101a00000       256K PTE     D A G . . W R V
      0xffffaf8081a40000-0xffffaf8081a44000    0x0000000101a40000        16K PTE     D A G . X . R V
      0xffffaf8081a44000-0xffffaf8081a52000    0x0000000101a44000        56K PTE     D A G . . W R V
      0xffffaf8081a52000-0xffffaf8081a54000    0x0000000101a52000         8K PTE     D A G . X . R V
      ...
      0xffffaf809e800000-0xffffaf80c0000000    0x000000011e800000       536M PMD     D A G . . W R V
      0xffffaf80c0000000-0xffffaf8400000000    0x0000000140000000        13G PUD     D A G . . W R V
      
      Note that this also fixes the memfd_secret() syscall, which uses
      set_direct_map_invalid_noflush() and set_direct_map_default_noflush() to
      remove the pages from the linear mapping. Below is the kernel page table
      while a memfd_secret() syscall is running; you can see all the !valid
      page table entries in the linear mapping:
      
      ...
      0xffffaf8082240000-0xffffaf8082241000    0x0000000102240000         4K PTE     D A G . . W R .
      0xffffaf8082241000-0xffffaf8082250000    0x0000000102241000        60K PTE     D A G . . W R V
      0xffffaf8082250000-0xffffaf8082252000    0x0000000102250000         8K PTE     D A G . . W R .
      0xffffaf8082252000-0xffffaf8082256000    0x0000000102252000        16K PTE     D A G . . W R V
      0xffffaf8082256000-0xffffaf8082257000    0x0000000102256000         4K PTE     D A G . . W R .
      0xffffaf8082257000-0xffffaf8082258000    0x0000000102257000         4K PTE     D A G . . W R V
      0xffffaf8082258000-0xffffaf8082259000    0x0000000102258000         4K PTE     D A G . . W R .
      0xffffaf8082259000-0xffffaf808225a000    0x0000000102259000         4K PTE     D A G . . W R V
      0xffffaf808225a000-0xffffaf808225c000    0x000000010225a000         8K PTE     D A G . . W R .
      0xffffaf808225c000-0xffffaf8082266000    0x000000010225c000        40K PTE     D A G . . W R V
      0xffffaf8082266000-0xffffaf8082268000    0x0000000102266000         8K PTE     D A G . . W R .
      0xffffaf8082268000-0xffffaf8082284000    0x0000000102268000       112K PTE     D A G . . W R V
      0xffffaf8082284000-0xffffaf8082288000    0x0000000102284000        16K PTE     D A G . . W R .
      0xffffaf8082288000-0xffffaf808229c000    0x0000000102288000        80K PTE     D A G . . W R V
      0xffffaf808229c000-0xffffaf80822a0000    0x000000010229c000        16K PTE     D A G . . W R .
      0xffffaf80822a0000-0xffffaf80822a5000    0x00000001022a0000        20K PTE     D A G . . W R V
      0xffffaf80822a5000-0xffffaf80822a6000    0x00000001022a5000         4K PTE     D A G . . . R V
      0xffffaf80822a6000-0xffffaf80822ab000    0x00000001022a6000        20K PTE     D A G . . W R V
      ...
      
      And when the memfd_secret() fd is released, the linear mapping is
      correctly reset:
      
      ...
      0xffffaf8082240000-0xffffaf80822a5000    0x0000000102240000       404K PTE     D A G . . W R V
      0xffffaf80822a5000-0xffffaf80822a6000    0x00000001022a5000         4K PTE     D A G . . . R V
      0xffffaf80822a6000-0xffffaf80822af000    0x00000001022a6000        36K PTE     D A G . . W R V
      ...
      
      Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
      Link: https://lore.kernel.org/r/20231108075930.7157-3-alexghiti@rivosinc.com
      
      
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    • riscv: Don't use PGD entries for the linear mapping · 629db01c
      Alexandre Ghiti authored
      
      
      Propagating changes at this level is cumbersome as we need to go through
      all the page tables when that happens (either when changing the
      permissions or when splitting the mapping).
      
      Note that this prevents the use of 4MB mapping for sv32 and 1GB mapping for
      sv39 in the linear mapping.
      
      Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
      Link: https://lore.kernel.org/r/20231108075930.7157-2-alexghiti@rivosinc.com
      
      
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    • RISC-V: Probe misaligned access speed in parallel · 55e0bf49
      Evan Green authored
      
      
      Probing for misaligned access speed takes about 0.06 seconds. On a
      system with 64 cores, doing this in smp_callin() means it's done
      serially, extending boot time by 3.8 seconds. That's a lot of boot time.
      
      Instead of measuring each CPU serially, let's do the measurements on
      all CPUs in parallel. If we disable preemption on all CPUs, the
      jiffies stop ticking, so we can do this in stages of 1) everybody
      except core 0, then 2) core 0. The allocations are all done outside of
      on_each_cpu() to avoid calling alloc_pages() with interrupts disabled.
      
      For hotplugged CPUs that come in after the boot time measurement,
      register CPU hotplug callbacks, and do the measurement there. Interrupts
      are enabled in those callbacks, so they're fine to do alloc_pages() in.
      
      Reported-by: Jisheng Zhang <jszhang@kernel.org>
      Closes: https://lore.kernel.org/all/mhng-9359993d-6872-4134-83ce-c97debe1cf9a@palmer-ri-x1c9/T/#mae9b8f40016f9df428829d33360144dc5026bcbf
      
      
      Fixes: 584ea656 ("RISC-V: Probe for unaligned access speed")
      Signed-off-by: Evan Green <evan@rivosinc.com>
      Link: https://lore.kernel.org/r/20231106225855.3121724-1-evan@rivosinc.com
      
      
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    • RISC-V: Remove __init on unaligned_emulation_finish() · 6eb7a644
      Evan Green authored
      
      
      This function shouldn't be __init, since it's called during hotplug. The
      warning says it well enough:
      
      WARNING: modpost: vmlinux: section mismatch in reference:
      check_unaligned_access_all_cpus+0x13a (section: .text) ->
      unaligned_emulation_finish (section: .init.text)
      
      Signed-off-by: Evan Green <evan@rivosinc.com>
      Fixes: 71c54b3d ("riscv: report misaligned accesses emulation to hwprobe")
      Link: https://lore.kernel.org/r/20231106231105.3141413-1-evan@rivosinc.com
      
      
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    • RISC-V: Show accurate per-hart isa in /proc/cpuinfo · d3d2cf1a
      Evan Green authored
      
      
      In /proc/cpuinfo, most of the information we show for each processor is
      specific to that hart: marchid, mvendorid, mimpid, processor, hart,
      compatible, and the mmu size. But the ISA string gets filtered through a
      lowest common denominator mask, so that if one CPU is missing an ISA
      extension, no CPUs will show it.
      
      Now that we track the ISA extensions for each hart, let's report ISA
      extension info accurately per-hart in /proc/cpuinfo. We cannot change
      the "isa:" line, as usermode may be relying on that line to show only
      the common set of extensions supported across all harts. Add a new "hart
      isa" line instead, which reports the true set of extensions for that
      hart.
      
      Signed-off-by: Evan Green <evan@rivosinc.com>
      Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
      Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
      Link: https://lore.kernel.org/r/20231106232439.3176268-1-evan@rivosinc.com
      
      
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    • RISC-V: Don't rely on positional structure initialization · 28ea54ba
      Palmer Dabbelt authored
      
      
      Without this I get a bunch of warnings along the lines of
      
          arch/riscv/kernel/module.c:535:26: error: positional initialization of field in 'struct' declared with 'designated_init' attribute [-Werror=designated-init]
            535 |         [R_RISCV_32] = { apply_r_riscv_32_rela },
      
      This just makes the member initializers explicit instead of positional.
      I also aligned some of the table, but mostly just to make the batch
      editing go faster.
      
      Fixes: b51fc88c ("Merge patch series "riscv: Add remaining module relocations and tests"")
      Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
      Link: https://lore.kernel.org/r/20231107155529.8368-1-palmer@rivosinc.com
      
      
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    • Merge patch series "riscv: Add remaining module relocations and tests" · b51fc88c
      Palmer Dabbelt authored
      Charlie Jenkins <charlie@rivosinc.com> says:
      
      A handful of module relocations were missing; this series adds the
      remaining ones. I also wrote some test cases to ensure that module
      loading works properly. Some relocations cannot be supported in the
      kernel; these include the ones that rely on thread-local storage and
      dynamic linking.
      
      This patch also overhauls the implementation of ADD/SUB/SET/ULEB128
      relocations to handle overflow. "Overflow" is different for ULEB128
      since it is a variable-length encoding that the compiler can be expected
      to generate enough space for. Instead of overflowing, ULEB128 will
      expand into the next 8-bit segment of the location.
      
      A psABI proposal [1] was merged that mandates that SET_ULEB128 and
      SUB_ULEB128 are paired; however, the discussion following the merging of
      the pull request revealed that, while the pull request was valid, it
      would be better for linkers to properly handle this overflow. This patch
      proactively implements this methodology for future compatibility.
      
      This can be tested by enabling KUNIT, RUNTIME_KERNEL_TESTING_MENU, and
      RISCV_MODULE_LINKING_KUNIT.
      
      [1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/403
      
      * b4-shazam-merge:
        riscv: Add tests for riscv module loading
        riscv: Add remaining module relocations
        riscv: Avoid unaligned access when relocating modules
      
      Link: https://lore.kernel.org/r/20231101-module_relocations-v9-0-8dfa3483c400@rivosinc.com
      
      
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    • riscv: Add tests for riscv module loading · af71bc19
      Charlie Jenkins authored
      
      
      Add test cases for the two main groups of relocations added: SUB and
      SET, along with uleb128.
      
      Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
      Link: https://lore.kernel.org/r/20231101-module_relocations-v9-3-8dfa3483c400@rivosinc.com
      
      
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    • riscv: Add remaining module relocations · 8fd6c514
      Charlie Jenkins authored
      
      
      Add all final module relocations and add error logs explaining the ones
      that are not supported. Implement overflow checks for
      ADD/SUB/SET/ULEB128 relocations.
      
      Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
      Link: https://lore.kernel.org/r/20231101-module_relocations-v9-2-8dfa3483c400@rivosinc.com
      
      
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    • riscv: Avoid unaligned access when relocating modules · 8cbe0acc
      Emil Renner Berthing authored
      
      
      With the C extension, regular 32-bit instructions are not
      necessarily aligned on 4-byte boundaries. RISC-V instructions
      are in fact an ordered list of 16-bit little-endian
      "parcels", so access the instruction as such.
      
      This should also make the code work in case someone builds
      a big-endian RISC-V machine.
      
      Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
      Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
      Link: https://lore.kernel.org/r/20231101-module_relocations-v9-1-8dfa3483c400@rivosinc.com
      
      
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    • riscv: split cache ops out of dma-noncoherent.c · 946bb33d
      Christoph Hellwig authored
      
      
      The cache ops are also used by the pmem code, which is unconditionally
      built into the kernel. Move them into a separate file that is built
      based on the correct config option.
      
      Fixes: fd962781 ("riscv: RISCV_NONSTANDARD_CACHE_OPS shouldn't depend on RISCV_DMA_NONCOHERENT")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
      Tested-by: Conor Dooley <conor.dooley@microchip.com>
      Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
      Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> #
      Link: https://lore.kernel.org/r/20231028155101.1039049-1-hch@lst.de
      
      
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
  Nov 07, 2023
  Nov 06, 2023