  1. Jul 14, 2023
  2. Jul 13, 2023
    • KVM: arm64: Add missing BTI instructions · dcf89d11
      Mostafa Saleh authored
      Some bti instructions were missing from
      commit b53d4a27 ("KVM: arm64: Use BTI for nvhe")
      
      1) kvm_host_psci_cpu_entry
      kvm_host_psci_cpu_entry is called from __kvm_hyp_init_cpu through a "br"
      instruction: __kvm_hyp_init_cpu resides in the idmap section while
      kvm_host_psci_cpu_entry is in the hyp .text, so the offset between them
      exceeds the 128MB range covered by "b".
      This means that the function must start with a "bti j" instruction.
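
      The 128MB limit comes from the encoding of "b": a signed 26-bit
      immediate counted in 4-byte instructions, which gives a reach of
      roughly +/-128MB from the branch itself. A minimal userspace sketch of
      that range check follows; b_can_reach() is a hypothetical helper, not
      a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "b" encodes a signed 26-bit word offset (imm26 << 2), so it reaches
 * [-128MiB, +128MiB - 4] relative to its own address. */
#define B_RANGE_MIN (-(INT64_C(1) << 27))
#define B_RANGE_MAX ((INT64_C(1) << 27) - 4)

static_assert(B_RANGE_MAX == 0x7FFFFFC, "unexpected B range");

/* Hypothetical helper: can a direct "b" at 'from' encode a jump to 'to'? */
static bool b_can_reach(uint64_t from, uint64_t to)
{
	int64_t off = (int64_t)(to - from);

	/* Targets must be a whole number of 4-byte instructions away. */
	if (off & 3)
		return false;
	return off >= B_RANGE_MIN && off <= B_RANGE_MAX;
}
```

      When b_can_reach() would return false, as for idmap code jumping into
      hyp .text, the code has to load the target into a register and use
      "br", which is what makes a BTI landing pad necessary.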
      
      LLVM, which is the only compiler supporting BTI for Linux, adds "bti j"
      for jump tables and when taking the address of a block [1].
      The same behaviour is observed with GCC.
      
      As kvm_host_psci_cpu_entry is a C function, this must be done in
      assembly.
      
      Another solution is to use X16/X17 with "br": according to the ARM ARM
      (DDI0487I.a, RLJHCL/IGMGRS), PACIASP has an implicit branch target
      identification instruction that is compatible with PSTATE.BTYPE 0b01,
      which includes "br X16/X17". kvm_host_psci_cpu_entry has PACIASP, as
      it is an external function.
      However, an explicit "bti" makes the intent clearer than relying on
      which register is used.
      
      A third solution is to clear SCTLR_EL2.BT, which would make PACIASP
      compatible with PSTATE.BTYPE 0b11 ("br" to other registers).
      However, this deviates from the kernel behaviour (in bti_enable()).
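
      The three options can be compared with a simplified model of the
      landing-pad rules, for illustration only. It assumes the target lies
      in a guarded page and that a "br" to a register other than X16/X17 is
      issued from a guarded page (so it sets BTYPE 0b11); all enum and
      function names are hypothetical, and the ARM ARM remains the authority
      on the exact rules.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified PSTATE.BTYPE values. */
enum btype {
	BTYPE_NONE = 0, /* 0b00: no indirect branch pending */
	BTYPE_01   = 1, /* 0b01: e.g. "br x16" / "br x17" */
	BTYPE_10   = 2, /* 0b10: "blr" */
	BTYPE_11   = 3, /* 0b11: "br" to other registers, guarded source */
};

enum landing { LAND_BTI_C, LAND_BTI_J, LAND_BTI_JC, LAND_PACIASP, LAND_OTHER };

/* BTYPE set by "br xN" executed from a guarded page (simplified). */
static enum btype btype_for_br(unsigned int reg)
{
	assert(reg <= 30);
	return (reg == 16 || reg == 17) ? BTYPE_01 : BTYPE_11;
}

/* Is 'insn' an acceptable first instruction for a branch with 'bt'?
 * 'sctlr_bt' models SCTLR_EL2.BT, which governs PACIASP vs BTYPE 0b11. */
static bool landing_ok(enum landing insn, enum btype bt, bool sctlr_bt)
{
	if (bt == BTYPE_NONE)
		return true;
	switch (insn) {
	case LAND_BTI_C:
		return bt == BTYPE_01 || bt == BTYPE_10;
	case LAND_BTI_J:
		return bt == BTYPE_01 || bt == BTYPE_11;
	case LAND_BTI_JC:
		return true;
	case LAND_PACIASP:
		return bt == BTYPE_01 || bt == BTYPE_10 ||
		       (bt == BTYPE_11 && !sctlr_bt);
	default:
		return false;
	}
}
```

      In this model, "br x0" into a PACIASP-prefixed function faults with
      SCTLR_EL2.BT set, succeeds with "br x16", and always succeeds when the
      target starts with "bti j".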
      
      2) Spectre vector table
      "br" instructions are generated at runtime for the vector table
      (__bp_harden_hyp_vecs).
      These branches land on vectors in __kvm_hyp_vector at offset 8.
      As all the vectors are defined with the valid_vect/invalid_vect macros,
      it is sufficient to add "bti j" at the correct offset.
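
      To illustrate where the runtime "br" lands, the sketch below computes
      the landing address for each vector. It assumes 16 vectors of 0x80
      bytes each (the architectural vector table layout) and an 8-byte,
      two-instruction preamble per vector, which is what "offset 8" refers
      to; the macro and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VECTOR_ENTRY_SIZE 0x80u /* each vector is 128-byte aligned */
#define VECTOR_COUNT      16u   /* 16 entries, 2KiB table */
#define BR_LANDING_OFFSET 8u    /* skip the two-instruction preamble */

/* Hypothetical helper: address a hardened "br" jumps to for 'idx'. */
static uint64_t vector_landing(uint64_t vbase, unsigned int idx)
{
	assert(idx < VECTOR_COUNT);
	return vbase + (uint64_t)idx * VECTOR_ENTRY_SIZE + BR_LANDING_OFFSET;
}
```

      Because every vector is generated by the same macros, placing "bti j"
      at this one offset inside the macros covers all 16 landing addresses.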
      
      [1] https://reviews.llvm.org/D52867
      
      Fixes: b53d4a27 ("KVM: arm64: Use BTI for nvhe")
      Signed-off-by: Mostafa Saleh <smostafa@google.com>
      Reported-by: Sudeep Holla <sudeep.holla@arm.com>
      Acked-by: Marc Zyngier <maz@kernel.org>
      Tested-by: Sudeep Holla <sudeep.holla@arm.com>
      Link: https://lore.kernel.org/r/20230706152240.685684-1-smostafa@google.com
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
    • KVM: arm64: Correctly handle page aging notifiers for unaligned memslot · df6556ad
      Oliver Upton authored
      Userspace is allowed to select any PAGE_SIZE-aligned hva to back guest
      memory. This is true even with hugepages, although it is a rather
      suboptimal configuration, as PTE-level mappings are used at stage-2.
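
      As background: a stage-2 block mapping can only be used when the guest
      physical address and the hva have the same offset within the block
      size; otherwise the hugepage ends up mapped with PTEs. The sketch below
      models that congruence check under this assumption;
      block_mapping_possible() is a hypothetical name, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: can a 'size'-byte stage-2 block back this memory?
 * Only if gpa and hva share the same offset within the block. */
static bool block_mapping_possible(uint64_t gpa, uint64_t hva, uint64_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0); /* power of two */
	return (gpa & (size - 1)) == (hva & (size - 1));
}
```

      For example, a memslot whose hva is only 4KiB-aligned relative to its
      gpa fails this check for a 2MiB block, so each 2MiB hugepage behind it
      is described by many leaf PTEs instead of one block entry.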
      
      The arm64 page aging handlers assume that the specified range is
      exactly one page/block of memory, which in the aforementioned case is
      not necessarily true. Altogether, this leads to the WARN() in
      kvm_age_gfn() firing.

      However, the WARN is only part of the issue, as the table walkers visit
      at most a single leaf PTE. For hugepage-backed memory in a memslot that
      isn't hugepage-aligned, page aging entirely misses accesses to the
      hugepage beyond the first page in the memslot.
      
      Add a new walker dedicated to handling page aging MMU notifiers capable
      of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new
      walker and drop the WARN that caught the issue in the first place. The
      implementation of this walker was inspired by the test_clear_young()
      implementation by Yu Zhao [*], but repurposed to address a bug in the
      existing aging implementation.
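
      The difference between visiting a single leaf and walking a range can
      be shown with a toy model. This is not the kernel's walker: struct
      toy_pte, age_one_leaf() and age_range() are hypothetical, and a real
      stage-2 walker has to deal with actual page-table formats and
      concurrent updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy leaf entry: only the access flag matters for this model. */
struct toy_pte {
	bool young;
};

/* Old behaviour: test-and-clear a single leaf entry only. */
static bool age_one_leaf(struct toy_pte *ptes, size_t idx)
{
	bool young = ptes[idx].young;

	ptes[idx].young = false;
	return young;
}

/* Range walker: test-and-clear every leaf in [start, end) and report
 * whether any of them had been accessed. */
static bool age_range(struct toy_pte *ptes, size_t start, size_t end)
{
	bool young = false;

	assert(start <= end);
	for (size_t i = start; i < end; i++) {
		young |= ptes[i].young;
		ptes[i].young = false;
	}
	return young;
}
```

      With an access recorded only in the third entry, age_one_leaf() on
      the first entry reports "not young" and leaves that access unseen,
      whereas age_range() over the whole range reports it and clears it.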
      
      Cc: stable@vger.kernel.org # v5.15
      Fixes: 056aad67 ("kvm: arm/arm64: Rework gpa callback handlers")
      Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/
      Co-developed-by: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Reported-by: Reiji Watanabe <reijiw@google.com>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
      Link: https://lore.kernel.org/r/20230627235405.4069823-1-oliver.upton@linux.dev
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
  3. Jul 12, 2023
    • KVM: arm64: Disable preemption in kvm_arch_hardware_enable() · 970dee09
      Marc Zyngier authored
      Since 0bf50497 ("KVM: Drop kvm_count_lock and instead protect
      kvm_usage_count with kvm_lock"), hotplugging a CPU back in whilst
      a guest is running results in a number of ugly splats, as most
      of this code expects to run with preemption disabled, which is no
      longer the case.
      
      While the context is preemptible, it isn't migratable, which should
      be enough. But we have plenty of preemptible() checks all over
      the place, and our per-CPU accessors also disable preemption.
      
      Since this affects released versions, let's do the easy fix first,
      disabling preemption in kvm_arch_hardware_enable(). We can always
      revisit this with a more invasive fix in the future.
      
      Fixes: 0bf50497 ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")
      Reported-by: Kristina Martsenko <kristina.martsenko@arm.com>
      Tested-by: Kristina Martsenko <kristina.martsenko@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/aeab7562-2d39-e78e-93b1-4711f8cc3fa5@arm.com
      Cc: stable@vger.kernel.org # v6.3, v6.4
      Link: https://lore.kernel.org/r/20230703163548.1498943-1-maz@kernel.org
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
    • KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm · fa729bc7
      Sudeep Holla authored
      Currently there is no synchronisation between the finalize_pkvm() and
      kvm_arm_init() initcalls. finalize_pkvm() proceeds happily even if
      kvm_arm_init() fails, resulting in the following warning on all the
      CPUs and eventually a HYP panic:
      
        | kvm [1]: IPA Size Limit: 48 bits
        | kvm [1]: Failed to init hyp memory protection
        | kvm [1]: error initializing Hyp mode: -22
        |
        | <snip>
        |
        | WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
        | Modules linked in:
        | CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
        | Hardware name: FVP Base RevC (DT)
        | pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
        | pc : _kvm_host_prot_finalize+0x30/0x50
        | lr : __flush_smp_call_function_queue+0xd8/0x230
        |
        | Call trace:
        |  _kvm_host_prot_finalize+0x3c/0x50
        |  on_each_cpu_cond_mask+0x3c/0x6c
        |  pkvm_drop_host_privileges+0x4c/0x78
        |  finalize_pkvm+0x3c/0x5c
        |  do_one_initcall+0xcc/0x240
        |  do_initcall_level+0x8c/0xac
        |  do_initcalls+0x54/0x94
        |  do_basic_setup+0x1c/0x28
        |  kernel_init_freeable+0x100/0x16c
        |  kernel_init+0x20/0x1a0
        |  ret_from_fork+0x10/0x20
        | Failed to finalize Hyp protection: -22
        |     dtb=fvp-base-revc.dtb
        | kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
        | kvm [95]: nVHE call trace:
        | kvm [95]:  [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
        | kvm [95]:  [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
        | kvm [95]:  [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
        | kvm [95]:  [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
        | kvm [95]: ---[ end nVHE call trace ]---
        | kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
        | Kernel panic - not syncing: HYP panic:
        | PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
        | FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
        | VCPU:0000000000000000
        | CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G        W          6.4.0 #237
        | Hardware name: FVP Base RevC (DT)
        | Workqueue: rpciod rpc_async_schedule
        | Call trace:
        |  dump_backtrace+0xec/0x108
        |  show_stack+0x18/0x2c
        |  dump_stack_lvl+0x50/0x68
        |  dump_stack+0x18/0x24
        |  panic+0x138/0x33c
        |  nvhe_hyp_panic_handler+0x100/0x184
        |  new_slab+0x23c/0x54c
        |  ___slab_alloc+0x3e4/0x770
        |  kmem_cache_alloc_node+0x1f0/0x278
        |  __alloc_skb+0xdc/0x294
        |  tcp_stream_alloc_skb+0x2c/0xf0
        |  tcp_sendmsg_locked+0x3d0/0xda4
        |  tcp_sendmsg+0x38/0x5c
        |  inet_sendmsg+0x44/0x60
        |  sock_sendmsg+0x1c/0x34
        |  xprt_sock_sendmsg+0xdc/0x274
        |  xs_tcp_send_request+0x1ac/0x28c
        |  xprt_transmit+0xcc/0x300
        |  call_transmit+0x78/0x90
        |  __rpc_execute+0x114/0x3d8
        |  rpc_async_schedule+0x28/0x48
        |  process_one_work+0x1d8/0x314
        |  worker_thread+0x248/0x474
        |  kthread+0xfc/0x184
        |  ret_from_fork+0x10/0x20
        | SMP: stopping secondary CPUs
        | Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
        | PHYS_OFFSET: 0x80000000
        | CPU features: 0x00000000,1035b7a3,ccfe773f
        | Memory Limit: none
        | ---[ end Kernel panic - not syncing: HYP panic:
        | PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
        | FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
        | VCPU:0000000000000000 ]---
      
      Fix it by checking that kvm_arm_init() completed successfully
      in finalize_pkvm() before proceeding any further.
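
      A minimal model of the ordering fix, assuming that kvm_arm_init()
      records its success in a flag that finalize_pkvm() consults before
      de-privileging the host. The toy_* functions, their return
      conventions and the arm_initialised flag are hypothetical stand-ins,
      not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Set only when the (toy) KVM initialisation succeeds. */
static bool arm_initialised;

/* Stand-in for the first initcall; like an initcall, it runs once. */
static int toy_kvm_arm_init(bool succeed)
{
	assert(!arm_initialised);
	if (!succeed)
		return -22; /* -EINVAL, as in the log above */
	arm_initialised = true;
	return 0;
}

/* Stand-in for the later initcall: skip host de-privilege when the
 * earlier initialisation failed. Returns 1 if finalisation ran. */
static int toy_finalize_pkvm(void)
{
	if (!arm_initialised)
		return 0;
	/* ... de-privilege the host here ... */
	return 1;
}
```

      The point of the design is that the later initcall no longer assumes
      the earlier one succeeded; it checks explicitly and bails out quietly.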
      
      Fixes: 87727ba2 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
      Cc: Will Deacon <will@kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: James Morse <james.morse@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Acked-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
    • KVM: arm64: timers: Use CNTHCTL_EL2 when setting non-CNTKCTL_EL1 bits · fe769e6c
      Marc Zyngier authored
      It recently appeared that, when running VHE, there is a notable
      difference between using CNTKCTL_EL1 and CNTHCTL_EL2, despite what
      the architecture documents:
      
      - When accessed from EL2, bits [19:18] and [16:10] of CNTKCTL_EL1 have
        the same assignment as CNTHCTL_EL2
      - When accessed from EL1, bits [19:18] and [16:10] are RES0
      
      It is all OK until you factor in NV, where the EL2 guest runs at EL1.
      In this configuration, CNTKCTL_EL1 neither traps nor ends up in
      the VNCR page. This means that any write from the guest affecting
      CNTHCTL_EL2 using CNTKCTL_EL1 ends up losing some state. Not good.
      
      The fix is obvious: don't use CNTKCTL_EL1 if you want to change bits
      that are not part of the EL1 definition of CNTKCTL_EL1, and use
      CNTHCTL_EL2 instead. This doesn't change anything for a bare-metal OS,
      and fixes it when running under NV. The NV hypervisor will itself
      have to work harder to merge the two accessors.
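
      The EL2-only bits listed above can be expressed as a mask. This sketch
      assumes exactly the bit positions given in this message ([19:18] and
      [16:10]) and uses a hypothetical helper to decide whether a write has
      to go through CNTHCTL_EL2 rather than CNTKCTL_EL1; MASK() mirrors what
      the kernel's GENMASK() computes for 64-bit values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask of bits [h:l], equivalent to a 64-bit GENMASK(h, l). */
#define MASK(h, l) ((~UINT64_C(0) >> (63 - (h))) & (~UINT64_C(0) << (l)))

/* Bits that only exist in the CNTHCTL_EL2 layout. */
#define CNTHCTL_EL2_ONLY_BITS (MASK(19, 18) | MASK(16, 10))

static_assert(CNTHCTL_EL2_ONLY_BITS == 0xDFC00, "unexpected mask");

/* Hypothetical helper: must this update use CNTHCTL_EL2 directly? */
static bool needs_cnthctl_el2(uint64_t bits)
{
	return (bits & CNTHCTL_EL2_ONLY_BITS) != 0;
}
```

      Bit 17, for instance, falls outside the mask, so under these
      assumptions an update touching only that bit could still use either
      accessor.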
      
      Note that there is a pending update to the architecture to address
      this issue by making the affected bits UNKNOWN when CNTKCTL_EL1 is
      used from EL2 with VHE enabled.
      
      Fixes: c605ee24 ("KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2")
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org # v6.4
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Link: https://lore.kernel.org/r/20230627140557.544885-1-maz@kernel.org
      Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
  4. Jul 10, 2023
  5. Jul 09, 2023