Skip to content
  1. Apr 08, 2021
    • Paolo Bonzini's avatar
      KVM: x86/mmu: preserve pending TLB flush across calls to kvm_tdp_mmu_zap_sp · 315f02c6
      Paolo Bonzini authored
      Right now, if a call to kvm_tdp_mmu_zap_sp returns false, the caller
      will skip the TLB flush, which is wrong.  There are two ways to fix
      it:
      
      - since kvm_tdp_mmu_zap_sp will not yield and therefore will not flush
        the TLB itself, we could change the call to kvm_tdp_mmu_zap_sp to
        use "flush |= ..."
      
      - or we can chain the flush argument through kvm_tdp_mmu_zap_sp down
        to __kvm_tdp_mmu_zap_gfn_range.  Note that kvm_tdp_mmu_zap_sp will
        neither yield nor flush, so flush would never go from true to
        false.
      
      This patch does the former to simplify application to stable kernels,
      and to make it further clearer that kvm_tdp_mmu_zap_sp will not flush.
      
      Cc: seanjc@google.com
      Fixes: 048f4980 ("KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping")
      Cc: <stable@vger.kernel.org> # 5.10.x: 048f4980: KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping
      Cc: <stable@vger.kernel.org> # 5.10.x: 33a31641
      
      : KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages
      Cc: <stable@vger.kernel.org>
      Reviewed-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      315f02c6
  2. Apr 01, 2021
    • Vitaly Kuznetsov's avatar
      selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0) · 55626ca9
      Vitaly Kuznetsov authored
      
      
      Add a test for the issue when KVM_SET_CLOCK(0) call could cause
      TSC page value to go very big because of a signedness issue around
      hv_clock->system_time.
      
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210326155551.17446-3-vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      55626ca9
    • Vitaly Kuznetsov's avatar
      KVM: x86: Prevent 'hv_clock->system_time' from going negative in kvm_guest_time_update() · 77fcbe82
      Vitaly Kuznetsov authored
      
      
      When guest time is reset with KVM_SET_CLOCK(0), it is possible for
      'hv_clock->system_time' to become a small negative number. This happens
      because in KVM_SET_CLOCK handling we set 'kvm->arch.kvmclock_offset' based
      on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled,
      kvm_guest_time_update() does (masterclock in use case):
      
      hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset;
      
      And 'master_kernel_ns' represents the last time when masterclock
      got updated, it can precede KVM_SET_CLOCK() call. Normally, this is not a
      problem, the difference is very small, e.g. I'm observing
      hv_clock.system_time = -70 ns. The issue comes from the fact that
      'hv_clock.system_time' is stored as unsigned and 'system_time / 100' in
      compute_tsc_page_parameters() becomes a very big number.
      
      Use 'master_kernel_ns' instead of get_kvmclock_ns() when masterclock is in
      use and get_kvmclock_base_ns() when it's not to prevent 'system_time' from
      going negative.
      
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210331124130.337992-2-vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      77fcbe82
    • Paolo Bonzini's avatar
      KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken · a83829f5
      Paolo Bonzini authored
      
      
      pvclock_gtod_sync_lock can be taken with interrupts disabled if the
      preempt notifier calls get_kvmclock_ns to update the Xen
      runstate information:
      
         spin_lock include/linux/spinlock.h:354 [inline]
         get_kvmclock_ns+0x25/0x390 arch/x86/kvm/x86.c:2587
         kvm_xen_update_runstate+0x3d/0x2c0 arch/x86/kvm/xen.c:69
         kvm_xen_update_runstate_guest+0x74/0x320 arch/x86/kvm/xen.c:100
         kvm_xen_runstate_set_preempted arch/x86/kvm/xen.h:96 [inline]
         kvm_arch_vcpu_put+0x2d8/0x5a0 arch/x86/kvm/x86.c:4062
      
      So change the users of the spinlock to spin_lock_irqsave and
      spin_unlock_irqrestore.
      
      Reported-by: default avatar <syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com>
      Fixes: 30b5c851
      
       ("KVM: x86/xen: Add support for vCPU runstate information")
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a83829f5
    • Paolo Bonzini's avatar
      KVM: x86: reduce pvclock_gtod_sync_lock critical sections · c2c647f9
      Paolo Bonzini authored
      
      
      There is no need to include changes to vcpu->requests into
      the pvclock_gtod_sync_lock critical section.  The changes to
      the shared data structures (in pvclock_update_vm_gtod_copy)
      already occur under the lock.
      
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c2c647f9
    • Paolo Bonzini's avatar
      6ebae23c
    • Paolo Bonzini's avatar
      KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit · 3c346c0c
      Paolo Bonzini authored
      Fixing nested_vmcb_check_save to avoid all TOC/TOU races
      is a bit harder in released kernels, so do the bare minimum
      by avoiding that EFER.SVME is cleared.  This is problematic
      because svm_set_efer frees the data structures for nested
      virtualization if EFER.SVME is cleared.
      
      Also check that EFER.SVME remains set after a nested vmexit;
      clearing it could happen if the bit is zero in the save area
      that is passed to KVM_SET_NESTED_STATE (the save area of the
      nested state corresponds to the nested hypervisor's state
      and is restored on the next nested vmexit).
      
      Cc: stable@vger.kernel.org
      Fixes: 2fcf4876
      
       ("KVM: nSVM: implement on demand allocation of the nested state")
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      3c346c0c
    • Paolo Bonzini's avatar
      KVM: SVM: load control fields from VMCB12 before checking them · a58d9166
      Paolo Bonzini authored
      
      
      Avoid races between check and use of the nested VMCB controls.  This
      for example ensures that the VMRUN intercept is always reflected to the
      nested hypervisor, instead of being processed by the host.  Without this
      patch, it is possible to end up with svm->nested.hsave pointing to
      the MSR permission bitmap for nested guests.
      
      This bug is CVE-2021-29657.
      
      Reported-by: default avatarFelix Wilhelm <fwilhelm@google.com>
      Cc: stable@vger.kernel.org
      Fixes: 2fcf4876
      
       ("KVM: nSVM: implement on demand allocation of the nested state")
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a58d9166
  3. Mar 31, 2021
  4. Mar 29, 2021
    • Linus Torvalds's avatar
      Linux 5.12-rc5 · a5e13c6d
      Linus Torvalds authored
      v5.12-rc5
      a5e13c6d
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-fixes-for-v5.12-2020-03-28' of... · f9e2bb42
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v5.12-2020-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tooling fixes from Arnaldo Carvalho de Melo:
      
       - Avoid write of uninitialized memory when generating PERF_RECORD_MMAP*
         records.
      
       - Fix 'perf top' BPF support related crash with perf_event_paranoid=3 +
         kptr_restrict.
      
       - Validate raw event with sysfs exported format bits.
      
       - Fix waipid on SIGCHLD delivery bugs in 'perf daemon'.
      
       - Change to use bash for daemon test on Debian, where the default is
         dash and thus fails for use of bashisms in this test.
      
       - Fix memory leak in vDSO found using ASAN.
      
       - Remove now useless (due to the fact that BPF now supports static
         vars) failing sub test "BPF relocation checker".
      
       - Fix auxtrace queue conflict.
      
       - Sync linux/kvm.h with the kernel sources.
      
      * tag 'perf-tools-fixes-for-v5.12-2020-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        perf test: Change to use bash for daemon test
        perf record: Fix memory leak in vDSO found using ASAN
        perf test: Remove now useless failing sub test "BPF relocation checker"
        perf daemon: Return from kill functions
        perf daemon: Force waipid for all session on SIGCHLD delivery
        perf top: Fix BPF support related crash with perf_event_paranoid=3 + kptr_restrict
        perf pmu: Validate raw event with sysfs exported format bits
        perf synthetic events: Avoid write of uninitialized memory when generating PERF_RECORD_MMAP* records
        tools headers UAPI: Sync linux/kvm.h with the kernel sources
        perf synthetic-events: Fix uninitialized 'kernel_thread' variable
        perf auxtrace: Fix auxtrace queue conflict
      f9e2bb42
    • Linus Torvalds's avatar
      Merge tag 'auxdisplay-for-linus-v5.12-rc6' of git://github.com/ojeda/linux · 3fef15f8
      Linus Torvalds authored
      Pull auxdisplay fix from Miguel Ojeda:
       "Remove in_interrupt() usage (Sebastian Andrzej Siewior)"
      
      * tag 'auxdisplay-for-linus-v5.12-rc6' of git://github.com/ojeda/linux:
        auxdisplay: Remove in_interrupt() usage.
      3fef15f8
    • Linus Torvalds's avatar
      Merge tag 'x86-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 36a14638
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Two fixes:
      
         - Fix build failure on Ubuntu with new GCC packages that turn
           on -fcf-protection
      
         - Fix SME memory encryption PTE encoding bug - AFAICT the code
           worked on 4K page sizes (level 1) but had the wrong shift at
           higher page level orders (level 2 and higher)"
      
      * tag 'x86-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/build: Turn off -fcf-protection for realmode targets
        x86/mem_encrypt: Correct physical address calculation in __set_clr_pte_enc()
      36a14638
    • Linus Torvalds's avatar
      Merge tag 'locking-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 47fbbc94
      Linus Torvalds authored
      Pull locking fix from Ingo Molnar:
       "Fix the non-debug mutex_lock_io_nested() method to map to
        mutex_lock_io() instead of mutex_lock().
      
        Right now nothing uses this API explicitly, but this is an
        accident waiting to happen"
      
      * tag 'locking-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/mutex: Fix non debug version of mutex_lock_io_nested()
      47fbbc94
    • Linus Torvalds's avatar
      Merge tag '5.12-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6 · 81b1d39f
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Five cifs/smb3 fixes, two for stable.
      
        Includes an important fix for encryption and an ACL fix, as well as a
        fix for possible reflink data corruption"
      
      * tag '5.12-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: fix cached file size problems in duplicate extents (reflink)
        cifs: Silently ignore unknown oplock break handle
        cifs: revalidate mapping when we open files for SMB1 POSIX
        cifs: Fix chmod with modefromsid when an older ACE already exists.
        cifs: Adjust key sizes and key generation routines for AES256 encryption
      81b1d39f
    • Linus Torvalds's avatar
      Merge tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-block · b44d1ddc
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - Use thread info versions of flag testing, as discussed last week.
      
       - The series enabling PF_IO_WORKER to just take signals, instead of
         needing to special case that they do not in a bunch of places. Ends
         up being pretty trivial to do, and then we can revert all the special
         casing we're currently doing.
      
       - Kill dead pointer assignment
      
       - Fix hashed part of async work queue trace
      
       - Fix sign extension issue for IORING_OP_PROVIDE_BUFFERS
      
       - Fix a link completion ordering regression in this merge window
      
       - Cancellation fixes
      
      * tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-block:
        io_uring: remove unsued assignment to pointer io
        io_uring: don't cancel extra on files match
        io_uring: don't cancel-track common timeouts
        io_uring: do post-completion chore on t-out cancel
        io_uring: fix timeout cancel return code
        Revert "signal: don't allow STOP on PF_IO_WORKER threads"
        Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"
        Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals"
        Revert "signal: don't allow sending any signals to PF_IO_WORKER threads"
        kernel: stop masking signals in create_io_thread()
        io_uring: handle signals for IO threads like a normal thread
        kernel: don't call do_exit() for PF_IO_WORKER threads
        io_uring: maintain CQE order of a failed link
        io-wq: fix race around pending work on teardown
        io_uring: do ctx sqd ejection in a clear context
        io_uring: fix provide_buffers sign extension
        io_uring: don't skip file_end_write() on reissue
        io_uring: correct io_queue_async_work() traces
        io_uring: don't use {test,clear}_tsk_thread_flag() for current
      b44d1ddc
    • Linus Torvalds's avatar
      Merge tag 'block-5.12-2021-03-27' of git://git.kernel.dk/linux-block · abed516e
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - Fix regression from this merge window with the xarray partition
         change, which allowed partition counts that overflow the u8 that
         holds the partition number (Ming)
      
       - Fix zone append warning (Johannes)
      
       - Segmentation count fix for multipage bvecs (David)
      
       - Partition scan fix (Chris)
      
      * tag 'block-5.12-2021-03-27' of git://git.kernel.dk/linux-block:
        block: don't create too many partitions
        block: support zone append bvecs
        block: recalculate segment count for multi-segment discards correctly
        block: clear GD_NEED_PART_SCAN later in bdev_disk_changed
      abed516e
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · e8cfe8fa
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Seven fixes, all in drivers (qla2xxx, mkt3sas, qedi, target,
        ibmvscsi).
      
        The most serious are the target pscsi oom and the qla2xxx revert which
        can otherwise cause a use after free"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: target: pscsi: Clean up after failure in pscsi_map_sg()
        scsi: target: pscsi: Avoid OOM in pscsi_map_sg()
        scsi: mpt3sas: Fix error return code of mpt3sas_base_attach()
        scsi: qedi: Fix error return code of qedi_alloc_global_queues()
        scsi: Revert "qla2xxx: Make sure that aborted commands are freed"
        scsi: ibmvfc: Make ibmvfc_wait_for_ops() MQ aware
        scsi: ibmvfc: Fix potential race in ibmvfc_wait_for_ops()
      e8cfe8fa
  5. Mar 28, 2021
  6. Mar 27, 2021