Skip to content
  1. Aug 02, 2021
  2. Jul 30, 2021
    • Paolo Bonzini's avatar
      KVM: x86: accept userspace interrupt only if no event is injected · fa7a549d
      Paolo Bonzini authored
      
      
      Once an exception has been injected, any side effects related to
      the exception (such as setting CR2 or DR6) have been taked place.
      Therefore, once KVM sets the VM-entry interruption information
      field or the AMD EVENTINJ field, the next VM-entry must deliver that
      exception.
      
      Pending interrupts are processed after injected exceptions, so
      in theory it would not be a problem to use KVM_INTERRUPT when
      an injected exception is present.  However, DOSEMU is using
      run->ready_for_interrupt_injection to detect interrupt windows
      and then using KVM_SET_SREGS/KVM_SET_REGS to inject the
      interrupt manually.  For this to work, the interrupt window
      must be delayed after the completion of the previous event
      injection.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarStas Sergeev <stsp2@yandex.ru>
      Tested-by: default avatarStas Sergeev <stsp2@yandex.ru>
      Fixes: 71cc849b
      
       ("KVM: x86: Fix split-irqchip vs interrupt injection window request")
      Reviewed-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fa7a549d
  3. Jul 28, 2021
    • Paolo Bonzini's avatar
      KVM: add missing compat KVM_CLEAR_DIRTY_LOG · 8750f9bb
      Paolo Bonzini authored
      The arguments to the KVM_CLEAR_DIRTY_LOG ioctl include a pointer,
      therefore it needs a compat ioctl implementation.  Otherwise,
      32-bit userspace fails to invoke it on 64-bit kernels; for x86
      it might work fine by chance if the padding is zero, but not
      on big-endian architectures.
      
      Reported-by: Thomas Sattler
      Cc: stable@vger.kernel.org
      Fixes: 2a31b9db
      
       ("kvm: introduce manual dirty log reprotect")
      Reviewed-by: default avatarPeter Xu <peterx@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      8750f9bb
    • Li RongQing's avatar
      KVM: use cpu_relax when halt polling · 74775654
      Li RongQing authored
      
      
      SMT siblings share caches and other hardware, and busy halt polling
      will degrade its sibling performance if its sibling is working
      
      Sean Christopherson suggested as below:
      
      "Rather than disallowing halt-polling entirely, on x86 it should be
      sufficient to simply have the hardware thread yield to its sibling(s)
      via PAUSE.  It probably won't get back all performance, but I would
      expect it to be close.
      This compiles on all KVM architectures, and AFAICT the intended usage
      of cpu_relax() is identical for all architectures."
      
      Suggested-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarLi RongQing <lirongqing@baidu.com>
      Message-Id: <20210727111247.55510-1-lirongqing@baidu.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      74775654
    • Maxim Levitsky's avatar
      KVM: SVM: use vmcb01 in svm_refresh_apicv_exec_ctrl · 5868b822
      Maxim Levitsky authored
      
      
      Currently when SVM is enabled in guest CPUID, AVIC is inhibited as soon
      as the guest CPUID is set.
      
      AVIC happens to be fully disabled on all vCPUs by the time any guest
      entry starts (if after migration the entry can be nested).
      
      The reason is that currently we disable avic right away on vCPU from which
      the kvm_request_apicv_update was called and for this case, it happens to be
      called on all vCPUs (by svm_vcpu_after_set_cpuid).
      
      After we stop doing this, AVIC will end up being disabled only when
      KVM_REQ_APICV_UPDATE is processed which is after we done switching to the
      nested guest.
      
      Fix this by just using vmcb01 in svm_refresh_apicv_exec_ctrl for avic
      (which is a right thing to do anyway).
      
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210713142023.106183-4-mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      5868b822
    • Maxim Levitsky's avatar
      KVM: SVM: tweak warning about enabled AVIC on nested entry · feea0136
      Maxim Levitsky authored
      
      
      It is possible that AVIC was requested to be disabled but
      not yet disabled, e.g if the nested entry is done right
      after svm_vcpu_after_set_cpuid.
      
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210713142023.106183-3-mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      feea0136
    • Maxim Levitsky's avatar
      KVM: SVM: svm_set_vintr don't warn if AVIC is active but is about to be deactivated · f1577ab2
      Maxim Levitsky authored
      
      
      It is possible for AVIC inhibit and AVIC active state to be mismatched.
      Currently we disable AVIC right away on vCPU which started the AVIC inhibit
      request thus this warning doesn't trigger but at least in theory,
      if svm_set_vintr is called at the same time on multiple vCPUs,
      the warning can happen.
      
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210713142023.106183-2-mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      f1577ab2
    • Christian Borntraeger's avatar
      KVM: s390: restore old debugfs names · bb000f64
      Christian Borntraeger authored
      commit bc9e9e67 ("KVM: debugfs: Reuse binary stats descriptors")
      did replace the old definitions with the binary ones. While doing that
      it missed that some files are names different than the counters. This
      is especially important for kvm_stat which does have special handling
      for counters named instruction_*.
      
      Fixes: commit bc9e9e67
      
       ("KVM: debugfs: Reuse binary stats descriptors")
      CC: Jing Zhang <jingzhangos@google.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Message-Id: <20210726150108.5603-1-borntraeger@de.ibm.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      bb000f64
    • Paolo Bonzini's avatar
      KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initialized · 3fa5e8fd
      Paolo Bonzini authored
      
      
      Right now, svm_hv_vmcb_dirty_nested_enlightenments has an incorrect
      dereference of vmcb->control.reserved_sw before the vmcb is checked
      for being non-NULL.  The compiler is usually sinking the dereference
      after the check; instead of doing this ourselves in the source,
      ensure that svm_hv_vmcb_dirty_nested_enlightenments is only called
      with a non-NULL VMCB.
      
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Cc: Vineeth Pillai <viremana@linux.microsoft.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      [Untested for now due to issues with my AMD machine. - Paolo]
      3fa5e8fd
    • David Matlack's avatar
      KVM: selftests: Introduce access_tracking_perf_test · c33e05d9
      David Matlack authored
      
      
      This test measures the performance effects of KVM's access tracking.
      Access tracking is driven by the MMU notifiers test_young, clear_young,
      and clear_flush_young. These notifiers do not have a direct userspace
      API, however the clear_young notifier can be triggered by marking a
      pages as idle in /sys/kernel/mm/page_idle/bitmap. This test leverages
      that mechanism to enable access tracking on guest memory.
      
      To measure performance this test runs a VM with a configurable number of
      vCPUs that each touch every page in disjoint regions of memory.
      Performance is measured in the time it takes all vCPUs to finish
      touching their predefined region.
      
      Example invocation:
      
        $ ./access_tracking_perf_test -v 8
        Testing guest mode: PA-bits:ANY, VA-bits:48,  4K pages
        guest physical test memory offset: 0xffdfffff000
      
        Populating memory             : 1.337752570s
        Writing to populated memory   : 0.010177640s
        Reading from populated memory : 0.009548239s
        Mark memory idle              : 23.973131748s
        Writing to idle memory        : 0.063584496s
        Mark memory idle              : 24.924652964s
        Reading from idle memory      : 0.062042814s
      
      Breaking down the results:
      
       * "Populating memory": The time it takes for all vCPUs to perform the
         first write to every page in their region.
      
       * "Writing to populated memory" / "Reading from populated memory": The
         time it takes for all vCPUs to write and read to every page in their
         region after it has been populated. This serves as a control for the
         later results.
      
       * "Mark memory idle": The time it takes for every vCPU to mark every
         page in their region as idle through page_idle.
      
       * "Writing to idle memory" / "Reading from idle memory": The time it
         takes for all vCPUs to write and read to every page in their region
         after it has been marked idle.
      
      This test should be portable across architectures but it is only enabled
      for x86_64 since that's all I have tested.
      
      Reviewed-by: default avatarBen Gardon <bgardon@google.com>
      Signed-off-by: default avatarDavid Matlack <dmatlack@google.com>
      Message-Id: <20210713220957.3493520-7-dmatlack@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c33e05d9
    • David Matlack's avatar
      KVM: selftests: Fix missing break in dirty_log_perf_test arg parsing · 15b7b737
      David Matlack authored
      There is a missing break statement which causes a fallthrough to the
      next statement where optarg will be null and a segmentation fault will
      be generated.
      
      Fixes: 9e965bb7
      
       ("KVM: selftests: Add backing src parameter to dirty_log_perf_test")
      Reviewed-by: default avatarBen Gardon <bgardon@google.com>
      Signed-off-by: default avatarDavid Matlack <dmatlack@google.com>
      Message-Id: <20210713220957.3493520-6-dmatlack@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      15b7b737
    • Juergen Gross's avatar
      x86/kvm: fix vcpu-id indexed array sizes · 76b4f357
      Juergen Gross authored
      
      
      KVM_MAX_VCPU_ID is the maximum vcpu-id of a guest, and not the number
      of vcpu-ids. Fix array indexed by vcpu-id to have KVM_MAX_VCPU_ID+1
      elements.
      
      Note that this is currently no real problem, as KVM_MAX_VCPU_ID is
      an odd number, resulting in always enough padding being available at
      the end of those arrays.
      
      Nevertheless this should be fixed in order to avoid rare problems in
      case someone is using an even number for KVM_MAX_VCPU_ID.
      
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Message-Id: <20210701154105.23215-2-jgross@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      76b4f357
  4. Jul 26, 2021
  5. Jul 19, 2021
    • Paolo Bonzini's avatar
      Merge tag 'kvmarm-fixes-5.14-1' of... · 7025098a
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 5.14, take #1
      
      - Fix MTE shared page detection
      
      - Fix selftest use of obsolete pthread_yield() in favour of sched_yield()
      
      - Enable selftest's use of PMU registers when asked to
      7025098a
    • Linus Torvalds's avatar
      Linux 5.14-rc2 · 2734d6c1
      Linus Torvalds authored
      2734d6c1
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-fixes-for-v5.14-2021-07-18' of... · 8c25c447
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v5.14-2021-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Skip invalid hybrid PMU on hybrid systems when the atom (little) CPUs
         are offlined.
      
       - Fix 'perf test' problems related to the recently added hybrid
         (BIG/little) code.
      
       - Split ARM's coresight (hw tracing) decode by aux records to avoid
         fatal decoding errors.
      
       - Fix add event failure in 'perf probe' when running 32-bit perf in a
         64-bit kernel.
      
       - Fix 'perf sched record' failure when CONFIG_SCHEDSTATS is not set.
      
       - Fix memory and refcount leaks detected by ASAn when running 'perf
         test', should be clean of warnings now.
      
       - Remove broken definition of __LITTLE_ENDIAN from tools'
         linux/kconfig.h, which was breaking the build in some systems.
      
       - Cast PTHREAD_STACK_MIN to int as it may turn into 'long
         sysconf(__SC_THREAD_STACK_MIN_VALUE), breaking the build in some
         systems.
      
       - Fix libperf build error with LIBPFM4=1.
      
       - Sync UAPI files changed by the memfd_secret new syscall.
      
      * tag 'perf-tools-fixes-for-v5.14-2021-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (35 commits)
        perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set
        perf probe: Fix add event failure when running 32-bit perf in a 64-bit kernel
        perf data: Close all files in close_dir()
        perf probe-file: Delete namelist in del_events() on the error path
        perf test bpf: Free obj_buf
        perf trace: Free strings in trace__parse_events_option()
        perf trace: Free syscall tp fields in evsel->priv
        perf trace: Free syscall->arg_fmt
        perf trace: Free malloc'd trace fields on exit
        perf lzma: Close lzma stream on exit
        perf script: Fix memory 'threads' and 'cpus' leaks on exit
        perf script: Release zstd data
        perf session: Cleanup trace_event
        perf inject: Close inject.output on exit
        perf report: Free generated help strings for sort option
        perf env: Fix memory leak of cpu_pmu_caps
        perf test maps__merge_in: Fix memory leak of maps
        perf dso: Fix memory leak in dso__new_map()
        perf test event_update: Fix memory leak of unit
        perf test event_update: Fix memory leak of evlist
        ...
      8c25c447
    • Linus Torvalds's avatar
      Merge tag 'xfs-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · f0eb870a
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "A few fixes for issues in the new online shrink code, additional
        corrections for my recent bug-hunt w.r.t. extent size hints on
        realtime, and improved input checking of the GROWFSRT ioctl.
      
        IOW, the usual 'I somehow got bored during the merge window and
        resumed auditing the farther reaches of xfs':
      
         - Fix shrink eligibility checking when sparse inode clusters enabled
      
         - Reset '..' directory entries when unlinking directories to prevent
           verifier errors if fs is shrinked later
      
         - Don't report unusable extent size hints to FSGETXATTR
      
         - Don't warn when extent size hints are unusable because the sysadmin
           configured them that way
      
         - Fix insufficient parameter validation in GROWFSRT ioctl
      
         - Fix integer overflow when adding rt volumes to filesystem"
      
      * tag 'xfs-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: detect misaligned rtinherit directory extent size hints
        xfs: fix an integer overflow error in xfs_growfs_rt
        xfs: improve FSGROWFSRT precondition checking
        xfs: don't expose misaligned extszinherit hints to userspace
        xfs: correct the narrative around misaligned rtinherit/extszinherit dirs
        xfs: reset child dir '..' entry when unlinking child
        xfs: check for sparse inode clusters that cross new EOAG when shrinking
      f0eb870a