Skip to content
  1. Apr 16, 2019
    • Liran Alon's avatar
      KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU · e51bfdb6
      Liran Alon authored
      
      
      Issue was discovered when running kvm-unit-tests on KVM running as L1 on
      top of Hyper-V.
      
      When vmx_instruction_intercept unit-test attempts to run RDPMC to test
      RDPMC-exiting, it is intercepted by L1 KVM which it's EXIT_REASON_RDPMC
      handler raise #GP because vCPU exposed by Hyper-V doesn't support PMU.
      Instead of unit-test expectation to be reflected with EXIT_REASON_RDPMC.
      
      The reason vmx_instruction_intercept unit-test attempts to run RDPMC
      even though Hyper-V doesn't support PMU is because L1 expose to L2
      support for RDPMC-exiting. Which is reasonable to assume that is
      supported only in case CPU supports PMU to being with.
      
      Above issue can easily be simulated by modifying
      vmx_instruction_intercept config in x86/unittests.cfg to run QEMU with
      "-cpu host,+vmx,-pmu" and run unit-test.
      
      To handle issue, change KVM to expose RDPMC-exiting only when guest
      supports PMU.
      
      Reported-by: default avatarSaar Amar <saaramar@microsoft.com>
      Reviewed-by: default avatarMihai Carabas <mihai.carabas@oracle.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e51bfdb6
    • Liran Alon's avatar
      KVM: x86: Raise #GP when guest vCPU do not support PMU · 672ff6cf
      Liran Alon authored
      
      
      Before this change, reading a VMware pseduo PMC will succeed even when
      PMU is not supported by guest. This can easily be seen by running
      kvm-unit-test vmware_backdoors with "-cpu host,-pmu" option.
      
      Reviewed-by: default avatarMihai Carabas <mihai.carabas@oracle.com>
      Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      672ff6cf
    • WANG Chao's avatar
      x86/kvm: move kvm_load/put_guest_xcr0 into atomic context · 1811d979
      WANG Chao authored
      
      
      guest xcr0 could leak into host when MCE happens in guest mode. Because
      do_machine_check() could schedule out at a few places.
      
      For example:
      
      kvm_load_guest_xcr0
      ...
      kvm_x86_ops->run(vcpu) {
        vmx_vcpu_run
          vmx_complete_atomic_exit
            kvm_machine_check
              do_machine_check
                do_memory_failure
                  memory_failure
                    lock_page
      
      In this case, host_xcr0 is 0x2ff, guest vcpu xcr0 is 0xff. After schedule
      out, host cpu has guest xcr0 loaded (0xff).
      
      In __switch_to {
           switch_fpu_finish
             copy_kernel_to_fpregs
               XRSTORS
      
      If any bit i in XSTATE_BV[i] == 1 and xcr0[i] == 0, XRSTORS will
      generate #GP (In this case, bit 9). Then ex_handler_fprestore kicks in
      and tries to reinitialize fpu by restoring init fpu state. Same story as
      last #GP, except we get DOUBLE FAULT this time.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWANG Chao <chao.wang@ucloud.cn>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      1811d979
    • Vitaly Kuznetsov's avatar
      KVM: x86: svm: make sure NMI is injected after nmi_singlestep · 99c22179
      Vitaly Kuznetsov authored
      
      
      I noticed that apic test from kvm-unit-tests always hangs on my EPYC 7401P,
      the hanging test nmi-after-sti is trying to deliver 30000 NMIs and tracing
      shows that we're sometimes able to deliver a few but never all.
      
      When we're trying to inject an NMI we may fail to do so immediately for
      various reasons, however, we still need to inject it so enable_nmi_window()
      arms nmi_singlestep mode. #DB occurs as expected, but we're not checking
      for pending NMIs before entering the guest and unless there's a different
      event to process, the NMI will never get delivered.
      
      Make KVM_REQ_EVENT request on the vCPU from db_interception() to make sure
      pending NMIs are checked and possibly injected.
      
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      99c22179
    • Suthikulpanit, Suravee's avatar
      svm/avic: Fix invalidate logical APIC id entry · e44e3eac
      Suthikulpanit, Suravee authored
      Only clear the valid bit when invalidate logical APIC id entry.
      The current logic clear the valid bit, but also set the rest of
      the bits (including reserved bits) to 1.
      
      Fixes: 98d90582
      
       ('svm: Fix AVIC DFR and LDR handling')
      Signed-off-by: default avatarSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e44e3eac
    • Suthikulpanit, Suravee's avatar
      Revert "svm: Fix AVIC incomplete IPI emulation" · 4a58038b
      Suthikulpanit, Suravee authored
      This reverts commit bb218fbc.
      
      As Oren Twaig pointed out the old discussion:
      
        https://patchwork.kernel.org/patch/8292231/
      
      that the change coud potentially cause an extra IPI to be sent to
      the destination vcpu because the AVIC hardware already set the IRR bit
      before the incomplete IPI #VMEXIT with id=1 (target vcpu is not running).
      Since writting to ICR and ICR2 will also set the IRR. If something triggers
      the destination vcpu to get scheduled before the emulation finishes, then
      this could result in an additional IPI.
      
      Also, the issue mentioned in the commit bb218fbc
      
       was misdiagnosed.
      
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Reported-by: default avatarOren Twaig <oren@scalemp.com>
      Signed-off-by: default avatarSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4a58038b
    • Ben Gardon's avatar
      kvm: mmu: Fix overflow on kvm mmu page limit calculation · bc8a3d89
      Ben Gardon authored
      
      
      KVM bases its memory usage limits on the total number of guest pages
      across all memslots. However, those limits, and the calculations to
      produce them, use 32 bit unsigned integers. This can result in overflow
      if a VM has more guest pages that can be represented by a u32. As a
      result of this overflow, KVM can use a low limit on the number of MMU
      pages it will allocate. This makes KVM unable to map all of guest memory
      at once, prompting spurious faults.
      
      Tested: Ran all kvm-unit-tests on an Intel Haswell machine. This patch
      	introduced no new failures.
      
      Signed-off-by: default avatarBen Gardon <bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      bc8a3d89
    • Paolo Bonzini's avatar
      KVM: nVMX: always use early vmcs check when EPT is disabled · 2b27924b
      Paolo Bonzini authored
      
      
      The remaining failures of vmx.flat when EPT is disabled are caused by
      incorrectly reflecting VMfails to the L1 hypervisor.  What happens is
      that nested_vmx_restore_host_state corrupts the guest CR3, reloading it
      with the host's shadow CR3 instead, because it blindly loads GUEST_CR3
      from the vmcs01.
      
      For simplicity let's just always use hardware VMCS checks when EPT is
      disabled.  This way, nested_vmx_restore_host_state is not reached at
      all (or at least shouldn't be reached).
      
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      2b27924b
    • Paolo Bonzini's avatar
      KVM: nVMX: allow tests to use bad virtual-APIC page address · 69090810
      Paolo Bonzini authored
      
      
      As mentioned in the comment, there are some special cases where we can simply
      clear the TPR shadow bit from the CPU-based execution controls in the vmcs02.
      Handle them so that we can remove some XFAILs from vmx.flat.
      
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      69090810
  2. Apr 15, 2019
    • Sean Christopherson's avatar
      KVM: x86/mmu: Fix an inverted list_empty() check when zapping sptes · cfd32acf
      Sean Christopherson authored
      A recently introduced helper for handling zap vs. remote flush
      incorrectly bails early, effectively leaking defunct shadow pages.
      Manifests as a slab BUG when exiting KVM due to the shadow pages
      being alive when their associated cache is destroyed.
      
      ==========================================================================
      BUG kvm_mmu_page_header: Objects remaining in kvm_mmu_page_header on ...
      --------------------------------------------------------------------------
      Disabling lock debugging due to kernel taint
      INFO: Slab 0x00000000fc436387 objects=26 used=23 fp=0x00000000d023caee ...
      CPU: 6 PID: 4315 Comm: rmmod Tainted: G    B             5.1.0-rc2+ #19
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      Call Trace:
       dump_stack+0x46/0x5b
       slab_err+0xad/0xd0
       ? on_each_cpu_mask+0x3c/0x50
       ? ksm_migrate_page+0x60/0x60
       ? on_each_cpu_cond_mask+0x7c/0xa0
       ? __kmalloc+0x1ca/0x1e0
       __kmem_cache_shutdown+0x13a/0x310
       shutdown_cache+0xf/0x130
       kmem_cache_destroy+0x1d5/0x200
       kvm_mmu_module_exit+0xa/0x30 [kvm]
       kvm_arch_exit+0x45/0x60 [kvm]
       kvm_exit+0x6f/0x80 [kvm]
       vmx_exit+0x1a/0x50 [kvm_intel]
       __x64_sys_delete_module+0x153/0x1f0
       ? exit_to_usermode_loop+0x88/0xc0
       do_syscall_64+0x4f/0x100
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: a2113634
      
       ("KVM: x86/mmu: Split remote_flush+zap case out of kvm_mmu_flush_or_zap()")
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      cfd32acf
  3. Apr 10, 2019
    • Brian Norris's avatar
      Bluetooth: btusb: request wake pin with NOAUTOEN · 771acc7e
      Brian Norris authored
      Badly-designed systems might have (for example) active-high wake pins
      that default to high (e.g., because of external pull ups) until they
      have an active firmware which starts driving it low.  This can cause an
      interrupt storm in the time between request_irq() and disable_irq().
      
      We don't support shared interrupts here, so let's just pre-configure the
      interrupt to avoid auto-enabling it.
      
      Fixes: fd913ef7 ("Bluetooth: btusb: Add out-of-band wakeup support")
      Fixes: 5364a0b4
      
       ("arm64: dts: rockchip: move QCA6174A wakeup pin into its USB node")
      Signed-off-by: default avatarBrian Norris <briannorris@chromium.org>
      Reviewed-by: default avatarMatthias Kaehlcke <mka@chromium.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      771acc7e
    • Linus Torvalds's avatar
      Merge tag 'mips_fixes_5.1_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 0ee7fb36
      Linus Torvalds authored
      Pull MIPS fixes from Paul Burton:
       "A few minor MIPS fixes:
      
         - Provide struct pt_regs * from get_irq_regs() to kgdb_nmicallback()
           when handling an IPI triggered by kgdb_roundup_cpus(), matching the
           behavior of other architectures & resolving kgdb issues for SMP
           systems.
      
         - Defer a pointer dereference until after a NULL check in the
           irq_shutdown callback for SGI IP27 HUB interrupts.
      
         - A defconfig update for the MSCC Ocelot to enable some necessary
           drivers"
      
      * tag 'mips_fixes_5.1_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        MIPS: generic: Add switchdev, pinctrl and fit to ocelot_defconfig
        MIPS: SGI-IP27: Fix use of unchecked pointer in shutdown_bridge_irq
        MIPS: KGDB: fix kgdb support for SMP platforms.
      0ee7fb36
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 972acfb4
      Linus Torvalds authored
      Pull misc fixes from Al Viro:
       "A few regression fixes from this cycle"
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        aio: use kmem_cache_free() instead of kfree()
        iov_iter: Fix build error without CONFIG_CRYPTO
        aio: Fix an error code in __io_submit_one()
      972acfb4
  4. Apr 09, 2019
  5. Apr 08, 2019
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · fd008d1a
      Linus Torvalds authored
      Pull crypto fix from Herbert Xu:
       "This fixes a bug in the implementation of xcbc and cmac in caam"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: caam - fix copy of next buffer for xcbc and cmac
      fd008d1a
    • Miaohe Lin's avatar
      net: vrf: Fix ping failed when vrf mtu is set to 0 · 5055376a
      Miaohe Lin authored
      When the mtu of a vrf device is set to 0, it would cause ping
      failed. So I think we should limit vrf mtu in a reasonable range
      to solve this problem. I set dev->min_mtu to IPV6_MIN_MTU, so it
      will works for both ipv4 and ipv6. And if dev->max_mtu still be 0
      can be confusing, so I set dev->max_mtu to ETH_MAX_MTU.
      
      Here is the reproduce step:
      
      1.Config vrf interface and set mtu to 0:
      3: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
      master vrf1 state UP mode DEFAULT group default qlen 1000
          link/ether 52:54:00:9e:dd:c1 brd ff:ff:ff:ff:ff:ff
      
      2.Ping peer:
      3: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
      master vrf1 state UP group default qlen 1000
          link/ether 52:54:00:9e:dd:c1 brd ff:ff:ff:ff:ff:ff
          inet 10.0.0.1/16 scope global enp4s0
             valid_lft forever preferred_lft forever
      connect: Network is unreachable
      
      3.Set mtu to default value, ping works:
      PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
      64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=1.88 ms
      
      Fixes: ad49bc63
      
       ("net: vrf: remove MTU limits for vrf device")
      Signed-off-by: default avatarMiaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5055376a
    • Qian Cai's avatar
      slab: fix a crash by reading /proc/slab_allocators · fcf88917
      Qian Cai authored
      The commit 510ded33 ("slab: implement slab_root_caches list")
      changes the name of the list node within "struct kmem_cache" from "list"
      to "root_caches_node", but leaks_show() still use the "list" which
      causes a crash when reading /proc/slab_allocators.
      
      You need to have CONFIG_SLAB=y and CONFIG_MEMCG=y to see the problem,
      because without MEMCG all slab caches are root caches, and the "list"
      node happens to be the right one.
      
      Fixes: 510ded33
      
       ("slab: implement slab_root_caches list")
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Reviewed-by: default avatarTobin C. Harding <tobin@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fcf88917
    • Nicolas Dichtel's avatar
      selftests: add a tc matchall test case · b959ecf8
      Nicolas Dichtel authored
      This is a follow up of the commit 0db6f8be
      
       ("net/sched: fix ->get
      helper of the matchall cls").
      
      To test it:
      $ cd tools/testing/selftests/tc-testing
      $ ln -s ../plugin-lib/nsPlugin.py plugins/20-nsPlugin.py
      $ ./tdc.py -n -e 2638
      
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b959ecf8