  1. May 06, 2020
    • KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly · 495907ec
      Peter Xu authored
      
      
      KVM_CAP_SET_GUEST_DEBUG should be supported on x86; however, it is not
      declared as supported.  My wild guess is that userspaces like QEMU are using
      "#ifdef KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that
      could be wrong because the compilation host may not be the runtime host.
      
      The userspace might still want to keep the old "#ifdef" though to not break the
      guest debug on old kernels.
      
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20200505154750.126300-1-peterx@redhat.com>
      [Do the same for PPC and s390. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      495907ec
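The difference between a compile-time "#ifdef" and a runtime query can be illustrated with a minimal sketch. It assumes a Linux host and uses the KVM_CHECK_EXTENSION ioctl on /dev/kvm; the numeric values below (0xAE03 for the ioctl, 23 for the capability) are taken from the UAPI header as I understand it and should be checked against the linux/kvm.h in use. The function name kvm_has_set_guest_debug is hypothetical.

```python
import fcntl
import os

# Assumed values from <linux/kvm.h>; verify against your own headers.
KVM_CHECK_EXTENSION = 0xAE03      # _IO(KVMIO, 0x03) with KVMIO == 0xAE
KVM_CAP_SET_GUEST_DEBUG = 23


def kvm_has_set_guest_debug(path="/dev/kvm"):
    """Ask the running kernel whether it declares KVM_CAP_SET_GUEST_DEBUG.

    Returns True or False when the kernel answers, and None when the
    device cannot be opened or queried (no KVM, no permission, ...).
    """
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        return None
    try:
        # The ioctl returns a positive value if the capability is declared.
        return fcntl.ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_SET_GUEST_DEBUG) > 0
    except OSError:
        return None
    finally:
        os.close(fd)
```

A caller that wants to keep working on kernels that predate this commit could treat a False answer as "unknown" and fall back to its old compile-time assumption, which is the compatibility concern the commit message raises.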
    • KVM: selftests: Fix build for evmcs.h · 8ffdaf91
      Peter Xu authored
      
      
      I got this error when building kvm selftests:
      
      /usr/bin/ld: /home/xz/git/linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:222: multiple definition of `current_evmcs'; /tmp/cco1G48P.o:/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:222: first defined here
      /usr/bin/ld: /home/xz/git/linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:223: multiple definition of `current_vp_assist'; /tmp/cco1G48P.o:/home/xz/git/linux/tools/testing/selftests/kvm/include/evmcs.h:223: first defined here
      
      I think it's because evmcs.h is included both in a test file and a lib file so
      the structs have multiple declarations when linking.  After all it's not a good
      habit to declare structs in the header files.
      
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20200504220607.99627-1-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8ffdaf91
    • kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits · 139f7425
      Paolo Bonzini authored
      Using CPUID data can be useful for the processor compatibility
      check, but that's it.  Using it to compute guest-reserved bits
      can have both false positives (such as LA57 and UMIP which we
      are already handling) and false negatives: in particular, with
      this patch we no longer allow a KVM guest to set CR4.PKE
      when CR4.PKE is clear on the host.
      
      Fixes: b9dd21e1 ("KVM: x86: simplify handling of PKRU")
      Reported-by: Jim Mattson <jmattson@google.com>
      Tested-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      139f7425
    • KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path · c7cb2d65
      Sean Christopherson authored
      Clear CF and ZF in the VM-Exit path after doing __FILL_RETURN_BUFFER so
      that KVM doesn't interpret clobbered RFLAGS as a VM-Fail.  Filling the
      RSB has always clobbered RFLAGS, its current incarnation just happens
      to clear CF and ZF in the process.  Relying on the macro to clear CF and
      ZF is extremely fragile, e.g. commit 089dd8e5 ("x86/speculation:
      Change FILL_RETURN_BUFFER to work with objtool") tweaks the loop such
      that the ZF flag is always set.
      
      Reported-by: Qian Cai <cai@lca.pw>
      Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: f2fde6a5 ("KVM: VMX: Move RSB stuffing to before the first RET after VM-Exit")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200506035355.2242-1-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c7cb2d65
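Why stale flags matter here can be shown with a small model. VMX instructions report failure through RFLAGS.CF (VMfailInvalid) or RFLAGS.ZF (VMfailValid), so any code that tests those flags after VM-Exit will misread leftovers from the RSB-filling loop. The sketch below is plain Python, not kernel code; the function names are hypothetical, and only the bit positions (CF is bit 0, ZF is bit 6) come from the x86 architecture.

```python
X86_EFLAGS_CF = 1 << 0   # carry flag, set on VMfailInvalid
X86_EFLAGS_ZF = 1 << 6   # zero flag, set on VMfailValid


def looks_like_vm_fail(rflags):
    """Model of a 'CF or ZF set' test, as a jbe-style branch would perform."""
    return bool(rflags & (X86_EFLAGS_CF | X86_EFLAGS_ZF))


def clear_cf_zf(rflags):
    """Model of the fix: explicitly clear both flags after filling the RSB."""
    return rflags & ~(X86_EFLAGS_CF | X86_EFLAGS_ZF)


# A loop that ends with ZF set (as described for commit 089dd8e5) leaves
# RFLAGS in a state that is indistinguishable from a VM-Fail.
clobbered = 0x2 | X86_EFLAGS_ZF
```

In this model, looks_like_vm_fail(clobbered) is true even though the VM entry succeeded, while the same test on clear_cf_zf(clobbered) is false.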
    • docs/virt/kvm: Document configuring and running nested guests · 27abe577
      Kashyap Chamarthy authored
      
      
      This is a rewrite of this[1] Wiki page with further enhancements.  The
      doc also includes a section on debugging problems in nested
      environments, among other improvements.
      
      [1] https://www.linux-kvm.org/page/Nested_Guests
      
      Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
      Message-Id: <20200505112839.30534-1-kchamart@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      27abe577
  2. May 05, 2020
    • kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts · 8be8f932
      Paolo Bonzini authored
      Commit f458d039 ("kvm: ioapic: Lazy update IOAPIC EOI") introduces
      the following infinite loop:
      
      BUG: stack guard page was hit at 000000008f595917 \
      (stack is 00000000bdefe5a4..00000000ae2b06f5)
      kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI
      RIP: 0010:kvm_set_irq+0x51/0x160 [kvm]
      Call Trace:
       irqfd_resampler_ack+0x32/0x90 [kvm]
       kvm_notify_acked_irq+0x62/0xd0 [kvm]
       kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
       ioapic_set_irq+0x20e/0x240 [kvm]
       kvm_ioapic_set_irq+0x5c/0x80 [kvm]
       kvm_set_irq+0xbb/0x160 [kvm]
       ? kvm_hv_set_sint+0x20/0x20 [kvm]
       irqfd_resampler_ack+0x32/0x90 [kvm]
       kvm_notify_acked_irq+0x62/0xd0 [kvm]
       kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
       ioapic_set_irq+0x20e/0x240 [kvm]
       kvm_ioapic_set_irq+0x5c/0x80 [kvm]
       kvm_set_irq+0xbb/0x160 [kvm]
       ? kvm_hv_set_sint+0x20/0x20 [kvm]
      ....
      
      The re-entrancy happens because the irq state is the OR of
      the interrupt state and the resamplefd state.  That is, we don't
      want to show the state as 0 until we've had a chance to set the
      resamplefd.  But if the interrupt has _not_ gone low then
      ioapic_set_irq is invoked again, causing an infinite loop.
      
      This can only happen for a level-triggered interrupt, otherwise
      irqfd_inject would immediately set the KVM_USERSPACE_IRQ_SOURCE_ID high
      and then low.  Fortunately, in the case of level-triggered interrupts
      the VMEXIT already happens because TMR is set.  Thus, fix the bug by
      restricting the lazy invocation of the ack notifier to edge-triggered
      interrupts, the only ones that need it.
      
      Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Reported-by: <borisvk@bstnet.org>
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://www.spinics.net/lists/kvm/msg213512.html
      Fixes: f458d039 ("kvm: ioapic: Lazy update IOAPIC EOI")
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207489
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8be8f932
    • KVM: x86: Fixes posted interrupt check for IRQs delivery modes · 637543a8
      Suravee Suthikulpanit authored
      The current logic incorrectly uses the enum ioapic_irq_destination_types
      to check the posted interrupt destination types. However, the value was
      set using APIC_DM_XXX macros, which are left-shifted by 8 bits.
      
      Fix this by using APIC_DM_FIXED and APIC_DM_LOWEST instead.
      
      Fixes: fdcf7562 ('KVM: x86: Disable posted interrupts for non-standard IRQs delivery modes')
      Cc: Alexander Graf <graf@amazon.com>
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Message-Id: <1586239989-58305-1-git-send-email-suravee.suthikulpanit@amd.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      637543a8
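The mismatch described in this commit can be reproduced with a few constants. The sketch below is a Python model rather than the kernel code; the enum and macro values mirror what I believe the x86 headers define (dest_Fixed = 0, dest_LowestPrio = 1, APIC_DM_FIXED = 0x000, APIC_DM_LOWEST = 0x100) and should be treated as assumptions, and both function names are hypothetical.

```python
# Assumed values of enum ioapic_irq_destination_types (unshifted).
DEST_FIXED = 0
DEST_LOWEST_PRIO = 1

# Assumed values of the APIC_DM_XXX macros (delivery mode shifted left by 8).
APIC_DM_FIXED = 0x000
APIC_DM_LOWEST = 0x100


def postable_old(delivery_mode):
    """Pre-fix check: compares a shifted value against unshifted enums."""
    return delivery_mode in (DEST_FIXED, DEST_LOWEST_PRIO)


def postable_new(delivery_mode):
    """Post-fix check: compares against the same encoding that was stored."""
    return delivery_mode in (APIC_DM_FIXED, APIC_DM_LOWEST)
```

With these values the old check only works for fixed delivery by coincidence, because both encodings of "fixed" are zero; a lowest-priority interrupt stored as APIC_DM_LOWEST is rejected by postable_old and accepted by postable_new.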
    • Merge tag 'kvmarm-fixes-5.7-2' of... · 7134fa07
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
      
      KVM/arm fixes for Linux 5.7, take #2
      
      - Fix compilation with Clang
      - Correctly initialize GICv4.1 in the absence of a virtual ITS
      - Move SP_EL0 save/restore to the guest entry/exit code
      - Handle PC wrap around on 32bit guests, and narrow all 32bit
        registers on userspace access
      7134fa07
    • Merge tag 'kvmarm-fixes-5.7-1' of... · 9e5e19f5
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
      
      KVM/arm fixes for Linux 5.7, take #1
      
      - Prevent the userspace API from interacting directly with the HW
        stage of the virtual GIC
      - Fix a couple of vGIC memory leaks
      - Tighten the rules around the use of the 32bit PSCI functions
        for 64bit guest, as well as the opposite situation (matches the
        specification)
      9e5e19f5
  3. May 04, 2020
  4. May 01, 2020
    • KVM: arm64: Fix 32bit PC wrap-around · 0225fd5e
      Marc Zyngier authored
      
      
      In the unlikely event that a 32bit vcpu traps into the hypervisor
      on an instruction that is located right at the end of the 32bit
      range, the emulation of that instruction is going to increment
      PC past the 32bit range. This isn't great, as userspace can then
      observe this value and get a bit confused.
      
      Conversely, userspace can do things like (in the context of a 64bit
      guest that is capable of 32bit EL0) setting PSTATE to AArch64-EL0,
      setting PC to a 64bit value, changing PSTATE to AArch32-USR, and
      observing that PC hasn't been truncated. More confusion.
      
      Fix both by:
      - truncating PC increments for 32bit guests
      - sanitizing all 32bit regs every time a core reg is changed by
        userspace and PSTATE indicates a 32bit mode.
      
      Cc: stable@vger.kernel.org
      Acked-by: Will Deacon <will@kernel.org>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      0225fd5e
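The two fixes listed in this commit both come down to masking values to 32 bits. The sketch below models them in Python under simplifying assumptions: the function names are hypothetical, a fixed 4-byte instruction length is used as the default, and the real kernel code works on the vcpu register file rather than on bare integers.

```python
MASK32 = 0xFFFF_FFFF


def skip_instruction(pc, is_32bit, instr_len=4):
    """Advance PC past an emulated instruction, truncating for 32bit guests."""
    next_pc = pc + instr_len
    if is_32bit:
        next_pc &= MASK32
    return next_pc


def sanitize_reg(value, pstate_is_32bit):
    """Narrow a register written by userspace when PSTATE selects a 32bit mode."""
    return value & MASK32 if pstate_is_32bit else value
```

For an instruction at 0xFFFFFFFC, skip_instruction wraps to 0 for a 32bit guest instead of producing 0x100000000, and sanitize_reg drops the upper half of a 64bit value once PSTATE indicates a 32bit mode.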
  5. Apr 30, 2020
  6. Apr 23, 2020
    • KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi() · 57bdb436
      Zenghui Yu authored
      
      
      If we're going to fail out of vgic_add_lpi(), let's make sure the
      allocated vgic_irq memory is also freed. Though it seems that both
      cases are unlikely to fail.
      
      Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200414030349.625-3-yuzenghui@huawei.com
      57bdb436
    • KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy · 969ce8b5
      Zenghui Yu authored
      
      
      It's likely that the vcpu fails to handle all virtual interrupts if
      userspace decides to destroy it, leaving the pending ones in the
      ap_list. If the unhandled one is an LPI, its vgic_irq structure will
      eventually be leaked because of an extra refcount increment in
      vgic_queue_irq_unlock().
      
      This was detected by kmemleak on almost every guest destroy; the
      backtrace is as follows:
      
      unreferenced object 0xffff80725aed5500 (size 128):
      comm "CPU 5/KVM", pid 40711, jiffies 4298024754 (age 166366.512s)
      hex dump (first 32 bytes):
      00 00 00 00 00 00 00 00 08 01 a9 73 6d 80 ff ff ...........sm...
      c8 61 ee a9 00 20 ff ff 28 1e 55 81 6c 80 ff ff .a... ..(.U.l...
      backtrace:
      [<000000004bcaa122>] kmem_cache_alloc_trace+0x2dc/0x418
      [<0000000069c7dabb>] vgic_add_lpi+0x88/0x418
      [<00000000bfefd5c5>] vgic_its_cmd_handle_mapi+0x4dc/0x588
      [<00000000cf993975>] vgic_its_process_commands.part.5+0x484/0x1198
      [<000000004bd3f8e3>] vgic_its_process_commands+0x50/0x80
      [<00000000b9a65b2b>] vgic_mmio_write_its_cwriter+0xac/0x108
      [<0000000009641ebb>] dispatch_mmio_write+0xd0/0x188
      [<000000008f79d288>] __kvm_io_bus_write+0x134/0x240
      [<00000000882f39ac>] kvm_io_bus_write+0xe0/0x150
      [<0000000078197602>] io_mem_abort+0x484/0x7b8
      [<0000000060954e3c>] kvm_handle_guest_abort+0x4cc/0xa58
      [<00000000e0d0cd65>] handle_exit+0x24c/0x770
      [<00000000b44a7fad>] kvm_arch_vcpu_ioctl_run+0x460/0x1988
      [<0000000025fb897c>] kvm_vcpu_ioctl+0x4f8/0xee0
      [<000000003271e317>] do_vfs_ioctl+0x160/0xcd8
      [<00000000e7f39607>] ksys_ioctl+0x98/0xd8
      
      Fix it by retiring all pending LPIs in the ap_list on the destroy path.
      
      p.s. I can also reproduce it on a normal guest shutdown. It is because
      userspace still sends LPIs to the vcpu (through the KVM_SIGNAL_MSI ioctl)
      while the guest is being shut down and unable to handle them. A little
      strange though, and I haven't dug further...
      
      Reviewed-by: James Morse <james.morse@arm.com>
      Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
      [maz: moved the distributor deallocation down to avoid an UAF splat]
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200414030349.625-2-yuzenghui@huawei.com
      969ce8b5
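The leak described above is a reference-counting imbalance, which a toy model makes easier to follow. Everything in this sketch is hypothetical and simplified: the class and function names do not exist in the kernel, the LPI starts with a single reference held by its mapping, and queueing onto the ap_list takes one extra reference, as the commit message says vgic_queue_irq_unlock() does.

```python
from dataclasses import dataclass, field


@dataclass
class Irq:
    intid: int
    refcount: int = 1          # reference held by the LPI mapping (assumed)


@dataclass
class Vcpu:
    ap_list: list = field(default_factory=list)


def queue_irq(vcpu, irq):
    """Queue a pending interrupt; the ap_list takes its own reference."""
    irq.refcount += 1
    vcpu.ap_list.append(irq)


def destroy_vcpu(vcpu, retire_pending):
    """Tear down the vcpu, optionally dropping the ap_list references."""
    if retire_pending:
        for irq in vcpu.ap_list:
            irq.refcount -= 1
    vcpu.ap_list.clear()


def leaked_after_teardown(retire_pending):
    """Return True if the LPI still has references once everything is gone."""
    irq = Irq(intid=8192)      # LPI INTIDs start at 8192
    vcpu = Vcpu()
    queue_irq(vcpu, irq)
    destroy_vcpu(vcpu, retire_pending)
    irq.refcount -= 1          # the mapping drops its reference
    return irq.refcount > 0
```

Without retiring the pending entries the count never reaches zero, so the structure would never be freed; with retire_pending=True the count drops to zero as expected.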
    • KVM: arm: vgic-v2: Only use the virtual state when userspace accesses pending bits · ba1ed9e1
      Marc Zyngier authored
      There is no point in accessing the HW when writing to any of the
      ISPENDR/ICPENDR registers from userspace, as only the guest should
      be allowed to change the HW state.
      
      Introduce new userspace-specific accessors that deal solely with
      the virtual state. Note that the API differs from that of GICv3,
      where userspace exclusively uses ISPENDR to set the state. Too
      bad we can't reuse it.
      
      Fixes: 82e40f55 ("KVM: arm/arm64: vgic-v2: Handle SGI bits in GICD_I{S,C}PENDR0 as WI")
      Reviewed-by: James Morse <james.morse@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      ba1ed9e1
    • KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits · 41ee52ec
      Marc Zyngier authored
      
      
      There is no point in accessing the HW when writing to any of the
      ISENABLER/ICENABLER registers from userspace, as only the guest
      should be allowed to change the HW state.
      
      Introduce new userspace-specific accessors that deal solely with
      the virtual state.
      
      Reported-by: James Morse <james.morse@arm.com>
      Tested-by: James Morse <james.morse@arm.com>
      Reviewed-by: James Morse <james.morse@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      41ee52ec
    • KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read · 9a50ebbf
      Marc Zyngier authored
      
      
      When a guest tries to read the active state of its interrupts,
      we currently just return whatever state we have in memory. This
      means that if such an interrupt lives in a List Register on another
      CPU, we fail to observe the latest active state for this interrupt.
      
      In order to remedy this, stop all the other vcpus so that they exit
      and we can observe the most recent value for the state. This is
      similar to what we are doing for the write side of the same
      registers, and results in new MMIO handlers for userspace (which
      do not need to stop the guest, as it is supposed to be stopped
      already).
      
      Reported-by: Julien Grall <julien@xen.org>
      Reviewed-by: Andre Przywara <andre.przywara@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      9a50ebbf
  7. Apr 21, 2020
    • Merge tag 'kvm-ppc-fixes-5.7-1' of... · 00a6a5ef
      Paolo Bonzini authored
      Merge tag 'kvm-ppc-fixes-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master
      
      PPC KVM fix for 5.7
      
      - Fix a regression introduced in the last merge window, which results
        in guests in HPT mode dying randomly.
      00a6a5ef
    • Merge tag 'kvm-s390-master-5.7-2' of... · 3bda0386
      Paolo Bonzini authored
      Merge tag 'kvm-s390-master-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master
      
      KVM: s390: Fix for 5.7 and maintainer update
      
      - Silence false positive lockdep warning
      - add Claudio as reviewer
      3bda0386
    • KVM: PPC: Book3S HV: Handle non-present PTEs in page fault functions · ae49deda
      Paul Mackerras authored
      Since cd758a9b "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT
      page fault handler", it's been possible in fairly rare circumstances to
      load a non-present PTE in kvmppc_book3s_hv_page_fault() when running a
      guest on a POWER8 host.
      
      Because that case wasn't checked for, we could misinterpret the non-present
      PTE as being a cache-inhibited PTE.  That could mismatch with the
      corresponding hash PTE, which would cause the function to fail with -EFAULT
      a little further down.  That would propagate up to the KVM_RUN ioctl()
      generally causing the KVM userspace (usually qemu) to fall over.
      
      This addresses the problem by catching that case and returning to the guest
      instead.
      
      For completeness, this fixes the radix page fault handler in the same
      way.  For radix this didn't cause any obvious misbehaviour, because we
      ended up putting the non-present PTE into the guest's partition-scoped
      page tables, leading immediately to another hypervisor data/instruction
      storage interrupt, which would go through the page fault path again
      and fix things up.
      
      Fixes: cd758a9b "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler"
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1820402
      Reported-by: David Gibson <david@gibson.dropbear.id.au>
      Tested-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      ae49deda
    • kvm: Disable objtool frame pointer checking for vmenter.S · 7f4b5cde
      Josh Poimboeuf authored
      
      
      Frame pointers are completely broken by vmenter.S because it clobbers
      RBP:
      
        arch/x86/kvm/svm/vmenter.o: warning: objtool: __svm_vcpu_run()+0xe4: BP used as a scratch register
      
      That's unavoidable, so just skip checking that file when frame pointers
      are configured in.
      
      On the other hand, ORC can handle that code just fine, so leave objtool
      enabled in the !FRAME_POINTER case.
      
      Reported-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Message-Id: <01fae42917bacad18be8d2cbc771353da6603473.1587398610.git.jpoimboe@redhat.com>
      Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
      Fixes: 199cd1d7 ("KVM: SVM: Split svm_vcpu_run inline assembly to separate file")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7f4b5cde
  8. Apr 20, 2020
    • MAINTAINERS: add a reviewer for KVM/s390 · 2a173ec9
      Claudio Imbrenda authored
      
      
      Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Cornelia Huck <cohuck@redhat.com>
      Acked-by: Janosch Frank <frankja@linux.ibm.com>
      Link: https://lore.kernel.org/r/20200417152936.772256-1-imbrenda@linux.ibm.com
      2a173ec9
    • KVM: s390: Fix PV check in deliverable_irqs() · d47c4c45
      Eric Farman authored
      The diag 0x44 handler, which handles a directed yield, goes into a
      codepath that does a kvm_for_each_vcpu() and ultimately
      deliverable_irqs().  The new check for kvm_s390_pv_cpu_is_protected()
      contains an assertion that the vcpu->mutex is held, which isn't going
      to be the case in this scenario.
      
      The result is a plethora of these messages if the lock debugging
      is enabled, and thus an implication that we have a problem.
      
        WARNING: CPU: 9 PID: 16167 at arch/s390/kvm/kvm-s390.h:239 deliverable_irqs+0x1c6/0x1d0 [kvm]
        ...snip...
        Call Trace:
         [<000003ff80429bf2>] deliverable_irqs+0x1ca/0x1d0 [kvm]
        ([<000003ff80429b34>] deliverable_irqs+0x10c/0x1d0 [kvm])
         [<000003ff8042ba82>] kvm_s390_vcpu_has_irq+0x2a/0xa8 [kvm]
         [<000003ff804101e2>] kvm_arch_dy_runnable+0x22/0x38 [kvm]
         [<000003ff80410284>] kvm_vcpu_on_spin+0x8c/0x1d0 [kvm]
         [<000003ff80436888>] kvm_s390_handle_diag+0x3b0/0x768 [kvm]
         [<000003ff80425af4>] kvm_handle_sie_intercept+0x1cc/0xcd0 [kvm]
         [<000003ff80422bb0>] __vcpu_run+0x7b8/0xfd0 [kvm]
         [<000003ff80423de6>] kvm_arch_vcpu_ioctl_run+0xee/0x3e0 [kvm]
         [<000003ff8040ccd8>] kvm_vcpu_ioctl+0x2c8/0x8d0 [kvm]
         [<00000001504ced06>] ksys_ioctl+0xae/0xe8
         [<00000001504cedaa>] __s390x_sys_ioctl+0x2a/0x38
         [<0000000150cb9034>] system_call+0xd8/0x2d8
        2 locks held by CPU 2/KVM/16167:
         #0: 00000001951980c0 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x90/0x8d0 [kvm]
         #1: 000000019599c0f0 (&kvm->srcu){....}, at: __vcpu_run+0x4bc/0xfd0 [kvm]
        Last Breaking-Event-Address:
         [<000003ff80429b34>] deliverable_irqs+0x10c/0x1d0 [kvm]
        irq event stamp: 11967
        hardirqs last  enabled at (11975): [<00000001502992f2>] console_unlock+0x4ca/0x650
        hardirqs last disabled at (11982): [<0000000150298ee8>] console_unlock+0xc0/0x650
        softirqs last  enabled at (7940): [<0000000150cba6ca>] __do_softirq+0x422/0x4d8
        softirqs last disabled at (7929): [<00000001501cd688>] do_softirq_own_stack+0x70/0x80
      
      Considering what's being done here, let's fix this by removing the
      mutex assertion rather than acquiring the mutex for every other vcpu.
      
      Fixes: 201ae986 ("KVM: s390: protvirt: Implement interrupt injection")
      Signed-off-by: Eric Farman <farman@linux.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Link: https://lore.kernel.org/r/20200415190353.63625-1-farman@linux.ibm.com
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      d47c4c45
    • Linux 5.7-rc2 · ae83d0b4
      Linus Torvalds authored
      ae83d0b4
    • mm: Fix MREMAP_DONTUNMAP accounting on VMA merge · dadbd85f
      Brian Geffon authored
      When remapping a mapping where a portion of a VMA is remapped
      into another portion of the VMA it can cause the VMA to become
      split. During the copy_vma operation the VMA can actually
      be remerged if it's an anonymous VMA whose pages have not yet
      been faulted. This isn't normally a problem because at the end
      of the remap the original portion is unmapped causing it to
      become split again.
      
      However, MREMAP_DONTUNMAP leaves that original portion in place which
      means that the VMA which was split and then remerged is not actually
      split at the end of the mremap. This patch fixes a bug where
      we don't detect that the VMAs got remerged and we end up
      putting back VM_ACCOUNT on the next mapping which is completely
      unrelated. When that next mapping is unmapped it results in
      incorrectly unaccounting for the memory which was never accounted,
      and eventually we will underflow on the memory commitment.
      
      There is also another, similar issue: we're currently
      accounting for the number of pages in the new_vma but that's wrong.
      We need to account for the length of the remap operation as that's
      all that is being added. If there was a mapping already at that
      location its commitment would have been adjusted as part of
      the munmap at the start of the mremap.
      
      A really simple repro can be seen in:
      https://gist.github.com/bgaff/e101ce99da7d9a8c60acc641d07f312c
      
      Fixes: e346b381 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Brian Geffon <bgeffon@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dadbd85f
    • Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 86cc3398
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "Two build fixes for a couple clk drivers and a fix for the Unisoc
        serial clk where we want to keep it on for earlycon"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: sprd: don't gate uart console clock
        clk: mmp2: fix link error without mmp2
        clk: asm9260: fix __clk_hw_register_fixed_rate_with_accuracy typo
      86cc3398
    • Merge tag 'x86-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0fe5f9ca
      Linus Torvalds authored
      Pull x86 and objtool fixes from Thomas Gleixner:
       "A set of fixes for x86 and objtool:
      
        objtool:
      
         - Ignore the double UD2 which is emitted in BUG() when
           CONFIG_UBSAN_TRAP is enabled.
      
         - Support clang non-section symbols in objtool ORC dump
      
         - Fix switch table detection in .text.unlikely
      
         - Make the BP scratch register warning more robust.
      
        x86:
      
         - Increase microcode maximum patch size for AMD to cope with new CPUs
           which have a larger patch size.
      
         - Fix a crash in the resource control filesystem when the removal of
           the default resource group is attempted.
      
         - Preserve Code and Data Prioritization enabled state across CPU
           hotplug.
      
         - Update split lock cpu matching to use the new X86_MATCH macros.
      
         - Change the split lock enumeration as Intel finally decided that the
           IA32_CORE_CAPABILITIES bits are not architectural contrary to what
           the SDM claims. !@#%$^!
      
         - Add Tremont CPU models to the split lock detection cpu match.
      
         - Add a missing static attribute to make sparse happy"
      
      * tag 'x86-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/split_lock: Add Tremont family CPU models
        x86/split_lock: Bits in IA32_CORE_CAPABILITIES are not architectural
        x86/resctrl: Preserve CDP enable over CPU hotplug
        x86/resctrl: Fix invalid attempt at removing the default resource group
        x86/split_lock: Update to use X86_MATCH_INTEL_FAM6_MODEL()
        x86/umip: Make umip_insns static
        x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE
        objtool: Make BP scratch register warning more robust
        objtool: Fix switch table detection in .text.unlikely
        objtool: Support Clang non-section symbols in ORC generation
        objtool: Support Clang non-section symbols in ORC dump
        objtool: Fix CONFIG_UBSAN_TRAP unreachable warnings
      0fe5f9ca
    • Merge tag 'timers-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3e0dea57
      Linus Torvalds authored
      Pull time namespace fix from Thomas Gleixner:
       "An update for the proc interface of time namespaces: Use symbolic
        names instead of clockid numbers. The usability nuisance of numbers
        was noticed by Michael when polishing the man page"
      
      * tag 'timers-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        proc, time/namespace: Show clock symbolic names in /proc/pid/timens_offsets
      3e0dea57
    • Merge tag 'perf-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b7374586
      Linus Torvalds authored
      Pull perf tooling fixes and updates from Thomas Gleixner:
      
       - Fix the header line of perf stat output for '--metric-only --per-socket'
      
       - Fix the python build with clang
      
       - The usual tools UAPI header synchronization
      
      * tag 'perf-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        tools headers: Synchronize linux/bits.h with the kernel sources
        tools headers: Adopt verbatim copy of compiletime_assert() from kernel sources
        tools headers: Update x86's syscall_64.tbl with the kernel sources
        tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
        tools headers UAPI: Update tools's copy of drm.h headers
        tools headers kvm: Sync linux/kvm.h with the kernel sources
        tools headers UAPI: Sync linux/fscrypt.h with the kernel sources
        tools include UAPI: Sync linux/vhost.h with the kernel sources
        tools arch x86: Sync asm/cpufeatures.h with the kernel sources
        tools headers UAPI: Sync linux/mman.h with the kernel
        tools headers UAPI: Sync sched.h with the kernel
        tools headers: Update linux/vdso.h and grab a copy of vdso/const.h
        perf stat: Fix no metric header if --per-socket and --metric-only set
        perf python: Check if clang supports -fno-semantic-interposition
        tools arch x86: Sync the msr-index.h copy with the kernel sources
      b7374586
    • Merge tag 'irq-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 80ade29e
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "A set of fixes/updates for the interrupt subsystem:
      
         - Remove setup_irq() and remove_irq(). All users have been converted
           so remove them before new users surface.
      
         - A set of bugfixes for various interrupt chip drivers
      
         - Add a few missing static attributes to address sparse warnings"
      
      * tag 'irq-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/irq-bcm7038-l1: Make bcm7038_l1_of_init() static
        irqchip/irq-mvebu-icu: Make legacy_bindings static
        irqchip/meson-gpio: Fix HARDIRQ-safe -> HARDIRQ-unsafe lock order
        irqchip/sifive-plic: Fix maximum priority threshold value
        irqchip/ti-sci-inta: Fix processing of masked irqs
        irqchip/mbigen: Free msi_desc on device teardown
        irqchip/gic-v4.1: Update effective affinity of virtual SGIs
        irqchip/gic-v4.1: Add support for VPENDBASER's Dirty+Valid signaling
        genirq: Remove setup_irq() and remove_irq()
      80ade29e
    • Merge tag 'sched-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 08dd3872
      Linus Torvalds authored
      Pull scheduler fixes from Thomas Gleixner:
       "Two fixes for the scheduler:
      
         - Work around an uninitialized variable warning where GCC can't
           figure it out.
      
         - Allow 'isolcpus=' to skip unknown subparameters so that older
           kernels work with the commandline of a newer kernel. Improve the
           error output while at it"
      
      * tag 'sched-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/vtime: Work around an unitialized variable warning
        sched/isolation: Allow "isolcpus=" to skip unknown sub-parameters
      08dd3872
    • Merge tag 'core-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5e7de581
      Linus Torvalds authored
      Pull RCU fix from Thomas Gleixner:
       "A single bugfix for RCU to prevent taking a lock in NMI context"
      
      * tag 'core-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        rcu: Don't acquire lock in NMI handler in rcu_nmi_enter_common()
      5e7de581
    • Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 439f1da9
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "Miscellaneous bug fixes and cleanups for ext4, including a fix for
        generic/388 in data=journal mode, removing some BUG_ON's, and cleaning
        up some compiler warnings"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: convert BUG_ON's to WARN_ON's in mballoc.c
        ext4: increase wait time needed before reuse of deleted inode numbers
        ext4: remove set but not used variable 'es' in ext4_jbd2.c
        ext4: remove set but not used variable 'es'
        ext4: do not zeroout extents beyond i_disksize
        ext4: fix return-value types in several function comments
        ext4: use non-movable memory for superblock readahead
        ext4: use matching invalidatepage in ext4_writepage
      439f1da9
    • Merge tag '5.7-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · aee0314b
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Three small smb3 fixes: two debug related (helping network tracing for
        SMB2 mounts, and the other removing an unintended debug line on
        signing failures), and one fixing a performance problem with 64K
        pages"
      
      * tag '5.7-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: remove overly noisy debug line in signing errors
        cifs: improve read performance for page size 64KB & cache=strict & vers=2.1+
        cifs: dump the session id and keys also for SMB2 sessions
      aee0314b
    • Merge tag 'flexible-array-member-5.7-rc2' of... · 13402837
      Linus Torvalds authored
      Merge tag 'flexible-array-member-5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
      
      Pull flexible-array member conversion from Gustavo Silva:
       "The current codebase makes use of the zero-length array language
        extension to the C90 standard, but the preferred mechanism to declare
        variable-length types such as these ones is a flexible array
        member[1][2], introduced in C99:
      
          struct foo {
              int stuff;
              struct boo array[];
          };
      
        By making use of the mechanism above, we will get a compiler warning
        in case the flexible array does not occur last in the structure, which
        will help us prevent some kind of undefined behavior bugs from being
        inadvertently introduced[3] to the codebase from now on.
      
        Also, notice that dynamic memory allocations won't be affected by
        this change:
      
         "Flexible array members have incomplete type, and so the sizeof
          operator may not be applied. As a quirk of the original
          implementation of zero-length arrays, sizeof evaluates to zero."[1]
      
        sizeof(flexible-array-member) triggers a warning because flexible
        array members have incomplete type[1]. There are some instances of
        code in which the sizeof operator is being erroneously applied to
        zero-length arrays and the result is zero. Such instances may be
        hiding some bugs. So, this work (flexible-array member conversions)
        will also help to get rid of those sorts of issues completely.
      
        Notice that all of these patches have been baking in linux-next for
        quite a while now, and 238 more of these patches have already been
        merged into 5.7-rc1.
      
        There are a couple hundred more of these issues waiting to be
        addressed in the whole codebase"
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
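
The claim above that dynamic allocations are unaffected can be illustrated with a small userspace sketch. It is not taken from any of the patches in this pull: the contents of `struct boo` and the helpers `foo_size()` and `foo_alloc()` are hypothetical names introduced here. The point it shows is that the allocation size is computed from the enclosing struct plus the element size times the count, so `sizeof` is never applied to the flexible array member itself.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element type, standing in for "struct boo" in the text. */
struct boo {
    int value;
};

struct foo {
    int stuff;           /* used here to record the element count */
    struct boo array[];  /* flexible array member: must be the last member */
};

/*
 * Bytes needed for a struct foo followed by n elements.
 * sizeof(struct foo) does not include any array elements.
 * Assumption: n is small enough that the arithmetic cannot overflow;
 * this sketch does not check for it.
 */
static size_t foo_size(size_t n)
{
    return sizeof(struct foo) + n * sizeof(struct boo);
}

/* Allocate a struct foo with n trailing elements, or return NULL. */
static struct foo *foo_alloc(size_t n)
{
    struct foo *f = malloc(foo_size(n));

    if (!f)
        return NULL;

    f->stuff = (int)n;
    for (size_t i = 0; i < n; i++)
        f->array[i].value = (int)i;

    return f;
}
```

A caller would use `foo_alloc(4)` and later release the object with `free()`. In kernel code the same size computation is usually written with the `struct_size()` helper from include/linux/overflow.h, which also guards against arithmetic overflow; the plain expression is used here only to keep the sketch buildable in userspace.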
      
      * tag 'flexible-array-member-5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (28 commits)
        xattr.h: Replace zero-length array with flexible-array member
        uapi: linux: fiemap.h: Replace zero-length array with flexible-array member
        uapi: linux: dlm_device.h: Replace zero-length array with flexible-array member
        tpm_eventlog.h: Replace zero-length array with flexible-array member
        ti_wilink_st.h: Replace zero-length array with flexible-array member
        swap.h: Replace zero-length array with flexible-array member
        skbuff.h: Replace zero-length array with flexible-array member
        sched: topology.h: Replace zero-length array with flexible-array member
        rslib.h: Replace zero-length array with flexible-array member
        rio.h: Replace zero-length array with flexible-array member
        posix_acl.h: Replace zero-length array with flexible-array member
        platform_data: wilco-ec.h: Replace zero-length array with flexible-array member
        memcontrol.h: Replace zero-length array with flexible-array member
        list_lru.h: Replace zero-length array with flexible-array member
        lib: cpu_rmap: Replace zero-length array with flexible-array member
        irq.h: Replace zero-length array with flexible-array member
        ihex.h: Replace zero-length array with flexible-array member
        igmp.h: Replace zero-length array with flexible-array member
        genalloc.h: Replace zero-length array with flexible-array member
        ethtool.h: Replace zero-length array with flexible-array member
        ...
      13402837