  1. Mar 06, 2019
    • selftests: add tests for pidfd_send_signal() · 575a0ae9
      Christian Brauner authored
      As suggested by Andrew Morton in [1] add selftests for the new
      sys_pidfd_send_signal() syscall:
      
      /* test_pidfd_send_signal_syscall_support */
      Test whether the pidfd_send_signal() syscall is supported and the tests can
      be run or need to be skipped.
      
      /* test_pidfd_send_signal_simple_success */
      Test whether sending a signal via a pidfd works.
      
      /* test_pidfd_send_signal_exited_fail */
      Verify that sending a signal to an already exited process fails with ESRCH.
      
      /* test_pidfd_send_signal_recycled_pid_fail */
      Verify that a recycled pid cannot be signaled via a pidfd referring to an
      already exited process that had the same pid (cf. [2], [3]).
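      The exited-process case above could be sketched roughly as follows. This
      is not the selftest code from the patch; the helper name
      check_exited_esrch(), its 1/0/-1 return convention, and the fallback
      definition of the syscall number (taken from the commit message) are
      assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: fall back to the number named in the commit message. */
#ifndef __NR_pidfd_send_signal
#define __NR_pidfd_send_signal 424
#endif

static int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
                                 unsigned int flags)
{
        assert(pidfd >= 0); /* this sketch only passes open descriptors */
        return (int)syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
}

/* Hypothetical helper: 1 = got ESRCH as expected, 0 = skip, -1 = failure. */
int check_exited_esrch(void)
{
        char path[64];
        pid_t pid;
        int fd, ret, err;

        pid = fork();
        if (pid < 0)
                return 0;
        if (pid == 0)
                _exit(0); /* child exits immediately */

        /* Take the handle while the child is still alive or a zombie. */
        snprintf(path, sizeof(path), "/proc/%d", (int)pid);
        fd = open(path, O_DIRECTORY | O_CLOEXEC);

        /* Reap the child so that the process is really gone. */
        if (waitpid(pid, NULL, 0) != pid || fd < 0) {
                if (fd >= 0)
                        close(fd);
                return 0; /* no procfs or child auto-reaped: skip */
        }

        ret = sys_pidfd_send_signal(fd, 0, NULL, 0);
        err = errno;
        close(fd);

        if (ret == 0)
                return -1; /* signaling a reaped process must not succeed */
        if (err == ESRCH)
                return 1;
        return (err == ENOSYS || err == EPERM) ? 0 : -1;
}
```

      A return value of 0 mirrors the "skip" outcome of the support test: the
      kernel or sandbox does not offer the syscall, so nothing can be verified.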
      
      [1]: https://lore.kernel.org/lkml/20181228152012.dbf0508c2508138efc5f2bbe@linux-foundation.org/
      [2]: https://lore.kernel.org/lkml/20181230210245.GA30252@mail.hallyn.com/
      [3]: https://lore.kernel.org/lkml/20181230232711.7aayb7vnhogbv4co@brauner.io/
      
      
      
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Florian Weimer <fweimer@redhat.com>
      Signed-off-by: Christian Brauner <christian@brauner.io>
      Reviewed-by: Tycho Andersen <tycho@tycho.ws>
      Acked-by: Serge Hallyn <serge@hallyn.com>
    • signal: add pidfd_send_signal() syscall · 3eb39f47
      Christian Brauner authored
      The kill() syscall operates on process identifiers (pid). After a process
      has exited its pid can be reused by another process. If a caller sends a
      signal to a reused pid it will end up signaling the wrong process. This
      issue has often surfaced and there has been a push to address this problem [1].
      
      This patch uses file descriptors (fd) from /proc/<pid> as stable handles on
      struct pid. Even if a pid is recycled the handle will not change. The fd
      can be used to send signals to the process it refers to.
      Thus, the new syscall pidfd_send_signal() is introduced to solve this
      problem. Instead of pids it operates on process fds (pidfd).
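      As a minimal sketch of the idea, the snippet below opens /proc/self as a
      handle and signals the calling process through it. The helper name
      check_simple_success() and its 1/0/-1 return convention are hypothetical,
      and the fallback syscall number is the one given in this commit message.
      SIGUSR1 is blocked first so that it can be picked up synchronously with
      sigtimedwait() instead of a handler.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Assumption: fall back to the number named in the commit message. */
#ifndef __NR_pidfd_send_signal
#define __NR_pidfd_send_signal 424
#endif

static int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
                                 unsigned int flags)
{
        assert(pidfd >= 0); /* this sketch only passes open descriptors */
        return (int)syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
}

/* Hypothetical helper: 1 = signal arrived, 0 = skip, -1 = failure. */
int check_simple_success(void)
{
        struct timespec no_wait = { 0, 0 };
        sigset_t set, old;
        int fd, ret, err, got = 0;

        fd = open("/proc/self", O_DIRECTORY | O_CLOEXEC);
        if (fd < 0)
                return 0; /* no procfs: nothing to test */

        /* Block SIGUSR1 so it stays pending instead of being delivered. */
        sigemptyset(&set);
        sigaddset(&set, SIGUSR1);
        if (sigprocmask(SIG_BLOCK, &set, &old) < 0) {
                close(fd);
                return -1;
        }

        ret = sys_pidfd_send_signal(fd, SIGUSR1, NULL, 0);
        err = errno;

        /* Consume the pending signal without blocking. */
        if (ret == 0)
                got = (sigtimedwait(&set, NULL, &no_wait) == SIGUSR1);

        sigprocmask(SIG_SETMASK, &old, NULL);
        close(fd);

        if (ret < 0)
                return (err == ENOSYS || err == EPERM) ? 0 : -1;
        return got ? 1 : -1;
}
```

      The descriptor, not the numeric pid, identifies the target here, which
      is what keeps the handle stable if the pid is later recycled.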
      
      /* prototype and arguments */
      long pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags);
      
      /* syscall number 424 */
      The syscall number was chosen to be 424 to align with Arnd's y2038 rework
      and thereby minimize merge conflicts (cf. [25]).
      
      In addition to the pidfd and signal arguments it takes a siginfo_t and a
      flags argument. If the siginfo_t argument is NULL then
      pidfd_send_signal() is equivalent to kill(<positive-pid>, <signal>). If it
      is not NULL pidfd_send_signal() is equivalent to rt_sigqueueinfo().
      The flags argument is added to allow for future extensions of this syscall.
      It currently needs to be passed as 0. Failing to do so will cause EINVAL.
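      The flags rule can be illustrated with a short sketch. The helper name
      check_flags_einval() and the UNKNOWN_FLAG value are hypothetical. A high
      bit is used rather than 1 on the assumption that later kernels may assign
      meaning to low flag bits; the fallback syscall number is the one given
      above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: fall back to the number named in the commit message. */
#ifndef __NR_pidfd_send_signal
#define __NR_pidfd_send_signal 424
#endif

/* Hypothetical: a flag bit assumed to be unassigned. */
#define UNKNOWN_FLAG (1U << 31)

static int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
                                 unsigned int flags)
{
        assert(pidfd >= 0); /* this sketch only passes open descriptors */
        return (int)syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
}

/* Hypothetical helper: 1 = EINVAL as expected, 0 = skip, -1 = failure. */
int check_flags_einval(void)
{
        int fd, ret, err;

        fd = open("/proc/self", O_DIRECTORY | O_CLOEXEC);
        if (fd < 0)
                return 0; /* no procfs: nothing to test */

        /* NULL info and flags == 0: behaves like kill(getpid(), 0). */
        ret = sys_pidfd_send_signal(fd, 0, NULL, 0);
        if (ret < 0) {
                err = errno;
                close(fd);
                return (err == ENOSYS || err == EPERM) ? 0 : -1;
        }

        /* An unsupported flags value has to be rejected with EINVAL. */
        ret = sys_pidfd_send_signal(fd, 0, NULL, UNKNOWN_FLAG);
        err = errno;
        close(fd);

        return (ret < 0 && err == EINVAL) ? 1 : -1;
}
```

      Signal 0 is used so that only the argument checks are exercised and no
      signal is actually delivered.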
      
      /* pidfd_send_signal() replaces multiple pid-based syscalls */
      The pidfd_send_signal() syscall currently takes on the job of
      rt_sigqueueinfo(2) and parts of the functionality of kill(2), namely when a
      positive pid is passed to kill(2). It will however be possible to also
      replace tgkill(2) and rt_tgsigqueueinfo(2) if this syscall is extended.
      
      /* sending signals to threads (tid) and process groups (pgid) */
      Specifically, the pidfd_send_signal() syscall does not currently operate on
      process groups or threads. This is left for future extensions.
      In order to extend the syscall to allow sending signals to threads and
      process groups, appropriately named flags (e.g. PIDFD_TYPE_PGID and
      PIDFD_TYPE_TID) should be added. This implies that the flags argument will
      determine what is signaled and not the file descriptor itself. Put in other
      words, grouping in this api is a property of the flags argument not a
      property of the file descriptor (cf. [13]). Clarification for this has been
      requested by Eric (cf. [19]).
      When appropriate extensions through the flags argument are added then
      pidfd_send_signal() can additionally replace the part of kill(2) which
      operates on process groups as well as the tgkill(2) and
      rt_tgsigqueueinfo(2) syscalls.
      How such an extension could be implemented has been very roughly sketched
      in [14], [15], and [16]. However, this should not be taken as a commitment
      to a particular implementation. There might be better ways to do it.
      Right now this is intentionally left out to keep this patchset as simple as
      possible (cf. [4]).
      
      /* naming */
      The syscall had various names throughout iterations of this patchset:
      - procfd_signal()
      - procfd_send_signal()
      - taskfd_send_signal()
      In the last round of reviews it was pointed out that, given that the
      flags argument decides the scope of the signal instead of different types
      of fds, it might make sense to settle for either "procfd_" or "pidfd_" as
      prefix. The community was willing to accept either (cf. [17] and [18]).
      Given that one developer expressed strong preference for the "pidfd_"
      prefix (cf. [13]) and with other developers less opinionated about the name
      we should settle for "pidfd_" to avoid further bikeshedding.
      
      The "_send_signal" suffix was chosen to reflect the fact that the syscall
      takes on the job of multiple syscalls. It is therefore intentional that the
      name is reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the
      former because it might imply that pidfd_send_signal() is a replacement for
      kill(2), and not the latter because it is a hassle to remember the correct
      spelling - especially for non-native speakers - and because it is not
      descriptive enough of what the syscall actually does. The name
      "pidfd_send_signal" makes it very clear that its job is to send signals.
      
      /* zombies */
      Zombies can be signaled just as any other process. No special error will be
      reported since a zombie state is an unreliable state (cf. [3]). However,
      this can be added as an extension through the @flags argument if the need
      ever arises.
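      A rough sketch of the zombie case: the child exits, the parent waits
      with WNOWAIT so the zombie is not reaped, and signal 0 is then sent
      through the handle. The helper name check_zombie_signal() and its 1/0/-1
      return convention are hypothetical, and the fallback syscall number is
      the one given above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: fall back to the number named in the commit message. */
#ifndef __NR_pidfd_send_signal
#define __NR_pidfd_send_signal 424
#endif

static int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
                                 unsigned int flags)
{
        assert(pidfd >= 0); /* this sketch only passes open descriptors */
        return (int)syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
}

/* Hypothetical helper: 1 = zombie signaled fine, 0 = skip, -1 = failure. */
int check_zombie_signal(void)
{
        char path[64];
        siginfo_t si;
        pid_t pid;
        int fd, ret, err;

        pid = fork();
        if (pid < 0)
                return 0;
        if (pid == 0)
                _exit(0); /* child exits at once and becomes a zombie */

        snprintf(path, sizeof(path), "/proc/%d", (int)pid);
        fd = open(path, O_DIRECTORY | O_CLOEXEC);

        /* WNOWAIT: wait for the exit but leave the zombie unreaped. */
        memset(&si, 0, sizeof(si));
        if (fd < 0 || waitid(P_PID, (id_t)pid, &si, WEXITED | WNOWAIT) < 0) {
                if (fd >= 0)
                        close(fd);
                waitpid(pid, NULL, 0);
                return 0; /* no procfs or child auto-reaped: skip */
        }

        ret = sys_pidfd_send_signal(fd, 0, NULL, 0);
        err = errno;

        waitpid(pid, NULL, 0); /* now reap the child */
        close(fd);

        if (ret < 0)
                return (err == ENOSYS || err == EPERM) ? 0 : -1;
        return 1;
}
```

      Once the child has been reaped, the same call is expected to fail with
      ESRCH instead.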
      
      /* cross-namespace signals */
      The patch currently enforces that the signaler and signalee either are in
      the same pid namespace or that the signaler's pid namespace is an ancestor
      of the signalee's pid namespace. This is done for the sake of simplicity
      and because it is unclear what values certain members of siginfo_t
      would need to be set to (cf. [5], [6]).
      
      /* compat syscalls */
      It became clear that we would like to avoid adding compat syscalls
      (cf. [7]).  The compat syscall handling is now done in kernel/signal.c
      itself by adding __copy_siginfo_from_user_any() which lets us avoid
      compat syscalls (cf. [8]). It should be noted that the addition of
      __copy_siginfo_from_user_any() is caused by a bug in the original
      implementation of rt_sigqueueinfo(2) (cf. [12]).
      With upcoming rework for syscall handling things might improve
      significantly (cf. [11]) and __copy_siginfo_from_user_any() will not gain
      any additional callers.
      
      /* testing */
      This patch was tested on x64 and x86.
      
      /* userspace usage */
      An asciinema recording for the basic functionality can be found under [9].
      With this patch a process can be killed via:
      
       #define _GNU_SOURCE
       #include <errno.h>
       #include <fcntl.h>
       #include <signal.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <sys/stat.h>
       #include <sys/syscall.h>
       #include <sys/types.h>
       #include <unistd.h>
      
       static inline int do_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
                                               unsigned int flags)
       {
       #ifdef __NR_pidfd_send_signal
               return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
       #else
               errno = ENOSYS; /* mimic syscall(): return -1 with errno set */
               return -1;
       #endif
       }
      
       int main(int argc, char *argv[])
       {
               int fd, ret, saved_errno, sig;
      
               if (argc < 3)
                       exit(EXIT_FAILURE);
      
               fd = open(argv[1], O_DIRECTORY | O_CLOEXEC);
               if (fd < 0) {
                       printf("%s - Failed to open \"%s\"\n", strerror(errno), argv[1]);
                       exit(EXIT_FAILURE);
               }
      
               sig = atoi(argv[2]);
      
               printf("Sending signal %d to process %s\n", sig, argv[1]);
               ret = do_pidfd_send_signal(fd, sig, NULL, 0);
      
               saved_errno = errno;
               close(fd);
               errno = saved_errno;
      
               if (ret < 0) {
                       printf("%s - Failed to send signal %d to process %s\n",
                              strerror(errno), sig, argv[1]);
                       exit(EXIT_FAILURE);
               }
      
               exit(EXIT_SUCCESS);
       }
      
      /* Q&A
       * Given that it seems the same questions get asked again by people who are
       * late to the party it makes sense to add a Q&A section to the commit
       * message so it's hopefully easier to avoid duplicate threads.
       *
       * For the sake of progress please consider these arguments settled unless
       * there is a new point that desperately needs to be addressed. Please
       * check the threads linked in this commit message first to see whether
       * your point has already been covered.
       */
      Q-01: (Florian Weimer [20], Andrew Morton [21])
            What happens when the target process has exited?
      A-01: Sending the signal will fail with ESRCH (cf. [22]).
      
      Q-02:  (Andrew Morton [21])
             Is the task_struct pinned by the fd?
      A-02:  No. A reference to struct pid is kept. struct pid - as far as I
             understand - was created precisely so that struct task_struct
             does not need to be pinned (cf. [22]).
      
      Q-03: (Andrew Morton [21])
            Does the entire procfs directory remain visible? Just one entry
            within it?
      A-03: The same thing that happens right now when you hold a file descriptor
            to /proc/<pid> open (cf. [22]).
      
      Q-04: (Andrew Morton [21])
            Does the pid remain reserved?
      A-04: No. This patchset guarantees a stable handle, not that pids are not
            recycled (cf. [22]).
      
      Q-05: (Andrew Morton [21])
            Do attempts to signal that fd return errors?
      A-05: See {Q,A}-01.
      
      Q-06: (Andrew Morton [21])
            Is there a cleaner way of obtaining the fd? Another syscall perhaps.
      A-06: Userspace can already trivially retrieve file descriptors from procfs
            so this is something that we will need to support anyway. Hence,
            there's no immediate need to add another syscall just to make
            pidfd_send_signal() not dependent on the presence of procfs. However,
            adding a syscall to get such file descriptors is planned for a
            future patchset (cf. [22]).
      
      Q-07: (Andrew Morton [21] and others)
            This fd-for-a-process sounds like a handy thing and people may well
            think up other uses for it in the future, probably unrelated to
            signals. Are the code and the interface designed to permit such
            future applications?
      A-07: Yes (cf. [22]).
      
      Q-08: (Andrew Morton [21] and others)
            Now I think about it, why a new syscall? This thing is looking
            rather like an ioctl?
      A-08: This has been extensively discussed. It was agreed that a syscall is
            preferred for a variety of reasons. Here are just a few taken from
            prior threads. Syscalls are safer than ioctl()s especially when
            signaling to fds. Processes are a core kernel concept so a syscall
            seems more appropriate. The layout of the syscall with its four
            arguments would require the addition of a custom struct for the
            ioctl() thereby causing at least the same amount or even more
            complexity for userspace than a simple syscall. The new syscall will
            replace multiple other pid-based syscalls (see description above).
            The file-descriptors-for-processes concept introduced with this
            syscall will be extended with other syscalls in the future. See also
            [22], [23] and various other threads already linked in here.
      
      Q-09: (Florian Weimer [24])
            What happens if you use the new interface with an O_PATH descriptor?
      A-09: pidfds opened as O_PATH fds cannot be used to send signals to a
            process (cf. [2]). Signaling processes through pidfds is the
            equivalent of writing to a file. Thus, this is not an operation that
            operates "purely at the file descriptor level" as required by the
            open(2) manpage. See also [4].
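
      The O_PATH answer in A-09 can be checked with a sketch like the one
      below. The commit message does not name the error code; EBADF is an
      assumption based on how the kernel normally treats O_PATH descriptors.
      The helper name check_o_path_rejected() and its 1/0/-1 return convention
      are hypothetical, and the fallback syscall number is the one given above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: fall back to the number named in the commit message. */
#ifndef __NR_pidfd_send_signal
#define __NR_pidfd_send_signal 424
#endif

static int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
                                 unsigned int flags)
{
        assert(pidfd >= 0); /* this sketch only passes open descriptors */
        return (int)syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
}

/* Hypothetical helper: 1 = O_PATH fd rejected, 0 = skip, -1 = failure. */
int check_o_path_rejected(void)
{
        int fd, ret, err;

        /* O_PATH yields a descriptor that only names the location. */
        fd = open("/proc/self", O_PATH | O_DIRECTORY | O_CLOEXEC);
        if (fd < 0)
                return 0; /* no procfs or no O_PATH support: skip */

        ret = sys_pidfd_send_signal(fd, 0, NULL, 0);
        err = errno;
        close(fd);

        if (ret == 0)
                return -1; /* an O_PATH fd must not be usable for signaling */
        if (err == EBADF)
                return 1; /* assumed error code, see the note above */
        return (err == ENOSYS || err == EPERM) ? 0 : -1;
}
```

      Opening the same path without O_PATH gives a descriptor that can be used
      for signaling, as in the userspace example earlier in this message.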
      
      /* References */
      [1]:  https://lore.kernel.org/lkml/20181029221037.87724-1-dancol@google.com/
      [2]:  https://lore.kernel.org/lkml/874lbtjvtd.fsf@oldenburg2.str.redhat.com/
      [3]:  https://lore.kernel.org/lkml/20181204132604.aspfupwjgjx6fhva@brauner.io/
      [4]:  https://lore.kernel.org/lkml/20181203180224.fkvw4kajtbvru2ku@brauner.io/
      [5]:  https://lore.kernel.org/lkml/20181121213946.GA10795@mail.hallyn.com/
      [6]:  https://lore.kernel.org/lkml/20181120103111.etlqp7zop34v6nv4@brauner.io/
      [7]:  https://lore.kernel.org/lkml/36323361-90BD-41AF-AB5B-EE0D7BA02C21@amacapital.net/
      [8]:  https://lore.kernel.org/lkml/87tvjxp8pc.fsf@xmission.com/
      [9]:  https://asciinema.org/a/IQjuCHew6bnq1cr78yuMv16cy
      [11]: https://lore.kernel.org/lkml/F53D6D38-3521-4C20-9034-5AF447DF62FF@amacapital.net/
      [12]: https://lore.kernel.org/lkml/87zhtjn8ck.fsf@xmission.com/
      [13]: https://lore.kernel.org/lkml/871s6u9z6u.fsf@xmission.com/
      [14]: https://lore.kernel.org/lkml/20181206231742.xxi4ghn24z4h2qki@brauner.io/
      [15]: https://lore.kernel.org/lkml/20181207003124.GA11160@mail.hallyn.com/
      [16]: https://lore.kernel.org/lkml/20181207015423.4miorx43l3qhppfz@brauner.io/
      [17]: https://lore.kernel.org/lkml/CAGXu5jL8PciZAXvOvCeCU3wKUEB_dU-O3q0tDw4uB_ojMvDEew@mail.gmail.com/
      [18]: https://lore.kernel.org/lkml/20181206222746.GB9224@mail.hallyn.com/
      [19]: https://lore.kernel.org/lkml/20181208054059.19813-1-christian@brauner.io/
      [20]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/
      [21]: https://lore.kernel.org/lkml/20181228152012.dbf0508c2508138efc5f2bbe@linux-foundation.org/
      [22]: https://lore.kernel.org/lkml/20181228233725.722tdfgijxcssg76@brauner.io/
      [23]: https://lwn.net/Articles/773459/
      [24]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/
      [25]: https://lore.kernel.org/lkml/CAK8P3a0ej9NcJM8wXNPbcGUyOUZYX+VLoDFdbenW3s3114oQZw@mail.gmail.com/
      
      
      
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Florian Weimer <fweimer@redhat.com>
      Signed-off-by: Christian Brauner <christian@brauner.io>
      Reviewed-by: Tycho Andersen <tycho@tycho.ws>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: David Howells <dhowells@redhat.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Acked-by: Aleksa Sarai <cyphar@cyphar.com>
  2. Jan 28, 2019
    • Linux 5.0-rc4 · f17b5f06
      Linus Torvalds authored
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8a5f0605
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "A set of fixes for x86:
      
         - Fix the swapped outb() parameters in the KASLR code
      
         - Fix the PKEY handling at fork which failed to preserve the pkey
           state for the child. Comes with a test case to validate that.
      
         - Fix the entry stack handling for XEN PV to respect that XEN PV
           systems enter the function already on the current thread stack and
           not on the trampoline.
      
         - Fix kexec load failure caused by using a stale value when the
           kexec_buf structure is reused for subsequent allocations.
      
         - Fix a bogus sizeof() in the memory encryption code
      
         - Enforce PCI dependency for the Intel Low Power Subsystem
      
         - Enforce PCI_LOCKLESS_CONFIG when PCI is enabled"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/Kconfig: Select PCI_LOCKLESS_CONFIG if PCI is enabled
        x86/entry/64/compat: Fix stack switching for XEN PV
        x86/kexec: Fix a kexec_file_load() failure
        x86/mm/mem_encrypt: Fix erroneous sizeof()
        x86/selftests/pkeys: Fork() to check for state being preserved
        x86/pkeys: Properly copy pkey state at fork()
        x86/kaslr: Fix incorrect i8254 outb() parameters
        x86/intel/lpss: Make PCI dependency explicit
    • Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 351e1aa6
      Linus Torvalds authored
      Pull x86 timer fixes from Thomas Gleixner:
       "Two commits which were not sent during the merge window.
      
         - The TSC calibration fix turns out to be more urgent as recent
           Skylake-X systems seem to have massive trouble with calibration
           disturbance. This should go back into stable for that reason and
           the risk of breakage is rather low.
      
         - Drop an unused define"
      
      * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/hpet: Remove unused FSEC_PER_NSEC define
        x86/tsc: Make calibration refinement more robust
    • Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f907bb4c
      Linus Torvalds authored
      Pull timer fix from Thomas Gleixner:
       "A single regression fix to address the unintended breakage of posix
        cpu timers.
      
        This is caused by a new sanity check in the common code, which fails
        for posix cpu timers under certain conditions because the posix cpu
        timer code never updates the variable which is checked"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        posix-cpu-timers: Unbreak timer rearming
    • Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 98810518
      Linus Torvalds authored
      Pull locking fixes from Thomas Gleixner:
       "A small series of fixes which all address possible missed wakeups:
      
         - Document and fix the wakeup ordering of wake_q
      
         - Add the missing barrier in rcuwait_wake_up(), which was documented
           in the comment but missing in the code
      
         - Fix the possible missed wakeups in the rwsem and futex code"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/rwsem: Fix (possible) missed wakeup
        futex: Fix (possible) missed wakeup
        sched/wake_q: Fix wakeup ordering for wake_q
        sched/wake_q: Document wake_q_add()
        sched/wait: Fix rcuwait_wake_up() ordering
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0d484375
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "A small set of fixes for the interrupt subsystem:
      
         - Fix a double increment in the irq descriptor allocator which
           resulted in a sanity check only being done for every second
           affinity mask
      
         - Add a missing device tree translation in the stm32-exti driver.
           Without that the interrupt association is completely wrong.
      
         - Initialize the mutex in the GIC-V3 MBI driver
      
         - Fix the alignment for aliasing devices in the GIC-V3-ITS driver so
           multi MSI allocations work correctly
      
         - Ensure that the initial affinity of an interrupt is not empty at
           startup time.
      
         - Drop bogus include in the madera irq chip driver
      
         - Fix KernelDoc regression"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size
        genirq/irqdesc: Fix double increment in alloc_descs()
        genirq: Fix the kerneldoc comment for struct irq_affinity_desc
        irqchip/madera: Drop GPIO includes
        irqchip/gic-v3-mbi: Fix uninitialized mbi_lock
        irqchip/stm32-exti: Add domain translate function
        genirq: Make sure the initial affinity is not empty
    • Merge tag 'edac_fix_for_5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp · 98354243
      Linus Torvalds authored
      Pull EDAC fix from Borislav Petkov:
       "Fix persistent register offsets of altera_edac, from Thor Thayer"
      
      * tag 'edac_fix_for_5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
        EDAC, altera: Fix S10 persistent register offset
    • Merge tag 'for-linus-20190127' of git://git.kernel.dk/linux-block · 419967d5
      Linus Torvalds authored
      Pull block revert from Jens Axboe:
       "Silly error snuck into a patch from the last series, let's do a revert
        to avoid a potential use-after-free"
      
      * tag 'for-linus-20190127' of git://git.kernel.dk/linux-block:
        Revert "block: cover another queue enter recursion via BIO_QUEUE_ENTERED"
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 1fc7f56d
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "Quite a few fixes for x86: nested virtualization save/restore, AMD
        nested virtualization and virtual APIC, 32-bit fixes, an important fix
        to restore operation on older processors, and a bunch of hyper-v
        bugfixes. Several are marked stable.
      
        There are also fixes for GCC warnings and for a GCC/objtool interaction"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: Mark expected switch fall-throughs
        KVM: x86: fix TRACE_INCLUDE_PATH and remove -I. header search paths
        KVM: selftests: check returned evmcs version range
        x86/kvm/hyper-v: nested_enable_evmcs() sets vmcs_version incorrectly
        KVM: VMX: Move vmx_vcpu_run()'s VM-Enter asm blob to a helper function
        kvm: selftests: Fix region overlap check in kvm_util
        kvm: vmx: fix some -Wmissing-prototypes warnings
        KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1
        svm: Fix AVIC incomplete IPI emulation
        svm: Add warning message for AVIC IPI invalid target
        KVM: x86: WARN_ONCE if sending a PV IPI returns a fatal error
        KVM: x86: Fix PV IPIs for 32-bit KVM host
        x86/kvm/hyper-v: recommend using eVMCS only when it is enabled
        x86/kvm/hyper-v: don't recommend doing reset via synthetic MSR
        kvm: x86/vmx: Use kzalloc for cached_vmcs12
        KVM: VMX: Use the correct field var when clearing VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL
        KVM: x86: Fix single-step debugging
        x86/kvm/hyper-v: don't announce GUEST IDLE MSR support
    • Merge tag 'dma-mapping-5.0-2' of git://git.infradead.org/users/hch/dma-mapping · c180f1b0
      Linus Torvalds authored
      Pull dma-mapping fix from Christoph Hellwig:
       "Fix a xen-swiotlb regression on arm64"
      
      * tag 'dma-mapping-5.0-2' of git://git.infradead.org/users/hch/dma-mapping:
        arm64/xen: fix xen-swiotlb cache flushing
    • Merge tag 'libnvdimm-fixes-5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 6a2651b5
      Linus Torvalds authored
      Pull libnvdimm fixes from Dan Williams:
       "A fix for namespace label support for non-Intel NVDIMMs that implement
        the ACPI standard label method.
      
        This has apparently never worked and could wait for v5.1. However it
        has enough visibility with hardware vendors [1] and distro bug
        trackers [2], and low enough risk that I decided it should go in for
        -rc4. The other fixups target the new, for v5.0, nvdimm security
        functionality. The larger init path fixup closes a memory leak and a
        potential userspace lockup due to missed notifications.
      
          [1] https://github.com/pmem/ndctl/issues/78
          [2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1811785
      
        These have all soaked in -next for a week with no reported issues.
      
        Summary:
      
         - Fix support for NVDIMMs that implement the ACPI standard label
           methods.
      
         - Fix error handling for security overwrite (memory leak / userspace
           hang condition), and another one-line security cleanup"
      
      * tag 'libnvdimm-fixes-5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        acpi/nfit: Fix command-supported detection
        acpi/nfit: Block function zero DSMs
        libnvdimm/security: Require nvdimm_security_setup_events() to succeed
        nfit_test: fix security state pull for nvdimm security nfit_test
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 78e372e6
      Linus Torvalds authored
      Pull input fixes from Dmitry Torokhov:
       "A fixup for the input_event fix for y2038 Sparc64, and a couple of other
        minor fixes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: input_event - fix the CONFIG_SPARC64 mixup
        Input: olpc_apsp - assign priv->dev earlier
        Input: uinput - fix undefined behavior in uinput_validate_absinfo()
        Input: raspberrypi-ts - fix link error
        Input: xpad - add support for SteelSeries Stratus Duo
        Input: input_event - provide override for sparc64
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 037222ad
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Count ttl-dropped frames properly in mac80211, from Bob Copeland.
      
       2) Integer overflow in ktime handling of bcm can code, from Oliver
          Hartkopp.
      
       3) Fix RX desc handling wrt. hw checksumming in ravb, from Simon
          Horman.
      
       4) Various hash key fixes in hv_netvsc, from Haiyang Zhang.
      
       5) Use after free in ax25, from Eric Dumazet.
      
       6) Several fixes to the SSN support in SCTP, from Xin Long.
      
       7) Do not process frames after a NAPI reschedule in ibmveth, from
          Thomas Falcon.
      
       8) Fix NLA_POLICY_NESTED arguments, from Johannes Berg.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (42 commits)
        qed: Revert error handling changes.
        cfg80211: extend range deviation for DMG
        cfg80211: reg: remove warn_on for a normal case
        mac80211: Add attribute aligned(2) to struct 'action'
        mac80211: don't initiate TDLS connection if station is not associated to AP
        nl80211: fix NLA_POLICY_NESTED() arguments
        ibmveth: Do not process frames after calling napi_reschedule
        net: dev_is_mac_header_xmit() true for ARPHRD_RAWIP
        net: usb: asix: ax88772_bind return error when hw_reset fail
        MAINTAINERS: Update cavium networking drivers
        net/mlx4_core: Fix error handling when initializing CQ bufs in the driver
        net/mlx4_core: Add masking for a few queries on HCA caps
        sctp: set flow sport from saddr only when it's 0
        sctp: set chunk transport correctly when it's a new asoc
        sctp: improve the events for sctp stream adding
        sctp: improve the events for sctp stream reset
        ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel
        ax25: fix possible use-after-free
        sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe
        hv_netvsc: fix typos in code comments
        ...
  3. Jan 27, 2019
    • Revert "block: cover another queue enter recursion via BIO_QUEUE_ENTERED" · 947b7ac1
      Jens Axboe authored
      We can't touch a bio after ->make_request_fn(), for all we know it could
      already have been completed by the time this function returns.
      
      This reverts commit 698cef17.
      
      Reported-by: <syzbot+4df6ca820108fd248943@syzkaller.appspotmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Merge tag '5.0-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 7c2614bf
      Linus Torvalds authored
      Pull smb3 fixes from Steve French:
       "A set of small smb3 fixes, some fixing various crediting issues
        discovered during xfstest runs, five for stable"
      
      * tag '5.0-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: print CIFSMaxBufSize as part of /proc/fs/cifs/DebugData
        smb3: add credits we receive from oplock/break PDUs
        CIFS: Fix mounts if the client is low on credits
        CIFS: Do not assume one credit for async responses
        CIFS: Fix credit calculations in compound mid callback
        CIFS: Fix credit calculation for encrypted reads with errors
        CIFS: Fix credits calculations for reads with errors
        CIFS: Do not reconnect TCP session in add_credits()
        smb3: Cleanup license mess
        CIFS: Fix possible hang during async MTU reads and writes
        cifs: fix memory leak of an allocated cifs_ntsd structure
    • Merge tag 'vfio-v5.0-rc4' of git://github.com/awilliam/linux-vfio · 2580acb2
      Linus Torvalds authored
      Pull VFIO fixes from Alex Williamson:
      
       - cleanup licenses in new files (Thomas Gleixner)
      
       - cleanup new compiler warnings (Alexey Kardashevskiy)
      
      * tag 'vfio-v5.0-rc4' of git://github.com/awilliam/linux-vfio:
        vfio-pci/nvlink2: Fix ancient gcc warnings
        vfio/pci: Cleanup license mess
      2580acb2
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 7930851e
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Six fixes, all of which appear to have user visible consequences.
      
        The DMA one is a regression fix from the merge window and of the
        others, four are driver specific and one specific to the target code"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: ufs: Use explicit access size in ufshcd_dump_regs
        scsi: tcmu: fix use after free
        scsi: csiostor: fix NULL pointer dereference in csio_vport_set_state()
        scsi: lpfc: nvmet: avoid hang / use-after-free when destroying targetport
        scsi: lpfc: nvme: avoid hang / use-after-free when destroying localport
        scsi: communicate max segment size to the DMA mapping code
      7930851e
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20190125' of git://git.kernel.dk/linux-block · 6b8f9159
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A collection of fixes for this release. This contains:
      
         - Silence sparse rightfully complaining about non-static wbt
           functions (Bart)
      
         - Fixes for the zoned comments/ioctl documentation (Damien)
      
         - direct-io fix that's been lingering for a while (Ernesto)
      
         - cgroup writeback fix (Tejun)
      
         - Set of NVMe patches for nvme-rdma/tcp (Sagi, Hannes, Raju)
      
         - Block recursion tracking fix (Ming)
      
         - Fix debugfs command flag naming for a few flags (Jianchao)"
      
      * tag 'for-linus-20190125' of git://git.kernel.dk/linux-block:
        block: Fix comment typo
        uapi: fix ioctl documentation
        blk-wbt: Declare local functions static
        blk-mq: fix the cmd_flag_name array
        nvme-multipath: drop optimization for static ANA group IDs
        nvmet-rdma: fix null dereference under heavy load
        nvme-rdma: rework queue maps handling
        nvme-tcp: fix timeout handler
        nvme-rdma: fix timeout handler
        writeback: synchronize sync(2) against cgroup writeback membership switches
        block: cover another queue enter recursion via BIO_QUEUE_ENTERED
        direct-io: allow direct writes to empty inodes
      6b8f9159
  4. Jan 26, 2019
    • David S. Miller's avatar
      qed: Revert error handling changes. · abfd04f7
      David S. Miller authored
      This is new code and not bug fixes.
      
      This reverts all changes added by merge commit 8fb18be9.

      Signed-off-by: David S. Miller <davem@davemloft.net>
      abfd04f7
    • Linus Torvalds's avatar
      Merge tag 'mmc-v5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · ba606975
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
      
       - sdhci-acpi: Fixup build dependency for PCI
      
       - sdhci-omap: Resolve Kconfig warnings on keystone
      
       - sdhci-iproc: Propagate errors from DT parsing
      
       - meson-gx: Fixup IRQ handling in release callback
      
       - meson-gx: Use signal re-sampling to fixup tuning
      
       - dw_mmc-bluefield: Fix the license information
      
      * tag 'mmc-v5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: dw_mmc-bluefield: : Fix the license information
        mmc: meson-gx: enable signal re-sampling together with tuning
        mmc: sdhci-iproc: handle mmc_of_parse() errors during probe
        mmc: meson-gx: Free irq in release() callback
        mmc: host: Fix Kconfig warnings on keystone_defconfig
        mmc: sdhci-acpi: Make PCI dependency explicit
      ba606975
    • Linus Torvalds's avatar
      Merge tag 'char-misc-5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · d488bd21
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are some small char and misc driver fixes to resolve some
        reported issues, as well as a number of binderfs fixups that were
        found after auditing the filesystem code by Al Viro. As binderfs
        hasn't been in a previous release yet, it's good to get these in now
        before the first users show up.
      
        All of these have been in linux-next for a bit with no reported
        issues"
      
      * tag 'char-misc-5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (26 commits)
        i3c: master: Fix an error checking typo in 'cdns_i3c_master_probe()'
        binderfs: switch from d_add() to d_instantiate()
        binderfs: drop lock in binderfs_binder_ctl_create
        binderfs: kill_litter_super() before cleanup
        binderfs: rework binderfs_binder_device_create()
        binderfs: rework binderfs_fill_super()
        binderfs: prevent renaming the control dentry
        binderfs: remove outdated comment
        binderfs: use __u32 for device numbers
        binderfs: use correct include guards in header
        misc: pvpanic: fix warning implicit declaration
        char/mwave: fix potential Spectre v1 vulnerability
        misc: ibmvsm: Fix potential NULL pointer dereference
        binderfs: fix error return code in binderfs_fill_super()
        mei: me: add denverton innovation engine device IDs
        mei: me: mark LBG devices as having dma support
        mei: dma: silent the reject message
        binderfs: handle !CONFIG_IPC_NS builds
        binderfs: reserve devices for initial mount
        binderfs: rename header to binderfs.h
        ...
      d488bd21
    • Linus Torvalds's avatar
      Merge tag 'staging-5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 96f18cb8
      Linus Torvalds authored
      Pull staging driver fixes from Greg KH:
       "Here are some small staging driver fixes for 5.0-rc4.
      
        They resolve some reported bugs and add a new device id for one
        driver. Nothing major at all, but all good to have.
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'staging-5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: android: ion: Support cpu access during dma_buf_detach
        staging: rtl8723bs: Fix build error with Clang when inlining is disabled
        staging: rtl8188eu: Add device code for D-Link DWA-121 rev B1
        staging: vchiq: Fix local event signalling
        Staging: wilc1000: unlock on error in init_chip()
        staging: wilc1000: fix memory leak in wilc_add_rx_gtk
        staging: wilc1000: fix registration frame size
      96f18cb8
    • Linus Torvalds's avatar
      Merge tag 'tty-5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 473721f9
      Linus Torvalds authored
      Pull tty/serial driver fixes from Greg KH:
       "Here are a number of small tty core and serial driver fixes for
        5.0-rc4 to resolve some reported issues.
      
        Nothing major: a few small serial driver fixes, a tty core fixup for
        a reported crash, and some good vt fixes from Nicolas Pitre, who
        seems to be auditing that chunk of code a lot lately.
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'tty-5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        serial: fsl_lpuart: fix maximum acceptable baud rate with over-sampling
        tty: serial: qcom_geni_serial: Allow mctrl when flow control is disabled
        tty: Handle problem if line discipline does not have receive_buf
        vgacon: unconfuse vc_origin when using soft scrollback
        vt: invoke notifier on screen size change
        vt: always call notifier with the console lock held
        vt: make vt_console_print() compatible with the unicode screen buffer
        tty/n_hdlc: fix __might_sleep warning
        serial: 8250: Fix serial8250 initialization crash
        uart: Fix crash in uart_write and uart_put_char
      473721f9
    • Linus Torvalds's avatar
      Merge tag 'usb-5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · b48cef32
      Linus Torvalds authored
      Pull USB/PHY fixes from Greg KH:
       "Here are a number of small USB and PHY driver fixes for 5.0-rc4.
      
        Nothing major at all, just the usual selection of USB gadget bugfixes,
        some new USB serial driver ids, some SPDX fixes, and some PHY driver
        fixes for reported issues.
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'usb-5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        USB: serial: keyspan_usa: add proper SPDX lines for .h files
        USB: EHCI: ehci-mv: add MODULE_DEVICE_TABLE
        USB: leds: fix regression in usbport led trigger
        usb: chipidea: fix static checker warning for NULL pointer
        MAINTAINERS: email address update in MAINTAINERS entries
        USB: usbip: delete README file
        USB: serial: pl2303: add new PID to support PL2303TB
        usb: dwc2: gadget: Fix Remote Wakeup interrupt bit clearing
        phy: ath79-usb: Fix the main reset name to match the DT binding
        phy: ath79-usb: Fix the power on error path
        phy: fix build breakage: add PHY_MODE_SATA
        phy: ti: ensure priv is not null before dereferencing it
        USB: serial: ftdi_sio: fix GPIO not working in autosuspend
        usb: gadget: Potential NULL dereference on allocation error
        usb: dwc3: gadget: Fix the uninitialized link_state when udc starts
        usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup
        usb: dwc3: gadget: synchronize_irq dwc irq in suspend
        USB: serial: simple: add Motorola Tetra TPG2200 device id
      b48cef32
    • David S. Miller's avatar
      Merge tag 'mac80211-for-davem-2019-01-25' of... · 51795275
      David S. Miller authored
      Merge tag 'mac80211-for-davem-2019-01-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      
      
      Johannes Berg says:
      
      ====================
      Just a few small fixes:
       * avoid trying to operate TDLS when not connected,
         this is not valid and led to issues
       * count TTL-dropped frames in mesh better
       * deal with new WiGig channels in regulatory code
       * remove a WARN_ON() that can trigger due to benign
         races during device/driver registration
       * fix nested netlink policy maxattrs (syzkaller)
       * fix hwsim n_limits (syzkaller)
       * propagate __aligned(2) to a surrounding struct
       * return proper error in virt_wifi error path
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      51795275
    • Gustavo A. R. Silva's avatar
      KVM: x86: Mark expected switch fall-throughs · b2869f28
      Gustavo A. R. Silva authored
      
      
      In preparation for enabling -Wimplicit-fallthrough, mark switch
      cases where we are expecting to fall through.
      
      This patch fixes the following warnings:
      
      arch/x86/kvm/lapic.c:1037:27: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/lapic.c:1876:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/hyperv.c:1637:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/svm.c:4396:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/mmu.c:4372:36: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/x86.c:3835:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/x86.c:7938:23: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/vmx/vmx.c:2015:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/vmx/vmx.c:1773:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough.
      
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b2869f28
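      The annotation style this commit describes can be sketched as follows.
      This is a minimal illustration, not code from the patch: the enum and
      the function demo_handle() are hypothetical names. It assumes GCC's
      comment-based recognition, where a "fall through" comment placed
      directly before the next case label silences the warning.

```c
/* Hypothetical event codes, for illustration only. */
enum demo_event { DEMO_RESET, DEMO_INIT, DEMO_RUN };

/* Returns a bitmask of the steps performed for the given event. */
int demo_handle(enum demo_event ev)
{
	int steps = 0;

	switch (ev) {
	case DEMO_RESET:
		steps |= 1;	/* reset-only work */
		/* fall through */
	case DEMO_INIT:
		steps |= 2;	/* work shared by RESET and INIT */
		break;
	case DEMO_RUN:
		steps |= 4;
		break;
	}
	return steps;
}
```

      Without the comment, building with -Wimplicit-fallthrough=3 would be
      expected to flag the DEMO_RESET case, since nothing tells the compiler
      that the missing break is intentional.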
    • Masahiro Yamada's avatar
      KVM: x86: fix TRACE_INCLUDE_PATH and remove -I. header search paths · 5cd5548f
      Masahiro Yamada authored
      
      
      The header search path -I. in kernel Makefiles is very suspicious;
      it allows the compiler to search for headers in the top of $(srctree),
      where obviously no header file exists.
      
      The reason for having -I. here is to make the incorrectly set
      TRACE_INCLUDE_PATH work.
      
      As the comment block in include/trace/define_trace.h says,
      TRACE_INCLUDE_PATH should be a relative path to the define_trace.h
      
      Fix the TRACE_INCLUDE_PATH, and remove the iffy include paths.
      
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      5cd5548f
    • Vitaly Kuznetsov's avatar
      KVM: selftests: check returned evmcs version range · 35b531a1
      Vitaly Kuznetsov authored
      
      
      Check that KVM_CAP_HYPERV_ENLIGHTENED_VMCS returns correct version range.
      
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      35b531a1
    • Vitaly Kuznetsov's avatar
      x86/kvm/hyper-v: nested_enable_evmcs() sets vmcs_version incorrectly · 3a2f5773
      Vitaly Kuznetsov authored
      Commit e2e871ab ("x86/kvm/hyper-v: Introduce nested_get_evmcs_version()
      helper") broke EVMCS enablement: to set vmcs_version we now call
      nested_get_evmcs_version() but this function checks
      enlightened_vmcs_enabled flag which is not yet set so we end up returning
      zero.
      
      Fix the issue by re-arranging things in nested_enable_evmcs().
      
      Fixes: e2e871ab ("x86/kvm/hyper-v: Introduce nested_get_evmcs_version() helper")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3a2f5773
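      The ordering bug described in this commit can be illustrated with a
      small userspace model. All names here (struct demo_vcpu,
      demo_get_version(), DEMO_VERSION_RANGE) are hypothetical stand-ins and
      the version value is arbitrary; the sketch only shows why querying a
      flag-dependent helper before setting the flag yields zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary non-zero value standing in for a real version range. */
#define DEMO_VERSION_RANGE 0x0101

struct demo_vcpu {
	bool evmcs_enabled;
};

/* Reports a version range only once the feature flag is set. */
uint16_t demo_get_version(const struct demo_vcpu *v)
{
	return v->evmcs_enabled ? DEMO_VERSION_RANGE : 0;
}

/* Buggy order: the helper runs while the flag is still clear. */
uint16_t demo_enable_buggy(struct demo_vcpu *v)
{
	uint16_t version = demo_get_version(v);	/* always 0 here */

	v->evmcs_enabled = true;
	return version;
}

/* Fixed order: set the flag first, then ask for the version. */
uint16_t demo_enable_fixed(struct demo_vcpu *v)
{
	v->evmcs_enabled = true;
	return demo_get_version(v);
}
```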
    • Sean Christopherson's avatar
      KVM: VMX: Move vmx_vcpu_run()'s VM-Enter asm blob to a helper function · 5ad6ece8
      Sean Christopherson authored
      ...along with the function's STACK_FRAME_NON_STANDARD tag.  Moving the
      asm blob results in a significantly smaller amount of code that is
      marked with STACK_FRAME_NON_STANDARD, which makes it far less likely
      that gcc will split the function and trigger a spurious objtool warning.
      As a bonus, removing STACK_FRAME_NON_STANDARD from vmx_vcpu_run() allows
      the bulk of code to be properly checked by objtool.
      
      Because %rbp is not loaded via VMCS fields, vmx_vcpu_run() must manually
      save/restore the host's RBP and load the guest's RBP prior to calling
      vmx_vmenter().  Modifying %rbp triggers objtool's stack validation code,
      and so vmx_vcpu_run() is tagged with STACK_FRAME_NON_STANDARD since it's
      impossible to avoid modifying %rbp.
      
      Unfortunately, vmx_vcpu_run() is also a gigantic function that gcc will
      split into separate functions, e.g. so that pieces of the function can
      be inlined.  Splitting the function means that the compiled Elf file
      will contain one or more vmx_vcpu_run.part.* functions in addition to
      a vmx_vcpu_run function.  Depending on where the function is split,
      objtool may warn about a "call without frame pointer save/setup" in
      vmx_vcpu_run.part.* since objtool's stack validation looks for exact
      names when whitelisting functions tagged with STACK_FRAME_NON_STANDARD.
      
      Up until recently, the undesirable function splitting was effectively
      blocked because vmx_vcpu_run() was tagged with __noclone.  At the time,
      __noclone had an unintended side effect that put vmx_vcpu_run() into a
      separate optimization unit, which in turn prevented gcc from inlining
      the function (or any of its own function calls) and thus eliminated gcc's
      motivation to split the function.  Removing the __noclone attribute
      allowed gcc to optimize vmx_vcpu_run(), exposing the objtool warning.
      
      Kudos to Qian Cai for root causing that the fnsplit optimization is what
      caused objtool to complain.
      
      Fixes: 453eafbe ("KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines")
      Tested-by: Qian Cai <cai@lca.pw>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      5ad6ece8
    • Ben Gardon's avatar
      kvm: selftests: Fix region overlap check in kvm_util · 94a980c3
      Ben Gardon authored
      
      
      Fix a call to userspace_mem_region_find to conform to its spec of
      taking an inclusive, inclusive range. It was previously being called
      with an inclusive, exclusive range. Also remove a redundant region bounds
      check in vm_userspace_mem_region_add. Region overlap checking is already
      performed by the call to userspace_mem_region_find.
      
      Tested: Compiled tools/testing/selftests/kvm with -static
      	Ran all resulting test binaries on an Intel Haswell test machine
      	All tests passed
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      94a980c3
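      The difference between an inclusive and an exclusive end address is
      easy to get wrong, so here is a minimal sketch of the idea. The
      helpers demo_ranges_overlap() and demo_last_addr() are hypothetical
      and are not the selftest's own functions; they only model an overlap
      check whose contract is an inclusive, inclusive range.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the inclusive ranges [a_start, a_end] and [b_start, b_end]
 * share at least one address.
 */
bool demo_ranges_overlap(uint64_t a_start, uint64_t a_end,
			 uint64_t b_start, uint64_t b_end)
{
	return a_start <= b_end && b_start <= a_end;
}

/* Last address of a region starting at base and spanning size bytes (size > 0). */
uint64_t demo_last_addr(uint64_t base, uint64_t size)
{
	return base + size - 1;
}
```

      With two adjacent 4 KiB regions at 0x1000 and 0x2000, passing
      demo_last_addr(0x1000, 0x1000) as the end reports no overlap, while
      passing the exclusive end 0x2000 to the same check reports a false
      overlap with the neighbouring region.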
    • Yi Wang's avatar
      kvm: vmx: fix some -Wmissing-prototypes warnings · 8997f657
      Yi Wang authored
      
      
      We get some warnings when building kernel with W=1:
      arch/x86/kvm/vmx/vmx.c:426:5: warning: no previous prototype for ‘kvm_fill_hv_flush_list_func’ [-Wmissing-prototypes]
      arch/x86/kvm/vmx/nested.c:58:6: warning: no previous prototype for ‘init_vmcs_shadow_fields’ [-Wmissing-prototypes]
      
      Make them static to fix this.
      
      Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8997f657
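      As a rough illustration of the two usual ways to satisfy
      -Wmissing-prototypes, consider the sketch below. The function names
      are hypothetical; the commit itself takes the first route and makes
      the affected functions static.

```c
/*
 * Used only in this file: 'static' gives the function internal linkage,
 * so the compiler does not expect a previous prototype.
 */
static int demo_helper(int x)
{
	return x * 2;
}

/*
 * Meant to be called from other files: a prototype (normally placed in
 * a shared header) must be visible before the definition.
 */
int demo_api(int x);

int demo_api(int x)
{
	return demo_helper(x) + 1;
}
```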
    • Vitaly Kuznetsov's avatar
      KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1 · 619ad846
      Vitaly Kuznetsov authored
      kvm-unit-tests' eventinj "NMI failing on IDT" test results in NMI being
      delivered to the host (L1) when it's running nested. The problem seems to
      be: svm_complete_interrupts() raises 'nmi_injected' flag but later we
      decide to reflect EXIT_NPF to L1. The flag remains pending and we do NMI
      injection upon entry so it got delivered to L1 instead of L2.
      
      It seems that VMX code solves the same issue in prepare_vmcs12(), this was
      introduced with code refactoring in commit 5f3d5799 ("KVM: nVMX: Rework
      event injection and recovery").

      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      619ad846
    • Suravee Suthikulpanit's avatar
      svm: Fix AVIC incomplete IPI emulation · bb218fbc
      Suravee Suthikulpanit authored
      
      
      In case of incomplete IPI with invalid interrupt type, the current
      SVM driver does not properly emulate the IPI, and fails to boot
      FreeBSD guests with multiple vcpus when enabling AVIC.
      
      Fix this by updating the APIC ICR high/low registers, which also
      emulates sending the IPI.
      
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      bb218fbc
    • Suravee Suthikulpanit's avatar
      svm: Add warning message for AVIC IPI invalid target · 37ef0c44
      Suravee Suthikulpanit authored
      
      
      Print warning message when IPI target ID is invalid due to one of
      the following reasons:
        * In logical mode: cluster > max_cluster (64)
        * In physical mode: target > max_physical (512)
        * Address is not present in the physical or logical ID tables
      
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      37ef0c44
    • Sean Christopherson's avatar
      KVM: x86: WARN_ONCE if sending a PV IPI returns a fatal error · de81c2f9
      Sean Christopherson authored
      KVM hypercalls return a negative value error code in case of a fatal
      error, e.g. when the hypercall isn't supported or was made with invalid
      parameters.  WARN_ONCE on fatal errors when sending PV IPIs as any such
      error all but guarantees an SMP system will hang due to a missing IPI.
      
      Fixes: aaffcfd1 ("KVM: X86: Implement PV IPIs in linux guest")
      Cc: stable@vger.kernel.org
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      de81c2f9
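      The warn-once behaviour on a negative hypercall return value can be
      modelled in userspace as shown here. This is only a sketch:
      demo_check_ipi_ret() and demo_warn_count are hypothetical, and
      fprintf() stands in for the kernel's WARN_ONCE(), which is not
      available outside the kernel.

```c
#include <stdio.h>

/* Number of warnings actually printed; at most 1 by design. */
int demo_warn_count;

/*
 * Userspace stand-in for WARN_ONCE(): report only the first fatal
 * error so repeated failures cannot flood the log.
 * Returns 1 if ret signals a fatal error, 0 otherwise.
 */
int demo_check_ipi_ret(long ret)
{
	if (ret >= 0)
		return 0;

	if (demo_warn_count == 0) {
		fprintf(stderr, "failed to send PV IPI: %ld\n", ret);
		demo_warn_count++;
	}
	return 1;
}
```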
    • Sean Christopherson's avatar
      KVM: x86: Fix PV IPIs for 32-bit KVM host · 1ed199a4
      Sean Christopherson authored
      The recognition of the KVM_HC_SEND_IPI hypercall was unintentionally
      wrapped in "#ifdef CONFIG_X86_64", causing 32-bit KVM hosts to reject
      any and all PV IPI requests despite advertising the feature.  This
      results in all KVM paravirtualized guests hanging during SMP boot due
      to IPIs never being delivered.
      
      Fixes: 4180bf1b ("KVM: X86: Implement "send IPI" hypercall")
      Cc: stable@vger.kernel.org
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1ed199a4
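      The class of bug described here, a case label accidentally swallowed
      by a preprocessor guard, can be sketched as follows. The macro
      DEMO_64BIT and the hypercall numbers are hypothetical; DEMO_64BIT is
      deliberately left undefined to mimic a 32-bit build.

```c
/* Hypothetical hypercall numbers, for illustration only. */
enum demo_hc { DEMO_HC_64BIT_ONLY = 1, DEMO_HC_SEND_IPI = 2 };

/* Buggy: the SEND_IPI case sits inside the 64-bit-only guard. */
int demo_dispatch_buggy(int nr)
{
	switch (nr) {
#ifdef DEMO_64BIT
	case DEMO_HC_64BIT_ONLY:
		return 0;
	case DEMO_HC_SEND_IPI:	/* unintentionally compiled out on 32-bit */
		return 0;
#endif
	default:
		return -1;	/* hypercall not recognized */
	}
}

/* Fixed: only the truly 64-bit-only case remains guarded. */
int demo_dispatch_fixed(int nr)
{
	switch (nr) {
#ifdef DEMO_64BIT
	case DEMO_HC_64BIT_ONLY:
		return 0;
#endif
	case DEMO_HC_SEND_IPI:
		return 0;
	default:
		return -1;	/* hypercall not recognized */
	}
}
```

      In this model the buggy dispatcher rejects DEMO_HC_SEND_IPI on a
      32-bit build even though the feature would still be advertised, which
      mirrors the hang the commit message describes.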
    • Vitaly Kuznetsov's avatar
      x86/kvm/hyper-v: recommend using eVMCS only when it is enabled · f1adceaf
      Vitaly Kuznetsov authored
      We probably shouldn't be suggesting using Enlightened VMCS when it's not
      enabled (not supported from guest's point of view). Hyper-V on KVM seems
      to be fine either way but let's be consistent.
      
      Fixes: 2bc39970 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID")
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f1adceaf