Skip to content
  1. Oct 20, 2020
    • Juergen Gross's avatar
      xen/netback: use lateeoi irq binding · 23025393
      Juergen Gross authored
      
      
      In order to reduce the chance for the system becoming unresponsive due
      to event storms triggered by a misbehaving netfront use the lateeoi
      irq binding for netback and unmask the event channel only just before
      going to sleep waiting for new events.
      
      Make sure not to issue an EOI when none is pending by introducing an
      eoi_pending element to struct xenvif_queue.
      
      When no request has been consumed set the spurious flag when sending
      the EOI for an interrupt.
      
      This is part of XSA-332.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarJulien Grall <julien@xen.org>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      Reviewed-by: default avatarWei Liu <wl@xen.org>
      23025393
    • Juergen Gross's avatar
      xen/blkback: use lateeoi irq binding · 01263a1f
      Juergen Gross authored
      
      
      In order to reduce the chance for the system becoming unresponsive due
      to event storms triggered by a misbehaving blkfront use the lateeoi
      irq binding for blkback and unmask the event channel only after
      processing all pending requests.
      
      As the thread processing requests is used to do purging work in regular
      intervals an EOI may be sent only after having received an event. If
      there was no pending I/O request flag the EOI as spurious.
      
      This is part of XSA-332.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarJulien Grall <julien@xen.org>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      Reviewed-by: default avatarWei Liu <wl@xen.org>
      01263a1f
    • Juergen Gross's avatar
      xen/events: add a new "late EOI" evtchn framework · 54c9de89
      Juergen Gross authored
      
      
      In order to avoid tight event channel related IRQ loops add a new
      framework of "late EOI" handling: the IRQ the event channel is bound
      to will be masked until the event has been handled and the related
      driver is capable to handle another event. The driver is responsible
      for unmasking the event channel via the new function xen_irq_lateeoi().
      
      This is similar to binding an event channel to a threaded IRQ, but
      without having to structure the driver accordingly.
      
      In order to support a future special handling in case a rogue guest
      is sending lots of unsolicited events, add a flag to xen_irq_lateeoi()
      which can be set by the caller to indicate the event was a spurious
      one.
      
      This is part of XSA-332.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarJulien Grall <julien@xen.org>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      Reviewed-by: default avatarStefano Stabellini <sstabellini@kernel.org>
      Reviewed-by: default avatarWei Liu <wl@xen.org>
      54c9de89
    • Juergen Gross's avatar
      xen/events: fix race in evtchn_fifo_unmask() · f0133719
      Juergen Gross authored
      
      
      Unmasking a fifo event channel can result in unmasking it twice, once
      directly in the kernel and once via a hypercall in case the event was
      pending.
      
      Fix that by doing the local unmask only if the event is not pending.
      
      This is part of XSA-332.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      f0133719
    • Juergen Gross's avatar
      xen/events: add a proper barrier to 2-level uevent unmasking · 4d3fe31b
      Juergen Gross authored
      
      
      A follow-up patch will require certain write to happen before an event
      channel is unmasked.
      
      While the memory barrier is not strictly necessary for all the callers,
      the main one will need it. In order to avoid an extra memory barrier
      when using fifo event channels, mandate evtchn_unmask() to provide
      write ordering.
      
      The 2-level event handling unmask operation is missing an appropriate
      barrier, so add it. Fifo event channels are fine in this regard due to
      using sync_cmpxchg().
      
      This is part of XSA-332.
      
      Cc: stable@vger.kernel.org
      Suggested-by: default avatarJulien Grall <julien@xen.org>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarJulien Grall <jgrall@amazon.com>
      Reviewed-by: default avatarWei Liu <wl@xen.org>
      4d3fe31b
    • Juergen Gross's avatar
      xen/events: avoid removing an event channel while handling it · 073d0552
      Juergen Gross authored
      
      
      Today it can happen that an event channel is being removed from the
      system while the event handling loop is active. This can lead to a
      race resulting in crashes or WARN() splats when trying to access the
      irq_info structure related to the event channel.
      
      Fix this problem by using a rwlock taken as reader in the event
      handling loop and as writer when deallocating the irq_info structure.
      
      As the observed problem was a NULL dereference in evtchn_from_irq()
      make this function more robust against races by testing the irq_info
      pointer to be not NULL before dereferencing it.
      
      And finally make all accesses to evtchn_to_irq[row][col] atomic ones
      in order to avoid seeing partial updates of an array element in irq
      handling. Note that irq handling can be entered only for event channels
      which have been valid before, so any not populated row isn't a problem
      in this regard, as rows are only ever added and never removed.
      
      This is XSA-331.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarMarek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Reported-by: default avatarJinoh Kang <luke1337@theori.io>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarStefano Stabellini <sstabellini@kernel.org>
      Reviewed-by: default avatarWei Liu <wl@xen.org>
      073d0552
  2. Oct 05, 2020
  3. Oct 04, 2020
  4. Oct 03, 2020
  5. Oct 02, 2020
    • Linus Torvalds's avatar
      pipe: remove pipe_wait() and fix wakeup race with splice · 472e5b05
      Linus Torvalds authored
      The pipe splice code still used the old model of waiting for pipe IO by
      using a non-specific "pipe_wait()" that waited for any pipe event to
      happen, which depended on all pipe IO being entirely serialized by the
      pipe lock.  So by checking the state you were waiting for, and then
      adding yourself to the wait queue before dropping the lock, you were
      guaranteed to see all the wakeups.
      
      Strictly speaking, the actual wakeups were not done under the lock, but
      the pipe_wait() model still worked, because since the waiter held the
      lock when checking whether it should sleep, it would always see the
      current state, and the wakeup was always done after updating the state.
      
      However, commit 0ddad21d ("pipe: use exclusive waits when reading or
      writing") split the single wait-queue into two, and in the process also
      made the "wait for event" code wait for _two_ wait queues, and that then
      showed a race with the wakers that were not serialized by the pipe lock.
      
      It's only splice that used that "pipe_wait()" model, so the problem
      wasn't obvious, but Josef Bacik reports:
      
       "I hit a hang with fstest btrfs/187, which does a btrfs send into
        /dev/null. This works by creating a pipe, the write side is given to
        the kernel to write into, and the read side is handed to a thread that
        splices into a file, in this case /dev/null.
      
        The box that was hung had the write side stuck here [pipe_write] and
        the read side stuck here [splice_from_pipe_next -> pipe_wait].
      
        [ more details about pipe_wait() scenario ]
      
        The problem is we're doing the prepare_to_wait, which sets our state
        each time, however we can be woken up either with reads or writes. In
        the case above we race with the WRITER waking us up, and re-set our
        state to INTERRUPTIBLE, and thus never break out of schedule"
      
      Josef had a patch that avoided the issue in pipe_wait() by just making
      it set the state only once, but the deeper problem is that pipe_wait()
      depends on a level of synchonization by the pipe mutex that it really
      shouldn't.  And the whole "wait for any pipe state change" model really
      isn't very good to begin with.
      
      So rather than trying to work around things in pipe_wait(), remove that
      legacy model of "wait for arbitrary pipe event" entirely, and actually
      create functions that wait for the pipe actually being readable or
      writable, and can do so without depending on the pipe lock serializing
      everything.
      
      Fixes: 0ddad21d ("pipe: use exclusive waits when reading or writing")
      Link: https://lore.kernel.org/linux-fsdevel/bfa88b5ad6f069b2b679316b9e495a970130416c.1601567868.git.josef@toxicpanda.com/
      
      
      Reported-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Reviewed-and-tested-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      472e5b05
    • Linus Torvalds's avatar
      Merge tag 'iommu-fixes-v5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 44b6e23b
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
      
       - Fix a device reference counting bug in the Exynos IOMMU driver.
      
       - Lockdep fix for the Intel VT-d driver.
      
       - Fix a bug in the AMD IOMMU driver which caused corruption of the IVRS
         ACPI table and caused IOMMU driver initialization failures in kdump
         kernels.
      
      * tag 'iommu-fixes-v5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/vt-d: Fix lockdep splat in iommu_flush_dev_iotlb()
        iommu/amd: Fix the overwritten field in IVMD header
        iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate()
      44b6e23b