  1. Aug 27, 2020
  2. Aug 25, 2020
  3. Aug 24, 2020
    • IB/mlx4: Adjust delayed work when a dup is observed · 785167a1
      Håkon Bugge authored
      When scheduling delayed work to clean up the cache, if the entry has
      already been scheduled for deletion, we adjust the delay.
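
      A minimal userspace sketch of the idea, assuming a simple tick-based
      timer instead of the kernel workqueue. struct cache_entry and
      schedule_cleanup() are hypothetical names, not the mlx4 driver's code.
      In the kernel, mod_delayed_work() provides comparable semantics: it
      re-arms a pending delayed work item with a new delay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache entry with a tick-based deletion deadline. */
struct cache_entry {
	bool scheduled_delete;  /* cleanup already pending? */
	unsigned long deadline; /* tick at which cleanup runs */
};

/*
 * Arm cleanup 'delay' ticks after 'now'. If cleanup is already pending
 * (a duplicate was observed), move the deadline out instead of keeping
 * the earlier one. Returns true if a pending cleanup was adjusted.
 */
static bool schedule_cleanup(struct cache_entry *e, unsigned long now,
			     unsigned long delay)
{
	bool was_pending;

	assert(e != NULL);
	was_pending = e->scheduled_delete;
	e->scheduled_delete = true;
	e->deadline = now + delay;
	return was_pending;
}
```

      For example, with a delay of 100 ticks, an entry first scheduled at
      tick 0 and duplicated at tick 60 is evicted at tick 160 rather than at
      tick 100.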
      
      Fixes: 3cf69cc8 ("IB/mlx4: Add CM paravirtualization")
      Link: https://lore.kernel.org/r/20200803061941.1139994-7-haakon.bugge@oracle.com
      Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • IB/mlx4: Add support for REJ due to timeout · 227a0e14
      Håkon Bugge authored
      A CM REJ packet with its reason equal to timeout is a special beast in the
      sense that it doesn't have a Remote Communication ID nor does it have a
      Remote Port GID.
      
      When CX-3 virtual functions are used, either from a bare-metal machine
      or passed through to a VM, MAD packets are proxied through the PF driver.
      
      Since the VF drivers have separate namespaces for MAD Transaction Ids
      (TIDs), the PF driver has to re-map the TIDs and keep the bookkeeping
      in a cache.
      
      This proxying does not handle said REJ packets.
      
      If the active side abandons its connection attempt after having sent a
      REQ, it will send a REJ with the reason being timeout. This scenario can
      be provoked by a simple user-verbs program that ends up doing:
      
          rdma_connect(cm_id, &conn_param);
          rdma_destroy_id(cm_id);
      
      using the async librdmacm API.
      
      With dynamic debug prints enabled in the mlx4_ib driver, we then
      see:
      
      mlx4_ib_demux_cm_handler: Couldn't find an entry for pv_cm_id 0x0, attr_id 0x12
      
      The solution is to introduce a radix-tree. When a REQ packet is received
      and handled in mlx4_ib_demux_cm_handler(), we know the connecting peer's
      para-virtual cm_id and the destination slave. We then insert an entry into
      the tree with said information. We also schedule work to remove this entry
      from the tree and free it, in order to avoid a memory leak.
      
      When a REJ packet with reason timeout is received, we can look up the
      slave in the tree, and deliver the packet to the correct slave.
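
      A minimal sketch of the insert-on-REQ / lookup-on-REJ bookkeeping,
      keyed by the peer's para-virtual cm_id as described above. For brevity
      it swaps the kernel radix tree for a small open-addressing hash table,
      and omits the delayed removal. TABLE_SIZE, rej_tmout_insert() and
      rej_tmout_lookup() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TABLE_SIZE 64 /* hypothetical capacity, power of two */

/* Maps a peer's para-virtual cm_id to the destination slave. */
struct rej_entry {
	bool used;
	uint32_t pv_cm_id;
	int slave;
};

static struct rej_entry table[TABLE_SIZE];

/* Trivial hash; good enough for a sketch. */
static unsigned int slot_of(uint32_t id)
{
	return (unsigned int)(id % TABLE_SIZE);
}

/* On REQ: remember which slave this pv_cm_id belongs to. A duplicate REQ
 * finds the existing entry and leaves it in place. Returns false if full. */
static bool rej_tmout_insert(uint32_t pv_cm_id, int slave)
{
	unsigned int i = slot_of(pv_cm_id);

	assert(slave >= 0); /* -1 is reserved for "not found" */
	for (unsigned int n = 0; n < TABLE_SIZE; n++, i = (i + 1) % TABLE_SIZE) {
		if (!table[i].used) {
			table[i] = (struct rej_entry){ true, pv_cm_id, slave };
			return true;
		}
		if (table[i].pv_cm_id == pv_cm_id)
			return true; /* duplicate REQ: entry already present */
	}
	return false;
}

/* On REJ with reason timeout: return the slave, or -1 if unknown. */
static int rej_tmout_lookup(uint32_t pv_cm_id)
{
	unsigned int i = slot_of(pv_cm_id);

	for (unsigned int n = 0; n < TABLE_SIZE; n++, i = (i + 1) % TABLE_SIZE) {
		if (!table[i].used)
			return -1;
		if (table[i].pv_cm_id == pv_cm_id)
			return table[i].slave;
	}
	return -1;
}
```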
      
      When a duplicate REQ packet is received, the entry is already in the
      tree. In this case, we adjust the delayed work in order to avoid
      premature eviction of the entry.
      
      When cleaning up, we simply traverse the tree and modify any delayed work
      to use a zero delay. A subsequent flush of the system_wq ensures that all
      entries are wiped out.
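
      The two-step cleanup can be modelled in userspace roughly as follows,
      assuming a hypothetical array of pending cleanups with tick deadlines:
      first every outstanding deadline is moved to "now" (the zero-delay
      modification), then one flush pass runs everything that is due.
      struct pending, expedite_all() and flush_due() are illustrative names
      only; in the kernel the flush is of system_wq, as stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pending {
	bool queued;            /* cleanup still outstanding */
	unsigned long deadline; /* tick at which cleanup may run */
};

/* Step 1: re-arm every outstanding cleanup with a zero delay. */
static void expedite_all(struct pending *p, size_t n, unsigned long now)
{
	assert(p != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (p[i].queued)
			p[i].deadline = now;
}

/* Step 2: "flush" by running every cleanup that is due.
 * Returns how many entries were removed. */
static size_t flush_due(struct pending *p, size_t n, unsigned long now)
{
	size_t ran = 0;

	assert(p != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (p[i].queued && p[i].deadline <= now) {
			p[i].queued = false; /* entry removed and freed */
			ran++;
		}
	}
	return ran;
}
```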
      
      Fixes: 3cf69cc8 ("IB/mlx4: Add CM paravirtualization")
      Link: https://lore.kernel.org/r/20200803061941.1139994-6-haakon.bugge@oracle.com
      Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • IB/mlx4: Fix starvation in paravirt mux/demux · 7fd1507d
      Håkon Bugge authored
      The mlx4 driver will proxy MAD packets through the PF driver. A VM or an
      instantiated VF will send its MAD packets to the PF driver using
      loop-back. The PF driver is informed by an interrupt, but defers the
      handling and polling of CQEs to a worker thread running on an ordered
      work-queue.
      
      Consider the following scenario: within a short period of time, for
      example due to a network event, the VMs send many MAD packets to the PF
      driver. Let's say there are K VMs, each sending N packets.
      
      The interrupt from the first VM will start the worker thread, which will
      poll N CQEs. A common case here is where the PF driver will multiplex the
      packets received from the VMs out on the wire QP.
      
      But before the wire QP has returned a send CQE and associated interrupt,
      the other K - 1 VMs have sent their N packets as well.
      
      The PF driver has to multiplex K * N packets out on the wire QP. But the
      send-queue on the wire QP has a finite capacity.
      
      So, in this scenario, if K * N is larger than the send-queue capacity of
      the wire QP, we will get MAD packets dropped on the floor with this
      dynamic debug message:
      
      mlx4_ib_multiplex_mad: failed sending GSI to wire on behalf of slave 2 (-11)
      
      even though the wire send-queue may have free capacity: the PF driver
      is not aware of it, because the wire send CQEs have not yet been
      polled.
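
      A back-of-the-envelope model of the outbound case, assuming the worker
      posts all K * N sends before it gets a chance to poll any wire send CQE,
      and that the wire send-queue starts empty: every post beyond the
      send-queue depth fails (the -11 above is -EAGAIN). posts_dropped() and
      the numbers in the checks are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/*
 * Number of MAD posts that fail when K VMs each send N packets and no
 * wire send completions are reaped until all of them have been posted.
 */
static unsigned long posts_dropped(unsigned long k, unsigned long n,
				   unsigned long sq_depth)
{
	unsigned long offered;

	assert(n == 0 || k <= ULONG_MAX / n); /* no overflow in k * n */
	offered = k * n;
	return offered > sq_depth ? offered - sq_depth : 0;
}
```

      With 8 VMs sending 64 packets each into a 256-entry send-queue, 256
      posts would fail under these assumptions, even if the hardware had
      already completed many of the earlier sends.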
      
      We can also have a similar scenario inbound, with a wire recv-queue larger
      than the tunnel QP's send-queue. If many remote peers send MAD packets to
      the very same VM, the tunnel send-queue destined to that VM may appear
      full to the PF driver.
      
      This starvation is fixed by introducing separate work queues for the wire
      QPs vs. the tunnel QPs.
      
      With this fix, using a dual-ported HCA with 8 VFs instantiated, we could
      run cmtime on each of the 18 interfaces towards a similarly configured
      peer, each cmtime instance with 800 QPs (14400 QPs in total), without a
      single CM packet getting lost.
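
      The totals are consistent with counting each of the two ports once for
      the PF and once for each of the 8 VFs (our reading; the text does not
      spell this out):

```latex
2 \times (1 + 8) = 18\ \text{interfaces}, \qquad 18 \times 800 = 14400\ \text{QPs}
```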
      
      Fixes: 3cf69cc8 ("IB/mlx4: Add CM paravirtualization")
      Link: https://lore.kernel.org/r/20200803061941.1139994-5-haakon.bugge@oracle.com
      Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • IB/mlx4: Separate tunnel and wire bufs parameters · 0ae207fb
      Håkon Bugge authored
      When CX-3 is used in virtualized mode, MAD packets are proxied through
      the PF driver. The feed is N tunnel QPs, and what is received from the
      VFs is multiplexed out on the wire QP. Since this is a many-to-one
      scenario, it is better to have separate initialization parameters for
      the two usages.
      
      The number of wire and tunnel bufs is increased to 2K and 512,
      respectively. With this set of parameters, a system consisting of eight
      physical servers, each with eight VMs, and 14 I/O servers (BM) can run
      switch fail-over without seeing:
      
      mlx4_ib_demux_mad: failed sending GSI to slave 3 via tunnel qp (-11)
      
      or
      
      mlx4_ib_multiplex_mad: failed sending GSI to wire on behalf of slave 2 (-11)
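
      A minimal sketch of keeping the two parameter sets apart, assuming one
      buffer count per QP usage instead of a single shared constant. The
      macro, enum and function names are hypothetical; only the values 2K
      (wire) and 512 (tunnel) come from the text above.

```c
#include <assert.h>

/* Values from the commit text; names are illustrative only. */
#define NUM_WIRE_BUFS   2048
#define NUM_TUNNEL_BUFS 512

enum qp_usage { QP_TUNNEL, QP_WIRE };

struct qp_bufs_params {
	unsigned int num_bufs; /* buffers to allocate for this QP */
};

/* Pick buffer parameters by usage: the wire QP is the many-to-one
 * aggregation point and gets the larger count. */
static struct qp_bufs_params bufs_params(enum qp_usage usage)
{
	struct qp_bufs_params p;

	assert(usage == QP_TUNNEL || usage == QP_WIRE);
	p.num_bufs = (usage == QP_WIRE) ? NUM_WIRE_BUFS : NUM_TUNNEL_BUFS;
	return p;
}
```

      Keeping the wire value independent lets the aggregation point be sized
      on its own, without enlarging every tunnel QP by the same amount.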
      
      Fixes: 3cf69cc8 ("IB/mlx4: Add CM paravirtualization")
      Link: https://lore.kernel.org/r/20200803061941.1139994-4-haakon.bugge@oracle.com
      Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • IB/mlx4: Add support for MRA · e7d087fc
      Håkon Bugge authored
      When CX-3 is used in virtualized mode, MAD packets are proxied through
      the PF driver. However, the handling lacks support for the MRA (Message
      Receipt Acknowledgment) packet. With dynamic debug enabled, we see tons
      of:
      
      mlx4_ib_multiplex_cm_handler: id{slave: 7, sl_cm_id: 0x8fcb45a0} is NULL! attr_id: 0x11
      
      Fixes: 3cf69cc8 ("IB/mlx4: Add CM paravirtualization")
      Link: https://lore.kernel.org/r/20200803061941.1139994-3-haakon.bugge@oracle.com
      Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • IB/mlx4: Add and improve logging · 09461944
      Håkon Bugge authored
      Add a missing check for success after the call to mlx4_ib_send_to_wire()
      in mlx4_ib_multiplex_mad().
      
      Amend the existing pr_debug() in mlx4_ib_multiplex_cm_handler() and
      mlx4_ib_demux_cm_handler() to print attr_id on a lookup failure.
      
      Remove two noisy pr_debug() calls in mad.c.
      
      Fixes: 3cf69cc8 ("IB/mlx4: Add CM paravirtualization")
      Link: https://lore.kernel.org/r/20200803061941.1139994-2-haakon.bugge@oracle.com
      Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
  4. Aug 19, 2020
  5. Aug 17, 2020
    • Linux 5.9-rc1 · 9123e3a7
      Linus Torvalds authored
    • Merge tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block · 2cc3c4b3
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "A few different things in here.
      
        Seems like syzbot got some more io_uring bits wired up, and we got a
        handful of reports and the associated fixes are in here.
      
        General fixes too, and a lot of them marked for stable.
      
        Lastly, a bit of fallout from the async buffered reads, where we now
        more easily trigger short reads. Some applications don't really like
        that, so the io_read() code now handles short reads internally, and
        got a cleanup along the way so that it's now easier to read (and
        documented). We're now passing tests that failed before"
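
      Handling short reads internally amounts to re-issuing the read for the
      remainder until the request is complete, EOF is reached, or an error
      occurs. The sketch below shows that loop in userspace over a
      caller-supplied read function; it is not io_uring's io_read(), and
      read_fn, read_full(), struct chunked_src and chunked_read() are
      hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* A read primitive that may return fewer bytes than requested. */
typedef ssize_t (*read_fn)(void *ctx, char *buf, size_t len);

/* Retry short reads until 'len' bytes are read, EOF (0) is hit, or an
 * error (< 0) occurs. Returns bytes read, or the error if none were. */
static ssize_t read_full(read_fn rd, void *ctx, char *buf, size_t len)
{
	size_t done = 0;

	assert(rd != NULL);
	while (done < len) {
		ssize_t ret = rd(ctx, buf + done, len - done);

		if (ret < 0)
			return done ? (ssize_t)done : ret;
		if (ret == 0)
			break; /* EOF: a genuine short read */
		done += (size_t)ret;
	}
	return (ssize_t)done;
}

/* Example source that hands out at most 'chunk' bytes per call. */
struct chunked_src {
	const char *data;
	size_t len, pos, chunk;
};

static ssize_t chunked_read(void *ctx, char *buf, size_t len)
{
	struct chunked_src *s = ctx;
	size_t n = s->len - s->pos;

	if (n > s->chunk)
		n = s->chunk;
	if (n > len)
		n = len;
	memcpy(buf, s->data + s->pos, n);
	s->pos += n;
	return (ssize_t)n;
}
```

      Reading 11 bytes from a source that returns at most 3 per call takes
      four calls (3 + 3 + 3 + 2), but the caller sees a single complete read.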
      
      * tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block:
        io_uring: short circuit -EAGAIN for blocking read attempt
        io_uring: sanitize double poll handling
        io_uring: internally retry short reads
        io_uring: retain iov_iter state over io_read/io_write calls
        task_work: only grab task signal lock when needed
        io_uring: enable lookup of links holding inflight files
        io_uring: fail poll arm on queue proc failure
        io_uring: hold 'ctx' reference around task_work queue + execute
        fs: RWF_NOWAIT should imply IOCB_NOIO
        io_uring: defer file table grabbing request cleanup for locked requests
        io_uring: add missing REQ_F_COMP_LOCKED for nested requests
        io_uring: fix recursive completion locking on overflow flush
        io_uring: use TWA_SIGNAL for task_work unconditionally
        io_uring: account locked memory before potential error case
        io_uring: set ctx sq/cq entry count earlier
        io_uring: Fix NULL pointer dereference in loop_rw_iter()
        io_uring: add comments on how the async buffered read retry works
        io_uring: io_async_buf_func() need not test page bit
    • parisc: fix PMD pages allocation by restoring pmd_alloc_one() · 6f6aea7e
      Mike Rapoport authored
      Commit 1355c31e ("asm-generic: pgalloc: provide generic pmd_alloc_one()
      and pmd_free_one()") converted parisc to use the generic version of
      pmd_alloc_one(), but it missed the fact that parisc uses order-1 pages
      for PMD.
      
      Restore the original version of pmd_alloc_one() for parisc, but use
      GFP_PGTABLE_KERNEL, which implies __GFP_ZERO, instead of GFP_KERNEL and
      memset.
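
      A userspace analogy of both points, assuming 4 KiB pages: an order-1
      allocation spans two pages, so a single-page allocation (which the
      commit implies the generic helper makes) covers only half of a parisc
      PMD; and a zeroing allocator, calloc() here standing in for
      GFP_PGTABLE_KERNEL with its implied __GFP_ZERO, replaces the
      allocate-then-memset pattern. PAGE_SIZE_4K, PMD_ORDER_PARISC,
      order_bytes() and pmd_alloc_sketch() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE_4K     4096UL /* assumption: 4 KiB pages */
#define PMD_ORDER_PARISC 1      /* order-1: two contiguous pages */

/* Bytes in an allocation of 2^order pages. */
static unsigned long order_bytes(unsigned int order)
{
	return PAGE_SIZE_4K << order;
}

/* Zeroed allocation in one call, instead of malloc() + memset(). */
static void *pmd_alloc_sketch(void)
{
	return calloc(1, order_bytes(PMD_ORDER_PARISC));
}
```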
      
      Fixes: 1355c31e ("asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()")
      Reported-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Tested-by: Meelis Roos <mroos@linux.ee>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Link: https://lkml.kernel.org/r/9f2b5ebd-e4a4-0fa1-6cd3-4b9f6892d1ad@linux.ee
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. Aug 16, 2020
    • Merge tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block · 4b6c093e
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A few fixes on the block side of things:
      
         - Discard granularity fix (Coly)
      
         - rnbd cleanups (Guoqing)
      
         - md error handling fix (Dan)
      
         - md sysfs fix (Junxiao)
      
         - Fix flush request accounting, which caused an IO slowdown for some
           configurations (Ming)
      
         - Properly propagate loop flag for partition scanning (Lennart)"
      
      * tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block:
        block: fix double account of flush request's driver tag
        loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE
        rnbd: no need to set bi_end_io in rnbd_bio_map_kern
        rnbd: remove rnbd_dev_submit_io
        md-cluster: Fix potential error pointer dereference in resize_bitmaps()
        block: check queue's limits.discard_granularity in __blkdev_issue_discard()
        md: get sysfs entry after redundancy attr group create
    • Merge tag 'riscv-for-linus-5.9-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · d84835b1
      Linus Torvalds authored
      Pull RISC-V fix from Palmer Dabbelt:
       "I collected a single fix during the merge window: we managed to break
        the early trap setup on !MMU, this fixes it"
      
      * tag 'riscv-for-linus-5.9-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: Setup exception vector for nommu platform
    • Merge tag 'sh-for-5.9' of git://git.libc.org/linux-sh · 5bbec3cf
      Linus Torvalds authored
      Pull arch/sh updates from Rich Felker:
       "Cleanup, SECCOMP_FILTER support, message printing fixes, and other
        changes to arch/sh"
      
      * tag 'sh-for-5.9' of git://git.libc.org/linux-sh: (34 commits)
        sh: landisk: Add missing initialization of sh_io_port_base
        sh: bring syscall_set_return_value in line with other architectures
        sh: Add SECCOMP_FILTER
        sh: Rearrange blocks in entry-common.S
        sh: switch to copy_thread_tls()
        sh: use the generic dma coherent remap allocator
        sh: don't allow non-coherent DMA for NOMMU
        dma-mapping: consolidate the NO_DMA definition in kernel/dma/Kconfig
        sh: unexport register_trapped_io and match_trapped_io_handler
        sh: don't include <asm/io_trapped.h> in <asm/io.h>
        sh: move the ioremap implementation out of line
        sh: move ioremap_fixed details out of <asm/io.h>
        sh: remove __KERNEL__ ifdefs from non-UAPI headers
        sh: sort the selects for SUPERH alphabetically
        sh: remove -Werror from Makefiles
        sh: Replace HTTP links with HTTPS ones
        arch/sh/configs: remove obsolete CONFIG_SOC_CAMERA*
        sh: stacktrace: Remove stacktrace_ops.stack()
        sh: machvec: Modernize printing of kernel messages
        sh: pci: Modernize printing of kernel messages
        ...
    • io_uring: short circuit -EAGAIN for blocking read attempt · f91daf56
      Jens Axboe authored
      One case was missed in the short IO retry handling, and that's hitting
      -EAGAIN on a blocking read attempt (e.g. from io-wq context). This is a
      problem on sockets that are marked as non-blocking when created: they
      don't carry any REQ_F_NOWAIT information to help us terminate them
      instead of perpetually retrying.
      
      Fixes: 227c0c96 ("io_uring: internally retry short reads")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: sanitize double poll handling · d4e7cd36
      Jens Axboe authored
      There's a bit of confusion on the matching pairs of poll vs double poll,
      depending on whether the request is a pure poll (IORING_OP_POLL_ADD) or
      a poll-driven retry.
      
      Add io_poll_get_double() that returns the double poll waitqueue, if any,
      and io_poll_get_single() that returns the original poll waitqueue. With
      that, remove the argument to io_poll_remove_double().
      
      Finally, ensure that wait->private is cleared once the double poll
      handler has run, so that remove knows it's already been seen.
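
      A reduced model of the pairing, assuming a request that may carry a
      second (double) poll entry. All types and function names below are
      hypothetical stand-ins for the io_uring structures; only the roles of
      io_poll_get_single(), io_poll_get_double(), the argument-free
      io_poll_remove_double() and wait->private are taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct wait_entry_sketch {
	void *private; /* non-NULL while the entry is armed */
};

struct poll_req_sketch {
	struct wait_entry_sketch single;        /* original poll entry */
	struct wait_entry_sketch *double_entry; /* second entry, if any */
};

static struct wait_entry_sketch *get_single(struct poll_req_sketch *req)
{
	return &req->single;
}

static struct wait_entry_sketch *get_double(struct poll_req_sketch *req)
{
	return req->double_entry; /* may be NULL */
}

/* Double poll handler: clear private so removal knows it already ran. */
static void double_handler(struct wait_entry_sketch *w)
{
	assert(w != NULL);
	w->private = NULL;
}

/* Remove the double entry only if it is still armed; no extra argument
 * is needed because the request itself says which entry is the double. */
static bool remove_double(struct poll_req_sketch *req)
{
	struct wait_entry_sketch *w = get_double(req);

	if (!w || !w->private)
		return false;
	w->private = NULL;
	return true;
}
```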
      
      Cc: stable@vger.kernel.org # v5.8
      Reported-by: syzbot+7f617d4a9369028b8a2c@syzkaller.appspotmail.com
      Fixes: 18bceab1 ("io_uring: allow POLL_ADD with double poll_wait() users")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>