  1. Sep 25, 2020
  2. Sep 23, 2020
  3. Sep 19, 2020
    • RDMA/ucma: Rework ucma_migrate_id() to avoid races with destroy · f5449e74
      Jason Gunthorpe authored
      ucma_destroy_id() assumes that all things accessing the ctx will do so via
      the xarray. This assumption is violated only in the case the FD is being
      closed; then the ctx is reached via the ctx_list. Normally this is OK
      since ucma_destroy_id() cannot run concurrently with release(), however
      when ucma_migrate_id() is involved this can be violated as the close of
      the 2nd FD can run concurrently with destroy on the first:
      
                      CPU0                      CPU1
              ucma_destroy_id(fda)
                                        ucma_migrate_id(fda -> fdb)
                                             ucma_get_ctx()
              xa_lock()
               _ucma_find_context()
               xa_erase()
              xa_unlock()
                                             xa_lock()
                                              ctx->file = new_file
                                              list_move()
                                             xa_unlock()
                                            ucma_put_ctx()
      
                                         ucma_close(fdb)
                                            _destroy_id()
                                            kfree(ctx)
      
              _destroy_id()
                wait_for_completion()
                // boom, ctx was freed
      
      The ctx->file must be modified under the handler and xa_lock, and prior to
      the modification the ID must be rechecked to confirm it is still reachable
      from cur_file, i.e. there is no parallel destroy or migrate.
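
      The fix therefore takes roughly the shape sketched below. The sketch is
      built only from names already used in this message (the handler lock,
      xa_lock, _ucma_find_context(), ctx->file, list_move()); the
      rdma_lock_handler() accessor and the list_move_tail() variant are
      assumptions, and this is not the verbatim upstream diff:

              /* sketch of the ucma_migrate_id() core */
              rdma_lock_handler(ctx->cm_id);  /* no new uevents from here on */
              xa_lock(&ctx_table);
              if (_ucma_find_context(cmd.id, cur_file) == ctx) {
                      /* still reachable from cur_file: safe to migrate */
                      ctx->file = new_file;
                      list_move_tail(&ctx->list, &new_file->ctx_list);
              } else {
                      /* lost a race with a destroy or another migrate */
                      ret = -ENOENT;
              }
              xa_unlock(&ctx_table);
              rdma_unlock_handler(ctx->cm_id);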
      
      To make this work, remove the double locking and streamline the control
      flow. The double locking was obsoleted by the handler lock, which now
      directly prevents new uevents from being created, and the ctx_list cannot
      be read while holding fgets on both files. Removing the double locking
      also removes the need to check for the same file.
      
      Fixes: 88314e4d ("RDMA/cma: add support for rdma_migrate_id()")
      Link: https://lore.kernel.org/r/0-v1-05c5a4090305+3a872-ucma_syz_migrate_jgg@nvidia.com
      Reported-and-tested-by: syzbot+cc6fc752b3819e082d0c@syzkaller.appspotmail.com
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • RDMA/mlx5: Clarify what the UMR is for when creating MRs · 8383da3e
      Jason Gunthorpe authored
      
      
      Once a mkey is created it can be modified using UMR. This is desirable for
      performance reasons. However, different hardware has restrictions on what
      modifications are possible using UMR. Make sense of these checks:
      
      - mlx5_ib_can_reconfig_with_umr() returns true if the access flags can be
        altered. Most cases create MRs using 0 access flags (now made clear by
        consistent use of set_mkc_access_pd_addr_fields()), but the old logic
        here was tormented. Make it clear that this is checking if the current
        access_flags can be modified using UMR to different access_flags. It is
        always OK to use UMR to change flags that all HW supports.
      
      - mlx5_ib_can_load_pas_with_umr() returns true if UMR can be used to
        enable and update the PAS/XLT. Enabling requires updating the entity
        size, so UMR ends up completely disabled on this old hardware. Make it
        clear why it is disabled. FRWR, ODP and cache always require
        mlx5_ib_can_load_pas_with_umr().
      
      - mlx5_ib_pas_fits_in_mr() is used to tell if an existing MR can be
        resized to hold a new PAS list. This only works for cached MRs because
        we don't store the PAS list size in other cases.
      
      To be very clear, arrange things so any pre-created MRs in the cache
      check the newly requested access_flags before allowing the MR to leave the
      cache. If UMR cannot set the required access_flags the cache fails to
      create the MR.
      
      This in turn means relaxed ordering and atomic are now correctly blocked
      early for implicit ODP on older HW.
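
      As an example of the cache-side gating described above, the allocation
      path can refuse to hand out a cached mkey when the requested access_flags
      cannot be reached by UMR from the 0 flags the mkey was created with. A
      rough sketch; the exact signature of mlx5_ib_can_reconfig_with_umr() and
      the error code shown are assumptions:

              /* cache mkeys are built with 0 access flags; bail out if a UMR
               * cannot bring them up to the flags the caller asked for */
              if (!mlx5_ib_can_reconfig_with_umr(dev, 0, access_flags))
                      return ERR_PTR(-EOPNOTSUPP);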
      
      Link: https://lore.kernel.org/r/20200914112653.345244-6-leon@kernel.org
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • RDMA/mlx5: Disable IB_DEVICE_MEM_MGT_EXTENSIONS if IB_WR_REG_MR can't work · 0ec52f01
      Jason Gunthorpe authored
      set_reg_wr() always fails if umr_modify_entity_size_disabled because
      mlx5_ib_can_use_umr() always fails. Without set_reg_wr() IB_WR_REG_MR
      doesn't work and that means the device should not advertise
      IB_DEVICE_MEM_MGT_EXTENSIONS.
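
      Roughly, the intent in the query_device path looks like the sketch below;
      the exact hunk and the MLX5_CAP_GEN() usage shown are approximations:

              /* IB_WR_REG_MR needs UMR to change the entity size, so only
               * advertise the capability when the HW allows that */
              if (!MLX5_CAP_GEN(mdev, umr_modify_entity_size_disabled))
                      props->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS;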
      
      Fixes: 841b07f9 ("IB/mlx5: Block MR WR if UMR is not possible")
      Link: https://lore.kernel.org/r/20200914112653.345244-5-leon@kernel.org
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • RDMA/mlx5: Make mkeys always owned by the kernel's PD when not enabled · 5eb29f0d
      Jason Gunthorpe authored
      Any mkey that is not enabled and assigned to userspace should have the PD
      set to a kernel owned PD.
      
      When cache entries are created for the first time the PDN is set to 0,
      which is probably a kernel PD, but be explicit.
      
      When an MR is registered using the hybrid reg_create with UMR xlt &
      enable, the disabled mkey is pointing at the user PD; keep it pointing at
      the kernel PD until a UMR enables it and sets the user PD.
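
      A rough sketch of the reg_create() side; the populate flag, the choice of
      kernel PD (dev->umrc.pd) and the exact MLX5_SET() calls are assumptions
      here, the point is only that a still-disabled mkey never names the user
      PD:

              if (populate) {
                      /* mkey is created fully enabled, the user PD is safe */
                      MLX5_SET(mkc, mkc, pd, to_mpd(pd)->pdn);
              } else {
                      /* left disabled for a later UMR: park it on a kernel
                       * PD, the UMR that enables it sets the user PD */
                      MLX5_SET(mkc, mkc, pd, to_mpd(dev->umrc.pd)->pdn);
              }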
      
      Fixes: 9ec4483a ("IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache")
      Link: https://lore.kernel.org/r/20200914112653.345244-4-leon@kernel.org
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • RDMA/mlx5: Use set_mkc_access_pd_addr_fields() in reg_create() · 1c97ca3d
      Jason Gunthorpe authored
      
      
      reg_create() open codes this helper; use the shared code instead.
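
      Roughly, the hand-written MLX5_SET() calls for the access bits, PD and
      start address collapse into one call to the shared helper; the argument
      order shown here is approximate:

              set_mkc_access_pd_addr_fields(mkc, access_flags, virt_addr, pd);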
      
      Link: https://lore.kernel.org/r/20200914112653.345244-3-leon@kernel.org
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • RDMA/mlx5: Remove dead check for EAGAIN after alloc_mr_from_cache() · 2e4e706e
      Jason Gunthorpe authored
      alloc_mr_from_cache() no longer returns EAGAIN, so this is just dead code
      now.
      
      Fixes: aad719dc ("RDMA/mlx5: Allow MRs to be created in the cache synchronously")
      Link: https://lore.kernel.org/r/20200914112653.345244-2-leon@kernel.org
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
  4. Sep 18, 2020
  5. Sep 17, 2020
    • RDMA/cma: Fix use after free race in roce multicast join · b5de0c60
      Jason Gunthorpe authored
      The roce path triggers a work queue that continues to touch the id_priv
      but doesn't hold any reference on it. Further, unlike in the IB case, the
      work queue is not fenced during rdma_destroy_id().
      
      This can trigger a use after free if a destroy is triggered in the
      incredibly narrow window between the queue_work() and the work starting
      and obtaining the handler_mutex.
      
      The only purpose of this work queue is to run the ULP event callback from
      the standard context, so switch the design to use the existing
      cma_work_handler() scheme. This simplifies quite a lot of the flow:
      
      - Use the cma_work_handler() callback to launch the work for roce. This
        requires generating the event synchronously inside the
        rdma_join_multicast(), which in turn means the dummy struct
        ib_sa_multicast can become a simple stack variable.
      
      - cma_work_handler() uses the id_priv kref, so we can entirely eliminate
        the kref inside struct cma_multicast. Since the cma_multicast never
        leaks into an unprotected work queue the kfree can be done at the same
        time as for IB.
      
      - Eliminating the general multicast.ib requires using cma_set_mgid() in a
        few places to recompute the mgid.
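
      Sketched out, the roce join path inside rdma_join_multicast() now looks
      something like the below. The event-building helper name and the exact
      cma_work fields are assumptions; the point is that the queued work holds
      an id_priv reference and is serviced by the same cma_work_handler():

              struct cma_work *work = kzalloc(sizeof(*work), GFP_KERNEL);
              struct ib_sa_multicast ib = {};  /* dummy, just a stack variable */

              /* error handling omitted; build the join event synchronously */
              cma_make_mc_event(0, id_priv, &ib, &work->event, mc); /* name assumed */

              cma_id_get(id_priv);             /* the queued work owns a reference */
              work->id = id_priv;
              INIT_WORK(&work->work, cma_work_handler);
              queue_work(cma_wq, &work->work);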
      
      Fixes: 3c86aa70 ("RDMA/cm: Add RDMA CM support for IBoE devices")
      Link: https://lore.kernel.org/r/20200902081122.745412-9-leon@kernel.org
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • RDMA/cma: Consolidate the destruction of a cma_multicast in one place · 3788d299
      Jason Gunthorpe authored
      
      
      Two places were open coding this sequence; consolidate them, and also pull
      in cma_leave_roce_mc_group(), which was called only once.
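
      Something along these lines; the destroy_mc() name and its exact body are
      assumptions, and the real patch folds cma_leave_roce_mc_group() into the
      helper rather than calling it:

              static void destroy_mc(struct rdma_id_private *id_priv,
                                     struct cma_multicast *mc)
              {
                      /* IB: release the SA-managed join; roce: undo the
                       * locally emulated join */
                      if (rdma_cap_ib_mcast(id_priv->id.device,
                                            id_priv->id.port_num))
                              ib_sa_free_multicast(mc->multicast.ib);
                      else
                              cma_leave_roce_mc_group(id_priv, mc);
                      kfree(mc);
              }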
      
      Link: https://lore.kernel.org/r/20200902081122.745412-8-leon@kernel.org
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>