  Apr 09, 2023
    • IB/hfi1: Place struct mmu_rb_handler on cache line start · 866694af
      Patrick Kelsey authored
      
      
      Place struct mmu_rb_handler on cache line start like so:
      
      	struct mmu_rb_handler *h;
      	void *free_ptr;
      	int ret;
      
      	free_ptr = kzalloc(sizeof(*h) + cache_line_size() - 1, GFP_KERNEL);
      	if (!free_ptr)
      		return -ENOMEM;
      
      	h = PTR_ALIGN(free_ptr, cache_line_size());
      
      Additionally, move struct mmu_rb_handler fields "root" and "ops_args" to
      start after the next cacheline using the "____cacheline_aligned_in_smp"
      annotation.
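
       A sketch of that annotation (only the field names come from this
       commit message; the field types and surrounding layout are
       assumptions for illustration):

       	struct mmu_rb_handler {
       		/* ... fields that are not on the hot path ... */

       		/* Start a new cache line so "root" and "ops_args" share one. */
       		struct rb_root_cached root ____cacheline_aligned_in_smp;
       		void *ops_args;
       		/* ... */
       	};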
      
      Allocating an additional cache_line_size() - 1 bytes to place
      struct mmu_rb_handler on a cache line start does increase memory
      consumption.
      
       However, only a few struct mmu_rb_handler objects are created when
       hfi1 is in use.
      As mmu_rb_handler->root and mmu_rb_handler->ops_args are accessed
      frequently, the advantage of having them both within a cache line is
      expected to outweigh the disadvantage of the additional memory
      consumption per struct mmu_rb_handler.
      
       Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
       Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
       Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
      Link: https://lore.kernel.org/r/168088636963.3027109.16959757980497822530.stgit@252.162.96.66.static.eigbox.net
      
      
       Signed-off-by: Leon Romanovsky <leon@kernel.org>
    • IB/hfi1: Fix bugs with non-PAGE_SIZE-end multi-iovec user SDMA requests · 00cbce5c
      Patrick Kelsey authored
      
      
      hfi1 user SDMA request processing has two bugs that can cause data
      corruption for user SDMA requests that have multiple payload iovecs
      where an iovec other than the tail iovec does not run up to the page
       boundary for the buffer pointed to by that iovec.
      
      Here are the specific bugs:
       1. user_sdma_txadd() does not use struct user_sdma_iovec->iov.iov_len.
          Rather, user_sdma_txadd() will add up to PAGE_SIZE bytes from the
          iovec to the packet, even if some of those bytes are past
          iovec->iov.iov_len and are thus not intended to be in the packet.
      2. user_sdma_txadd() and user_sdma_send_pkts() fail to advance to the
         next iovec in user_sdma_request->iovs when the current iovec
         is not PAGE_SIZE and does not contain enough data to complete the
         packet. The transmitted packet will contain the wrong data from the
         iovec pages.
      
      This has not been an issue with SDMA packets from hfi1 Verbs or PSM2
      because they only produce iovecs that end short of PAGE_SIZE as the tail
      iovec of an SDMA request.
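
       The following stand-alone sketch (not the driver code; the helper and
       all names in it are made up for illustration) shows the copy pattern
       the fixes imply: bound every copy by the iovec's iov_len and advance
       to the next iovec once it is exhausted, even when it ends short of
       PAGE_SIZE:

       	#include <stddef.h>
       	#include <string.h>
       	#include <sys/uio.h>

       	/* Fill 'pkt' with up to 'pkt_len' bytes taken from 'iovs', resuming
       	 * at iovec index *idx and offset *off.  Returns the number of bytes
       	 * copied and leaves *idx/*off positioned for the next packet.
       	 */
       	static size_t fill_packet(const struct iovec *iovs, int niovs,
       				  int *idx, size_t *off,
       				  char *pkt, size_t pkt_len)
       	{
       		size_t filled = 0;

       		while (filled < pkt_len && *idx < niovs) {
       			const struct iovec *iov = &iovs[*idx];
       			size_t remaining = iov->iov_len - *off;	/* bug 1: honor iov_len */
       			size_t want = pkt_len - filled;
       			size_t chunk = remaining < want ? remaining : want;

       			memcpy(pkt + filled, (const char *)iov->iov_base + *off, chunk);
       			filled += chunk;
       			*off += chunk;

       			/* bug 2: advance even when the iovec ends short of PAGE_SIZE */
       			if (*off == iov->iov_len) {
       				(*idx)++;
       				*off = 0;
       			}
       		}
       		return filled;
       	}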
      
      Fixing these bugs exposes other bugs with the SDMA pin cache
       (struct mmu_rb_handler) that get in the way of supporting user SDMA requests
      with multiple payload iovecs whose buffers do not end at PAGE_SIZE. So
      this commit fixes those issues as well.
      
      Here are the mmu_rb_handler bugs that non-PAGE_SIZE-end multi-iovec
      payload user SDMA requests can hit:
      1. Overlapping memory ranges in mmu_rb_handler will result in duplicate
         pinnings.
      2. When extending an existing mmu_rb_handler entry (struct mmu_rb_node),
         the mmu_rb code (1) removes the existing entry under a lock, (2)
         releases that lock, pins the new pages, (3) then reacquires the lock
         to insert the extended mmu_rb_node.
      
         If someone else comes in and inserts an overlapping entry between (2)
         and (3), insert in (3) will fail.
      
         The failure path code in this case unpins _all_ pages in either the
         original mmu_rb_node or the new mmu_rb_node that was inserted between
         (2) and (3).
      3. In hfi1_mmu_rb_remove_unless_exact(), mmu_rb_node->refcount is
         incremented outside of mmu_rb_handler->lock. As a result, mmu_rb_node
         could be evicted by another thread that gets mmu_rb_handler->lock and
         checks mmu_rb_node->refcount before mmu_rb_node->refcount is
          incremented (see the locking sketch after this list).
      4. Related to #2 above, SDMA request submission failure path does not
         check mmu_rb_node->refcount before freeing mmu_rb_node object.
      
         If there are other SDMA requests in progress whose iovecs have
         pointers to the now-freed mmu_rb_node(s), those pointers to the
         now-freed mmu_rb nodes will be dereferenced when those SDMA requests
         complete.
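
       A simplified sketch of the locking rule implied by bug 3 (the lookup
       helper is hypothetical and the refcount type is an assumption; only
       the names handler->lock and node->refcount come from the text above):

       	unsigned long flags;
       	struct mmu_rb_node *node;

       	spin_lock_irqsave(&handler->lock, flags);
       	node = __mmu_rb_search(handler, addr, len);	/* hypothetical lookup */
       	if (node)
       		node->refcount++;	/* bump only while holding handler->lock */
       	spin_unlock_irqrestore(&handler->lock, flags);

       	/* An evictor must take handler->lock as well and skip any node
       	 * whose refcount is non-zero.
       	 */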
      
      Fixes: 7be85676 ("IB/hfi1: Don't remove RB entry when not needed.")
      Fixes: 77241056 ("IB/hfi1: add driver files")
       Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
       Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
       Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
      Link: https://lore.kernel.org/r/168088636445.3027109.10054635277810177889.stgit@252.162.96.66.static.eigbox.net
      
      
       Signed-off-by: Leon Romanovsky <leon@kernel.org>
    • IB/hfi1: Fix SDMA mmu_rb_node not being evicted in LRU order · 9fe8fec5
      Patrick Kelsey authored
      
      
      hfi1_mmu_rb_remove_unless_exact() did not move mmu_rb_node objects in
      mmu_rb_handler->lru_list after getting a cache hit on an mmu_rb_node.
      
      As a result, hfi1_mmu_rb_evict() was not guaranteed to evict truly
      least-recently used nodes.
      
      This could be a performance issue for an application when that
      application:
      - Uses some long-lived buffers frequently.
      - Uses a large number of buffers once.
      - Hits the mmu_rb_handler cache size or pinned-page limits, forcing
        mmu_rb_handler cache entries to be evicted.
      
      In this case, the one-time use buffers cause the long-lived buffer
      entries to eventually filter to the end of the LRU list where
      hfi1_mmu_rb_evict() will consider evicting a frequently-used long-lived
      entry instead of evicting one of the one-time use entries.
      
       Fix this by inserting the new mmu_rb_node at the tail of
       mmu_rb_handler->lru_list and moving the mmu_rb_node to the tail of
       mmu_rb_handler->lru_list when the mmu_rb_node gets a hit in
       hfi1_mmu_rb_remove_unless_exact(). Change hfi1_mmu_rb_evict() to evict
       from the head of mmu_rb_handler->lru_list instead of the tail.
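
       A sketch of that discipline using the kernel list API (the list member
       name and the surrounding declarations are assumptions; the handler
       lock is taken as already held):

       	struct mmu_rb_node *node, *tmp;

       	/* Insert: newest entries go to the tail of the LRU list. */
       	list_add_tail(&node->list, &handler->lru_list);

       	/* Cache hit in hfi1_mmu_rb_remove_unless_exact(): refresh the node. */
       	list_move_tail(&node->list, &handler->lru_list);

       	/* Evict: walk from the head, i.e. the least recently used entries. */
       	list_for_each_entry_safe(node, tmp, &handler->lru_list, list) {
       		/* ... stop once enough memory has been reclaimed ... */
       	}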
      
      Fixes: 0636e9ab ("IB/hfi1: Add cache evict LRU list")
       Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
       Signed-off-by: Patrick Kelsey <pat.kelsey@cornelisnetworks.com>
       Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
      Link: https://lore.kernel.org/r/168088635931.3027109.10423156330761536044.stgit@252.162.96.66.static.eigbox.net
      
      
       Signed-off-by: Leon Romanovsky <leon@kernel.org>
    • IB/hfi1: Suppress useless compiler warnings · cf0455f1
      Ehab Ababneh authored
      
      
      These warnings can cause build failure:
      
      In file included from ./include/trace/define_trace.h:102,
                       from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
                       from drivers/infiniband/hw/hfi1/trace.h:15,
                       from drivers/infiniband/hw/hfi1/trace.c:6:
      drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘trace_event_get_offsets_hfi1_trace_template’:
      ./include/trace/trace_events.h:261:9: warning: function ‘trace_event_get_offsets_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
        struct trace_event_raw_##call __maybe_unused *entry;  \
               ^~~~~~~~~~~~~~~~
      drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
       DECLARE_EVENT_CLASS(hfi1_trace_template,
       ^~~~~~~~~~~~~~~~~~~
      In file included from ./include/trace/define_trace.h:102,
                       from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
                       from drivers/infiniband/hw/hfi1/trace.h:15,
                       from drivers/infiniband/hw/hfi1/trace.c:6:
      drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘trace_event_raw_event_hfi1_trace_template’:
      ./include/trace/trace_events.h:386:9: warning: function ‘trace_event_raw_event_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
        struct trace_event_raw_##call *entry;    \
               ^~~~~~~~~~~~~~~~
      drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
       DECLARE_EVENT_CLASS(hfi1_trace_template,
       ^~~~~~~~~~~~~~~~~~~
      In file included from ./include/trace/define_trace.h:103,
                       from drivers/infiniband/hw/hfi1/trace_dbg.h:111,
                       from drivers/infiniband/hw/hfi1/trace.h:15,
                       from drivers/infiniband/hw/hfi1/trace.c:6:
      drivers/infiniband/hw/hfi1/./trace_dbg.h: In function ‘perf_trace_hfi1_trace_template’:
      ./include/trace/perf.h:70:9: warning: function ‘perf_trace_hfi1_trace_template’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
        struct hlist_head *head;     \
               ^~~~~~~~~~
      drivers/infiniband/hw/hfi1/./trace_dbg.h:25:1: note: in expansion of macro ‘DECLARE_EVENT_CLASS’
       DECLARE_EVENT_CLASS(hfi1_trace_template,
       ^~~~~~~~~~~~~~~~~~~
      
       The solution adopted here is similar to the one in commit fbbc95a4.
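
       For reference, one common way to silence this class of warning in GCC
       is a diagnostic pragma around the trace definitions; this is shown
       only as an illustration and is not necessarily the mechanism used by
       this patch or by fbbc95a4:

       	#pragma GCC diagnostic push
       	#pragma GCC diagnostic ignored "-Wsuggest-attribute=format"
       	/* DECLARE_EVENT_CLASS(hfi1_trace_template, ...) and the events
       	 * generated from it sit here.
       	 */
       	#pragma GCC diagnostic pop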
      
       Signed-off-by: Ehab Ababneh <ehab.ababneh@cornelisnetworks.com>
       Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
      Link: https://lore.kernel.org/r/168088635415.3027109.5711716700328939402.stgit@252.162.96.66.static.eigbox.net
      
      
       Signed-off-by: Leon Romanovsky <leon@kernel.org>
    • IB/hfi1: Remove trace newlines · d2590edc
      Dean Luick authored
      
      
      The hfi1_cdbg trace mechanism appends a newline.  Remove trailing
      newlines from all format strings.
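
       For example, a call site now looks like this (the trace category and
       message here are hypothetical):

       	hfi1_cdbg(SDMA, "txreq queue depth %u", depth);	/* no trailing "\n" */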
      
       Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
       Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
      Link: https://lore.kernel.org/r/168088634897.3027109.10401662436950683555.stgit@252.162.96.66.static.eigbox.net
      
      
       Signed-off-by: Leon Romanovsky <leon@kernel.org>
    • RDMA/srpt: Add a check for valid 'mad_agent' pointer · eca5cd94
      Saravanan Vajravel authored
      
      
       When unregistering a MAD agent, the srpt module has a non-NULL check
       on the 'mad_agent' pointer before invoking ib_unregister_mad_agent().
       This check can pass if the 'mad_agent' variable holds an error value.
       The 'mad_agent' can hold an error value for a short window when
       srpt_add_one() and srpt_remove_one() are executed simultaneously.
      
       Add a valid-pointer check for 'sport->mad_agent' in the srpt module
       before unregistering the MAD agent.
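
       A minimal sketch of that kind of check (the exact code in the patch
       may differ; IS_ERR_OR_NULL() is one way to cover both the NULL and
       the error-value cases):

       	if (!IS_ERR_OR_NULL(sport->mad_agent)) {
       		ib_unregister_mad_agent(sport->mad_agent);
       		sport->mad_agent = NULL;
       	}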
      
       This issue can be hit when the RoCE driver unregisters an ib_device.
      
      Stack Trace:
      ------------
      BUG: kernel NULL pointer dereference, address: 000000000000004d
      PGD 145003067 P4D 145003067 PUD 2324fe067 PMD 0
      Oops: 0002 [#1] PREEMPT SMP NOPTI
      CPU: 10 PID: 4459 Comm: kworker/u80:0 Kdump: loaded Tainted: P
      Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.5.4 01/13/2020
      Workqueue: bnxt_re bnxt_re_task [bnxt_re]
      RIP: 0010:_raw_spin_lock_irqsave+0x19/0x40
      Call Trace:
        ib_unregister_mad_agent+0x46/0x2f0 [ib_core]
        IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
        ? __schedule+0x20b/0x560
        srpt_unregister_mad_agent+0x93/0xd0 [ib_srpt]
        srpt_remove_one+0x20/0x150 [ib_srpt]
        remove_client_context+0x88/0xd0 [ib_core]
        bond0: (slave p2p1): link status definitely up, 100000 Mbps full duplex
        disable_device+0x8a/0x160 [ib_core]
        bond0: active interface up!
        ? kernfs_name_hash+0x12/0x80
       (NULL device *): Bonding Info Received: rdev: 000000006c0b8247
        __ib_unregister_device+0x42/0xb0 [ib_core]
       (NULL device *):         Master: mode: 4 num_slaves:2
        ib_unregister_device+0x22/0x30 [ib_core]
       (NULL device *):         Slave: id: 105069936 name:p2p1 link:0 state:0
        bnxt_re_stopqps_and_ib_uninit+0x83/0x90 [bnxt_re]
        bnxt_re_alloc_lag+0x12e/0x4e0 [bnxt_re]
      
      Fixes: a42d985b ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
       Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
       Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
       Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
      Link: https://lore.kernel.org/r/20230406042549.507328-1-saravanan.vajravel@broadcom.com
      
      
       Reviewed-by: Bart Van Assche <bvanassche@acm.org>
       Signed-off-by: Leon Romanovsky <leon@kernel.org>
    • RDMA/cm: Trace icm_send_rej event before the cm state is reset · bd9de1ba
      Mark Zhang authored
      
      
       Trace the icm_send_rej event before the cm state is reset to idle, so
       that the correct cm state is logged. For example, when an incoming
       request is rejected, the old trace log was:
          icm_send_rej: local_id=961102742 remote_id=3829151631 state=IDLE reason=REJ_CONSUMER_DEFINED
      With this patch:
          icm_send_rej: local_id=312971016 remote_id=3778819983 state=MRA_REQ_SENT reason=REJ_CONSUMER_DEFINED
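
       A sketch of the intended ordering (the tracepoint call shape and the
       helper name are assumptions based on this commit message, not a quote
       of the patch):

       	/* Emit the tracepoint while cm_id_priv still holds its pre-reject
       	 * state (e.g. MRA_REQ_SENT), then move it to IDLE.
       	 */
       	trace_icm_send_rej(&cm_id_priv->id, reason);
       	cm_reset_to_idle(cm_id_priv);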
      
      Fixes: 8dc105be ("RDMA/cm: Add tracepoints to track MAD send operations")
       Signed-off-by: Mark Zhang <markzhang@nvidia.com>
      Link: https://lore.kernel.org/r/20230330072351.481200-1-markzhang@nvidia.com
      
      
       Signed-off-by: Leon Romanovsky <leon@kernel.org>