- Dec 16, 2014
-
Roland Dreier authored
-
Haggai Eran authored
* Implement the relevant invalidation functions (zap MTTs as needed).
* Implement interlocking (and rollback in the page fault handlers) for cases of a racing notifier and fault.
* With this patch we can now enable the capability bits for supporting RC send/receive/RDMA read/RDMA write, and UD send.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Haggai Eran authored
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Haggai Eran authored
This patch implements a page fault handler (leaving the pages pinned for the time being). The page fault handler handles initiator and responder page faults for the UD and RC transports, for send/receive operations, as well as RDMA read/write initiator support.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Haggai Eran authored
* Refactor MR registration and cleanup, and fix reg_pages accounting.
* Create a work queue to handle page fault events in a kthread context.
* Register a fault handler to get events from the core for each QP. The registered fault handler is empty in this patch, and only a later patch implements it.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Haggai Eran authored
The new function allows updating the page tables of a memory region after it was created. This can be used to handle page faults and page invalidations. Since mlx5_ib_update_mtt will need to work from within page invalidation, it must not block on memory allocation. It employs an atomic memory allocation mechanism that is used as a fallback when kmalloc(GFP_ATOMIC) fails. In order to reuse code from mlx5_ib_populate_pas, the patch splits this function and adds the needed parameters.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Haggai Eran authored
This patch wraps together several changes needed for on-demand paging support in the mlx5_ib_populate_pas function, and when registering memory regions.
* Instead of accepting a UMR bit telling the function to enable all access flags, the function now accepts the access flags themselves.
* For on-demand paging memory regions, fill the memory tables from the correct list, and enable/disable the access flags per-page according to whether the page is present.
* A new bit is set to enable writing of access flags when using the firmware create_mkey command.
* Disable contig pages when on-demand paging is enabled.
In addition, the patch changes the UMR code to use PTR_ALIGN instead of our own macro.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Haggai Eran authored
The patch adds infrastructure to query ODP capabilities in the mlx5 driver. The code reads the capabilities from the device and enables only those capabilities that both the driver and the device support. At this point ODP is not supported, so no capability is copied from the device, but the patch exposes the global ODP device capability bit.
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Haggai Eran authored
* Add a handler function pointer in the mlx5_core_qp struct for page fault events. Handle page fault events by calling the handler function, if not NULL.
* Add on-demand paging capability query command.
* Export command for resuming QPs after page faults.
* Add various constants related to paging support.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Roland Dreier authored
In commit 0c7aac85 ("net/mlx5_core: Remove unused dev cap enum fields"), the flag MLX5_DEV_CAP_FLAG_ON_DMND_PG was removed. Unfortunately the on-demand paging changes actually use it, so re-add the missing flag.
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Sagi Grimberg authored
In case the last argument of the connection string is processed as a string (destination GID for example).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Haggai Eran authored
* Add an interval tree implementation for ODP umems. Create an interval tree for each ucontext (including a count of the number of ODP MRs in this context, semaphore, etc.), and register ODP umems in the interval tree.
* Add MMU notifiers handling functions, using the interval tree to notify only the relevant umems and underlying MRs.
* Register to receive MMU notifier events from the MM subsystem upon ODP MR registration (and unregister accordingly).
* Add a completion object to synchronize the destruction of ODP umems.
* Add a mechanism to abort page faults when there's a concurrent invalidation.
The way we synchronize between concurrent invalidations and page faults is by keeping a counter of currently running invalidations, and a sequence number that is incremented whenever an invalidation is caught. The page fault code checks the counter and also verifies that the sequence number hasn't progressed before it updates the umem's page tables. This is similar to what the kvm module does. In order to prevent the case where we register a umem in the middle of an ongoing notifier, we also keep a per-ucontext counter of the total number of active mmu notifiers. We only enable new umems when all the running notifiers complete.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yuval Dagan <yuvalda@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Shachar Raindel authored
* Extend the umem struct to keep the ODP related data.
* Allocate and initialize the ODP related information in the umem (page_list, dma_list), and free it as needed at the end of the run.
* Store a reference to the process PID struct in the ucontext. Used to safely obtain the task_struct and the mm during fault handling, without preventing the task destruction if needed.
* Add 2 helper functions: ib_umem_odp_map_dma_pages and ib_umem_odp_unmap_dma_pages. These functions get the DMA addresses of specific pages of the umem (and, currently, pin them).
* Support for page faults only - IB core will keep the reference on the pages used and call put_page when freeing an ODP umem area. Invalidations support will be added in a later patch.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Sagi Grimberg authored
* Add a configuration option for enabling on-demand paging support in the InfiniBand subsystem (CONFIG_INFINIBAND_ON_DEMAND_PAGING). In a later patch, this configuration option will select the MMU_NOTIFIER configuration option to enable mmu notifiers.
* Add a flag for on-demand paging (ODP) support in the IB device capabilities.
* Add a flag to request an ODP MR in the access flags to reg_mr.
* Fail registrations done with the ODP flag when the low-level driver doesn't support this.
* Change the conditions in which an MR will be writable to explicitly specify the access flags. This is to avoid making an MR writable just because it is an ODP MR.
* Add ODP capabilities to the extended query device verb.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Eli Cohen authored
Add an extensible query device capabilities verb to allow adding new features. ib_uverbs_ex_query_device is added, and copy_query_dev_fields is used to copy capability fields to be used by both ib_uverbs_query_device and ib_uverbs_ex_query_device.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Haggai Eran authored
Add a helper function mlx5_ib_read_user_wqe to read information from user-space owned work queues. The function will be used in a later patch by the page-fault handling code in mlx5_ib.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
[ Add stub for ib_umem_copy_from() for CONFIG_INFINIBAND_USER_MEM=n - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Haggai Eran authored
In some drivers there's a need to read data from a user space area that was pinned using ib_umem, when running from a different process context. The ib_umem_copy_from function allows reading data from the physical pages pinned in the ib_umem struct.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Haggai Eran authored
In order to allow umems that do not pin memory, we need the umem to keep track of its region's address. This makes the offset field redundant, and so this patch removes it.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Haggai Eran authored
The current UMR interface doesn't allow partial updates to a memory region's page tables. This patch changes the interface to allow that. It also changes the way the UMR operation validates the memory region's state. When set, IB_SEND_UMR_FAIL_IF_FREE will cause the UMR operation to fail if the MKEY is in the free state. When the flag is not set, the operation will check that the MKEY isn't in the free state.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Haggai Eran authored
Since the UMR code now uses its own context struct on the stack, the pas and dma pointers for the UMR operation that remained in the mlx5_ib_mr struct are not necessary. This patch removes them.
Fixes: a74d2416 ("IB/mlx5: Refactor UMR to have its own context struct")
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Devesh Sharma authored
For user applications that use UD QPs, always resolve the destination MAC from the GRH. This is to avoid failure due to any garbage value in attr->dmac.
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Mitesh Ahuja authored
Signed-off-by: Mitesh Ahuja <mitesh.ahuja@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Yuval Shaia authored
Move the check for DPDP out of the loop to make the code more readable.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Jack Morgenstein authored
This error was detected by the sparse static checker:
drivers/infiniband/hw/mlx4/mr.c:226:21: warning: symbol 'err' shadows an earlier one
drivers/infiniband/hw/mlx4/mr.c:197:13: originally declared here
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Or Gerlitz authored
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Sagi Grimberg authored
Following a few recent block integrity updates, we align the iSER data integrity offload settings with:
- Deprecate the pi_guard module param.
- Expose support for DIX type 0.
- Use scsi_transfer_length for the transfer length.
- Get the pi_interval, ref_tag, ref_remap, bg_type and check_mask settings from the scsi_cmnd.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Sagi Grimberg authored
Use likely() for wc.status == IB_WC_SUCCESS.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Sagi Grimberg authored
And fix a checkpatch warning.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Sagi Grimberg authored
No reason to settle for four; we can use the minimum of the device's max completion vectors and the number of cores.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Sagi Grimberg authored
It is enough to check the mem_h pointer assignment; mem_h == NULL indicates that the buffer is not registered using an MR.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Sagi Grimberg authored
Eliminates code duplication.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Sagi Grimberg authored
When closing the connection, we should first terminate the connection (in case it was not previously terminated) to guarantee the QP is in the error state and we are done with servicing IO. Only then go ahead with task cleanup via iscsi_conn_stop.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Sagi Grimberg authored
In certain scenarios (target kill with live IO), SCSI TMFs may race with iSER RDMA teardown, which might cause a NULL dereference on the iSER IB device handle (which might have been freed). In this case we take a conditional lock for TMFs and check the connection state, avoiding lock contention in the IO path. This is indeed a best-effort approach, but it is sufficient to survive the sudden death of multiple targets while heavy IO is in flight. While we are at it, add nice kernel-doc style documentation.
Reported-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Ariel Nahum authored
If an rdma_cm error event comes after ep_poll but before conn_bind, we should protect against dereferencing the device (which may have been terminated) in the session_create and conn_create (already protected) callbacks.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Sagi Grimberg authored
Use uintptr_t to handle wr_id casting, an issue which was found by the Kbuild test robot and smatch. Also remove an internal definition of a variable which potentially shadows an external one (and make sparse happy).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Max Gurtovoy authored
Fix a regression that was introduced in commit 6df5a128 ("IB/iser: Suppress scsi command send completions"). The sig_count was wrongly made a static variable; thus it is possible that we never reach the (sig_count % ISER_SIGNAL_BATCH) == 0 condition (due to races) and the send queue overflows. Instead, keep sig_count per connection. We don't need it to be atomic, as we are safe under the iscsi session frwd_lock taken by libiscsi on the queuecommand path.
Fixes: 6df5a128 ("IB/iser: Suppress scsi command send completions")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Sagi Grimberg authored
When creating a connection QP, we choose the least used CQ and increment the number of active QPs on it. If we fail to create the QP, we need to decrement the active QPs counter.
Reported-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Ariel Nahum authored
There is no real need to wait for TIMEWAIT_EXIT before we destroy the RDMA resources (also, TIMEWAIT_EXIT is not guaranteed to always arrive). As for the cma_id, only destroy it if the state is not DOWN; in that case conn_release is already running and we don't want to compete.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Sagi Grimberg authored
In case the HCA goes into a catastrophic error flow, the beacon post_send is likely to fail, so surely there will be no completion for it. In this case, use a best-effort approach and don't wait for the beacon completion if we failed to post the send.
Reported-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Minh Tran authored
Re-adjust max CQEs per CQ and max send_wr per QP according to the resource limits supported by the underlying hardware.
Signed-off-by: Minh Tran <minhduc.tran@emulex.com>
Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-