staging/rdma/hfi1: Use parallel workqueue for SDMA engines
The workqueue is currently single threaded per port which for a small number of SDMA engines is ok. For hfi1, the there are up to 16 SDMA engines that can be fed descriptors in parallel. Use alloc_workqueue with a workqueue limit of the number of sdma engines and with WQ_CPU_INTENSIVE and WQ_HIGHPRI specified. Then change send to use the new scheduler which no longer needs to get the s_lock Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>