riscv: qspinlock: Add basic queued_spinlock support
The requirements of qspinlock have been documented by commit: a8ad07e5 ("asm-generic: qspinlock: Indicate the use of mixed-size atomics"). Although RISC-V ISA gives out a weaker forward guarantee LR/SC, which doesn't satisfy the requirements of qspinlock above, it won't prevent some riscv vendors from implementing a strong fwd guarantee LR/SC in microarchitecture to match xchg_tail requirement. T-HEAD C9xx processor is the one. We've tested the patch on SOPHGO sg2042 & th1520 and passed the stress test on Fedora & Ubuntu & OpenEuler ... Here is the performance comparison between qspinlock and ticket_lock on sg2042 (64 cores): sysbench test=threads threads=32 yields=100 lock=8 (+13.8%): queued_spinlock 0.5109/0.00 ticket_spinlock 0.5814/0.00 perf futex/hash (+6.7%): queued_spinlock 14443937 operations/sec (+- 0.09%) ticket_spinlock 1353215 operations/sec (+- 0.15%) perf futex/wake-parallel (+8.6%): queued_spinlock (waking 1/64 threads) in 0.0253 ms (+-2.90%) ticket_spinlock (waking 1/64 threads) in 0.0275 ms (+-3.12%) perf futex/requeue (+4.2%): queued_spinlock Requeued 64 of 64 threads in 0.0785 ms (+-0.55%) ticket_spinlock Requeued 64 of 64 threads in 0.0818 ms (+-4.12%) System Benchmarks (+6.4%) queued_spinlock: System Benchmarks Index Values BASELINE RESULT INDEX Dhrystone 2 using register variables 116700.0 628613745.4 53865.8 Double-Precision Whetstone 55.0 182422.8 33167.8 Execl Throughput 43.0 13116.6 3050.4 File Copy 1024 bufsize 2000 maxblocks 3960.0 7762306.2 19601.8 File Copy 256 bufsize 500 maxblocks 1655.0 3417556.8 20649.9 File Copy 4096 bufsize 8000 maxblocks 5800.0 7427995.7 12806.9 Pipe Throughput 12440.0 23058600.5 18535.9 Pipe-based Context Switching 4000.0 2835617.7 7089.0 Process Creation 126.0 12537.3 995.0 Shell Scripts (1 concurrent) 42.4 57057.4 13456.9 Shell Scripts (8 concurrent) 6.0 7367.1 12278.5 System Call Overhead 15000.0 33308301.3 22205.5 ======== System Benchmarks Index Score 12426.1 ticket_spinlock: System Benchmarks Index Values BASELINE RESULT INDEX Dhrystone 2 using register variables 116700.0 626541701.9 53688.2 Double-Precision Whetstone 55.0 181921.0 33076.5 Execl Throughput 43.0 12625.1 2936.1 File Copy 1024 bufsize 2000 maxblocks 3960.0 6553792.9 16550.0 File Copy 256 bufsize 500 maxblocks 1655.0 3189231.6 19270.3 File Copy 4096 bufsize 8000 maxblocks 5800.0 7221277.0 12450.5 Pipe Throughput 12440.0 20594018.7 16554.7 Pipe-based Context Switching 4000.0 2571117.7 6427.8 Process Creation 126.0 10798.4 857.0 Shell Scripts (1 concurrent) 42.4 57227.5 13497.1 Shell Scripts (8 concurrent) 6.0 7329.2 12215.3 System Call Overhead 15000.0 30766778.4 20511.2 ======== System Benchmarks Index Score 11670.7 The qspinlock has a significant improvement on SOPHGO SG2042 64 cores platform than the ticket_lock. Signed-off-by: Guo Ren <guoren@kernel.org> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Please register or sign in to comment