epoll: do not take global 'epmutex' for simple topologies
When calling EPOLL_CTL_ADD for an epoll file descriptor that is attached directly to a wakeup source, we do not need to take the global 'epmutex', unless the epoll file descriptor is nested. The purpose of taking the 'epmutex' on add is to prevent complex topologies such as loops and deep wakeup paths from forming in parallel through multiple EPOLL_CTL_ADD operations. However, for the simple case of an epoll file descriptor attached directly to a wakeup source (with no nesting), we do not need to hold the 'epmutex'. This patch along with 'epoll: optimize EPOLL_CTL_DEL using rcu' improves scalability on larger systems. Quoting Nathan Zimmer's mail on SPECjbb performance: "On the 16 socket run the performance went from 35k jOPS to 125k jOPS. In addition the benchmark when from scaling well on 10 sockets to scaling well on just over 40 sockets. ... Currently the benchmark stops scaling at around 40-44 sockets but it seems like I found a second unrelated bottleneck." [akpm@linux-foundation.org: use `bool' for boolean variables, remove unneeded/undesirable cast of void*, add missed ep_scan_ready_list() kerneldoc] Signed-off-by: Jason Baron <jbaron@akamai.com> Tested-by: Nathan Zimmer <nzimmer@sgi.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Please register or sign in to comment