  1. May 02, 2013
  2. May 01, 2013
    • Merge branch 'next' into for-linus · bf61c884
      Dmitry Torokhov authored
      Prepare first set of updates for 3.10 merge window.
    • Merge branch 'ipc-scalability' · 823e75f7
      Linus Torvalds authored
      Merge IPC cleanup and scalability patches from Andrew Morton.
      
      This cleans up many of the oddities in the IPC code, uses the list
      iterator helpers, splits out locking and adds per-semaphore locks for
      greater scalability of the IPC semaphore code.
      
      Most normal user-level locking by now uses futexes (ie pthreads, but
      also a lot of specialized locks), but SysV IPC semaphores are apparently
      still used in some big applications, either for portability reasons, or
      because they offer tracking and undo (and you don't need to have a
      special shared memory area for them).
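
      The undo tracking mentioned above is the SEM_UNDO flag: the kernel
      records per-process adjustments and reverses unbalanced operations if
      the process exits. A minimal userspace round-trip, for illustration
      (not part of the merge itself):

      ```c
      #include <assert.h>
      #include <sys/ipc.h>
      #include <sys/sem.h>

      /* Round-trip a private SysV semaphore: V then P, both with SEM_UNDO
       * so the kernel would reverse any unbalanced op if the process died.
       * Returns 0 on success. */
      static int sem_roundtrip(void)
      {
              int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
              if (id < 0)
                      return -1;
              struct sembuf up   = { .sem_num = 0, .sem_op = +1, .sem_flg = SEM_UNDO };
              struct sembuf down = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };
              int ok = semop(id, &up, 1) == 0
                    && semctl(id, 0, GETVAL) == 1
                    && semop(id, &down, 1) == 0
                    && semctl(id, 0, GETVAL) == 0;
              semctl(id, 0, IPC_RMID);  /* SysV objects persist until removed */
              return ok ? 0 : -1;
      }

      int main(void)
      {
              assert(sem_roundtrip() == 0);
              return 0;
      }
      ```

      Note that, unlike a futex, no shared memory mapping is needed: the
      semaphore lives in the kernel and is addressed by its id.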
      
      Our IPC semaphore scalability was pitiful.  We used to lock much too big
      ranges, and we used to have a single ipc lock per ipc semaphore array.
      Most loads never cared, but some do.  There are some numbers in the
      individual commits.
      
      * ipc-scalability:
        ipc: sysv shared memory limited to 8TiB
        ipc/msg.c: use list_for_each_entry_[safe] for list traversing
        ipc,sem: fine grained locking for semtimedop
        ipc,sem: have only one list in struct sem_queue
        ipc,sem: open code and rename sem_lock
        ipc,sem: do not hold ipc lock more than necessary
        ipc: introduce lockless pre_down ipcctl
        ipc: introduce obtaining a lockless ipc object
        ipc: remove bogus lock comment for ipc_checkid
        ipc/msgutil.c: use linux/uaccess.h
        ipc: refactor msg list search into separate function
        ipc: simplify msg list search
        ipc: implement MSG_COPY as a new receive mode
        ipc: remove msg handling from queue scan
        ipc: set EFAULT as default error in load_msg()
        ipc: tighten msg copy loops
        ipc: separate msg allocation from userspace copy
        ipc: clamp with min()
    • ipc: sysv shared memory limited to 8TiB · d69f3bad
      Robin Holt authored
      
      
      Trying to run an application which put half of memory into shared
      memory using shmget(), we found that a shmall value below 8EiB-8TiB
      prevented us from using anything more than 8TiB.  Setting
      kernel.shmall greater than 8EiB-8TiB made the job work.
      
      In the newseg() function, numpages was an int, which at 8TiB already
      exceeds INT_MAX, so the comparison against ns->shm_ctlall wrapped:
      
      ipc/shm.c:
      static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
      {
      ...
              int numpages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
      ...
              if (ns->shm_tot + numpages > ns->shm_ctlall)
                      return -ENOSPC;
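
      The arithmetic can be checked in isolation. A minimal userspace sketch,
      assuming 4 KiB pages (PAGE_SHIFT of 12, as on x86); `pages_for` is a
      made-up helper mirroring the fixed size_t computation:

      ```c
      #include <assert.h>
      #include <stdint.h>

      #define PAGE_SHIFT 12                 /* 4 KiB pages, as on x86 */
      #define PAGE_SIZE  (1ULL << PAGE_SHIFT)

      /* Page count for a segment, computed in 64 bits as the fix does. */
      static uint64_t pages_for(uint64_t size)
      {
              return (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
      }

      int main(void)
      {
              uint64_t eight_tib = 1ULL << 43;
              /* An 8 TiB segment needs 2^31 pages -- one more than INT_MAX,
               * so the old int numpages wrapped and the shm_ctlall check
               * no longer rejected oversized requests. */
              assert(pages_for(eight_tib) == (1ULL << 31));
              assert(pages_for(eight_tib) > INT32_MAX);
              return 0;
      }
      ```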
      
      [akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int]
      Signed-off-by: Robin Holt <holt@sgi.com>
      Reported-by: Alex Thorlton <athorlton@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/msg.c: use list_for_each_entry_[safe] for list traversing · 41239fe8
      Nikola Pajkovsky authored
      
      
      The ipc/msg.c code does its list operations by hand, open-coding the
      accesses instead of using list_for_each_entry_[safe].
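
      For reference, a userspace sketch of the safe-iteration helper the
      patch switches to. The stand-ins below are simplified versions of the
      linux/list.h definitions, and `drain` is a made-up example function:

      ```c
      #include <assert.h>
      #include <stddef.h>

      struct list_head { struct list_head *next, *prev; };

      static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

      static void list_add_tail(struct list_head *n, struct list_head *h)
      {
              n->prev = h->prev; n->next = h;
              h->prev->next = n; h->prev = n;
      }

      static void list_del(struct list_head *n)
      {
              n->prev->next = n->next; n->next->prev = n->prev;
      }

      #define container_of(ptr, type, member) \
              ((type *)((char *)(ptr) - offsetof(type, member)))

      /* Safe variant: 'n' caches the next entry, so 'pos' may be unlinked
       * (or freed) inside the loop body without breaking iteration. */
      #define list_for_each_entry_safe(pos, n, head, member)                     \
              for (pos = container_of((head)->next, typeof(*pos), member),       \
                   n = container_of(pos->member.next, typeof(*pos), member);     \
                   &pos->member != (head);                                       \
                   pos = n, n = container_of(n->member.next, typeof(*n), member))

      struct msg { int id; struct list_head list; };

      /* Build a three-entry queue and drain it; returns entries removed. */
      static int drain(void)
      {
              struct list_head q;
              struct msg m[3];
              struct msg *pos, *n;
              int removed = 0;

              INIT_LIST_HEAD(&q);
              for (int i = 0; i < 3; i++) {
                      m[i].id = i;
                      list_add_tail(&m[i].list, &q);
              }
              list_for_each_entry_safe(pos, n, &q, list) {
                      list_del(&pos->list);   /* safe: pos is already saved */
                      removed++;
              }
              return removed;
      }

      int main(void)
      {
              assert(drain() == 3);
              return 0;
      }
      ```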
      
      Signed-off-by: Nikola Pajkovsky <npajkovs@redhat.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc,sem: fine grained locking for semtimedop · 6062a8dc
      Rik van Riel authored
      
      
      Introduce finer grained locking for semtimedop, to handle the common case
      of a program wanting to manipulate one semaphore from an array with
      multiple semaphores.
      
      If the call is a semop manipulating just one semaphore in an array with
      multiple semaphores, only take the lock for that semaphore itself.
      
      If the call needs to manipulate multiple semaphores, or another caller is
      in a transaction that manipulates multiple semaphores, the sem_array lock
      is taken, as well as all the locks for the individual semaphores.
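
      The locking policy above can be sketched in userspace. Pthread
      spinlocks stand in for the kernel's, `sem_lock_sketch` is a
      hypothetical name, and the sketch omits the recheck of complex_count
      under the lock that the real patch needs to close a race:

      ```c
      #include <assert.h>
      #include <pthread.h>

      #define NSEMS 4

      struct sem { pthread_spinlock_t lock; };

      struct sem_array {
              pthread_spinlock_t lock;  /* array-wide lock, for complex ops */
              int complex_count;        /* multi-semaphore transactions in flight */
              struct sem sems[NSEMS];
      };

      /* Returns 1 when only the per-semaphore lock was taken (fast path),
       * 0 when the array lock plus every per-semaphore lock was needed. */
      static int sem_lock_sketch(struct sem_array *sma, int nsops, int semnum)
      {
              if (nsops == 1 && sma->complex_count == 0) {
                      pthread_spin_lock(&sma->sems[semnum].lock);
                      return 1;
              }
              pthread_spin_lock(&sma->lock);
              for (int i = 0; i < NSEMS; i++)
                      pthread_spin_lock(&sma->sems[i].lock);
              return 0;
      }

      static void sem_unlock_sketch(struct sem_array *sma, int fast, int semnum)
      {
              if (fast) {
                      pthread_spin_unlock(&sma->sems[semnum].lock);
                      return;
              }
              for (int i = 0; i < NSEMS; i++)
                      pthread_spin_unlock(&sma->sems[i].lock);
              pthread_spin_unlock(&sma->lock);
      }

      /* Lock/unlock once and report which path was chosen. */
      static int fastpath_taken(int complex_count)
      {
              struct sem_array sma;
              pthread_spin_init(&sma.lock, 0);
              for (int i = 0; i < NSEMS; i++)
                      pthread_spin_init(&sma.sems[i].lock, 0);
              sma.complex_count = complex_count;
              int fast = sem_lock_sketch(&sma, 1, 0);
              sem_unlock_sketch(&sma, fast, 0);
              return fast;
      }

      int main(void)
      {
              assert(fastpath_taken(0) == 1);  /* simple semop: per-sem lock */
              assert(fastpath_taken(1) == 0);  /* complex op pending: global */
              return 0;
      }
      ```

      Uncontended single-semaphore operations thus never touch the shared
      array lock, which is where the scalability win below comes from.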
      
      On a 24 CPU system, performance numbers with the semop-multi
      test (N threads and N semaphores) look like this:
      
      threads   vanilla   Davidlohr's   Davidlohr's +    Davidlohr's +
                          patches       rwlock patches   v3 patches
      10        610652    726325        1783589          2142206
      20        341570    365699        1520453          1977878
      30        288102    307037        1498167          2037995
      40        290714    305955        1612665          2256484
      50        288620    312890        1733453          2650292
      60        289987    306043        1649360          2388008
      70        291298    306347        1723167          2717486
      80        290948    305662        1729545          2763582
      90        290996    306680        1736021          2757524
      100       292243    306700        1773700          3059159
      
      [davidlohr.bueso@hp.com: do not call sem_lock when bogus sma]
      [davidlohr.bueso@hp.com: make refcounter atomic]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Jason Low <jason.low2@hp.com>
      Reviewed-by: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Emmanuel Benisty <benisty.e@gmail.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc,sem: have only one list in struct sem_queue · 9f1bc2c9
      Rik van Riel authored
      
      
      Having only one list in struct sem_queue, and only queueing simple
      semaphore operations on the list for the semaphore involved, allows us to
      introduce finer grained locking for semtimedop.
      
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc,sem: open code and rename sem_lock · c460b662
      Rik van Riel authored
      
      
      Rename sem_lock() to sem_obtain_lock(), so we can introduce a sem_lock()
      later that only locks the sem_array and does nothing else.
      
      Open code the locking from ipc_lock() in sem_obtain_lock() so we can
      introduce finer grained locking for the sem_array in the next patch.
      
      [akpm@linux-foundation.org: propagate the ipc_obtain_object() errno out of sem_obtain_lock()]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc,sem: do not hold ipc lock more than necessary · 16df3674
      Davidlohr Bueso authored
      
      
      Instead of holding the ipc lock across permission and security checks
      (among other operations), acquire it only when necessary.
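
      The pattern can be sketched in userspace. A mutex stands in for the
      ipc lock and plain field reads stand in for the RCU-protected lookup;
      all names here (`ipc_obj`, `do_op`, `run_once`) are illustrative:

      ```c
      #include <assert.h>
      #include <pthread.h>

      struct ipc_obj {
              pthread_mutex_t lock;
              int perms;   /* stands in for ipcperms()/security checks */
              int value;
      };

      static int do_op(struct ipc_obj *o)
      {
              /* Phase 1: checks run without o->lock held (the kernel keeps
               * the object alive with RCU during this window). */
              if (!o->perms)
                      return -1;
              /* Phase 2: the lock is held only for the actual update. */
              pthread_mutex_lock(&o->lock);
              o->value++;
              pthread_mutex_unlock(&o->lock);
              return 0;
      }

      /* Exercise the path once; returns the resulting value. */
      static int run_once(void)
      {
              struct ipc_obj o = { .lock = PTHREAD_MUTEX_INITIALIZER,
                                   .perms = 1, .value = 0 };
              if (do_op(&o) != 0)
                      return -1;
              return o.value;
      }

      int main(void)
      {
              assert(run_once() == 1);
              return 0;
      }
      ```

      Shrinking the locked section this way is what moves the permission and
      audit work out from under _raw_spin_lock in the profiles below.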
      
      Some numbers....
      
      1) With Rik's semop-multi.c microbenchmark we can see the following
         results:
      
      Baseline (3.9-rc1):
      cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
      total operations: 151452270, ops/sec 5048409
      
      +  59.40%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
      +   6.14%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
      +   3.84%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
      +   3.64%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
      +   2.06%            a.out  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
      +   1.86%            a.out  [kernel.kallsyms]  [k] ipc_lock
      
      With this patchset:
      cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
      total operations: 273156400, ops/sec 9105213
      
      +  18.54%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
      +  11.72%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
      +   7.70%            a.out  [kernel.kallsyms]  [k] ipc_has_perm.isra.21
      +   6.58%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
      +   6.54%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
      +   4.71%            a.out  [kernel.kallsyms]  [k] ipc_obtain_object_check
      
      2) On an Oracle swingbench DSS (data mining) workload the
         improvements are not as dramatic as with Rik's benchmark, but we
         still see positive numbers.  For an 8 socket machine, the following
         are the percentages of %sys time spent in the ipc lock:
      
      Baseline (3.9-rc1):
      100 swingbench users: 8.74%
      400 swingbench users: 21.86%
      800 swingbench users: 84.35%
      
      With this patchset:
      100 swingbench users: 8.11%
      400 swingbench users: 19.93%
      800 swingbench users: 77.69%
      
      [riel@redhat.com: fix two locking bugs]
      [sasha.levin@oracle.com: prevent releasing RCU read lock twice in semctl_main]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
      Acked-by: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>