  1. May 20, 2016
    • mm: rename _count, field of the struct page, to _refcount · 0139aa7b
      Joonsoo Kim authored
      
      
      Many developers already know that the reference count field of
      struct page is _count and that it has an atomic type.  They may try to
      modify it directly, which would defeat the purpose of the page reference
      count tracepoint.  To prevent direct _count modification, this patch
      renames the field to _refcount and adds a warning in the code.  After
      that, developers who need to handle the reference count will find that
      the field should not be accessed directly.
      
      [akpm@linux-foundation.org: fix comments, per Vlastimil]
      [akpm@linux-foundation.org: Documentation/vm/transhuge.txt too]
      [sfr@canb.auug.org.au: sync ethernet driver changes]
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Sunil Goutham <sgoutham@cavium.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Manish Chopra <manish.chopra@qlogic.com>
      Cc: Yuval Mintz <yuval.mintz@qlogic.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_ref: use page_ref helper instead of direct modification of _count · 6d061f9f
      Joonsoo Kim authored
      
      
      Page reference manipulation functions were introduced to track
      reference count changes of a page.  Use them instead of modifying
      _count directly.
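
As a rough userspace illustration of why going through helpers matters, the sketch below wraps a counter in accessor functions so that every change passes one place where it can be observed. The helper names mirror the kernel's page_ref helpers, but the bodies, the simplified struct page, and the page_ref_events counter (a stand-in for the tracepoint) are assumptions made for this example, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/* Simplified stand-in for struct page; only the counter is modelled. */
struct page {
	atomic_int _refcount;	/* use the helpers below, never touch directly */
};

/* Hypothetical stand-in for the page reference count tracepoint. */
static long page_ref_events;

static inline void init_page_count(struct page *page)
{
	atomic_init(&page->_refcount, 1);
}

static inline int page_ref_count(struct page *page)
{
	return atomic_load(&page->_refcount);
}

static inline void page_ref_inc(struct page *page)
{
	atomic_fetch_add(&page->_refcount, 1);
	page_ref_events++;	/* every change is visible to the tracer */
}

/* Drops one reference and returns nonzero when the count reaches zero. */
static inline int page_ref_dec_and_test(struct page *page)
{
	int old = atomic_fetch_sub(&page->_refcount, 1);

	assert(old > 0);	/* dropping a reference that was never taken */
	page_ref_events++;
	return old == 1;
}
```

A caller that wrote to _refcount directly would bypass page_ref_events, which is the kind of untracked change the rename is meant to discourage.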
      
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Sunil Goutham <sgoutham@cavium.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slub.c: fix sysfs filename in comment · 43efd3ea
      Li Peng authored
      
      
      /sys/kernel/slab/xx/defrag_ratio should be remote_node_defrag_ratio.
      
      Link: http://lkml.kernel.org/r/1463449242-5366-1-git-send-email-lip@dtdream.com
      Signed-off-by: Li Peng <lip@dtdream.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: slab: remove ZONE_DMA_FLAG · a3187e43
      Yang Shi authored
      
      
      Now that we have the IS_ENABLED helper to check whether a Kconfig
      option is enabled, ZONE_DMA_FLAG no longer seems useful.

      Moreover, the use of ZONE_DMA_FLAG in slab looks pointless according
      to the comment [1] from Johannes Weiner, so remove it.  ORing the
      passed-in flags with the cache gfp flags is already done in
      kmem_getpages().
      
      [1] https://lkml.org/lkml/2014/9/25/553
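
To make the IS_ENABLED idea concrete, here is a simplified userspace sketch of how such a helper can turn "is this option defined to 1?" into a constant 0 or 1 that ordinary C code can test. SKETCH_IS_ENABLED, CONFIG_EXAMPLE_OFF, SKETCH_GFP_DMA and wants_dma_cache() are names invented for this example; the kernel's real helper also handles module options and is not reproduced here.

```c
#include <assert.h>

/* Pretend Kconfig output: an enabled bool option is defined to 1,
 * a disabled one is simply not defined. */
#define CONFIG_ZONE_DMA 1
/* CONFIG_EXAMPLE_OFF is intentionally left undefined. */

/* If the option expands to 1, the pasted token becomes "0," and shifts
 * the literal 1 into the second argument slot; otherwise the junk token
 * stays in the first slot and the second argument is 0. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define SKETCH_IS_ENABLED(option) IS_DEFINED_2(option)

/* Hypothetical flag value, for illustration only. */
#define SKETCH_GFP_DMA 0x01u

/* Returns 1 when a DMA cache is requested and the option is enabled. */
static inline int wants_dma_cache(unsigned int flags)
{
	if (SKETCH_IS_ENABLED(CONFIG_ZONE_DMA) && (flags & SKETCH_GFP_DMA))
		return 1;
	return 0;
}

/* Small self-check showing both outcomes of the macro. */
static inline void check_config(void)
{
	assert(SKETCH_IS_ENABLED(CONFIG_ZONE_DMA) == 1);
	assert(SKETCH_IS_ENABLED(CONFIG_EXAMPLE_OFF) == 0);
}
```

Because the result is a compile-time constant, the compiler can drop the dead branch, so a separate flag macro that only mirrors the config option adds nothing.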
      
      Link: http://lkml.kernel.org/r/1462381297-11009-1-git-send-email-yang.shi@linaro.org
      Signed-off-by: Yang Shi <yang.shi@linaro.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: SLAB freelist randomization · c7ce4f60
      Thomas Garnier authored
      
      
      Provides an optional config (CONFIG_SLAB_FREELIST_RANDOM) to randomize
      the SLAB freelist.  The list is randomized during initialization of a
      new set of pages.  The order on different freelist sizes is pre-computed
      at boot for performance.  Each kmem_cache has its own randomized
      freelist.  Before pre-computed lists are available, freelists are
      generated dynamically.  This security feature reduces the predictability
      of the kernel SLAB allocator against heap overflows, rendering attacks
      much less stable.
      
      For example this attack against SLUB (also applicable against SLAB)
      would be affected:
      
        https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
      
      Also, since v4.6 the freelist has been moved to the end of the SLAB.
      This means a controllable heap is open to new attacks not yet publicly
      discussed.  A kernel heap overflow can be transformed into multiple
      use-after-frees.  This feature makes this type of attack harder too.
      
      To generate entropy, we use get_random_bytes_arch because 0 bits of
      entropy are available in the boot stage.  In the worst case this
      function will fall back to the get_random_bytes sub API.  We also
      generate a random shift to offset the pre-computed freelist for each
      new set of pages.
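
The two steps described above (a pre-computed random order, then a random shift per new set of pages) can be sketched in userspace roughly as follows. This is a minimal sketch, not the patch's code: sketch_rand() is a hypothetical xorshift generator standing in for the kernel's entropy source, the function names are illustrative, and the modulo reduction has a small bias that a real implementation may want to avoid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PRNG (xorshift32) standing in for the entropy source. */
static inline uint32_t sketch_rand(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Pre-compute a random permutation of 0..count-1 (Fisher-Yates shuffle). */
static inline void random_seq_create(unsigned int *seq, size_t count,
				     uint32_t *state)
{
	assert(count > 0 && *state != 0);	/* xorshift needs a nonzero seed */

	for (size_t i = 0; i < count; i++)
		seq[i] = (unsigned int)i;

	for (size_t i = count - 1; i > 0; i--) {
		size_t j = sketch_rand(state) % (i + 1);
		unsigned int tmp = seq[i];

		seq[i] = seq[j];
		seq[j] = tmp;
	}
}

/* Build one slab's freelist by walking the pre-computed sequence from a
 * random starting position, wrapping around at the end. */
static inline void shuffle_freelist(unsigned int *freelist,
				    const unsigned int *seq, size_t count,
				    uint32_t *state)
{
	size_t shift = sketch_rand(state) % count;

	for (size_t i = 0; i < count; i++)
		freelist[i] = seq[(shift + i) % count];
}
```

The expensive shuffle runs once per cache, while each new slab only pays for one random number and a linear copy, which is consistent with the small overhead reported in the benchmarks.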
      
      The config option name is not specific to the SLAB as this approach will
      be extended to other allocators like SLUB.
      
      Performance results highlighted no major changes:
      
      Hackbench (running 90 10 times):
      
        Before average: 0.0698
        After average: 0.0663 (-5.01%)
      
      slab_test 1 run on boot.  A difference is only seen on the 2048 size
      test, which is the worst case scenario covered by freelist
      randomization.  New slab pages are constantly being created during the
      10000 allocations.  Variance should be mainly due to getting new pages
      every few allocations.
      
      Before:
      
        Single thread testing
        =====================
        1. Kmalloc: Repeatedly allocate then free test
        10000 times kmalloc(8) -> 99 cycles kfree -> 112 cycles
        10000 times kmalloc(16) -> 109 cycles kfree -> 140 cycles
        10000 times kmalloc(32) -> 129 cycles kfree -> 137 cycles
        10000 times kmalloc(64) -> 141 cycles kfree -> 141 cycles
        10000 times kmalloc(128) -> 152 cycles kfree -> 148 cycles
        10000 times kmalloc(256) -> 195 cycles kfree -> 167 cycles
        10000 times kmalloc(512) -> 257 cycles kfree -> 199 cycles
        10000 times kmalloc(1024) -> 393 cycles kfree -> 251 cycles
        10000 times kmalloc(2048) -> 649 cycles kfree -> 228 cycles
        10000 times kmalloc(4096) -> 806 cycles kfree -> 370 cycles
        10000 times kmalloc(8192) -> 814 cycles kfree -> 411 cycles
        10000 times kmalloc(16384) -> 892 cycles kfree -> 455 cycles
        2. Kmalloc: alloc/free test
        10000 times kmalloc(8)/kfree -> 121 cycles
        10000 times kmalloc(16)/kfree -> 121 cycles
        10000 times kmalloc(32)/kfree -> 121 cycles
        10000 times kmalloc(64)/kfree -> 121 cycles
        10000 times kmalloc(128)/kfree -> 121 cycles
        10000 times kmalloc(256)/kfree -> 119 cycles
        10000 times kmalloc(512)/kfree -> 119 cycles
        10000 times kmalloc(1024)/kfree -> 119 cycles
        10000 times kmalloc(2048)/kfree -> 119 cycles
        10000 times kmalloc(4096)/kfree -> 121 cycles
        10000 times kmalloc(8192)/kfree -> 119 cycles
        10000 times kmalloc(16384)/kfree -> 119 cycles
      
      After:
      
        Single thread testing
        =====================
        1. Kmalloc: Repeatedly allocate then free test
        10000 times kmalloc(8) -> 130 cycles kfree -> 86 cycles
        10000 times kmalloc(16) -> 118 cycles kfree -> 86 cycles
        10000 times kmalloc(32) -> 121 cycles kfree -> 85 cycles
        10000 times kmalloc(64) -> 176 cycles kfree -> 102 cycles
        10000 times kmalloc(128) -> 178 cycles kfree -> 100 cycles
        10000 times kmalloc(256) -> 205 cycles kfree -> 109 cycles
        10000 times kmalloc(512) -> 262 cycles kfree -> 136 cycles
        10000 times kmalloc(1024) -> 342 cycles kfree -> 157 cycles
        10000 times kmalloc(2048) -> 701 cycles kfree -> 238 cycles
        10000 times kmalloc(4096) -> 803 cycles kfree -> 364 cycles
        10000 times kmalloc(8192) -> 835 cycles kfree -> 404 cycles
        10000 times kmalloc(16384) -> 896 cycles kfree -> 441 cycles
        2. Kmalloc: alloc/free test
        10000 times kmalloc(8)/kfree -> 121 cycles
        10000 times kmalloc(16)/kfree -> 121 cycles
        10000 times kmalloc(32)/kfree -> 123 cycles
        10000 times kmalloc(64)/kfree -> 142 cycles
        10000 times kmalloc(128)/kfree -> 121 cycles
        10000 times kmalloc(256)/kfree -> 119 cycles
        10000 times kmalloc(512)/kfree -> 119 cycles
        10000 times kmalloc(1024)/kfree -> 119 cycles
        10000 times kmalloc(2048)/kfree -> 119 cycles
        10000 times kmalloc(4096)/kfree -> 119 cycles
        10000 times kmalloc(8192)/kfree -> 119 cycles
        10000 times kmalloc(16384)/kfree -> 119 cycles
      
      [akpm@linux-foundation.org: propagate gfp_t into cache_random_seq_create()]
      Signed-off-by: Thomas Garnier <thgarnie@google.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Laura Abbott <labbott@fedoraproject.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slub.c: replace kick_all_cpus_sync() with synchronize_sched() in kmem_cache_shrink() · 81ae6d03
      Vladimir Davydov authored
      
      
      When we call __kmem_cache_shrink on memory cgroup removal, we need to
      synchronize the kmem_cache->cpu_partial update with put_cpu_partial,
      which might be running on other cpus.  Currently, we achieve that by
      using kick_all_cpus_sync, which works as a system wide memory barrier.
      Although it is fast, this method has a flaw: it issues a lot of IPIs,
      which might hurt high performance or real-time workloads.
      
      To fix this, let's replace kick_all_cpus_sync with synchronize_sched.
      Although the latter one may take much longer to finish, it shouldn't be
      a problem in this particular case, because memory cgroups are destroyed
      asynchronously from a workqueue so that no user visible effects should
      be introduced.  OTOH, it will save us from excessive IPIs when someone
      removes a cgroup.
      
      Anyway, even if using synchronize_sched turns out to take too long, we
      can always introduce a kind of __kmem_cache_shrink batching so that this
      method would only be called once per cgroup destruction (not once for
      each per-memcg kmem cache, as it is now).
      
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slab: lockless decision to grow cache · 801faf0d
      Joonsoo Kim authored
      
      
      To check precisely whether free objects exist, we need to grab a
      lock.  But accuracy isn't that important here, because the race window
      is small and, if there are too many free objects, the cache reaper
      will reap them.  So this patch makes the check for free object
      existence lockless.  This will reduce lock contention in
      allocation-heavy cases.

      Note that until now, n->shared could be freed during this processing
      by writing to slabinfo, but with a trick in this patch we can access
      it freely within the interrupt-disabled period.
      
      Below is the result of concurrent allocation/free in slab allocation
      benchmark made by Christoph a long time ago.  I make the output simpler.
      The number shows cycle count during alloc/free respectively so less is
      better.
      
        * Before
        Kmalloc N*alloc N*free(32): Average=248/966
        Kmalloc N*alloc N*free(64): Average=261/949
        Kmalloc N*alloc N*free(128): Average=314/1016
        Kmalloc N*alloc N*free(256): Average=741/1061
        Kmalloc N*alloc N*free(512): Average=1246/1152
        Kmalloc N*alloc N*free(1024): Average=2437/1259
        Kmalloc N*alloc N*free(2048): Average=4980/1800
        Kmalloc N*alloc N*free(4096): Average=9000/2078
      
        * After
        Kmalloc N*alloc N*free(32): Average=344/792
        Kmalloc N*alloc N*free(64): Average=347/882
        Kmalloc N*alloc N*free(128): Average=390/959
        Kmalloc N*alloc N*free(256): Average=393/1067
        Kmalloc N*alloc N*free(512): Average=683/1229
        Kmalloc N*alloc N*free(1024): Average=1295/1325
        Kmalloc N*alloc N*free(2048): Average=2513/1664
        Kmalloc N*alloc N*free(4096): Average=4742/2172
      
      It shows that allocation performance decreases for object sizes up to
      128, which may be due to extra checks in cache_alloc_refill().  But,
      considering the improvement in free performance, the net result looks
      the same.  The results for the other size classes look very promising:
      roughly a 50% performance improvement.
      
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slab: refill cpu cache through a new slab without holding a node lock · 213b4695
      Joonsoo Kim authored
      
      
      Until now, cache growing has put a free slab on the node's slab list,
      from which we can then allocate free objects.  This necessarily
      requires holding the node lock, which is heavily contended.  If we
      refill the cpu cache before attaching the slab to the node's slab
      list, we can avoid holding the node lock as much as possible, because
      the newly allocated slab is only visible to the current task.  This
      will reduce lock contention.
      
      Below is the result of concurrent allocation/free in slab allocation
      benchmark made by Christoph a long time ago.  I make the output simpler.
      The number shows cycle count during alloc/free respectively so less is
      better.
      
        * Before
        Kmalloc N*alloc N*free(32): Average=355/750
        Kmalloc N*alloc N*free(64): Average=452/812
        Kmalloc N*alloc N*free(128): Average=559/1070
        Kmalloc N*alloc N*free(256): Average=1176/980
        Kmalloc N*alloc N*free(512): Average=1939/1189
        Kmalloc N*alloc N*free(1024): Average=3521/1278
        Kmalloc N*alloc N*free(2048): Average=7152/1838
        Kmalloc N*alloc N*free(4096): Average=13438/2013
      
        * After
        Kmalloc N*alloc N*free(32): Average=248/966
        Kmalloc N*alloc N*free(64): Average=261/949
        Kmalloc N*alloc N*free(128): Average=314/1016
        Kmalloc N*alloc N*free(256): Average=741/1061
        Kmalloc N*alloc N*free(512): Average=1246/1152
        Kmalloc N*alloc N*free(1024): Average=2437/1259
        Kmalloc N*alloc N*free(2048): Average=4980/1800
        Kmalloc N*alloc N*free(4096): Average=9000/2078
      
      It shows that contention is reduced for all the object sizes and
      performance increases by 30 ~ 40%.
      
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slab: separate cache_grow() to two parts · 76b342bd
      Joonsoo Kim authored
      
      
      This is a preparation step to implement a lockless allocation path
      when there are no free objects in the kmem_cache.

      What we'd like to do here is to refill the cpu cache without holding
      the node lock.  To accomplish this, the refill should be done after
      the new slab allocation but before attaching the slab to the
      management list.  So, this patch separates cache_grow() into two
      parts, allocation and attaching to the list, in order to add some code
      in between them in the following patch.
      
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slab: make cache_grow() handle the page allocated on arbitrary node · 511e3a05
      Joonsoo Kim authored
      
      
      Currently, cache_grow() assumes that the allocated page's nodeid is
      the same as the nodeid parameter used for the allocation request.  If
      we discard this assumption, we can handle the fallback_alloc() case
      gracefully.  So, this patch makes cache_grow() handle a page allocated
      on an arbitrary node and cleans up the relevant code.
      
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slab: racy access/modify the slab color · 03d1d43a
      Joonsoo Kim authored
      
      
      The slab color does not need to be updated strictly.  Because locking
      to change the slab color could cause more lock contention, this patch
      implements racy access to and modification of the slab color.  This is
      a preparation step to implement a lockless allocation path when there
      are no free objects in the kmem_cache.
      
      Below is the result of concurrent allocation/free in slab allocation
      benchmark made by Christoph a long time ago.  I make the output simpler.
      The number shows cycle count during alloc/free respectively so less is
      better.
      
        * Before
        Kmalloc N*alloc N*free(32): Average=365/806
        Kmalloc N*alloc N*free(64): Average=452/690
        Kmalloc N*alloc N*free(128): Average=736/886
        Kmalloc N*alloc N*free(256): Average=1167/985
        Kmalloc N*alloc N*free(512): Average=2088/1125
        Kmalloc N*alloc N*free(1024): Average=4115/1184
        Kmalloc N*alloc N*free(2048): Average=8451/1748
        Kmalloc N*alloc N*free(4096): Average=16024/2048
      
        * After
        Kmalloc N*alloc N*free(32): Average=355/750
        Kmalloc N*alloc N*free(64): Average=452/812
        Kmalloc N*alloc N*free(128): Average=559/1070
        Kmalloc N*alloc N*free(256): Average=1176/980
        Kmalloc N*alloc N*free(512): Average=1939/1189
        Kmalloc N*alloc N*free(1024): Average=3521/1278
        Kmalloc N*alloc N*free(2048): Average=7152/1838
        Kmalloc N*alloc N*free(4096): Average=13438/2013
      
      It shows that contention is reduced for object size >= 1024 and
      performance increases by roughly 15%.
      
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slab: don't keep free slabs if free_objects exceeds free_limit · 6052b788
      Joonsoo Kim authored
      
      
      Currently, the decision to free a slab is made each time a freed
      object is put back into its slab.  This has the following problem.

      Assume free_limit = 10 and nr_free = 9.

      Frees happen in the following sequence, and nr_free changes as
      follows.

        free (slab becomes a free slab), free (slab does not become a free slab)
        nr_free: 9 -> 10 (at first free) -> 11 (at second free)

      If we check whether the current slab can be freed on each object
      free, we can't free any slab in this situation, because the current
      slab isn't a free slab when nr_free exceeds free_limit (at the second
      free), even though a free slab exists.

      However, if we check at the end, we can free 1 free slab.

      This problem can keep too much memory in the slab subsystem.  This
      patch tries to fix it by checking the number of free objects after all
      the free work is done.  If there is a free slab at that time, we can
      free as many slabs as possible, so we keep the number of free slabs
      minimal.
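
The scenario above can be replayed with a small userspace model. This is only a sketch under stated assumptions: the struct layout, OBJS_PER_SLAB = 4, the four-slab node and the function names are invented for the example and do not mirror the kernel's data structures. The starting state has 3 + 2 + 2 + 2 = 9 free objects, matching nr_free = 9 with free_limit = 10.

```c
#include <assert.h>
#include <stddef.h>

#define OBJS_PER_SLAB 4
#define NR_SLABS 4

struct slab {
	int inuse;	/* allocated objects in this slab */
	int released;	/* set once the slab is given back */
};

struct node {
	struct slab slabs[NR_SLABS];
	int nr_free;	/* free objects across all slabs */
	int free_limit;
};

/* Old behaviour: decide right after each object free, looking only at
 * the slab the object belongs to.  Returns the number of slabs released. */
static inline int free_check_each(struct node *n, const size_t *victims,
				  size_t count)
{
	int released = 0;

	for (size_t i = 0; i < count; i++) {
		struct slab *s = &n->slabs[victims[i]];

		assert(s->inuse > 0);
		s->inuse--;
		n->nr_free++;
		if (s->inuse == 0 && n->nr_free > n->free_limit) {
			n->nr_free -= OBJS_PER_SLAB;
			s->released = 1;
			released++;
		}
	}
	return released;
}

/* New behaviour: finish all frees first, then release free slabs for as
 * long as nr_free still exceeds free_limit. */
static inline int free_check_last(struct node *n, const size_t *victims,
				  size_t count)
{
	int released = 0;

	for (size_t i = 0; i < count; i++) {
		struct slab *s = &n->slabs[victims[i]];

		assert(s->inuse > 0);
		s->inuse--;
		n->nr_free++;
	}
	for (size_t i = 0; i < NR_SLABS && n->nr_free > n->free_limit; i++) {
		struct slab *s = &n->slabs[i];

		if (s->inuse == 0 && !s->released) {
			n->nr_free -= OBJS_PER_SLAB;
			s->released = 1;
			released++;
		}
	}
	return released;
}

/* State from the commit message: free_limit = 10, nr_free = 9. */
static inline struct node example_node(void)
{
	struct node n = {
		.slabs = { { .inuse = 1 }, { .inuse = 2 },
			   { .inuse = 2 }, { .inuse = 2 } },
		.nr_free = 9,
		.free_limit = 10,
	};
	return n;
}
```

Freeing one object from slab 0 and then one from slab 1 releases no slab with free_check_each(), because slab 1 is not empty at the moment the limit is exceeded, while free_check_last() releases slab 0 and brings nr_free down to 7.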
      
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slab: clean-up kmem_cache_node setup · c3d332b6
      Joonsoo Kim authored
      
      
      The code for setting up kmem_cache_node in cpuup_prepare() and
      alloc_kmem_cache_node() is mostly the same.  Factor it out and clean
      it up.
      
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Tested-by: Nishanth Menon <nm@ti.com>
      Tested-by: Jon Hunter <jonathanh@nvidia.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slab: factor out kmem_cache_node initialization code · ded0ecf6
      Joonsoo Kim authored
      
      
      It can be reused elsewhere, so factor it out.  The following patch
      will use it.
      
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slab: drain the free slab as much as possible · a5aa63a5
      Joonsoo Kim authored
      
      
      slabs_tofree() implies freeing all free slabs.  We can do that by
      simply passing INT_MAX.
      
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slab: remove BAD_ALIEN_MAGIC again · 8888177e
      Joonsoo Kim authored


      The initial attempt to remove BAD_ALIEN_MAGIC was reverted by commit
      edcad250 ("Revert "slab: remove BAD_ALIEN_MAGIC"") because it caused a
      problem on m68k, which has many nodes but !CONFIG_NUMA.  In this case,
      although the alien cache isn't used at all, a garbage value is used to
      cope with some initialization paths, and that value is
      BAD_ALIEN_MAGIC.  Now that this patch sets use_alien_caches to 0 when
      !CONFIG_NUMA, there is no initialization path problem, so we don't
      need BAD_ALIEN_MAGIC at all.  So remove it.
      
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slab: fix the theoretical race by holding proper lock · 18726ca8
      Joonsoo Kim authored
      
      
      While processing concurrent allocations, SLAB could be heavily
      contended because it did a lot of work while holding a lock.  This
      patchset tries to reduce the number of critical sections to reduce
      lock contention.  The major changes are a lockless decision to
      allocate more slabs and a lockless cpu cache refill from the newly
      allocated slab.
      
      Below is the result of concurrent allocation/free in slab allocation
      benchmark made by Christoph a long time ago.  I make the output simpler.
      The number shows cycle count during alloc/free respectively so less is
      better.
      
        * Before
        Kmalloc N*alloc N*free(32): Average=365/806
        Kmalloc N*alloc N*free(64): Average=452/690
        Kmalloc N*alloc N*free(128): Average=736/886
        Kmalloc N*alloc N*free(256): Average=1167/985
        Kmalloc N*alloc N*free(512): Average=2088/1125
        Kmalloc N*alloc N*free(1024): Average=4115/1184
        Kmalloc N*alloc N*free(2048): Average=8451/1748
        Kmalloc N*alloc N*free(4096): Average=16024/2048
      
        * After
        Kmalloc N*alloc N*free(32): Average=344/792
        Kmalloc N*alloc N*free(64): Average=347/882
        Kmalloc N*alloc N*free(128): Average=390/959
        Kmalloc N*alloc N*free(256): Average=393/1067
        Kmalloc N*alloc N*free(512): Average=683/1229
        Kmalloc N*alloc N*free(1024): Average=1295/1325
        Kmalloc N*alloc N*free(2048): Average=2513/1664
        Kmalloc N*alloc N*free(4096): Average=4742/2172
      
      It shows that performance improves greatly (roughly more than 50%) for
      the object class whose size is more than 128 bytes.
      
      This patch (of 11):
      
      If we hold neither the slab_mutex nor the node lock, the node's shared
      array cache could be freed and re-populated.  If __kmem_cache_shrink()
      is called at the same time, it will call drain_array() with n->shared
      without holding the node lock, so a problem can happen.  This patch
      fixes the situation by holding the node lock before trying to drain
      the shared array.

      In addition, add a debug check to confirm that the n->shared access
      race doesn't exist.
      
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/padata.c: hide unused functions · 19d795b6
      Arnd Bergmann authored
      
      
      A recent cleanup removed some exported functions that were not used
      anywhere, which in turn exposed the fact that some other functions in
      the same file are only used in some configurations.
      
      We now get a warning about them when CONFIG_HOTPLUG_CPU is disabled:
      
        kernel/padata.c:670:12: error: '__padata_remove_cpu' defined but not used [-Werror=unused-function]
         static int __padata_remove_cpu(struct padata_instance *pinst, int cpu)
                    ^~~~~~~~~~~~~~~~~~~
        kernel/padata.c:650:12: error: '__padata_add_cpu' defined but not used [-Werror=unused-function]
         static int __padata_add_cpu(struct padata_instance *pinst, int cpu)
      
      This rearranges the code so the __padata_remove_cpu/__padata_add_cpu
      functions are within the #ifdef that protects the code that calls them.
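
The pattern can be shown with a small standalone sketch: static helpers that are only called from configuration-dependent code are placed inside the same #ifdef as their caller, so a build without the option never sees an unused definition. The struct and function names below are hypothetical and much simpler than the padata code; CONFIG_HOTPLUG_CPU is defined by hand here to model an enabled option.

```c
#include <assert.h>

/* Remove this line to model a build with CPU hotplug disabled. */
#define CONFIG_HOTPLUG_CPU 1

struct instance {
	unsigned long cpumask;	/* one bit per usable cpu */
};

#ifdef CONFIG_HOTPLUG_CPU
/* Only the hotplug callback below uses these helpers, so they live
 * inside the same #ifdef and vanish together with their caller. */
static int instance_add_cpu(struct instance *inst, int cpu)
{
	inst->cpumask |= 1UL << cpu;
	return 0;
}

static int instance_remove_cpu(struct instance *inst, int cpu)
{
	inst->cpumask &= ~(1UL << cpu);
	return 0;
}

static inline int instance_cpu_callback(struct instance *inst, int cpu,
					int online)
{
	assert(cpu >= 0 && cpu < 32);
	return online ? instance_add_cpu(inst, cpu)
		      : instance_remove_cpu(inst, cpu);
}
#endif /* CONFIG_HOTPLUG_CPU */
```

Had the two helpers been defined outside the #ifdef, a build without the option would contain two static functions with no callers, which is what triggers -Wunused-function.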
      
      [akpm@linux-foundation.org: coding-style fixes]
      Fixes: 4ba6d78c671e ("kernel/padata.c: removed unused code")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Richard Cochran <rcochran@linutronix.de>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/padata.c: removed unused code · 815613da
      Richard Cochran authored
      
      
      By accident I stumbled across code that has never been used.  This
      driver has EXPORT_SYMBOL functions, and the only user of the code is
      pcrypt.c, but this only uses a subset of the exported symbols.
      
      According to 'git log -G', the functions, padata_set_cpumasks,
      padata_add_cpu, and padata_remove_cpu have never been used since they
      were first introduced.  This patch removes the unused code.
      
      On one 64 bit build, with CRYPTO_PCRYPT built in, the text is more than
      4k smaller.
      
        kbuild_hp> size $KBUILD_OUTPUT/vmlinux
            text    data     bss      dec hex    filename
        10566658 4678360 1122304 16367322 f9beda vmlinux
        10561984 4678360 1122304 16362648 f9ac98 vmlinux
      
      On another config, 32 bit, the saving is about 0.5k bytes.
      
        kbuild_hp-x86> size $KBUILD_OUTPUT/vmlinux
           text    data     bss      dec hex    filename
        6012005 2409513 2785280 11206798 ab008e vmlinux
        6011491 2409513 2785280 11206284 aafe8c vmlinux
      
      Signed-off-by: Richard Cochran <rcochran@linutronix.de>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      815613da
    • Guozhonghua's avatar
      ocfs2: clean up an unneeded goto in ocfs2_put_slot() · 8f9b1802
      Guozhonghua authored
      
      
      The goto is not useful in ocfs2_put_slot(), so delete it.
      
      Signed-off-by: Guozhonghua <guozhonghua@h3c.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8f9b1802
    • Jun Piao's avatar
      ocfs2: clean up unused parameter 'count' in o2hb_read_block_input() · aa6913db
      Jun Piao authored
      
      
      Clean up unused parameter 'count' in o2hb_read_block_input().
      
      Signed-off-by: Jun Piao <piaojun@huawei.com>
      Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      aa6913db
    • piaojun's avatar
      ocfs2: clean up an unused variable 'wants_rotate' in ocfs2_truncate_rec · c14688ea
      piaojun authored
      
      
      Clean up an unused variable 'wants_rotate' in ocfs2_truncate_rec.
      
      Signed-off-by: Jun Piao <piaojun@huawei.com>
      Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c14688ea
    • Guozhonghua's avatar
      ocfs2: fix comment in struct ocfs2_extended_slot · 8ba44221
      Guozhonghua authored
      
      
      The comment in ocfs2_extended_slot has the offset wrong.
      
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8ba44221
    • Changbin Du's avatar
      debugobjects: insulate non-fixup logic related to static obj from fixup callbacks · b9fdac7f
      Changbin Du authored
      
      
      When activating a static object we need to make sure that the object
      is tracked in the object tracker.  If it is a non-static object then
      the activation is illegal.
      
      In the previous implementation, each subsystem needed to take care of
      this in its fixup callbacks.  We can instead put this logic into the
      debugobjects core, which removes duplicated code and leaves *pure*
      fixup callbacks.
      
      To achieve this, a new callback "is_static_object" is introduced to let
      the type-specific code decide whether an object is static or not.  If
      it is, we add it to the object tracker; otherwise we give a warning and
      invoke the fixup callback.
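
The flow described above can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the debugobjects core: every `demo_*` name, the fixed-size tracking table, and the 0 / -1 return convention are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object type: the flag stands in for "statically initialized". */
struct demo_obj {
    bool is_static;
    bool active;
};

/* Hypothetical per-type descriptor with the two callbacks discussed above. */
struct demo_descr {
    bool (*is_static_object)(void *addr);
    bool (*fixup_activate)(void *addr);
};

#define DEMO_MAX_TRACKED 8
static void *demo_tracked[DEMO_MAX_TRACKED];
static size_t demo_ntracked;
static int demo_warnings;

static bool demo_is_tracked(const void *addr)
{
    for (size_t i = 0; i < demo_ntracked; i++)
        if (demo_tracked[i] == addr)
            return true;
    return false;
}

static bool demo_obj_is_static(void *addr)
{
    return ((struct demo_obj *)addr)->is_static;
}

/* A "pure" fixup: it only repairs, it no longer decides static vs. not. */
static bool demo_obj_fixup_activate(void *addr)
{
    ((struct demo_obj *)addr)->active = false;
    return true;
}

/* Returns 0 when the activation is accepted, -1 when it is rejected. */
static int demo_activate(void *addr, const struct demo_descr *descr)
{
    if (demo_is_tracked(addr))
        return 0;

    /* Untracked: let the type-specific code say whether it is static.
     * A full table is treated like a non-static object in this sketch. */
    if (descr->is_static_object && descr->is_static_object(addr) &&
        demo_ntracked < DEMO_MAX_TRACKED) {
        demo_tracked[demo_ntracked++] = addr;
        return 0;
    }

    /* Not static: warn and hand the object to the fixup callback. */
    demo_warnings++;
    if (descr->fixup_activate)
        descr->fixup_activate(addr);
    return -1;
}
```

The design point is that the static-or-not decision is made once, in `demo_activate()`, so each type only supplies a small predicate and a fixup that does nothing but repair.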
      
      This change has passed the debugobjects selftest, and I also did some
      testing with all debugobjects support enabled.
      
      Finally, I have a concern about the fixups: can a fixup safely change
      an object that is in an incorrect state?  The 'addr' may not point to
      any valid object if a non-static object is not tracked, so changing
      such an object can overwrite someone else's memory and cause
      unexpected behaviour.  For example, timer_fixup_activate binds the
      timer to the function stub_timer.
      
      Link: http://lkml.kernel.org/r/1462576157-14539-1-git-send-email-changbin.du@intel.com
      [changbin.du@intel.com: improve code comments where invoke the new is_static_object callback]
        Link: http://lkml.kernel.org/r/1462777431-8171-1-git-send-email-changbin.du@intel.com
      Signed-off-by: Du, Changbin <changbin.du@intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Triplett <josh@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b9fdac7f
    • Changbin Du's avatar
      Documentation: update debugobjects doc · 8bad1cd0
      Changbin Du authored
      
      
      Update documentation corresponding to change (debugobjects: make fixup
      functions return bool instead of int).
      
      Signed-off-by: Du, Changbin <changbin.du@intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Triplett <josh@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8bad1cd0
    • Changbin Du's avatar
      percpu_counter: update debugobjects fixup callbacks return type · d99b1d89
      Changbin Du authored
      
      
      Update the return type to use bool instead of int, corresponding to
      change (debugobjects: make fixup functions return bool instead of int).
      
      Signed-off-by: Du, Changbin <changbin.du@intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Triplett <josh@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d99b1d89
    • Changbin Du's avatar
      rcu: update debugobjects fixup callbacks return type · 3263d28e
      Changbin Du authored
      
      
      Update the return type to use bool instead of int, corresponding to
      change (debugobjects: make fixup functions return bool instead of int).
      
      Signed-off-by: Du, Changbin <changbin.du@intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Triplett <josh@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3263d28e
    • Changbin Du's avatar
      timer: update debugobjects fixup callbacks return type · e3252464
      Changbin Du authored
      
      
      Update the return type to use bool instead of int, corresponding to
      change (debugobjects: make fixup functions return bool instead of int).
      
      Signed-off-by: Du, Changbin <changbin.du@intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Triplett <josh@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e3252464
    • Changbin Du's avatar
      workqueue: update debugobjects fixup callbacks return type · 02a982a6
      Changbin Du authored
      
      
      Update the return type to use bool instead of int, corresponding to
      change (debugobjects: make fixup functions return bool instead of int).
      
      Signed-off-by: Du, Changbin <changbin.du@intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Triplett <josh@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      02a982a6
    • Changbin Du's avatar
      debugobjects: correct the usage of fixup call results · e7a8e78b
      Changbin Du authored
      
      
      debug_object_fixup() returns non-zero when the problem has been fixed.
      But the code got it backwards: it takes 0 to mean the fixup succeeded.
      So fix it.
      
      Signed-off-by: Du, Changbin <changbin.du@intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Triplett <josh@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e7a8e78b
    • Changbin Du's avatar
      debugobjects: make fixup functions return bool instead of int · b1e4d9d8
      Changbin Du authored
      
      
      I am going to introduce the debugobjects infrastructure to the USB
      subsystem.  But before this, I found the debugobjects code could be
      improved.  This patchset makes fixup functions return bool instead of
      int, because a fixup only needs to report success or failure; boolean
      is the 'real' type.
      
      This patch (of 7):
      
      The object debugging infrastructure core provides some fixup callbacks
      for the subsystems that use it.  These callbacks are called from the
      debug code whenever a problem in debug_object_init is detected.  The
      debugobjects core expects them to return 1 when the fixup was
      successful, otherwise 0.  So the return type is effectively boolean.
      
      A bad thing is that debug_object_fixup uses the return value in an
      arithmetic operation.  It confused me as to what the real return type
      is.
      
      Reading over the whole code, I found some places that use the return
      value incorrectly (see next patch).  So why not use the bool type
      instead?
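
To make the argument concrete, here is a small user-space sketch of a fixup dispatcher that treats the callback result as a boolean and keeps the statistics in a separate counter, rather than doing arithmetic on the return value. All `demo_*` names are hypothetical, and the code is not taken from the debugobjects core.

```c
#include <stdbool.h>
#include <stddef.h>

enum demo_state { DEMO_STATE_INIT, DEMO_STATE_ACTIVE };

/* A fixup callback only reports whether it repaired the object. */
typedef bool (*demo_fixup_fn)(void *addr, enum demo_state state);

/* Number of successful fixups, kept apart from the callback result. */
static int demo_fixups;

static bool demo_object_fixup(demo_fixup_fn fixup, void *addr,
                              enum demo_state state)
{
    if (fixup && fixup(addr, state)) {
        demo_fixups++;
        return true;
    }
    return false;
}

/* Example callback: it can only repair objects found in ACTIVE state. */
static bool demo_fixup_init(void *addr, enum demo_state state)
{
    (void)addr;
    return state == DEMO_STATE_ACTIVE;
}
```

With a bool return type, a caller cannot accidentally read 0 as "fixed", which is the kind of mix-up the message says the next patch corrects.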
      
      Signed-off-by: Du, Changbin <changbin.du@intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Triplett <josh@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b1e4d9d8
    • Vineet Gupta's avatar
      scripts/bloat-o-meter: print percent change · b21e91c3
      Vineet Gupta authored
      
      
      This adds an additional line of output (to reduce the chances of
      breaking any existing output parsers) which prints the total size before
      and after and the relative difference.
      
        add/remove: 39/0 grow/shrink: 12408/55 up/down: 362227/-1430 (360797)
        function                                     old     new   delta
        ext4_fill_super                            10556   12590   +2034
        _fpadd_parts                                   -    1186   +1186
        ntfs_fill_super                             5340    6164    +824
        ...
        ...
        __divdf3                                     752     386    -366
        unlzma                                      3682    3274    -408
        Total: Before=5023101, After=5383898, chg 7.000000%
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
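
As a worked check of the sample totals above: the relative change from 5023101 to 5383898 is about 7.18%, while integer division truncates it to 7, which matches the "7.000000%" shown. Whether the script actually computes the value with integer division is an assumption here; the C helpers below are hypothetical and only reproduce the arithmetic.

```c
/* Relative change in percent, computed with truncating integer division. */
static long long pct_change_truncated(long long before, long long after)
{
    return (after - before) * 100 / before;
}

/* Relative change in percent, computed in floating point. */
static double pct_change(long long before, long long after)
{
    return (double)(after - before) * 100.0 / (double)before;
}
```

For the totals in the sample, the first helper yields 7 and the second a value between 7.18 and 7.19.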
      
      Link: http://lkml.kernel.org/r/1463124110-30314-1-git-send-email-vgupta@synopsys.com
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b21e91c3
    • Kees Cook's avatar
      scripts/spelling.txt: add "fimware" misspelling · bad7de74
      Kees Cook authored
      
      
      A few instances of "fimware" instead of "firmware" were found.  Fix
      these and add it to the spelling.txt file.
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reported-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bad7de74
    • Konstantin Khlebnikov's avatar
      scripts/decode_stacktrace.sh: handle symbols in modules · 310c6dd0
      Konstantin Khlebnikov authored
      
      
      scripts/decode_stacktrace.sh presently displays module symbols as
      
      	func+0x0ff/0x5153 [module]
      
      Add a third argument: the pathname of a directory where the script
      should look for the file module.ko so that the output appears as
      
      	func (foo/bar.c:123) module
      
      Without the argument or if the module file isn't found the script prints
      such symbols as is without decoding.
      
      Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      310c6dd0
    • Deepa Dinamani's avatar
      time: remove timespec_add_safe() · 8e4f70e2
      Deepa Dinamani authored
      
      
      All references to timespec_add_safe() now use timespec64_add_safe().
      
      The plan is to replace struct timespec references with struct timespec64
      throughout the kernel as timespec is not y2038 safe.
      
      Drop timespec_add_safe() and use timespec64_add_safe() for all
      architectures.
      
      Link: http://lkml.kernel.org/r/1461947989-21926-4-git-send-email-deepa.kernel@gmail.com
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: John Stultz <john.stultz@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8e4f70e2
    • Deepa Dinamani's avatar
      fs: poll/select/recvmmsg: use timespec64 for timeout events · 766b9f92
      Deepa Dinamani authored
      
      
      struct timespec is not y2038 safe.  Even though timespec might be
      sufficient to represent timeouts, use struct timespec64 here as the plan
      is to get rid of all timespec references in the kernel.
      
      The patch transitions the common functions poll_select_set_timeout()
      and select_estimate_accuracy() to use timespec64.  All the syscalls
      that use these functions are transitioned in the same patch.
      
      The restart block parameters for poll use monotonic time.  Use
      timespec64 here as well to assign the timeout value.  This parameter in
      the restart block need not change because it only holds the monotonic
      timestamp at which the timeout should occur, and the unsigned long data
      type should be big enough for this timestamp.
      
      The system call interfaces will be handled in a separate series.
      
      Compat interfaces need not change as timespec64 is an alias to struct
      timespec on a 64 bit system.
      
      Link: http://lkml.kernel.org/r/1461947989-21926-3-git-send-email-deepa.kernel@gmail.com
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: John Stultz <john.stultz@linaro.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      766b9f92
    • Deepa Dinamani's avatar
      time: add missing implementation for timespec64_add_safe() · bc2c53e5
      Deepa Dinamani authored
      
      
      timespec64_add_safe() has been defined in time64.h for 64 bit systems.
      But, 32 bit systems only have an extern function prototype defined.
      Provide a definition for the above function.
      
      The function will be necessary as part of y2038 changes.  struct
      timespec is not y2038 safe.  All references to timespec will be replaced
      by struct timespec64.  The function is meant to be a replacement for
      timespec_add_safe().
      
      The implementation is similar to timespec_add_safe().
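
For readers unfamiliar with the "safe" add, the idea is an addition that saturates instead of wrapping when the seconds field would overflow. The following is a user-space sketch of that idea only, not the kernel implementation: `struct ts64`, `ts64_add_safe()`, and the constants are hypothetical names, and both inputs are assumed to be normalized with non-negative seconds.

```c
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for a 64-bit-seconds timespec. */
struct ts64 {
    int64_t tv_sec;
    long    tv_nsec; /* 0 .. 999999999 */
};

/*
 * Add two timestamps, clamping to the maximum representable value
 * instead of overflowing.  Assumes normalized, non-negative inputs.
 */
static struct ts64 ts64_add_safe(struct ts64 lhs, struct ts64 rhs)
{
    struct ts64 res;
    long nsec = lhs.tv_nsec + rhs.tv_nsec; /* at most 1999999998 */
    int64_t carry = 0;

    if (nsec >= DEMO_NSEC_PER_SEC) {
        nsec -= DEMO_NSEC_PER_SEC;
        carry = 1;
    }

    /* Check for overflow before adding, so no signed overflow occurs. */
    if (lhs.tv_sec > INT64_MAX - rhs.tv_sec - carry) {
        res.tv_sec = INT64_MAX;
        res.tv_nsec = 0;
        return res;
    }

    res.tv_sec = lhs.tv_sec + rhs.tv_sec + carry;
    res.tv_nsec = nsec;
    return res;
}
```

Saturating suits timeout arithmetic: a deadline that would overflow is treated as "as far in the future as can be represented" rather than wrapping into the past.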
      
      Link: http://lkml.kernel.org/r/1461947989-21926-2-git-send-email-deepa.kernel@gmail.com
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: John Stultz <john.stultz@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bc2c53e5
    • Jan Kara's avatar
      fsnotify: avoid spurious EMFILE errors from inotify_init() · 35e48176
      Jan Kara authored
      
      
      An inotify instance is destroyed when all references to it are dropped.
      That means not only that the corresponding file descriptor needs to be
      closed but also that all corresponding instance marks are freed (as each
      mark holds a reference to the inotify instance).  However, marks are
      freed only after an SRCU period ends, which can take some time, and thus
      if a user rapidly creates and frees inotify instances, the number of
      existing inotify instances can exceed the max_user_instances limit
      although from the user's point of view there is always at most one
      existing instance.  Thus inotify_init() returns an EMFILE error, which
      is hard to justify from the user's point of view.  This problem is
      exposed by the LTP inotify06 testcase on some machines.
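
The access pattern described above can be sketched as a tight create-and-close loop. This is an illustrative user-space snippet, not the LTP inotify06 test; the function name and the iteration count are hypothetical, and whether any EMFILE failures appear depends on the kernel version and the max_user_instances setting.

```c
#include <errno.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Create and immediately close inotify instances in a loop, so that at
 * most one instance is open at a time from the caller's point of view.
 * Returns how many inotify_init1() calls failed with EMFILE.
 */
static int count_emfile_failures(int iterations)
{
    int failures = 0;

    for (int i = 0; i < iterations; i++) {
        int fd = inotify_init1(0);

        if (fd < 0) {
            if (errno == EMFILE)
                failures++;
            continue;
        }
        close(fd);
    }
    return failures;
}
```

On a kernel with the fix, the expectation from the description is that this loop reports no EMFILE failures; the sketch makes no claim about how many iterations an affected kernel needs before it fails.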
      
      We fix the problem by making sure all group marks are properly freed
      while destroying an inotify instance.  We wait for the SRCU period to
      end in that path anyway, since we have to make sure there is no event
      being added to the instance while we are tearing it down.  So it takes
      only some plumbing to allow marks to be destroyed in that path as well
      and not from a dedicated work item.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reported-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
      Tested-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      35e48176
  2. May 19, 2016
    • Linus Torvalds's avatar
      Merge tag 'trace-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 2600a46e
      Linus Torvalds authored
      Pull tracing updates from Steven Rostedt:
       "This includes two new updates for the ftrace infrastructure.
      
         - With the changing of the code for filtering events by pid, from a
           list of pids to a bitmask, we can now easily implement following
           forks.  A new tracing option "event-fork", when set, causes tasks
           with pids in set_event_pid to have their child pids added to
           set_event_pid when they fork, so the child is traced as well.
      
           Note, if "event-fork" is set and a task with its pid in
           set_event_pid exits, its pid will be removed from set_event_pid.
      
         - The addition of Tom Zanussi's hist triggers.  This includes very
           thorough documentation on how to use the hist triggers with events.
           This introduces a quick and easy way to get histogram data from
           events and their fields.
      
        Some other cleanups and updates were added as well.  For example,
        Masami Hiramatsu added test cases for the event trigger and hist
        triggers.  Also I added a speed up of filtering by using a temp
        buffer when filters are set"
      
      * tag 'trace-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (45 commits)
        tracing: Use temp buffer when filtering events
        tracing: Remove TRACE_EVENT_FL_USE_CALL_FILTER logic
        tracing: Remove unused function trace_current_buffer_lock_reserve()
        tracing: Remove one use of trace_current_buffer_lock_reserve()
        tracing: Have trace_buffer_unlock_commit() call the _regs version with NULL
        tracing: Remove unused function trace_current_buffer_discard_commit()
        tracing: Move trace_buffer_unlock_commit{_regs}() to local header
        tracing: Fold filter_check_discard() into its only user
        tracing: Make filter_check_discard() local
        tracing: Move event_trigger_unlock_commit{_regs}() to local header
        tracing: Don't use the address of the buffer array name in copy_from_user
        tracing: Handle tracing_map_alloc_elts() error path correctly
        tracing: Add check for NULL event field when creating hist field
        tracing: checking for NULL instead of IS_ERR()
        tracing: Do not inherit event-fork option for instances
        tracing: Fix unsigned comparison to zero in hist trigger code
        kselftests/ftrace: Add a test for log2 modifier of hist trigger
        tracing: Add hist trigger 'log2' modifier
        kselftests/ftrace: Add hist trigger testcases
        kselftests/ftrace : Add event trigger testcases
        ...
      2600a46e
    • Linus Torvalds's avatar
      Merge branch 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit · 03e1aa1c
      Linus Torvalds authored
      Pull audit updates from Paul Moore:
       "Four small audit patches for 4.7.
      
        Two are simple cleanups around the audit thread management code, one
        adds a tty field to AUDIT_LOGIN events, and the final patch makes
        tty_name() usable regardless of CONFIG_TTY.
      
        Nothing controversial, and it all passes our regression test"
      
      * 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit:
        tty: provide tty_name() even without CONFIG_TTY
        audit: add tty field to LOGIN event
        audit: we don't need to __set_current_state(TASK_RUNNING)
        audit: cleanup prune_tree_thread
      03e1aa1c