  1. Oct 28, 2016
    • Michael Holzheu's avatar
      s390/hypfs: Use get_free_page() instead of kmalloc to ensure page alignment · 237d6e68
      Michael Holzheu authored
      Since commit d86bd1be ("mm/slub: support left redzone") it is no longer
      guaranteed that kmalloc(PAGE_SIZE) returns page aligned memory.
      
      After the above commit we get an error for diag224 because aligned
      memory is required. This leads to the following user visible error:
      
       # mount none -t s390_hypfs /sys/hypervisor/
       mount: unknown filesystem type 's390_hypfs'
      
       # dmesg | grep hypfs
       hypfs.cccfb8: The hardware system does not provide all functions
                     required by hypfs
       hypfs.7a79f0: Initialization of hypfs failed with rc=-61
      
      Fix this problem and use get_free_page() instead of kmalloc() to get
      correctly aligned memory.
      
      Cc: stable@vger.kernel.org # v3.6+
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      237d6e68
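The alignment problem behind the hypfs fix above can be illustrated in user space. This is a minimal sketch, not the kernel code: `SKETCH_PAGE_SIZE`, `sketch_is_page_aligned()` and `sketch_with_left_redzone()` are hypothetical names, and the 4096-byte page size is an assumption. It only models how a debugging red zone placed in front of an object shifts an address that would otherwise be page aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for the sketch; real systems may differ. */
#define SKETCH_PAGE_SIZE ((uintptr_t)4096)

/* True when addr is a multiple of the page size. */
static bool sketch_is_page_aligned(uintptr_t addr)
{
	return (addr & (SKETCH_PAGE_SIZE - 1)) == 0;
}

/*
 * Model of an allocator that places `redzone` bytes in front of the
 * object: the caller receives slot_start + redzone, not slot_start.
 */
static uintptr_t sketch_with_left_redzone(uintptr_t slot_start, uintptr_t redzone)
{
	assert(sketch_is_page_aligned(slot_start));
	return slot_start + redzone;
}
```

With a red zone of 0 the returned address stays aligned; any non-zero red zone smaller than a page breaks the alignment, which is why an interface that hands out whole pages is the safer choice for buffers that must be page aligned.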
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 14970f20
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "20 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        drivers/misc/sgi-gru/grumain.c: remove bogus 0x prefix from printk
        cris/arch-v32: cryptocop: print a hex number after a 0x prefix
        ipack: print a hex number after a 0x prefix
        block: DAC960: print a hex number after a 0x prefix
        fs: exofs: print a hex number after a 0x prefix
        lib/genalloc.c: start search from start of chunk
        mm: memcontrol: do not recurse in direct reclaim
        CREDITS: update credit information for Martin Kepplinger
        proc: fix NULL dereference when reading /proc/<pid>/auxv
        mm: kmemleak: ensure that the task stack is not freed during scanning
        lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB
        latent_entropy: raise CONFIG_FRAME_WARN by default
        kconfig.h: remove config_enabled() macro
        ipc: account for kmem usage on mqueue and msg
        mm/slab: improve performance of gathering slabinfo stats
        mm: page_alloc: use KERN_CONT where appropriate
        mm/list_lru.c: avoid error-path NULL pointer deref
        h8300: fix syscall restarting
        kcov: properly check if we are in an interrupt
        mm/slab: fix kmemcg cache creation delayed issue
      14970f20
    • Dimitri Sivanich's avatar
      drivers/misc/sgi-gru/grumain.c: remove bogus 0x prefix from printk · 8e819101
      Dimitri Sivanich authored
      
      
      Would like to have this be a decimal number.
      
      Link: http://lkml.kernel.org/r/20161026134746.GA30169@sgi.com
      Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
      Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8e819101
    • Uwe Kleine-König's avatar
      cris/arch-v32: cryptocop: print a hex number after a 0x prefix · 17a88939
      Uwe Kleine-König authored
      
      
      It makes the result hard to interpret correctly if a base 10 number is
      prefixed by 0x.  So change to a hex number.
      
      Link: http://lkml.kernel.org/r/20161026125658.25728-6-u.kleine-koenig@pengutronix.de
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      17a88939
    • Uwe Kleine-König's avatar
      ipack: print a hex number after a 0x prefix · 9105585d
      Uwe Kleine-König authored
      
      
      It makes the result hard to interpret correctly if a base 10 number is
      prefixed by 0x.  So change to a hex number.
      
      Link: http://lkml.kernel.org/r/20161026125658.25728-4-u.kleine-koenig@pengutronix.de
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
      Cc: Jens Taprogge <jens.taprogge@taprogge.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9105585d
    • Uwe Kleine-König's avatar
      block: DAC960: print a hex number after a 0x prefix · ee52c44d
      Uwe Kleine-König authored
      
      
      It makes the message hard to interpret correctly if a base 10 number is
      prefixed by 0x.  So change to a hex number.
      
      Link: http://lkml.kernel.org/r/20161026125658.25728-3-u.kleine-koenig@pengutronix.de
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ee52c44d
    • Uwe Kleine-König's avatar
      fs: exofs: print a hex number after a 0x prefix · 14f947c8
      Uwe Kleine-König authored
      
      
      It makes the message hard to interpret correctly if a base 10 number is
      prefixed by 0x.  So change to a hex number.
      
      Link: http://lkml.kernel.org/r/20161026125658.25728-2-u.kleine-koenig@pengutronix.de
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Cc: Boaz Harrosh <ooo@electrozaur.com>
      Cc: Benny Halevy <bhalevy@primarydata.com>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      14f947c8
    • Daniel Mentz's avatar
      lib/genalloc.c: start search from start of chunk · 62e931fa
      Daniel Mentz authored
      gen_pool_alloc_algo() iterates over the chunks of a pool trying to find
      a contiguous block of memory that satisfies the allocation request.
      
      The shortcut
      
      	if (size > atomic_read(&chunk->avail))
      		continue;
      
      makes the loop skip over chunks that do not have enough bytes left to
      fulfill the request.  There are two situations, though, where an
      allocation might still fail:
      
      (1) The available memory is not contiguous, i.e.  the request cannot
          be fulfilled due to external fragmentation.
      
      (2) A race condition.  Another thread runs the same code concurrently
          and is quicker to grab the available memory.
      
      In those situations, the loop calls pool->algo() to search the entire
      chunk, and pool->algo() returns some value that is >= end_bit to
      indicate that the search failed.  This return value is then assigned to
      start_bit.  The variables start_bit and end_bit describe the range that
      should be searched, and this range should be reset for every chunk that
      is searched.  Today, the code fails to reset start_bit to 0.  As a
      result, prefixes of subsequent chunks are ignored.  Memory allocations
      might fail even though there is plenty of room left in these prefixes of
      those other chunks.
      
      Fixes: 7f184275 ("lib, Make gen_pool memory allocator lockless")
      Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.com
      Signed-off-by: Daniel Mentz <danielmentz@google.com>
      Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      62e931fa
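The start_bit problem described in the genalloc commit above can be reproduced with a small user-space model. This is a sketch under stated assumptions, not the code from lib/genalloc.c: `struct sketch_chunk`, `sketch_first_fit()` and `sketch_pool_alloc()` are hypothetical names, each chunk is a fixed array of 8 flags, and the `reset_per_chunk` parameter exists only to contrast the buggy and the fixed behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_CHUNK_BITS 8

struct sketch_chunk {
	bool used[SKETCH_CHUNK_BITS];	/* one flag per allocation unit */
};

/*
 * First-fit search in [start, end): returns the first index of a free
 * run of length n, or a value >= end when no such run exists.
 */
static size_t sketch_first_fit(const struct sketch_chunk *c, size_t start,
			       size_t end, size_t n)
{
	size_t run = 0;

	for (size_t i = start; i < end; i++) {
		run = c->used[i] ? 0 : run + 1;
		if (run == n)
			return i + 1 - n;
	}
	return end;
}

/*
 * Walk the chunks and allocate n units from the first chunk that fits.
 * Returns chunk_index * SKETCH_CHUNK_BITS + bit, or -1 on failure.
 * reset_per_chunk == false models the bug: a failed search leaves
 * start_bit at end_bit, so the next chunk is searched from its end.
 */
static long sketch_pool_alloc(struct sketch_chunk *chunks, size_t nchunks,
			      size_t n, bool reset_per_chunk)
{
	size_t start_bit = 0;

	assert(n > 0 && n <= SKETCH_CHUNK_BITS);
	for (size_t ci = 0; ci < nchunks; ci++) {
		size_t end_bit = SKETCH_CHUNK_BITS;

		if (reset_per_chunk)
			start_bit = 0;	/* the fix: restart at the chunk start */
		start_bit = sketch_first_fit(&chunks[ci], start_bit, end_bit, n);
		if (start_bit >= end_bit)
			continue;	/* search failed, try the next chunk */
		for (size_t i = 0; i < n; i++)
			chunks[ci].used[start_bit + i] = true;
		return (long)(ci * SKETCH_CHUNK_BITS + start_bit);
	}
	return -1;
}
```

If the first chunk is fragmented so that no run of two units is free and the second chunk is empty, the buggy variant fails even though the second chunk has plenty of room, while the variant that resets start_bit allocates from the start of the second chunk.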
    • Johannes Weiner's avatar
      mm: memcontrol: do not recurse in direct reclaim · 89a28483
      Johannes Weiner authored
      
      
      On 4.0, we saw a stack corruption from a page fault entering direct
      memory cgroup reclaim, calling into btrfs_releasepage(), which then
      tried to allocate an extent and recursed back into a kmem charge ad
      nauseam:
      
        [...]
        btrfs_releasepage+0x2c/0x30
        try_to_release_page+0x32/0x50
        shrink_page_list+0x6da/0x7a0
        shrink_inactive_list+0x1e5/0x510
        shrink_lruvec+0x605/0x7f0
        shrink_zone+0xee/0x320
        do_try_to_free_pages+0x174/0x440
        try_to_free_mem_cgroup_pages+0xa7/0x130
        try_charge+0x17b/0x830
        memcg_charge_kmem+0x40/0x80
        new_slab+0x2d9/0x5a0
        __slab_alloc+0x2fd/0x44f
        kmem_cache_alloc+0x193/0x1e0
        alloc_extent_state+0x21/0xc0
        __clear_extent_bit+0x2b5/0x400
        try_release_extent_mapping+0x1a3/0x220
        __btrfs_releasepage+0x31/0x70
        btrfs_releasepage+0x2c/0x30
        try_to_release_page+0x32/0x50
        shrink_page_list+0x6da/0x7a0
        shrink_inactive_list+0x1e5/0x510
        shrink_lruvec+0x605/0x7f0
        shrink_zone+0xee/0x320
        do_try_to_free_pages+0x174/0x440
        try_to_free_mem_cgroup_pages+0xa7/0x130
        try_charge+0x17b/0x830
        mem_cgroup_try_charge+0x65/0x1c0
        handle_mm_fault+0x117f/0x1510
        __do_page_fault+0x177/0x420
        do_page_fault+0xc/0x10
        page_fault+0x22/0x30
      
      On later kernels, kmem charging is opt-in rather than opt-out, and that
      particular kmem allocation in btrfs_releasepage() is no longer being
      charged and won't recurse and overrun the stack anymore.
      
      But it's not impossible for an accounted allocation to happen from the
      memcg direct reclaim context, and we needed to reproduce this crash many
      times before we even got a useful stack trace out of it.
      
      Like other direct reclaimers, mark tasks in memcg reclaim PF_MEMALLOC to
      avoid recursing into any other form of direct reclaim.  Then let
      recursive charges from PF_MEMALLOC contexts bypass the cgroup limit.
      
      Link: http://lkml.kernel.org/r/20161025141050.GA13019@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      89a28483
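The recursion guard described in the memcontrol commit above can be modelled with a per-task flag. This is a minimal sketch and not the kernel implementation: `struct sketch_task`, `SKETCH_PF_MEMALLOC`, `sketch_reclaim()` and `sketch_try_charge()` are hypothetical, and the reclaim step frees nothing; it only performs one nested charge, standing in for the allocation made by the filesystem in the trace.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PF_MEMALLOC 0x1u

struct sketch_task {
	unsigned flags;
	int reclaim_depth;	/* current nesting of reclaim */
	int max_depth;		/* deepest nesting observed */
	long charged;		/* pages charged so far */
	long limit;		/* cgroup limit in pages */
};

static bool sketch_try_charge(struct sketch_task *t, long pages);

/* Reclaim that itself needs memory, so it charges one page. */
static void sketch_reclaim(struct sketch_task *t)
{
	unsigned saved = t->flags & SKETCH_PF_MEMALLOC;

	t->flags |= SKETCH_PF_MEMALLOC;	/* mark the task as reclaiming */
	t->reclaim_depth++;
	if (t->reclaim_depth > t->max_depth)
		t->max_depth = t->reclaim_depth;
	sketch_try_charge(t, 1);	/* nested, accounted allocation */
	t->reclaim_depth--;
	t->flags = (t->flags & ~SKETCH_PF_MEMALLOC) | saved;
}

static bool sketch_try_charge(struct sketch_task *t, long pages)
{
	assert(pages > 0);
	if (t->charged + pages <= t->limit) {
		t->charged += pages;
		return true;
	}
	if (t->flags & SKETCH_PF_MEMALLOC) {
		/* Already reclaiming: bypass the limit instead of recursing. */
		t->charged += pages;
		return true;
	}
	sketch_reclaim(t);
	if (t->charged + pages <= t->limit) {
		t->charged += pages;
		return true;
	}
	return false;
}
```

Without the flag check, the nested charge would enter sketch_reclaim() again and the nesting would be unbounded; with it, the depth never exceeds one.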
    • Martin Kepplinger's avatar
      CREDITS: update credit information for Martin Kepplinger · 8f72cb4e
      Martin Kepplinger authored
      
      
      Content and employer changed.
      
      Link: http://lkml.kernel.org/r/1477304102-28830-1-git-send-email-martin.kepplinger@ginzinger.com
      Signed-off-by: Martin Kepplinger <martink@posteo.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8f72cb4e
    • Leon Yu's avatar
      proc: fix NULL dereference when reading /proc/<pid>/auxv · 06b2849d
      Leon Yu authored
      Reading auxv of any kernel thread results in NULL pointer dereferencing
      in auxv_read() where mm can be NULL.  Fix that by checking for NULL mm
      and bailing out early.  This is also the original behavior changed by
      recent commit c5317167 ("proc: switch auxv to use of __mem_open()").
      
        # cat /proc/2/auxv
        Unable to handle kernel NULL pointer dereference at virtual address 000000a8
        Internal error: Oops: 17 [#1] PREEMPT SMP ARM
        CPU: 3 PID: 113 Comm: cat Not tainted 4.9.0-rc1-ARCH+ #1
        Hardware name: BCM2709
        task: ea3b0b00 task.stack: e99b2000
        PC is at auxv_read+0x24/0x4c
        LR is at do_readv_writev+0x2fc/0x37c
        Process cat (pid: 113, stack limit = 0xe99b2210)
        Call chain:
          auxv_read
          do_readv_writev
          vfs_readv
          default_file_splice_read
          splice_direct_to_actor
          do_splice_direct
          do_sendfile
          SyS_sendfile64
          ret_fast_syscall
      
      Fixes: c5317167 ("proc: switch auxv to use of __mem_open()")
      Link: http://lkml.kernel.org/r/1476966200-14457-1-git-send-email-chianglungyu@gmail.com
      Signed-off-by: Leon Yu <chianglungyu@gmail.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Janis Danisevskis <jdanis@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      06b2849d
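The early bail-out described in the auxv commit above follows a common pattern: check the pointer before the first dereference and report an empty read. The following is a user-space sketch only; `struct sketch_mm` and `sketch_auxv_read()` are hypothetical, and a NULL `mm` is how the sketch models a kernel thread that has no address space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the per-process memory descriptor. */
struct sketch_mm {
	unsigned long auxv[4];
};

/*
 * Copy up to count bytes of the auxiliary vector into buf.
 * Returns the number of bytes copied, or 0 when there is no mm.
 */
static size_t sketch_auxv_read(const struct sketch_mm *mm, void *buf,
			       size_t count)
{
	assert(buf != NULL);
	if (!mm)
		return 0;	/* bail out early instead of dereferencing NULL */
	if (count > sizeof(mm->auxv))
		count = sizeof(mm->auxv);
	memcpy(buf, mm->auxv, count);
	return count;
}
```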
    • Catalin Marinas's avatar
      mm: kmemleak: ensure that the task stack is not freed during scanning · 37df49f4
      Catalin Marinas authored
      Commit 68f24b08 ("sched/core: Free the stack early if
      CONFIG_THREAD_INFO_IN_TASK") may cause the task->stack to be freed
      during kmemleak_scan() execution, leading to either a NULL pointer fault
      (if task->stack is NULL) or kmemleak accessing already freed memory.
      
      This patch uses the new try_get_task_stack() API to ensure that the task
      stack is not freed during kmemleak stack scanning.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=173901.
      
      Fixes: 68f24b08 ("sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK")
      Link: http://lkml.kernel.org/r/1476266223-14325-1-git-send-email-catalin.marinas@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Reported-by: CAI Qian <caiqian@redhat.com>
      Tested-by: CAI Qian <caiqian@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: CAI Qian <caiqian@redhat.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      37df49f4
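The kmemleak commit above relies on a "try to get a reference, skip the object if it is already gone" pattern. The sketch below shows that pattern with C11 atomics in user space. It is not the kernel's try_get_task_stack(): `struct sketch_stack` and the three `sketch_*` functions are hypothetical, and a reference count of 0 is assumed to mean that the stack is being freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_stack {
	atomic_int refs;	/* 0 means the stack is already being freed */
};

/* Take a reference only if the object is still live. */
static bool sketch_try_get_stack(struct sketch_stack *s)
{
	int old = atomic_load(&s->refs);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&s->refs, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference taken by sketch_try_get_stack(). */
static void sketch_put_stack(struct sketch_stack *s)
{
	int prev = atomic_fetch_sub(&s->refs, 1);

	assert(prev > 0);
	(void)prev;
}

/* Scan the stack only while holding a reference; true if scanned. */
static bool sketch_scan_task_stack(struct sketch_stack *s)
{
	if (!sketch_try_get_stack(s))
		return false;	/* stack is gone, skip it */
	/* ... the scan itself would go here ... */
	sketch_put_stack(s);
	return true;
}
```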
    • Dmitry Vyukov's avatar
      lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB · 02754e0a
      Dmitry Vyukov authored
      
      
      KASAN uses stackdepot to memorize stacks for all kmalloc/kfree calls.
      Current stackdepot capacity is 16MB (1024 top level entries x 4 pages on
      second level).  Size of each stack is (num_frames + 3) * sizeof(long).
      Which gives us ~84K stacks.  This capacity was chosen empirically and it
      is enough to run kernel normally.
      
      However, when lots of configs are enabled and a fuzzer tries to maximize
      code coverage, it easily hits the limit within tens of minutes.  I've
      tested for a long time with the number of top level entries bumped 4x
      (4096).  And I think I've seen overflow only once.  But I don't have all
      configs enabled and code coverage has not reached maximum yet.  So bump
      it 8x to 8192.
      
      Since we have two-level table, memory cost of this is very moderate --
      currently the top-level table is 8KB, with this patch it is 64KB, which
      is negligible under KASAN.
      
      Here is some approx math.
      
      128MB allows us to memorize ~670K stacks (assuming stack is ~200b).
      I've grepped kernel for kmalloc|kfree|kmem_cache_alloc|kmem_cache_free|
      kzalloc|kstrdup|kstrndup|kmemdup and it gives ~60K matches.  Most of
      alloc/free call sites are reachable with only one stack.  But some
      utility functions can have large fanout.  Assuming average fanout is 5x,
      total number of alloc/free stacks is ~300K.
      
      Link: http://lkml.kernel.org/r/1476458416-122131-1-git-send-email-dvyukov@google.com
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Baozeng Ding <sploving1@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      02754e0a
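The capacity figures quoted in the stackdepot commit above can be checked with a few lines of arithmetic. The sketch assumes 4 KiB pages and 8-byte pointers, which is what the quoted 16MB and 8KB figures imply; the macro and function names are hypothetical, and the 200-byte stack size is the estimate given in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions implied by the figures in the message. */
#define SKETCH_PAGE_BYTES	4096u
#define SKETCH_PTR_BYTES	8u
#define SKETCH_PAGES_PER_SLAB	4u

/* Total second-level storage for a given number of top-level entries. */
static uint64_t sketch_depot_capacity(uint64_t top_level_entries)
{
	return top_level_entries * SKETCH_PAGES_PER_SLAB * SKETCH_PAGE_BYTES;
}

/* Size of the top-level pointer table itself. */
static uint64_t sketch_top_table_bytes(uint64_t top_level_entries)
{
	return top_level_entries * SKETCH_PTR_BYTES;
}

/* How many stacks of a given size fit into the capacity. */
static uint64_t sketch_stacks_that_fit(uint64_t capacity, uint64_t bytes_per_stack)
{
	assert(bytes_per_stack > 0);
	return capacity / bytes_per_stack;
}
```

With 1024 entries this gives 16 MiB of storage, an 8 KiB table and about 84K stacks of 200 bytes; with 8192 entries it gives 128 MiB, a 64 KiB table and about 671K stacks, which matches the numbers in the message and leaves headroom over the estimated ~300K distinct alloc/free stacks.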
    • Kees Cook's avatar
      latent_entropy: raise CONFIG_FRAME_WARN by default · 0e07f663
      Kees Cook authored
      
      
      When building with the latent_entropy plugin, set the default
      CONFIG_FRAME_WARN to 2048, since some __init functions have many basic
      blocks that, when instrumented by the latent_entropy plugin, grow beyond
      1024 byte stack size on 32-bit builds.
      
      Link: http://lkml.kernel.org/r/20161018211216.GA39687@beast
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Cc: Emese Revfy <re.emese@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0e07f663
    • Masahiro Yamada's avatar
      kconfig.h: remove config_enabled() macro · c0a0aba8
      Masahiro Yamada authored
      
      
      The use of config_enabled() is ambiguous.  For config options,
      IS_ENABLED(), IS_REACHABLE(), etc.  will make intention clearer.
      Sometimes config_enabled() has been used for non-config options because
      it is useful to check whether the given symbol is defined or not.
      
      I have been working on deprecating config_enabled(), and now is the
      time to finish this work.
      
      Some new users have appeared for v4.9-rc1, but it is trivial to replace
      them:
      
       - arch/x86/mm/kaslr.c
        replace config_enabled() with IS_ENABLED() because
        CONFIG_X86_ESPFIX64 and CONFIG_EFI are boolean.
      
       - include/asm-generic/export.h
        replace config_enabled() with __is_defined().
      
      Then, config_enabled() can be removed now.
      
      Going forward, please use IS_ENABLED(), IS_REACHABLE(), etc. for config
      options, and __is_defined() for non-config symbols.
      
      Link: http://lkml.kernel.org/r/1476616078-32252-1-git-send-email-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
      Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c0a0aba8
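Checking "is this symbol defined to 1?" inside a C expression, as discussed in the kconfig.h commit above, can be done with a preprocessor placeholder trick. The sketch below is modelled on that idea but uses hypothetical `SKETCH_*` names rather than the kernel's macros, and it passes an extra dummy argument so that the variadic macro is never invoked with an empty argument list.

```c
#include <assert.h>

/*
 * If val expands to 1, SKETCH_ARG_PLACEHOLDER_##val becomes "0," and
 * shifts a 1 into the second argument position; otherwise the pasted
 * token is junk and the second argument is 0.
 */
#define SKETCH_ARG_PLACEHOLDER_1 0,
#define SKETCH_TAKE_SECOND(ignored, val, ...) val
#define SKETCH_IS_DEFINED(x) SKETCH_IS_DEFINED_1(x)
#define SKETCH_IS_DEFINED_1(val) SKETCH_IS_DEFINED_2(SKETCH_ARG_PLACEHOLDER_##val)
#define SKETCH_IS_DEFINED_2(arg1_or_junk) SKETCH_TAKE_SECOND(arg1_or_junk 1, 0, 0)

#define SKETCH_FEATURE_A 1
/* SKETCH_FEATURE_B is intentionally left undefined. */

/* Evaluates to 1 because SKETCH_FEATURE_A is defined to 1. */
static int sketch_feature_a_enabled(void)
{
	return SKETCH_IS_DEFINED(SKETCH_FEATURE_A);
}

/* Evaluates to 0 because SKETCH_FEATURE_B is not defined. */
static int sketch_feature_b_enabled(void)
{
	return SKETCH_IS_DEFINED(SKETCH_FEATURE_B);
}
```

Because the result is an ordinary integer constant expression, it can be used in `if` conditions, so both branches are still compiled and type-checked, unlike with `#ifdef`.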
    • Aristeu Rozanski's avatar
      ipc: account for kmem usage on mqueue and msg · 8c8d4d45
      Aristeu Rozanski authored
      
      
      When kmem accounting switched from account by default to only account if
      flagged by __GFP_ACCOUNT, IPC mqueue and messages were left out.
      
      The production use case at hand is that mqueues should be customizable
      via sysctls in Docker containers in a Kubernetes cluster.  This can only
      be safely allowed to the users of the cluster (without the risk that
      they can cause resource shortage on a node, influencing other users'
      containers) if all resources they control are bounded, i.e.  accounted
      for.
      
      Link: http://lkml.kernel.org/r/1476806075-1210-1-git-send-email-arozansk@redhat.com
      Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
      Reported-by: Stefan Schimanski <sttts@redhat.com>
      Acked-by: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Stefan Schimanski <sttts@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8c8d4d45
    • Aruna Ramakrishna's avatar
      mm/slab: improve performance of gathering slabinfo stats · 07a63c41
      Aruna Ramakrishna authored
      
      
      On large systems, when some slab caches grow to millions of objects (and
      many gigabytes), running 'cat /proc/slabinfo' can take up to 1-2
      seconds.  During this time, interrupts are disabled while walking the
      slab lists (slabs_full, slabs_partial, and slabs_free) for each node,
      and this sometimes causes timeouts in other drivers (for instance,
      Infiniband).
      
      This patch optimizes 'cat /proc/slabinfo' by maintaining a counter for
      total number of allocated slabs per node, per cache.  This counter is
      updated when a slab is created or destroyed.  This enables us to skip
      traversing the slabs_full list while gathering slabinfo statistics, and
      since slabs_full tends to be the biggest list when the cache is large,
      it results in a dramatic performance improvement.  Getting slabinfo
      statistics now only requires walking the slabs_free and slabs_partial
      lists, and those lists are usually much smaller than slabs_full.
      
      We tested this after growing the dentry cache to 70GB, and the
      performance improved from 2s to 5ms.
      
      Link: http://lkml.kernel.org/r/1472517876-26814-1-git-send-email-aruna.ramakrishna@oracle.com
      Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      07a63c41
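The counting idea in the slabinfo commit above is that the number of full slabs can be derived from a maintained total instead of being counted by a list walk. The sketch below illustrates this in user space; `struct sketch_node`, `struct sketch_slab` and the helper names are hypothetical, and the real kernel data structures differ.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_slab {
	struct sketch_slab *next;
};

struct sketch_node {
	struct sketch_slab *slabs_partial;	/* usually short */
	struct sketch_slab *slabs_free;		/* usually short */
	size_t total_slabs;	/* updated when a slab is created or destroyed */
};

/* Length of a singly linked slab list. */
static size_t sketch_list_len(const struct sketch_slab *s)
{
	size_t n = 0;

	for (; s; s = s->next)
		n++;
	return n;
}

/* Number of full slabs, derived without walking a slabs_full list. */
static size_t sketch_count_full(const struct sketch_node *node)
{
	size_t partial = sketch_list_len(node->slabs_partial);
	size_t unused = sketch_list_len(node->slabs_free);

	assert(partial + unused <= node->total_slabs);
	return node->total_slabs - partial - unused;
}
```

The cost of gathering statistics then depends only on the lengths of the partial and free lists, not on the much longer full list.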
    • Joe Perches's avatar
      mm: page_alloc: use KERN_CONT where appropriate · 1f84a18f
      Joe Perches authored
      Recent changes to printk require KERN_CONT uses to continue logging
      messages.  So add KERN_CONT where necessary.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Fixes: 4bcc595c ("printk: reinstate KERN_CONT for printing continuation lines")
      Link: http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git.joe@perches.com
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Joe Perches <joe@perches.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1f84a18f
    • Alexander Polakov's avatar
      mm/list_lru.c: avoid error-path NULL pointer deref · 1bc11d70
      Alexander Polakov authored
      
      
      As described in https://bugzilla.kernel.org/show_bug.cgi?id=177821:
      
      After some analysis, the problem seems to be in alloc_super().
      In case list_lru_init_memcg() fails it goes into destroy_super(), which
      calls list_lru_destroy().
      
      And in list_lru_init() we see that in case memcg_init_list_lru() fails,
      lru->node is freed, but not set NULL, which then leads list_lru_destroy()
      to believe it is initialized and call memcg_destroy_list_lru().
      memcg_destroy_list_lru() in turn can access lru->node[i].memcg_lrus,
      which is NULL.
      
      [akpm@linux-foundation.org: add comment]
      Signed-off-by: Alexander Polakov <apolyakov@beget.ru>
      Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1bc11d70
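The error path described in the list_lru commit above is an instance of a general rule: when an init function frees a pointer on failure, it should also reset it to NULL so that a later destroy call can tell that nothing is initialized. The sketch below is a user-space illustration, not the mm/list_lru.c code: `struct sketch_lru`, `sketch_lru_init()` and `sketch_lru_destroy()` are hypothetical, and the `second_stage_fails` parameter simulates the failing per-memcg setup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_lru {
	int *node;	/* NULL means "not initialized" */
};

/* Returns 0 on success, -1 on failure. */
static int sketch_lru_init(struct sketch_lru *lru, size_t nr_nodes,
			   bool second_stage_fails)
{
	assert(nr_nodes > 0);
	lru->node = calloc(nr_nodes, sizeof(*lru->node));
	if (!lru->node)
		return -1;
	if (second_stage_fails) {
		free(lru->node);
		/* Without this reset, destroy would see a stale pointer. */
		lru->node = NULL;
		return -1;
	}
	return 0;
}

/* Returns true if something was actually torn down. */
static bool sketch_lru_destroy(struct sketch_lru *lru)
{
	if (!lru->node)
		return false;	/* never initialized, or init failed */
	free(lru->node);
	lru->node = NULL;
	return true;
}
```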
    • Mark Rutland's avatar
      h8300: fix syscall restarting · 21753583
      Mark Rutland authored
      Back in commit f56141e3 ("all arches, signal: move restart_block to
      struct task_struct"), all architectures and core code were changed to
      use task_struct::restart_block.  However, when h8300 support was
      subsequently restored in v4.2, it was not updated to account for this,
      and maintains thread_info::restart_block, which is not kept in sync.
      
      This patch drops the redundant restart_block from thread_info, and moves
      h8300 to the common one in task_struct, ensuring that syscall restarting
      always works as expected.
      
      Fixes: f56141e3 ("all arches, signal: move restart_block to struct task_struct")
      Link: http://lkml.kernel.org/r/1476714934-11635-1-git-send-email-mark.rutland@arm.com
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: uclinux-h8-devel@lists.sourceforge.jp
      Cc: <stable@vger.kernel.org>	[4.2+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      21753583
    • Andrey Konovalov's avatar
      kcov: properly check if we are in an interrupt · b274c0bb
      Andrey Konovalov authored
      
      
      in_interrupt() returns a nonzero value when we are either in an
      interrupt or have bh disabled via local_bh_disable().  Since we are
      interested in only ignoring coverage from actual interrupts, do a proper
      check instead of just calling in_interrupt().
      
      As a result of this change, kcov will start to collect coverage from
      within local_bh_disable()/local_bh_enable() sections.
      
      Link: http://lkml.kernel.org/r/1476115803-20712-1-git-send-email-andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Acked-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Nicolai Stange <nicstange@gmail.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b274c0bb
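The distinction drawn in the kcov commit above, between "serving an interrupt" and "bottom halves merely disabled", can be shown with a bit-mask model. The bit layout below is an assumption modelled on the kernel's preempt_count encoding, and all `SKETCH_*` names and both functions are hypothetical; the point is only that disabling bottom halves touches the softirq field without setting its lowest bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit layout of the counter. */
#define SKETCH_SOFTIRQ_SHIFT	8
#define SKETCH_HARDIRQ_SHIFT	16
#define SKETCH_NMI_SHIFT	20

#define SKETCH_SOFTIRQ_OFFSET	(1u << SKETCH_SOFTIRQ_SHIFT)
#define SKETCH_SOFTIRQ_MASK	(0xffu << SKETCH_SOFTIRQ_SHIFT)
#define SKETCH_HARDIRQ_MASK	(0xfu << SKETCH_HARDIRQ_SHIFT)
#define SKETCH_NMI_MASK		(1u << SKETCH_NMI_SHIFT)
/* Disabling bottom halves is modelled as adding twice the softirq offset. */
#define SKETCH_BH_DISABLE_OFFSET (2u * SKETCH_SOFTIRQ_OFFSET)

/* Broad check: also true when bottom halves are only disabled. */
static bool sketch_in_interrupt(unsigned count)
{
	return (count & (SKETCH_HARDIRQ_MASK | SKETCH_SOFTIRQ_MASK |
			 SKETCH_NMI_MASK)) != 0;
}

/* Narrow check: true only while actually serving an interrupt. */
static bool sketch_in_real_interrupt(unsigned count)
{
	return (count & (SKETCH_HARDIRQ_MASK | SKETCH_SOFTIRQ_OFFSET |
			 SKETCH_NMI_MASK)) != 0;
}
```

With the narrow check, code running with bottom halves disabled is no longer treated as interrupt context, so its coverage would be collected.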
    • Joonsoo Kim's avatar
      mm/slab: fix kmemcg cache creation delayed issue · 86d9f485
      Joonsoo Kim authored
      There is a bug report that SLAB causes an extreme load average due to
      over 2000 kworker threads.
      
        https://bugzilla.kernel.org/show_bug.cgi?id=172981
      
      This issue is caused by the kmemcg feature, which tries to create a new
      set of kmem_caches for each memcg.  Recently, kmem_cache creation has
      been slowed by synchronize_sched(), and further kmem_cache creation is
      also delayed since kmem_cache creation is serialized by the global
      slab_mutex lock.  So, the number of kworkers that try to create a
      kmem_cache increases quietly.
      
      synchronize_sched() is for lockless access to node's shared array but
      it's not needed when a new kmem_cache is created.  So, this patch rules
      out that case.
      
      Fixes: 801faf0d ("mm/slab: lockless decision to grow cache")
      Link: http://lkml.kernel.org/r/1475734855-4837-1-git-send-email-iamjoonsoo.kim@lge.com
      Reported-by: Doug Smythies <dsmythies@telus.net>
      Tested-by: Doug Smythies <dsmythies@telus.net>
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      86d9f485
    • Linus Torvalds's avatar
      Allow KASAN and HOTPLUG_MEMORY to co-exist when doing build testing · 67463e54
      Linus Torvalds authored
      No, KASAN may not be able to co-exist with HOTPLUG_MEMORY at runtime,
      but for build testing there is no reason not to allow them together.
      
      This hopefully means better build coverage and fewer embarrassing silly
      problems like the one fixed by commit 9db4f36e ("mm: remove unused
      variable in memory hotplug") in the future.
      
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      67463e54
    • Linus Torvalds's avatar
      mm: remove unused variable in memory hotplug · 9db4f36e
      Linus Torvalds authored
      When I removed the per-zone bitlock hashed waitqueues in commit
      9dcb8b68 ("mm: remove per-zone hashtable of bitlock waitqueues"), I
      removed all the magic hotplug memory initialization of said waitqueues
      too.
      
      But when I actually _tested_ the resulting build, I stupidly assumed
      that "allmodconfig" would enable memory hotplug.  And it doesn't,
      because it enables KASAN instead, which then disables hotplug memory
      support.
      
      As a result, my build test of the per-zone waitqueues was totally
      broken, and I didn't notice that the compiler warns about the now unused
      iterator variable 'i'.
      
      I guess I should be happy that that seems to be the worst breakage from
      my clearly horribly failed test coverage.
      
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9db4f36e
    • Linus Torvalds's avatar
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 4e68af0b
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "I2C has some driver bugfixes, module autoload fixes, and driver
        enablement on some architectures"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: imx: defer probe if bus recovery GPIOs are not ready
        i2c: designware: Avoid aborted transfers with fast reacting I2C slaves
        i2c: i801: Fix I2C Block Read on 8-Series/C220 and later
        i2c: xgene: Avoid dma_buffer overrun
        i2c: digicolor: Fix module autoload
        i2c: xlr: Fix module autoload for OF registration
        i2c: xlp9xx: Fix module autoload
        i2c: jz4780: Fix module autoload
        i2c: allow configuration of imx driver for ColdFire architecture
        i2c: mark device nodes only in case of successful instantiation
        i2c: rk3x: Give the tuning value 0 during rk3x_i2c_v0_calc_timings
        i2c: hix5hd2: allow build with ARCH_HISI
    • Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux · 7f2145b0
      Linus Torvalds authored
      Pull thermal updates from Zhang Rui:
       "The latest Thermal Management updates for v4.9-rc3:
      
         - Fix a regression introduced by commit b721ca0d
           ("thermal/powerclamp: remove cpu whitelist"), where the
           powerclamp driver checks cpu support in a wrong way. From: Eric
           Ernst.
      
         - Fix a problem that intel_pch_thermal driver misses passive trip
           point when the PCH thermal device has an ACPI companion device
           associated. From: Srinivas Pandruvada.
      
         - Add missing support for Haswell PCH thermal sensor. From: Srinivas
           Pandruvada"
      
      * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
        thermal/powerclamp: correct cpu support check
        thermal: intel_pch_thermal: Enable Haswell PCH
        thermal: intel_pch_thermal: Add an ACPI passive trip
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 55bea71e
      Linus Torvalds authored
      Pull s390 fixes from Martin Schwidefsky:
       "A few more s390 patches for 4.9:
         - a fix for an overflow in the dasd driver reported by UBSAN
         - fix a regression and add hotplug memory to the zone movable again
         - add ignore defines for the pkey system calls
         - fix the output of the merged stack tracer
         - replace printk with pr_cont in arch/s390 where appropriate
         - remove the arch specific return_address function again
         - ignore reserved channel paths at boot time
         - add a missing hugetlb_bad_size call to the arch backend"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/mm: fix zone calculation in arch_add_memory()
        s390/dumpstack: use pr_cont within show_stack and die
        s390/dumpstack: get rid of return_address again
        s390/disassambler: use pr_cont where appropriate
        s390/dumpstack: use pr_cont where appropriate
        s390/dumpstack: restore reliable indicator for call traces
        s390/mm: use hugetlb_bad_size()
        s390/cio: don't register chpids in reserved state
        s390: ignore pkey system calls
        s390/dasd: avoid undefined behaviour
    • Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux · 7618c6a1
      Linus Torvalds authored
      Pull module maintainership updates from Rusty Russell:
       "(Quoting from the MAINTAINERS commit:)
      
        Being a Linux kernel maintainer has been my proudest professional
        accomplishment, spanning the last 19 years. But now we have a surfeit
        of excellent hackers, and I can hand this over without regret.
      
        I'll still be around as co-maintainer for another cycle, but Jessica
        is now the one to convince if you want your patches applied. She
        rocks, and is far more timely than me too!"
      
      * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
        MAINTAINERS: Begin module maintainer transition
    • Merge tag 'for-linus-4.9-rc2-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux · e3300ffe
      Linus Torvalds authored
      Pull orangefs updates from Mike Marshall:
       "A couple of orangefs cleanups sent in by other developers:
      
         - use d_fsdata instead of d_time (Miklos Szeredi)
      
         - use file_inode(file) instead of file->f_path.dentry->d_inode (Amir
           Goldstein)"
      
      * tag 'for-linus-4.9-rc2-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
        orangefs: don't use d_time
        orangefs: user file_inode() where it is due
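The second orangefs cleanup replaces a three-step pointer chase with the file_inode() accessor. The following is a minimal userspace sketch of that idea, not the kernel's headers: the struct layouts are cut down to the fields involved, and the helper ino_of() is a hypothetical name introduced only for illustration.

```c
#include <assert.h>

/* Cut-down stand-ins for the kernel structures (illustration only). */
struct inode  { unsigned long i_ino; };
struct dentry { struct inode *d_inode; };
struct path   { struct dentry *dentry; };
struct file {
    struct path   f_path;
    struct inode *f_inode;   /* inode pointer cached in the file itself */
};

/* Accessor in the style of the kernel's file_inode(): one load, no dentry walk. */
static inline struct inode *file_inode(const struct file *f)
{
    return f->f_inode;
}

/* Hypothetical caller: previously this would read f->f_path.dentry->d_inode. */
static unsigned long ino_of(const struct file *f)
{
    return file_inode(f)->i_ino;
}
```

In this sketch both spellings reach the same inode; the accessor simply makes the call site shorter and keeps it independent of how the dentry is laid out.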
    • Merge tag 'xfs-fixes-for-linus-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs · e890038e
      Linus Torvalds authored
      Pull xfs fixes from Dave Chinner:
       "This update contains fixes for most of the outstanding regressions
        introduced with the 4.9-rc1 XFS merge. There is also a fix for an
        iomap bug, too.
      
        This is quite a bit larger than I'd prefer for a -rc3, but most of
        the change comes from cleaning up the new reflink copy on write code;
        it's much simpler and easier to understand now. These changes fixed
        several bugs in the new code, and it wasn't clear that there was an
        easier/simpler way to fix them. The rest of the fixes are the usual
        size you'd expect at this stage.
      
        I've left the commits to soak in linux-next for some extra time
        because of the size before asking you to pull. No new problems with
        them have been reported, so I think it's all OK.
      
        Summary:
         - iomap page offset masking fix for page faults
         - add IOMAP_REPORT to distinguish between read and fiemap map
           requests
         - cleanups to new shared data extent code
         - fix mount active status on failed log recovery
         - fix broken dquots in a buffer calculation
         - fix locking order issues and merge xfs_reflink_remap_range and
           xfs_file_share_range
         - rework unmapping of CoW extents and remove now unused functions
         - clean state when CoW is done"
      
      * tag 'xfs-fixes-for-linus-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (25 commits)
        xfs: clear cowblocks tag when cow fork is emptied
        xfs: fix up inode cowblocks tracking tracepoints
        fs: Do to trim high file position bits in iomap_page_mkwrite_actor
        xfs: remove xfs_bunmapi_cow
        xfs: optimize xfs_reflink_end_cow
        xfs: optimize xfs_reflink_cancel_cow_blocks
        xfs: refactor xfs_bunmapi_cow
        xfs: optimize writes to reflink files
        xfs: don't bother looking at the refcount tree for reads
        xfs: handle "raw" delayed extents xfs_reflink_trim_around_shared
        xfs: add xfs_trim_extent
        iomap: add IOMAP_REPORT
        xfs: merge xfs_reflink_remap_range and xfs_file_share_range
        xfs: remove xfs_file_wait_for_io
        xfs: move inode locking from xfs_reflink_remap_range to xfs_file_share_range
        xfs: fix the same_inode check in xfs_file_share_range
        xfs: remove the same fs check from xfs_file_share_range
        libxfs: v3 inodes are only valid on crc-enabled filesystems
        libxfs: clean up _calc_dquots_per_chunk
        xfs: unset MS_ACTIVE if mount fails
        ...
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 18c2152d
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Three small fixes: one is a fatal section mismatch (reference to init
        after it's discarded) and the other two are iscsi locking fixes"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: NCR5380: no longer mark irq probing as __init
        scsi: be2iscsi: Replace _bh with _irqsave/irqrestore
        scsi: libiscsi: Fix locking in __iscsi_conn_send_pdu
    • Merge branch 'for-4.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata · 4a3c390c
      Linus Torvalds authored
      Pull libata fixes from Tejun Heo:
       "The AHCI MSI handling change in rc1 was a bit broken and caused disk
        probing failures on some machines.  These three patches should fix the
        issues"
      
      David Howells comments:
       "My test machine fell foul of this using a PCIe M.2-attached SSD card.
        The patches fix it for me"
      
      * 'for-4.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
        ahci: fix the single MSI-X case in ahci_init_one
        ahci: fix nvec check
        ahci: only try to use multi-MSI mode if there is more than 1 port
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 9c953d63
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A set of fixes for this series, most notably the fix for the blk-mq
        software queue regression from this merge window.
      
        Apart from that, a fix for an unlikely hang if a queue is flooded with
        FUA requests from Ming, and a few small fixes for nbd and badblocks.
        Lastly, a rename update for the proc softirq output, since the block
        polling code was made generic"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        blk-mq: update hardware and software queues for sleeping alloc
        block: flush: fix IO hang in case of flood fua req
        nbd: fix incorrect unlock of nbd->sock_lock in sock_shutdown
        badblocks: badblocks_set/clear update unacked_exist
        softirq: Display IRQ_POLL for irq-poll statistics
    • mm: remove per-zone hashtable of bitlock waitqueues · 9dcb8b68
      Linus Torvalds authored
      The per-zone waitqueues exist because of a scalability issue with the
      page waitqueues on some NUMA machines, but it turns out that they hurt
      normal loads, and now with the vmalloced stacks they also end up
      breaking gfs2 that uses a bit_wait on a stack object:
      
           wait_on_bit(&gh->gh_iflags, HIF_WAIT, TASK_UNINTERRUPTIBLE)
      
      where 'gh' can be a reference to the local variable 'mount_gh' on the
      stack of fill_super().
      
      The reason the per-zone hash table breaks for this case is that there is
      no "zone" for virtual allocations, and trying to look up the physical
      page to get at it will fail (with a BUG_ON()).
      
      It turns out that I actually complained to the mm people about the
      per-zone hash table for another reason just a month ago: the zone lookup
      also hurts the regular use of "unlock_page()" a lot, because the zone
      lookup ends up forcing several unnecessary cache misses and generates
      horrible code.
      
      As part of that earlier discussion, we had a much better solution for
      the NUMA scalability issue - by just making the page lock have a
      separate contention bit, the waitqueue doesn't even have to be looked at
      for the normal case.
      
      Peter Zijlstra already has a patch for that, but let's see if anybody
      even notices.  In the meantime, let's fix the actual gfs2 breakage by
      simplifying the bitlock waitqueues and removing the per-zone issue.
      
      Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
      Tested-by: Bob Peterson <rpeterso@redhat.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
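The simplification described in the commit above can be pictured as one global table of wait buckets, indexed by hashing the address of the word being waited on. The sketch below is a userspace illustration under stated assumptions, not the kernel code: the table size, the multiplier, and the names wait_table, hash_addr() and bucket_for() are all hypothetical, and a counter stands in for a real wait queue head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size, chosen only for illustration. */
#define WAIT_TABLE_BITS 8
#define WAIT_TABLE_SIZE (1u << WAIT_TABLE_BITS)

struct wait_bucket {
    unsigned int nr_waiters;   /* stand-in for a real wait queue head */
};

/* One table shared by every waiter, instead of one table per zone. */
static struct wait_bucket wait_table[WAIT_TABLE_SIZE];

/* Multiplicative hash of a pointer, keeping the top WAIT_TABLE_BITS bits. */
static unsigned int hash_addr(const void *addr)
{
    uint64_t v = (uint64_t)(uintptr_t)addr;

    v *= 0x9E3779B97F4A7C15ull;   /* arbitrary odd 64-bit multiplier */
    return (unsigned int)(v >> (64 - WAIT_TABLE_BITS));
}

/*
 * The bucket depends on the address alone, so the lookup works the same
 * way for a word on a stack, in static data, or on the heap; no page or
 * zone has to be found first.
 */
static struct wait_bucket *bucket_for(const void *word)
{
    return &wait_table[hash_addr(word)];
}
```

Because nothing in the lookup asks which physical page backs the address, a waiter on a stack variable (as in the gfs2 case described above) is handled like any other. Unrelated addresses may share a bucket, so a real implementation still has to match waiters against the specific word and bit when waking them.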
  2. Oct 27, 2016
  3. Oct 26, 2016
  4. Oct 25, 2016