  1. Dec 15, 2016
    • Francisco Blas Izquierdo Riera (klondike)'s avatar
      initramfs: select builtin initram compression algorithm on KConfig instead of Makefile · 35e669e1
      
      
      Move the current builtin initram compression algorithm selection from
      the Makefile into the INITRAMFS_COMPRESSION variable.  This makes
      deciding algorithm precedence easier and would allow for overrides if
      new algorithms want to be tested.
      
      Link: http://lkml.kernel.org/r/57EAD769.1090401@klondike.es
      Signed-off-by: Francisco Blas Izquierdo Riera (klondike) <klondike@klondike.es>
      Cc: P J P <ppandit@redhat.com>
      Cc: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Petr Mladek's avatar
      kdb: call vkdb_printf() from vprintk_default() only when wanted · 34aaff40
      Petr Mladek authored
      
      
      kdb_trap_printk allows passing normal printk() messages to kdb via
      vkdb_printf().  For example, it is used to get a backtrace using the
      classic show_stack(), see kdb_show_stack().
      
      vkdb_printf() tries to avoid a potential infinite loop by disabling the
      trap.  But this approach is racy, for example:
      
      CPU1					CPU2
      
      vkdb_printf()
        // assume that kdb_trap_printk == 0
        saved_trap_printk = kdb_trap_printk;
        kdb_trap_printk = 0;
      
      					kdb_show_stack()
      					  kdb_trap_printk++;
      
      Problem1: Now, a nested printk() on CPU1 calls vkdb_printf()
                even when it should have been disabled. It will not
                cause a deadlock but...
      
         // using the outdated saved value: 0
         kdb_trap_printk = saved_trap_printk;
      
      					  kdb_trap_printk--;
      
      Problem2: Now, kdb_trap_printk == -1 and will stay like this.
         It means that all messages will get passed to kdb from
         now on.
      
      This patch removes the racy saved_trap_printk handling.  Instead, the
      recursion is prevented by a check for the locked CPU.
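
The check for the locked CPU can be pictured with a small userspace model. This is only a sketch: the variable names follow the commit text, while `should_trap_printk()` is a hypothetical helper and not the kernel's actual code.

```c
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel source. */
static int kdb_trap_printk;      /* non-zero while a kdb command wants printk() trapped */
static int kdb_printf_cpu = -1;  /* CPU currently inside vkdb_printf(), or -1 if none */

/*
 * Decide whether a printk() message should be redirected to kdb.
 * Trapping is skipped while some CPU already owns vkdb_printf(),
 * which is what prevents printk() recursion in this model.
 */
static bool should_trap_printk(void)
{
	return kdb_trap_printk != 0 && kdb_printf_cpu < 0;
}
```

Because the decision depends only on the current owner, there is no saved value that another CPU could make stale.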
      
      The solution is still kind of racy.  A non-related printk(), from
      another process, might get trapped by vkdb_printf().  And the wanted
      printk() might not get trapped because kdb_printf_cpu is assigned.  But
      this problem existed even with the original code.
      
      A proper solution would be to get_cpu() before setting kdb_trap_printk
      and trap messages only from this CPU.  I am not sure if it is worth the
      effort, though.
      
      In fact, the race is very theoretical.  When kdb is running any of the
      commands that use kdb_trap_printk there is a single active CPU and the
      other CPUs should be in a holding pen inside kgdb_cpu_enter().
      
      The only time this is violated is when there is a timeout waiting for
      the other CPUs to report to the holding pen.
      
      Finally, note that the situation is a bit schizophrenic.  vkdb_printf()
      explicitly allows recursion but only from KDB code that calls
      kdb_printf() directly.  On the other hand, the generic printk()
      recursion is not allowed because it might cause an infinite loop.  This
      is why we could not hide the decision inside vkdb_printf() easily.
      
      Link: http://lkml.kernel.org/r/1480412276-16690-4-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Petr Mladek's avatar
      kdb: properly synchronize vkdb_printf() calls with other CPUs · d5d8d3d0
      Petr Mladek authored
      
      
      kdb_printf_lock does not prevent other CPUs from entering the critical
      section because it is ignored when KDB_STATE_PRINTF_LOCK is set.
      
      The problematic situation might look like:
      
      CPU0					CPU1
      
      vkdb_printf()
        if (!KDB_STATE(PRINTF_LOCK))
          KDB_STATE_SET(PRINTF_LOCK);
          spin_lock_irqsave(&kdb_printf_lock, flags);
      
      					vkdb_printf()
      					  if (!KDB_STATE(PRINTF_LOCK))
      
      BANG: The PRINTF_LOCK state is set and CPU1 is entering the critical
      section without spinning on the lock.
      
      The problem is that the code tries to implement locking using two state
      variables that are not handled atomically.  We need custom locking
      because we want to allow reentering the critical section on the very
      same CPU.

      Let's use the solution from Peter Zijlstra that was proposed for a
      similar scenario, see
      https://lkml.kernel.org/r/20161018171513.734367391@infradead.org
      
      This patch uses the same trick with cmpxchg().  The only difference is
      that we want to handle only recursion from the same context and
      therefore we disable interrupts.
      
      In addition, KDB_STATE_PRINTF_LOCK is removed.  In fact, we are not able
      to set it in a non-racy way.
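
A minimal sketch of an owner-based lock built on compare-and-exchange, written with C11 atomics so it can be tried in userspace. The names `printf_owner`, `printf_lock_enter()` and `printf_lock_exit()` are hypothetical, and the model omits the interrupt disabling that the commit message describes.

```c
#include <stdatomic.h>

#define NO_OWNER (-1)

/* CPU id that currently owns the critical section, or NO_OWNER. */
static atomic_int printf_owner = NO_OWNER;

/*
 * Enter the critical section on behalf of this_cpu.
 * Returns the previous owner: NO_OWNER for a fresh acquisition,
 * this_cpu when the same CPU re-enters (recursion is allowed).
 */
static int printf_lock_enter(int this_cpu)
{
	for (;;) {
		int expected = NO_OWNER;

		/* One atomic step: take the lock only if nobody owns it. */
		if (atomic_compare_exchange_strong(&printf_owner, &expected, this_cpu))
			return NO_OWNER;
		/* On failure, expected holds the current owner. */
		if (expected == this_cpu)
			return this_cpu;
		/* Another CPU owns the section: keep spinning. */
	}
}

/* Leave the critical section; only the outermost caller releases it. */
static void printf_lock_exit(int this_cpu, int old_owner)
{
	if (old_owner != this_cpu)
		atomic_store_explicit(&printf_owner, NO_OWNER, memory_order_release);
}
```

Ownership and the "is it locked" state live in one variable here, so there is no window between testing a flag and taking a lock.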
      
      Link: http://lkml.kernel.org/r/1480412276-16690-3-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Petr Mladek's avatar
      kdb: remove unused kdb_event handling · d1bd8ead
      Petr Mladek authored
      
      
      kdb_event state variable is only set but never checked in the kernel
      code.
      
      http://www.spinics.net/lists/kdb/msg01733.html suggests that this
      variable affected WARN_CONSOLE_UNLOCKED() in the original
      implementation.  But this check never went upstream.
      
      The semantic is unclear and racy.  The value is updated after the
      kdb_printf_lock is acquired and after it is released.  It should be
      symmetric at minimum.  The value should be manipulated either inside or
      outside the locked area.
      
      Fortunately, it seems that the original function is gone and we could
      simply remove the state variable.
      
      Link: http://lkml.kernel.org/r/1480412276-16690-2-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Douglas Anderson's avatar
      kernel/debug/debug_core.c: more properly delay for secondary CPUs · 2d13bb64
      Douglas Anderson authored
      
      
      We've got a delay loop waiting for secondary CPUs.  That loop uses
      loops_per_jiffy.  However, loops_per_jiffy doesn't actually mean how
      many tight loops make up a jiffy on all architectures.  It is quite
      common to see things like this in the boot log:
      
        Calibrating delay loop (skipped), value calculated using timer
        frequency.. 48.00 BogoMIPS (lpj=24000)
      
      In my case I was seeing lots of cases where other CPUs timed out
      entering the debugger only to print their stack crawls shortly after the
      kdb> prompt was written.
      
      Elsewhere in kgdb we already use udelay(), so that should be safe enough
      to use to implement our timeout.  We'll delay 1 ms for 1000 times, which
      should give us a full second of delay (just like the old code wanted)
      but allow us to notice that we're done every 1 ms.
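
The timeout described above can be sketched as a polling loop over a simulated clock. Everything here is a hypothetical userspace model: `struct sim`, `sim_all_in()` and `sim_delay_1ms()` stand in for the real CPU accounting and for a 1 ms `udelay()` call.

```c
#include <stdbool.h>

/* Simulated world: secondary CPUs all arrive at arrive_at_ms. */
struct sim {
	int arrive_at_ms;
	int now_ms;
};

static bool sim_all_in(const struct sim *s)
{
	return s->now_ms >= s->arrive_at_ms;
}

/* Stand-in for a 1 ms busy-wait delay. */
static void sim_delay_1ms(struct sim *s)
{
	s->now_ms++;
}

/*
 * Wait up to 1000 x 1 ms for the secondary CPUs, checking the
 * condition before every delay so we stop as soon as they arrive.
 * Returns true if they arrived, false on timeout.
 */
static bool wait_for_secondary_cpus(struct sim *s)
{
	int time_left = 1000;

	while (!sim_all_in(s) && time_left-- > 0)
		sim_delay_1ms(s);
	return sim_all_in(s);
}
```

The total wait is bounded by the number of delay calls rather than by a loop count tied to loops_per_jiffy.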
      
      [akpm@linux-foundation.org: simplifications, per Daniel]
      Link: http://lkml.kernel.org/r/1477091361-2039-1-git-send-email-dianders@chromium.org
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Brian Norris <briannorris@chromium.org>
      Cc: <stable@vger.kernel.org>	[4.0+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Kefeng Wang's avatar
      kcov: add more missing includes · db862358
      Kefeng Wang authored
      
      
      It is fragile to rely on definitions acquired via transitive
      dependencies, as shown below:
      
      atomic_*        (<linux/atomic.h>)
      ENOMEM/EN*      (<linux/errno.h>)
      EXPORT_SYMBOL   (<linux/export.h>)
      device_initcall (<linux/init.h>)
      preempt_*       (<linux/preempt.h>)
      
      Include them to prevent possible issues.
      
      Link: http://lkml.kernel.org/r/1481163221-40170-1-git-send-email-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Suggested-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Andreas Platschek's avatar
      Kconfig: lib/Kconfig.ubsan fix reference to ubsan documentation · 04625547
      Andreas Platschek authored
      
      
      Documentation/ubsan.txt was moved to Documentation/dev-tools/ubsan.rst;
      this fixes the reference.
      
      Link: http://lkml.kernel.org/r/1476698152-29340-3-git-send-email-andreas.platschek@opentech.at
      Signed-off-by: Andreas Platschek <andreas.platschek@opentech.at>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Andreas Platschek's avatar
      Kconfig: lib/Kconfig.debug: fix references to Documenation · 700199b0
      Andreas Platschek authored
      
      
      Documentation on development tools was moved to Documentation/dev-tools
      and sphinxified (renamed from .txt to .rst).
      
      References in lib/Kconfig.debug need to be updated to the new location.
      
      Link: http://lkml.kernel.org/r/1476698152-29340-2-git-send-email-andreas.platschek@opentech.at
      Signed-off-by: Andreas Platschek <andreas.platschek@opentech.at>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Dan Carpenter's avatar
      relay: check array offset before using it · 9a29d0fb
      Dan Carpenter authored
      Smatch complains that we started using the array offset before we
      checked that it was valid.
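
The general pattern, validating an index before it is used to read the array, can be sketched as follows. `struct chan`, `NBUFS` and `chan_get_buf()` are hypothetical names for illustration and are not taken from the relay code.

```c
#include <stddef.h>

#define NBUFS 4

/* Hypothetical channel with one buffer pointer per slot. */
struct chan {
	int *buf[NBUFS];
};

/*
 * Return the buffer for idx, or NULL if idx is out of range or the
 * slot is empty. The range check comes first, so the array is never
 * read with an invalid offset (|| short-circuits on the first test).
 */
static int *chan_get_buf(struct chan *c, unsigned int idx)
{
	if (idx >= NBUFS || !c->buf[idx])
		return NULL;
	return c->buf[idx];
}
```

Swapping the two operands of `||` would reintroduce the kind of ordering problem that static checkers flag.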
      
      Fixes: 017c59c0 ('relay: Use per CPU constructs for the relay channel buffer pointers')
      Link: http://lkml.kernel.org/r/20161013084947.GC16198@mwanda
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      igb: update code to better handle incrementing page count · bd4171a5
      Alexander Duyck authored
      
      
      Update the driver code so that we do bulk updates of the page reference
      count instead of just incrementing it by one reference at a time.  The
      advantage to doing this is that we cut down on atomic operations and
      this in turn should give us a slight improvement in cycles per packet.
      In addition if we eventually move this over to using build_skb the gains
      will be more noticeable.
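
One way to picture bulk reference updates is a shared atomic count paired with a local "bias" of references the driver still holds. This is a simplified userspace model under stated assumptions: `struct rx_page`, `REF_BATCH` and the helper names are hypothetical, and the driver's real bookkeeping may differ.

```c
#include <stdatomic.h>

#define REF_BATCH 64u  /* hypothetical batch size */

struct rx_page {
	atomic_uint refcount;     /* shared count, changed only atomically */
	unsigned int bias;        /* references still held locally by the driver */
	unsigned int atomic_ops;  /* how many atomic updates were needed */
};

/* A fresh page starts with one reference, owned by the driver. */
static void rx_page_init(struct rx_page *p)
{
	atomic_init(&p->refcount, 1u);
	p->bias = 1;
	p->atomic_ops = 0;
}

/*
 * Hand one reference to the network stack. References are taken
 * from the local bias; the shared count is touched only when the
 * bias is about to run out, and then in one bulk update.
 */
static void rx_page_give_ref(struct rx_page *p)
{
	if (p->bias == 1) {
		atomic_fetch_add(&p->refcount, REF_BATCH);
		p->bias += REF_BATCH;
		p->atomic_ops++;
	}
	p->bias--;
}
```

In this model `refcount - bias` is always the number of references held outside the driver, so handing out 100 references costs two atomic operations instead of 100.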
      
      Link: http://lkml.kernel.org/r/20161110113616.76501.17072.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Cc: Helge Deller <deller@gmx.de>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Keguang Zhang <keguang.zhang@gmail.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Tobias Klauser <tklauser@distanz.ch>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      igb: update driver to make use of DMA_ATTR_SKIP_CPU_SYNC · 5be59554
      Alexander Duyck authored
      
      
      The ARM architecture provides a mechanism for deferring cache line
      invalidation in the case of map/unmap.  This patch makes use of this
      mechanism to avoid unnecessary synchronization.
      
      A secondary effect of this change is that the portion of the page that
      has been synchronized for use by the CPU should be writable and could be
      passed up the stack (at least on ARM).
      
      The last bit that occurred to me is that on architectures where the
      sync_for_cpu call invalidates cache lines we were prefetching and then
      invalidating the first 128 bytes of the packet.  To avoid that I have
      moved the sync up to before we perform the prefetch and allocate the
      skbuff so that we can actually make use of it.
      
      Link: http://lkml.kernel.org/r/20161110113611.76501.98897.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Cc: Helge Deller <deller@gmx.de>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Keguang Zhang <keguang.zhang@gmail.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Tobias Klauser <tklauser@distanz.ch>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      mm: add support for releasing multiple instances of a page · 44fdffd7
      Alexander Duyck authored
      
      
      Add a function that allows us to batch free a page that has multiple
      references outstanding.  Specifically this function can be used to drop
      a page being used in the page frag alloc cache.  With this drivers can
      make use of functionality similar to the page frag alloc cache without
      having to do any workarounds for the fact that there is no function that
      frees multiple references.
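
Dropping several references in one step can be sketched with a single atomic subtraction. `struct page_model` and `page_drain()` are hypothetical userspace stand-ins, and setting `freed` takes the place of returning the page to the allocator.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct page_model {
	atomic_uint refcount;
	bool freed;
};

/*
 * Release count references at once. atomic_fetch_sub() returns the
 * value before the subtraction, so the page is freed exactly when
 * the caller held all remaining references.
 */
static void page_drain(struct page_model *p, unsigned int count)
{
	if (atomic_fetch_sub(&p->refcount, count) == count)
		p->freed = true;
}
```

Without such a helper, a caller holding N references would need N separate atomic decrements.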
      
      Link: http://lkml.kernel.org/r/20161110113606.76501.70752.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Cc: Helge Deller <deller@gmx.de>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Keguang Zhang <keguang.zhang@gmail.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Tobias Klauser <tklauser@distanz.ch>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      dma: add calls for dma_map_page_attrs and dma_unmap_page_attrs · 0495c3d3
      Alexander Duyck authored
      
      
      Add support for mapping and unmapping a page with attributes.
      
      The primary use for this is currently to allow for us to pass the
      DMA_ATTR_SKIP_CPU_SYNC attribute when mapping and unmapping a page.  On
      some architectures such as ARM the synchronization has significant
      overhead and if we are already taking care of the sync_for_cpu and
      sync_for_device from the driver there isn't much need to handle this in
      the map/unmap calls as well.
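
The effect of the attribute can be illustrated with a small userspace model that counts cache synchronizations. All names prefixed `model_` or `MODEL_` are hypothetical, and the flag value is a stand-in, not the kernel's definition.

```c
#include <stdbool.h>

#define MODEL_ATTR_SKIP_CPU_SYNC (1UL << 0)  /* stand-in flag value */

struct dma_model {
	unsigned int cache_syncs;  /* number of (expensive) sync operations */
	bool mapped;
};

/* Explicit sync calls a driver may issue on its own. */
static void model_sync_for_device(struct dma_model *d) { d->cache_syncs++; }
static void model_sync_for_cpu(struct dma_model *d)    { d->cache_syncs++; }

/* Map: sync for the device unless the caller asked to skip it. */
static void model_map_page_attrs(struct dma_model *d, unsigned long attrs)
{
	d->mapped = true;
	if (!(attrs & MODEL_ATTR_SKIP_CPU_SYNC))
		model_sync_for_device(d);
}

/* Unmap: sync for the CPU unless the caller asked to skip it. */
static void model_unmap_page_attrs(struct dma_model *d, unsigned long attrs)
{
	if (!(attrs & MODEL_ATTR_SKIP_CPU_SYNC))
		model_sync_for_cpu(d);
	d->mapped = false;
}
```

A driver that already syncs only the region it uses would pass the skip flag on both calls, so the map and unmap paths add no redundant cache maintenance.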
      
      Link: http://lkml.kernel.org/r/20161110113601.76501.46095.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      arch/xtensa: add option to skip DMA sync as a part of mapping · 4bfa135a
      Alexander Duyck authored
      
      
      This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
      avoid invoking cache line invalidation if the driver will just handle it
      via a sync_for_cpu or sync_for_device call.
      
      Link: http://lkml.kernel.org/r/20161110113555.76501.52536.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      arch/tile: add option to skip DMA sync as a part of map and unmap · 33c77e53
      Alexander Duyck authored
      
      
      This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
      avoid invoking cache line invalidation if the driver will just handle it
      via a sync_for_cpu or sync_for_device call.
      
      Link: http://lkml.kernel.org/r/20161110113550.76501.73060.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      arch/sparc: add option to skip DMA sync as a part of map and unmap · 68bbc28f
      Alexander Duyck authored
      
      
      This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
      avoid invoking cache line invalidation if the driver will just handle it
      via a sync_for_cpu or sync_for_device call.
      
      Link: http://lkml.kernel.org/r/20161110113544.76501.40008.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      arch/sh: add option to skip DMA sync as a part of mapping · a0812001
      Alexander Duyck authored
      
      
      This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
      avoid invoking cache line invalidation if the driver will just handle it
      via a sync_for_cpu or sync_for_device call.
      
      Link: http://lkml.kernel.org/r/20161110113539.76501.6539.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      arch/powerpc: add option to skip DMA sync as a part of mapping · 6f774809
      Alexander Duyck authored
      
      
      This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
      avoid invoking cache line invalidation if the driver will just handle it
      via a sync_for_cpu or sync_for_device call.
      
      Link: http://lkml.kernel.org/r/20161110113534.76501.86492.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      arch/parisc: add option to skip DMA sync as a part of map and unmap · f50a2bd2
      Alexander Duyck authored
      
      
      This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
      avoid invoking cache line invalidation if the driver will just handle it
      via a sync_for_cpu or sync_for_device call.
      
      Link: http://lkml.kernel.org/r/20161110113529.76501.44762.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      arch/openrisc: add option to skip DMA sync as a part of mapping · 043b42bc
      Alexander Duyck authored
      
      
      This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
      avoid invoking cache line invalidation if the driver will just handle it
      via a sync_for_cpu or sync_for_device call.
      
      Link: http://lkml.kernel.org/r/20161110113524.76501.87966.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      arch/nios2: add option to skip DMA sync as a part of map and unmap · abdf4799
      Alexander Duyck authored
      
      
      This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
      avoid invoking cache line invalidation if the driver will just handle it
      via a sync_for_cpu or sync_for_device call.
      
      Link: http://lkml.kernel.org/r/20161110113518.76501.52225.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
      Cc: Ley Foon Tan <lftan@altera.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      arch/mips: add option to skip DMA sync as a part of map and unmap · 9f318d47
      Alexander Duyck authored
      
      
      This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
      avoid invoking cache line invalidation if the driver will just handle it
      via a sync_for_cpu or sync_for_device call.
      
      Link: http://lkml.kernel.org/r/20161110113513.76501.32321.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Keguang Zhang <keguang.zhang@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      arch/microblaze: add option to skip DMA sync as a part of map and unmap · 98ac2fc2
      Alexander Duyck authored
      
      
      This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
      avoid invoking cache line invalidation if the driver will just handle it
      via a sync_for_cpu or sync_for_device call.
      
      Link: http://lkml.kernel.org/r/20161110113508.76501.77583.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      arch/metag: add option to skip DMA sync as a part of map and unmap · 38bdbdc7
      Alexander Duyck authored
      
      
      This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
      avoid invoking cache line invalidation if the driver will just handle it
      via a sync_for_cpu or sync_for_device call.
      
      Link: http://lkml.kernel.org/r/20161110113503.76501.80809.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      arch/m68k: add option to skip DMA sync as a part of mapping · 5140d234
      Alexander Duyck authored
      
      
      This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
      avoid invoking cache line invalidation if the driver will just handle it
      later via a sync_for_cpu or sync_for_device call.
      
      Link: http://lkml.kernel.org/r/20161110113457.76501.77603.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      arch/hexagon: Add option to skip DMA sync as a part of mapping · b8a346dd
      Alexander Duyck authored
      
      
      This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
      avoid invoking cache line invalidation if the driver will just handle it
      later via a sync_for_cpu or sync_for_device call.
      
      Link: http://lkml.kernel.org/r/20161110113452.76501.45864.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      arch/frv: add option to skip sync on DMA map · 34f8be79
      Alexander Duyck authored
      
      
      The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the
      DMA APIs in the arch/arm folder.  This change is meant to correct that
      so that we get consistent behavior.
      
      Link: http://lkml.kernel.org/r/20161110113447.76501.93160.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      arch/c6x: add option to skip sync on DMA map and unmap · 64c596b5
      Alexander Duyck authored
      
      
      This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
      avoid invoking cache line invalidation if the driver will just handle it
      later via a sync_for_cpu or sync_for_device call.
      
      Link: http://lkml.kernel.org/r/20161110113442.76501.7673.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Mark Salter <msalter@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Alexander Duyck's avatar
      arch/blackfin: add option to skip sync on DMA map · 8c16a2e2
      Alexander Duyck authored
      
      
      The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the
      DMA APIs in the arch/arm folder.  This change is meant to correct that
      so that we get consistent behavior.
      
      Link: http://lkml.kernel.org/r/20161110113436.76501.13386.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Steven Miao <realmz6@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8c16a2e2
    • Alexander Duyck's avatar
      arch/avr32: add option to skip sync on DMA map · e8b4762c
      Alexander Duyck authored
      
      
      The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the
      DMA APIs in the arch/avr32 folder.  This change is meant to correct that
      so that we get consistent behavior.
      
      Link: http://lkml.kernel.org/r/20161110113430.76501.79737.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e8b4762c
    • Alexander Duyck's avatar
      arch/arm: add option to skip sync on DMA map and unmap · fc1b138d
      Alexander Duyck authored
      
      
      The use of DMA_ATTR_SKIP_CPU_SYNC was not consistent across all of the
      DMA APIs in the arch/arm folder.  This change is meant to correct that
      so that we get consistent behavior.
      
      Link: http://lkml.kernel.org/r/20161110113424.76501.2715.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fc1b138d
    • Alexander Duyck's avatar
      arch/arc: add option to skip sync on DMA mapping · 8a3385d2
      Alexander Duyck authored
      
      
      Patch series "Add support for DMA writable pages being writable by the
      network stack", v3.
      
      The first 19 patches in the set add support for the DMA attribute
      DMA_ATTR_SKIP_CPU_SYNC on multiple platforms/architectures.  This is
      needed so that we can flag the calls to dma_map/unmap_page so that we do
      not invalidate cache lines that do not currently belong to the device.
      Instead we have to take care of this in the driver via a call to
      sync_single_range_for_cpu prior to freeing the Rx page.
      
      Patch 20 adds support for dma_map_page_attrs and dma_unmap_page_attrs so
      that we can unmap and map a page using the DMA_ATTR_SKIP_CPU_SYNC
      attribute.
      
      Patch 21 adds support for freeing a page that has multiple references
      being held by a single caller.  This way we can free page fragments that
      were allocated by a given driver.
      
      The last 2 patches use these updates in the igb driver, and lay the
      groundwork to allow for us to reimplement the use of build_skb.
      
      This patch (of 23):
      
      This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
      avoid invoking cache line invalidation if the driver will just handle it
      later via a sync_for_cpu or sync_for_device call.
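
      A minimal userspace sketch of the idea, not kernel code.  The names
      toy_map_page(), toy_sync_for_cpu(), TOY_ATTR_SKIP_CPU_SYNC and the byte
      counter are hypothetical stand-ins for a DMA map call, a sync_for_cpu
      call and DMA_ATTR_SKIP_CPU_SYNC; real cache maintenance is only counted.

```c
#include <stddef.h>

/* Hypothetical stand-in for DMA_ATTR_SKIP_CPU_SYNC. */
#define TOY_ATTR_SKIP_CPU_SYNC (1UL << 0)

/* Bytes of simulated cache invalidation, so the effect is observable. */
unsigned long toy_invalidated_bytes;

/* Simulated cache maintenance over len bytes. */
void toy_invalidate(size_t len)
{
	toy_invalidated_bytes += len;
}

/* Map a buffer: invalidate the whole range unless the caller opts out. */
void toy_map_page(size_t len, unsigned long attrs)
{
	if (!(attrs & TOY_ATTR_SKIP_CPU_SYNC))
		toy_invalidate(len);
}

/* Explicit sync issued later by the driver, only for the part it touched. */
void toy_sync_for_cpu(size_t len)
{
	toy_invalidate(len);
}
```

      With the attribute set, the map call does no cache work and the driver
      pays only for the range it later syncs explicitly.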
      
      Link: http://lkml.kernel.org/r/20161110113419.76501.38491.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8a3385d2
    • Tetsuo Handa's avatar
      sysctl: add KERN_CONT to deprecated_sysctl_warning() · 7560ef39
      Tetsuo Handa authored
      
      
      Do not break lines while printk()ing values.
      
        kernel: warning: process `tomoyo_file_tes' used the deprecated sysctl system call with
        kernel: 3.
        kernel: 5.
        kernel: 56.
        kernel:
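
      A toy userspace model of why the output above is split, assuming a
      logger that starts a new record for every message not marked as a
      continuation.  toy_printk(), struct toy_log and TOY_CONT are
      hypothetical; this is not the kernel's printk() implementation.

```c
#include <string.h>

/* Hypothetical continuation marker, modelled on a KERN_CONT-style prefix. */
#define TOY_CONT "\001c"

struct toy_log {
	char buf[256];
	size_t len;
};

/* Append s to the log, truncating if the buffer is full. */
static void toy_append(struct toy_log *log, const char *s)
{
	size_t room = sizeof(log->buf) - 1 - log->len;
	size_t n = strlen(s);

	if (n > room)
		n = room;
	memcpy(log->buf + log->len, s, n);
	log->len += n;
	log->buf[log->len] = '\0';
}

/* Unmarked messages start a new line; marked ones extend the current line. */
void toy_printk(struct toy_log *log, const char *msg)
{
	if (strncmp(msg, TOY_CONT, 2) == 0) {
		toy_append(log, msg + 2);
		return;
	}
	if (log->len && log->buf[log->len - 1] != '\n')
		toy_append(log, "\n");
	toy_append(log, msg);
}
```

      In this model, printing the values without the marker yields one line
      per value, while prefixing each value with TOY_CONT keeps them on the
      line of the warning.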
      
      Link: http://lkml.kernel.org/r/1480814833-4976-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7560ef39
    • zhong jiang's avatar
      kexec: add cond_resched into kimage_alloc_crash_control_pages · 8e53c073
      zhong jiang authored
      A soft lockup occurs when I run trinity against the kexec_load
      syscall.  The corresponding stack information is as follows.
      
        BUG: soft lockup - CPU#6 stuck for 22s! [trinity-c6:13859]
        Kernel panic - not syncing: softlockup: hung tasks
        CPU: 6 PID: 13859 Comm: trinity-c6 Tainted: G           O L ----V-------   3.10.0-327.28.3.35.zhongjiang.x86_64 #1
        Hardware name: Huawei Technologies Co., Ltd. Tecal BH622 V2/BC01SRSA0, BIOS RMIBV386 06/30/2014
        Call Trace:
         <IRQ>  dump_stack+0x19/0x1b
         panic+0xd8/0x214
         watchdog_timer_fn+0x1cc/0x1e0
         __hrtimer_run_queues+0xd2/0x260
         hrtimer_interrupt+0xb0/0x1e0
         ? call_softirq+0x1c/0x30
         local_apic_timer_interrupt+0x37/0x60
         smp_apic_timer_interrupt+0x3f/0x60
         apic_timer_interrupt+0x6d/0x80
         <EOI>  ? kimage_alloc_control_pages+0x80/0x270
         ? kmem_cache_alloc_trace+0x1ce/0x1f0
         ? do_kimage_alloc_init+0x1f/0x90
         kimage_alloc_init+0x12a/0x180
         SyS_kexec_load+0x20a/0x260
         system_call_...
      8e53c073
    • Baoquan He's avatar
      kexec: export the value of phys_base instead of symbol address · 401721ec
      Baoquan He authored
      
      
      Currently on x86_64, the symbol address of phys_base is exported to
      vmcoreinfo.  Dave Anderson complained that this is useless for his
      Crash implementation.  The user-space utilities Crash and
      Makedumpfile, the main consumers of the exported vmcore information,
      need the value of phys_base to convert the virtual address of an
      exported kernel symbol to a physical address.  This matters especially
      for init_level4_pgt: to walk the page table and look up the PA
      corresponding to a VA, we first need to calculate
      
        page_dir = SYMBOL(init_level4_pgt) - __START_KERNEL_map + phys_base;
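
      The calculation above as a small userspace sketch.  The helper name
      toy_kernel_va_to_pa() is hypothetical, and the constant is the x86_64
      value of __START_KERNEL_map assumed here; check it against the kernel
      being analysed.

```c
#include <stdint.h>

/* Assumed x86_64 value of __START_KERNEL_map. */
#define TOY_START_KERNEL_MAP 0xffffffff80000000ULL

/*
 * Convert the virtual address of a kernel-text-mapped symbol to a physical
 * address, given the value (not the symbol address) of phys_base.
 */
uint64_t toy_kernel_va_to_pa(uint64_t sym_va, uint64_t phys_base)
{
	return sym_va - TOY_START_KERNEL_MAP + phys_base;
}
```

      The function needs the value of phys_base as input, which is why
      exporting only the symbol address does not help a dump tool.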
      
      Now in Crash and Makedumpfile, we have to analyze the vmcore ELF program
      header to get the value of phys_base.  As Dave said, it would be
      preferable if it were readily available in vmcoreinfo rather than
      depending upon the PT_LOAD semantics.
      
      Hence this patch exports the value of phys_base instead of its virtual
      address.
      
      People also complained that the KERNEL_IMAGE_SIZE export is x86_64
      only and should be moved into the arch-dependent function
      arch_crash_save_vmcoreinfo.  Do the move in this patch.
      
      Link: http://lkml.kernel.org/r/1478568596-30060-2-git-send-email-bhe@redhat.com
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Xunlei Pang <xlpang@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Eugene Surovegin <surovegin@google.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      401721ec
    • Baoquan He's avatar
      Revert "kdump, vmcoreinfo: report memory sections virtual addresses" · 69f58384
      Baoquan He authored
      This reverts commit 0549a3c0 ("kdump, vmcoreinfo: report memory
      sections virtual addresses").
      
      Commit 0549a3c0 tells the userspace utility makedumpfile the
      randomized base address of these memory sections when mm KASLR is
      enabled.  However, the following patch "kexec: export the value of
      phys_base instead of symbol address" means makedumpfile no longer needs
      these addresses.
      
      Besides, we should use VMCOREINFO_NUMBER to export the value of the
      variable so that we can use the existing number_table mechanism of
      Makedumpfile to fetch it.  So revert it now.  If needed we can add it
      later.
      
      http://lists.infradead.org/pipermail/kexec/2016-October/017540.html
      Link: http://lkml.kernel.org/r/1478568596-30060-1-git-send-email-bhe@redhat.com
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Xunlei Pang <xlpang@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Eugene Surovegin <surovegin@google.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      69f58384
    • Alexey Dobriyan's avatar
      coredump: clarify "unsafe core_pattern" warning · 760c6a91
      Alexey Dobriyan authored
      
      
      I was amused to get the "unsafe core_pattern" warning with these lines in
      /etc/sysctl.conf:
      
      	fs.suid_dumpable=2
      	kernel.core_pattern=/core/core-%e-%p-%E
      	kernel.core_uses_pid=0
      
      It turns out the kernel is formally right.  The default core_pattern is
      just "core", which doesn't qualify as a secure path at the time
      fs.suid_dumpable is set.
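
      A minimal sketch of the kind of check that produces the warning.  The
      function name and the exact rule (pattern must be an absolute path or a
      pipe when the dumpable mode is 2) are assumptions made for
      illustration, not a copy of the kernel code.

```c
#include <stdbool.h>

/* The mode selected by fs.suid_dumpable=2. */
#define TOY_SUID_DUMP_ROOT 2

/* Return true when this combination of settings would draw the warning. */
bool toy_core_pattern_unsafe(int suid_dumpable, const char *core_pattern)
{
	return suid_dumpable == TOY_SUID_DUMP_ROOT &&
	       core_pattern[0] != '/' && core_pattern[0] != '|';
}
```

      Under this rule the sysctl.conf above warns because fs.suid_dumpable=2
      is applied while core_pattern is still the relative default "core";
      once kernel.core_pattern is set to the absolute path the combination is
      fine.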
      
      Hint admins about the solution, clarify sysctl names, delete unnecessary '\'
      characters (string literals are concatenated regardless) and reformat for
      easier grepping.
      
      Link: http://lkml.kernel.org/r/20161029152124.GA1258@avx2
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      760c6a91
    • Waiman Long's avatar
      signals: avoid unnecessary taking of sighand->siglock · c7be96af
      Waiman Long authored
      
      
      When running a certain database workload on a high-end system with many
      CPUs, it was found that spinlock contention in the sigprocmask syscalls
      accounted for a significant portion of the overall CPU cycles, as shown
      below.
      
        9.30%  9.30%  905387  dataserver  /proc/kcore 0x7fff8163f4d2
        [k] _raw_spin_lock_irq
                  |
                  ---_raw_spin_lock_irq
                     |
                     |--99.34%-- __set_current_blocked
                     |          sigprocmask
                     |          sys_rt_sigprocmask
                     |          system_call_fastpath
                     |          |
                     |          |--50.63%-- __swapcontext
                     |          |          |
                     |          |          |--99.91%-- upsleepgeneric
                     |          |
                     |          |--49.36%-- __setcontext
                     |          |          ktskRun
      
      Looking further into the swapcontext function in glibc, it was found that
      the function always calls sigprocmask() without checking if there are
      changes in the signal mask.
      
      A check was added to the __set_current_blocked() function to avoid taking
      the sighand->siglock spinlock if there is no change in the signal mask.
      This will prevent unneeded spinlock contention when many threads are
      trying to call sigprocmask().
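
      The check described above, as a minimal userspace sketch.  struct
      toy_task, its 64-bit mask and the acquisition counter are hypothetical
      stand-ins for the task's blocked set and sighand->siglock.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_task {
	uint64_t blocked;		/* current signal mask */
	unsigned int lock_acquisitions;	/* models taking sighand->siglock */
};

/* Install newset as the blocked mask; return true if the lock was taken. */
bool toy_set_current_blocked(struct toy_task *t, uint64_t newset)
{
	/* Nothing changes, so there is nothing to protect: skip the lock. */
	if (t->blocked == newset)
		return false;

	t->lock_acquisitions++;		/* lock, update, unlock in the kernel */
	t->blocked = newset;
	return true;
}
```

      A caller that repeatedly installs the mask it already has, as
      swapcontext does here, never reaches the lock in this model.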
      
      With this patch applied, the spinlock contention in sigprocmask() was
      gone.
      
      Link: http://lkml.kernel.org/r/1474979209-11867-1-git-send-email-Waiman.Long@hpe.com
      Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stas Sergeev <stsp@list.ru>
      Cc: Scott J Norton <scott.norton@hpe.com>
      Cc: Douglas Hatch <doug.hatch@hpe.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c7be96af
    • Michal Hocko's avatar
      mm, compaction: allow compaction for GFP_NOFS requests · 73e64c51
      Michal Hocko authored
      Compaction has been disabled for GFP_NOFS and GFP_NOIO requests since
      direct compaction was introduced by commit 56de7263 ("mm: compaction:
      direct compact when a high-order allocation fails").  The main reason
      is that the migration of page cache pages might recurse back to the
      fs/io layer and we could potentially deadlock.  This is overly
      conservative because all the anonymous memory is migratable in the
      GFP_NOFS context just fine.  This might be a large portion of the
      memory in many/most workloads.
      
      Remove the GFP_NOFS restriction and make sure that we skip all fs pages
      (those with a mapping) while isolating pages to be migrated.  We cannot
      consider clean fs pages because they might need a metadata update so
      only isolate pages without any mapping for nofs requests.
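
      The isolation rule above as a minimal userspace sketch.  struct
      toy_page, toy_may_isolate() and the TOY_GFP_FS bit are hypothetical;
      the mapping field models "this page belongs to a filesystem mapping".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for __GFP_FS. */
#define TOY_GFP_FS 0x80u

struct toy_page {
	const void *mapping;	/* non-NULL for fs pages, NULL otherwise */
};

/* Decide whether compaction may isolate this page for migration. */
bool toy_may_isolate(const struct toy_page *page, unsigned int gfp_mask)
{
	/* A nofs request must not touch fs pages, even clean ones. */
	if (!(gfp_mask & TOY_GFP_FS) && page->mapping != NULL)
		return false;
	return true;
}
```

      Requests that allow fs recursion are unaffected; nofs requests can
      still migrate pages without a mapping.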
      
      The effect of this patch will probably be very limited in many/most
      workloads because higher order GFP_NOFS requests are quite rare,
      although different configurations might lead to very different results.
      David Chinner has mentioned a heavy metadata workload with 64kB blocks;
      to quote him:
      
      : Unfortunately, there was an era of cargo cult configuration tweaks in the
      : Ceph community that has resulted in a large number of production machines
      : with XFS filesystems configured this way.  And a lot of them store large
      : numbers of small files and run under significant sustained memory
      : pressure.
      :
      : I slowly working towards getting rid of these high order allocations and
      : replacing them with the equivalent number of single page allocations, but
      : I haven't got that (complex) change working yet.
      
      We can do the following to simulate that workload:
      $ mkfs.xfs -f -n size=64k <dev>
      $ mount <dev> /mnt/scratch
      $ time ./fs_mark  -D  10000  -S0  -n  100000  -s  0  -L  32 \
              -d  /mnt/scratch/0  -d  /mnt/scratch/1 \
              -d  /mnt/scratch/2  -d  /mnt/scratch/3 \
              -d  /mnt/scratch/4  -d  /mnt/scratch/5 \
              -d  /mnt/scratch/6  -d  /mnt/scratch/7 \
              -d  /mnt/scratch/8  -d  /mnt/scratch/9 \
              -d  /mnt/scratch/10  -d  /mnt/scratch/11 \
              -d  /mnt/scratch/12  -d  /mnt/scratch/13 \
              -d  /mnt/scratch/14  -d  /mnt/scratch/15
      
      and indeed it hammers the system with many high order GFP_NOFS requests,
      as per a simple tracepoint filter during the load:
      $ echo '!(gfp_flags & 0x80) && (gfp_flags &0x400000)' > $TRACE_MNT/events/kmem/mm_page_alloc/filter
      I am getting
      5287609 order=0
           37 order=1
      1594905 order=2
      3048439 order=3
      6699207 order=4
        66645 order=5
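
      The tracepoint filter above written as a C predicate, for readers
      decoding the masks.  The bit names are assumptions for the kernel that
      was tested (0x80 taken to be the __GFP_FS bit, 0x400000 the direct
      reclaim bit); the message itself does not name them.

```c
#include <stdbool.h>

/* Assumed meanings of the two bits used in the filter expression. */
#define TOY_GFP_FS             0x80u
#define TOY_GFP_DIRECT_RECLAIM 0x400000u

/* True for allocations that may direct-reclaim but must not enter the fs. */
bool toy_filter_matches(unsigned int gfp_flags)
{
	return !(gfp_flags & TOY_GFP_FS) &&
	       (gfp_flags & TOY_GFP_DIRECT_RECLAIM);
}
```

      The mode 0x2408240 reported in the XFS warnings in this message passes
      this predicate: it has bit 0x400000 set and bit 0x80 clear.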
      
      My testing was done in a kvm guest so performance numbers should be
      taken with a grain of salt but there seems to be a difference when the
      patch is applied:
      
      * Original kernel
      FSUse%        Count         Size    Files/sec     App Overhead
           1      1600000            0       4300.1         20745838
           3      3200000            0       4239.9         23849857
           5      4800000            0       4243.4         25939543
           6      6400000            0       4248.4         19514050
           8      8000000            0       4262.1         20796169
           9      9600000            0       4257.6         21288675
          11     11200000            0       4259.7         19375120
          13     12800000            0       4220.7         22734141
          14     14400000            0       4238.5         31936458
          16     16000000            0       4231.5         23409901
          18     17600000            0       4045.3         23577700
          19     19200000            0       2783.4         58299526
          21     20800000            0       2678.2         40616302
          23     22400000            0       2693.5         83973996
      
      and xfs complaining about memory allocation not making progress
      [ 2304.372647] XFS: fs_mark(3289) possible memory allocation deadlock size 65624 in kmem_alloc (mode:0x2408240)
      [ 2304.443323] XFS: fs_mark(3285) possible memory allocation deadlock size 65728 in kmem_alloc (mode:0x2408240)
      [ 4796.772477] XFS: fs_mark(3424) possible memory allocation deadlock size 46936 in kmem_alloc (mode:0x2408240)
      [ 4796.775329] XFS: fs_mark(3423) possible memory allocation deadlock size 51416 in kmem_alloc (mode:0x2408240)
      [ 4797.388808] XFS: fs_mark(3424) possible memory allocation deadlock size 65728 in kmem_alloc (mode:0x2408240)
      
      * Patched kernel
      FSUse%        Count         Size    Files/sec     App Overhead
           1      1600000            0       4289.1         19243934
           3      3200000            0       4241.6         32828865
           5      4800000            0       4248.7         32884693
           6      6400000            0       4314.4         19608921
           8      8000000            0       4269.9         24953292
           9      9600000            0       4270.7         33235572
          11     11200000            0       4346.4         40817101
          13     12800000            0       4285.3         29972397
          14     14400000            0       4297.2         20539765
          16     16000000            0       4219.6         18596767
          18     17600000            0       4273.8         49611187
          19     19200000            0       4300.4         27944451
          21     20800000            0       4270.6         22324585
          22     22400000            0       4317.6         22650382
          24     24000000            0       4065.2         22297964
      
      So the drop at Count 19200000 didn't happen and there was only a
      single warning about allocation not making progress
      [ 3063.815003] XFS: fs_mark(3272) possible memory allocation deadlock size 65624 in kmem_alloc (mode:0x2408240)
      
      This suggests that the patch has helped even though there is not all that
      much anonymous memory, as the workload mostly generates fs metadata.  I
      assume the success rate would be higher with more anonymous memory, which
      should be the case in many workloads.
      
      [akpm@linux-foundation.org: fix comment]
      Link: http://lkml.kernel.org/r/20161012114721.31853-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      73e64c51
    • Konstantin Khlebnikov's avatar
      kernel/watchdog: use nmi registers snapshot in hardlockup handler · 4d1f0fb0
      Konstantin Khlebnikov authored
      The NMI handler doesn't call set_irq_regs(); the IRQ registers are set
      only by a normal IRQ.  Thus get_irq_regs() returns NULL or a stale
      register snapshot with IP/SP pointing to the code interrupted by the IRQ
      which was itself interrupted by the NMI.  NULL isn't a problem: in this
      case the watchdog calls dump_stack() and prints the full stack trace
      including the NMI.  But if we're stuck in an IRQ handler then the NMI
      watchdog will print a stack trace without the IRQ part at all.
      
      This patch uses the register snapshot passed into the NMI handler as an
      argument: these registers point exactly to the instruction interrupted
      by the NMI.
      
      Fixes: 55537871 ("kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup")
      Link: http://lkml.kernel.org/r/146771764784.86724.6006627197118544150.stgit@buzz
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.4+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4d1f0fb0