  1. Nov 10, 2021
    • Thomas Gleixner's avatar
      mm/scatterlist: replace the !preemptible warning in sg_miter_stop() · 723aca20
      Thomas Gleixner authored
      
      
      sg_miter_stop() checks for disabled preemption before unmapping a page
      via kunmap_atomic().  The kernel doc mentions under context that
      preemption must be disabled if SG_MITER_ATOMIC is set.
      
      There is no active requirement for the caller to have preemption
      disabled before invoking sg_miter_stop().  The sg_miter_*()
      implementation itself has no such requirement.
      
      In fact, preemption is disabled by kmap_atomic() as part of
      sg_miter_next() and remains disabled as long as there is an active
      SG_MITER_ATOMIC mapping.  This is a consequence of kmap_atomic() and not
      a requirement for sg_miter_*() itself.
      
      The user chooses SG_MITER_ATOMIC either because the API is used in a
      context where blocking is not possible, or because blocking is possible
      but a lighter-weight mapping is preferred: one that is not available on
      all CPUs and so may need less overhead to set up, at the price that
      preemption is now disabled.
      
      The kmap_atomic() implementation on PREEMPT_RT does not disable
      preemption.  It simply disables CPU migration to ensure that the task
      remains on the same CPU while the caller remains preemptible.  This in
      turn triggers the warning in sg_miter_stop() because preemption is
      allowed.
      
      Both the PREEMPT_RT and !PREEMPT_RT implementations of kmap_atomic()
      disable pagefaults as a requirement.  It is sufficient to check for this
      instead of disabled preemption.
      
      Check for a disabled pagefault handler in the SG_MITER_ATOMIC case.
      Remove the "preemption disabled" part from the kernel doc as the
      sg_miter_*() implementation does not care.
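      
      The reasoning above can be sketched as a small user-space model.  This
      is an illustration only: struct task_model and every model_* and
      *_check_warns name below are hypothetical and are not kernel
      interfaces.  It shows why a check on disabled preemption fires on
      PREEMPT_RT while a check on disabled pagefaults holds in both
      configurations.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are the kernel's. */
struct task_model {
	int preempt_count;      /* > 0 means preemption is disabled */
	int pagefault_disabled; /* > 0 means page faults are disabled */
	int migrate_disabled;   /* > 0 means the task is pinned to its CPU */
};

/* Model of kmap_atomic(): what it disables depends on the configuration. */
static void model_kmap_atomic(struct task_model *t, bool preempt_rt)
{
	if (preempt_rt)
		t->migrate_disabled++;  /* PREEMPT_RT: task stays preemptible */
	else
		t->preempt_count++;     /* !PREEMPT_RT: preemption is disabled */
	t->pagefault_disabled++;        /* common to both configurations */
}

/* Old-style check: warns whenever the task is still preemptible. */
static bool old_check_warns(const struct task_model *t)
{
	return t->preempt_count == 0;
}

/* New-style check: warns only if page faults are not disabled. */
static bool new_check_warns(const struct task_model *t)
{
	return t->pagefault_disabled == 0;
}
```

      In this model, a task that took an atomic mapping with preempt_rt set
      trips the old check but not the new one; without preempt_rt neither
      check warns.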
      
      [bigeasy@linutronix.de: commit description]
      
      Link: https://lkml.kernel.org/r/20211015211409.cqopacv3pxdwn2ty@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      723aca20
    • Alexey Dobriyan's avatar
      lib: uninline simple_strntoull() as well · 839b395e
      Alexey Dobriyan authored
      
      
      Codegen became bloated again after the introduction of simple_strntoull():
      
      	add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-224 (-224)
      	Function                                     old     new   delta
      	simple_strtoul                                 5       2      -3
      	simple_strtol                                 23      20      -3
      	simple_strtoull                              119      15    -104
      	simple_strtoll                               155      41    -114
      
      Link: https://lkml.kernel.org/r/YVmlB9yY4lvbNKYt@localhost.localdomain
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Richard Fitzgerald <rf@opensource.cirrus.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      839b395e
    • Lucas De Marchi's avatar
      include/linux/string_helpers.h: add linux/string.h for strlen() · bfb3ba32
      Lucas De Marchi authored
      
      
      linux/string_helpers.h uses strlen(), so include the corresponding header.
      Otherwise we get a compilation error if it's not also included by whoever
      included the helper.
      
      Link: https://lkml.kernel.org/r/20211005212634.3223113-1-lucas.demarchi@intel.com
      Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bfb3ba32
    • Imran Khan's avatar
      lib, stackdepot: add helper to print stack entries into buffer · 0f68d45e
      Imran Khan authored
      
      
      To print stack entries into a buffer, users of stackdepot first get a
      list of stack entries using stack_depot_fetch() and then print this list
      into a buffer using stack_trace_snprint().  Provide a helper in stackdepot
      for this purpose.  Also change the above-mentioned users to use this helper.
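      
      The shape of such a helper can be sketched in user space as follows.
      This is a minimal model, not the kernel code: model_fetch(),
      model_snprint(), model_depot_snprint() and the fixed demo_trace are all
      hypothetical stand-ins, and the output format is an assumption chosen
      for the illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the depot: handle 1 maps to a fixed trace. */
static const unsigned long demo_trace[] = { 0x1000UL, 0x2000UL };

/* Model of a fetch: returns the number of entries, 0 if there are none. */
static unsigned int model_fetch(unsigned int handle, const unsigned long **entries)
{
	if (handle != 1) {
		*entries = NULL;
		return 0;
	}
	*entries = demo_trace;
	return sizeof(demo_trace) / sizeof(demo_trace[0]);
}

/* Model of a trace printer: one indented hex address per line. */
static int model_snprint(char *buf, size_t size, const unsigned long *entries,
			 unsigned int nr, int spaces)
{
	int total = 0;

	for (unsigned int i = 0; i < nr && size > 0; i++) {
		int n = snprintf(buf, size, "%*s0x%lx\n", spaces, "", entries[i]);

		if (n < 0 || (size_t)n >= size)
			break;          /* stop once the buffer is full */
		buf += n;
		size -= (size_t)n;
		total += n;
	}
	return total;
}

/* The combined helper: fetch by handle, then print into the buffer. */
static int model_depot_snprint(unsigned int handle, char *buf, size_t size, int spaces)
{
	const unsigned long *entries;
	unsigned int nr = model_fetch(handle, &entries);

	return nr ? model_snprint(buf, size, entries, nr, spaces) : 0;
}
```

      Callers then need only the handle and a destination buffer; the
      two-step fetch-then-print sequence lives in one place.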
      
      [imran.f.khan@oracle.com: fix build error]
        Link: https://lkml.kernel.org/r/20210915175321.3472770-4-imran.f.khan@oracle.com
      [imran.f.khan@oracle.com: export stack_depot_snprint() to modules]
        Link: https://lkml.kernel.org/r/20210916133535.3592491-4-imran.f.khan@oracle.com
      
      Link: https://lkml.kernel.org/r/20210915014806.3206938-4-imran.f.khan@oracle.com
      Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
      Suggested-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Jani Nikula <jani.nikula@intel.com>	[i915]
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Maxime Ripard <mripard@kernel.org>
      Cc: Thomas Zimmermann <tzimmermann@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0f68d45e
    • Imran Khan's avatar
      lib, stackdepot: add helper to print stack entries · 505be481
      Imran Khan authored
      
      
      To print stack entries, users of stackdepot first use stack_depot_fetch()
      to get a list of stack entries and then use stack_trace_print() to print
      this list.  Provide a helper in stackdepot to print stack entries based on
      a stackdepot handle.  Also change the above-mentioned users to use this helper.
      
      Link: https://lkml.kernel.org/r/20210915014806.3206938-3-imran.f.khan@oracle.com
      Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
      Suggested-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Maxime Ripard <mripard@kernel.org>
      Cc: Thomas Zimmermann <tzimmermann@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      505be481
    • Imran Khan's avatar
      lib, stackdepot: check stackdepot handle before accessing slabs · 4d4712c1
      Imran Khan authored
      
      
      Patch series "lib, stackdepot: check stackdepot handle before accessing slabs", v2.
      
      PATCH-1: Checks validity of a stackdepot handle before proceeding to
      access stackdepot slab/objects.
      
      PATCH-2: Adds a helper in stackdepot, to allow users to print stack
      entries just by specifying the stackdepot handle.  It also changes such
      users to use this new interface.
      
      PATCH-3: Adds a helper in stackdepot, to allow users to print stack
      entries into buffers just by specifying the stackdepot handle and
      destination buffer.  It also changes such users to use this new interface.
      
      This patch (of 3):
      
      stack_depot_save() allocates slabs that will be used for storing objects
      in the future.  If this slab allocation fails, we may get to a situation
      where space allocation for a new stack_record fails, causing
      stack_depot_save() to return 0 as the handle.  If a user of this handle
      ends up invoking stack_depot_fetch() with this handle value, the current
      implementation of stack_depot_fetch() will end up using a slab from the
      wrong index.  To avoid this, check the handle value at the beginning.
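      
      A minimal user-space sketch of the early check, under stated
      assumptions: the model_* names, the fixed slab count and the
      "handle = slab index + 1" encoding are invented for this illustration
      and do not reflect how the real stackdepot encodes handles.

```c
#include <stddef.h>

#define MODEL_MAX_SLABS 4

struct model_record {
	unsigned int nr_entries;
	unsigned long entries[4];
};

/* Hypothetical depot: slabs are allocated lazily, so slots may be NULL. */
static struct model_record *model_slabs[MODEL_MAX_SLABS];

/* Model of a fetch that rejects the "no handle" value 0 before indexing. */
static unsigned int model_depot_fetch(unsigned int handle, unsigned long **entries)
{
	*entries = NULL;
	if (!handle)
		return 0;               /* a failed save returned 0: nothing to fetch */

	unsigned int idx = handle - 1;  /* assumed encoding: slab index + 1 */

	if (idx >= MODEL_MAX_SLABS || !model_slabs[idx])
		return 0;
	*entries = model_slabs[idx]->entries;
	return model_slabs[idx]->nr_entries;
}
```

      Checking the handle first means a failed save can never turn into a
      lookup at an unintended slab index.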
      
      Link: https://lkml.kernel.org/r/20210915175321.3472770-1-imran.f.khan@oracle.com
      Link: https://lkml.kernel.org/r/20210915014806.3206938-1-imran.f.khan@oracle.com
      Link: https://lkml.kernel.org/r/20210915014806.3206938-2-imran.f.khan@oracle.com
      Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
      Suggested-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Maxime Ripard <mripard@kernel.org>
      Cc: Thomas Zimmermann <tzimmermann@suse.de>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4d4712c1
    • Lukas Bulwahn's avatar
      MAINTAINERS: rectify entry for ALLWINNER HARDWARE SPINLOCK SUPPORT · 57235b6e
      Lukas Bulwahn authored
      Commit f9e784dc ("dt-bindings: hwlock: add sun6i_hwspinlock") adds
      Documentation/devicetree/bindings/hwlock/allwinner,sun6i-a31-hwspinlock.yaml,
      but the related commit 3c881e05 ("hwspinlock: add sun6i hardware
      spinlock support") adds a file reference to
      allwinner,sun6i-hwspinlock.yaml instead.
      
      Hence, ./scripts/get_maintainer.pl --self-test=patterns complains:
      
        warning: no file matches  F:  Documentation/devicetree/bindings/hwlock/allwinner,sun6i-hwspinlock.yaml
      
      Rectify this file reference in ALLWINNER HARDWARE SPINLOCK SUPPORT.
      
      Link: https://lkml.kernel.org/r/20211026141902.4865-5-lukas.bulwahn@gmail.com
      Reviewed-by: Wilken Gottwalt <wilken.gottwalt@posteo.net>
      Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Cc: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
      Cc: Edmund Dea <edmund.j.dea@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
      Cc: Punit Agrawal <punit1.agrawal@toshiba.co.jp>
      Cc: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Yu Chen <chenyu56@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      57235b6e
    • Lukas Bulwahn's avatar
      MAINTAINERS: rectify entry for INTEL KEEM BAY DRM DRIVER · 65e5acbb
      Lukas Bulwahn authored
      Commit ed794057 ("drm/kmb: Build files for KeemBay Display driver")
      refers to the non-existing file intel,kmb_display.yaml in
      Documentation/devicetree/bindings/display/.
      
      Commit 5a76b1ed ("dt-bindings: display: Add support for Intel
      KeemBay Display") originating from the same patch series however adds
      the file intel,keembay-display.yaml in that directory instead.
      
      So, refer to intel,keembay-display.yaml in the INTEL KEEM BAY DRM DRIVER
      section instead.
      
      Link: https://lkml.kernel.org/r/20211026141902.4865-4-lukas.bulwahn@gmail.com
      Fixes: ed794057 ("drm/kmb: Build files for KeemBay Display driver")
      Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Cc: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
      Cc: Edmund Dea <edmund.j.dea@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
      Cc: Punit Agrawal <punit1.agrawal@toshiba.co.jp>
      Cc: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Wilken Gottwalt <wilken.gottwalt@posteo.net>
      Cc: Yu Chen <chenyu56@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      65e5acbb
    • Lukas Bulwahn's avatar
      MAINTAINERS: rectify entry for HIKEY960 ONBOARD USB GPIO HUB DRIVER · b39c9206
      Lukas Bulwahn authored
      Commit 7a6ff4c4 ("misc: hisi_hikey_usb: Driver to support onboard
      USB gpio hub on Hikey960") refers to the non-existing file
      Documentation/devicetree/bindings/misc/hisilicon-hikey-usb.yaml, but
      this commit's patch series does not add any related devicetree binding
      in misc.
      
      So, just drop this file reference in HIKEY960 ONBOARD USB GPIO HUB DRIVER.
      
      Link: https://lkml.kernel.org/r/20211026141902.4865-3-lukas.bulwahn@gmail.com
      Fixes: 7a6ff4c4 ("misc: hisi_hikey_usb: Driver to support onboard USB gpio hub on Hikey960")
      Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Cc: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
      Cc: Edmund Dea <edmund.j.dea@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
      Cc: Punit Agrawal <punit1.agrawal@toshiba.co.jp>
      Cc: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Wilken Gottwalt <wilken.gottwalt@posteo.net>
      Cc: Yu Chen <chenyu56@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b39c9206
    • Lukas Bulwahn's avatar
      MAINTAINERS: rectify entry for ARM/TOSHIBA VISCONTI ARCHITECTURE · 46bfa85f
      Lukas Bulwahn authored
      Patch series "Rectify file references for dt-bindings in MAINTAINERS", v5.
      
      A patch series that cleans up some file references for dt-bindings in
      MAINTAINERS.
      
      This patch (of 4):
      
      Commit 836863a0 ("MAINTAINERS: Add information for Toshiba Visconti
      ARM SoCs") refers to the non-existing file toshiba,tmpv7700-pinctrl.yaml
      in ./Documentation/devicetree/bindings/pinctrl/.  Commit 1825c1fe
      ("pinctrl: Add DT bindings for Toshiba Visconti TMPV7700 SoC")
      originating from the same patch series however adds the file
      toshiba,visconti-pinctrl.yaml in that directory instead.
      
      So, refer to toshiba,visconti-pinctrl.yaml in the ARM/TOSHIBA VISCONTI
      ARCHITECTURE section instead.
      
      Link: https://lkml.kernel.org/r/20211026141902.4865-1-lukas.bulwahn@gmail.com
      Link: https://lkml.kernel.org/r/20211026141902.4865-2-lukas.bulwahn@gmail.com
      Fixes: 836863a0 ("MAINTAINERS: Add information for Toshiba Visconti ARM SoCs")
      Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Acked-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
      Reviewed-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Punit Agrawal <punit1.agrawal@toshiba.co.jp>
      Cc: Anitha Chrisanthus <anitha.chrisanthus@intel.com>
      Cc: Wilken Gottwalt <wilken.gottwalt@posteo.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Cc: Yu Chen <chenyu56@huawei.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Edmund Dea <edmund.j.dea@intel.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      46bfa85f
    • Kees Cook's avatar
      MAINTAINERS: add "exec & binfmt" section with myself and Eric · b15be237
      Kees Cook authored
      
      
      I'd like more continuity of review for the exec and binfmt (and ELF)
      stuff.  Eric and I have been the most active lately, so list us as
      reviewers.
      
      Link: https://lkml.kernel.org/r/20211006180200.1178142-1-keescook@chromium.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b15be237
    • Colin Ian King's avatar
      mailmap: update email address for Colin King · 7d60ac00
      Colin Ian King authored
      
      
      Colin King has moved to Intel, so update the gmail and Canonical email
      addresses.
      
      Link: https://lkml.kernel.org/r/20211102231617.78569-1-colin.i.king@gmail.com
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7d60ac00
    • Rasmus Villemoes's avatar
      linux/container_of.h: switch to static_assert · e1edc277
      Rasmus Villemoes authored
      
      
      _Static_assert() is evaluated already in the compiler's frontend, and
      gives a somewhat more to-the-point error, compared to the BUILD_BUG_ON
      macro, which only fires after the optimizer has had a chance to
      eliminate calls to functions marked with __attribute__((error)).  In
      theory, this might make builds a tiny bit faster.
      
      There's also a little less gunk in the error message emitted:
      
        lib/sort.c: In function `foo':
        include/linux/build_bug.h:78:41: error: static assertion failed: "pointer type mismatch in container_of()"
           78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
      
      compared to
      
        lib/sort.c: In function `foo':
        include/linux/compiler_types.h:322:38: error: call to `__compiletime_assert_2' declared with attribute error: pointer type mismatch in container_of()
          322 |  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      
      While at it, fix the copy-pasto in container_of_safe().
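      
      For illustration, a user-space sketch of a container_of()-style macro
      that performs its type check with _Static_assert().  The
      model_same_type() and model_container_of() macros, struct model_item
      and item_from_payload() are hypothetical names for this sketch; it
      relies on GNU C extensions (statement expressions, __typeof__,
      __builtin_types_compatible_p), so GCC or Clang is assumed.

```c
#include <stddef.h>

/* GNU C extension: nonzero when the two types are compatible. */
#define model_same_type(a, b) \
	__builtin_types_compatible_p(__typeof__(a), __typeof__(b))

/*
 * Sketch of a container_of() whose pointer type check is a static
 * assertion, so a mismatch is reported by the compiler frontend.
 */
#define model_container_of(ptr, type, member) ({                       \
	char *mptr_ = (char *)(ptr);                                    \
	_Static_assert(model_same_type(*(ptr), ((type *)0)->member) ||  \
		       model_same_type(*(ptr), void),                   \
		       "pointer type mismatch in container_of()");     \
	(type *)(mptr_ - offsetof(type, member)); })

struct model_item {
	int id;
	long payload;
};

/* Recover the enclosing struct from a pointer to its payload member. */
static struct model_item *item_from_payload(long *p)
{
	return model_container_of(p, struct model_item, payload);
}
```

      Passing a pointer of the wrong type, for example an int * for the
      payload member, would fail at compile time with the assertion message
      rather than with an error about a function declared with
      attribute error.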
      
      Link: https://lkml.kernel.org/r/20211015090530.2774079-1-linux@rasmusvillemoes.dk
      Link: https://lore.kernel.org/lkml/20211014132331.GA4811@kernel.org/T/
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
      Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e1edc277
    • Stephen Rothwell's avatar
      kernel.h: split out instruction pointer accessors · e52340de
      Stephen Rothwell authored
      
      
      bottom_half.h needs _THIS_IP_ to be standalone, so split that and
      _RET_IP_ out from kernel.h into the new instruction_pointer.h.  kernel.h
      directly needs them, so include it there and replace the include of
      kernel.h with this new file in bottom_half.h.
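      
      As a rough user-space illustration of what such accessors do, the
      sketch below defines two macros in the same spirit.  MODEL_RET_IP,
      MODEL_THIS_IP, caller_ip() and current_ip() are hypothetical names for
      this example; the code depends on GNU C extensions (local labels,
      label addresses, __builtin_return_address), so GCC or Clang is
      assumed.

```c
/* Address the current function will return to in its caller. */
#define MODEL_RET_IP  ((unsigned long)__builtin_return_address(0))

/* Address of a local label placed at the point of use. */
#define MODEL_THIS_IP ({ __label__ here_; here_: (unsigned long)&&here_; })

/* Kept out of line so that a real return address exists. */
__attribute__((noinline)) static unsigned long caller_ip(void)
{
	return MODEL_RET_IP;
}

/* Returns an address inside this function's own body. */
__attribute__((noinline)) static unsigned long current_ip(void)
{
	return MODEL_THIS_IP;
}
```

      Because the macros depend on nothing but compiler builtins, a header
      holding only them can be included without pulling in anything else.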
      
      Link: https://lkml.kernel.org/r/20211028161248.45232-1-andriy.shevchenko@linux.intel.com
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e52340de
    • Andy Shevchenko's avatar
      include/linux/generic-radix-tree.h: replace kernel.h with the necessary inclusions · b4b87651
      Andy Shevchenko authored
      
      
      When kernel.h is used in headers it adds a lot to dependency hell,
      especially when circular dependencies are involved.
      
      Replace kernel.h inclusion with the list of what is really being used.
      
      [akpm@linux-foundation.org: include math.h for round_up()]
      
      Link: https://lkml.kernel.org/r/20211027150548.80042-1-andriy.shevchenko@linux.intel.com
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b4b87651
    • Andy Shevchenko's avatar
      include/linux/radix-tree.h: replace kernel.h with the necessary inclusions · 98e1385e
      Andy Shevchenko authored
      
      
      When kernel.h is used in headers it adds a lot to dependency hell,
      especially when circular dependencies are involved.
      
      Replace kernel.h inclusion with the list of what is really being used.
      
      Link: https://lkml.kernel.org/r/20211027150528.80003-1-andriy.shevchenko@linux.intel.com
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      98e1385e
    • Andy Shevchenko's avatar
      include/linux/sbitmap.h: replace kernel.h with the necessary inclusions · 1fcbd5de
      Andy Shevchenko authored
      
      
      When kernel.h is used in headers it adds a lot to dependency hell,
      especially when circular dependencies are involved.
      
      Replace kernel.h inclusion with the list of what is really being used.
      
      Link: https://lkml.kernel.org/r/20211027150437.79921-1-andriy.shevchenko@linux.intel.com
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1fcbd5de
    • Andy Shevchenko's avatar
      include/linux/delay.h: replace kernel.h with the necessary inclusions · 5f6286a6
      Andy Shevchenko authored
      
      
      When kernel.h is used in headers it adds a lot to dependency hell,
      especially when circular dependencies are involved.
      
      Replace kernel.h inclusion with the list of what is really being used.
      
      [akpm@linux-foundation.org: cxd2880_common.h needs bits.h for GENMASK()]
      [andriy.shevchenko@linux.intel.com: delay.h: fix for removed kernel.h]
        Link: https://lkml.kernel.org/r/20211028170143.56523-1-andriy.shevchenko@linux.intel.com
      [akpm@linux-foundation.org: include/linux/fwnode.h needs bits.h for BIT()]
      
      Link: https://lkml.kernel.org/r/20211027150324.79827-1-andriy.shevchenko@linux.intel.com
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5f6286a6
    • Andy Shevchenko's avatar
      include/media/media-entity.h: replace kernel.h with the necessary inclusions · 28b2e8f3
      Andy Shevchenko authored
      
      
      When kernel.h is used in headers it adds a lot to dependency hell,
      especially when circular dependencies are involved.
      
      Replace kernel.h inclusion with the list of what is really being used.
      
      Link: https://lkml.kernel.org/r/20211013170417.87909-8-andriy.shevchenko@linux.intel.com
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Brendan Higgins <brendanhiggins@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thorsten Leemhuis <regressions@leemhuis.info>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      28b2e8f3
    • Andy Shevchenko's avatar
      include/linux/plist.h: replace kernel.h with the necessary inclusions · c540f959
      Andy Shevchenko authored
      
      
      When kernel.h is used in headers it adds a lot to dependency hell,
      especially when circular dependencies are involved.
      
      Replace kernel.h inclusion with the list of what is really being used.
      
      Link: https://lkml.kernel.org/r/20211013170417.87909-7-andriy.shevchenko@linux.intel.com
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Brendan Higgins <brendanhiggins@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thorsten Leemhuis <regressions@leemhuis.info>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c540f959
    • Andy Shevchenko's avatar
      include/linux/llist.h: replace kernel.h with the necessary inclusions · 50b09d61
      Andy Shevchenko authored
      
      
      When kernel.h is used in headers it adds a lot to dependency hell,
      especially when circular dependencies are involved.
      
      Replace kernel.h inclusion with the list of what is really being used.
      
      Link: https://lkml.kernel.org/r/20211013170417.87909-6-andriy.shevchenko@linux.intel.com
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Brendan Higgins <brendanhiggins@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thorsten Leemhuis <regressions@leemhuis.info>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      50b09d61
    • Andy Shevchenko's avatar
      include/linux/list.h: replace kernel.h with the necessary inclusions · cd7187e1
      Andy Shevchenko authored
      
      
      When kernel.h is used in headers it adds a lot to dependency hell,
      especially when circular dependencies are involved.
      
      Replace kernel.h inclusion with the list of what is really being used.
      
      Link: https://lkml.kernel.org/r/20211013170417.87909-5-andriy.shevchenko@linux.intel.com
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Brendan Higgins <brendanhiggins@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thorsten Leemhuis <regressions@leemhuis.info>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cd7187e1
    • Andy Shevchenko's avatar
      include/kunit/test.h: replace kernel.h with the necessary inclusions · ec54c289
      Andy Shevchenko authored
      
      
      When kernel.h is used in headers it adds a lot to dependency hell,
      especially when circular dependencies are involved.
      
      Replace kernel.h inclusion with the list of what is really being used.
      
      Link: https://lkml.kernel.org/r/20211013170417.87909-4-andriy.shevchenko@linux.intel.com
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Brendan Higgins <brendanhiggins@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thorsten Leemhuis <regressions@leemhuis.info>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ec54c289
    • kernel.h: split out container_of() and typeof_member() macros · d2a8ebbf
      Andy Shevchenko authored
      
      
      kernel.h has been used as a dump for all kinds of stuff for a long time.
      Here is an attempt to clean it up by splitting out the container_of() and
      typeof_member() macros.

      For the time being, include the new header back into kernel.h to avoid
      twisted indirect includes for existing users.

      Note, there are _a lot_ of headers and modules that include kernel.h
      solely for one of these macros.  Splitting them out spares the compiler
      the twisted inclusion paths and makes new code cleaner in the future.
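
As a rough userspace illustration of what these two macros do, here is a minimal sketch. It assumes GCC or Clang (for `__typeof__`), the macro bodies are simplified stand-ins rather than the kernel's definitions (which add type checking), and `struct list_node`, `struct sensor` and both helper functions are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros (assumption: no type checks). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define typeof_member(T, m) __typeof__(((T *)0)->m)

/* Hypothetical member that gets embedded in a larger structure. */
struct list_node {
	struct list_node *next;
};

/* Hypothetical structure embedding the node. */
struct sensor {
	int id;
	struct list_node node;
};

/* Recover the enclosing sensor from a pointer to its embedded node. */
static struct sensor *sensor_from_node(struct list_node *n)
{
	assert(n != NULL);
	return container_of(n, struct sensor, node);
}

/* Use the type of sensor.id without spelling that type out. */
static typeof_member(struct sensor, id) sensor_id(const struct sensor *s)
{
	typeof_member(struct sensor, id) id = s->id;

	return id;
}
```

A file that needs only these two helpers can then pull in a small header instead of everything kernel.h drags along, which is the motivation given above.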
      
      Link: https://lkml.kernel.org/r/20211013170417.87909-3-andriy.shevchenko@linux.intel.com
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Brendan Higgins <brendanhiggins@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thorsten Leemhuis <regressions@leemhuis.info>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d2a8ebbf
    • kernel.h: drop unneeded <linux/kernel.h> inclusion from other headers · f5d80614
      Andy Shevchenko authored
      
      
      Patch series "kernel.h further split", v5.
      
      kernel.h is a collection of unrelated things that are often used in
      unrelated compilation units, especially when drivers need only one or
      two macro definitions from it.
      
      This patch (of 7):
      
      There is no evidence we need kernel.h inclusion in certain headers.
      Drop unneeded <linux/kernel.h> inclusion from other headers.
      
      [sfr@canb.auug.org.au: bottom_half.h needs kernel]
        Link: https://lkml.kernel.org/r/20211015202908.1c417ae2@canb.auug.org.au
      
      Link: https://lkml.kernel.org/r/20211013170417.87909-1-andriy.shevchenko@linux.intel.com
      Link: https://lkml.kernel.org/r/20211013170417.87909-2-andriy.shevchenko@linux.intel.com
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Brendan Higgins <brendanhiggins@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
      Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Thorsten Leemhuis <regressions@leemhuis.info>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f5d80614
    • proc: allow pid_revalidate() during LOOKUP_RCU · da4d6b9c
      Stephen Brennan authored

      Problem Description:
      
      When running ~128 parallel instances of
      
        TZ=/etc/localtime ps -fe >/dev/null
      
      on a 128CPU machine, the %sys utilization reaches 97%, and perf shows
      the following code path as being responsible for heavy contention on the
      d_lockref spinlock:
      
            walk_component()
              lookup_fast()
                d_revalidate()
                  pid_revalidate() // returns -ECHILD
                unlazy_child()
                  lockref_get_not_dead(&nd->path.dentry->d_lockref) <-- contention
      
      The reason is that pid_revalidate() is triggering a drop from RCU to ref
      path walk mode.  All concurrent path lookups thus try to grab a
      reference to the dentry for /proc/, before re-executing pid_revalidate()
      and then stepping into the /proc/$pid directory.  Thus there is huge
      spinlock contention.
      
      This patch allows pid_revalidate() to execute in RCU mode, meaning that
      the path lookup can successfully enter the /proc/$pid directory while
      still in RCU mode.  Later on, the path lookup may still drop into ref
      mode, but the contention will be much reduced at this point.
      
      By applying this patch, %sys utilization falls to around 85% under the
      same workload, and the number of ps processes executed per unit time
      increases by 3x-4x.  Although this particular workload is a bit
      contrived, we have seen some large collections of eager monitoring
      scripts which produced similarly high %sys time due to contention in the
      /proc directory.
      
      As a result of this patch, Al noted that several procfs methods which were
      only called in ref-walk mode could now be called from RCU mode.  To
      ensure that this patch is safe, I audited all the inode get_link and
      permission() implementations, as well as dentry d_revalidate()
      implementations, in fs/proc.  The purpose here is to ensure that they
      either are safe to call in RCU (i.e. don't sleep) or correctly bail out
      of RCU mode if they don't support it.  My analysis shows that all
      at-risk procfs methods are safe to call under RCU, and thus this patch
      is safe.
      
      Procfs RCU-walk Analysis:
      
      This analysis is up-to-date with 5.15-rc3.  When called under RCU mode,
      these functions have arguments as follows:
      
      * get_link() receives a NULL dentry pointer when called in RCU mode.
      * permission() receives MAY_NOT_BLOCK in the mode parameter when called
        from RCU.
      * d_revalidate() receives LOOKUP_RCU in flags.
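
The three signals listed above can be sketched in userspace as follows. This is only an illustration of the "bail out" pattern, assuming that returning -ECHILD is how a method asks for a retry in ref-walk mode (as the call trace earlier shows for pid_revalidate()); the flag values, `struct dentry` stub and `sketch_*` names are placeholders for this example, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative flag values for this sketch only. */
#define LOOKUP_RCU    0x0040
#define MAY_NOT_BLOCK 0x0080

/* Stub type; only whether the pointer is NULL matters here. */
struct dentry {
	int unused;
};

/* get_link() style: a NULL dentry signals RCU mode. */
static int sketch_get_link(const struct dentry *dentry)
{
	if (!dentry)
		return -ECHILD;	/* bail out: retry in ref-walk mode */
	return 0;
}

/* permission() style: MAY_NOT_BLOCK in the mask signals RCU mode. */
static int sketch_permission(int mask)
{
	if (mask & MAY_NOT_BLOCK)
		return -ECHILD;
	return 0;
}

/* d_revalidate() style: LOOKUP_RCU in flags signals RCU mode. */
static int sketch_d_revalidate(unsigned int flags)
{
	if (flags & LOOKUP_RCU)
		return -ECHILD;
	return 1;	/* dentry is still valid */
}
```

A method that is genuinely RCU safe simply omits the early return and does its work without sleeping.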
      
      For the following functions, either they are trivially RCU safe, or they
      explicitly bail at the beginning of the function when they run:
      
      proc_ns_get_link       (bails out)
      proc_get_link          (RCU safe)
      proc_pid_get_link      (bails out)
      map_files_d_revalidate (bails out)
      map_misc_d_revalidate  (bails out)
      proc_net_d_revalidate  (RCU safe)
      proc_sys_revalidate    (bails out, also not under /proc/$pid)
      tid_fd_revalidate      (bails out)
      proc_sys_permission    (not under /proc/$pid)
      
      The remainder of the functions require a bit more detail:
      
      * proc_fd_permission: RCU safe. All of the body of this function is
        under rcu_read_lock(), except generic_permission() which declares
        itself RCU safe in its documentation string.
      * proc_self_get_link uses GFP_ATOMIC in the RCU case, so it is RCU aware
        and otherwise looks safe. The same is true of proc_thread_self_get_link.
      * proc_map_files_get_link: calls ns_capable, which calls capable(), and
        thus calls into the audit code (see note #1 below). The remainder is
        just a call to the trivially safe proc_pid_get_link().
      * proc_pid_permission: calls ptrace_may_access(), which appears RCU
        safe, although it does call into the "security_ptrace_access_check()"
        hook, which looks safe under smack and selinux. Just the audit code is
        of concern. Also uses get_task_struct() and put_task_struct(), see
        note #2 below.
      * proc_tid_comm_permission: Appears safe, though calls put_task_struct
        (see note #2 below).
      
      Note #1:
        Most of the concern of RCU safety has centered around the audit code.
        However, since b17ec22f ("selinux: slow_avc_audit has become
        non-blocking"), it's safe to call this code under RCU. So all of the
        above are safe by my estimation.
      
      Note #2: get_task_struct() and put_task_struct():
        The majority of get_task_struct() is under RCU read lock, and in any
        case it is a simple increment. But put_task_struct() is complex, given
        that it could at some point free the task struct, and this process has
        many steps which I couldn't manually verify. However, several other
        places call put_task_struct() under RCU, so it appears safe to use
        here too (see kernel/hung_task.c:165 or rcu/tree-stall.h:296)
      
      Patch description:
      
      pid_revalidate() drops from RCU into REF lookup mode.  When many threads
      are resolving paths within /proc in parallel, this can result in heavy
      spinlock contention on d_lockref as each thread tries to grab a
      reference to the /proc dentry (and drop it shortly thereafter).
      
      Investigation indicates that it is not necessary to drop RCU in
      pid_revalidate(), as no RCU data is modified and the function never
      sleeps.  So, remove the LOOKUP_RCU check.
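
To make the effect of removing the check concrete, here is a small userspace model. It is a sketch under stated assumptions: `revalidate_old()` and `revalidate_new()` are hypothetical stand-ins for the function before and after the patch, the flag value is illustrative, and each fallback to ref-walk stands for one attempt to take a reference on the shared /proc dentry.

```c
#include <assert.h>
#include <errno.h>

#define LOOKUP_RCU 0x0040	/* illustrative value for this sketch */

/* Before: refuse to run under RCU-walk. */
static int revalidate_old(unsigned int flags)
{
	if (flags & LOOKUP_RCU)
		return -ECHILD;
	return 1;
}

/* After: no LOOKUP_RCU check, since the work neither sleeps nor modifies RCU data. */
static int revalidate_new(unsigned int flags)
{
	(void)flags;
	return 1;
}

/*
 * Count how many of nr_lookups RCU-mode lookups have to fall back to
 * ref-walk, i.e. how many would contend on the parent's d_lockref.
 */
static unsigned int count_fallbacks(int (*revalidate)(unsigned int),
				    unsigned int nr_lookups)
{
	unsigned int i, fallbacks = 0;

	for (i = 0; i < nr_lookups; i++) {
		if (revalidate(LOOKUP_RCU) == -ECHILD) {
			int ret = revalidate(0);	/* retry in ref-walk */

			assert(ret == 1);
			(void)ret;
			fallbacks++;
		}
	}
	return fallbacks;
}
```

With the old function every lookup in the model falls back; with the new one none do, which is the source of the reduced contention described above.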
      
      Link: https://lkml.kernel.org/r/20211004175629.292270-2-stephen.s.brennan@oracle.com
      Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
      Cc: Konrad Wilk <konrad.wilk@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      da4d6b9c
    • virtio-mem: kdump mode to sanitize /proc/vmcore access · ce281462
      David Hildenbrand authored
      
      
      Although virtio-mem currently supports reading unplugged memory in the
      hypervisor, this will change in the future, indicated to the device via
      a new feature flag.
      
      We similarly sanitized /proc/kcore access recently.  [1]
      
      Let's register a vmcore callback, to allow vmcore code to check if a PFN
      belonging to a virtio-mem device is either currently plugged and should
      be dumped or is currently unplugged and should not be accessed, instead
      mapping the shared zeropage or returning zeroes when reading.
      
      This is important when not capturing /proc/vmcore via tools like
      "makedumpfile" that can identify logically unplugged virtio-mem memory
      via PG_offline in the memmap, but simply by e.g., copying the file.
      
      Distributions that support virtio-mem+kdump have to make sure that the
      virtio_mem module will be part of the kdump kernel or the kdump initrd;
      dracut was recently [2] extended to include virtio-mem in the generated
      initrd.  As long as no special kdump kernels are used, this will
      automatically make sure that virtio-mem will be around in the kdump
      initrd and sanitize /proc/vmcore access -- with dracut.
      
      With this series, we'll send one virtio-mem state request for every ~2
      MiB chunk of virtio-mem memory indicated in the vmcore that we intend to
      read/map.
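
The per-chunk behaviour can be pictured with the following userspace sketch. Everything here is an assumption made for illustration: 4 KiB pages, a 2 MiB block size, a fake device whose plugged range is a simple prefix, and a one-entry cache as one plausible way to end up with a single request per chunk when a dump is read sequentially. None of the names come from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12			/* assumes 4 KiB pages */
#define SKETCH_BLOCK_SIZE (2ULL << 20)		/* assumes 2 MiB blocks */

/* Hypothetical device state; a real driver would ask the hypervisor. */
struct sketch_dev {
	uint64_t start, size;	/* block-aligned device address range */
	uint64_t plugged_end;	/* [start, plugged_end) is plugged */
	uint64_t last_block;	/* block address of the cached answer */
	bool last_plugged;	/* cached answer for last_block */
	unsigned int requests;	/* state requests sent so far */
};

/* Stand-in for one state request to the device. */
static bool sketch_query_block(struct sketch_dev *d, uint64_t block)
{
	d->requests++;
	return block < d->plugged_end;
}

/* Decide whether a PFN may be read when dumping. */
static bool sketch_dev_pfn_is_ram(struct sketch_dev *d, uint64_t pfn)
{
	uint64_t addr = pfn << SKETCH_PAGE_SHIFT;
	uint64_t block = addr & ~(SKETCH_BLOCK_SIZE - 1);

	/* Memory outside the device range is not this device's to veto. */
	if (addr < d->start || addr >= d->start + d->size)
		return true;

	/* Only ask the device when we move on to a different block. */
	if (block != d->last_block) {
		d->last_plugged = sketch_query_block(d, block);
		d->last_block = block;
	}
	return d->last_plugged;
}
```

Under these assumptions 512 consecutive PFNs share one block, so reading a block page by page costs one request rather than 512. `last_block` should start out as an address that can never match, such as `UINT64_MAX`.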
      
      In the future, we might want to allow building virtio-mem for kdump mode
      only, even without CONFIG_MEMORY_HOTPLUG and friends: this way, we could
      support special stripped-down kdump kernels that have many other config
      options disabled; we'll tackle that once required.  Further, we might
      want to try sensing bigger blocks (e.g., memory sections) first before
      falling back to device blocks on demand.
      
      Tested with Fedora rawhide, which contains a recent kexec-tools version
      (considering "System RAM (virtio_mem)" when creating the vmcore header)
      and a recent dracut version (including the virtio_mem module in the
      kdump initrd).
      
      Link: https://lkml.kernel.org/r/20210526093041.8800-1-david@redhat.com [1]
      Link: https://github.com/dracutdevs/dracut/pull/1157 [2]
      Link: https://lkml.kernel.org/r/20211005121430.30136-10-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ce281462
    • virtio-mem: factor out hotplug specifics from virtio_mem_remove() into virtio_mem_deinit_hotplug() · ffc763d0
      David Hildenbrand authored
      
      
      Let's prepare for a new virtio-mem kdump mode in which we don't actually
      hot(un)plug any memory but only observe the state of device blocks.
      
      Link: https://lkml.kernel.org/r/20211005121430.30136-9-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ffc763d0
    • virtio-mem: factor out hotplug specifics from virtio_mem_probe() into virtio_mem_init_hotplug() · 84e17e68
      David Hildenbrand authored
      
      
      Let's prepare for a new virtio-mem kdump mode in which we don't actually
      hot(un)plug any memory but only observe the state of device blocks.
      
      Link: https://lkml.kernel.org/r/20211005121430.30136-8-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      84e17e68
    • virtio-mem: factor out hotplug specifics from virtio_mem_init() into virtio_mem_init_hotplug() · 94300fcf
      David Hildenbrand authored
      
      
      Let's prepare for a new virtio-mem kdump mode in which we don't actually
      hot(un)plug any memory but only observe the state of device blocks.
      
      Link: https://lkml.kernel.org/r/20211005121430.30136-7-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      94300fcf
    • proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacks · cc5f2704
      David Hildenbrand authored
      
      
      Let's support multiple registered callbacks, making sure that
      registering vmcore callbacks cannot fail.  Make the callback return a
      bool instead of an int, handling how to deal with errors internally.
      Drop unused HAVE_OLDMEM_PFN_IS_RAM.
      
      We soon want to make use of this infrastructure from other drivers:
      virtio-mem, registering one callback for each virtio-mem device, to
      prevent reading unplugged virtio-mem memory.
      
      Handle it via a generic vmcore_cb structure, prepared for future
      extensions: for example, once we support virtio-mem on s390x where the
      vmcore is completely constructed in the second kernel, we want to detect
      and add plugged virtio-mem memory ranges to the vmcore in order for them
      to get dumped properly.
      
      Handle corner cases that are unexpected and shouldn't happen in sane
      setups: registering a callback after the vmcore has already been opened
      (warn only) and unregistering a callback after the vmcore has already been
      opened (warn and essentially read only zeroes from that point on).
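
A userspace model of such a callback list might look like the sketch below. It is illustrative only: the structure layout, the `sketch_*` names and the two example policies are assumptions, and the rule that any single callback can veto a PFN is one plausible way to combine multiple callbacks. What it does take from the description above is that callbacks return a bool and that registration has no failure path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of one callback entry; field names are illustrative. */
struct sketch_vmcore_cb {
	bool (*pfn_is_ram)(struct sketch_vmcore_cb *cb, unsigned long pfn);
	struct sketch_vmcore_cb *next;
};

static struct sketch_vmcore_cb *sketch_cb_list;

/* Registration only links the entry in, so it cannot fail. */
static void sketch_register_cb(struct sketch_vmcore_cb *cb)
{
	assert(cb && cb->pfn_is_ram);
	cb->next = sketch_cb_list;
	sketch_cb_list = cb;
}

/* Unlink the entry if it is on the list. */
static void sketch_unregister_cb(struct sketch_vmcore_cb *cb)
{
	struct sketch_vmcore_cb **pp;

	for (pp = &sketch_cb_list; *pp; pp = &(*pp)->next) {
		if (*pp == cb) {
			*pp = cb->next;
			cb->next = NULL;
			return;
		}
	}
}

/* A PFN counts as RAM unless some registered callback vetoes it. */
static bool sketch_vmcore_pfn_is_ram(unsigned long pfn)
{
	struct sketch_vmcore_cb *cb;

	for (cb = sketch_cb_list; cb; cb = cb->next) {
		if (!cb->pfn_is_ram(cb, pfn))
			return false;
	}
	return true;
}

/* Example callbacks with made-up policies. */
static bool sketch_veto_odd(struct sketch_vmcore_cb *cb, unsigned long pfn)
{
	(void)cb;
	return (pfn & 1) == 0;
}

static bool sketch_veto_high(struct sketch_vmcore_cb *cb, unsigned long pfn)
{
	(void)cb;
	return pfn < 1000;
}
```

Passing the entry itself to the callback lets a driver embed it in its own per-device structure, which fits the stated plan of one callback per virtio-mem device.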
      
      Link: https://lkml.kernel.org/r/20211005121430.30136-6-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cc5f2704
    • proc/vmcore: let pfn_is_ram() return a bool · 2c9feeae
      David Hildenbrand authored
      
      
      The callback should deal with errors internally; it doesn't make sense
      to expose these via pfn_is_ram().  We'll rework the callbacks next.
      Right now we consider errors as if "it's RAM"; no functional change.
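
A minimal userspace sketch of that convention follows. It assumes a legacy-style callback that returns a negative value on error, 0 for "not RAM" and a positive value for "RAM"; the typedef, the wrapper and the example callback are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Legacy-style callback: <0 on error, 0 for "not RAM", >0 for "RAM". */
typedef int (*sketch_oldmem_fn)(unsigned long pfn);

/*
 * Expose only a bool to callers. A missing callback and any error are
 * both treated as "it's RAM", so behaviour is unchanged for callers.
 */
static bool sketch_pfn_is_ram(sketch_oldmem_fn fn, unsigned long pfn)
{
	if (!fn)
		return true;
	return fn(pfn) != 0;
}

/* Made-up callback: fails for PFN 0, otherwise odd PFNs are RAM. */
static int sketch_cb(unsigned long pfn)
{
	if (pfn == 0)
		return -EINVAL;
	return (pfn & 1) ? 1 : 0;
}
```

Only an explicit 0 from the callback makes the wrapper report "not RAM".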
      
      Link: https://lkml.kernel.org/r/20211005121430.30136-5-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2c9feeae
    • x86/xen: print a warning when HVMOP_get_mem_type fails · 934fadf4
      David Hildenbrand authored
      
      
      HVMOP_get_mem_type is not expected to fail, "This call failing is
      indication of something going quite wrong and it would be good to know
      about this." [1]
      
      Let's add a pr_warn_once().
      
      Link: https://lkml.kernel.org/r/3b935aa0-6d85-0bcd-100e-15098add3c4c@oracle.com [1]
      Link: https://lkml.kernel.org/r/20211005121430.30136-4-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      934fadf4
    • x86/xen: simplify xen_oldmem_pfn_is_ram() · d452a489
      David Hildenbrand authored
      
      
      Let's simplify return handling.
      
      Link: https://lkml.kernel.org/r/20211005121430.30136-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d452a489
    • x86/xen: update xen_oldmem_pfn_is_ram() documentation · 434b90f3
      David Hildenbrand authored
      
      
      After removing /dev/kmem, sanitizing /proc/kcore and handling /dev/mem,
      this series tackles the last sane way in which a VM could accidentally
      access logically unplugged memory managed by a virtio-mem device:
      /proc/vmcore
      
      When dumping memory via "makedumpfile", PG_offline pages, used by
      virtio-mem to flag logically unplugged memory, are already properly
      excluded; however, especially when accessing/copying /proc/vmcore "the
      usual way", we can still end up reading logically unplugged memory that
      is part of a virtio-mem device.
      
      Patch #1-#3 are cleanups.  Patch #4 extends the existing
      oldmem_pfn_is_ram mechanism.  Patch #5-#7 are virtio-mem refactorings
      for patch #8, which implements the virtio-mem logic to query the state
      of device blocks.
      
      Patch #8:
       "Although virtio-mem currently supports reading unplugged memory in the
        hypervisor, this will change in the future, indicated to the device
        via a new feature flag. We similarly sanitized /proc/kcore access
        recently.
        [...]
        Distributions that support virtio-mem+kdump have to make sure that the
        virtio_mem module will be part of the kdump kernel or the kdump
        initrd; dracut was recently [2] extended to include virtio-mem in the
        generated initrd. As long as no special kdump kernels are used, this
        will automatically make sure that virtio-mem will be around in the
        kdump initrd and sanitize /proc/vmcore access -- with dracut"
      
      This is the last remaining bit to support
      VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE [3] in the Linux implementation of
      virtio-mem.
      
      Note: this is best-effort.  We'll never be able to control what runs
      inside the second kernel, really, but we also don't have to care: we
      only care about sane setups where we don't want our VM getting zapped
      once we touch the wrong memory location while dumping.  While we usually
      expect sane setups to use "makedumpfile", nothing really speaks against
      just copying /proc/vmcore, especially in environments where HWpoisoning
      isn't typically expected.  Also, we really don't want to put all our
      trust completely on the memmap, so sanitizing also makes sense when just
      using "makedumpfile".
      
      [1] https://lkml.kernel.org/r/20210526093041.8800-1-david@redhat.com
      [2] https://github.com/dracutdevs/dracut/pull/1157
      [3] https://lists.oasis-open.org/archives/virtio-comment/202109/msg00021.html
      
      This patch (of 9):
      
      The callback is only used for the vmcore nowadays.
      
      Link: https://lkml.kernel.org/r/20211005121430.30136-1-david@redhat.com
      Link: https://lkml.kernel.org/r/20211005121430.30136-2-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      434b90f3
    • procfs: do not list TID 0 in /proc/<pid>/task · 0658a096
      Florian Weimer authored
      
      
      If a task exits concurrently, task_pid_nr_ns may return 0.
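
The idea of the fix can be sketched in userspace as a filter over looked-up thread IDs. This is an illustration only, with a plain array standing in for the task iteration and a value of 0 modelling a task that exited between being found and having its ID looked up; `sketch_list_tids()` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy thread IDs into out[], skipping 0 so that no entry named "0"
 * is ever listed. Returns the number of IDs written.
 */
static size_t sketch_list_tids(const int *tids, size_t n, int *out)
{
	size_t i, count = 0;

	assert(tids && out);
	for (i = 0; i < n; i++) {
		if (tids[i] == 0)
			continue;	/* task is gone: do not emit "0" */
		out[count++] = tids[i];
	}
	return count;
}
```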
      
      [akpm@linux-foundation.org: coding style tweaks]
      [adobriyan@gmail.com: test that /proc/*/task doesn't contain "0"]
        Link: https://lkml.kernel.org/r/YV88AnVzHxPafQ9o@localhost.localdomain
      
      Link: https://lkml.kernel.org/r/8735pn5dx7.fsf@oldenburg.str.redhat.com
      Signed-off-by: Florian Weimer <fweimer@redhat.com>
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
      Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0658a096
    • mm,hugetlb: remove mlock ulimit for SHM_HUGETLB · 83c1fd76
      zhangyiru authored

      Commit 21a3c273 ("mm, hugetlb: add thread name and pid to SHM_HUGETLB
      mlock rlimit warning") marked this as deprecated in 2012, but it is not
      deleted yet.
      
      Mike says he still sees that message in log files on occasion, so maybe we
      should preserve this warning.
      
      Also remove hugetlbfs related user_shm_unlock in ipc/shm.c and remove the
      user_shm_unlock after out.
      
      Link: https://lkml.kernel.org/r/20211103105857.25041-1-zhangyiru3@huawei.com
      Signed-off-by: zhangyiru <zhangyiru3@huawei.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Liu Zixian <liuzixian4@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: wuxu.wu <wuxu.wu@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      83c1fd76
    • vfs: keep inodes with page cache off the inode shrinker LRU · 51b8c1fe
      Johannes Weiner authored

      Historically (pre-2.5), the inode shrinker used to reclaim only empty
      inodes and skip over those that still contained page cache.  This caused
      problems on highmem hosts: struct inode could fill lowmem zones
      before the cache was getting reclaimed in the highmem zones.
      
      To address this, the inode shrinker started to strip page cache to
      facilitate reclaiming lowmem.  However, this comes with its own set of
      problems: the shrinkers may drop actively used page cache just because
      the inodes are not currently open or dirty - think working with a large
      git tree.  It further doesn't respect cgroup memory protection settings
      and can cause priority inversions between containers.
      
      Nowadays, the page cache also holds non-resident info for evicted cache
      pages in order to detect refaults.  We've come to rely heavily on this
      data inside reclaim for protecting the cache workingset and driving swap
      behavior.  We also use it to quantify and report workload health through
      psi.  The latter in turn is used for fleet health monitoring, as well as
      driving automated memory sizing of workloads and containers, proactive
      reclaim and memory offloading schemes.
      
      The consequence of dropping page cache prematurely is that we're seeing
      subtle and not-so-subtle failures in all of the above-mentioned
      scenarios, with the workload generally entering unexpected thrashing
      states while losing the ability to reliably detect it.
      
      To fix this on non-highmem systems at least, going back to rotating
      inodes on the LRU isn't feasible.  We've tried (commit a76cf1a4
      ("mm: don't reclaim inodes with many attached pages")) and failed
      (commit 69056ee6 ("Revert "mm: don't reclaim inodes with many
      attached pages"")).
      
      The issue is mostly that shrinker pools attract pressure based on their
      size, and when objects get skipped the shrinkers remember this as
      deferred reclaim work.  This accumulates excessive pressure on the
      remaining inodes, and we can quickly eat into heavily used ones, or
      dirty ones that require IO to reclaim, when there potentially is plenty
      of cold, clean cache around still.
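
      As a rough illustration of the accumulation effect described above,
      here is a minimal user-space sketch. It is not the kernel's shrinker
      code: struct toy_shrinker, toy_shrink() and the "pool_size >>
      priority" pressure formula are assumptions made up for this example.
      It only shows how work aimed at skipped objects carries over and
      lands on whatever is reclaimable later.

```c
#include <assert.h>

/* Hypothetical toy state, not the kernel's struct shrinker. */
struct toy_shrinker {
	unsigned long nr_deferred;	/* scan work carried over from earlier passes */
};

/*
 * One simulated reclaim pass. Pressure is proportional to the pool size
 * (pool_size >> priority) plus whatever was deferred before. Objects
 * counted in 'skipped' cannot be freed, so the work aimed at them is
 * deferred again. Returns the number of objects freed.
 */
unsigned long toy_shrink(struct toy_shrinker *s, unsigned long pool_size,
			 unsigned long skipped, unsigned int priority)
{
	unsigned long target, reclaimable, freed;

	assert(skipped <= pool_size);

	target = s->nr_deferred + (pool_size >> priority);
	reclaimable = pool_size - skipped;
	freed = target < reclaimable ? target : reclaimable;

	/* Whatever could not be done now is remembered for the next pass. */
	s->nr_deferred = target - freed;
	return freed;
}
```

      In this toy model, two passes over a pool that is mostly skipped
      leave a backlog larger than a single pass would ever request, and
      the next pass that finds reclaimable objects frees all of them at
      once.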
      
      Instead, this patch keeps populated inodes off the inode LRU in the
      first place - just like an open file or dirty state would.  An otherwise
      clean and unused inode then gets queued when the last cache entry
      disappears.  This solves the problem without reintroducing the reclaim
      issues, and generally is a bit more scalable than having to wade through
      potentially hundreds of thousands of busy inodes.
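
      The queueing rule can be sketched in user space as follows. This is
      a simplified model under stated assumptions, not the patch itself:
      struct toy_inode and the toy_*() helpers are hypothetical names, and
      locking as well as the lazy re-check in the shrinker are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inode state the message talks about. */
struct toy_inode {
	unsigned int refcount;		/* users holding the inode, like i_count */
	bool dirty;			/* still needs writeback */
	unsigned long nr_cache_entries;	/* page cache entries still attached */
	bool on_lru;			/* visible to the inode shrinker */
};

/* Only clean, unused and unpopulated inodes may sit on the LRU. */
bool toy_lru_eligible(const struct toy_inode *inode)
{
	return inode->refcount == 0 && !inode->dirty &&
	       inode->nr_cache_entries == 0;
}

/* Queue the inode if, and only if, nothing keeps it busy anymore. */
void toy_maybe_add_lru(struct toy_inode *inode)
{
	if (toy_lru_eligible(inode))
		inode->on_lru = true;
}

/* Drop a reference; the last put queues the inode only if the cache is empty. */
void toy_iput(struct toy_inode *inode)
{
	assert(inode->refcount > 0);
	inode->refcount--;
	toy_maybe_add_lru(inode);
}

/* Remove one cache entry; the last one to go queues an otherwise idle inode. */
void toy_cache_remove(struct toy_inode *inode)
{
	assert(inode->nr_cache_entries > 0);
	inode->nr_cache_entries--;
	toy_maybe_add_lru(inode);
}
```

      With one reference and two cache entries, toy_iput() alone does not
      queue the inode; only the second toy_cache_remove() does.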
      
      Locking is a bit tricky because the locks protecting the inode state
      (i_lock) and the inode LRU (lru_list.lock) don't nest inside the
      irq-safe page cache lock (i_pages.xa_lock).  Page cache deletions are
      serialized through i_lock, taken before the i_pages lock, to make sure
      depopulated inodes are queued reliably.  Additions may race with
      deletions, but we'll check again in the shrinker.  If additions race
      with the shrinker itself, we're protected by the i_lock: if find_inode()
      or iput() win, the shrinker will bail on the elevated i_count or
      I_REFERENCED; if the shrinker wins and goes ahead with the inode, it
      will set I_FREEING and inhibit further igets(), which will cause the
      other side to create a new instance of the inode instead.
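
      The race between the shrinker and a lookup can be modelled in user
      space roughly like this. It is a sketch under assumptions, not the
      kernel implementation: struct sketch_inode and the sketch_*()
      functions are hypothetical, a C11 atomic_flag spinlock stands in for
      i_lock, and the i_pages lock ordering is not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum sketch_lru_status { SKETCH_LRU_SKIP, SKETCH_LRU_ISOLATED };

struct sketch_inode {
	atomic_flag lock;	/* plays the role of i_lock */
	unsigned int refcount;	/* like i_count */
	bool referenced;	/* like I_REFERENCED */
	bool freeing;		/* like I_FREEING */
};

static void sketch_lock(struct sketch_inode *inode)
{
	while (atomic_flag_test_and_set_explicit(&inode->lock,
						 memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void sketch_unlock(struct sketch_inode *inode)
{
	atomic_flag_clear_explicit(&inode->lock, memory_order_release);
}

void sketch_inode_init(struct sketch_inode *inode)
{
	atomic_flag_clear(&inode->lock);
	inode->refcount = 0;
	inode->referenced = false;
	inode->freeing = false;
}

/*
 * Lookup side. Returns false if the shrinker already claimed the inode;
 * the caller then has to set up a new instance instead.
 */
bool sketch_iget(struct sketch_inode *inode)
{
	bool ok;

	sketch_lock(inode);
	ok = !inode->freeing;
	if (ok)
		inode->refcount++;
	sketch_unlock(inode);
	return ok;
}

/* Release side: drop the reference and mark the inode as recently used. */
void sketch_iput(struct sketch_inode *inode)
{
	sketch_lock(inode);
	assert(inode->refcount > 0);
	inode->refcount--;
	inode->referenced = true;
	sketch_unlock(inode);
}

/* Shrinker side: bail on a busy or recently used inode, else claim it. */
enum sketch_lru_status sketch_lru_isolate(struct sketch_inode *inode)
{
	enum sketch_lru_status ret = SKETCH_LRU_SKIP;

	sketch_lock(inode);
	if (inode->refcount > 0) {
		/* A lookup won the race; leave the inode alone. */
	} else if (inode->referenced) {
		inode->referenced = false;	/* give it one more round */
	} else {
		inode->freeing = true;		/* inhibits further sketch_iget() */
		ret = SKETCH_LRU_ISOLATED;
	}
	sketch_unlock(inode);
	return ret;
}
```

      Because both sides make their decision under the same lock, either
      the lookup sees the freeing flag or the shrinker sees the elevated
      count; there is no interleaving in which both succeed.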
      
      Link: https://lkml.kernel.org/r/20210614211904.14420-4-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      51b8c1fe
  2. Nov 07, 2021