  1. Nov 21, 2021
    • mm: emit the "free" trace report before freeing memory in kmem_cache_free() · 9a543f00
      Yunfeng Ye authored
      After the memory is freed, it can be immediately allocated by other
      CPUs, before the "free" trace report has been emitted.  This causes
      inaccurate traces.
      
      For example, if the following sequence of events occurs:
      
          CPU 0                 CPU 1
      
        (1) alloc xxxxxx
        (2) free  xxxxxx
                               (3) alloc xxxxxx
                               (4) free  xxxxxx
      
      Then they will be inaccurately reported via tracing, so that they appear
      to have happened in this order:
      
          CPU 0                 CPU 1
      
        (1) alloc xxxxxx
                               (2) alloc xxxxxx
        (3) free  xxxxxx
                               (4) free  xxxxxx
      
      This makes it look like CPU 1 somehow managed to allocate memory that
      CPU 0 still had allocated for itself.
      
      In order to avoid this, emit the "free xxxxxx" tracing report just
      before the actual call to free the memory, instead of just after it.
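
      The reordering can be illustrated with a small single-threaded
      sketch.  It is not the kernel code: the slot, the trace log and
      every function name below are hypothetical, and the "other CPU" is
      simulated by re-allocating the slot the moment it is released.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical trace log: records the order in which events are reported. */
enum ev_kind { EV_ALLOC, EV_FREE };

struct trace_log {
    enum ev_kind ev[8];
    size_t n;
};

/* Hypothetical one-object "cache". */
struct slot {
    int in_use;
};

static void trace_emit(struct trace_log *log, enum ev_kind kind)
{
    assert(log->n < 8);
    log->ev[log->n++] = kind;
}

static void slot_alloc(struct slot *s, struct trace_log *log)
{
    assert(!s->in_use);
    s->in_use = 1;
    trace_emit(log, EV_ALLOC);
}

/* Release the slot; a simulated second CPU grabs it immediately. */
static void slot_release_and_race(struct slot *s, struct trace_log *log)
{
    s->in_use = 0;
    slot_alloc(s, log);
}

/* Old ordering: report "free" only after the memory is already reusable. */
static void free_trace_after(struct slot *s, struct trace_log *log)
{
    slot_release_and_race(s, log);
    trace_emit(log, EV_FREE);
}

/* New ordering: report "free" first, then release the memory. */
static void free_trace_before(struct slot *s, struct trace_log *log)
{
    trace_emit(log, EV_FREE);
    slot_release_and_race(s, log);
}
```

      In this sketch, an alloc followed by free_trace_after() logs
      alloc, alloc, free, while free_trace_before() logs alloc, free,
      alloc, which matches what actually happened to the object.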
      
      Link: https://lkml.kernel.org/r/374eb75d-7404-8721-4e1e-65b0e5b17279@huawei.com
      
      
      Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • shm: extend forced shm destroy to support objects from several IPC nses · 85b6d246
      Alexander Mikhalitsyn authored
      Currently, the exit_shm() function is not designed to work properly when
      task->sysvshm.shm_clist holds shm objects from different IPC namespaces.
      
      This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
      leads to use-after-free (reproducer exists).
      
      This is an attempt to fix the problem by extending the exit_shm()
      mechanism to handle shm destruction across several IPC namespaces.
      
      To achieve that we do several things:
      
      1. add a namespace (non-refcounted) pointer to the struct shmid_kernel
      
      2. during new shm object creation (newseg()/shmget syscall) we
         initialize this pointer with the current task's IPC ns

      3. exit_shm() is fully reworked so that it traverses all shp's in
         task->sysvshm.shm_clist and gets the IPC namespace not from the
         current task, as it did before, but from the shp object itself,
         then calls shm_destroy(shp, ns).
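
      The three steps above can be sketched in plain C.  This is only an
      illustration under simplifying assumptions: the struct and
      function names are hypothetical stand-ins, and the locking and
      namespace lifetime handling that the real change has to get right
      are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an IPC namespace. */
struct ipc_ns {
    int destroyed;          /* how many objects were destroyed in this ns */
};

/* Hypothetical stand-in for a shm object on a task's list. */
struct shm_obj {
    struct ipc_ns *ns;      /* step 1: namespace pointer, not refcounted */
    struct shm_obj *next;   /* link in the per-task list */
};

struct task {
    struct ipc_ns *cur_ns;  /* namespace the task is currently in */
    struct shm_obj *clist;  /* objects created by this task */
};

/* Step 2: remember the creating task's namespace in the object. */
static void shm_create(struct task *t, struct shm_obj *obj)
{
    assert(t->cur_ns != NULL);
    obj->ns = t->cur_ns;
    obj->next = t->clist;
    t->clist = obj;
}

/* Step 3: on exit, take the namespace from each object, not from the task. */
static void task_exit_shm(struct task *t)
{
    while (t->clist) {
        struct shm_obj *obj = t->clist;

        t->clist = obj->next;
        obj->next = NULL;
        obj->ns->destroyed++;   /* destroy is accounted to the owning ns */
    }
}
```

      If a task creates one object, switches namespace, and creates
      another, the exit path in this sketch destroys each object in the
      namespace it was created in rather than in the task's last one.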
      
      Note: We need to be really careful here because, as noted in (1),
      our pointer to the IPC ns is not refcounted.  T...
    • ipc: WARN if trying to remove ipc object which is absent · 126e8bee
      Alexander Mikhalitsyn authored
      Patch series "shm: shm_rmid_forced feature fixes".
      
      Some time ago I hit a kernel crash after a CRIU restore procedure.
      Fortunately, because it was a CRIU restore, I had dump files and
      could repeat the restore many times, and the crash reproduced easily.
      After some investigation I constructed a minimal reproducer.  It
      turned out to be a use-after-free that happens only if sysctl
      kernel.shm_rmid_forced = 1.

      The key to the problem is that the exit_shm() function does not
      handle shp object destruction when task->sysvshm.shm_clist contains
      items from different IPC namespaces.  In most cases this list will
      contain only items from one IPC namespace.

      How can this list contain objects from different namespaces? The
      exit_shm() function is designed to always clean up this list when a
      process leaves an IPC namespace.  But we made a mistake a long time
      ago and did not add an exit_shm() call to the setns() syscall
      procedures.
      
      The first idea was just to add this call to the setns() syscall, but
      that obviously changes the semantics of setns() and is a
      userspace-visible change.  So, I gave up on this idea.

      The first real attempt to address the issue was just to omit the
      forced destroy if we meet an shp object not from the current task's
      IPC namespace [1].  But that was not the best idea because
      task->sysvshm.shm_clist was protected by an rwsem which belongs to
      the current task's IPC namespace.  It means that list corruption may
      occur.

      The second approach is to extend exit_shm() to properly handle shp's
      from different IPC namespaces [2].  This is really non-trivial; I
      put a lot of effort into it but did not believe it was possible to
      make it fully safe, clean and clear.
      
      Thanks to the efforts of Manfred Spraul, a working, elegant solution
      was designed.  Thanks a lot, Manfred!

      Eric also suggested a way to address the issue in ("[RFC][PATCH] shm:
      In shm_exit destroy all created and never attached segments").  Eric's
      idea was to maintain a list of shm_clists, one per IPC namespace, and
      to use lock-less lists.  But there are some extra memory
      consumption-related concerns.

      An alternative solution, which I suggested, was implemented in
      ("shm: reset shm_clist on setns but omit forced shm destroy").  The idea
      is pretty simple: we add an exit_shm() call to setns() but DO NOT
      destroy shm segments even if sysctl kernel.shm_rmid_forced = 1; we just
      clean up the task->sysvshm.shm_clist list.

      This changes the semantics of the setns() syscall a little bit, but in
      comparison to the "naive" solution, where we just add exit_shm() without
      any special exclusions, this looks like a safer option.
      
      [1] https://lkml.org/lkml/2021/7/6/1108
      [2] https://lkml.org/lkml/2021/7/14/736
      
      This patch (of 2):
      
      Let's produce a warning if we are trying to remove a non-existent IPC
      object from the IPC namespace kht/idr structures.
      
      This allows us to catch possible bugs when the ipc_rmid() function was
      called with inconsistent struct ipc_ids*, struct kern_ipc_perm*
      arguments.
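
      The kind of consistency check described here can be sketched
      outside the kernel as follows.  The table, the object type and the
      warning counter are hypothetical; the counter merely stands in for
      the point where the kernel would raise a warning.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 4

/* Hypothetical object that remembers its own id. */
struct obj {
    int id;
};

/* Hypothetical id-to-object table with a counter standing in for warnings. */
struct id_table {
    struct obj *slot[TABLE_SIZE];
    unsigned warnings;
};

/* Register an object under its id; fails if the id is invalid or taken. */
static int table_insert(struct id_table *t, struct obj *o)
{
    if (o->id < 0 || o->id >= TABLE_SIZE || t->slot[o->id] != NULL)
        return -1;
    t->slot[o->id] = o;
    return 0;
}

/*
 * Remove an object, but only if the table really holds this object under
 * this id.  A mismatch means the caller passed inconsistent arguments, so
 * it is counted as a warning and nothing is removed.
 */
static int table_remove(struct id_table *t, struct obj *o)
{
    assert(t != NULL && o != NULL);
    if (o->id < 0 || o->id >= TABLE_SIZE || t->slot[o->id] != o) {
        t->warnings++;
        return -1;
    }
    t->slot[o->id] = NULL;
    return 0;
}
```

      Removing an object that was never inserted, or removing the same
      object twice, bumps the warning counter in this sketch instead of
      silently succeeding.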
      
      Link: https://lkml.kernel.org/r/20211027224348.611025-1-alexander.mikhalitsyn@virtuozzo.com
      Link: https://lkml.kernel.org/r/20211027224348.611025-2-alexander.mikhalitsyn@virtuozzo.com
      
      
      Co-developed-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Andrei Vagin <avagin@gmail.com>
      Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Cc: Vasily Averin <vvs@virtuozzo.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/swap.c:put_pages_list(): reinitialise the page list · 3cd018b4
      Matthew Wilcox authored
      While free_unref_page_list() puts pages onto the CPU local LRU list, it
      does not remove them from the list they were passed in on.  That makes
      the list_head appear to be non-empty, and would lead to various
      corruption problems if we didn't have an assertion that the list was
      empty.
      
      Reinitialise the list after calling free_unref_page_list() to avoid this
      problem.
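
      The stale-head problem can be reproduced with a minimal circular
      doubly linked list.  This is a userspace sketch, not the kernel's
      list implementation; consume_all() is a hypothetical stand-in for
      a callee that takes every entry but never touches the head it was
      handed.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static int list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Insert a node right after the head. */
static void list_add(struct list_head *node, struct list_head *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

/*
 * Take ownership of every node on the list without updating the head,
 * so the head keeps pointing at nodes it no longer owns.
 */
static size_t consume_all(struct list_head *head)
{
    size_t n = 0;
    struct list_head *pos = head->next;

    assert(head != NULL);
    while (pos != head) {
        struct list_head *next = pos->next;

        pos->next = pos;    /* node now belongs to the consumer */
        pos->prev = pos;
        pos = next;
        n++;
    }
    return n;
}

/* The fix in miniature: reinitialise the head once the entries are gone. */
static size_t put_all(struct list_head *head)
{
    size_t n = consume_all(head);

    init_list_head(head);
    return n;
}
```

      After consume_all() alone the head still looks non-empty, whereas
      after put_all() list_empty() on the head is true again.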
      
      Link: https://lkml.kernel.org/r/YYp40A2lNrxaZji8@casper.infradead.org
      Fixes: 988c69f1 ("mm: optimise put_pages_list()")
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Steve French <stfrench@microsoft.com>
      Reported-by: Namjae Jeon <linkinjeon@kernel.org>
      Tested-by: Steve French <stfrench@microsoft.com>
      Tested-by: Namjae Jeon <linkinjeon@kernel.org>
      Cc: Steve French <smfrench@gmail.com>
      Cc: Hyeoncheol Lee <hyc.lee@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. Nov 20, 2021
    • Merge tag 'libata-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · a90af8f1
      Linus Torvalds authored
      Pull libata fixes from Damien Le Moal:
      
       - Prevent accesses to unsupported log pages as that causes device scan
         failures with LLDDs using libsas (from me).
      
       - A couple of fixes for AMD AHCI adapters handling of low power modes
         and resume (from Mario).
      
       - Fix a compilation warning (from me).
      
      * tag 'libata-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
        ata: libata-sata: Declare ata_ncq_sdev_attrs static
        ata: libahci: Adjust behavior when StorageD3Enable _DSD is set
        ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile
        ata: libata: add missing ata_identify_page_supported() calls
        ata: libata: improve ata_read_log_page() error message
    • Merge tag 'trace-v5.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · e4365e36
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix double free in destroy_hist_field
      
       - Harden memset() of trace_iterator structure
      
       - Do not warn in trace printk check when test buffer fills up
      
      * tag 'trace-v5.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Don't use out-of-sync va_list in event printing
        tracing: Use memset_startat() to zero struct trace_iterator
        tracing/histogram: Fix UAF in destroy_hist_field()
    • Merge tag 'perf-tools-fixes-for-v5.16-2021-11-19' of... · 8b98436a
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v5.16-2021-11-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Fix the 'local_weight', 'weight' (memory access latency),
         'local_ins_lat', 'ins_lat' (instruction latency) and 'pstage_cyc'
         (pipeline stage cycles) sort key sample aggregation.
      
       - Fix 'perf test' entry for watchpoints on s/390.
      
       - Fix branch_stack entry endianness check in the 'perf test' sample
         parsing test.
      
       - Fix ARM SPE handling on 'perf inject'.
      
       - Fix memory leaks detected with ASan.
      
       - Fix build on arm64 related to reallocarray() availability.
      
       - Sync copies of kernel headers: cpufeatures, kvm, MIPS syscalltable
         (futex_waitv).
      
      * tag 'perf-tools-fixes-for-v5.16-2021-11-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        perf evsel: Fix memory leaks relating to unit
        perf report: Fix memory leaks around perf_tip()
        perf hist: Fix memory leak of a perf_hpp_fmt
        tools headers UAPI: Sync MIPS syscall table file changed by new futex_waitv syscall
        tools build: Fix removal of feature-sync-compare-and-swap feature detection
        perf inject: Fix ARM SPE handling
        perf bench: Fix two memory leaks detected with ASan
        perf test sample-parsing: Fix branch_stack entry endianness check
        tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
        perf sort: Fix the 'p_stage_cyc' sort key behavior
        perf sort: Fix the 'ins_lat' sort key behavior
        perf sort: Fix the 'weight' sort key behavior
        perf tools: Set COMPAT_NEED_REALLOCARRAY for CONFIG_AUXTRACE=1
        perf tests wp: Remove unused functions on s390
        tools headers UAPI: Sync linux/kvm.h with the kernel sources
        tools headers cpufeatures: Sync with the kernel sources
    • Merge tag 'riscv-for-linus-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 9539ba43
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
       "I have two patches for 5.16:
      
         - allow external modules to be built against read-only source trees
      
         - turn KVM on in the defconfigs
      
        The second one isn't technically a fix, but it got tied up pending
        some defconfig cleanups that ended up finding some larger issues. I
        figured it'd be better to get the config changes some more testing,
        but didn't want to hold up turning KVM on for that"
      
      * tag 'riscv-for-linus-5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: fix building external modules
        RISC-V: Enable KVM in RV64 and RV32 defconfigs as a module
    • Merge branch 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of... · 7af959b5
      Linus Torvalds authored
      Merge branch 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
      
      Pull exit-vs-signal handling fixes from Eric Biederman:
       "This is a small set of changes where debuggers were no longer able to
        intercept synchronous SIGTRAP and SIGSEGV, introduced by the exit
        cleanups.
      
        This is essentially the change you suggested with all of the i's
        dotted and the t's crossed so that ptrace can intercept all of the
        cases it has been able to intercept in the past, and all of the
        cases that made it to exit without giving ptrace a chance still
        don't give ptrace a chance"
      
      * 'SA_IMMUTABLE-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        signal: Replace force_fatal_sig with force_exit_sig when in doubt
        signal: Don't always set SA_IMMUTABLE for forced signals
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · ecd510d2
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Six fixes, five in drivers (ufs, qla2xxx, iscsi) and one core change
        to fix a regression in user space device state setting, which is used
        by the iscsi daemons to effect device recovery"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: qla2xxx: Fix mailbox direction flags in qla2xxx_get_adapter_id()
        scsi: ufs: core: Fix another task management completion race
        scsi: ufs: core: Fix task management completion timeout race
        scsi: core: sysfs: Fix hang when device state is set via sysfs
        scsi: iscsi: Unblock session then wake up error handler
        scsi: ufs: core: Improve SCSI abort handling
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · a8b5f8f2
      Linus Torvalds authored
      Pull rdma fixes from Jason Gunthorpe:
       "There are a few big regression items from the merge window suggesting
        that people are testing rc1's but not testing the for-next branches:
      
         - Warnings fixes
      
         - Crash in hfi1 when creating QPs and setting counters
      
         - Some old mlx4 cards fail to probe due to missing counters
      
         - Syzkaller crash in the new counters code"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        MAINTAINERS: Update for VMware PVRDMA driver
        RDMA/nldev: Check stat attribute before accessing it
        RDMA/mlx4: Do not fail the registration on port stats
        IB/hfi1: Properly allocate rdma counter desc memory
        RDMA/core: Set send and receive CQ before forwarding to the driver
        RDMA/netlink: Add __maybe_unused to static inline in C file
    • Merge tag 'gpio-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 44791698
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
      
       - fix a coccicheck warning in gpio-virtio
      
       - fix gpio selftests build issues
      
       - fix a Kconfig issue in gpio-rockchip
      
      * tag 'gpio-fixes-for-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio: rockchip: needs GENERIC_IRQ_CHIP to fix build errors
        selftests: gpio: restore CFLAGS options
        selftests: gpio: fix uninitialised variable warning
        selftests: gpio: fix gpio compiling error
        gpio: virtio: remove unneeded semicolon
    • Merge tag 'drm-fixes-2021-11-19' of git://anongit.freedesktop.org/drm/drm · ad44518a
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "This week's fixes, pretty quiet, about right for rc2. amdgpu is the
        bulk of them but the scheduler ones have been reported in a few places
        I think.
      
        Otherwise just some minor i915 fixes and a few other scattered around:
      
        scheduler:
         - two refcounting fixes
      
        cma-helper:
         - use correct free path for noncoherent
      
        efifb:
         - probing fix
      
        amdgpu:
         - Better debugging info for SMU msgs
         - Better error reporting when adding IP blocks
         - Fix UVD powergating regression on CZ
         - Clock reporting fix for navi1x
         - OLED panel backlight fix
         - Fix scaling on VGA/DVI for non-DC display code
         - Fix GLFCLK handling for RGP on some APUs
         - fix potential memory leak
      
        amdkfd:
         - GPU reset fix
      
        i915:
         - return error handling fix
         - ADL-P display fix
         - TGL DSI display clocks fix
      
        nouveau:
         - infoframe corruption fix
      
        sun4i:
         - Kconfig fix"
      
      * tag 'drm-fixes-2021-11-19' of git://anongit.freedesktop.org/drm/drm:
        drm/amd/amdgpu: fix potential memleak
        drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
        drm/amd/pm: add GFXCLK/SCLK clocks level print support for APUs
        drm/amdgpu: fix set scaling mode Full/Full aspect/Center not works on vga and dvi connectors
        drm/amd/display: Fix OLED brightness control on eDP
        drm/amd/pm: Remove artificial freq level on Navi1x
        drm/amd/pm: avoid duplicate powergate/ungate setting
        drm/amdgpu: add error print when failing to add IP block(v2)
        drm/amd/pm: Enhanced reporting also for a stuck command
        drm/i915/guc: fix NULL vs IS_ERR() checking
        drm/i915/dsi/xelpd: Fix the bit mask for wakeup GB
        Revert "drm/i915/tgl/dsi: Gate the ddi clocks after pll mapping"
        fbdev: Prevent probing generic drivers if a FB is already registered
        drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder
        drm/scheduler: fix drm_sched_job_add_implicit_dependencies
        drm/sun4i: fix unmet dependency on RESET_CONTROLLER for PHY_SUN6I_MIPI_DPHY
        drm/cma-helper: Release non-coherent memory with dma_free_noncoherent()
        drm/nouveau: hdmigv100.c: fix corrupted HDMI Vendor InfoFrame
    • x86: Pin task-stack in __get_wchan() · 0dc636b3
      Peter Zijlstra authored
      When commit 5d1ceb39 ("x86: Fix __get_wchan() for !STACKTRACE")
      moved from stacktrace to native unwind_*() usage, the
      try_get_task_stack() got lost, leading to use-after-free issues for
      dying tasks.
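
      The pinning pattern referred to here can be sketched with a plain
      reference count.  This is an illustration only: the types and
      function names are hypothetical, the counter is not atomic, and
      the real unwinder logic is reduced to reading a single field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task stack with a reference count. */
struct task_stack {
    int refs;       /* 0 means the stack is already gone or going away */
    int data;       /* stands in for whatever the unwinder would read */
};

/* Take a reference only if the stack is still alive. */
static int try_get_stack(struct task_stack *s)
{
    if (s->refs == 0)
        return 0;
    s->refs++;
    return 1;
}

static void put_stack(struct task_stack *s)
{
    assert(s->refs > 0);
    s->refs--;
}

/*
 * Read from the stack only while holding a reference.  Returns 1 and
 * fills *out on success, or 0 without touching the stack if it could
 * not be pinned.
 */
static int read_wchan(struct task_stack *s, int *out)
{
    if (!try_get_stack(s))
        return 0;
    *out = s->data;
    put_stack(s);
    return 1;
}
```

      A reader that skips the try_get_stack() step would touch the stack
      of a dying task after it has been released, which is the class of
      bug the commit message describes.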
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Fixes: 5d1ceb39 ("x86: Fix __get_wchan() for !STACKTRACE")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215031
      Link: https://lore.kernel.org/stable/YZV02RCRVHIa144u@fedora64.linuxtx.org/
      Reported-by: Justin Forbes <jmforbes@linuxtx.org>
      Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. Nov 19, 2021