Skip to content
  1. Oct 03, 2020
  2. Oct 02, 2020
    • Linus Torvalds's avatar
      pipe: remove pipe_wait() and fix wakeup race with splice · 472e5b05
      Linus Torvalds authored
      The pipe splice code still used the old model of waiting for pipe IO by
      using a non-specific "pipe_wait()" that waited for any pipe event to
      happen, which depended on all pipe IO being entirely serialized by the
      pipe lock.  So by checking the state you were waiting for, and then
      adding yourself to the wait queue before dropping the lock, you were
      guaranteed to see all the wakeups.
      
      Strictly speaking, the actual wakeups were not done under the lock, but
      the pipe_wait() model still worked, because since the waiter held the
      lock when checking whether it should sleep, it would always see the
      current state, and the wakeup was always done after updating the state.
      
      However, commit 0ddad21d ("pipe: use exclusive waits when reading or
      writing") split the single wait-queue into two, and in the process also
      made the "wait for event" code wait for _two_ wait queues, and that then
      showed a race with the wakers that were not serialized by the pipe lock.
      
      It's only splice that used that "pipe_wait()" model, so the problem
      wasn't obvious, but Josef Bacik reports:
      
       "I hit a hang with fstest btrfs/187, which does a btrfs send into
        /dev/null. This works by creating a pipe, the write side is given to
        the kernel to write into, and the read side is handed to a thread that
        splices into a file, in this case /dev/null.
      
        The box that was hung had the write side stuck here [pipe_write] and
        the read side stuck here [splice_from_pipe_next -> pipe_wait].
      
        [ more details about pipe_wait() scenario ]
      
        The problem is we're doing the prepare_to_wait, which sets our state
        each time, however we can be woken up either with reads or writes. In
        the case above we race with the WRITER waking us up, and re-set our
        state to INTERRUPTIBLE, and thus never break out of schedule"
      
      Josef had a patch that avoided the issue in pipe_wait() by just making
      it set the state only once, but the deeper problem is that pipe_wait()
      depends on a level of synchonization by the pipe mutex that it really
      shouldn't.  And the whole "wait for any pipe state change" model really
      isn't very good to begin with.
      
      So rather than trying to work around things in pipe_wait(), remove that
      legacy model of "wait for arbitrary pipe event" entirely, and actually
      create functions that wait for the pipe actually being readable or
      writable, and can do so without depending on the pipe lock serializing
      everything.
      
      Fixes: 0ddad21d ("pipe: use exclusive waits when reading or writing")
      Link: https://lore.kernel.org/linux-fsdevel/bfa88b5ad6f069b2b679316b9e495a970130416c.1601567868.git.josef@toxicpanda.com/
      
      
      Reported-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Reviewed-and-tested-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      472e5b05
    • Linus Torvalds's avatar
      Merge tag 'iommu-fixes-v5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 44b6e23b
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
      
       - Fix a device reference counting bug in the Exynos IOMMU driver.
      
       - Lockdep fix for the Intel VT-d driver.
      
       - Fix a bug in the AMD IOMMU driver which caused corruption of the IVRS
         ACPI table and caused IOMMU driver initialization failures in kdump
         kernels.
      
      * tag 'iommu-fixes-v5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/vt-d: Fix lockdep splat in iommu_flush_dev_iotlb()
        iommu/amd: Fix the overwritten field in IVMD header
        iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate()
      44b6e23b
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · eed2ef44
      Linus Torvalds authored
      Pull arm64 fix from Catalin Marinas:
       "A previous commit to prevent AML memory opregions from accessing the
        kernel memory turned out to be too restrictive. Relax the permission
        check to permit the ACPI core to map kernel memory used for table
        overrides"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: permit ACPI core to map kernel memory used for table overrides
      eed2ef44
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-2020-10-01-1' of git://anongit.freedesktop.org/drm/drm · fcadab74
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "AMD and vmwgfx fixes.
      
        Just dequeuing these a bit early as the AMD ones are bit larger than
        I'd prefer, but Alex missed last week so it's a double set of fixes.
        The larger ones are just register header fixes for the new chips that
        were just introduced in rc1 along with some new PCI IDs for new hw.
        Otherwise it is usual fixes.
      
        The vmwgfx fix was due to some testing I was doing and found we
        weren't booting properly, vmware had the fix internally so hurried it
      
        vmwgfx:
         - fix a regression due to TTM refactor
      
        amdgpu:
         - Fix potential double free in userptr handling
         - Sienna Cichlid and Navy Flounder udpates
         - Add Sienna Cichlid PCI IDs
         - Drop experimental flag for navi12
         - Raven fixes
         - Renoir fixes
         - HDCP fix
         - DCN3 fix for clang and older versions of gcc
         - Fix a runtime pm refcount issue"
      
      * tag 'drm-fixes-2020-10-01-1' of git://anongit.freedesktop.org/drm/drm:
        drm/amdgpu: disable gfxoff temporarily for navy_flounder
        drm/amd/pm: setup APU dpm clock table in SMU HW initialization
        drm/vmwgfx: Fix error handling in get_node
        drm/amd/display: remove duplicate call to rn_vbios_smu_get_smu_version()
        drm/amdgpu/swsmu/smu12: fix force clock handling for mclk
        drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config
        drm/amdgpu/display: fix CFLAGS setup for DCN30
        drm/amd/display: fix return value check for hdcp_work
        drm/amdgpu: remove gpu_info fw support for sienna_cichlid etc.
        drm/amd/pm: Removed fixed clock in auto mode DPM
        drm/amdgpu: remove experimental flag from navi12
        drm/amdgpu: add device ID for sienna_cichlid (v2)
        drm/amdgpu: use the AV1 defines for VCN 3.0
        drm/amdgpu: add VCN 3.0 AV1 registers
        drm/amdgpu: add the GC 10.3 VRS registers
        drm/amdgpu: prevent double kfree ttm->sg
      fcadab74
    • Linus Torvalds's avatar
      Merge tag 'trace-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · aa5ff935
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Two tracing fixes:
      
         - Fix temp buffer accounting that caused a WARNING for
           ftrace_dump_on_opps()
      
         - Move the recursion check in one of the function callback helpers to
           the beginning of the function, as if the rcu_is_watching() gets
           traced, it will cause a recursive loop that will crash the kernel"
      
      * tag 'trace-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        ftrace: Move RCU is watching check after recursion check
        tracing: Fix trace_find_next_entry() accounting of temp buffer size
      aa5ff935
  3. Oct 01, 2020
  4. Sep 30, 2020
  5. Sep 29, 2020
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-5.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · fb0155a0
      Linus Torvalds authored
      Pull NFS client bugfixes from Trond Myklebust:
       "Highlights include:
      
         - NFSv4.2: copy_file_range needs to invalidate caches on success
      
         - NFSv4.2: Fix security label length not being reset
      
         - pNFS/flexfiles: Ensure we initialise the mirror bsizes correctly
           on read
      
         - pNFS/flexfiles: Fix signed/unsigned type issues with mirror
           indices"
      
      * tag 'nfs-for-5.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        pNFS/flexfiles: Be consistent about mirror index types
        pNFS/flexfiles: Ensure we initialise the mirror bsizes correctly on read
        NFSv4.2: fix client's attribute cache management for copy_file_range
        nfs: Fix security label length not being reset
      fb0155a0
    • Jason A. Donenfeld's avatar
      mm: do not rely on mm == current->mm in __get_user_pages_locked · a4d63c37
      Jason A. Donenfeld authored
      
      
      It seems likely this block was pasted from internal_get_user_pages_fast,
      which is not passed an mm struct and therefore uses current's.  But
      __get_user_pages_locked is passed an explicit mm, and current->mm is not
      always valid. This was hit when being called from i915, which uses:
      
        pin_user_pages_remote->
          __get_user_pages_remote->
            __gup_longterm_locked->
              __get_user_pages_locked
      
      Before, this would lead to an OOPS:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000064
        #PF: supervisor write access in kernel mode
        #PF: error_code(0x0002) - not-present page
        CPU: 10 PID: 1431 Comm: kworker/u33:1 Tainted: P S   U     O      5.9.0-rc7+ #140
        Hardware name: LENOVO 20QTCTO1WW/20QTCTO1WW, BIOS N2OET47W (1.34 ) 08/06/2020
        Workqueue: i915-userptr-acquire __i915_gem_userptr_get_pages_worker [i915]
        RIP: 0010:__get_user_pages_remote+0xd7/0x310
        Call Trace:
         __i915_gem_userptr_get_pages_worker+0xc8/0x260 [i915]
         process_one_work+0x1ca/0x390
         worker_thread+0x48/0x3c0
         kthread+0x114/0x130
         ret_from_fork+0x1f/0x30
        CR2: 0000000000000064
      
      This commit fixes the problem by using the mm pointer passed to the
      function rather than the bogus one in current.
      
      Fixes: 008cfe44 ("mm: Introduce mm_struct.has_pinned")
      Tested-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reported-by: default avatarHarald Arnesen <harald@skogtun.org>
      Reviewed-by: default avatarJason Gunthorpe <jgg@nvidia.com>
      Reviewed-by: default avatarPeter Xu <peterx@redhat.com>
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a4d63c37
  6. Sep 28, 2020
    • Maxime Ripard's avatar
      ARM: dts: bcm2835: Change firmware compatible from simple-bus to simple-mfd · 6a754830
      Maxime Ripard authored
      
      
      The current binding for the RPi firmware uses the simple-bus compatible as
      a fallback to benefit from its automatic probing of child nodes.
      
      However, simple-bus also comes with some constraints, like having the ranges,
      our case.
      
      Let's switch to simple-mfd that provides the same probing logic without
      those constraints.
      
      Signed-off-by: default avatarMaxime Ripard <maxime@cerno.tech>
      Link: https://lore.kernel.org/r/20200924082642.18144-1-maxime@cerno.tech
      
      
      Signed-off-by: default avatarRob Herring <robh@kernel.org>
      6a754830
    • Linus Torvalds's avatar
      Linux 5.9-rc7 · a1b8638b
      Linus Torvalds authored
      a1b8638b
    • Linus Torvalds's avatar
      Merge tag 'kbuild-fixes-v5.9-4' of... · 16bc1d54
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - ignore compiler stubs for PPC to fix builds
      
       - fix the usage of --target mentioned in the LLVM document
      
      * tag 'kbuild-fixes-v5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        Documentation/llvm: Fix clang target examples
        scripts/kallsyms: skip ppc compiler stub *.long_branch.* / *.plt_branch.*
      16bc1d54
    • Linus Torvalds's avatar
      Merge tag 'x86-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f8818559
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "Two fixes for the x86 interrupt code:
      
         - Unbreak the magic 'search the timer interrupt' logic in IO/APIC
           code which got wreckaged when the core interrupt code made the
           state tracking logic stricter.
      
           That caused the interrupt line to stay masked after switching from
           IO/APIC to PIC delivery mode, which obviously prevents interrupts
           from being delivered.
      
         - Make run_on_irqstack_code() typesafe. The function argument is a
           void pointer which is then cast to 'void (*fun)(void *).
      
           This breaks Control Flow Integrity checking in clang. Use proper
           helper functions for the three variants reuqired"
      
      * tag 'x86-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/ioapic: Unbreak check_timer()
        x86/irq: Make run_on_irqstack_cond() typesafe
      f8818559
    • Linus Torvalds's avatar
      Merge tag 'timers-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ba25f057
      Linus Torvalds authored
      Pull timer updates from Thomas Gleixner:
       "A set of clocksource/clockevents updates:
      
         - Reset the TI/DM timer before enabling it instead of doing it the
           other way round.
      
         - Initialize the reload value for the GX6605s timer correctly so the
           hardware counter starts at 0 again after overrun.
      
         - Make error return value negative in the h8300 timer init function"
      
      * tag 'timers-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource/drivers/timer-gx6605s: Fixup counter reload
        clocksource/drivers/timer-ti-dm: Do reset before enable
        clocksource/drivers/h8300_timer8: Fix wrong return value in h8300_8timer_init()
      ba25f057
    • Peter Xu's avatar
      mm/thp: Split huge pmds/puds if they're pinned when fork() · d042035e
      Peter Xu authored
      
      
      Pinned pages shouldn't be write-protected when fork() happens, because
      follow up copy-on-write on these pages could cause the pinned pages to
      be replaced by random newly allocated pages.
      
      For huge PMDs, we split the huge pmd if pinning is detected.  So that
      future handling will be done by the PTE level (with our latest changes,
      each of the small pages will be copied).  We can achieve this by let
      copy_huge_pmd() return -EAGAIN for pinned pages, so that we'll
      fallthrough in copy_pmd_range() and finally land the next
      copy_pte_range() call.
      
      Huge PUDs will be even more special - so far it does not support
      anonymous pages.  But it can actually be done the same as the huge PMDs
      even if the split huge PUDs means to erase the PUD entries.  It'll
      guarantee the follow up fault ins will remap the same pages in either
      parent/child later.
      
      This might not be the most efficient way, but it should be easy and
      clean enough.  It should be fine, since we're tackling with a very rare
      case just to make sure userspaces that pinned some thps will still work
      even without MADV_DONTFORK and after they fork()ed.
      
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d042035e
    • Peter Xu's avatar
      mm: Do early cow for pinned pages during fork() for ptes · 70e806e4
      Peter Xu authored
      This allows copy_pte_range() to do early cow if the pages were pinned on
      the source mm.
      
      Currently we don't have an accurate way to know whether a page is pinned
      or not.  The only thing we have is page_maybe_dma_pinned().  However
      that's good enough for now.  Especially, with the newly added
      mm->has_pinned flag to make sure we won't affect processes that never
      pinned any pages.
      
      It would be easier if we can do GFP_KERNEL allocation within
      copy_one_pte().  Unluckily, we can't because we're with the page table
      locks held for both the parent and child processes.  So the page
      allocation needs to be done outside copy_one_pte().
      
      Some trick is there in copy_present_pte(), majorly the wrprotect trick
      to block concurrent fast-gup.  Comments in the function should explain
      better in place.
      
      Oleg Nesterov reported a (probably harmless) bug during review that we
      didn't reset entry.val properly in copy_pte_range() so that potentially
      there's chance to call add_swap_count_continuation() multiple times on
      the same swp entry.  However that should be harmless since even if it
      happens, the same function (add_swap_count_continuation()) will return
      directly noticing that there're enough space for the swp counter.  So
      instead of a standalone stable patch, it is touched up in this patch
      directly.
      
      Link: https://lore.kernel.org/lkml/20200914143829.GA1424636@nvidia.com/
      
      
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      70e806e4
    • Peter Xu's avatar
      mm/fork: Pass new vma pointer into copy_page_range() · 7a4830c3
      Peter Xu authored
      
      
      This prepares for the future work to trigger early cow on pinned pages
      during fork().
      
      No functional change intended.
      
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7a4830c3