Skip to content
  1. Oct 04, 2020
  2. Oct 03, 2020
  3. Oct 02, 2020
    • Linus Torvalds's avatar
      pipe: remove pipe_wait() and fix wakeup race with splice · 472e5b05
      Linus Torvalds authored
      The pipe splice code still used the old model of waiting for pipe IO by
      using a non-specific "pipe_wait()" that waited for any pipe event to
      happen, which depended on all pipe IO being entirely serialized by the
      pipe lock.  So by checking the state you were waiting for, and then
      adding yourself to the wait queue before dropping the lock, you were
      guaranteed to see all the wakeups.
      
      Strictly speaking, the actual wakeups were not done under the lock, but
      the pipe_wait() model still worked, because since the waiter held the
      lock when checking whether it should sleep, it would always see the
      current state, and the wakeup was always done after updating the state.
      
      However, commit 0ddad21d ("pipe: use exclusive waits when reading or
      writing") split the single wait-queue into two, and in the process also
      made the "wait for event" code wait for _two_ wait queues, and that then
      showed a race with the wakers that were not serialized by the pipe lock.
      
      It's only splice that used that "pipe_wait()" model, so the problem
      wasn't obvious, but Josef Bacik reports:
      
       "I hit a hang with fstest btrfs/187, which does a btrfs send into
        /dev/null. This works by creating a pipe, the write side is given to
        the kernel to write into, and the read side is handed to a thread that
        splices into a file, in this case /dev/null.
      
        The box that was hung had the write side stuck here [pipe_write] and
        the read side stuck here [splice_from_pipe_next -> pipe_wait].
      
        [ more details about pipe_wait() scenario ]
      
        The problem is we're doing the prepare_to_wait, which sets our state
        each time, however we can be woken up either with reads or writes. In
        the case above we race with the WRITER waking us up, and re-set our
        state to INTERRUPTIBLE, and thus never break out of schedule"
      
      Josef had a patch that avoided the issue in pipe_wait() by just making
      it set the state only once, but the deeper problem is that pipe_wait()
      depends on a level of synchonization by the pipe mutex that it really
      shouldn't.  And the whole "wait for any pipe state change" model really
      isn't very good to begin with.
      
      So rather than trying to work around things in pipe_wait(), remove that
      legacy model of "wait for arbitrary pipe event" entirely, and actually
      create functions that wait for the pipe actually being readable or
      writable, and can do so without depending on the pipe lock serializing
      everything.
      
      Fixes: 0ddad21d ("pipe: use exclusive waits when reading or writing")
      Link: https://lore.kernel.org/linux-fsdevel/bfa88b5ad6f069b2b679316b9e495a970130416c.1601567868.git.josef@toxicpanda.com/
      
      
      Reported-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Reviewed-and-tested-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      472e5b05
    • Linus Torvalds's avatar
      Merge tag 'iommu-fixes-v5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 44b6e23b
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
      
       - Fix a device reference counting bug in the Exynos IOMMU driver.
      
       - Lockdep fix for the Intel VT-d driver.
      
       - Fix a bug in the AMD IOMMU driver which caused corruption of the IVRS
         ACPI table and caused IOMMU driver initialization failures in kdump
         kernels.
      
      * tag 'iommu-fixes-v5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/vt-d: Fix lockdep splat in iommu_flush_dev_iotlb()
        iommu/amd: Fix the overwritten field in IVMD header
        iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate()
      44b6e23b
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · eed2ef44
      Linus Torvalds authored
      Pull arm64 fix from Catalin Marinas:
       "A previous commit to prevent AML memory opregions from accessing the
        kernel memory turned out to be too restrictive. Relax the permission
        check to permit the ACPI core to map kernel memory used for table
        overrides"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: permit ACPI core to map kernel memory used for table overrides
      eed2ef44
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-2020-10-01-1' of git://anongit.freedesktop.org/drm/drm · fcadab74
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "AMD and vmwgfx fixes.
      
        Just dequeuing these a bit early as the AMD ones are bit larger than
        I'd prefer, but Alex missed last week so it's a double set of fixes.
        The larger ones are just register header fixes for the new chips that
        were just introduced in rc1 along with some new PCI IDs for new hw.
        Otherwise it is usual fixes.
      
        The vmwgfx fix was due to some testing I was doing and found we
        weren't booting properly, vmware had the fix internally so hurried it
      
        vmwgfx:
         - fix a regression due to TTM refactor
      
        amdgpu:
         - Fix potential double free in userptr handling
         - Sienna Cichlid and Navy Flounder udpates
         - Add Sienna Cichlid PCI IDs
         - Drop experimental flag for navi12
         - Raven fixes
         - Renoir fixes
         - HDCP fix
         - DCN3 fix for clang and older versions of gcc
         - Fix a runtime pm refcount issue"
      
      * tag 'drm-fixes-2020-10-01-1' of git://anongit.freedesktop.org/drm/drm:
        drm/amdgpu: disable gfxoff temporarily for navy_flounder
        drm/amd/pm: setup APU dpm clock table in SMU HW initialization
        drm/vmwgfx: Fix error handling in get_node
        drm/amd/display: remove duplicate call to rn_vbios_smu_get_smu_version()
        drm/amdgpu/swsmu/smu12: fix force clock handling for mclk
        drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config
        drm/amdgpu/display: fix CFLAGS setup for DCN30
        drm/amd/display: fix return value check for hdcp_work
        drm/amdgpu: remove gpu_info fw support for sienna_cichlid etc.
        drm/amd/pm: Removed fixed clock in auto mode DPM
        drm/amdgpu: remove experimental flag from navi12
        drm/amdgpu: add device ID for sienna_cichlid (v2)
        drm/amdgpu: use the AV1 defines for VCN 3.0
        drm/amdgpu: add VCN 3.0 AV1 registers
        drm/amdgpu: add the GC 10.3 VRS registers
        drm/amdgpu: prevent double kfree ttm->sg
      fcadab74
    • Linus Torvalds's avatar
      Merge tag 'trace-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · aa5ff935
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Two tracing fixes:
      
         - Fix temp buffer accounting that caused a WARNING for
           ftrace_dump_on_opps()
      
         - Move the recursion check in one of the function callback helpers to
           the beginning of the function, as if the rcu_is_watching() gets
           traced, it will cause a recursive loop that will crash the kernel"
      
      * tag 'trace-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        ftrace: Move RCU is watching check after recursion check
        tracing: Fix trace_find_next_entry() accounting of temp buffer size
      aa5ff935
  4. Oct 01, 2020
    • Lu Baolu's avatar
      iommu/vt-d: Fix lockdep splat in iommu_flush_dev_iotlb() · 1a3f2fd7
      Lu Baolu authored
      
      
      Lock(&iommu->lock) without disabling irq causes lockdep warnings.
      
      [   12.703950] ========================================================
      [   12.703962] WARNING: possible irq lock inversion dependency detected
      [   12.703975] 5.9.0-rc6+ #659 Not tainted
      [   12.703983] --------------------------------------------------------
      [   12.703995] systemd-udevd/284 just changed the state of lock:
      [   12.704007] ffffffffbd6ff4d8 (device_domain_lock){..-.}-{2:2}, at:
                     iommu_flush_dev_iotlb.part.57+0x2e/0x90
      [   12.704031] but this lock took another, SOFTIRQ-unsafe lock in the past:
      [   12.704043]  (&iommu->lock){+.+.}-{2:2}
      [   12.704045]
      
                     and interrupts could create inverse lock ordering between
                     them.
      
      [   12.704073]
                     other info that might help us debug this:
      [   12.704085]  Possible interrupt unsafe locking scenario:
      
      [   12.704097]        CPU0                    CPU1
      [   12.704106]        ----                    ----
      [   12.704115]   lock(&iommu->lock);
      [   12.704123]                                local_irq_disable();
      [   12.704134]                                lock(device_domain_lock);
      [   12.704146]                                lock(&iommu->lock);
      [   12.704158]   <Interrupt>
      [   12.704164]     lock(device_domain_lock);
      [   12.704174]
                      *** DEADLOCK ***
      
      Signed-off-by: default avatarLu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20200927062428.13713-1-baolu.lu@linux.intel.com
      
      
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      1a3f2fd7
    • Adrian Huang's avatar
      iommu/amd: Fix the overwritten field in IVMD header · 0bbe4ced
      Adrian Huang authored
      Commit 387caf0b ("iommu/amd: Treat per-device exclusion
      ranges as r/w unity-mapped regions") accidentally overwrites
      the 'flags' field in IVMD (struct ivmd_header) when the I/O
      virtualization memory definition is associated with the
      exclusion range entry. This leads to the corrupted IVMD table
      (incorrect checksum). The kdump kernel reports the invalid checksum:
      
      ACPI BIOS Warning (bug): Incorrect checksum in table [IVRS] - 0x5C, should be 0x60 (20200717/tbprint-177)
      AMD-Vi: [Firmware Bug]: IVRS invalid checksum
      
      Fix the above-mentioned issue by modifying the 'struct unity_map_entry'
      member instead of the IVMD header.
      
      Cleanup: The *exclusion_range* functions are not used anymore, so
      get rid of them.
      
      Fixes: 387caf0b
      
       ("iommu/amd: Treat per-device exclusion ranges as r/w unity-mapped regions")
      Reported-and-tested-by: default avatarBaoquan He <bhe@redhat.com>
      Signed-off-by: default avatarAdrian Huang <ahuang12@lenovo.com>
      Cc: Jerry Snitselaar <jsnitsel@redhat.com>
      Link: https://lore.kernel.org/r/20200926102602.19177-1-adrianhuang0701@gmail.com
      
      
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      0bbe4ced
    • Andy Shevchenko's avatar
      gpio: pca953x: Correctly initialize registers 6 and 7 for PCA957x · 8c1f1c34
      Andy Shevchenko authored
      When driver has been converted to the bitmap API the non-bitmap functions
      started behaving differently on 32-bit BE architectures since the bytes in
      two consequent unsigned longs are in different order in comparison to byte
      array. Hence if the chip had had more than 32 lines the memset() call over
      it would have not set up upper lines correctly.
      Although it's currently a theoretical case (no supported chips of this type
      has 32+ lines), it's better to provide a clean code to avoid people thinking
      this is okay and potentially producing not fully working things.
      
      Fixes: 35d13d94
      
       ("gpio: pca953x: convert to use bitmap API")
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: default avatarBartosz Golaszewski <bgolaszewski@baylibre.com>
      Link: https://lore.kernel.org/r/20200930142013.59247-2-andriy.shevchenko@linux.intel.com
      
      
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      8c1f1c34
    • Andy Shevchenko's avatar
      gpio: pca953x: Use bitmap API over implicit GCC extension · e09e200e
      Andy Shevchenko authored
      
      
      In IRQ handler we have to clear bitmap before use. Currently
      the GCC extension has been used for that. For sake of the consistency
      switch to bitmap API. As expected bloat-o-meter shows no difference
      in the object size.
      
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: default avatarBartosz Golaszewski <bgolaszewski@baylibre.com>
      Link: https://lore.kernel.org/r/20200930142013.59247-1-andriy.shevchenko@linux.intel.com
      
      
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      e09e200e
    • Hanks Chen's avatar
      pinctrl: mediatek: check mtk_is_virt_gpio input parameter · 39c4dbe4
      Hanks Chen authored
      check mtk_is_virt_gpio input parameter,
      virtual gpio need to support eint mode.
      
      add error handler for the ko case
      to fix this boot fail:
      pc : mtk_is_virt_gpio+0x20/0x38 [pinctrl_mtk_common_v2]
      lr : mtk_gpio_get_direction+0x44/0xb0 [pinctrl_paris]
      
      Fixes: edd54646
      
       ("pinctrl: mediatek: avoid virtual gpio trying to set reg")
      Signed-off-by: default avatarHanks Chen <hanks.chen@mediatek.com>
      Acked-by: default avatarSean Wang <sean.wang@kernel.org>
      Singed-off-by: default avatarJie Yang <sin_jieyang@mediatek.com>
      Link: https://lore.kernel.org/r/1597922546-29633-1-git-send-email-hanks.chen@mediatek.com
      
      
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      39c4dbe4
    • Dmitry Baryshkov's avatar
      pinctrl: qcom: sm8250: correct sdc2_clk · 5d8ff95a
      Dmitry Baryshkov authored
      Correct sdc2_clk pin definition (register offset is wrong, verified by
      the msm-4.19 driver).
      
      Fixes: 4e3ec9e4
      
       ("pinctrl: qcom: Add sm8250 pinctrl driver.")
      Signed-off-by: default avatarDmitry Baryshkov <dmitry.baryshkov@linaro.org>
      Reviewed-by: default avatarBjorn Andersson <bjorn.andersson@linaro.org>
      Acked-by: default avatarManivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Link: https://lore.kernel.org/r/20200914091846.55204-1-dmitry.baryshkov@linaro.org
      
      
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      5d8ff95a
    • Dave Airlie's avatar
      Merge tag 'amd-drm-fixes-5.9-2020-09-30' of... · 132d7c8a
      Dave Airlie authored
      Merge tag 'amd-drm-fixes-5.9-2020-09-30' of git://people.freedesktop.org/~agd5f/linux
      
       into drm-fixes
      
      amd-drm-fixes-5.9-2020-09-30:
      
      amdgpu:
      - Fix potential double free in userptr handling
      - Sienna Cichlid and Navy Flounder udpates
      - Add Sienna Cichlid PCI IDs
      - Drop experimental flag for navi12
      - Raven fixes
      - Renoir fixes
      - HDCP fix
      - DCN3 fix for clang and older versions of gcc
      - Fix a runtime pm refcount issue
      
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      From: Alex Deucher <alexdeucher@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200930161326.4243-1-alexander.deucher@amd.com
      132d7c8a
    • Pali Rohár's avatar
    • Ard Biesheuvel's avatar
      arm64: permit ACPI core to map kernel memory used for table overrides · a509a66a
      Ard Biesheuvel authored
      Jonathan reports that the strict policy for memory mapped by the
      ACPI core breaks the use case of passing ACPI table overrides via
      initramfs. This is due to the fact that the memory type used for
      loading the initramfs in memory is not recognized as a memory type
      that is typically used by firmware to pass firmware tables.
      
      Since the purpose of the strict policy is to ensure that no AML or
      other ACPI code can manipulate any memory that is used by the kernel
      to keep its internal state or the state of user tasks, we can relax
      the permission check, and allow mappings of memory that is reserved
      and marked as NOMAP via memblock, and therefore not covered by the
      linear mapping to begin with.
      
      Fixes: 1583052d ("arm64/acpi: disallow AML memory opregions to access kernel memory")
      Fixes: 325f5585
      
       ("arm64/acpi: disallow writeable AML opregion mapping for EFI code regions")
      Reported-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Tested-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Link: https://lore.kernel.org/r/20200929132522.18067-1-ardb@kernel.org
      
      
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      a509a66a
    • Linus Torvalds's avatar
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 60e72093
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "Another batch of clk driver fixes:
      
         - Make sure DRAM and ChipID region doesn't get disabled on Exynos
      
         - Fix a SATA failure on Tegra
      
         - Fix the emac_ptp clk divider on stratix10"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: socfpga: stratix10: fix the divider for the emac_ptp_free_clk
        clk: samsung: exynos4: mark 'chipid' clock as CLK_IGNORE_UNUSED
        clk: tegra: Fix missing prototype for tegra210_clk_register_emc()
        clk: tegra: Always program PLL_E when enabled
        clk: tegra: Capitalization fixes
        clk: samsung: Keep top BPLL mux on Exynos542x enabled
      60e72093
    • Anup Patel's avatar
      RISC-V: Check clint_time_val before use · aa988760
      Anup Patel authored
      The NoMMU kernel is broken for QEMU virt machine from Linux-5.9-rc6
      because clint_time_val is used even before CLINT driver is probed
      at following places:
      1. rand_initialize() calls get_cycles() which in-turn uses
         clint_time_val
      2. boot_init_stack_canary() calls get_cycles() which in-turn
         uses clint_time_val
      
      The issue#1 (above) is fixed by providing custom random_get_entropy()
      for RISC-V NoMMU kernel. For issue#2 (above), we remove dependency of
      boot_init_stack_canary() on get_cycles() and this is aligned with the
      boot_init_stack_canary() implementations of ARM, ARM64 and MIPS kernel.
      
      Fixes: d5be89a8
      
       ("RISC-V: Resurrect the MMIO timer implementation for M-mode systems")
      Signed-off-by: default avatarAnup Patel <anup.patel@wdc.com>
      Tested-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: default avatarPalmer Dabbelt <palmerdabbelt@google.com>
      aa988760
    • Filipe Manana's avatar
      btrfs: fix filesystem corruption after a device replace · 4c8f3532
      Filipe Manana authored
      We use a device's allocation state tree to track ranges in a device used
      for allocated chunks, and we set ranges in this tree when allocating a new
      chunk. However after a device replace operation, we were not setting the
      allocated ranges in the new device's allocation state tree, so that tree
      is empty after a device replace.
      
      This means that a fitrim operation after a device replace will trim the
      device ranges that have allocated chunks and extents, as we trim every
      range for which there is not a range marked in the device's allocation
      state tree. It is also important during chunk allocation, since the
      device's allocation state is used to determine if a range is already
      allocated when allocating a new chunk.
      
      This is trivial to reproduce and the following script triggers the bug:
      
        $ cat reproducer.sh
        #!/bin/bash
      
        DEV1="/dev/sdg"
        DEV2="/dev/sdh"
        DEV3="/dev/sdi"
      
        wipefs -a $DEV1 $DEV2 $DEV3 &> /dev/null
      
        # Create a raid1 test fs on 2 devices.
        mkfs.btrfs -f -m raid1 -d raid1 $DEV1 $DEV2 > /dev/null
        mount $DEV1 /mnt/btrfs
      
        xfs_io -f -c "pwrite -S 0xab 0 10M" /mnt/btrfs/foo
      
        echo "Starting to replace $DEV1 with $DEV3"
        btrfs replace start -B $DEV1 $DEV3 /mnt/btrfs
        echo
      
        echo "Running fstrim"
        fstrim /mnt/btrfs
        echo
      
        echo "Unmounting filesystem"
        umount /mnt/btrfs
      
        echo "Mounting filesystem in degraded mode using $DEV3 only"
        wipefs -a $DEV1 $DEV2 &> /dev/null
        mount -o degraded $DEV3 /mnt/btrfs
        if [ $? -ne 0 ]; then
                dmesg | tail
                echo
                echo "Failed to mount in degraded mode"
                exit 1
        fi
      
        echo
        echo "File foo data (expected all bytes = 0xab):"
        od -A d -t x1 /mnt/btrfs/foo
      
        umount /mnt/btrfs
      
      When running the reproducer:
      
        $ ./replace-test.sh
        wrote 10485760/10485760 bytes at offset 0
        10 MiB, 2560 ops; 0.0901 sec (110.877 MiB/sec and 28384.5216 ops/sec)
        Starting to replace /dev/sdg with /dev/sdi
      
        Running fstrim
      
        Unmounting filesystem
        Mounting filesystem in degraded mode using /dev/sdi only
        mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/sdi, missing codepage or helper program, or other error.
        [19581.748641] BTRFS info (device sdg): dev_replace from /dev/sdg (devid 1) to /dev/sdi started
        [19581.803842] BTRFS info (device sdg): dev_replace from /dev/sdg (devid 1) to /dev/sdi finished
        [19582.208293] BTRFS info (device sdi): allowing degraded mounts
        [19582.208298] BTRFS info (device sdi): disk space caching is enabled
        [19582.208301] BTRFS info (device sdi): has skinny extents
        [19582.212853] BTRFS warning (device sdi): devid 2 uuid 1f731f47-e1bb-4f00-bfbb-9e5a0cb4ba9f is missing
        [19582.213904] btree_readpage_end_io_hook: 25839 callbacks suppressed
        [19582.213907] BTRFS error (device sdi): bad tree block start, want 30490624 have 0
        [19582.214780] BTRFS warning (device sdi): failed to read root (objectid=7): -5
        [19582.231576] BTRFS error (device sdi): open_ctree failed
      
        Failed to mount in degraded mode
      
      So fix by setting all allocated ranges in the replace target device when
      the replace operation is finishing, when we are holding the chunk mutex
      and we can not race with new chunk allocations.
      
      A test case for fstests follows soon.
      
      Fixes: 1c11b63e
      
       ("btrfs: replace pending/pinned chunks lists with io tree")
      CC: stable@vger.kernel.org # 5.2+
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      4c8f3532
    • Josef Bacik's avatar
      btrfs: move btrfs_rm_dev_replace_free_srcdev outside of all locks · a466c85e
      Josef Bacik authored
      When closing and freeing the source device we could end up doing our
      final blkdev_put() on the bdev, which will grab the bd_mutex.  As such
      we want to be holding as few locks as possible, so move this call
      outside of the dev_replace->lock_finishing_cancel_unmount lock.  Since
      we're modifying the fs_devices we need to make sure we're holding the
      uuid_mutex here, so take that as well.
      
      There's a report from syzbot probably hitting one of the cases where
      the bd_mutex and device_list_mutex are taken in the wrong order, however
      it's not with device replace, like this patch fixes. As there's no
      reproducer available so far, we can't verify the fix.
      
      https://lore.kernel.org/lkml/000000000000fc04d105afcf86d7@google.com/
      dashboard link: https://syzkaller.appspot.com/bug?extid=84a0634dc5d21d488419
      
      
      
        WARNING: possible circular locking dependency detected
        5.9.0-rc5-syzkaller #0 Not tainted
        ------------------------------------------------------
        syz-executor.0/6878 is trying to acquire lock:
        ffff88804c17d780 (&bdev->bd_mutex){+.+.}-{3:3}, at: blkdev_put+0x30/0x520 fs/block_dev.c:1804
      
        but task is already holding lock:
        ffff8880908cfce0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: close_fs_devices.part.0+0x2e/0x800 fs/btrfs/volumes.c:1159
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #4 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
      	 __mutex_lock_common kernel/locking/mutex.c:956 [inline]
      	 __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
      	 btrfs_finish_chunk_alloc+0x281/0xf90 fs/btrfs/volumes.c:5255
      	 btrfs_create_pending_block_groups+0x2f3/0x700 fs/btrfs/block-group.c:2109
      	 __btrfs_end_transaction+0xf5/0x690 fs/btrfs/transaction.c:916
      	 find_free_extent_update_loop fs/btrfs/extent-tree.c:3807 [inline]
      	 find_free_extent+0x23b7/0x2e60 fs/btrfs/extent-tree.c:4127
      	 btrfs_reserve_extent+0x166/0x460 fs/btrfs/extent-tree.c:4206
      	 cow_file_range+0x3de/0x9b0 fs/btrfs/inode.c:1063
      	 btrfs_run_delalloc_range+0x2cf/0x1410 fs/btrfs/inode.c:1838
      	 writepage_delalloc+0x150/0x460 fs/btrfs/extent_io.c:3439
      	 __extent_writepage+0x441/0xd00 fs/btrfs/extent_io.c:3653
      	 extent_write_cache_pages.constprop.0+0x69d/0x1040 fs/btrfs/extent_io.c:4249
      	 extent_writepages+0xcd/0x2b0 fs/btrfs/extent_io.c:4370
      	 do_writepages+0xec/0x290 mm/page-writeback.c:2352
      	 __writeback_single_inode+0x125/0x1400 fs/fs-writeback.c:1461
      	 writeback_sb_inodes+0x53d/0xf40 fs/fs-writeback.c:1721
      	 wb_writeback+0x2ad/0xd40 fs/fs-writeback.c:1894
      	 wb_do_writeback fs/fs-writeback.c:2039 [inline]
      	 wb_workfn+0x2dc/0x13e0 fs/fs-writeback.c:2080
      	 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
      	 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
      	 kthread+0x3b5/0x4a0 kernel/kthread.c:292
      	 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      
        -> #3 (sb_internal#2){.+.+}-{0:0}:
      	 percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
      	 __sb_start_write+0x234/0x470 fs/super.c:1672
      	 sb_start_intwrite include/linux/fs.h:1690 [inline]
      	 start_transaction+0xbe7/0x1170 fs/btrfs/transaction.c:624
      	 find_free_extent_update_loop fs/btrfs/extent-tree.c:3789 [inline]
      	 find_free_extent+0x25e1/0x2e60 fs/btrfs/extent-tree.c:4127
      	 btrfs_reserve_extent+0x166/0x460 fs/btrfs/extent-tree.c:4206
      	 cow_file_range+0x3de/0x9b0 fs/btrfs/inode.c:1063
      	 btrfs_run_delalloc_range+0x2cf/0x1410 fs/btrfs/inode.c:1838
      	 writepage_delalloc+0x150/0x460 fs/btrfs/extent_io.c:3439
      	 __extent_writepage+0x441/0xd00 fs/btrfs/extent_io.c:3653
      	 extent_write_cache_pages.constprop.0+0x69d/0x1040 fs/btrfs/extent_io.c:4249
      	 extent_writepages+0xcd/0x2b0 fs/btrfs/extent_io.c:4370
      	 do_writepages+0xec/0x290 mm/page-writeback.c:2352
      	 __writeback_single_inode+0x125/0x1400 fs/fs-writeback.c:1461
      	 writeback_sb_inodes+0x53d/0xf40 fs/fs-writeback.c:1721
      	 wb_writeback+0x2ad/0xd40 fs/fs-writeback.c:1894
      	 wb_do_writeback fs/fs-writeback.c:2039 [inline]
      	 wb_workfn+0x2dc/0x13e0 fs/fs-writeback.c:2080
      	 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
      	 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
      	 kthread+0x3b5/0x4a0 kernel/kthread.c:292
      	 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      
        -> #2 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}:
      	 __flush_work+0x60e/0xac0 kernel/workqueue.c:3041
      	 wb_shutdown+0x180/0x220 mm/backing-dev.c:355
      	 bdi_unregister+0x174/0x590 mm/backing-dev.c:872
      	 del_gendisk+0x820/0xa10 block/genhd.c:933
      	 loop_remove drivers/block/loop.c:2192 [inline]
      	 loop_control_ioctl drivers/block/loop.c:2291 [inline]
      	 loop_control_ioctl+0x3b1/0x480 drivers/block/loop.c:2257
      	 vfs_ioctl fs/ioctl.c:48 [inline]
      	 __do_sys_ioctl fs/ioctl.c:753 [inline]
      	 __se_sys_ioctl fs/ioctl.c:739 [inline]
      	 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
      	 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #1 (loop_ctl_mutex){+.+.}-{3:3}:
      	 __mutex_lock_common kernel/locking/mutex.c:956 [inline]
      	 __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
      	 lo_open+0x19/0xd0 drivers/block/loop.c:1893
      	 __blkdev_get+0x759/0x1aa0 fs/block_dev.c:1507
      	 blkdev_get fs/block_dev.c:1639 [inline]
      	 blkdev_open+0x227/0x300 fs/block_dev.c:1753
      	 do_dentry_open+0x4b9/0x11b0 fs/open.c:817
      	 do_open fs/namei.c:3251 [inline]
      	 path_openat+0x1b9a/0x2730 fs/namei.c:3368
      	 do_filp_open+0x17e/0x3c0 fs/namei.c:3395
      	 do_sys_openat2+0x16d/0x420 fs/open.c:1168
      	 do_sys_open fs/open.c:1184 [inline]
      	 __do_sys_open fs/open.c:1192 [inline]
      	 __se_sys_open fs/open.c:1188 [inline]
      	 __x64_sys_open+0x119/0x1c0 fs/open.c:1188
      	 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #0 (&bdev->bd_mutex){+.+.}-{3:3}:
      	 check_prev_add kernel/locking/lockdep.c:2496 [inline]
      	 check_prevs_add kernel/locking/lockdep.c:2601 [inline]
      	 validate_chain kernel/locking/lockdep.c:3218 [inline]
      	 __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4426
      	 lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006
      	 __mutex_lock_common kernel/locking/mutex.c:956 [inline]
      	 __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
      	 blkdev_put+0x30/0x520 fs/block_dev.c:1804
      	 btrfs_close_bdev fs/btrfs/volumes.c:1117 [inline]
      	 btrfs_close_bdev fs/btrfs/volumes.c:1107 [inline]
      	 btrfs_close_one_device fs/btrfs/volumes.c:1133 [inline]
      	 close_fs_devices.part.0+0x1a4/0x800 fs/btrfs/volumes.c:1161
      	 close_fs_devices fs/btrfs/volumes.c:1193 [inline]
      	 btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179
      	 close_ctree+0x688/0x6cb fs/btrfs/disk-io.c:4149
      	 generic_shutdown_super+0x144/0x370 fs/super.c:464
      	 kill_anon_super+0x36/0x60 fs/super.c:1108
      	 btrfs_kill_super+0x38/0x50 fs/btrfs/super.c:2265
      	 deactivate_locked_super+0x94/0x160 fs/super.c:335
      	 deactivate_super+0xad/0xd0 fs/super.c:366
      	 cleanup_mnt+0x3a3/0x530 fs/namespace.c:1118
      	 task_work_run+0xdd/0x190 kernel/task_work.c:141
      	 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
      	 exit_to_user_mode_loop kernel/entry/common.c:163 [inline]
      	 exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190
      	 syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        other info that might help us debug this:
      
        Chain exists of:
          &bdev->bd_mutex --> sb_internal#2 --> &fs_devs->device_list_mutex
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(&fs_devs->device_list_mutex);
      				 lock(sb_internal#2);
      				 lock(&fs_devs->device_list_mutex);
          lock(&bdev->bd_mutex);
      
         *** DEADLOCK ***
      
        3 locks held by syz-executor.0/6878:
         #0: ffff88809070c0e0 (&type->s_umount_key#70){++++}-{3:3}, at: deactivate_super+0xa5/0xd0 fs/super.c:365
         #1: ffffffff8a5b37a8 (uuid_mutex){+.+.}-{3:3}, at: btrfs_close_devices+0x23/0x1f0 fs/btrfs/volumes.c:1178
         #2: ffff8880908cfce0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: close_fs_devices.part.0+0x2e/0x800 fs/btrfs/volumes.c:1159
      
        stack backtrace:
        CPU: 0 PID: 6878 Comm: syz-executor.0 Not tainted 5.9.0-rc5-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Call Trace:
         __dump_stack lib/dump_stack.c:77 [inline]
         dump_stack+0x198/0x1fd lib/dump_stack.c:118
         check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827
         check_prev_add kernel/locking/lockdep.c:2496 [inline]
         check_prevs_add kernel/locking/lockdep.c:2601 [inline]
         validate_chain kernel/locking/lockdep.c:3218 [inline]
         __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4426
         lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006
         __mutex_lock_common kernel/locking/mutex.c:956 [inline]
         __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103
         blkdev_put+0x30/0x520 fs/block_dev.c:1804
         btrfs_close_bdev fs/btrfs/volumes.c:1117 [inline]
         btrfs_close_bdev fs/btrfs/volumes.c:1107 [inline]
         btrfs_close_one_device fs/btrfs/volumes.c:1133 [inline]
         close_fs_devices.part.0+0x1a4/0x800 fs/btrfs/volumes.c:1161
         close_fs_devices fs/btrfs/volumes.c:1193 [inline]
         btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179
         close_ctree+0x688/0x6cb fs/btrfs/disk-io.c:4149
         generic_shutdown_super+0x144/0x370 fs/super.c:464
         kill_anon_super+0x36/0x60 fs/super.c:1108
         btrfs_kill_super+0x38/0x50 fs/btrfs/super.c:2265
         deactivate_locked_super+0x94/0x160 fs/super.c:335
         deactivate_super+0xad/0xd0 fs/super.c:366
         cleanup_mnt+0x3a3/0x530 fs/namespace.c:1118
         task_work_run+0xdd/0x190 kernel/task_work.c:141
         tracehook_notify_resume include/linux/tracehook.h:188 [inline]
         exit_to_user_mode_loop kernel/entry/common.c:163 [inline]
         exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190
         syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x460027
        RSP: 002b:00007fff59216328 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 0000000000076035 RCX: 0000000000460027
        RDX: 0000000000403188 RSI: 0000000000000002 RDI: 00007fff592163d0
        RBP: 0000000000000333 R08: 0000000000000000 R09: 000000000000000b
        R10: 0000000000000005 R11: 0000000000000246 R12: 00007fff59217460
        R13: 0000000002df2a60 R14: 0000000000000000 R15: 00007fff59217460
      
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      [ add syzbot reference ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      a466c85e
  5. Sep 30, 2020