  1. May 28, 2020
    • nvme-pci: avoid race between nvme_reap_pending_cqes() and nvme_poll() · 9210c075
      Dongli Zhang authored
      There may be a race between nvme_reap_pending_cqes() and nvme_poll(), e.g.,
      when doing live reset while polling the nvme device.
      
            CPU X                        CPU Y
                                     nvme_poll()
      nvme_dev_disable()
      -> nvme_stop_queues()
      -> nvme_suspend_io_queues()
      -> nvme_suspend_queue()
                                     -> spin_lock(&nvmeq->cq_poll_lock);
      -> nvme_reap_pending_cqes()
         -> nvme_process_cq()        -> nvme_process_cq()
      
      In the above scenario, the nvme_process_cq() for the same queue may be
      running on both CPU X and CPU Y concurrently.
      
      It is much easier to reproduce the issue when CONFIG_PREEMPT is
      enabled in the kernel. When CONFIG_PREEMPT is disabled,
      nvme_stop_queues()-->blk_mq_quiesce_queue() takes longer to wait for
      the grace period.
      
      This patch protects nvme_process_cq() with nvmeq->cq_poll_lock in
      nvme_reap_pending_cqes().
      
      Fixes: fa46c6fb ("nvme/pci: move cqe check after device shutdown")
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      9210c075
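
The locking rule this commit describes can be sketched in user space. This is a minimal model, not the driver code: `struct cq_model`, `cq_poll()`, `cq_reap_pending()` and the `atomic_flag` spinlock are hypothetical stand-ins for the NVMe queue, nvme_poll(), nvme_reap_pending_cqes() and nvmeq->cq_poll_lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-in for one NVMe completion queue. */
struct cq_model {
    atomic_flag cq_poll_lock; /* plays the role of nvmeq->cq_poll_lock */
    int pending;              /* completions posted but not yet consumed */
    int processed;            /* completions consumed so far */
};

void cq_model_init(struct cq_model *q, int pending)
{
    assert(pending >= 0);
    atomic_flag_clear(&q->cq_poll_lock);
    q->pending = pending;
    q->processed = 0;
}

static void cq_lock(struct cq_model *q)
{
    /* Spin until the flag was clear before our test-and-set. */
    while (atomic_flag_test_and_set_explicit(&q->cq_poll_lock,
                                             memory_order_acquire))
        ;
}

static void cq_unlock(struct cq_model *q)
{
    atomic_flag_clear_explicit(&q->cq_poll_lock, memory_order_release);
}

/* Consumes every pending completion. Callers must hold cq_poll_lock. */
static int cq_process(struct cq_model *q)
{
    int n = q->pending;

    q->processed += n;
    q->pending = 0;
    return n;
}

/* Polling path: it takes the lock, as in the diagram for CPU Y. */
int cq_poll(struct cq_model *q)
{
    int n;

    cq_lock(q);
    n = cq_process(q);
    cq_unlock(q);
    return n;
}

/* Reap path: the fix wraps the same call in the same lock. */
int cq_reap_pending(struct cq_model *q)
{
    int n;

    cq_lock(q);
    n = cq_process(q);
    cq_unlock(q);
    return n;
}
```

Because both entry points serialize on one lock, a completion is consumed by exactly one of them, whichever gets there first.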
  2. May 13, 2020
  3. May 10, 2020
  4. May 07, 2020
  5. May 05, 2020
    • iocost: protect iocg->abs_vdebt with iocg->waitq.lock · 0b80f986
      Tejun Heo authored
      
      
      abs_vdebt is an atomic64_t which tracks how much over budget a given cgroup
      is and controls the activation of the use_delay mechanism. Once a cgroup goes
      over budget from forced IOs, it has to pay it back with its future budget.
      The progress guarantee on debt paying comes from the iocg being active -
      active iocgs are processed by the periodic timer, which ensures that as time
      passes the debts dissipate and the iocg returns to normal operation.
      
      However, both iocg activation and vdebt handling are asynchronous and a
      sequence like the following may happen.
      
      1. The iocg is in the process of being deactivated by the periodic timer.
      
      2. A bio enters ioc_rqos_throttle() and calls iocg_activate(), which returns
         without doing anything because it still sees the iocg as active.
      
      3. The iocg is deactivated.
      
      4. The bio from #2 is over budget but needs to be forced. It increases
         abs_vdebt and goes over the threshold and enables use_delay.
      
      5. IO control is enabled for the iocg's subtree and now IOs are attributed
         to the descendant cgroups and the iocg itself no longer issues IOs.
      
      This leaves the iocg with stuck abs_vdebt - it has debt but is inactive and
      no further IOs can activate it. This can end up unduly punishing all the
      descendant cgroups.
      
      The usual throttling path has the same issue - the iocg must be active while
      throttled to ensure that a future event will wake it up - and solves the
      problem by synchronizing the throttling path with a spinlock. abs_vdebt
      handling is another form of overage handling and shares a lot of
      characteristics including the fact that it isn't in the hottest path.
      
      This patch fixes the above and other possible races by strictly
      synchronizing abs_vdebt and use_delay handling with iocg->waitq.lock.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Vlad Dmitriev <vvd@fb.com>
      Cc: stable@vger.kernel.org # v5.4+
      Fixes: e1518f63 ("blk-iocost: Don't let merges push vtime into the future")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0b80f986
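
The idea of tying debt handling and activation state to one lock can be sketched as below. This is a user-space illustration under stated assumptions, not the blk-iocost code: `struct iocg_model`, the threshold value, and the two policies (an inactive group records no debt, and a group with outstanding debt is not deactivated) are all hypothetical choices made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_DEBT_THRESHOLD 100u /* hypothetical use_delay trigger */

/* Hypothetical stand-in for an iocg; lock plays iocg->waitq.lock. */
struct iocg_model {
    atomic_flag lock;
    bool active;
    uint64_t abs_vdebt;
    bool use_delay;
};

void iocg_model_init(struct iocg_model *g)
{
    atomic_flag_clear(&g->lock);
    g->active = true;
    g->abs_vdebt = 0;
    g->use_delay = false;
}

static void iocg_lock(struct iocg_model *g)
{
    while (atomic_flag_test_and_set_explicit(&g->lock, memory_order_acquire))
        ;
}

static void iocg_unlock(struct iocg_model *g)
{
    atomic_flag_clear_explicit(&g->lock, memory_order_release);
}

/*
 * Forced-IO path. The active check and the debt update happen under one
 * lock, so deactivation cannot slip in between them (steps 2-4 above).
 * Returns true if debt was recorded.
 */
bool iocg_model_incur_debt(struct iocg_model *g, uint64_t cost)
{
    bool charged = false;

    assert(cost > 0);
    iocg_lock(g);
    if (g->active) {
        g->abs_vdebt += cost;
        g->use_delay = g->abs_vdebt > MODEL_DEBT_THRESHOLD;
        charged = true;
    }
    iocg_unlock(g);
    return charged;
}

/* Timer path: pay debt back from budget, clamped at zero. */
void iocg_model_pay_debt(struct iocg_model *g, uint64_t amount)
{
    iocg_lock(g);
    g->abs_vdebt -= (amount < g->abs_vdebt) ? amount : g->abs_vdebt;
    g->use_delay = g->abs_vdebt > MODEL_DEBT_THRESHOLD;
    iocg_unlock(g);
}

/* Timer path: assumed policy, refuse to deactivate while debt remains. */
bool iocg_model_try_deactivate(struct iocg_model *g)
{
    bool ok;

    iocg_lock(g);
    ok = g->abs_vdebt == 0;
    if (ok)
        g->active = false;
    iocg_unlock(g);
    return ok;
}
```

With every reader and writer of `active`, `abs_vdebt` and `use_delay` going through the same lock, the model cannot reach the "has debt but inactive" state that the commit message describes.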
  6. May 01, 2020
  7. Apr 30, 2020
  8. Apr 27, 2020
    • nvme: prevent double free in nvme_alloc_ns() error handling · 132be623
      Niklas Cassel authored
      When jumping to the out_put_disk label, we will call put_disk(), which will
      trigger a call to disk_release(), which calls blk_put_queue().
      
      Later in the cleanup code, we do blk_cleanup_queue(), which will also call
      blk_put_queue().
      
      Putting the queue twice is incorrect, and will generate a KASAN splat.
      
      Set the disk->queue pointer to NULL, before calling put_disk(), so that the
      first call to blk_put_queue() will not free the queue.
      
      The second call to blk_put_queue() uses another pointer to the same queue,
      so this call will still free the queue.
      
      Fixes: 85136c01 ("lightnvm: simplify geometry enumeration")
      Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      132be623
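
The double put and its fix can be modelled with a plain reference counter. This is a simplified sketch: `struct queue_model`, `struct disk_model` and `alloc_ns_error_path()` are hypothetical names, and the model assumes the queue holds a single reference when the error path runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a request queue with a reference count. */
struct queue_model {
    int refs;     /* references still held */
    int frees;    /* times the queue was actually freed */
    int bad_puts; /* puts on an already-freed queue (what KASAN would flag) */
};

/* Hypothetical stand-in for a gendisk that points at its queue. */
struct disk_model {
    struct queue_model *queue;
};

/* Models blk_put_queue(): drop one reference, free on the last one. */
static void queue_put(struct queue_model *q)
{
    assert(q->refs >= 0);
    if (q->refs == 0) {
        q->bad_puts++;
        return;
    }
    if (--q->refs == 0)
        q->frees++;
}

/* Models disk_release(): puts the queue only if the disk still has one. */
static void disk_release(struct disk_model *d)
{
    if (d->queue)
        queue_put(d->queue);
}

/*
 * Models the out_put_disk error path. With apply_fix set, disk->queue is
 * cleared first, so only the later cleanup call puts the queue.
 */
void alloc_ns_error_path(struct disk_model *d, struct queue_model *q,
                         int apply_fix)
{
    if (apply_fix)
        d->queue = NULL;
    disk_release(d); /* put_disk() -> disk_release() */
    queue_put(q);    /* blk_cleanup_queue() -> blk_put_queue() */
}
```

Running the path without the fix records one free and one bad put. With the fix the queue is freed exactly once, by the second call, through the other pointer to the same queue.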
  9. Apr 23, 2020
    • null_blk: Cleanup zoned device initialization · d205bde7
      Damien Le Moal authored
      
      
      Move all zoned mode related code from null_blk_main.c to
      null_blk_zoned.c, avoiding an ugly #ifdef in the process.
      Rename null_zone_init() into null_init_zoned_dev(), null_zone_exit()
      into null_free_zoned_dev() and add the new function
      null_register_zoned_dev() to finalize the zoned dev setup before
      add_disk().
      
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d205bde7
    • null_blk: Fix zoned command handling · 9dd44c7e
      Damien Le Moal authored
      
      
      For write operations issued to a null_blk device with zoned mode
      enabled, the state and write pointer position of the zone targeted by
      the command should be checked before badblocks and memory backing
      are handled as the write may be first failed due to, for instance, a
      sector position not aligned with the zone write pointer. This order of
      checking for errors reflects more accurately the behavior of physical
      zoned devices.
      
      Furthermore, the write pointer position of the target zone should be
      incremented if and only if no errors are reported by badblocks and
      memory backing handling.
      
      To fix this, introduce the small helper function null_process_cmd()
      which executes null_handle_badblocks() and null_handle_memory_backed()
      and use this function in null_zone_write() to correctly handle write
      requests to zoned null devices depending on the type and state of the
      write target zone. Also call this function in null_handle_zoned() to
      process read requests to zoned null devices.
      
      null_process_cmd() is called directly from null_handle_cmd() for
      regular null devices, resulting in no functional change for this type
      of device. To have symmetric names, the function null_handle_zoned()
      is renamed to null_process_zoned_cmd().
      
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9dd44c7e
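
The check ordering described in this commit can be illustrated with a small model. It is a sketch only: `struct zone_model`, `zone_write()` and the single `bad_sector` argument that stands in for badblocks and memory-backing failures are hypothetical, and the zone-capacity check is an assumption of the model rather than something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

#define NO_BAD_SECTOR UINT64_MAX /* sentinel: no simulated media error */

enum zone_cond { ZONE_EMPTY, ZONE_OPEN, ZONE_FULL }; /* simplified states */
enum cmd_status { CMD_OK, CMD_IOERR };

struct zone_model {
    uint64_t start; /* first sector of the zone */
    uint64_t len;   /* zone size in sectors */
    uint64_t wp;    /* write pointer */
    enum zone_cond cond;
};

/* Stand-in for null_process_cmd(): fails if the range hits bad_sector. */
static enum cmd_status process_cmd(uint64_t sector, uint64_t nr,
                                   uint64_t bad_sector)
{
    if (bad_sector >= sector && bad_sector - sector < nr)
        return CMD_IOERR;
    return CMD_OK;
}

enum cmd_status zone_write(struct zone_model *z, uint64_t sector,
                           uint64_t nr, uint64_t bad_sector)
{
    enum cmd_status ret;

    assert(nr > 0);

    /* 1. Zone state and write pointer are checked first. */
    if (z->cond == ZONE_FULL || sector != z->wp)
        return CMD_IOERR;
    if (nr > z->start + z->len - z->wp) /* model assumption: stay in zone */
        return CMD_IOERR;

    /* 2. Only then are badblocks and memory backing handled. */
    ret = process_cmd(sector, nr, bad_sector);
    if (ret != CMD_OK)
        return ret;

    /* 3. The write pointer advances only when nothing failed. */
    z->wp += nr;
    z->cond = (z->wp == z->start + z->len) ? ZONE_FULL : ZONE_OPEN;
    return CMD_OK;
}
```

A write that is both misaligned and aimed at a bad sector fails on the zone check, and a write that fails in step 2 leaves the write pointer where it was.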
  10. Apr 21, 2020
  11. Apr 20, 2020
    • bdev: Reduce time holding bd_mutex in sync in blkdev_close() · b849dd84
      Douglas Anderson authored
      
      
      While trying to "dd" to the block device for a USB stick, I
      encountered a hung task warning (blocked for > 120 seconds).  I
      managed to come up with an easy way to reproduce this on my system
      (where /dev/sdb is the block device for my USB stick) with:
      
        while true; do dd if=/dev/zero of=/dev/sdb bs=4M; done
      
      With my reproduction here are the relevant bits from the hung task
      detector:
      
       INFO: task udevd:294 blocked for more than 122 seconds.
       ...
       udevd           D    0   294      1 0x00400008
       Call trace:
        ...
        mutex_lock_nested+0x40/0x50
        __blkdev_get+0x7c/0x3d4
        blkdev_get+0x118/0x138
        blkdev_open+0x94/0xa8
        do_dentry_open+0x268/0x3a0
        vfs_open+0x34/0x40
        path_openat+0x39c/0xdf4
        do_filp_open+0x90/0x10c
        do_sys_open+0x150/0x3c8
        ...
      
       ...
       Showing all locks held in the system:
       ...
       1 lock held by dd/2798:
        #0: ffffff814ac1a3b8 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0x50/0x204
       ...
       dd              D    0  2798   2764 0x00400208
       Call trace:
        ...
        schedule+0x8c/0xbc
        io_schedule+0x1c/0x40
        wait_on_page_bit_common+0x238/0x338
        __lock_page+0x5c/0x68
        write_cache_pages+0x194/0x500
        generic_writepages+0x64/0xa4
        blkdev_writepages+0x24/0x30
        do_writepages+0x48/0xa8
        __filemap_fdatawrite_range+0xac/0xd8
        filemap_write_and_wait+0x30/0x84
        __blkdev_put+0x88/0x204
        blkdev_put+0xc4/0xe4
        blkdev_close+0x28/0x38
        __fput+0xe0/0x238
        ____fput+0x1c/0x28
        task_work_run+0xb0/0xe4
        do_notify_resume+0xfc0/0x14bc
        work_pending+0x8/0x14
      
      The problem appears related to the fact that my USB disk is terribly
      slow and that I have a lot of RAM in my system to cache things.
      Specifically my writes seem to be happening at ~15 MB/s and I've got
      ~4 GB of RAM in my system that can be used for buffering.  To write 4
      GB of buffer to disk thus takes ~4000 MB / ~15 MB/s = ~267 seconds.
      
      The 267 second number is a problem because in __blkdev_put() we call
      sync_blockdev() while holding the bd_mutex.  Any other callers who
      want the bd_mutex will be blocked for the whole time.
      
      The problem is made worse because I believe blkdev_put() specifically
      tells other tasks (namely udev) to go try to access the device at right
      around the same time we're going to hold the mutex for a long time.
      
      Putting some traces around this (after disabling the hung task detector),
      I could confirm:
       dd:    437.608600: __blkdev_put() right before sync_blockdev() for sdb
       udevd: 437.623901: blkdev_open() right before blkdev_get() for sdb
       dd:    661.468451: __blkdev_put() right after sync_blockdev() for sdb
       udevd: 663.820426: blkdev_open() right after blkdev_get() for sdb
      
      A simple fix for this is to realize that sync_blockdev() works fine if
      you're not holding the mutex.  Also, it's not the end of the world if
      you sync a little early (though it can have performance impacts).
      Thus we can make a guess that we're going to need to do the sync and
      then do it without holding the mutex.  We still do one last sync with
      the mutex but it should be much, much faster.
      
      With this, my hung task warnings for my test case are gone.
      
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Reviewed-by: Guenter Roeck <groeck@chromium.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b849dd84
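
The "sync early without the mutex, then sync once more with it" approach can be modelled by counting how many dirty pages are written back while the lock is held. This is a user-space sketch: `struct bdev_model`, `blkdev_put_model()` and the opener-count guess are hypothetical stand-ins chosen for the example, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a block device and its bd_mutex. */
struct bdev_model {
    int openers;               /* current number of openers */
    long dirty_pages;          /* pages waiting for writeback */
    bool locked;               /* true while the modelled mutex is held */
    long pages_synced_locked;  /* pages written back under the mutex */
};

/* Models sync_blockdev(): writes everything and notes who paid for it. */
static void sync_blockdev_model(struct bdev_model *b)
{
    if (b->locked)
        b->pages_synced_locked += b->dirty_pages;
    b->dirty_pages = 0;
}

/*
 * Models the close path. With early_sync set, we guess (without the
 * mutex) that this is the last opener and sync before locking, so the
 * sync under the mutex has little or nothing left to do.
 */
void blkdev_put_model(struct bdev_model *b, bool early_sync)
{
    assert(b->openers > 0);

    if (early_sync && b->openers == 1)
        sync_blockdev_model(b); /* slow part, other openers not blocked */

    b->locked = true;           /* mutex_lock(&bdev->bd_mutex) */
    if (--b->openers == 0)
        sync_blockdev_model(b); /* final sync, expected to be fast */
    b->locked = false;          /* mutex_unlock(&bdev->bd_mutex) */
}
```

In the model, 1000 dirty pages are all written under the lock without the early sync and none are with it. On a real device new pages can be dirtied between the two syncs, which is why the final sync under the mutex is still needed.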
  12. Apr 18, 2020
    • buffer: remove useless comment and WB_REASON_FREE_MORE_MEM, reason. · c4b4c2a7
      Zhiqiang Liu authored
      The free_more_memory() function was completely removed in commit bc48f001
      ("buffer: eliminate the need to call free_more_memory() in __getblk_slow()").

      So the comment and the `WB_REASON_FREE_MORE_MEM` reason that refer to
      free_more_memory() are no longer needed.
      
      Fixes: bc48f001 ("buffer: eliminate the need to call free_more_memory() in __getblk_slow()")
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c4b4c2a7
    • Merge tag 'docs-fixes' of git://git.lwn.net/linux · 90280eaa
      Linus Torvalds authored
      Pull documentation fixes from Jonathan Corbet:
       "A handful of fixes for reasonably obnoxious documentation issues"
      
      * tag 'docs-fixes' of git://git.lwn.net/linux:
        scripts: documentation-file-ref-check: Add line break before exit
        scripts/kernel-doc: Add missing close-paren in c:function directives
        docs: admin-guide: merge sections for the kernel.modprobe sysctl
        docs: timekeeping: Use correct prototype for deprecated functions
      90280eaa
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace · 5d286d5e
      Linus Torvalds authored
      Pull proc fix from Eric Biederman:
       "While running syzbot happened to spot one more oversight in my rework
        of proc_flush_task.
      
        The fields proc_self and proc_thread_self were not being reinitialized
        when proc was unmounted, which could cause problems if the mount of
        proc fails"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        proc: Handle umounts cleanly
      5d286d5e
    • Merge tag 'mtd/fixes-for-5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux · ceb1adba
      Linus Torvalds authored
      Pull mtd fix from Richard Weinberger:
       "spi-nor: fix for missing directory after code refactoring"
      
      * tag 'mtd/fixes-for-5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
        mtd: spi-nor: Compile files in controllers/ directory
      ceb1adba
    • Merge tag 'linux-watchdog-5.7-rc2' of git://www.linux-watchdog.org/linux-watchdog · 1634615d
      Linus Torvalds authored
      Pull watchdog fix from Wim Van Sebroeck:
       "Fix restart handler in sp805 driver"
      
      * tag 'linux-watchdog-5.7-rc2' of git://www.linux-watchdog.org/linux-watchdog:
        watchdog: sp805: fix restart handler
      1634615d
    • Merge tag 'devicetree-fixes-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 8fce9058
      Linus Torvalds authored
      Pull devicetree fixes from Rob Herring:
      
       - Fix warnings from enabling more dtc warnings which landed in the
         merge window and didn't get fixed in time.
      
       - Fix some document references from DT schema conversions
      
       - Fix kmemleak errors in DT unittests
      
      * tag 'devicetree-fixes-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (23 commits)
        kbuild: check libyaml installation for 'make dt_binding_check'
        of: unittest: kmemleak in duplicate property update
        of: overlay: kmemleak in dup_and_fixup_symbol_prop()
        of: unittest: kmemleak in of_unittest_overlay_high_level()
        of: unittest: kmemleak in of_unittest_platform_populate()
        of: unittest: kmemleak on changeset destroy
        MAINTAINERS: dt: fix pointers for ARM Integrator, Versatile and RealView
        MAINTAINERS: dt: update display/allwinner file entry
        dt-bindings: iio: dac: AD5570R fix bindings errors
        dt-bindings: Fix misspellings of "Analog Devices"
        dt-bindings: pwm: Fix cros-ec-pwm example dtc 'reg' warning
        docs: dt: rockchip,dwc3.txt: fix a pointer to a renamed file
        docs: dt: fix a broken reference for a file converted to json
        docs: dt: qcom,dwc3.txt: fix cross-reference for a converted file
        docs: dt: fix broken reference to phy-cadence-torrent.yaml
        dt-bindings: interrupt-controller: Fix loongson,parent_int_map property schema
        dt-bindings: hwmon: Fix incorrect $id paths
        dt-bindings: Fix dtc warnings on reg and ranges in examples
        dt-bindings: BD718x7 - add missing I2C bus properties
        dt-bindings: clock: syscon-icst: Remove unneeded unit name
        ...
      8fce9058
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 95988fbc
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
      
       - Remove vdso code trying to free unallocated pages.
      
       - Delete the space separator in the __emit_inst macro as it breaks the
         clang integrated assembler.
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: Delete the space separator in __emit_inst
        arm64: vdso: don't free unallocated pages
      95988fbc
    • Merge tag 'for-linus-5.7-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · d0a4ebe7
      Linus Torvalds authored
      Pull xen update from Juergen Gross:
      
       - a small cleanup patch
      
       - a security fix for a bug in the Xen hypervisor to avoid enabling Xen
         guests to crash dom0 on an unfixed hypervisor.
      
      * tag 'for-linus-5.7-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        arm/xen: make _xen_start_info static
        xen/xenbus: ensure xenbus_map_ring_valloc() returns proper grant status
      d0a4ebe7
    • Merge tag 'io_uring-5.7-2020-04-17' of git://git.kernel.dk/linux-block · a2286a44
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - wrap up the init/setup cleanup (Pavel)
      
       - fix some issues around deferral sequences (Pavel)
      
       - fix splice punt check using the wrong struct file member
      
       - apply poll re-arm logic for pollable retry too
      
       - pollable retry should honor cancelation
      
       - fix setup time error handling syzbot reported crash
      
       - restore work state when poll is canceled
      
      * tag 'io_uring-5.7-2020-04-17' of git://git.kernel.dk/linux-block:
        io_uring: don't count rqs failed after current one
        io_uring: kill already cached timeout.seq_offset
        io_uring: fix cached_sq_head in io_timeout()
        io_uring: only post events in io_poll_remove_all() if we completed some
        io_uring: io_async_task_func() should check and honor cancelation
        io_uring: check for need to re-wait in polled async handling
        io_uring: correct O_NONBLOCK check for splice punt
        io_uring: restore req->work when canceling poll request
        io_uring: move all request init code in one place
        io_uring: keep all sqe->flags in req->flags
        io_uring: early submission req fail code
        io_uring: track mm through current->mm
        io_uring: remove obsolete @mm_fault
      a2286a44
    • Merge tag 'block-5.7-2020-04-17' of git://git.kernel.dk/linux-block · bf9196d5
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - Fix for a driver tag leak in error handling (John)
      
       - Remove now defunct Kconfig selection from dasd (Stefan)
      
       - blk-wbt trace fixes (Tommi)
      
      * tag 'block-5.7-2020-04-17' of git://git.kernel.dk/linux-block:
        blk-wbt: Drop needless newlines from tracepoint format strings
        blk-wbt: Use tracepoint_string() for wbt_step tracepoint string literals
        s390/dasd: remove IOSCHED_DEADLINE from DASD Kconfig
        blk-mq: Put driver tag in blk_mq_dispatch_rq_list() when no budget
      bf9196d5
    • Merge tag 'libata-5.7-2020-04-17' of git://git.kernel.dk/linux-block · 2acbb9e6
      Linus Torvalds authored
      Pull libata fixlet from Jens Axboe:
       "Add yet another Comet Lake PCI ID for ahci"
      
      * tag 'libata-5.7-2020-04-17' of git://git.kernel.dk/linux-block:
        ahci: Add Intel Comet Lake PCH-U PCI ID
      2acbb9e6
    • Merge tag 'for-5.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · c5304dd5
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "A regression fix for a warning caused by running balance and snapshot
        creation in parallel"
      
      * tag 'for-5.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix setting last_trans for reloc roots
      c5304dd5
    • Merge tag 'pm-5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 5a32fe48
      Linus Torvalds authored
      Pull power management update from Rafael Wysocki:
       "Allow the operating performance points (OPP) core to be used in the
        case when the same driver is used on different platforms, some of
        which have an OPP table and some of which have a clock node (Rajendra
        Nayak)"
      
      * tag 'pm-5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        opp: Manage empty OPP tables with clk handle
      5a32fe48
    • Merge tag 'sound-5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · c8a6552f
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "One significant regression fix is for HD-audio buffer preallocation.
        In 5.6 it was set to non-prompt for x86 and forced to 0, but this
        turned out to be problematic for some applications, hence it gets
        reverted. Distros would need to restore CONFIG_SND_HDA_PREALLOC_SIZE
        value to the earlier values they've used in the past.
      
        Other than that, we've received quite a few small fixes for HD-audio
        and USB-audio. Most of them are for dealing with the broken TRX40
        mobos and the runtime PM without HD-audio codecs"
      
      * tag 'sound-5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda: call runtime_allow() for all hda controllers
        ALSA: hda: Allow setting preallocation again for x86
        ALSA: hda: Explicitly permit using autosuspend if runtime PM is supported
        ALSA: hda: Skip controller resume if not needed
        ALSA: hda: Keep the controller initialization even if no codecs found
        ALSA: hda: Release resources at error in delayed probe
        ALSA: hda: Honor PM disablement in PM freeze and thaw_noirq ops
        ALSA: hda: Don't release card at firmware loading error
        ALSA: usb-audio: Check mapping at creating connector controls, too
        ALSA: usb-audio: Don't create jack controls for PCM terminals
        ALSA: usb-audio: Don't override ignore_ctl_error value from the map
        ALSA: usb-audio: Filter error from connector kctl ops, too
        ALSA: hda/realtek - Enable the headset mic on Asus FX505DT
        ALSA: ctxfi: Remove unnecessary cast in kfree
      c8a6552f
  13. Apr 17, 2020