  1. Jun 08, 2017
    • block, bfq: access and cache blkg data only when safe · 8f9bebc3
      Paolo Valente authored
      
      
      In blk-cgroup, operations on blkg objects are protected with the
      request_queue lock. This is no longer the lock that protects
      I/O-scheduler operations in blk-mq. In fact, the latter are now
      protected with a finer-grained per-scheduler-instance lock. As a
      consequence, although blkg lookups are also rcu-protected, blk-mq I/O
      schedulers may see inconsistent data when they access blkg and
      blkg-related objects. BFQ does access these objects, and does incur
      this problem, in the following case.
      
      The blkg_lookup performed in bfq_get_queue, being protected (only)
      through rcu, may happen to return the address of a copy of the
      original blkg. If this is the case, then the blkg_get performed in
      bfq_get_queue, to pin down the blkg, is useless: it does not prevent
      blk-cgroup code from destroying both the original blkg and all objects
      directly or indirectly referred to by the copy of the blkg. BFQ accesses
      these objects, which typically causes a crash due to a NULL-pointer
      dereference or a memory-protection violation.
      
      Some additional protection mechanism should be added to blk-cgroup to
      address this issue. In the meantime, this commit provides a quick
      temporary fix for BFQ: cache (when safe) blkg data that might
      disappear right after a blkg_lookup.
      
      In particular, this commit exploits the following facts to achieve its
      goal without introducing further locks.  Destroy operations on a blkg
      invoke, as a first step, hooks of the scheduler associated with the
      blkg. And these hooks are executed with bfqd->lock held for BFQ. As a
      consequence, for any blkg associated with the request queue an
      instance of BFQ is attached to, we are guaranteed that such a blkg is
      not destroyed, and that all the pointers it contains are consistent,
      while that instance is holding its bfqd->lock. A blkg_lookup performed
      with bfqd->lock held then returns a fully consistent blkg, which
      remains consistent as long as this lock is held. In more detail, this
      holds even if the returned blkg is a copy of the original one.
      
      Finally, the object describing a group inside BFQ also needs to be
      protected from destruction on the blkg_free of the original blkg
      (which invokes bfq_pd_free). This commit adds private refcounting for
      this object, to let it disappear only after no bfq_queue refers to it
      any longer.
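The private refcounting described above follows the usual get/put pattern. A minimal userspace sketch, assuming a hypothetical struct group in place of BFQ's real group object (group_get, group_put and group_pd_free are illustrative names, not BFQ's actual functions): the pd_free hook drops only its own reference, so the object survives for as long as any queue still holds one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for BFQ's per-group object. */
struct group {
	int ref;       /* private reference count */
	bool pd_freed; /* set once the policy-data free hook has run */
};

static struct group *group_alloc(void)
{
	struct group *g = calloc(1, sizeof(*g));

	if (g)
		g->ref = 1; /* reference owned by the blkg policy data */
	return g;
}

/* Taken by each queue that points at the group. */
static void group_get(struct group *g)
{
	g->ref++;
}

/* Returns true if this put dropped the last reference and freed g. */
static bool group_put(struct group *g)
{
	assert(g->ref > 0);
	if (--g->ref == 0) {
		free(g);
		return true;
	}
	return false;
}

/* Models the pd_free hook: it releases only the policy-data reference. */
static bool group_pd_free(struct group *g)
{
	g->pd_freed = true;
	return group_put(g);
}
```

In this model, destroying the blkg no longer frees the group out from under a queue; the last queue to call group_put does.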
      
      This commit also removes or updates some stale comments on locking
      issues related to blk-cgroup operations.
      
      Reported-by: Tomas Konir <tomas.konir@gmail.com>
      Reported-by: Lee Tibbert <lee.tibbert@gmail.com>
      Reported-by: Marco Piazza <mpiazza@gmail.com>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Tested-by: Tomas Konir <tomas.konir@gmail.com>
      Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
      Tested-by: Marco Piazza <mpiazza@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • Merge branch 'nvme-4.12' of git://git.infradead.org/nvme into for-linus · 85d0331a
      Jens Axboe authored
      Christoph writes:
      
      "A few NVMe fixes for 4.12-rc, PCIe reset fixes and APST fixes, a
       RDMA reconnect fix, two FC fixes and a general controller removal fix."
    • Fix loop device flush before configure v3 · 64604957
      James Wang authored
      While installing SLES-12 (based on v4.4), I found that the installer
      stalls for 60+ seconds during the LVM disk scan.  The root cause was
      determined to be the removal of a bound-device check in loop_flush()
      by commit b5dd2f60 ("block: loop: improve performance via blk-mq").
      
      Restoring this check, examining ->lo_state as set by loop_set_fd(),
      eliminates the bad behavior.
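A minimal model of the restored check, assuming simplified hypothetical names (struct loop_dev, LO_BOUND, loop_release_flush) rather than the loop driver's real ones: the expensive flush is skipped when no backing file is bound, so an unconfigured device returns immediately.

```c
#include <assert.h>

/* Hypothetical, simplified loop device states (names assumed). */
enum lo_state { LO_UNBOUND, LO_BOUND, LO_RUNDOWN };

struct loop_dev {
	enum lo_state state; /* set to LO_BOUND by the set-fd path */
	int flushes;         /* counts expensive flushes actually performed */
};

/* Stand-in for the costly queue freeze/unfreeze a real flush performs. */
static void do_flush(struct loop_dev *lo)
{
	lo->flushes++;
}

/*
 * Flush only a device that has a backing file bound. An unconfigured
 * device has no in-flight I/O, so there is nothing to wait for.
 */
static void loop_release_flush(struct loop_dev *lo)
{
	if (lo->state != LO_BOUND)
		return;
	do_flush(lo);
}
```

Opening and closing an unbound device, as the dd loop in the test method does for loop10 onward, then costs no flush at all.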
      
      Test method:
      modprobe loop max_loop=64
      dd if=/dev/zero of=disk bs=512 count=200K
      for((i=0;i<4;i++))do losetup -f disk; done
      mkfs.ext4 -F /dev/loop0
      for((i=0;i<4;i++))do mkdir t$i; mount /dev/loop$i t$i;done
      for f in `ls /dev/loop[0-9]*|sort`; do \
      	echo $f; dd if=$f of=/dev/null  bs=512 count=1; \
      	done
      
      Test output:  stock          patched
      /dev/loop0    18.1217e-05    8.3842e-05
      /dev/loop1     6.1114e-05    0.000147979
      /dev/loop10    0.414701      0.000116564
      /dev/loop11    0.7474        6.7942e-05
      /dev/loop12    0.747986      8.9082e-05
      /dev/loop13    0.746532      7.4799e-05
      /dev/loop14    0.480041      9.3926e-05
      /dev/loop15    1.26453       7.2522e-05
      
      Note that from loop10 onward the devices are not mounted, yet the
      stock kernel consumes several orders of magnitude more wall time
      for them than it does for a mounted device.
      (Thanks to Mike Galbraith <efault@gmx.de> for reviewing the changelog.)
      
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: James Wang <jnwang@suse.com>
      Fixes: b5dd2f60 ("block: loop: improve performance via blk-mq")
      Signed-off-by: Jens Axboe <axboe@fb.com>
  2. Jun 07, 2017
    • blk-throttle: set default latency baseline for harddisk · 6679a90c
      Shaohua Li authored
      
      
      Hard disk IO latency varies a lot depending on spindle movement. The
      latency can range from several microseconds to several milliseconds,
      so it's pretty hard to get the baseline latency used by io.low.

      We will use a different strategy here. The idea is to use only IO with
      spindle movement to determine whether cgroup IO is in a good state. For
      HD, if IO latency is small (< 1ms), we ignore the IO. Such IO is likely
      sequential, and does not help determine whether a cgroup's IO is
      impacted by other cgroups. With this, we only account IO with big
      latency. Then we can choose a hardcoded baseline latency for HD (4ms,
      which is typical IO latency with a seek). With all these settings, the
      io.low latency works for both HD and SSD.
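The accounting rule can be sketched as follows. The 1 ms threshold and 4 ms baseline come from the text above; the function names, and the assumption that an accounted IO is judged against the baseline plus the cgroup's io.low latency target, are illustrative rather than blk-throttle's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from the commit text; the names are hypothetical. */
#define HD_SEEK_THRESHOLD_US 1000u /* below 1 ms: likely sequential, ignore */
#define HD_BASELINE_US       4000u /* typical latency of an IO with a seek */

/* Only IOs that probably involved a seek are worth accounting. */
static bool hd_should_account(uint64_t latency_us)
{
	return latency_us >= HD_SEEK_THRESHOLD_US;
}

/*
 * Assumption: an accounted IO is "good" if it stayed within the
 * hardcoded baseline plus the cgroup's configured latency target.
 */
static bool hd_latency_ok(uint64_t latency_us, uint64_t target_us)
{
	return latency_us <= HD_BASELINE_US + target_us;
}
```

A 200 us sequential read is simply ignored, so it can neither mask nor falsely signal interference between cgroups.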
      
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-throttle: fix NULL pointer dereference in throtl_schedule_pending_timer · a41b816c
      Joseph Qi authored
      I have encountered a NULL pointer dereference in
      throtl_schedule_pending_timer:
        [  413.735396] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
        [  413.735535] IP: [<ffffffff812ebbbf>] throtl_schedule_pending_timer+0x3f/0x210
        [  413.735643] PGD 22c8cf067 PUD 22cb34067 PMD 0
        [  413.735713] Oops: 0000 [#1] SMP
        ......
      
      This is caused by the following case:
        blk_throtl_bio
          throtl_schedule_next_dispatch  <= sq is top level one without parent
            throtl_schedule_pending_timer
              sq_to_tg(sq)->td->throtl_slice  <= sq_to_tg(sq) returns NULL
      
      Fix it by using sq_to_td instead of sq_to_tg(sq)->td, which will always
      return a valid td.
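The fix relies on a helper that falls back to the enclosing throtl_data when a service queue has no parent group. A simplified userspace model of that pattern, using a locally defined container_of and cut-down hypothetical structs (the real ones carry many more fields):

```c
#include <assert.h>
#include <stddef.h>

/* Local equivalent of the kernel's container_of() for this sketch. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct service_queue {
	struct service_queue *parent; /* NULL for the top-level queue */
};

struct throtl_data {
	struct service_queue service_queue; /* top-level sq, no group */
	unsigned int throtl_slice;
};

struct throtl_grp {
	struct service_queue service_queue;
	struct throtl_data *td;
};

/* Only a queue that has a parent is embedded in a group. */
static struct throtl_grp *sq_to_tg(struct service_queue *sq)
{
	if (sq && sq->parent)
		return container_of(sq, struct throtl_grp, service_queue);
	return NULL;
}

/* Always valid: the top-level sq is embedded directly in throtl_data. */
static struct throtl_data *sq_to_td(struct service_queue *sq)
{
	struct throtl_grp *tg = sq_to_tg(sq);

	if (tg)
		return tg->td;
	return container_of(sq, struct throtl_data, service_queue);
}
```

Dereferencing sq_to_tg(sq)->td on the top-level queue would read through NULL, while sq_to_td() resolves both cases.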
      
      Fixes: 297e3d85 ("blk-throttle: make throtl_slice tunable")
      Signed-off-by: Joseph Qi <qijiang.qj@alibaba-inc.com>
      Reviewed-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nvme: relax APST default max latency to 100ms · 9947d6a0
      Kai-Heng Feng authored
      
      
      Christoph Hellwig suggests we should make APST work out of the box.
      Hence, relax the default max latency so that such devices can enter
      the deepest power state by default.
      
      Here are id-ctrl excerpts from two high latency NVMes:
      
      vid     : 0x14a4
      ssvid   : 0x1b4b
      mn      : CX2-GB1024-Q11 NVMe LITEON 1024GB
      ps    3 : mp:0.1000W non-operational enlat:5000 exlat:5000 rrt:3 rrl:3
                rwt:3 rwl:3 idle_power:- active_power:-
      ps    4 : mp:0.0100W non-operational enlat:50000 exlat:100000 rrt:4 rrl:4
                rwt:4 rwl:4 idle_power:- active_power:-
      
      vid     : 0x15b7
      ssvid   : 0x1b4b
      mn      : A400 NVMe SanDisk 512GB
      ps    3 : mp:0.0500W non-operational enlat:51000 exlat:10000 rrt:0 rrl:0
                rwt:0 rwl:0 idle_power:- active_power:-
      ps    4 : mp:0.0055W non-operational enlat:1000000 exlat:100000 rrt:0 rrl:0
                rwt:0 rwl:0 idle_power:- active_power:-
      
      Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: only consider exit latency when choosing useful non-op power states · da87591b
      Kai-Heng Feng authored
      
      
      When an NVMe device is in a non-operational state, the latency is exlat.
      The latency will be enlat + exlat only when the device tries to
      transition back to an operational state right after it begins to
      transition to a non-operational state, which should be a rare case.

      Therefore, as Andy Lutomirski suggests, use only exlat when deciding
      which power states to transition to.
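The selection rule from this commit and 9947d6a0 can be sketched as below, assuming power states are listed from shallowest to deepest and using a hypothetical struct power_state. This is not the driver's actual APST table construction, which programs a per-state transition table; it only shows which states pass the latency budget.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor mirroring the id-ctrl "ps" fields. */
struct power_state {
	bool non_operational;
	uint32_t enlat_us; /* entry latency */
	uint32_t exlat_us; /* exit latency */
};

/* 100 ms, the relaxed default, expressed in microseconds. */
#define DEFAULT_PS_MAX_LATENCY_US 100000u

/*
 * Return the index of the deepest non-operational state whose exit
 * latency fits the budget, or -1 if none does. Only exlat is compared;
 * enlat is deliberately ignored.
 */
static int deepest_usable_state(const struct power_state *ps, size_t n,
				uint32_t max_latency_us)
{
	for (size_t i = n; i-- > 0; ) {
		if (ps[i].non_operational && ps[i].exlat_us <= max_latency_us)
			return (int)i;
	}
	return -1;
}
```

With the LITEON excerpt from 9947d6a0, ps4 has exlat 100000 us (exactly 100 ms), so it is usable under the new default, whereas enlat + exlat = 150000 us would have ruled it out.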
      
      Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fc: fix missing put reference on controller create failure · 24b7f059
      James Smart authored
      
      
      In the failure case of a create controller request, the code called
      nvme_uninit_ctrl() but didn't do a put to allow the nvme
      controller to be deleted.
      
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fc: on lldd/transport io error, terminate association · f874d5d0
      James Smart authored
      
      
      Per FC-NVME, when the lldd or transport detects an i/o error, the
      connection must be terminated, which in turn requires the association
      to be terminated.  Currently the transport simply creates an nvme
      completion status of transport error and returns the io. The FC-NVME
      spec makes this mandate because the initiator and host, depending on
      the error, can get out of sync on outstanding io counts (sqhd/sqtail).
      
      Implement the association teardown on lldd or transport detected
      errors.
      
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    • nvme-rdma: fast fail incoming requests while we reconnect · e818a5b4
      Sagi Grimberg authored
      
      
      When we encounter transport/controller errors, error recovery
      kicks in, which performs:
      1. stops io/admin queues
      2. moves transport queues out of LIVE state
      3. fast fail pending io
      4. schedule periodic reconnects.

      But we also need to fast fail incoming IO that enters after we have
      already scheduled reconnects. Given that our queue is no longer LIVE,
      simply restart the request queues so that such IO fails in .queue_rq.
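A minimal model of why restarting the queues gives fast failure, assuming hypothetical names and treating "stopped" and "LIVE" as simple flags: while the hw queue is stopped, new requests just wait; once it is restarted they reach .queue_rq, which rejects them because the transport queue is no longer LIVE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes for a submitted request. */
enum submit_result { SUBMIT_QUEUED, SUBMIT_WAITING, SUBMIT_FAILED };

struct rdma_queue {
	bool stopped;   /* hw queue stopped: requests are held, not dispatched */
	bool live;      /* transport queue connected */
	int dispatched; /* requests actually sent to the transport */
};

/* Stand-in for .queue_rq: refuse work on a queue that is not LIVE. */
static enum submit_result queue_rq(struct rdma_queue *q)
{
	if (!q->live)
		return SUBMIT_FAILED;
	q->dispatched++;
	return SUBMIT_QUEUED;
}

static enum submit_result submit(struct rdma_queue *q)
{
	if (q->stopped)
		return SUBMIT_WAITING; /* would sit until the reconnect completes */
	return queue_rq(q);
}

static void error_recovery(struct rdma_queue *q)
{
	q->stopped = true;  /* 1. stop io/admin queues */
	q->live = false;    /* 2. move out of LIVE state */
	/* 3. fast fail pending io and 4. schedule reconnects: not modeled */
	q->stopped = false; /* restart so that new IO reaches queue_rq and fails */
}
```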
      
      Reported-by: Alex Turin <alex@vastdata.com>
      Reported-by: shahar.salzman <shahar.salzman@gmail.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: stable@vger.kernel.org
    • nvme-pci: fix multiple ctrl removal scheduling · 82b057ca
      Rakesh Pandit authored
      Commit c5f6ce97 tries to address multiple resets but fails, as
      work_busy doesn't involve any synchronization and can fail.  This is
      easily reproducible, as can be seen from the WARNING below, which is
      triggered by the line:

      WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING)

      Allowing multiple resets can also result in multiple controller
      removals if different conditions inside nvme_reset_work fail, which
      might deadlock on device_release_driver.
      
      [  480.327007] WARNING: CPU: 3 PID: 150 at drivers/nvme/host/pci.c:1900 nvme_reset_work+0x36c/0xec0
      [  480.327008] Modules linked in: rfcomm fuse nf_conntrack_netbios_ns nf_conntrack_broadcast...
      [  480.327044]  btusb videobuf2_core ghash_clmulni_intel snd_hwdep cfg80211 acer_wmi hci_uart..
      [  480.327065] CPU: 3 PID: 150 Comm: kworker/u16:2 Not tainted 4.12.0-rc1+ #13
      [  480.327065] Hardware name: Acer Predator G9-591/Mustang_SLS, BIOS V1.10 03/03/2016
      [  480.327066] Workqueue: nvme nvme_reset_work
      [  480.327067] task: ffff880498ad8000 task.stack: ffffc90002218000
      [  480.327068] RIP: 0010:nvme_reset_work+0x36c/0xec0
      [  480.327069] RSP: 0018:ffffc9000221bdb8 EFLAGS: 00010246
      [  480.327070] RAX: 0000000000460000 RBX: ffff880498a98128 RCX: dead000000000200
      [  480.327070] RDX: 0000000000000001 RSI: ffff8804b1028020 RDI: ffff880498a98128
      [  480.327071] RBP: ffffc9000221be50 R08: 0000000000000000 R09: 0000000000000000
      [  480.327071] R10: ffffc90001963ce8 R11: 000000000000020d R12: ffff880498a98000
      [  480.327072] R13: ffff880498a53500 R14: ffff880498a98130 R15: ffff880498a98128
      [  480.327072] FS:  0000000000000000(0000) GS:ffff8804c1cc0000(0000) knlGS:0000000000000000
      [  480.327073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  480.327074] CR2: 00007ffcf3c37f78 CR3: 0000000001e09000 CR4: 00000000003406e0
      [  480.327074] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  480.327075] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  480.327075] Call Trace:
      [  480.327079]  ? __switch_to+0x227/0x400
      [  480.327081]  process_one_work+0x18c/0x3a0
      [  480.327082]  worker_thread+0x4e/0x3b0
      [  480.327084]  kthread+0x109/0x140
      [  480.327085]  ? process_one_work+0x3a0/0x3a0
      [  480.327087]  ? kthread_park+0x60/0x60
      [  480.327102]  ret_from_fork+0x2c/0x40
      [  480.327103] Code: e8 5a dc ff ff 85 c0 41 89 c1 0f.....
      
      This patch addresses the problem by using the controller state to
      decide whether a reset should be queued or not, as state changes are
      synchronized using the controller spinlock.  Also, cancel_work_sync is
      used to make sure remove cancels the reset_work and waits for it to
      finish.  This patch also changes the return value from -ENODEV to the
      more appropriate -EBUSY if nvme_reset fails to change state.
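The idea of gating the reset on a state transition can be sketched with a C11 atomic compare-and-exchange standing in for the controller spinlock (an assumption made to keep the example self-contained; the state names are simplified). Only the caller that moves the controller from LIVE to RESETTING queues the reset; any concurrent caller gets -EBUSY. The cancel_work_sync part of the fix is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, reduced controller state machine. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_DELETING };

struct ctrl {
	_Atomic int state;        /* CAS stands in for the controller spinlock */
	atomic_int resets_queued; /* how many resets were actually queued */
};

/* Atomically move from 'from' to 'to'; fails if the state is different. */
static bool change_state(struct ctrl *c, int from, int to)
{
	int expected = from;

	return atomic_compare_exchange_strong(&c->state, &expected, to);
}

/* Queue at most one reset: a second caller loses the state transition. */
static int ctrl_reset(struct ctrl *c)
{
	if (!change_state(c, CTRL_LIVE, CTRL_RESETTING))
		return -EBUSY;
	atomic_fetch_add(&c->resets_queued, 1); /* real driver queues work here */
	return 0;
}
```

Unlike a work_busy() check, the transition itself is the synchronization point, so two racing callers cannot both succeed.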
      
      Fixes: c5f6ce97 ("nvme: don't schedule multiple resets")
      Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: fix hang in remove path · 82654b6b
      Ming Lei authored
      We need to start the admin queues too in nvme_kill_queues()
      to avoid a hang in the remove path[1].

      This patch is very similar to 806f026f ("nvme: use
      blk_mq_start_hw_queues() in nvme_kill_queues()").
      
      [1] hang stack trace
      [<ffffffff813c9716>] blk_execute_rq+0x56/0x80
      [<ffffffff815cb6e9>] __nvme_submit_sync_cmd+0x89/0xf0
      [<ffffffff815ce7be>] nvme_set_features+0x5e/0x90
      [<ffffffff815ce9f6>] nvme_configure_apst+0x166/0x200
      [<ffffffff815cef45>] nvme_set_latency_tolerance+0x35/0x50
      [<ffffffff8157bd11>] apply_constraint+0xb1/0xc0
      [<ffffffff8157cbb4>] dev_pm_qos_constraints_destroy+0xf4/0x1f0
      [<ffffffff8157b44a>] dpm_sysfs_remove+0x2a/0x60
      [<ffffffff8156d951>] device_del+0x101/0x320
      [<ffffffff8156db8a>] device_unregister+0x1a/0x60
      [<ffffffff8156dc4c>] device_destroy+0x3c/0x50
      [<ffffffff815cd295>] nvme_uninit_ctrl+0x45/0xa0
      [<ffffffff815d4858>] nvme_remove+0x78/0x110
      [<ffffffff81452b69>] pci_device_remove+0x39/0xb0
      [<ffffffff81572935>] device_release_driver_internal+0x155/0x210
      [<ffffffff81572a02>] device_release_driver+0x12/0x20
      [<ffffffff815d36fb>] nvme_remove_dead_ctrl_work+0x6b/0x70
      [<ffffffff810bf3bc>] process_one_work+0x18c/0x3a0
      [<ffffffff810bf61e>] worker_thread+0x4e/0x3b0
      [<ffffffff810c5ac9>] kthread+0x109/0x140
      [<ffffffff8185800c>] ret_from_fork+0x2c/0x40
      [<ffffffffffffffff>] 0xffffffffffffffff
      
      Fixes: c5552fde ("nvme: Enable autonomous power state transitions")
      Reported-by: Rakesh Pandit <rakesh@tuxera.com>
      Tested-by: Rakesh Pandit <rakesh@tuxera.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • elevator: fix truncation of icq_cache_name · 9bd2bbc0
      Eric Biggers authored
      
      
      gcc 7.1 reports the following warning:
      
          block/elevator.c: In function ‘elv_register’:
          block/elevator.c:898:5: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
               "%s_io_cq", e->elevator_name);
               ^~~~~~~~~~
          block/elevator.c:897:3: note: ‘snprintf’ output between 7 and 22 bytes into a destination of size 21
             snprintf(e->icq_cache_name, sizeof(e->icq_cache_name),
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
               "%s_io_cq", e->elevator_name);
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      The bug is that the name of the icq_cache is 6 characters longer than
      the elevator name, but only ELV_NAME_MAX + 5 characters were reserved
      for it --- so in the case of a maximum-length elevator name, the 'q'
      character in "_io_cq" would be truncated by snprintf().  Fix it by
      reserving ELV_NAME_MAX + 6 characters instead.
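The arithmetic can be checked in userspace. This sketch assumes ELV_NAME_MAX is 16, which matches the 21-byte destination (ELV_NAME_MAX + 5) in the warning; format_icq_name is a hypothetical helper that reports truncation through snprintf's return value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ELV_NAME_MAX 16 /* implied by the warning: 16 + 5 = 21 bytes */

/*
 * "%s_io_cq" appends 6 characters, so the longest name
 * (ELV_NAME_MAX - 1 = 15 characters) needs 15 + 6 + 1 = 22 bytes,
 * i.e. ELV_NAME_MAX + 6. Returns nonzero if the output was truncated.
 */
static int format_icq_name(char *buf, size_t size, const char *elv_name)
{
	int n = snprintf(buf, size, "%s_io_cq", elv_name);

	return n < 0 || (size_t)n >= size;
}
```

For a 15-character name, a 21-byte buffer holds only 20 characters plus the NUL, dropping the final 'q'; ELV_NAME_MAX + 6 = 22 bytes fits all 21 characters.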
      
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: fix direct issue · d964f04a
      Ming Lei authored
      If the queue is stopped, we shouldn't dispatch requests to the driver
      and hardware; unfortunately, the check was removed in bd166ef1
      ("blk-mq-sched: add framework for MQ capable IO schedulers").
      
      This patch fixes the issue by moving the check back into
      __blk_mq_try_issue_directly().
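A minimal model of the restored check, assuming a hypothetical struct hw_queue with simple counters: when the hardware queue is stopped, direct issue falls back to inserting the request instead of handing it to the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, minimal hardware-queue model. */
struct hw_queue {
	bool stopped;
	int issued_to_driver; /* requests handed straight to the driver */
	int inserted;         /* requests parked for dispatch after restart */
};

/* Fallback path: park the request until the queue is restarted. */
static void insert_request(struct hw_queue *hq)
{
	hq->inserted++;
}

/*
 * Direct issue must honour a stopped queue: during reset or removal the
 * driver may be cancelling requests, so it must not receive new ones.
 */
static void try_issue_directly(struct hw_queue *hq)
{
	if (hq->stopped) {
		insert_request(hq);
		return;
	}
	hq->issued_to_driver++;
}
```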
      
      This patch fixes a request use-after-free[1][2] while canceling NVMe
      requests in nvme_dev_disable(), which can be triggered easily during
      NVMe reset & remove tests.
      
      [1] oops kernel log when CONFIG_BLK_DEV_INTEGRITY is on
      [  103.412969] BUG: unable to handle kernel NULL pointer dereference at 000000000000000a
      [  103.412980] IP: bio_integrity_advance+0x48/0xf0
      [  103.412981] PGD 275a88067
      [  103.412981] P4D 275a88067
      [  103.412982] PUD 276c43067
      [  103.412983] PMD 0
      [  103.412984]
      [  103.412986] Oops: 0000 [#1] SMP
      [  103.412989] Modules linked in: vfat fat intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd cryptd ipmi_ssif iTCO_wdt iTCO_vendor_support mxm_wmi glue_helper dcdbas ipmi_si mei_me pcspkr mei sg ipmi_devintf lpc_ich ipmi_msghandler shpchp acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel nvme ahci nvme_core libahci libata tg3 i2c_core megaraid_sas ptp pps_core dm_mirror dm_region_hash dm_log dm_mod
      [  103.413035] CPU: 0 PID: 102 Comm: kworker/0:2 Not tainted 4.11.0+ #1
      [  103.413036] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
      [  103.413041] Workqueue: events nvme_remove_dead_ctrl_work [nvme]
      [  103.413043] task: ffff9cc8775c8000 task.stack: ffffc033c252c000
      [  103.413045] RIP: 0010:bio_integrity_advance+0x48/0xf0
      [  103.413046] RSP: 0018:ffffc033c252fc10 EFLAGS: 00010202
      [  103.413048] RAX: 0000000000000000 RBX: ffff9cc8720a8cc0 RCX: ffff9cca72958240
      [  103.413049] RDX: ffff9cca72958000 RSI: 0000000000000008 RDI: ffff9cc872537f00
      [  103.413049] RBP: ffffc033c252fc28 R08: 0000000000000000 R09: ffffffffb963a0d5
      [  103.413050] R10: 000000000000063e R11: 0000000000000000 R12: ffff9cc8720a8d18
      [  103.413051] R13: 0000000000001000 R14: ffff9cc872682e00 R15: 00000000fffffffb
      [  103.413053] FS:  0000000000000000(0000) GS:ffff9cc877c00000(0000) knlGS:0000000000000000
      [  103.413054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  103.413055] CR2: 000000000000000a CR3: 0000000276c41000 CR4: 00000000001406f0
      [  103.413056] Call Trace:
      [  103.413063]  bio_advance+0x2a/0xe0
      [  103.413067]  blk_update_request+0x76/0x330
      [  103.413072]  blk_mq_end_request+0x1a/0x70
      [  103.413074]  blk_mq_dispatch_rq_list+0x370/0x410
      [  103.413076]  ? blk_mq_flush_busy_ctxs+0x94/0xe0
      [  103.413080]  blk_mq_sched_dispatch_requests+0x173/0x1a0
      [  103.413083]  __blk_mq_run_hw_queue+0x8e/0xa0
      [  103.413085]  __blk_mq_delay_run_hw_queue+0x9d/0xa0
      [  103.413088]  blk_mq_start_hw_queue+0x17/0x20
      [  103.413090]  blk_mq_start_hw_queues+0x32/0x50
      [  103.413095]  nvme_kill_queues+0x54/0x80 [nvme_core]
      [  103.413097]  nvme_remove_dead_ctrl_work+0x1f/0x40 [nvme]
      [  103.413103]  process_one_work+0x149/0x360
      [  103.413105]  worker_thread+0x4d/0x3c0
      [  103.413109]  kthread+0x109/0x140
      [  103.413111]  ? rescuer_thread+0x380/0x380
      [  103.413113]  ? kthread_park+0x60/0x60
      [  103.413120]  ret_from_fork+0x2c/0x40
      [  103.413121] Code: 08 4c 8b 63 50 48 8b 80 80 00 00 00 48 8b 90 d0 03 00 00 31 c0 48 83 ba 40 02 00 00 00 48 8d 8a 40 02 00 00 48 0f 45 c1 c1 ee 09 <0f> b6 48 0a 0f b6 40 09 41 89 f5 83 e9 09 41 d3 ed 44 0f af e8
      [  103.413145] RIP: bio_integrity_advance+0x48/0xf0 RSP: ffffc033c252fc10
      [  103.413146] CR2: 000000000000000a
      [  103.413157] ---[ end trace cd6875d16eb5a11e ]---
      [  103.455368] Kernel panic - not syncing: Fatal exception
      [  103.459826] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [  103.850916] ---[ end Kernel panic - not syncing: Fatal exception
      [  103.857637] sched: Unexpected reschedule of offline CPU#1!
      [  103.863762] ------------[ cut here ]------------
      
      [2] kernel hang in blk_mq_freeze_queue_wait() when CONFIG_BLK_DEV_INTEGRITY is off
      [  247.129825] INFO: task nvme-test:1772 blocked for more than 120 seconds.
      [  247.137311]       Not tainted 4.12.0-rc2.upstream+ #4
      [  247.142954] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  247.151704] Call Trace:
      [  247.154445]  __schedule+0x28a/0x880
      [  247.158341]  schedule+0x36/0x80
      [  247.161850]  blk_mq_freeze_queue_wait+0x4b/0xb0
      [  247.166913]  ? remove_wait_queue+0x60/0x60
      [  247.171485]  blk_freeze_queue+0x1a/0x20
      [  247.175770]  blk_cleanup_queue+0x7f/0x140
      [  247.180252]  nvme_ns_remove+0xa3/0xb0 [nvme_core]
      [  247.185503]  nvme_remove_namespaces+0x32/0x50 [nvme_core]
      [  247.191532]  nvme_uninit_ctrl+0x2d/0xa0 [nvme_core]
      [  247.196977]  nvme_remove+0x70/0x110 [nvme]
      [  247.201545]  pci_device_remove+0x39/0xc0
      [  247.205927]  device_release_driver_internal+0x141/0x200
      [  247.211761]  device_release_driver+0x12/0x20
      [  247.216531]  pci_stop_bus_device+0x8c/0xa0
      [  247.221104]  pci_stop_and_remove_bus_device_locked+0x1a/0x30
      [  247.227420]  remove_store+0x7c/0x90
      [  247.231320]  dev_attr_store+0x18/0x30
      [  247.235409]  sysfs_kf_write+0x3a/0x50
      [  247.239497]  kernfs_fop_write+0xff/0x180
      [  247.243867]  __vfs_write+0x37/0x160
      [  247.247757]  ? selinux_file_permission+0xe5/0x120
      [  247.253011]  ? security_file_permission+0x3b/0xc0
      [  247.258260]  vfs_write+0xb2/0x1b0
      [  247.261964]  ? syscall_trace_enter+0x1d0/0x2b0
      [  247.266924]  SyS_write+0x55/0xc0
      [  247.270540]  do_syscall_64+0x67/0x150
      [  247.274636]  entry_SYSCALL64_slow_path+0x25/0x25
      [  247.279794] RIP: 0033:0x7f5c96740840
      [  247.283785] RSP: 002b:00007ffd00e87ee8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  247.292238] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5c96740840
      [  247.300194] RDX: 0000000000000002 RSI: 00007f5c97060000 RDI: 0000000000000001
      [  247.308159] RBP: 00007f5c97060000 R08: 000000000000000a R09: 00007f5c97059740
      [  247.316123] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f5c96a14400
      [  247.324087] R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000000
      [  370.016340] INFO: task nvme-test:1772 blocked for more than 120 seconds.
      
      Fixes: 12d70958 ("blk-mq: don't fail allocating driver tag for stopped hw queue")
      Cc: stable@vger.kernel.org
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: pass correct hctx to blk_mq_try_issue_directly · dad7a3be
      Ming Lei authored
      
      
      When direct issue is done on a request picked up from the plug list,
      the hctx needs to be updated with the actual hw queue; otherwise the
      wrong hctx is used, which may hurt performance, especially when the
      wrong SRCU read lock is acquired/released.
      
      Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  3. Jun 03, 2017
    • bio-integrity: Do not allocate integrity context for bio w/o data · 3116a23b
      Dmitry Monakhov authored
      
      
      If a bio has no data, such as those from blkdev_issue_flush(),
      then there is nothing to protect.
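A minimal sketch of the rule, assuming a hypothetical cut-down struct bio in which a zero payload size means "no data" (the kernel's own data check also considers the operation type): a data-less bio, such as an empty flush, gets no integrity context at all, so there is nothing to free later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, minimal bio model; the kernel's struct bio is far richer. */
struct bio {
	unsigned int size; /* payload bytes; 0 for e.g. an empty flush */
	void *integrity;   /* protection-information context, if any */
};

static bool bio_carries_data(const struct bio *bio)
{
	return bio->size != 0;
}

/*
 * Allocate an integrity context only when there is payload to protect.
 * Returns false on allocation failure, true otherwise.
 */
static bool integrity_prep(struct bio *bio)
{
	if (!bio_carries_data(bio))
		return true; /* nothing to protect, nothing to free later */
	bio->integrity = calloc(1, 64); /* placeholder context size */
	return bio->integrity != NULL;
}
```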
      
      This patch prevents a kernel BUG like the following:
      
      kfree_debugcheck: out of range ptr ac1fa1d106742a5ah
      kernel BUG at mm/slab.c:2773!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: bcache
      CPU: 0 PID: 4428 Comm: xfs_io Tainted: G        W       4.11.0-rc4-ext4-00041-g2ef0043-dirty #43
      Hardware name: Virtuozzo KVM, BIOS seabios-1.7.5-11.vz7.4 04/01/2014
      task: ffff880137786440 task.stack: ffffc90000ba8000
      RIP: 0010:kfree_debugcheck+0x25/0x2a
      RSP: 0018:ffffc90000babde0 EFLAGS: 00010082
      RAX: 0000000000000034 RBX: ac1fa1d106742a5a RCX: 0000000000000007
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88013f3ccb40
      RBP: ffffc90000babde8 R08: 0000000000000000 R09: 0000000000000000
      R10: 00000000fcb76420 R11: 00000000725172ed R12: 0000000000000282
      R13: ffffffff8150e766 R14: ffff88013a145e00 R15: 0000000000000001
      FS:  00007fb09384bf40(0000) GS:ffff88013f200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fd0172f9e40 CR3: 0000000137fa9000 CR4: 00000000000006f0
      Call Trace:
       kfree+0xc8/0x1b3
       bio_integrity_free+0xc3/0x16b
       bio_free+0x25/0x66
       bio_put+0x14/0x26
       blkdev_issue_flush+0x7a/0x85
       blkdev_fsync+0x35/0x42
       vfs_fsync_range+0x8e/0x9f
       vfs_fsync+0x1c/0x1e
       do_fsync+0x31/0x4a
       SyS_fsync+0x10/0x14
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • Merge tag 'xfs-4.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · e6e6d074
      Linus Torvalds authored
      Pull XFS fix from Darrick Wong:
       "I've one more bugfix for you for 4.12-rc4: Fix an unmount hang due to
        a race in io buffer accounting"
      
      * tag 'xfs-4.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: use ->b_state to fix buffer I/O accounting release race
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · b939c514
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
       "ACPI-related fixes for arm64:
      
         - GICC MADT entry validity check fix
      
         - Skip IRQ registration with pmu=off in an ACPI guest
      
         - struct acpi_pci_root_ops freeing on error path"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation
        drivers/perf: arm_pmu_acpi: avoid perf IRQ init when guest PMU is off
        ARM64: PCI: Fix struct acpi_pci_root_ops allocation failure path
    • Merge tag 'ceph-for-4.12-rc4' of git://github.com/ceph/ceph-client · 65d03328
      Linus Torvalds authored
      Pull ceph fix from Ilya Dryomov:
       "A small fix for rbd FALLOC_FL_ZERO_RANGE/PUNCH_HOLE handling breakage
        introduced in -rc1"
      
      * tag 'ceph-for-4.12-rc4' of git://github.com/ceph/ceph-client:
        rbd: implement REQ_OP_WRITE_ZEROES
    • Merge tag 'for-4.12/dm-fixes-3' of... · 60c42a31
      Linus Torvalds authored
      Merge tag 'for-4.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fixes from Mike Snitzer:
      
       - a DM verity fix for a mode when no salt is used
      
       - a fix to DM to account for the possibility that PREFLUSH or FUA are
         used without the SYNC flag if the underlying storage doesn't have a
         volatile write-cache
      
       - a DM ioctl memory allocation flag fix to use __GFP_HIGH to allow
         emergency forward progress (by using memory reserves as last resort)
      
       - a small DM integrity cleanup to use kvmalloc() instead of duplicating
         the same
      
      * tag 'for-4.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm: make flush bios explicitly sync
        dm ioctl: restore __GFP_HIGH in copy_params()
        dm integrity: use kvmalloc() instead of dm_integrity_kvmalloc()
        dm verity: fix no salt use case
    • Merge tag 'md/4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md · 6f37fa43
      Linus Torvalds authored
      Pull MD fixes from Shaohua Li:
       "Several patches for MD. One notable is making flush bios sync, others
        fix small issues"
      
      * tag 'md/4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
        md: Make flush bios explicitely sync
        md: report sector of stripes with check mismatches
        md: uuid debug statement now in processor byte order.
        md-cluster: fix potential lock issue in add_new_disk
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · bb329859
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A set of fixes that should go into the next -rc. This contains:
      
         - A use-after-free in the request_list exit for the legacy IO path,
           from Bart.
      
         - A fix for CFQ, fixing a recent regression with the conversion to
           higher resolution timing for iops mode. From Hou Tao.
      
         - A single fix for nbd, split in two patches, fixing a leak of a data
           structure.
      
         - A regression fix from Keith, ensuring that callers of
           blk_mq_update_nr_hw_queues() hold the right lock"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        block: Avoid that blk_exit_rl() triggers a use-after-free
        cfq-iosched: fix the delay of cfq_group's vdisktime under iops mode
        blk-mq: Take tagset lock when updating hw queues
        nbd: don't leak nbd_config
        nbd: nbd_reset() call in nbd_dev_add() is redundant
    • Merge tag 'drm-dp-quirk-for-v4.12-rc4' of git://people.freedesktop.org/~airlied/linux · 46356945
      Linus Torvalds authored
      Pull drm displayport quirk support:
       "DP quirk for USB-C dongles.
      
        As mentioned, this is a separate request that fixes a regression
        while also keeping the broken hardware working. Certain USB-C DP
        adapters require minimised N/M parameters, but an attempt to apply
        this generically failed because it regressed some eDP panels, so we
        need to quirk these specific adapters.
      
        This pull adds the infrastructure and a quirk for the adapter"
      
      * tag 'drm-dp-quirk-for-v4.12-rc4' of git://people.freedesktop.org/~airlied/linux:
        drm/i915: Detect USB-C specific dongles before reducing M and N
        drm/dp: start a DPCD based DP sink/branch device quirk database
        drm/i915: use drm DP helper to read DPCD desc
        drm/dp: add helper for reading DP sink/branch device desc from DPCD
    • Merge tag 'sound-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · c531577b
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This contains fixes for a few reported regressions in HD-audio and
        USB-audio. All small, trivial, and boring"
      
      * tag 'sound-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda - Fix applying MSI dual-codec mobo quirk
        ALSA: usb: Avoid VLA in mixer_us16x08.c
        ALSA: usb: Fix a typo in Tascam US-16x08 mixer element
        Revert "ALSA: usb-audio: purge needless variable length array"
    • Merge tag 'dmaengine-fix-4.12-rc4' of git://git.infradead.org/users/vkoul/slave-dma · f8e72db3
      Linus Torvalds authored
      Pull dmaengine fixes from Vinod Koul:
       "Here is the dmaengine fixes request for 4.12. It fixes a bunch of
        issues in the drivers, nothing exciting though.
      
         - mv_xor_v2 driver fixes for handling descriptors, tx_submit
           implementation, removing interrupt coalescing and setting DMA mask
           properly
      
         - fix usb-dmac DMAOR AE bit definition
      
         - fix ep93xx start buffer from BASE0 and not drain the transfers in
           terminate_all
      
         - fix rcar-dmac to use right descriptor pointer for residue
           calculation
      
         - pl330 fix warn for irq freeup"
      
      * tag 'dmaengine-fix-4.12-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: pl330: fix warning in pl330_remove
        rcar-dmac: fixup descriptor pointer for descriptor mode
        dmaengine: ep93xx: Don't drain the transfers in terminate_all()
        dmaengine: ep93xx: Always start from BASE0
        dmaengine: usb-dmac: Fix DMAOR AE bit definition
        dmaengine: mv_xor_v2: set DMA mask to 40 bits
        dmaengine: mv_xor_v2: remove interrupt coalescing
        dmaengine: mv_xor_v2: fix tx_submit() implementation
        dmaengine: mv_xor_v2: enable XOR engine after its configuration
        dmaengine: mv_xor_v2: do not use descriptors not acked by async_tx
        dmaengine: mv_xor_v2: properly handle wrapping in the array of HW descriptors
        dmaengine: mv_xor_v2: handle mv_xor_v2_prep_sw_desc() error properly
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 6df62e79
      Linus Torvalds authored
      Pull HID fixes from Jiri Kosina:
      
       - corner-case oops fixes for Asus and Wacom drivers from Carlo Caione
         and Jason Gerecke
      
       - power management fix (reported on SIS0817 touchscreen) for i2c-hid
         devices from Hans de Goede
      
       - device-id-specific fixes and quirks from Hans de Goede, Diego Elio
         Pettenò and Che-Liang Chiou
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
        HID: asus: Stop underlying hardware on remove
        HID: i2c: Call acpi_device_fix_up_power for ACPI-enumerated devices
        HID: asus: Add support for T100 keyboard
        HID: elecom: extend to fix the descriptor for DEFT trackballs
        HID: magicmouse: Set multi-touch keybits for Magic Mouse
        HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference
  4. Jun 02, 2017
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching · 035f1456
      Linus Torvalds authored
      Pull livepatching fix from Jiri Kosina:
       "Kconfig dependency fix for livepatching infrastructure from Miroslav
        Benes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
        livepatch: Make livepatch dependent on !TRIM_UNUSED_KSYMS
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f2a025de
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Misc fixes:
      
         - revert a broken PAT commit that broke a number of systems
      
         - fix two preemptibility warnings/bugs that can trigger under certain
           circumstances, in the debug code and in the microcode loader"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        Revert "x86/PAT: Fix Xorg regression on CPUs that don't support PAT"
        x86/debug/32: Convert a smp_processor_id() call to raw to avoid DEBUG_PREEMPT warning
        x86/microcode/AMD: Change load_microcode_amd()'s param to bool to fix preemptibility bug
    • Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f56f88ee
      Linus Torvalds authored
      Pull EFI fixes from Ingo Molnar:
       "Misc fixes:
      
         - three boot crash fixes for uncommon configurations
      
         - silence a boot warning under virtualization
      
         - plus a GCC 7 related (harmless) build warning fix"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/bgrt: Skip efi_bgrt_init() in case of non-EFI boot
        x86/efi: Correct EFI identity mapping under 'efi=old_map' when KASLR is enabled
        x86/efi: Disable runtime services on kexec kernel if booted with efi=old_map
        efi: Remove duplicate 'const' specifiers
        efi: Don't issue error message when booted under Xen
    • ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation · cb7cf772
      Lorenzo Pieralisi authored
      The BAD_MADT_GICC_ENTRY() macro checks whether a GICC MADT entry passes
      muster from an ACPI specification standpoint. The current macro derives
      the MADT GICC entry length from the ACPI firmware version (the length
      changed from 76 to 80 bytes in the transition from the ACPI 5.1 to the
      ACPI 6.0 specification), but it erroneously uses the length of the
      latest ACPICA struct (i.e. struct acpi_madt_generic_interrupt, which is
      80 bytes long) to check whether the current GICC entry extends past the
      end of the MADT in memory, as defined by the MADT header itself.
      Depending on the ACPI firmware version and on how the MADT entries are
      laid out in memory, this can produce false negatives: on ACPI 5.1
      firmware, MADT GICC entries are 76 bytes long, so adding 80 to the start
      address of a GICC entry near the end of the table may well yield an
      address past the actual MADT end, triggering a false negative.
      
      Fix the BAD_MADT_GICC_ENTRY() macro by reshuffling the condition checks
      and updating them to always use the firmware-version-specific MADT GICC
      entry length when carrying out boundary checks.
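      As a rough illustration of the corrected check, the following is a
      standalone C model, not the kernel macro itself. The names
      madt_subtable_header, gicc_expected_len() and gicc_entry_is_bad() are
      hypothetical, and the firmware version is assumed to be passed in as a
      plain major revision number. Only the 76/80-byte lengths come from the
      text above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ACPI subtable header (type + length). */
struct madt_subtable_header {
	uint8_t type;
	uint8_t length;
};

/* Assumed mapping: firmware major revision below 6 (ACPI 5.x) uses
 * 76-byte GICC entries; ACPI 6.0 and later use 80-byte entries. */
static size_t gicc_expected_len(unsigned int acpi_major_rev)
{
	return acpi_major_rev < 6 ? 76 : 80;
}

/* Returns true if the entry is malformed: missing, of the wrong length
 * for this firmware revision, or extending past the end of the table.
 * The boundary check uses the firmware-specific length rather than the
 * size of the newest (80-byte) structure definition. */
static bool gicc_entry_is_bad(const struct madt_subtable_header *entry,
			      uintptr_t table_end, unsigned int acpi_major_rev)
{
	size_t len = gicc_expected_len(acpi_major_rev);

	if (entry == NULL)
		return true;
	if (entry->length != len)
		return true;
	return (uintptr_t)entry + len > table_end;
}
```

      With a 76-byte entry that ends exactly at the table end, this model
      accepts the entry on a revision-5 firmware, whereas a check that always
      added 80 bytes would have rejected it.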
      
      Fixes: b6cfb277 ("ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro")
      Reported-by: Julien Grall <julien.grall@arm.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Julien Grall <julien.grall@arm.com>
      Cc: Hanjun Guo <hanjun.guo@linaro.org>
      Cc: Al Stone <ahs3@redhat.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • HID: asus: Stop underlying hardware on remove · 715e944f
      Carlo Caione authored
      
      
      We are missing a call to hid_hw_stop() on the remove hook.
      Among other things this is causing an Oops when (re-)starting GNOME /
      upowerd / ... after the module has been already rmmod-ed.
      
      Signed-off-by: Carlo Caione <carlo@endlessm.com>
      Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    • dmaengine: pl330: fix warning in pl330_remove · ebcdaee4
      Jean-Philippe Brucker authored
      
      
      When removing a device with fewer than 9 IRQs (AMBA_NR_IRQS), we get a
      big WARN_ON from devres.c because pl330_remove() calls devm_free_irq()
      for IRQs that were never allocated. As pl330_probe() already does, check
      that an IRQ number is present before calling devm_free_irq().
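      A minimal userspace model of the guarded remove loop might look like
      the sketch below. Everything in it is hypothetical: struct fake_dev,
      fake_request_irqs(), fake_free_irq() and the remove helpers are not
      kernel APIs, a zero entry is assumed to mean "no IRQ in this slot", and
      the devres.c WARN_ON is modelled as a simple counter. Both the request
      and the free paths skip empty slots, so they stay symmetric.

```c
#include <stdbool.h>

#define NR_IRQ_SLOTS 9 /* plays the role of AMBA_NR_IRQS */

/* Hypothetical device model: irq[i] == 0 means "no IRQ in this slot". */
struct fake_dev {
	unsigned int irq[NR_IRQ_SLOTS];
	bool requested[NR_IRQ_SLOTS];
	int warnings; /* counts frees of IRQs that were never requested */
};

/* Probe side: request only the slots that actually hold an IRQ. */
static void fake_request_irqs(struct fake_dev *d)
{
	for (int i = 0; i < NR_IRQ_SLOTS; i++)
		if (d->irq[i])
			d->requested[i] = true;
}

/* Stand-in for devm_free_irq(): complains about unallocated IRQs. */
static void fake_free_irq(struct fake_dev *d, int slot)
{
	if (!d->requested[slot]) {
		d->warnings++; /* models the devres.c WARN_ON */
		return;
	}
	d->requested[slot] = false;
}

/* Buggy remove: frees every slot, whether an IRQ is present or not. */
static void fake_remove_unchecked(struct fake_dev *d)
{
	for (int i = 0; i < NR_IRQ_SLOTS; i++)
		fake_free_irq(d, i);
}

/* Fixed remove: applies the same presence check as the probe side. */
static void fake_remove(struct fake_dev *d)
{
	for (int i = 0; i < NR_IRQ_SLOTS; i++)
		if (d->irq[i])
			fake_free_irq(d, i);
}
```

      For a device with three IRQs, the unchecked loop in this model records
      six warnings (one per empty slot), while the guarded loop records none.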
      
      Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
    • Merge tag 'topic/dp-quirks-2017-05-31' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes · 28904eec
      Dave Airlie authored
      
      DP sink specific quirks
      
      * tag 'topic/dp-quirks-2017-05-31' of git://anongit.freedesktop.org/git/drm-intel:
        drm/i915: Detect USB-C specific dongles before reducing M and N
        drm/dp: start a DPCD based DP sink/branch device quirk database
        drm/i915: use drm DP helper to read DPCD desc
        drm/dp: add helper for reading DP sink/branch device desc from DPCD
    • Merge tag 'nfsd-4.12-1' of git://linux-nfs.org/~bfields/linux · 3b1e342b
      Linus Torvalds authored
      Pull nfsd fixes from Bruce Fields:
       "Revert patch accidentally included in the merge window pull request,
        and fix a crash that was likely a result of buggy client behavior"
      
      * tag 'nfsd-4.12-1' of git://linux-nfs.org/~bfields/linux:
        nfsd4: fix null dereference on replay
        nfsd: Revert "nfsd: check for oversized NFSv2/v3 arguments"
    • Merge tag 'gcc-plugins-v4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 2f48641c
      Linus Torvalds authored
      Pull gcc-plugin prepwork from Kees Cook:
       "Use designated initializers for mtk-vcodec, powerplay, amdgpu, and
        sgi-xp. Use ERR_CAST() to avoid cross-structure casts in ocfs2, ntfs,
        and NFS.
      
        Christoph Hellwig recommended that I send these fixes now, rather than
        waiting for the v4.13 merge window. These are all initializer and cast
        fixes needed for the future randstruct plugin that haven't been picked
        up by the respective maintainers"
      
      * tag 'gcc-plugins-v4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        mtk-vcodec: Use designated initializers
        drm/amd/powerplay: Use designated initializers
        drm/amdgpu: Use designated initializers
        sgi-xp: Use designated initializers
        ocfs2: Use ERR_CAST() to avoid cross-structure cast
        ntfs: Use ERR_CAST() to avoid cross-structure cast
        NFS: Use ERR_CAST() to avoid cross-structure cast
    • block: Avoid that blk_exit_rl() triggers a use-after-free · b425e504
      Bart Van Assche authored
      Since the introduction of .init_rq_fn() and .exit_rq_fn() it is
      essential that the memory allocated for struct request_queue
      stays around until all blk_exit_rl() calls have finished. Hence
      make blk_init_rl() take a reference on struct request_queue.
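      The lifetime rule described above is ordinary reference counting. The
      sketch below models it in plain C; struct queue, struct request_list,
      queue_get()/queue_put() and rl_init()/rl_exit() are hypothetical
      stand-ins, not the block layer's actual types or functions. It only
      illustrates that the init step pins the queue and the exit step
      releases it, so the queue cannot be freed while an exit call may still
      use it.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical queue whose lifetime is governed by a reference count. */
struct queue {
	int refcount;
	bool *freed_flag; /* lets the caller observe when the queue is freed */
};

/* Hypothetical request list that still needs its queue at exit time. */
struct request_list {
	struct queue *q;
};

static struct queue *queue_alloc(bool *freed_flag)
{
	struct queue *q = malloc(sizeof(*q));

	if (!q)
		return NULL;
	q->refcount = 1; /* reference held by the owner */
	q->freed_flag = freed_flag;
	*freed_flag = false;
	return q;
}

static void queue_get(struct queue *q)
{
	q->refcount++;
}

/* Drops one reference; frees the queue when the last one goes away. */
static void queue_put(struct queue *q)
{
	if (--q->refcount == 0) {
		*q->freed_flag = true;
		free(q);
	}
}

/* Init takes a reference, so the queue outlives every request list. */
static void rl_init(struct request_list *rl, struct queue *q)
{
	rl->q = q;
	queue_get(q);
}

/* Exit may still use rl->q (it is guaranteed to be alive here), then
 * drops the reference taken in rl_init(). */
static void rl_exit(struct request_list *rl)
{
	queue_put(rl->q);
	rl->q = NULL;
}
```

      Even if the owner drops its reference first, the queue in this model
      is only freed once the last rl_exit() has run, which is the ordering
      the commit needs for blk_exit_rl().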
      
      This patch fixes the following crash:
      
      general protection fault: 0000 [#2] SMP
      CPU: 3 PID: 28 Comm: ksoftirqd/3 Tainted: G      D         4.12.0-rc2-dbg+ #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
      task: ffff88013a108040 task.stack: ffffc9000071c000
      RIP: 0010:free_request_size+0x1a/0x30
      RSP: 0018:ffffc9000071fd38 EFLAGS: 00010202
      RAX: 6b6b6b6b6b6b6b6b RBX: ffff880067362a88 RCX: 0000000000000003
      RDX: ffff880067464178 RSI: ffff880067362a88 RDI: ffff880135ea4418
      RBP: ffffc9000071fd40 R08: 0000000000000000 R09: 0000000100180009
      R10: ffffc9000071fd38 R11: ffffffff81110800 R12: ffff88006752d3d8
      R13: ffff88006752d3d8 R14: ffff88013a108040 R15: 000000000000000a
      FS:  0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fa8ec1edb00 CR3: 0000000138ee8000 CR4: 00000000001406e0
      Call Trace:
       mempool_destroy.part.10+0x21/0x40
       mempool_destroy+0xe/0x10
       blk_exit_rl+0x12/0x20
       blkg_free+0x4d/0xa0
       __blkg_release_rcu+0x59/0x170
       rcu_process_callbacks+0x260/0x4e0
       __do_softirq+0x116/0x250
       smpboot_thread_fn+0x123/0x1e0
       kthread+0x109/0x140
       ret_from_fork+0x31/0x40
      
      Fixes: commit e9c787e6 ("scsi: allocate scsi_cmnd structures as part of struct request")
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: <stable@vger.kernel.org> # v4.11+
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 9ea15a59
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "Many small x86 bug fixes: SVM segment registers access rights, nested
        VMX, preempt notifiers, LAPIC virtual wire mode, NMI injection"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: Fix nmi injection failure when vcpu got blocked
        KVM: SVM: do not zero out segment attributes if segment is unusable or not present
        KVM: SVM: ignore type when setting segment registers
        KVM: nVMX: fix nested_vmx_check_vmptr failure paths under debugging
        KVM: x86: Fix virtual wire mode
        KVM: nVMX: Fix handling of lmsw instruction
        KVM: X86: Fix preempt the preemption timer cancel
    • Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 0bb23039
      Linus Torvalds authored
      Pull Reiserfs and GFS2 fixes from Jan Kara:
       "Fixes to GFS2 & Reiserfs for the fallout of the recent WRITE_FUA
        cleanup from Christoph.
      
        Fixes for other filesystems were already merged by respective
        maintainers."
      
      * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        reiserfs: Make flush bios explicitely sync
        gfs2: Make flush bios explicitely sync
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · 393bcfae
      Linus Torvalds authored
      Pull SCSI target fixes from Nicholas Bellinger:
       "Here are the target-pending fixes for v4.12-rc4:
      
         - ibmviscsis ABORT_TASK handling fixes that missed the v4.12 merge
           window. (Bryant Ly and Michael Cyr)
      
         - Re-add a target-core check enforcing WRITE overflow reject that was
           relaxed in v4.3, to avoid unsupported iscsi-target immediate data
           overflow. (nab)
      
         - Fix a target-core-user OOPs during device removal. (MNC + Bryant
           Ly)
      
         - Fix a long-standing potential iscsi-target issue where kthread exit
           did not wait for kthread_should_stop(). (Jiang Yi)
      
         - Fix an iscsi-target v3.12.y regression OOPs involving initial login
           PDU processing during asynchronous TCP connection close. (MNC +
           nab)
      
        This is a little larger than usual for an -rc4, primarily due to the
        iscsi-target v3.12.y regression OOPs bug-fix.
      
        However, it's an important patch, as MNC + Hannes were both able to
        trigger it using a reduced iscsi initiator login timeout combined with
        a backend taking a long time to complete I/Os during iscsi login
        driven session reinstatement"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
        iscsi-target: Always wait for kthread_should_stop() before kthread exit
        iscsi-target: Fix initial login PDU asynchronous socket close OOPs
        tcmu: fix crash during device removal
        target: Re-add check to reject control WRITEs with overflow data
        ibmvscsis: Fix the incorrect req_lim_delta
        ibmvscsis: Clear left-over abort_cmd pointers
  5. Jun 01, 2017
    • Revert "x86/PAT: Fix Xorg regression on CPUs that don't support PAT" · c08d5174
      Ingo Molnar authored
      This reverts commit cbed27cd.
      
      As Andy Lutomirski observed:
      
       "I think this patch is bogus. pat_enabled() sure looks like it's
        supposed to return true if PAT is *enabled*, and these days PAT is
        'enabled' even if there's no HW PAT support."
      
      Reported-by: Bernhard Held <berny156@gmx.de>
      Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: stable@vger.kernel.org # v4.2+
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>