  1. Feb 11, 2020
    • virtio-pci: check name when counting MSI-X vectors · e0fc65ef
      Daniel Verkamp authored
      commit 303090b5 upstream.
      
      VQs without a name specified are not valid; they are skipped in the
      later loop that assigns MSI-X vectors to queues, but the per_vq_vectors
      loop above that counts the required number of vectors previously still
      counted any queue with a non-NULL callback as needing a vector.
      
      Add a check to the per_vq_vectors loop so that queues with no name are
      not counted, making the two loops consistent.  This prevents
      over-counting unnecessary vectors (e.g. for features which were not
      negotiated with the device).
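
The counting rule described above can be sketched as a small standalone C function. This is only an illustration under assumptions: `vq_callback_t`, `noop_cb` and `count_vq_vectors` are hypothetical stand-ins, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue callback type. */
typedef void (*vq_callback_t)(void *vq);

/* Example callback used only to populate the arrays. */
static void noop_cb(void *vq) { (void)vq; }

/*
 * Count the per-queue vectors needed. A queue needs a vector only if it
 * has a name (it is valid) and a callback (it wants interrupts).
 */
static unsigned int count_vq_vectors(const char *const names[],
                                     const vq_callback_t callbacks[],
                                     size_t nvqs)
{
    unsigned int nvectors = 0;
    size_t i;

    assert(names != NULL && callbacks != NULL);
    for (i = 0; i < nvqs; i++)
        if (names[i] != NULL && callbacks[i] != NULL)
            nvectors++;
    return nvectors;
}
```

With this rule, a queue whose name is NULL contributes nothing to the count even if its callback slot is non-NULL, which matches the later loop that skips unnamed queues.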
      
      Cc: stable@vger.kernel.org
      Fixes: 86a55978 ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Wang, Wei W <wei.w.wang@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • virtio-balloon: initialize all vq callbacks · f603b371
      Daniel Verkamp authored
      commit 5790b533 upstream.
      
      Ensure that elements of the callbacks array that correspond to
      unavailable features are set to NULL; previously, they would be left
      uninitialized.
      
      Since the corresponding names array elements were explicitly set to
      NULL, the uninitialized callback pointers would not actually be
      dereferenced; however, the uninitialized callbacks elements would still
      be read in vp_find_vqs_msix() and used to calculate the number of MSI-X
      vectors required.
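
A minimal sketch of the initialization pattern this describes, assuming a simplified four-queue layout. The enum values, `balloon_ack`, `stats_request` and `init_vq_tables` are hypothetical names chosen for illustration; the real driver's layout may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue callback type. */
typedef void (*vq_callback_t)(void *vq);

/* Hypothetical queue indices for the sketch. */
enum { VQ_INFLATE, VQ_DEFLATE, VQ_STATS, VQ_FREE_PAGE, VQ_MAX };

static void balloon_ack(void *vq) { (void)vq; }
static void stats_request(void *vq) { (void)vq; }

/*
 * Fill both tables completely. Optional queues start out as NULL in
 * BOTH arrays, so no slot is left holding stale or uninitialized data.
 */
static void init_vq_tables(bool has_stats, bool has_free_page,
                           const char *names[VQ_MAX],
                           vq_callback_t callbacks[VQ_MAX])
{
    names[VQ_INFLATE] = "inflate";
    callbacks[VQ_INFLATE] = balloon_ack;
    names[VQ_DEFLATE] = "deflate";
    callbacks[VQ_DEFLATE] = balloon_ack;

    names[VQ_STATS] = NULL;
    callbacks[VQ_STATS] = NULL;
    names[VQ_FREE_PAGE] = NULL;
    callbacks[VQ_FREE_PAGE] = NULL;

    if (has_stats) {
        names[VQ_STATS] = "stats";
        callbacks[VQ_STATS] = stats_request;
    }
    if (has_free_page) {
        /* Assumed here: this queue is polled and has no callback. */
        names[VQ_FREE_PAGE] = "free_page_vq";
        callbacks[VQ_FREE_PAGE] = NULL;
    }
}
```

Setting the optional slots to NULL up front means any later code that reads the callbacks array sees a defined value for every index.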
      
      Cc: stable@vger.kernel.org
      Fixes: 86a55978 ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/amd/dm/mst: Ignore payload update failures · fe84d084
      Lyude Paul authored
      commit 58fe03d6 upstream.
      
      Disabling a display on MST can potentially happen after the entire MST
      topology has been removed, which means that we can't communicate with
      the topology at all in this scenario. Likewise, this also means that we
      can't properly update payloads on the topology and as such, it's a good
      idea to ignore payload update failures when disabling displays.
      Currently, amdgpu makes the mistake of halting the payload update
      process when any payload update failures occur, resulting in leaving
      DC's local copies of the payload tables out of date.
      
      This ends up causing problems with hotplugging MST topologies, and
      causes modesets on the second hotplug to fail like so:
      
      [drm] Failed to updateMST allocation table forpipe idx:1
      ------------[ cut here ]------------
      WARNING: CPU: 5 PID: 1511 at
      drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:2677
      update_mst_stream_alloc_table+0x11e/0x130 [amdgpu]
      Modules linked in: cdc_ether usbnet fuse xt_conntrack nf_conntrack
      nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4
      nft_counter nft_compat nf_tables nfnetlink tun bridge stp llc sunrpc
      vfat fat wmi_bmof uvcvideo snd_hda_codec_realtek snd_hda_codec_generic
      snd_hda_codec_hdmi videobuf2_vmalloc snd_hda_intel videobuf2_memops
      videobuf2_v4l2 snd_intel_dspcfg videobuf2_common crct10dif_pclmul
      snd_hda_codec videodev crc32_pclmul snd_hwdep snd_hda_core
      ghash_clmulni_intel snd_seq mc joydev pcspkr snd_seq_device snd_pcm
      sp5100_tco k10temp i2c_piix4 snd_timer thinkpad_acpi ledtrig_audio snd
      wmi soundcore video i2c_scmi acpi_cpufreq ip_tables amdgpu(O)
      rtsx_pci_sdmmc amd_iommu_v2 gpu_sched mmc_core i2c_algo_bit ttm
      drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm
      crc32c_intel serio_raw hid_multitouch r8152 mii nvme r8169 nvme_core
      rtsx_pci pinctrl_amd
      CPU: 5 PID: 1511 Comm: gnome-shell Tainted: G           O      5.5.0-rc7Lyude-Test+ #4
      Hardware name: LENOVO FA495SIT26/FA495SIT26, BIOS R12ET22W(0.22 ) 01/31/2019
      RIP: 0010:update_mst_stream_alloc_table+0x11e/0x130 [amdgpu]
      Code: 28 00 00 00 75 2b 48 8d 65 e0 5b 41 5c 41 5d 41 5e 5d c3 0f b6 06
      49 89 1c 24 41 88 44 24 08 0f b6 46 01 41 88 44 24 09 eb 93 <0f> 0b e9
      2f ff ff ff e8 a6 82 a3 c2 66 0f 1f 44 00 00 0f 1f 44 00
      RSP: 0018:ffffac428127f5b0 EFLAGS: 00010202
      RAX: 0000000000000002 RBX: ffff8d1e166eee80 RCX: 0000000000000000
      RDX: ffffac428127f668 RSI: ffff8d1e166eee80 RDI: ffffac428127f610
      RBP: ffffac428127f640 R08: ffffffffc03d94a8 R09: 0000000000000000
      R10: ffff8d1e24b02000 R11: ffffac428127f5b0 R12: ffff8d1e1b83d000
      R13: ffff8d1e1bea0b08 R14: 0000000000000002 R15: 0000000000000002
      FS:  00007fab23ffcd80(0000) GS:ffff8d1e28b40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f151f1711e8 CR3: 00000005997c0000 CR4: 00000000003406e0
      Call Trace:
       ? mutex_lock+0xe/0x30
       dc_link_allocate_mst_payload+0x9a/0x210 [amdgpu]
       ? dm_read_reg_func+0x39/0xb0 [amdgpu]
       ? core_link_enable_stream+0x656/0x730 [amdgpu]
       core_link_enable_stream+0x656/0x730 [amdgpu]
       dce110_apply_ctx_to_hw+0x58e/0x5d0 [amdgpu]
       ? dcn10_verify_allow_pstate_change_high+0x1d/0x280 [amdgpu]
       ? dcn10_wait_for_mpcc_disconnect+0x3c/0x130 [amdgpu]
       dc_commit_state+0x292/0x770 [amdgpu]
       ? add_timer+0x101/0x1f0
       ? ttm_bo_put+0x1a1/0x2f0 [ttm]
       amdgpu_dm_atomic_commit_tail+0xb59/0x1ff0 [amdgpu]
       ? amdgpu_move_blit.constprop.0+0xb8/0x1f0 [amdgpu]
       ? amdgpu_bo_move+0x16d/0x2b0 [amdgpu]
       ? ttm_bo_handle_move_mem+0x118/0x570 [ttm]
       ? ttm_bo_validate+0x134/0x150 [ttm]
       ? dm_plane_helper_prepare_fb+0x1b9/0x2a0 [amdgpu]
       ? _cond_resched+0x15/0x30
       ? wait_for_completion_timeout+0x38/0x160
       ? _cond_resched+0x15/0x30
       ? wait_for_completion_interruptible+0x33/0x190
       commit_tail+0x94/0x130 [drm_kms_helper]
       drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
       drm_atomic_helper_set_config+0x70/0xb0 [drm_kms_helper]
       drm_mode_setcrtc+0x194/0x6a0 [drm]
       ? _cond_resched+0x15/0x30
       ? mutex_lock+0xe/0x30
       ? drm_mode_getcrtc+0x180/0x180 [drm]
       drm_ioctl_kernel+0xaa/0xf0 [drm]
       drm_ioctl+0x208/0x390 [drm]
       ? drm_mode_getcrtc+0x180/0x180 [drm]
       amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
       do_vfs_ioctl+0x458/0x6d0
       ksys_ioctl+0x5e/0x90
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x55/0x1b0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7fab2121f87b
      Code: 0f 1e fa 48 8b 05 0d 96 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff
      ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01
      f0 ff ff 73 01 c3 48 8b 0d dd 95 2c 00 f7 d8 64 89 01 48
      RSP: 002b:00007ffd045f9068 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00007ffd045f90a0 RCX: 00007fab2121f87b
      RDX: 00007ffd045f90a0 RSI: 00000000c06864a2 RDI: 000000000000000b
      RBP: 00007ffd045f90a0 R08: 0000000000000000 R09: 000055dbd2985d10
      R10: 000055dbd2196280 R11: 0000000000000246 R12: 00000000c06864a2
      R13: 000000000000000b R14: 0000000000000000 R15: 000055dbd2196280
      ---[ end trace 6ea888c24d2059cd ]---
      
      Note as well, I have only been able to reproduce this on setups with 2
      MST displays.
      
      Changes since v1:
      * Don't return false when part 1 or part 2 of updating the payloads
        fails, we don't want to abort at any step of the process even if
        things fail
      
      Reviewed-by: Mikita Lipski <Mikita.Lipski@amd.com>
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Acked-by: Harry Wentland <harry.wentland@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • clk: tegra: Mark fuse clock as critical · f4bda8b6
      Stephen Warren authored
      commit bf83b96f upstream.
      
      For a little over a year, U-Boot on Tegra124 has configured the flow
      controller to perform automatic RAM re-repair on off->on power
      transitions of the CPU rail[1]. This is mandatory for correct operation
      of Tegra124. However, RAM re-repair relies on certain clocks, which the
      kernel must enable and leave running. The fuse clock is one of those
      clocks. Mark this clock as critical so that LP1 power mode (system
      suspend) operates correctly.
      
      [1] 3cc7942a4ae5 ARM: tegra: implement RAM repair
      
      Reported-by: Jonathan Hunter <jonathanh@nvidia.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Stephen Warren <swarren@nvidia.com>
      Signed-off-by: Thierry Reding <treding@nvidia.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush · 806cabd3
      Peter Zijlstra authored
      commit 0ed13259 upstream.
      
      Architectures that have hardware walkers of the Linux page table should
      flush the TLB on mmu gather batch allocation failures and on batch flush.
      Some architectures, like POWER, support multiple translation modes (hash
      and radix), and in the case of POWER only the radix translation mode needs
      the above TLBI.  For hash translation mode the kernel wants to avoid this
      extra flush, since there are no hardware walkers of the Linux page table.
      With radix translation, the hardware also walks the Linux page table, so
      the kernel needs to invalidate the page walk cache before page table
      pages are freed.
      
      More details in commit d86564a2 ("mm/tlb, x86/mm: Support invalidating
      TLB caches for RCU_TABLE_FREE")
      
      The changes to sparc are to make sure we keep the old behavior since we
      are now removing HAVE_RCU_TABLE_NO_INVALIDATE.  The default value for
      tlb_needs_table_invalidate is to always force an invalidate and sparc can
      avoid the table invalidate.  Hence we define tlb_needs_table_invalidate to
      false for sparc architecture.
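
The "default to always invalidate, let an architecture opt out" arrangement described above can be sketched with an overridable macro. This is a standalone model, not the kernel code: `tlb_flush_page_walk_cache`, `flush_count` and `table_invalidate` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * An architecture that can skip the invalidate (as the text says of
 * sparc) would define this to (false) before this point. Otherwise the
 * default below always forces an invalidate.
 */
#ifndef tlb_needs_table_invalidate
#define tlb_needs_table_invalidate() (true)
#endif

/* Counts how often the (modelled) flush was issued. */
static int flush_count;

/* Hypothetical stand-in for the real TLB/page-walk-cache flush. */
static void tlb_flush_page_walk_cache(void)
{
    flush_count++;
}

/* Invalidate before page table pages are freed, if the arch needs it. */
static void table_invalidate(void)
{
    if (tlb_needs_table_invalidate())
        tlb_flush_page_walk_cache();
}
```

An architecture with several translation modes could presumably make the macro expand to a runtime check instead of a constant, so that only the mode with hardware page table walkers pays for the flush.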
      
      Link: http://lkml.kernel.org/r/20200116064531.483522-3-aneesh.kumar@linux.ibm.com
      Fixes: a46cc7a9 ("powerpc/mm/radix: Improve TLB/PWC flushes")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
      Cc: <stable@vger.kernel.org>	[4.14+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: dts: qcom: qcs404-evb: Set vdd_apc regulator in high power mode · 091c9615
      Niklas Cassel authored
      commit eac8ce86 upstream.
      
      vdd_apc is the regulator that supplies the main CPU cluster.
      
      At sudden CPU load changes, we have noticed invalid page faults on
      addresses with all bits shifted, as well as on addresses with individual
      bits flipped.
      
      By putting the vdd_apc regulator in high power mode, the voltage drops
      during sudden load changes will be less severe, and we have not been able
      to reproduce the invalid page faults with the regulator in this mode.
      
      Fixes: 8faea8ed ("arm64: dts: qcom: qcs404-evb: add spmi regulators")
      Cc: stable@vger.kernel.org
      Suggested-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
      Reviewed-by: Vinod Koul <vkoul@kernel.org>
      Link: https://lore.kernel.org/r/20191014120920.12691-1-niklas.cassel@linaro.org
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section · ed53278e
      David Hildenbrand authored
      commit e822969c upstream.
      
      Patch series "mm: fix max_pfn not falling on section boundary", v2.
      
      Playing with different memory sizes for a x86-64 guest, I discovered that
      some memmaps (highest section if max_mem does not fall on the section
      boundary) are marked as being valid and online, but contain garbage.  We
      have to properly initialize these memmaps.
      
      Looking at /proc/kpageflags and friends, I found some more issues,
      partially related to this.
      
      This patch (of 3):
      
      If max_pfn is not aligned to a section boundary, we can easily run into
      BUGs.  This can e.g., be triggered on x86-64 under QEMU by specifying a
      memory size that is not a multiple of 128MB (e.g., 4097MB, but also
      4160MB).  I was told that on real HW, we can easily have this scenario
      (esp., one of the main reasons sub-section hotadd of devmem was added).
      
      The issue is that we have a valid memmap (pfn_valid()) for the whole
      section, and the whole section will be marked "online".
      pfn_to_online_page() will succeed, but the memmap contains garbage.
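
The alignment arithmetic behind this can be worked through in a few lines of C. The sketch assumes 4 KiB pages and the 128MB section size mentioned above; the helper names and the example max_pfn of 0x144000 are assumptions chosen so that the pfn 0x144001 probed below lies just past the end of memory.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT        12  /* assumed: 4 KiB pages */
#define SECTION_SIZE_BITS 27  /* 128 MiB sections, as in the text */
#define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
#define PAGES_PER_SECTION (1ULL << PFN_SECTION_SHIFT)

/* Round a pfn up to the next section boundary. */
static uint64_t section_end_pfn(uint64_t pfn)
{
    return (pfn + PAGES_PER_SECTION - 1) & ~(PAGES_PER_SECTION - 1);
}

/*
 * Number of struct pages between max_pfn and the end of its section.
 * These have a memmap and count as online, but back no real memory.
 */
static uint64_t tail_pages_past_max_pfn(uint64_t max_pfn)
{
    return section_end_pfn(max_pfn) - max_pfn;
}
```

Under these assumptions a section covers 0x8000 pfns, so a max_pfn of 0x144000 sits in the middle of the section spanning 0x140000 to 0x148000, leaving 0x4000 trailing memmap entries that must at least be zeroed.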
      
      E.g., doing a "./page-types -r -a 0x144001" when QEMU was started with "-m
      4160M" - (see tools/vm/page-types.c):
      
      [  200.476376] BUG: unable to handle page fault for address: fffffffffffffffe
      [  200.477500] #PF: supervisor read access in kernel mode
      [  200.478334] #PF: error_code(0x0000) - not-present page
      [  200.479076] PGD 59614067 P4D 59614067 PUD 59616067 PMD 0
      [  200.479557] Oops: 0000 [#4] SMP NOPTI
      [  200.479875] CPU: 0 PID: 603 Comm: page-types Tainted: G      D W         5.5.0-rc1-next-20191209 #93
      [  200.480646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
      [  200.481648] RIP: 0010:stable_page_flags+0x4d/0x410
      [  200.482061] Code: f3 ff 41 89 c0 48 b8 00 00 00 00 01 00 00 00 45 84 c0 0f 85 cd 02 00 00 48 8b 53 08 48 8b 2b 48f
      [  200.483644] RSP: 0018:ffffb139401cbe60 EFLAGS: 00010202
      [  200.484091] RAX: fffffffffffffffe RBX: fffffbeec5100040 RCX: 0000000000000000
      [  200.484697] RDX: 0000000000000001 RSI: ffffffff9535c7cd RDI: 0000000000000246
      [  200.485313] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000
      [  200.485917] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000144001
      [  200.486523] R13: 00007ffd6ba55f48 R14: 00007ffd6ba55f40 R15: ffffb139401cbf08
      [  200.487130] FS:  00007f68df717580(0000) GS:ffff9ec77fa00000(0000) knlGS:0000000000000000
      [  200.487804] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  200.488295] CR2: fffffffffffffffe CR3: 0000000135d48000 CR4: 00000000000006f0
      [  200.488897] Call Trace:
      [  200.489115]  kpageflags_read+0xe9/0x140
      [  200.489447]  proc_reg_read+0x3c/0x60
      [  200.489755]  vfs_read+0xc2/0x170
      [  200.490037]  ksys_pread64+0x65/0xa0
      [  200.490352]  do_syscall_64+0x5c/0xa0
      [  200.490665]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      But it can be triggered much more easily via "cat /proc/kpageflags > /dev/null"
      after cold/hot plugging a DIMM to such a system:
      
      [root@localhost ~]# cat /proc/kpageflags > /dev/null
      [  111.517275] BUG: unable to handle page fault for address: fffffffffffffffe
      [  111.517907] #PF: supervisor read access in kernel mode
      [  111.518333] #PF: error_code(0x0000) - not-present page
      [  111.518771] PGD a240e067 P4D a240e067 PUD a2410067 PMD 0
      
      This patch fixes that by at least zero-ing out that memmap (so e.g.,
      page_to_pfn() will not crash).  Commit 907ec5fc ("mm: zero remaining
      unavailable struct pages") tried to fix a similar issue, but forgot to
      consider this special case.
      
      After this patch, there are still problems to solve.  E.g., not all of
      these pages falling into a memory hole will actually get initialized later
      and set PageReserved - they are only zeroed out - but at least the
      immediate crashes are gone.  A follow-up patch will take care of this.
      
      Link: http://lkml.kernel.org/r/20191211163201.17179-2-david@redhat.com
      Fixes: f7f99100 ("mm: stop zeroing memory during allocation in vmemmap")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Tested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: <stable@vger.kernel.org>	[4.15+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ocfs2: fix oops when writing cloned file · 03c03090
      Gang He authored
      commit 2d797e9f upstream.
      
      Writing a cloned file triggers a kernel oops and the user-space command
      process is also killed by the system.  The bug can be reproduced stably
      via:
      
      1) create a file under ocfs2 file system directory.
      
        journalctl -b > aa.txt
      
      2) create a cloned file for this file.
      
        reflink aa.txt bb.txt
      
      3) write the cloned file with dd command.
      
        dd if=/dev/zero of=bb.txt bs=512 count=1 conv=notrunc
      
      The dd command is killed by the kernel, then you can see the oops message
      via dmesg command.
      
      [  463.875404] BUG: kernel NULL pointer dereference, address: 0000000000000028
      [  463.875413] #PF: supervisor read access in kernel mode
      [  463.875416] #PF: error_code(0x0000) - not-present page
      [  463.875418] PGD 0 P4D 0
      [  463.875425] Oops: 0000 [#1] SMP PTI
      [  463.875431] CPU: 1 PID: 2291 Comm: dd Tainted: G           OE     5.3.16-2-default
      [  463.875433] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [  463.875500] RIP: 0010:ocfs2_refcount_cow+0xa4/0x5d0 [ocfs2]
      [  463.875505] Code: 06 89 6c 24 38 89 eb f6 44 24 3c 02 74 be 49 8b 47 28
      [  463.875508] RSP: 0018:ffffa2cb409dfce8 EFLAGS: 00010202
      [  463.875512] RAX: ffff8b1ebdca8000 RBX: 0000000000000001 RCX: ffff8b1eb73a9df0
      [  463.875515] RDX: 0000000000056a01 RSI: 0000000000000000 RDI: 0000000000000000
      [  463.875517] RBP: 0000000000000001 R08: ffff8b1eb73a9de0 R09: 0000000000000000
      [  463.875520] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      [  463.875522] R13: ffff8b1eb922f048 R14: 0000000000000000 R15: ffff8b1eb922f048
      [  463.875526] FS:  00007f8f44d15540(0000) GS:ffff8b1ebeb00000(0000) knlGS:0000000000000000
      [  463.875529] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  463.875532] CR2: 0000000000000028 CR3: 000000003c17a000 CR4: 00000000000006e0
      [  463.875546] Call Trace:
      [  463.875596]  ? ocfs2_inode_lock_full_nested+0x18b/0x960 [ocfs2]
      [  463.875648]  ocfs2_file_write_iter+0xaf8/0xc70 [ocfs2]
      [  463.875672]  new_sync_write+0x12d/0x1d0
      [  463.875688]  vfs_write+0xad/0x1a0
      [  463.875697]  ksys_write+0xa1/0xe0
      [  463.875710]  do_syscall_64+0x60/0x1f0
      [  463.875743]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  463.875758] RIP: 0033:0x7f8f4482ed44
      [  463.875762] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00
      [  463.875765] RSP: 002b:00007fff300a79d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  463.875769] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8f4482ed44
      [  463.875771] RDX: 0000000000000200 RSI: 000055f771b5c000 RDI: 0000000000000001
      [  463.875774] RBP: 0000000000000200 R08: 00007f8f44af9c78 R09: 0000000000000003
      [  463.875776] R10: 000000000000089f R11: 0000000000000246 R12: 000055f771b5c000
      [  463.875779] R13: 0000000000000200 R14: 0000000000000000 R15: 000055f771b5c000
      
      This regression problem was introduced by commit e74540b2 ("ocfs2:
      protect extent tree in ocfs2_prepare_inode_for_write()").
      
      Link: http://lkml.kernel.org/r/20200121050153.13290-1-ghe@suse.com
      Fixes: e74540b2 ("ocfs2: protect extent tree in ocfs2_prepare_inode_for_write()").
      Signed-off-by: Gang He <ghe@suse.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: s390: do not clobber registers during guest reset/store status · 6e41b549
      Christian Borntraeger authored
      commit 55680890 upstream.
      
      The initial CPU reset clobbers the userspace fpc and the store status
      ioctl clobbers the guest acrs + fpr.  As these calls are only done via
      ioctl (and not via vcpu_run), no CPU context is loaded, so we can (and
      must) act directly on the sync regs, not on the thread context.
      
      Cc: stable@kernel.org
      Fixes: e1788bb9 ("KVM: s390: handle floating point registers in the run ioctl not in vcpu_put/load")
      Fixes: 31d8b8d4 ("KVM: s390: handle access registers in the run ioctl not in vcpu_put/load")
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
      Link: https://lore.kernel.org/r/20200131100205.74720-2-frankja@linux.ibm.com
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86: Revert "KVM: X86: Fix fpu state crash in kvm guest" · b1f9f9b8
      Sean Christopherson authored
      commit 2620fe26 upstream.
      
      Reload the current thread's FPU state, which contains the guest's FPU
      state, to the CPU registers if necessary during vcpu_enter_guest().
      TIF_NEED_FPU_LOAD can be set any time control is transferred out of KVM,
      e.g. if I/O is triggered during a KVM call to get_user_pages() or if a
      softirq occurs while KVM is scheduled in.
      
      Moving the handling of TIF_NEED_FPU_LOAD from vcpu_enter_guest() to
      kvm_arch_vcpu_load(), effectively kvm_sched_in(), papered over a bug
      where kvm_put_guest_fpu() failed to account for TIF_NEED_FPU_LOAD.  The
      easiest way to hit the kvm_put_guest_fpu() bug was to run with involuntary
      preemption enabled, thus handling TIF_NEED_FPU_LOAD during kvm_sched_in()
      made the bug go away.  But, removing the handling in vcpu_enter_guest()
      exposed KVM to the rare case of a softirq triggering kernel_fpu_begin()
      between vcpu_load() and vcpu_enter_guest().
      
      Now that kvm_{load,put}_guest_fpu() correctly handle TIF_NEED_FPU_LOAD,
      revert the commit to both restore the vcpu_enter_guest() behavior and
      eliminate the superfluous switch_fpu_return() in kvm_arch_vcpu_load().
      
      Note, leaving the handling in kvm_arch_vcpu_load() isn't wrong per se,
      but it is unnecessary, and most critically, makes it extremely difficult
      to find bugs such as the kvm_put_guest_fpu() issue due to shrinking the
      window where a softirq can corrupt state.
      
      A sample trace triggered by warning if TIF_NEED_FPU_LOAD is set while
      vcpu state is loaded:
      
       <IRQ>
        gcmaes_crypt_by_sg.constprop.12+0x26e/0x660
        ? 0xffffffffc024547d
        ? __qdisc_run+0x83/0x510
        ? __dev_queue_xmit+0x45e/0x990
        ? ip_finish_output2+0x1a8/0x570
        ? fib4_rule_action+0x61/0x70
        ? fib4_rule_action+0x70/0x70
        ? fib_rules_lookup+0x13f/0x1c0
        ? helper_rfc4106_decrypt+0x82/0xa0
        ? crypto_aead_decrypt+0x40/0x70
        ? crypto_aead_decrypt+0x40/0x70
        ? crypto_aead_decrypt+0x40/0x70
        ? esp_output_tail+0x8f4/0xa5a [esp4]
        ? skb_ext_add+0xd3/0x170
        ? xfrm_input+0x7a6/0x12c0
        ? xfrm4_rcv_encap+0xae/0xd0
        ? xfrm4_transport_finish+0x200/0x200
        ? udp_queue_rcv_one_skb+0x1ba/0x460
        ? udp_unicast_rcv_skb.isra.63+0x72/0x90
        ? __udp4_lib_rcv+0x51b/0xb00
        ? ip_protocol_deliver_rcu+0xd2/0x1c0
        ? ip_local_deliver_finish+0x44/0x50
        ? ip_local_deliver+0xe0/0xf0
        ? ip_protocol_deliver_rcu+0x1c0/0x1c0
        ? ip_rcv+0xbc/0xd0
        ? ip_rcv_finish_core.isra.19+0x380/0x380
        ? __netif_receive_skb_one_core+0x7e/0x90
        ? netif_receive_skb_internal+0x3d/0xb0
        ? napi_gro_receive+0xed/0x150
        ? 0xffffffffc0243c77
        ? net_rx_action+0x149/0x3b0
        ? __do_softirq+0xe4/0x2f8
        ? handle_irq_event_percpu+0x6a/0x80
        ? irq_exit+0xe6/0xf0
        ? do_IRQ+0x7f/0xd0
        ? common_interrupt+0xf/0xf
        </IRQ>
        ? irq_entries_start+0x20/0x660
        ? vmx_get_interrupt_shadow+0x2f0/0x710 [kvm_intel]
        ? kvm_set_msr_common+0xfc7/0x2380 [kvm]
        ? recalibrate_cpu_khz+0x10/0x10
        ? ktime_get+0x3a/0xa0
        ? kvm_arch_vcpu_ioctl_run+0x107/0x560 [kvm]
        ? kvm_init+0x6bf/0xd00 [kvm]
        ? __seccomp_filter+0x7a/0x680
        ? do_vfs_ioctl+0xa4/0x630
        ? security_file_ioctl+0x32/0x50
        ? ksys_ioctl+0x60/0x90
        ? __x64_sys_ioctl+0x16/0x20
        ? do_syscall_64+0x5f/0x1a0
        ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      ---[ end trace 9564a1ccad733a90 ]---
      
      This reverts commit e7517324.
      
      Fixes: e7517324 ("KVM: X86: Fix fpu state crash in kvm guest")
      Reported-by: Derek Yerger <derek@djy.llc>
      Reported-by: <kernel@najdan.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Cc: Thomas Lambertz <mail@thomaslambertz.de>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86: Ensure guest's FPU state is loaded when accessing for emulation · 58e1e751
      Sean Christopherson authored
      commit a7baead7 upstream.
      
      Lock the FPU regs and reload the current thread's FPU state, which holds
      the guest's FPU state, to the CPU registers if necessary prior to
      accessing guest FPU state as part of emulation.  kernel_fpu_begin() can
      be called from softirq context, therefore KVM must ensure softirqs are
      disabled (locking the FPU regs disables softirqs) when touching CPU FPU
      state.
      
      Note, for all intents and purposes this reverts commit 6ab0b9fe
      ("x86,kvm: remove KVM emulator get_fpu / put_fpu"), but at the time it
      was applied, removing get/put_fpu() was correct.  The re-introduction
      of {get,put}_fpu() is necessitated by the deferring of FPU state load.
      
      Fixes: 5f409e20 ("x86/fpu: Defer FPU state load until return to userspace")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86: Handle TIF_NEED_FPU_LOAD in kvm_{load,put}_guest_fpu() · a6ff6e05
      Sean Christopherson authored
      commit c9aef3b8 upstream.
      
      Handle TIF_NEED_FPU_LOAD similar to how fpu__copy() handles the flag
      when duplicating FPU state to a new task struct.  TIF_NEED_FPU_LOAD can
      be set any time control is transferred out of KVM, be it voluntarily,
      e.g. if I/O is triggered during a KVM call to get_user_pages, or
      involuntarily, e.g. if softirq runs after an IRQ occurs.  Therefore,
      KVM must account for TIF_NEED_FPU_LOAD whenever it is (potentially)
      accessing CPU FPU state.
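
The rule stated above can be modelled in plain C as a toy, without any kernel APIs. Everything here is a hypothetical model: `need_fpu_load` stands in for TIF_NEED_FPU_LOAD, `cpu_regs` for the CPU's FPU registers, and `save_current_fpu` for whatever helper snapshots the current state.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy FPU register file. */
struct fpu_state {
    unsigned char regs[16];
};

/* Toy task: a deferred-load flag plus the in-memory copy of its state. */
struct task_model {
    bool need_fpu_load;      /* models TIF_NEED_FPU_LOAD */
    struct fpu_state saved;  /* state saved in memory */
};

/* Models the CPU registers, which other code may have overwritten. */
static struct fpu_state cpu_regs;

/*
 * Snapshot the task's FPU state into dst. If the flag is set, the
 * registers do not hold this task's state, so copy from memory;
 * otherwise the registers are authoritative.
 */
static void save_current_fpu(const struct task_model *cur,
                             struct fpu_state *dst)
{
    if (cur->need_fpu_load)
        memcpy(dst, &cur->saved, sizeof(*dst));
    else
        memcpy(dst, &cpu_regs, sizeof(*dst));
}
```

The point of the model is the branch: reading the registers while the flag is set would capture whatever last used the FPU rather than the task's own state.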
      
      Fixes: 5f409e20 ("x86/fpu: Defer FPU state load until return to userspace")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86: Free wbinvd_dirty_mask if vCPU creation fails · e3a37628
      Sean Christopherson authored
      commit 16be9dde upstream.
      
      Free the vCPU's wbinvd_dirty_mask if vCPU creation fails after
      kvm_arch_vcpu_init(), e.g. when installing the vCPU's file descriptor.
      Do the freeing by calling kvm_arch_vcpu_free() instead of open coding
      the freeing.  This adds a likely superfluous, but ultimately harmless,
      call to kvmclock_reset(), which only clears vcpu->arch.pv_time_enabled.
      Using kvm_arch_vcpu_free() allows for additional cleanup in the future.
      
      Fixes: f5f48ee1 ("KVM: VMX: Execute WBINVD to keep data consistency with assigned devices")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Sean Christopherson's avatar
      KVM: x86: Don't let userspace set host-reserved cr4 bits · 9d9933f7
      Sean Christopherson authored
      commit b11306b5 upstream.
      
      Calculate the host-reserved cr4 bits at runtime based on the system's
      capabilities (using logic similar to __do_cpuid_func()), and use the
      dynamically generated mask for the reserved bit check in kvm_set_cr4()
      instead of using the static CR4_RESERVED_BITS define.  This prevents
      userspace from "enabling" features in cr4 that are not supported by the
      system, e.g. by ignoring KVM_GET_SUPPORTED_CPUID and specifying a bogus
      CPUID for the vCPU.
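
The idea of a runtime-generated reserved mask can be sketched in plain C as follows. This is only an illustration under stated assumptions, not the code from the patch: `struct host_caps`, `cr4_reserved_bits()` and `cr4_is_valid()` are hypothetical names, and only a handful of CR4 feature bits are modelled. The bit positions themselves are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR4 bit positions for a few optional features. */
#define CR4_UMIP (1ULL << 11)
#define CR4_LA57 (1ULL << 12)
#define CR4_SMEP (1ULL << 20)
#define CR4_SMAP (1ULL << 21)
#define CR4_PKE  (1ULL << 22)

/* Hypothetical summary of what the host CPU supports (illustration only). */
struct host_caps {
    bool umip, la57, smep, smap, pku;
};

/*
 * Start from a static set of always-reserved bits and additionally reserve
 * every feature bit that the host does not support.
 */
static uint64_t cr4_reserved_bits(uint64_t static_reserved,
                                  const struct host_caps *caps)
{
    uint64_t reserved = static_reserved;

    if (!caps->umip)
        reserved |= CR4_UMIP;
    if (!caps->la57)
        reserved |= CR4_LA57;
    if (!caps->smep)
        reserved |= CR4_SMEP;
    if (!caps->smap)
        reserved |= CR4_SMAP;
    if (!caps->pku)
        reserved |= CR4_PKE;

    return reserved;
}

/* A requested CR4 value is acceptable only if it sets no reserved bit. */
static bool cr4_is_valid(uint64_t cr4, uint64_t reserved)
{
    return (cr4 & reserved) == 0;
}
```

With a mask built this way, a request to set CR4.LA57 on a host without LA57 is rejected regardless of what CPUID the vCPU was given.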
      
      Allowing userspace to set unsupported bits in cr4 can lead to a variety
      of undesirable behavior, e.g. failed VM-Enter, and in general increases
      KVM's attack surface.  A crafty userspace can even abuse CR4.LA57 to
      induce an unchecked #GP on a WRMSR.
      
      On a platform without LA57 support:
      
        KVM_SET_CPUID2 // CPUID_7_0_ECX.LA57 = 1
        KVM_SET_SREGS  // CR4.LA57 = 1
        KVM_SET_MSRS   // KERNEL_GS_BASE = 0x0004000000000000
        KVM_RUN
      
      leads to a #GP when writing KERNEL_GS_BASE into hardware:
      
        unchecked MSR access error: WRMSR to 0xc0000102 (tried to write 0x0004000000000000)
        at rIP: 0xffffffffa00f239a (vmx_prepare_switch_to_guest+0x10a/0x1d0 [kvm_intel])
        Call Trace:
         kvm_arch_vcpu_ioctl_run+0x671/0x1c70 [kvm]
         kvm_vcpu_ioctl+0x36b/0x5d0 [kvm]
         do_vfs_ioctl+0xa1/0x620
         ksys_ioctl+0x66/0x70
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x4c/0x170
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7fc08133bf47
      
      Note, the above sequence fails VM-Enter due to invalid guest state.
      Userspace can allow VM-Enter to succeed (after the WRMSR #GP) by adding
      a KVM_SET_SREGS w/ CR4.LA57=0 after KVM_SET_MSRS, in which case KVM will
      technically leak the host's KERNEL_GS_BASE into the guest.  But, as
      KERNEL_GS_BASE is a userspace-defined value/address, the leak is largely
      benign as a malicious userspace would simply be exposing its own data to
      the guest, and attacking a benevolent userspace would require multiple
      bugs in the userspace VMM.
      
      Cc: stable@vger.kernel.org
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d9933f7
    • Sean Christopherson's avatar
      KVM: VMX: Add non-canonical check on writes to RTIT address MSRs · 715f9f9a
      Sean Christopherson authored
      commit fe6ed369 upstream.
      
      Reject writes to RTIT address MSRs if the data being written is a
      non-canonical address as the MSRs are subject to canonical checks, e.g.
      KVM will trigger an unchecked #GP when loading the values to hardware
      during pt_guest_enter().
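
A canonical-address check of this kind can be sketched as below. The helper names and the explicit `vaddr_bits` parameter are assumptions made for illustration (48 for 4-level paging, 57 with LA57). The sketch relies on arithmetic right shift of negative signed values, which is implementation-defined in C but is what GCC and Clang provide.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sign-extend bit (vaddr_bits - 1) into all higher bits. An address is
 * canonical exactly when this leaves it unchanged.
 */
static uint64_t canonical_address(uint64_t la, unsigned vaddr_bits)
{
    unsigned shift = 64 - vaddr_bits;

    /* Relies on arithmetic right shift for negative values. */
    return (uint64_t)((int64_t)(la << shift) >> shift);
}

static bool is_noncanonical(uint64_t la, unsigned vaddr_bits)
{
    return canonical_address(la, vaddr_bits) != la;
}
```

For example, 0x0004000000000000 is non-canonical with 48 virtual address bits but canonical with 57, so a write handler would reject it on a host limited to 48-bit virtual addresses.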
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      715f9f9a
    • Boris Ostrovsky's avatar
      x86/KVM: Clean up host's steal time structure · 2aebc6ed
      Boris Ostrovsky authored
      commit a6bd811f upstream.
      
      Now that we are mapping kvm_steal_time from the guest directly we
      don't need to keep a copy of it in kvm_vcpu_arch.st. The same is true
      for the stime field.
      
      This is part of CVE-2019-3016.
      
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2aebc6ed
    • Boris Ostrovsky's avatar
      x86/kvm: Cache gfn to pfn translation · f7c1a6c6
      Boris Ostrovsky authored
      commit 91724814 upstream.
      
      __kvm_map_gfn()'s call to gfn_to_pfn_memslot() is
      * relatively expensive
      * in certain cases (such as when done from atomic context) cannot be called
      
      Stashing gfn-to-pfn mapping should help with both cases.
      
      This is part of CVE-2019-3016.
      
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f7c1a6c6
    • Boris Ostrovsky's avatar
      x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed · d71eef9f
      Boris Ostrovsky authored
      commit b0431382 upstream.
      
      There is a potential race in record_steal_time() between setting
      host-local vcpu->arch.st.steal.preempted to zero (i.e. clearing
      KVM_VCPU_PREEMPTED) and propagating this value to the guest with
      kvm_write_guest_cached(). Between those two events the guest may
      still see KVM_VCPU_PREEMPTED in its copy of kvm_steal_time, set
      KVM_VCPU_FLUSH_TLB and assume that hypervisor will do the right
      thing. Which it won't.
      
      Instead of copying, we should map kvm_steal_time and that will
      guarantee atomicity of accesses to @preempted.
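
The difference between copying and operating on the shared byte directly can be sketched with C11 atomics. This is a userspace illustration, not the kernel change: `struct steal_time_sketch` and `clear_preempted_and_check_flush()` are hypothetical names, and the `_Atomic` byte stands in for the mapped guest field. The two flag values match the KVM paravirtual ABI.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_VCPU_PREEMPTED (1 << 0)
#define KVM_VCPU_FLUSH_TLB (1 << 1)

/* Hypothetical stand-in for the guest-visible @preempted byte. */
struct steal_time_sketch {
    _Atomic uint8_t preempted;
};

/*
 * Read and clear the flags in one atomic step. Because the exchange is
 * atomic, a KVM_VCPU_FLUSH_TLB bit set by the guest either shows up in
 * the returned value or is set after the clear; it cannot be lost.
 */
static bool clear_preempted_and_check_flush(struct steal_time_sketch *st)
{
    uint8_t old = atomic_exchange(&st->preempted, 0);

    return (old & KVM_VCPU_FLUSH_TLB) != 0;
}
```

A caller would perform the TLB flush whenever the function returns true.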
      
      This is part of CVE-2019-3016.
      
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d71eef9f
    • Boris Ostrovsky's avatar
      x86/kvm: Introduce kvm_(un)map_gfn() · a3db2949
      Boris Ostrovsky authored
      commit 1eff70a9 upstream.
      
      kvm_vcpu_(un)map operates on gfns from any current address space.
      In certain cases we want to make sure we are not mapping SMRAM
      and for that we can use kvm_(un)map_gfn() that we are introducing
      in this patch.
      
      This is part of CVE-2019-3016.
      
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a3db2949
    • Boris Ostrovsky's avatar
      x86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bit · 68460ceb
      Boris Ostrovsky authored
      commit 8c6de56a upstream.
      
      kvm_steal_time_set_preempted() may accidentally clear KVM_VCPU_FLUSH_TLB
      bit if it is called more than once while VCPU is preempted.
      
      This is part of CVE-2019-3016.
      
      (This bug was also independently discovered by Jim Mattson
      <jmattson@google.com>)
      
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      68460ceb
    • John Allen's avatar
      kvm/svm: PKU not currently supported · d0671151
      John Allen authored
      commit a47970ed upstream.
      
      Current SVM implementation does not have support for handling PKU. Guests
      running on a host with future AMD cpus that support the feature will read
      garbage from the PKRU register and will hit segmentation faults on boot as
      memory is getting marked as protected that should not be. Ensure that cpuid
      from SVM does not advertise the feature.
      
      Signed-off-by: John Allen <john.allen@amd.com>
      Cc: stable@vger.kernel.org
      Fixes: 0556cbdc ("x86/pkeys: Don't check if PKRU is zero before writing it")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d0671151
    • Sean Christopherson's avatar
      KVM: PPC: Book3S PR: Free shared page if mmu initialization fails · 9213699e
      Sean Christopherson authored
      commit cb10bf91 upstream.
      
      Explicitly free the shared page if kvmppc_mmu_init() fails during
      kvmppc_core_vcpu_create(), as the page is freed only in
      kvmppc_core_vcpu_free(), which is not reached via kvm_vcpu_uninit().
      
      Fixes: 96bc451a ("KVM: PPC: Introduce shared page")
      Cc: stable@vger.kernel.org
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Acked-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9213699e
    • Sean Christopherson's avatar
      KVM: PPC: Book3S HV: Uninit vCPU if vcore creation fails · b2301ded
      Sean Christopherson authored
      commit 1a978d9d upstream.
      
      Call kvm_vcpu_uninit() if vcore creation fails to avoid leaking any
      resources allocated by kvm_vcpu_init(), i.e. the vcpu->run page.
      
      Fixes: 371fefd6 ("KVM: PPC: Allow book3s_hv guests to use SMT processor modes")
      Cc: stable@vger.kernel.org
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Acked-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2301ded
    • Sean Christopherson's avatar
      KVM: x86: Fix potential put_fpu() w/o load_fpu() on MPX platform · 0718e2d3
      Sean Christopherson authored
      commit f958bd23 upstream.
      
      Unlike most state managed by XSAVE, MPX is initialized to zero on INIT.
      Because INITs are usually recognized in the context of a VCPU_RUN call,
      kvm_vcpu_reset() puts the guest's FPU so that the FPU state is resident
      in memory, zeros the MPX state, and reloads FPU state to hardware.  But,
      in the unlikely event that an INIT is recognized during
      kvm_arch_vcpu_ioctl_get_mpstate() via kvm_apic_accept_events(),
      kvm_vcpu_reset() will call kvm_put_guest_fpu() without a preceding
      kvm_load_guest_fpu() and corrupt the guest's FPU state (and possibly
      userspace's FPU state as well).
      
      Given that MPX is being removed from the kernel[*], fix the bug with the
      simple-but-ugly approach of loading the guest's FPU during
      KVM_GET_MP_STATE.
      
      [*] See commit f240652b ("x86/mpx: Remove MPX APIs").
      
      Fixes: f775b13e ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0718e2d3
    • Marios Pomonis's avatar
      KVM: x86: Protect MSR-based index computations in fixed_msr_to_seg_unit() from Spectre-v1/L1TF attacks · 72324a1d
      Marios Pomonis authored
      commit 25a5edea upstream.
      
      This fixes a Spectre-v1/L1TF vulnerability in fixed_msr_to_seg_unit().
      This function contains index computations based on the
      (attacker-controlled) MSR number.
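
The usual mitigation for this class of bug is to clamp the index with a branch-free mask after the bounds check, so that a mispredicted branch cannot be followed by an out-of-bounds load. The sketch below re-implements that masking idea in portable C for illustration. The helper names and the `lookup()` function are hypothetical, and a userspace compiler does not give the guarantees the kernel's own helper is written to provide. It assumes both `index` and `size` are at most `LONG_MAX` and that right-shifting a negative `long` is arithmetic, as on GCC and Clang.

```c
#include <limits.h>

#define BITS_PER_LONG_SKETCH (sizeof(long) * CHAR_BIT)

/*
 * Returns all ones when index < size and zero otherwise, without a
 * conditional branch. If index >= size, (size - 1 - index) wraps and sets
 * the top bit, so the complement is non-negative and shifts down to 0.
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
    return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG_SKETCH - 1);
}

/* In-range indices pass through unchanged; out-of-range ones become 0. */
static unsigned long clamp_index_nospec(unsigned long index, unsigned long size)
{
    return index & index_mask_nospec(index, size);
}

/* Hypothetical table lookup driven by an untrusted index. */
static int lookup(const int *table, unsigned long size, unsigned long index)
{
    if (index >= size)
        return -1;

    /* Even if the branch above is mispredicted, the load stays in bounds. */
    index = clamp_index_nospec(index, size);
    return table[index];
}
```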
      
      Fixes: de9aef5e ("KVM: MTRR: introduce fixed_mtrr_segment table")
      
      Signed-off-by: Nick Finco <nifi@google.com>
      Signed-off-by: Marios Pomonis <pomonis@google.com>
      Reviewed-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      72324a1d
    • Marios Pomonis's avatar
      KVM: x86: Protect x86_decode_insn from Spectre-v1/L1TF attacks · 2fb35312
      Marios Pomonis authored
      commit 3c9053a2 upstream.
      
      This fixes a Spectre-v1/L1TF vulnerability in x86_decode_insn().
      kvm_emulate_instruction() (an ancestor of x86_decode_insn()) is an exported
      symbol, so KVM should treat it conservatively from a security perspective.
      
      Fixes: 045a282c ("KVM: emulator: implement fninit, fnstsw, fnstcw")
      
      Signed-off-by: Nick Finco <nifi@google.com>
      Signed-off-by: Marios Pomonis <pomonis@google.com>
      Reviewed-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2fb35312
    • Marios Pomonis's avatar
      KVM: x86: Protect MSR-based index computations from Spectre-v1/L1TF attacks in x86.c · f2a51431
      Marios Pomonis authored
      commit 6ec4c5ee upstream.
      
      This fixes a Spectre-v1/L1TF vulnerability in set_msr_mce() and
      get_msr_mce().
      Both functions contain index computations based on the
      (attacker-controlled) MSR number.
      
      Fixes: 890ca9ae ("KVM: Add MCE support")
      
      Signed-off-by: Nick Finco <nifi@google.com>
      Signed-off-by: Marios Pomonis <pomonis@google.com>
      Reviewed-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f2a51431
    • Marios Pomonis's avatar
      KVM: x86: Protect ioapic_read_indirect() from Spectre-v1/L1TF attacks · a07fdd5f
      Marios Pomonis authored
      commit 8c86405f upstream.
      
      This fixes a Spectre-v1/L1TF vulnerability in ioapic_read_indirect().
      This function contains index computations based on the
      (attacker-controlled) IOREGSEL register.
      
      Fixes: a2c118bf ("KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798)")
      
      Signed-off-by: Nick Finco <nifi@google.com>
      Signed-off-by: Marios Pomonis <pomonis@google.com>
      Reviewed-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a07fdd5f
    • Marios Pomonis's avatar
      KVM: x86: Protect MSR-based index computations in pmu.h from Spectre-v1/L1TF attacks · c09be769
      Marios Pomonis authored
      commit 13c5183a upstream.
      
      This fixes a Spectre-v1/L1TF vulnerability in the get_gp_pmc() and
      get_fixed_pmc() functions.
      They both contain index computations based on the (attacker-controlled)
      MSR number.
      
      Fixes: 25462f7f ("KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch")
      
      Signed-off-by: Nick Finco <nifi@google.com>
      Signed-off-by: Marios Pomonis <pomonis@google.com>
      Reviewed-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c09be769
    • Marios Pomonis's avatar
      KVM: x86: Protect ioapic_write_indirect() from Spectre-v1/L1TF attacks · 2f8a1375
      Marios Pomonis authored
      commit 67056455 upstream.
      
      This fixes a Spectre-v1/L1TF vulnerability in ioapic_write_indirect().
      This function contains index computations based on the
      (attacker-controlled) IOREGSEL register.
      
      This patch depends on patch
      "KVM: x86: Protect ioapic_read_indirect() from Spectre-v1/L1TF attacks".
      
      Fixes: 70f93dae ("KVM: Use temporary variable to shorten lines.")
      
      Signed-off-by: Nick Finco <nifi@google.com>
      Signed-off-by: Marios Pomonis <pomonis@google.com>
      Reviewed-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f8a1375
    • Marios Pomonis's avatar
      KVM: x86: Protect kvm_hv_msr_[get|set]_crash_data() from Spectre-v1/L1TF attacks · c8a6b591
      Marios Pomonis authored
      commit 86187937 upstream.
      
      This fixes Spectre-v1/L1TF vulnerabilities in kvm_hv_msr_get_crash_data()
      and kvm_hv_msr_set_crash_data().
      These functions contain index computations that use the
      (attacker-controlled) MSR number.
      
      Fixes: e7d9513b ("kvm/x86: added hyper-v crash msrs into kvm hyperv context")
      
      Signed-off-by: Nick Finco <nifi@google.com>
      Signed-off-by: Marios Pomonis <pomonis@google.com>
      Reviewed-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8a6b591
    • Marios Pomonis's avatar
      KVM: x86: Protect kvm_lapic_reg_write() from Spectre-v1/L1TF attacks · bf13472e
      Marios Pomonis authored
      commit 4bf79cb0 upstream.
      
      This fixes a Spectre-v1/L1TF vulnerability in kvm_lapic_reg_write().
      This function contains index computations based on the
      (attacker-controlled) MSR number.
      
      Fixes: 0105d1a5 ("KVM: x2apic interface to lapic")
      
      Signed-off-by: Nick Finco <nifi@google.com>
      Signed-off-by: Marios Pomonis <pomonis@google.com>
      Reviewed-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bf13472e
    • Marios Pomonis's avatar
      KVM: x86: Protect DR-based index computations from Spectre-v1/L1TF attacks · 8b73ccf4
      Marios Pomonis authored
      commit ea740059 upstream.
      
      This fixes a Spectre-v1/L1TF vulnerability in __kvm_set_dr() and
      kvm_get_dr().
      Both kvm_get_dr() and kvm_set_dr() (a wrapper of __kvm_set_dr()) are
      exported symbols so KVM should treat them conservatively from a security
      perspective.
      
      Fixes: 020df079 ("KVM: move DR register access handling into generic code")
      
      Signed-off-by: Nick Finco <nifi@google.com>
      Signed-off-by: Marios Pomonis <pomonis@google.com>
      Reviewed-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b73ccf4
    • Marios Pomonis's avatar
      KVM: x86: Protect pmu_intel.c from Spectre-v1/L1TF attacks · c2b02d09
      Marios Pomonis authored
      commit 66061740 upstream.
      
      This fixes Spectre-v1/L1TF vulnerabilities in intel_find_fixed_event()
      and intel_rdpmc_ecx_to_pmc().
      kvm_rdpmc() (ancestor of intel_find_fixed_event()) and
      reprogram_fixed_counter() (ancestor of intel_rdpmc_ecx_to_pmc()) are
      exported symbols so KVM should treat them conservatively from a security
      perspective.
      
      Fixes: 25462f7f ("KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch")
      
      Signed-off-by: Nick Finco <nifi@google.com>
      Signed-off-by: Marios Pomonis <pomonis@google.com>
      Reviewed-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c2b02d09
    • Marios Pomonis's avatar
      KVM: x86: Refactor prefix decoding to prevent Spectre-v1/L1TF attacks · 79777eb8
      Marios Pomonis authored
      commit 125ffc5e upstream.
      
      This fixes Spectre-v1/L1TF vulnerabilities in
      vmx_read_guest_seg_selector(), vmx_read_guest_seg_base(),
      vmx_read_guest_seg_limit() and vmx_read_guest_seg_ar().  When
      invoked from emulation, these functions contain index computations
      based on the (attacker-influenced) segment value.  Using constants
      prevents the attack.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      79777eb8
    • Marios Pomonis's avatar
      KVM: x86: Refactor picdev_write() to prevent Spectre-v1/L1TF attacks · 443fd004
      Marios Pomonis authored
      commit 14e32321 upstream.
      
      This fixes a Spectre-v1/L1TF vulnerability in picdev_write().
      It replaces index computations based on the (attacker-controlled) port
      number with constants through a minor refactoring.
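
The shape of such a refactoring can be sketched as follows. This is a simplified model, not the patched function: the structures and `picdev_write_sketch()` are hypothetical, and the only facts relied on are the conventional PC port assignments (0x20/0x21 for the master 8259, 0xa0/0xa1 for the slave, 0x4d0/0x4d1 for the edge/level control registers). Each case uses a literal array index, so no index is ever derived from the port value.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified device state. */
struct pic_sketch {
    uint16_t last_port;
    uint8_t last_data;
};

struct pic_state_sketch {
    struct pic_sketch pics[2]; /* [0] = master, [1] = slave */
    uint8_t elcr[2];
};

static void pic_port_write(struct pic_sketch *pic, uint16_t port, uint8_t data)
{
    pic->last_port = port;
    pic->last_data = data;
}

/* Returns 0 on success, -1 for a port this device does not decode. */
static int picdev_write_sketch(struct pic_state_sketch *s, uint16_t port,
                               uint8_t data)
{
    switch (port) {
    case 0x20:
    case 0x21:
        pic_port_write(&s->pics[0], port, data); /* constant index */
        break;
    case 0xa0:
    case 0xa1:
        pic_port_write(&s->pics[1], port, data); /* constant index */
        break;
    case 0x4d0:
        s->elcr[0] = data;
        break;
    case 0x4d1:
        s->elcr[1] = data;
        break;
    default:
        return -1;
    }
    return 0;
}
```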
      
      Fixes: 85f455f7 ("KVM: Add support for in-kernel PIC emulation")
      
      Signed-off-by: Nick Finco <nifi@google.com>
      Signed-off-by: Marios Pomonis <pomonis@google.com>
      Reviewed-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      443fd004
    • Jens Axboe's avatar
      aio: prevent potential eventfd recursion on poll · 8dcbf268
      Jens Axboe authored
      commit 01d7a356 upstream.
      
      If we have nested or circular eventfd wakeups, then we can deadlock if
      we run them inline from our poll waitqueue wakeup handler. It's also
      possible to have very long chains of notifications, to the extent where
      we could risk blowing the stack.
      
      Check the eventfd recursion count before calling eventfd_signal(). If
      it's non-zero, then punt the signaling to async context. This is always
      safe, as it takes us out-of-line in terms of stack and locking context.
      
      Cc: stable@vger.kernel.org # 4.19+
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8dcbf268
    • Jens Axboe's avatar
      eventfd: track eventfd_signal() recursion depth · 844d2025
      Jens Axboe authored
      commit b5e683d5 upstream.
      
      eventfd use cases from aio and io_uring can deadlock due to circular
      or recursive calling, when eventfd_signal() tries to grab the waitqueue
      lock. On top of that, it's also possible to construct notification
      chains that are deep enough that we could blow the stack.
      
      Add a percpu counter that tracks the percpu recursion depth, warn if we
      exceed it. The counter is also exposed so that users of eventfd_signal()
      can do the right thing if it's non-zero in the context where it is
      called.
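
The mechanism can be modelled in userspace roughly as below. This is a sketch under assumptions, not the kernel code: a C11 `_Thread_local` counter stands in for the per-CPU counter, and `signal_sketch()`, `signal_would_recurse()` and the demo callbacks are hypothetical names. A caller that finds the counter non-zero defers its notification instead of signalling inline.

```c
#include <stdbool.h>

/* Stand-in for the per-CPU recursion counter. */
static _Thread_local int signal_depth;

/* Exposed so callers can decide to defer instead of signalling inline. */
static bool signal_would_recurse(void)
{
    return signal_depth > 0;
}

typedef void (*wake_fn)(void *arg);

/*
 * Deliver a wakeup unless we are already inside one on this thread.
 * Returns false (nothing delivered) when recursion is detected.
 */
static bool signal_sketch(wake_fn fn, void *arg)
{
    if (signal_depth > 0)
        return false;

    signal_depth++;
    fn(arg);
    signal_depth--;
    return true;
}

/* Demo bookkeeping: how many wakeups ran inline and how many were deferred. */
struct demo_stats {
    int delivered;
    int deferred;
};

static void leaf_wakeup(void *arg)
{
    ((struct demo_stats *)arg)->delivered++;
}

/* A wakeup handler that itself wants to signal again. */
static void nested_wakeup(void *arg)
{
    struct demo_stats *st = arg;

    st->delivered++;
    if (signal_would_recurse())
        st->deferred++; /* a real caller would queue async work here */
    else
        signal_sketch(leaf_wakeup, st);
}
```

Calling `signal_sketch(nested_wakeup, &stats)` runs the outer handler once and records one deferral, because the inner attempt sees a non-zero depth.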
      
      Cc: stable@vger.kernel.org # 4.19+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      844d2025
    • Coly Li's avatar
      bcache: add readahead cache policy options via sysfs interface · d5d6b588
      Coly Li authored
      commit 038ba8cc upstream.
      
      In 2007, high-performance SSDs were still expensive. To save space
      for real workloads and metadata, readahead I/O for non-metadata was
      bypassed and not cached on the SSD.
      
      Nowadays SSD prices have dropped considerably and larger SSDs are
      affordable, so it is no longer necessary to always bypass normal
      readahead I/O to save SSD space.
      
      This patch adds options for readahead data cache policies via sysfs
      file /sys/block/bcache<N>/readahead_cache_policy, the options are,
      - "all": cache all readahead data I/Os.
      - "meta-only": only cache meta data, and bypass other regular I/Os.
      
      If users want bcache to continue caching readahead requests only for
      metadata and bypassing regular data readahead, they should write
      "meta-only" to this sysfs file. By default, bcache now goes back to
      caching all readahead requests.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Coly Li <colyli@suse.de>
      Acked-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5d6b588
    • Vladis Dronov's avatar
      watchdog: fix UAF in reboot notifier handling in watchdog core code · f158399c
      Vladis Dronov authored
      commit 69503e58 upstream.
      
      After the commit 44ea3942 ("drivers/watchdog: make use of
      devm_register_reboot_notifier()") the struct notifier_block reboot_nb in
      the struct watchdog_device is removed from the reboot notifiers chain at
      the time watchdog's chardev is closed. But at least in i6300esb.c case
      reboot_nb is embedded in the struct esb_dev which can be freed on its
      device removal and before the chardev is closed, thus UAF at reboot:
      
      [    7.728581] esb_probe: esb_dev.watchdog_device ffff91316f91ab28
      ts# uname -r                            note the address ^^^
      5.5.0-rc5-ae6088-wdog
      ts# ./openwdog0 &
      [1] 696
      ts# opened /dev/watchdog0, sleeping 10s...
      ts# echo 1 > /sys/devices/pci0000\:00/0000\:00\:09.0/remove
      [  178.086079] devres:rel_nodes: dev ffff91317668a0b0 data ffff91316f91ab28
                 esb_dev.watchdog_device.reboot_nb memory is freed here ^^^
      ts# ...woken up
      [  181.459010] devres:rel_nodes: dev ffff913171781000 data ffff913174a1dae8
      [  181.460195] devm_unreg_reboot_notifier: res ffff913174a1dae8 nb ffff91316f91ab78
                                           attempt to use memory already freed ^^^
      [  181.461063] devm_unreg_reboot_notifier: nb->call 6b6b6b6b6b6b6b6b
      [  181.461243] devm_unreg_reboot_notifier: nb->next 6b6b6b6b6b6b6b6b
                      freed memory is filled with a slub poison ^^^
      [1]+  Done                    ./openwdog0
      ts# reboot
      [  229.921862] systemd-shutdown[1]: Rebooting.
      [  229.939265] notifier_call_chain: nb ffffffff9c6c2f20 nb->next ffffffff9c6d50c0
      [  229.943080] notifier_call_chain: nb ffffffff9c6d50c0 nb->next 6b6b6b6b6b6b6b6b
      [  229.946054] notifier_call_chain: nb 6b6b6b6b6b6b6b6b INVAL
      [  229.957584] general protection fault: 0000 [#1] SMP
      [  229.958770] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 5.5.0-rc5-ae6088-wdog
      [  229.960224] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
      [  229.963288] RIP: 0010:notifier_call_chain+0x66/0xd0
      [  229.969082] RSP: 0018:ffffb20dc0013d88 EFLAGS: 00010246
      [  229.970812] RAX: 000000000000002e RBX: 6b6b6b6b6b6b6b6b RCX: 00000000000008b3
      [  229.972929] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff9ccc46ac
      [  229.975028] RBP: 0000000000000001 R08: 0000000000000000 R09: 00000000000008b3
      [  229.977039] R10: 0000000000000001 R11: ffffffff9c26c740 R12: 0000000000000000
      [  229.979155] R13: 6b6b6b6b6b6b6b6b R14: 0000000000000000 R15: 00000000fffffffa
      ...   slub_debug=FZP poison ^^^
      [  229.989089] Call Trace:
      [  229.990157]  blocking_notifier_call_chain+0x43/0x59
      [  229.991401]  kernel_restart_prepare+0x14/0x30
      [  229.992607]  kernel_restart+0x9/0x30
      [  229.993800]  __do_sys_reboot+0x1d2/0x210
      [  230.000149]  do_syscall_64+0x3d/0x130
      [  230.001277]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  230.002639] RIP: 0033:0x7f5461bdd177
      [  230.016402] Modules linked in: i6300esb
      [  230.050261] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
      
      Fix the crash by reverting 44ea3942 so unregister_reboot_notifier()
      is called when watchdog device is removed. This also makes handling of
      the reboot notifier unified with the handling of the restart handler,
      which is freed with unregister_restart_handler() in the same place.
      
      Fixes: 44ea3942 ("drivers/watchdog: make use of devm_register_reboot_notifier()")
      Cc: stable@vger.kernel.org # v4.15+
      Signed-off-by: Vladis Dronov <vdronov@redhat.com>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Link: https://lore.kernel.org/r/20200108125347.6067-1-vdronov@redhat.com
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f158399c