  1. Sep 05, 2022
    • fbdev: fb_pm2fb: Avoid potential divide by zero error · 8fc778ee
      Letu Ren authored
      commit 19f953e7 upstream.
      
      In `do_fb_ioctl()` of fbmem.c, if cmd is FBIOPUT_VSCREENINFO, var is
      copied from user space and then passed through `fb_set_var()` to
      `info->fbops->fb_check_var()`, which may be `pm2fb_check_var()`.
      Along this path, `var->pixclock` is not modified. `pm2fb_check_var()`
      checks whether the reciprocal of `var->pixclock` is too high, so if
      `var->pixclock` is zero, a divide by zero error occurs. It is therefore
      necessary to check whether the denominator is zero to avoid a crash.
      This bug was found by Syzkaller; the relevant log is listed below.
      
      divide error in pm2fb_check_var
      Call Trace:
       <TASK>
       fb_set_var+0x367/0xeb0 drivers/video/fbdev/core/fbmem.c:1015
       do_fb_ioctl+0x234/0x670 drivers/video/fbdev/core/fbmem.c:1110
       fb_ioctl+0xdd/0x130 drivers/video/fbdev/core/fbmem.c:1189
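
      A minimal user-space sketch of the guard described above. It assumes the
      kernel's PICOS2KHZ() definition (1000000000UL divided by a period in
      picoseconds). The struct var_info type and the MAX_PIXCLOCK_KHZ limit are
      hypothetical stand-ins and do not reflect the driver's actual names or values.

```c
#include <errno.h>
#include <stdint.h>

/* Picoseconds -> kHz, matching the PICOS2KHZ() macro in <linux/fb.h>. */
#define PICOS2KHZ(a) (1000000000UL / (a))

/* Hypothetical upper limit in kHz; the real driver defines its own. */
#define MAX_PIXCLOCK_KHZ 230000UL

/* Hypothetical stand-in for the relevant field of struct fb_var_screeninfo. */
struct var_info {
	uint32_t pixclock; /* pixel clock period in picoseconds */
};

/* Validate pixclock; the zero check must come before the reciprocal. */
static int check_pixclock(const struct var_info *var)
{
	if (var->pixclock == 0)
		return -EINVAL; /* PICOS2KHZ(0) would divide by zero */
	if (PICOS2KHZ(var->pixclock) > MAX_PIXCLOCK_KHZ)
		return -EINVAL; /* requested clock is too fast */
	return 0;
}
```

      With this ordering, a zero value from FBIOPUT_VSCREENINFO is rejected
      with -EINVAL. It never reaches the division.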
      
      Reported-by: Zheyu Ma <zheyuma97@gmail.com>
      Signed-off-by: Letu Ren <fantasquex@gmail.com>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: fix refcount bug in sk_psock_get (2) · 61cc7985
      Hawkins Jiawei authored
      commit 2a013372 upstream.
      
      Syzkaller reports a refcount bug as follows:
      ------------[ cut here ]------------
      refcount_t: saturated; leaking memory.
      WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19
      Modules linked in:
      CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0
       <TASK>
       __refcount_add_not_zero include/linux/refcount.h:163 [inline]
       __refcount_inc_not_zero include/linux/refcount.h:227 [inline]
       refcount_inc_not_zero include/linux/refcount.h:245 [inline]
       sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439
       tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091
       tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983
       tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057
       tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659
       tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682
       sk_backlog_rcv include/net/sock.h:1061 [inline]
       __release_sock+0x134/0x3b0 net/core/sock.c:2849
       release_sock+0x54/0x1b0 net/core/sock.c:3404
       inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909
       __sys_shutdown_sock net/socket.c:2331 [inline]
       __sys_shutdown_sock net/socket.c:2325 [inline]
       __sys_shutdown+0xf1/0x1b0 net/socket.c:2343
       __do_sys_shutdown net/socket.c:2351 [inline]
       __se_sys_shutdown net/socket.c:2349 [inline]
       __x64_sys_shutdown+0x50/0x70 net/socket.c:2349
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
       </TASK>
      
      During the SMC fallback process in the connect syscall, the kernel
      replaces TCP with SMC. To forward wakeups to the smc socket wait queue
      after fallback, the kernel sets clcsk->sk_user_data to the original smc
      socket in smc_fback_replace_callbacks().

      Later, in the shutdown syscall, the kernel calls sk_psock_get(), which
      treats clcsk->sk_user_data as a psock, triggering the refcount warning.

      So the root cause is that smc and psock both use the sk_user_data
      field, and each can easily misinterpret the other's value.

      This patch solves it by using another bit (defined as
      SK_USER_DATA_PSOCK) in PTRMASK to mark whether sk_user_data points to
      a psock object. This patch depends on the PTRMASK introduced in commit
      f1ff5ce2 ("net, sk_msg: Clear sk_user_data pointer on clone if tagged").

      Because more flags may be added to the sk_user_data field in the
      future, this patch also refactors the sk_user_data flag code to be more
      generic and improve its maintainability.
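
      A minimal user-space sketch of the pointer-tagging idea. It assumes
      8-byte-aligned objects, which leaves the low three pointer bits free for
      flags. The USER_DATA_* names mirror the SK_USER_DATA_* scheme described
      above but are hypothetical here. struct fake_sock, struct psock and
      struct smc_sock are simplified stand-ins, not the kernel types.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag bits stored in the low bits of the user-data pointer (hypothetical names). */
#define USER_DATA_NOCOPY   (1UL << 0)
#define USER_DATA_BPF      (1UL << 1)
#define USER_DATA_PSOCK    (1UL << 2)
#define USER_DATA_BITCOUNT 3
#define USER_DATA_PTRMASK  (~((1UL << USER_DATA_BITCOUNT) - 1))

/* 8-byte alignment guarantees the three low pointer bits are zero. */
struct psock    { _Alignas(8) int refcnt; };
struct smc_sock { _Alignas(8) int state; };

/* Simplified socket holding a tagged pointer, like sk->sk_user_data. */
struct fake_sock { uintptr_t user_data; };

static void set_user_data(struct fake_sock *sk, void *ptr, unsigned long flags)
{
	sk->user_data = (uintptr_t)ptr | flags;
}

/* Return the untagged pointer only if all requested flags are set. */
static void *get_user_data_with_flags(const struct fake_sock *sk, unsigned long flags)
{
	uintptr_t v = sk->user_data;

	if ((v & flags) != flags)
		return NULL; /* e.g. an smc socket, not a psock */
	return (void *)(v & USER_DATA_PTRMASK);
}
```

      A psock lookup that asks for the PSOCK bit gets NULL for an smc socket,
      so no reference is taken on the wrong object type.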
      
      Reported-and-tested-by: syzbot+5f26f85569bd179c18ce@syzkaller.appspotmail.com
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Wen Gu <guwen@linux.alibaba.com>
      Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • HID: hidraw: fix memory leak in hidraw_release() · 7e2fa792
      Karthik Alapati authored
      commit a5623a20 upstream.
      
      Free the buffered reports before deleting the list entry.
      
      BUG: memory leak
      unreferenced object 0xffff88810e72f180 (size 32):
        comm "softirq", pid 0, jiffies 4294945143 (age 16.080s)
        hex dump (first 32 bytes):
          64 f3 c6 6a d1 88 07 04 00 00 00 00 00 00 00 00  d..j............
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff814ac6c3>] kmemdup+0x23/0x50 mm/util.c:128
          [<ffffffff8357c1d2>] kmemdup include/linux/fortify-string.h:440 [inline]
          [<ffffffff8357c1d2>] hidraw_report_event+0xa2/0x150 drivers/hid/hidraw.c:521
          [<ffffffff8356ddad>] hid_report_raw_event+0x27d/0x740 drivers/hid/hid-core.c:1992
          [<ffffffff8356e41e>] hid_input_report+0x1ae/0x270 drivers/hid/hid-core.c:2065
          [<ffffffff835f0d3f>] hid_irq_in+0x1ff/0x250 drivers/hid/usbhid/hid-core.c:284
          [<ffffffff82d3c7f9>] __usb_hcd_giveback_urb+0xf9/0x230 drivers/usb/core/hcd.c:1670
          [<ffffffff82d3cc26>] usb_hcd_giveback_urb+0x1b6/0x1d0 drivers/usb/core/hcd.c:1747
          [<ffffffff82ef1e14>] dummy_timer+0x8e4/0x14c0 drivers/usb/gadget/udc/dummy_hcd.c:1988
          [<ffffffff812f50a8>] call_timer_fn+0x38/0x200 kernel/time/timer.c:1474
          [<ffffffff812f5586>] expire_timers kernel/time/timer.c:1519 [inline]
          [<ffffffff812f5586>] __run_timers.part.0+0x316/0x430 kernel/time/timer.c:1790
          [<ffffffff812f56e4>] __run_timers kernel/time/timer.c:1768 [inline]
          [<ffffffff812f56e4>] run_timer_softirq+0x44/0x90 kernel/time/timer.c:1803
          [<ffffffff848000e6>] __do_softirq+0xe6/0x2ea kernel/softirq.c:571
          [<ffffffff81246db0>] invoke_softirq kernel/softirq.c:445 [inline]
          [<ffffffff81246db0>] __irq_exit_rcu kernel/softirq.c:650 [inline]
          [<ffffffff81246db0>] irq_exit_rcu+0xc0/0x110 kernel/softirq.c:662
          [<ffffffff84574f02>] sysvec_apic_timer_interrupt+0xa2/0xd0 arch/x86/kernel/apic/apic.c:1106
          [<ffffffff84600c8b>] asm_sysvec_apic_timer_interrupt+0x1b/0x20 arch/x86/include/asm/idtentry.h:649
          [<ffffffff8458a070>] native_safe_halt arch/x86/include/asm/irqflags.h:51 [inline]
          [<ffffffff8458a070>] arch_safe_halt arch/x86/include/asm/irqflags.h:89 [inline]
          [<ffffffff8458a070>] acpi_safe_halt drivers/acpi/processor_idle.c:111 [inline]
          [<ffffffff8458a070>] acpi_idle_do_entry+0xc0/0xd0 drivers/acpi/processor_idle.c:554
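
      A simplified user-space sketch of the leak and its fix. It assumes a ring
      buffer of copied reports with head (write) and tail (read) indices,
      loosely modeled on hidraw's per-reader list. struct reader_list,
      queue_report(), release_list() and LIST_SIZE are hypothetical names.
      malloc()/free() stand in for kmemdup()/kfree().

```c
#include <stdlib.h>
#include <string.h>

#define LIST_SIZE 64 /* hypothetical ring size */

struct report {
	unsigned char *value;
	size_t len;
};

struct reader_list {
	struct report buffer[LIST_SIZE];
	int head; /* next slot to write */
	int tail; /* next slot to read */
};

/* Queue a private copy of a report, as the event path does with kmemdup(). */
static int queue_report(struct reader_list *list, const unsigned char *data, size_t len)
{
	int new_head = (list->head + 1) % LIST_SIZE;

	if (new_head == list->tail)
		return -1; /* ring full: drop the report */
	unsigned char *copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, data, len);
	list->buffer[list->head].value = copy;
	list->buffer[list->head].len = len;
	list->head = new_head;
	return 0;
}

/* Free reports that were queued but never read, then free the list itself. */
static int release_list(struct reader_list *list)
{
	int freed = 0;

	for (int i = list->tail; i != list->head; i = (i + 1) % LIST_SIZE) {
		free(list->buffer[i].value);
		list->buffer[i].value = NULL;
		freed++;
	}
	free(list);
	return freed;
}
```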
      
      Link: https://syzkaller.appspot.com/bug?id=19a04b43c75ed1092021010419b5e560a8172c4f
      Reported-by: syzbot+f59100a0428e6ded9443@syzkaller.appspotmail.com
      Signed-off-by: Karthik Alapati <mail@karthek.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • media: pvrusb2: fix memory leak in pvr_probe · bacb37bd
      Dongliang Mu authored
      commit 945a9a8e upstream.
      
      The error handling code in pvr2_hdw_create forgets to unregister the
      v4l2 device. When pvr2_hdw_create returns to pvr2_context_create, the
      latter calls pvr2_context_destroy to destroy the context, but mp->hdw
      is NULL, so pvr2_hdw_destroy returns immediately.

      Fix this by adding v4l2_device_unregister to decrease the refcount of
      the usb interface.
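
      A minimal sketch of the goto-based unwind pattern that the fix restores.
      It assumes that registration takes a reference, which every later error
      path must drop. All names here (struct fake_hdw, fake_v4l2_register(),
      fake_v4l2_unregister(), hdw_create(), intf_refcount) are hypothetical
      stand-ins, not pvrusb2 or V4L2 APIs.

```c
#include <stdlib.h>

/* Hypothetical reference count standing in for the USB interface's. */
static int intf_refcount;

struct fake_hdw {
	int v4l2_registered;
};

/* Hypothetical stand-ins for v4l2_device_register()/v4l2_device_unregister(). */
static void fake_v4l2_register(struct fake_hdw *hdw)
{
	hdw->v4l2_registered = 1;
	intf_refcount++; /* registration takes a reference */
}

static void fake_v4l2_unregister(struct fake_hdw *hdw)
{
	if (hdw->v4l2_registered) {
		hdw->v4l2_registered = 0;
		intf_refcount--; /* drop the reference again */
	}
}

/* fail_late simulates an error that occurs after registration. */
static struct fake_hdw *hdw_create(int fail_late)
{
	struct fake_hdw *hdw = calloc(1, sizeof(*hdw));

	if (!hdw)
		return NULL;
	fake_v4l2_register(hdw);
	if (fail_late)
		goto fail;
	return hdw;

fail:
	fake_v4l2_unregister(hdw); /* the step the error path was missing */
	free(hdw);
	return NULL;
}
```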
      
      Reported-by: syzbot+77b432d57c4791183ed4@syzkaller.appspotmail.com
      Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
      Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • udmabuf: Set the DMA mask for the udmabuf device (v2) · 872875c9
      Vivek Kasireddy authored
      commit 9e9fa6a9 upstream.
      
      If the DMA mask is not set explicitly, the following warning occurs
      when userspace tries to access the dma-buf via the CPU, as reported
      by syzbot:
      
      WARNING: CPU: 1 PID: 3595 at kernel/dma/mapping.c:188
      __dma_map_sg_attrs+0x181/0x1f0 kernel/dma/mapping.c:188
      Modules linked in:
      CPU: 0 PID: 3595 Comm: syz-executor249 Not tainted
      5.17.0-rc2-syzkaller-00316-g0457e5153e0e #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:__dma_map_sg_attrs+0x181/0x1f0 kernel/dma/mapping.c:188
      Code: 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 75 71 4c 8b 3d c0
      83 b5 0d e9 db fe ff ff e8 b6 0f 13 00 0f 0b e8 af 0f 13 00 <0f> 0b 45
         31 e4 e9 54 ff ff ff e8 a0 0f 13 00 49 8d 7f 50 48 b8 00
      RSP: 0018:ffffc90002a07d68 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: ffff88807e25e2c0 RSI: ffffffff81649e91 RDI: ffff88801b848408
      RBP: ffff88801b848000 R08: 0000000000000002 R09: ffff88801d86c74f
      R10: ffffffff81649d72 R11: 0000000000000001 R12: 0000000000000002
      R13: ffff88801d86c680 R14: 0000000000000001 R15: 0000000000000000
      FS:  0000555556e30300(0000) GS:ffff8880b9d00000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000200000cc CR3: 000000001d74a000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       dma_map_sgtable+0x70/0xf0 kernel/dma/mapping.c:264
       get_sg_table.isra.0+0xe0/0x160 drivers/dma-buf/udmabuf.c:72
       begin_cpu_udmabuf+0x130/0x1d0 drivers/dma-buf/udmabuf.c:126
       dma_buf_begin_cpu_access+0xfd/0x1d0 drivers/dma-buf/dma-buf.c:1164
       dma_buf_ioctl+0x259/0x2b0 drivers/dma-buf/dma-buf.c:363
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:874 [inline]
       __se_sys_ioctl fs/ioctl.c:860 [inline]
       __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f62fcf530f9
      Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89
      f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
      f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007ffe3edab9b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f62fcf530f9
      RDX: 0000000020000200 RSI: 0000000040086200 RDI: 0000000000000006
      RBP: 00007f62fcf170e0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f62fcf17170
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       </TASK>
      
      v2: Don't forget to deregister if DMA mask setup fails.
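
      A minimal sketch of the init ordering implied by the v2 note: register
      the device, set a DMA mask, and undo the registration if the mask cannot
      be set. DMA_BIT_MASK() matches the kernel macro. The 64-bit mask width is
      an assumption. The fake_* functions are hypothetical stand-ins for
      misc_register(), misc_deregister() and dma_coerce_mask_and_coherent()
      that only track state.

```c
#include <errno.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* State tracked by the hypothetical stand-ins below. */
static int registered;
static uint64_t dma_mask;
static int mask_should_fail; /* test knob: force the mask setup to fail */

static int fake_misc_register(void) { registered = 1; return 0; }
static void fake_misc_deregister(void) { registered = 0; }

static int fake_set_dma_mask(uint64_t mask)
{
	if (mask_should_fail)
		return -EIO;
	dma_mask = mask;
	return 0;
}

static int fake_udmabuf_init(void)
{
	int ret = fake_misc_register();

	if (ret < 0)
		return ret;
	ret = fake_set_dma_mask(DMA_BIT_MASK(64));
	if (ret < 0) {
		fake_misc_deregister(); /* v2: undo the registration */
		return ret;
	}
	return 0;
}
```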
      
      Reported-by: syzbot+10e27961f4da37c443b2@syzkaller.appspotmail.com
      Cc: Gerd Hoffmann <kraxel@redhat.com>
      Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20220520205235.3687336-1-vivek.kasireddy@intel.com
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • HID: steam: Prevent NULL pointer dereference in steam_{recv,send}_report · dc815761
      Lee Jones authored
      commit cd11d1a6 upstream.
      
      It is possible for a malicious device to forgo submitting a Feature
      Report. The HID Steam driver presently makes no provision for this
      and dereferences the 'struct hid_report' pointer obtained from the
      HID device without first checking its validity. Let's change that.
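
      A minimal sketch of the missing validity check. It assumes the driver
      fetches a report pointer that is NULL when the device never submitted a
      Feature Report. struct fake_device, struct fake_report and
      fake_recv_report() are hypothetical stand-ins. The -EINVAL return value
      is also an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct hid_report and the device. */
struct fake_report {
	int size;
};

struct fake_device {
	/* NULL if the device never submitted a Feature Report */
	struct fake_report *feature_report;
};

static int fake_recv_report(const struct fake_device *dev, int *size_out)
{
	const struct fake_report *r = dev->feature_report;

	if (!r)
		return -EINVAL; /* check validity before dereferencing */
	*size_out = r->size;
	return 0;
}
```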
      
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Cc: linux-input@vger.kernel.org
      Fixes: c164d6ab ("HID: add driver for Valve Steam Controller")
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "PCI/portdrv: Don't disable AER reporting in get_port_device_capability()" · 412b8441
      Greg Kroah-Hartman authored
      This reverts commit ee70aa21, which is
      commit 8795e182 upstream.
      
      It is reported to cause problems, so drop it from the stable trees for
      now until it gets sorted out.
      
      Link: https://lore.kernel.org/r/47b775c5-57fa-5edf-b59e-8a9041ffbee7@candelatech.com
      Reported-by: Ben Greear <greearb@candelatech.com>
      Cc: Stefan Roese <sr@denx.de>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Pali Rohár <pali@kernel.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
      Cc: Michal Simek <michal.simek@xilinx.com>
      Cc: Yao Hongbo <yaohongbo@linux.alibaba.com>
      Cc: Naveen Naidu <naveennaidu479@gmail.com>
      Cc: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Bluetooth: L2CAP: Fix build errors in some archs · 38267d26
      Luiz Augusto von Dentz authored
      commit b840304f upstream.
      
      This attempts to fix the following errors:
      
      In function 'memcmp',
          inlined from 'bacmp' at ./include/net/bluetooth/bluetooth.h:347:9,
          inlined from 'l2cap_global_chan_by_psm' at
          net/bluetooth/l2cap_core.c:2003:15:
      ./include/linux/fortify-string.h:44:33: error: '__builtin_memcmp'
      specified bound 6 exceeds source size 0 [-Werror=stringop-overread]
         44 | #define __underlying_memcmp     __builtin_memcmp
            |                                 ^
      ./include/linux/fortify-string.h:420:16: note: in expansion of macro
      '__underlying_memcmp'
        420 |         return __underlying_memcmp(p, q, size);
            |                ^~~~~~~~~~~~~~~~~~~
      In function 'memcmp',
          inlined from 'bacmp' at ./include/net/bluetooth/bluetooth.h:347:9,
          inlined from 'l2cap_global_chan_by_psm' at
          net/bluetooth/l2cap_core.c:2004:15:
      ./include/linux/fortify-string.h:44:33: error: '__builtin_memcmp'
      specified bound 6 exceeds source size 0 [-Werror=stringop-overread]
         44 | #define __underlying_memcmp     __builtin_memcmp
            |                                 ^
      ./include/linux/fortify-string.h:420:16: note: in expansion of macro
      '__underlying_memcmp'
        420 |         return __underlying_memcmp(p, q, size);
            |                ^~~~~~~~~~~~~~~~~~~
      
      Fixes: 332f1795 ("Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • kbuild: Fix include path in scripts/Makefile.modpost · ad697ade
      Jing Leng authored
      commit 23a0cb8e upstream.
      
      When building an external module, if users don't need to separate the
      compilation output and source code, they run the following command:
      "make -C $(LINUX_SRC_DIR) M=$(PWD)". At this point, "$(KBUILD_EXTMOD)"
      and "$(src)" are the same.
      
      If they need to separate them, they run "make -C $(KERNEL_SRC_DIR)
      O=$(KERNEL_OUT_DIR) M=$(OUT_DIR) src=$(PWD)". Before running the
      command, they need to copy "Kbuild" or "Makefile" to "$(OUT_DIR)" to
      prevent compilation failure.
      
      So the kernel should change the include path to avoid the copy operation.
      
      Signed-off-by: Jing Leng <jleng@ambarella.com>
      [masahiro: I do not think "M=$(OUT_DIR) src=$(PWD)" is the official way,
      but this patch is a nice clean up anyway.]
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: Nicolas Schier <n.schier@avm.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/mm: do not trigger write fault when vma does not allow VM_WRITE · b9feeb61
      Gerald Schaefer authored
      commit 41ac42f1 upstream.
      
      For non-protection pXd_none() page faults in do_dat_exception(), we
      call do_exception() with access == (VM_READ | VM_WRITE | VM_EXEC).
      In do_exception(), vma->vm_flags is checked against that before
      calling handle_mm_fault().
      
      Since commit 92f842ea ("[S390] store indication fault optimization"),
      we call handle_mm_fault() with FAULT_FLAG_WRITE, when recognizing that
      it was a write access. However, the vma flags check is still only
      checking against (VM_READ | VM_WRITE | VM_EXEC), and therefore also
      calling handle_mm_fault() with FAULT_FLAG_WRITE in cases where the vma
      does not allow VM_WRITE.
      
      Fix this by changing the access check in do_exception() to VM_WRITE
      only when recognizing a write access.
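
      A minimal sketch of the access check before and after the fix, modeled on
      the `vma->vm_flags & access` test in do_exception(). The VM_READ,
      VM_WRITE and VM_EXEC values match the kernel's bit values.
      access_allowed() and its fixed parameter are hypothetical, added only to
      compare the two behaviours.

```c
/* Same bit values as the kernel's VM_READ, VM_WRITE and VM_EXEC. */
#define VM_READ  0x00000001UL
#define VM_WRITE 0x00000002UL
#define VM_EXEC  0x00000004UL

/*
 * Hypothetical helper mirroring the "vm_flags & access" test in
 * do_exception(). With fixed == 0 it behaves as before the fix.
 */
static int access_allowed(unsigned long vm_flags, int is_write, int fixed)
{
	unsigned long access = VM_READ | VM_WRITE | VM_EXEC;

	if (fixed && is_write)
		access = VM_WRITE; /* a write fault needs VM_WRITE specifically */
	return (vm_flags & access) != 0;
}
```

      A read-only vma passes the old combined check even for a write access.
      That is how handle_mm_fault() could end up being called with
      FAULT_FLAG_WRITE.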
      
      Link: https://lkml.kernel.org/r/20220811103435.188481-3-david@redhat.com
      Fixes: 92f842ea ("[S390] store indication fault optimization")
      Cc: <stable@vger.kernel.org>
      Reported-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: lib - remove unneeded selection of XOR_BLOCKS · 0dea6b3e
      Eric Biggers authored
      commit 874b3019 upstream.
      
      CRYPTO_LIB_CHACHA_GENERIC doesn't need to select XOR_BLOCKS.  It perhaps
      was thought that it's needed for __crypto_xor, but that's not the case.
      
      Enabling XOR_BLOCKS is problematic because the XOR_BLOCKS code runs a
      benchmark when it is initialized.  That causes a boot time regression on
      systems that didn't have it enabled before.
      
      Therefore, remove this unnecessary and problematic selection.
      
      Fixes: e56e1898 ("lib/crypto: add prompts back to crypto libraries")
      Cc: stable@vger.kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/nospec: Fix i386 RSB stuffing · e5796ff9
      Peter Zijlstra authored
      commit 33292497 upstream.
      
      It turns out that i386 doesn't unconditionally have LFENCE; as such, the
      loop in __FILL_RETURN_BUFFER isn't actually speculation safe on such
      chips.
      
      Fixes: ba6e31af ("x86/speculation: Add LFENCE to RSB fill sequence")
      Reported-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/Yv9tj9vbQ9nNlXoY@worktop.programming.kicks-ass.net
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/nospec: Unwreck the RSB stuffing · adee8f30
      Peter Zijlstra authored
      commit 4e3aa923 upstream.
      
      Commit 2b129932 ("x86/speculation: Add RSB VM Exit protections")
      made a right mess of the RSB stuffing; rewrite the whole thing to not
      suck.
      
      Thanks to Andrew for the enlightening comment about Post-Barrier RSB
      things so we can make this code less magical.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/YvuNdDWoUZSBjYcm@worktop.programming.kicks-ass.net
      [bwh: Backported to 5.10: adjust context]
      Signed-off-by: Ben Hutchings <benh@debian.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: Force TLB flush for PFNMAP mappings before unlink_file_vma() · 895428ee
      Jann Horn authored
      commit b67fbebd upstream.
      
      Some drivers rely on having all VMAs through which a PFN might be
      accessible listed in the rmap for correctness.
      However, on X86, it was possible for a VMA with stale TLB entries
      to not be listed in the rmap.
      
      This was fixed in mainline with
      commit b67fbebd ("mmu_gather: Force tlb-flush VM_PFNMAP vmas"),
      but that commit relies on preceding refactoring in
      commit 18ba064e ("mmu_gather: Let there be one tlb_{start,end}_vma()
      implementation") and commit 1e9fdf21 ("mmu_gather: Remove per arch
      tlb_{start,end}_vma()").
      
      This patch provides equivalent protection without needing that
      refactoring, by forcing a TLB flush between removing PTEs in
      unmap_vmas() and the call to unlink_file_vma() in free_pgtables().
      
      [This is a stable-specific rewrite of the upstream commit!]
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. Aug 31, 2022
    • Greg Kroah-Hartman's avatar
    • bpf: Don't use tnum_range on array range checking for poke descriptors · e8979807
      Daniel Borkmann authored
      commit a657182a upstream.
      
      Hsin-Wei reported a KASAN splat triggered by their BPF runtime fuzzer which
      is based on a customized syzkaller:
      
        BUG: KASAN: slab-out-of-bounds in bpf_int_jit_compile+0x1257/0x13f0
        Read of size 8 at addr ffff888004e90b58 by task syz-executor.0/1489
        CPU: 1 PID: 1489 Comm: syz-executor.0 Not tainted 5.19.0 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
        1.13.0-1ubuntu1.1 04/01/2014
        Call Trace:
         <TASK>
         dump_stack_lvl+0x9c/0xc9
         print_address_description.constprop.0+0x1f/0x1f0
         ? bpf_int_jit_compile+0x1257/0x13f0
         kasan_report.cold+0xeb/0x197
         ? kvmalloc_node+0x170/0x200
         ? bpf_int_jit_compile+0x1257/0x13f0
         bpf_int_jit_compile+0x1257/0x13f0
         ? arch_prepare_bpf_dispatcher+0xd0/0xd0
         ? rcu_read_lock_sched_held+0x43/0x70
         bpf_prog_select_runtime+0x3e8/0x640
         ? bpf_obj_name_cpy+0x149/0x1b0
         bpf_prog_load+0x102f/0x2220
         ? __bpf_prog_put.constprop.0+0x220/0x220
         ? find_held_lock+0x2c/0x110
         ? __might_fault+0xd6/0x180
         ? lock_downgrade+0x6e0/0x6e0
         ? lock_is_held_type+0xa6/0x120
         ? __might_fault+0x147/0x180
         __sys_bpf+0x137b/0x6070
         ? bpf_perf_link_attach+0x530/0x530
         ? new_sync_read+0x600/0x600
         ? __fget_files+0x255/0x450
         ? lock_downgrade+0x6e0/0x6e0
         ? fput+0x30/0x1a0
         ? ksys_write+0x1a8/0x260
         __x64_sys_bpf+0x7a/0xc0
         ? syscall_enter_from_user_mode+0x21/0x70
         do_syscall_64+0x3b/0x90
         entry_SYSCALL_64_after_hwframe+0x63/0xcd
        RIP: 0033:0x7f917c4e2c2d
      
      The problem here is that tnum_range(0, map->max_entries - 1) has only a
      limited ability to represent the concrete, tight range. The set of
      states described by value + mask can be a superset of the intended
      range, so a tnum_in(range, reg->var_off) check may yield true when it
      shouldn't. For example, tnum_range(0, 2) results in 00XX -> v = 0000,
      m = 0011, so the intended set {0, 1, 2} is represented by the less
      precise superset {0, 1, 2, 3}. As the register is a known constant
      scalar, just use the concrete reg->var_off.value for the upper index
      check.
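
      A user-space re-implementation of the two tnum helpers involved,
      following the logic of kernel/bpf/tnum.c, makes the imprecision
      concrete: tnum_range(0, 2) yields value 0 and mask 0x3, so tnum_in()
      accepts the out-of-bounds constant 3. index_in_bounds() is a
      hypothetical helper that illustrates the concrete comparison the fix
      switches to.

```c
#include <stdbool.h>
#include <stdint.h>

/* A tristate number: bits set in mask are unknown, the rest equal value. */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

#define TNUM(v, m) ((struct tnum){ .value = (v), .mask = (m) })

static struct tnum tnum_const(uint64_t v)
{
	return TNUM(v, 0);
}

/* 1-based index of the most significant set bit; 0 when x == 0. */
static int fls64(uint64_t x)
{
	int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

static struct tnum tnum_range(uint64_t min, uint64_t max)
{
	uint64_t chi = min ^ max, delta;
	int bits = fls64(chi);

	if (bits > 63) /* 1ULL << 64 is undefined: fully unknown */
		return TNUM(0, ~0ULL);
	delta = (1ULL << bits) - 1;
	return TNUM(min & ~delta, delta);
}

/* true if every value that b can take is also covered by a. */
static bool tnum_in(struct tnum a, struct tnum b)
{
	if (b.mask & ~a.mask)
		return false;
	b.value &= ~a.mask;
	return a.value == b.value;
}

/* Hypothetical helper: the exact check on a known constant index. */
static bool index_in_bounds(uint64_t idx, uint32_t max_entries)
{
	return idx < max_entries;
}
```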
      
      Fixes: d2e4c1e6 ("bpf: Constant map key tracking for prog array pokes")
      Reported-by: Hsin-Wei Hung <hsinweih@uci.edu>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Shung-Hsi Yu <shung-hsi.yu@suse.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/984b37f9fdf7ac36831d2137415a4a915744c1b6.1661462653.git.daniel@iogearbox.net
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: storvsc: Remove WQ_MEM_RECLAIM from storvsc_error_wq · 46fcb0fc
      Saurabh Sengar authored
      commit d957e7ff upstream.
      
      storvsc_error_wq workqueue should not be marked as WQ_MEM_RECLAIM as it
      doesn't need to make forward progress under memory pressure.  Marking this
      workqueue as WQ_MEM_RECLAIM may cause deadlock while flushing a
      non-WQ_MEM_RECLAIM workqueue.  In the current state it causes the following
      warning:
      
      [   14.506347] ------------[ cut here ]------------
      [   14.506354] workqueue: WQ_MEM_RECLAIM storvsc_error_wq_0:storvsc_remove_lun is flushing !WQ_MEM_RECLAIM events_freezable_power_:disk_events_workfn
      [   14.506360] WARNING: CPU: 0 PID: 8 at <-snip->kernel/workqueue.c:2623 check_flush_dependency+0xb5/0x130
      [   14.506390] CPU: 0 PID: 8 Comm: kworker/u4:0 Not tainted 5.4.0-1086-azure #91~18.04.1-Ubuntu
      [   14.506391] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/09/2022
      [   14.506393] Workqueue: storvsc_error_wq_0 storvsc_remove_lun
      [   14.506395] RIP: 0010:check_flush_dependency+0xb5/0x130
      		<-snip->
      [   14.506408] Call Trace:
      [   14.506412]  __flush_work+0xf1/0x1c0
      [   14.506414]  __cancel_work_timer+0x12f/0x1b0
      [   14.506417]  ? kernfs_put+0xf0/0x190
      [   14.506418]  cancel_delayed_work_sync+0x13/0x20
      [   14.506420]  disk_block_events+0x78/0x80
      [   14.506421]  del_gendisk+0x3d/0x2f0
      [   14.506423]  sr_remove+0x28/0x70
      [   14.506427]  device_release_driver_internal+0xef/0x1c0
      [   14.506428]  device_release_driver+0x12/0x20
      [   14.506429]  bus_remove_device+0xe1/0x150
      [   14.506431]  device_del+0x167/0x380
      [   14.506432]  __scsi_remove_device+0x11d/0x150
      [   14.506433]  scsi_remove_device+0x26/0x40
      [   14.506434]  storvsc_remove_lun+0x40/0x60
      [   14.506436]  process_one_work+0x209/0x400
      [   14.506437]  worker_thread+0x34/0x400
      [   14.506439]  kthread+0x121/0x140
      [   14.506440]  ? process_one_work+0x400/0x400
      [   14.506441]  ? kthread_park+0x90/0x90
      [   14.506443]  ret_from_fork+0x35/0x40
      [   14.506445] ---[ end trace 2d9633159fdc6ee7 ]---
      
      Link: https://lore.kernel.org/r/1659628534-17539-1-git-send-email-ssengar@linux.microsoft.com
      Fixes: 436ad941 ("scsi: storvsc: Allow only one remove lun work item to be issued per lun")
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: ufs: core: Enable link lost interrupt · 8d5c106f
      Kiwoong Kim authored
      commit 6d17a112 upstream.
      
      Link loss is treated as a fatal error since commit c99b9b23 ("scsi: ufs:
      Treat link loss as fatal error"), but the event isn't registered as an
      interrupt source. Enable it.
      
      Link: https://lore.kernel.org/r/1659404551-160958-1-git-send-email-kwmad.kim@samsung.com
      Fixes: c99b9b23 ("scsi: ufs: Treat link loss as fatal error")
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMU · c0ba9aa9
      Stephane Eranian authored
      commit 11745ecf upstream.
      
      Existing code was generating bogus counts for the SNB IMC bandwidth counters:
      
      $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/
           1.000327813           1,024.03 MiB  uncore_imc/data_reads/
           1.000327813              20.73 MiB  uncore_imc/data_writes/
           2.000580153         261,120.00 MiB  uncore_imc/data_reads/
           2.000580153              23.28 MiB  uncore_imc/data_writes/
      
      The problem was introduced by commit:
        07ce734d ("perf/x86/intel/uncore: Clean up client IMC")
      
      In that commit, the read_counter callback was replaced with a pointer to
      the generic uncore_mmio_read_counter() function.
      
      The SNB IMC counters are free-running 32-bit counters laid out
      contiguously in MMIO, but uncore_mmio_read_counter() uses readq() and
      therefore reads 64 bits from MMIO. This is okay for
      uncore_perf_event_update(), which shifts the value based on the actual
      counter width to compute a delta. It is not okay for
      uncore_pmu_event_start(), which simply reads the counter and therefore
      primes event->prev_count with a bogus value, causing the bogus deltas
      in the perf stat output above.

      The fix is to reintroduce the custom read_counter callback for the SNB
      IMC PMU and use readl() instead of readq(). With this change, the output
      of perf stat is back to normal:
      $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/
           1.000120987             296.94 MiB  uncore_imc/data_reads/
           1.000120987             138.42 MiB  uncore_imc/data_writes/
           2.000403144             175.91 MiB  uncore_imc/data_reads/
           2.000403144              68.50 MiB  uncore_imc/data_writes/
      
      Fixes: 07ce734d ("perf/x86/intel/uncore: Clean up client IMC")
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
      Link: https://lore.kernel.org/r/20220803160031.1379788-1-eranian@google.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c0ba9aa9
    • James Clark
      perf python: Fix build when PYTHON_CONFIG is user supplied · 5a768c97
      James Clark authored
      commit bc9e7fe3 upstream.
      
      The previous change to Python autodetection had a small mistake: the
      auto-detected value, rather than the user-supplied value, was used to
      determine the Python binary. The Python binary is only used for one
      part of the build process, not for the final linking, so it still
      produced correct builds in most scenarios, especially when the
      auto-detected value matched what the user wanted, or when the system
      only had a valid set of Pythons.
      
      Change it so that the Python binary path is derived from either the
      PYTHON_CONFIG value or PYTHON value, depending on what is specified by
      the user. This was the original intention.
      
      This error was spotted as a build failure in an odd cross-compilation
      environment after commit 4c41cb46 ("perf python: Prefer python3") was
      merged.
      
      Fixes: 630af16e ("perf tools: Use Python devtools for version autodetection rather than runtime")
      Signed-off-by: James Clark <james.clark@arm.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220728093946.1337642-1-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5a768c97
    • Yu Kuai
      blk-mq: fix io hung due to missing commit_rqs · 3ddbd090
      Yu Kuai authored
      commit 65fac0d5 upstream.
      
      Currently, in virtio_scsi, if 'bd->last' is not set to true while
      dispatching a request, that I/O stays in the driver's queue and the
      driver waits for the block layer to dispatch more requests. However,
      if the block layer fails to dispatch more requests, it should call
      commit_rqs to inform the driver.
      
      There is a problem in blk_mq_try_issue_list_directly() that commit_rqs
      won't be called:
      
      // assume that queue_depth is set to 1, list contains two rq
      blk_mq_try_issue_list_directly
       blk_mq_request_issue_directly
       // dispatch first rq
       // last is false
        __blk_mq_try_issue_directly
         blk_mq_get_dispatch_budget
         // succeed to get first budget
         __blk_mq_issue_directly
          scsi_queue_rq
           cmd->flags |= SCMD_LAST
            virtscsi_queuecommand
             kick = (sc->flags & SCMD_LAST) != 0
             // kick is false, first rq won't issue to disk
       queued++
      
       blk_mq_request_issue_directly
       // dispatch second rq
        __blk_mq_try_issue_directly
         blk_mq_get_dispatch_budget
         // failed to get second budget
       ret == BLK_STS_RESOURCE
        blk_mq_request_bypass_insert
       // errors is still 0
      
       if (!list_empty(list) || errors && ...)
        // won't pass, commit_rqs won't be called
      
      In this situation, the first rq relies on the second rq to be
      dispatched, while the second rq relies on the first rq to complete,
      so both of them hang.
      
      Fix the problem by also treating 'BLK_STS_*RESOURCE' as 'errors', since
      it means that the request was not queued successfully.

      The same problem exists in blk_mq_dispatch_rq_list(), but there
      'BLK_STS_*RESOURCE' can't be treated as 'errors'; fix it by calling
      commit_rqs if queue_rq returns 'BLK_STS_*RESOURCE'.
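A simplified model of the decision at the end of blk_mq_try_issue_list_directly() can make the scenario above easier to follow. This is a sketch under assumptions, not the kernel code: the enum and model_calls_commit_rqs() are hypothetical, only the `!list_empty(list) || errors` part of the condition from the trace is modelled (the elided `...` is left out), and `fixed` toggles whether a resource shortage counts as an error.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the blk_status_t values involved. */
enum model_sts {
	MODEL_STS_OK,
	MODEL_STS_RESOURCE,   /* like BLK_STS_RESOURCE / BLK_STS_DEV_RESOURCE */
	MODEL_STS_IOERR,      /* any other failure */
};

/*
 * sts[] holds the result of each issue attempt in order; list_left says
 * whether unissued requests remain on the list after the loop. Returns
 * true when commit_rqs would be called to kick the driver.
 */
bool model_calls_commit_rqs(const enum model_sts *sts, size_t n,
			    bool list_left, bool fixed)
{
	size_t errors = 0;

	for (size_t i = 0; i < n; i++) {
		if (sts[i] == MODEL_STS_RESOURCE) {
			/* The request is re-inserted and the loop stops. */
			if (fixed)
				errors++;
			break;
		}
		if (sts[i] != MODEL_STS_OK)
			errors++;     /* other failures were always counted */
	}
	return list_left || errors > 0;
}
```

For the two-request trace above (first request queued, second one hitting a resource shortage, nothing left on the list), the unfixed model returns false and the fixed one returns true.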
      
      Fixes: d666ba98 ("blk-mq: add mq_ops->commit_rqs()")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20220726122224.1790882-1-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ddbd090
    • Salvatore Bonaccorso
      Documentation/ABI: Mention retbleed vulnerability info file for sysfs · 7ca73d0a
      Salvatore Bonaccorso authored
      commit 00da0cb3 upstream.
      
      While reporting for the AMD retbleed vulnerability was added in
      
        6b80b59b ("x86/bugs: Report AMD retbleed vulnerability")
      
      the new sysfs file has not been mentioned so far in the ABI
      documentation for sysfs-devices-system-cpu. Fix that.
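As a hedged illustration of what the documented ABI entry exposes, a user-space program could read the retbleed status file as below. The path sits in the /sys/devices/system/cpu/vulnerabilities/ directory that sysfs-devices-system-cpu documents; read_status_line() is a hypothetical helper, and the exact status strings the kernel writes are not reproduced here.

```c
#include <stdio.h>
#include <string.h>

/* Path of the retbleed status file mentioned above. */
#define RETBLEED_SYSFS "/sys/devices/system/cpu/vulnerabilities/retbleed"

/*
 * Reads the first line of a sysfs status file into buf and strips the
 * trailing newline. Returns 0 on success, -1 if the file cannot be opened
 * or read (for example on kernels that do not provide it).
 */
int read_status_line(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Typical use: `char s[256]; if (read_status_line(RETBLEED_SYSFS, s, sizeof(s)) == 0) puts(s);`. On kernels without the file, the helper simply returns -1.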
      
      Fixes: 6b80b59b ("x86/bugs: Report AMD retbleed vulnerability")
      Signed-off-by: Salvatore Bonaccorso <carnil@debian.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lore.kernel.org/r/20220801091529.325327-1-carnil@debian.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7ca73d0a
    • Zenghui Yu
      arm64: Fix match_list for erratum 1286807 on Arm Cortex-A76 · 18962326
      Zenghui Yu authored
      commit 5e1e0874 upstream.
      
      Since commit 51f559d6 ("arm64: Enable repeat tlbi workaround on KRYO4XX
      gold CPUs"), we have failed to detect erratum 1286807 on Cortex-A76
      because that commit accidentally corrupted its entry in
      arm64_repeat_tlbi_list[].
      
      Fix this issue by creating a separate entry for Kryo4xx Gold.
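The commit message does not spell out how the entry was corrupted. One way a single C table entry can silently lose a match is shown in this hypothetical sketch: if two designated-initializer macros for the same member end up in one struct initializer, the later one overrides the earlier one. All names and part numbers below are placeholders, not the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified match entry: a CPU part number plus a
 * revision range. */
struct model_midr_range {
	uint32_t part;
	unsigned int rev_min, rev_max;
};

struct model_errata_entry {
	struct model_midr_range midr_range;
};

/* Hypothetical helper: expands to a designated initializer for the
 * single midr_range member. */
#define MODEL_MIDR_RANGE(p, lo, hi) \
	.midr_range = { .part = (p), .rev_min = (lo), .rev_max = (hi) }

#define PART_A 0x0a1u   /* placeholder standing in for Cortex-A76 */
#define PART_B 0x0b2u   /* placeholder standing in for Kryo4xx Gold */

/* Two ranges in ONE entry: the second initializer overrides the first
 * (compilers may warn about this), so PART_A is lost. */
const struct model_errata_entry broken_list[] = {
	{ MODEL_MIDR_RANGE(PART_A, 0, 3), MODEL_MIDR_RANGE(PART_B, 0, 3) },
};

/* The shape of the fix: a separate entry per CPU. */
const struct model_errata_entry fixed_list[] = {
	{ MODEL_MIDR_RANGE(PART_A, 0, 3) },
	{ MODEL_MIDR_RANGE(PART_B, 0, 3) },
};

/* Returns 1 if (part, rev) matches any entry in the list. */
int model_matches(const struct model_errata_entry *list, size_t n,
		  uint32_t part, unsigned int rev)
{
	for (size_t i = 0; i < n; i++) {
		const struct model_midr_range *r = &list[i].midr_range;

		if (r->part == part && rev >= r->rev_min && rev <= r->rev_max)
			return 1;
	}
	return 0;
}
```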
      
      Fixes: 51f559d6 ("arm64: Enable repeat tlbi workaround on KRYO4XX gold CPUs")
      Cc: Shreyas K K <quic_shrekk@quicinc.com>
      Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
      Acked-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20220809043848.969-1-yuzenghui@huawei.com
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      18962326
    • Guoqing Jiang
      md: call __md_stop_writes in md_stop · a5a58fab
      Guoqing Jiang authored
      commit 0dd84b31 upstream.
      
      From the link [1], we can see that raid1d was still running even after
      the path raid_dtr -> md_stop -> __md_stop.

      Stop writes first in the destructor, as normal md-raid does, to fix
      the KASAN issue.
      
      [1]. https://lore.kernel.org/linux-raid/CAPhsuW5gc4AakdGNdF8ubpezAuDLFOYUO_sfMZcec6hQFm8nhg@mail.gmail.com/T/#m7f12bf90481c02c6d2da68c64aeed4779b7df74a
      
      Fixes: 48df498d ("md: move bitmap_destroy to the beginning of __md_stop")
      Reported-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
      Signed-off-by: Song Liu <song@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a5a58fab
    • Guoqing Jiang
      Revert "md-raid: destroy the bitmap after destroying the thread" · f68f025c
      Guoqing Jiang authored
      commit 1d258758 upstream.
      
      This reverts commit e151db8e. Although it fixed a KASAN issue for
      dm-raid, it obviously breaks clustered raid, as noticed by Neil. Revert
      it and fix the KASAN issue in the next commit.
      
      [1]. https://lore.kernel.org/linux-raid/a6657e08-b6a7-358b-2d2a-0ac37d49d23a@linux.dev/T/#m95ac225cab7409f66c295772483d091084a6d470
      
      Fixes: e151db8e ("md-raid: destroy the bitmap after destroying the thread")
      Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
      Signed-off-by: Song Liu <song@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f68f025c
    • David Hildenbrand
      mm/hugetlb: fix hugetlb not supporting softdirty tracking · 62af37c5
      David Hildenbrand authored
      commit f96f7a40 upstream.
      
      Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.
      
      I observed that hugetlb does not support/expect write-faults in shared
      mappings that would have to map the R/O-mapped page writable, and I
      found two cases where we could currently get such faults and would
      erroneously map an anon page into a shared mapping.

      Reproducers are part of the patches.
      
      I propose to backport both fixes to stable trees.  The first fix needs a
      small adjustment.
      
      
      This patch (of 2):
      
      Staring at hugetlb_wp(), one might wonder where all the logic for shared
      mappings is when stumbling over a write-protected page in a shared
      mapping.  In fact, there is none, and so far we thought we could get away
      with that because e.g., mprotect() should always do the right thing and
      map all pages directly writable.
      
      Looks like we were wrong:
      
      --------------------------------------------------------------------------
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <fcntl.h>
       #include <unistd.h>
       #include <errno.h>
       #include <sys/mman.h>
      
       #define HUGETLB_SIZE (2 * 1024 * 1024u)
      
       static void clear_softdirty(void)
       {
               int fd = open("/proc/self/clear_refs", O_WRONLY);
               const char *ctrl = "4";
               ssize_t ret;
      
               if (fd < 0) {
                       fprintf(stderr, "open(clear_refs) failed\n");
                       exit(1);
               }
               ret = write(fd, ctrl, strlen(ctrl));
               if (ret != (ssize_t)strlen(ctrl)) {
                       fprintf(stderr, "write(clear_refs) failed\n");
                       exit(1);
               }
               close(fd);
       }
      
       int main(int argc, char **argv)
       {
               char *map;
               int fd;
      
               fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT, 0600);
               if (fd < 0) {
                       fprintf(stderr, "open() failed\n");
                       return -errno;
               }
               if (ftruncate(fd, HUGETLB_SIZE)) {
                       fprintf(stderr, "ftruncate() failed\n");
                       return -errno;
               }
      
               map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
               if (map == MAP_FAILED) {
                       fprintf(stderr, "mmap() failed\n");
                       return -errno;
               }
      
               *map = 0;
      
               if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
                       fprintf(stderr, "mprotect() failed\n");
                       return -errno;
               }
      
               clear_softdirty();
      
               if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
                       fprintf(stderr, "mprotect() failed\n");
                       return -errno;
               }
      
               *map = 0;
      
               return 0;
       }
      --------------------------------------------------------------------------
      
      The above test fails with SIGBUS when there is only a single free hugetlb page:
       # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
       # ./test
       Bus error (core dumped)
      
      And worse, with sufficient free hugetlb pages it will map an anonymous page
      into a shared mapping, for example, messing up accounting during unmap
      and breaking MAP_SHARED semantics:
       # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
       # ./test
       # cat /proc/meminfo | grep HugePages_
       HugePages_Total:       2
       HugePages_Free:        1
       HugePages_Rsvd:    18446744073709551615
       HugePages_Surp:        0
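The HugePages_Rsvd value above is 18446744073709551615, i.e. 2^64 - 1. That is exactly what an unsigned 64-bit counter reads after being decremented once more than it was incremented, which is consistent with the broken accounting described here. A tiny check, assuming a 64-bit unsigned counter (model_underflow() is a hypothetical name):

```c
#include <stdint.h>

/* Unsigned subtraction wraps modulo 2^64, so 0 - 1 yields UINT64_MAX,
 * printed in decimal as 18446744073709551615. */
uint64_t model_underflow(uint64_t counter, uint64_t dec)
{
	return counter - dec;
}
```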
      
      The reason in this particular case is that vma_wants_writenotify() will
      return "true", removing VM_SHARED in vma_set_page_prot() to map pages
      write-protected. Let's teach vma_wants_writenotify() that hugetlb does
      not support softdirty tracking.
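The softdirty part of that decision can be sketched as a small predicate. This is a simplified model, not the kernel's vma_wants_writenotify(): the flag bits are hypothetical, and `softdirty_enabled` stands in for CONFIG_MEM_SOFT_DIRTY. It shows the intended rule: a shared VMA whose VM_SOFTDIRTY flag was cleared wants write notifications, except when it is a hugetlb VMA.

```c
#include <stdbool.h>

/* Hypothetical flag bits for a simplified VMA model. */
#define MODEL_VM_SHARED    0x1u
#define MODEL_VM_SOFTDIRTY 0x2u
#define MODEL_VM_HUGETLB   0x4u

bool model_wants_writenotify_softdirty(unsigned int vm_flags,
				       bool softdirty_enabled)
{
	/* Only shared mappings are considered in this model. */
	if (!(vm_flags & MODEL_VM_SHARED))
		return false;
	/* Tracking needs write faults once VM_SOFTDIRTY is cleared, but
	 * hugetlb does not support softdirty tracking, so skip it. */
	return softdirty_enabled && !(vm_flags & MODEL_VM_SOFTDIRTY) &&
	       !(vm_flags & MODEL_VM_HUGETLB);
}
```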
      
      Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com
      Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com
      Fixes: 64e45507 ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Jamie Liu <jamieliu@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: <stable@vger.kernel.org>	[3.18+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      62af37c5
    • Juergen Gross
      xen/privcmd: fix error exit of privcmd_ioctl_dm_op() · 6de50db1
      Juergen Gross authored
      commit c5deb278 upstream.
      
      The error exit of privcmd_ioctl_dm_op() is calling unlock_pages()
      potentially with pages being NULL, leading to a NULL dereference.
      
      Additionally, lock_pages() doesn't check whether pin_user_pages_fast()
      was completely successful, so potentially not all pages are locked
      into memory. This could result in sporadic failures when using the
      related memory in user mode.

      Fix all of that by always calling unlock_pages() with the real number
      of pinned pages, which will be zero if pages is NULL, and by checking
      that the number of pages pinned by pin_user_pages_fast() matches the
      expected number of pages.
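The pattern described above (count what was really pinned, treat a short pin as failure, and release exactly that count on every exit path) can be sketched in plain C. Everything here is a hypothetical model rather than the privcmd code: model_pin_fast() only imitates the "may pin fewer pages than requested" behaviour of pin_user_pages_fast(), and -ENOSPC is an assumed error code chosen for illustration.

```c
#include <errno.h>

/* Hypothetical bookkeeping standing in for the pinned page array. */
struct model_pins {
	int pinned;   /* pages currently pinned */
};

/* Stand-in for a pinning call that can succeed only partially. */
static int model_pin_fast(int requested, int available)
{
	return requested < available ? requested : available;
}

/* Record the real number of pinned pages; a short pin is an error. */
static int model_lock_pages(struct model_pins *p, int requested, int available)
{
	int got = model_pin_fast(requested, available);

	if (got < 0)
		return got;
	p->pinned += got;
	return got == requested ? 0 : -ENOSPC;
}

/* Unpin exactly nr pages (zero is fine). */
static void model_unlock_pages(struct model_pins *p, int nr)
{
	p->pinned -= nr;
}

/* Shared exit path: always release the real pinned count and report it. */
int model_dm_op(int requested, int available, int *released)
{
	struct model_pins p = { 0 };
	int rc = model_lock_pages(&p, requested, available);

	/* ... the actual operation would run here when rc == 0 ... */

	*released = p.pinned;
	model_unlock_pages(&p, p.pinned);
	return rc;
}
```

With 4 pages requested but only 2 pinnable, model_dm_op() fails and releases exactly the 2 pages it holds, instead of trusting the requested count.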
      
      Cc: <stable@vger.kernel.org>
      Fixes: ab520be8 ("xen/privcmd: Add IOCTL_PRIVCMD_DM_OP")
      Reported-by: Rustam Subkhankulov <subkhankulov@ispras.ru>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
      Link: https://lore.kernel.org/r/20220825141918.3581-1-jgross@suse.com
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6de50db1
    • Riwen Lu
      ACPI: processor: Remove freq Qos request for all CPUs · 8d5f8a4f
      Riwen Lu authored
      commit 36527b9d upstream.
      
      The freq QoS request would be removed repeatedly if the cpufreq policy
      relates to more than one CPU, which causes the "called for unknown
      object" warning.

      Remove the freq QoS request for each CPU related to the cpufreq policy,
      instead of repeatedly removing the request of the policy's last CPU.
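The difference between the buggy and the fixed loop can be reproduced with a toy model. The per-CPU array, the warning counter and model_policy_exit() are all hypothetical; the counter stands in for the "called for unknown object" warning, and `buggy` makes every iteration target the policy's last CPU, as described above.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Per-CPU request state: true while the request is registered. */
static bool req_active[MODEL_NR_CPUS];

/* Removals of an already-removed request ("unknown object"). */
static int unknown_object_warnings;

static void model_remove_request(int cpu)
{
	if (!req_active[cpu]) {
		unknown_object_warnings++;
		return;
	}
	req_active[cpu] = false;
}

/* Registers a request for each CPU in cpus[0..n), then removes them.
 * Returns the number of warnings the removal loop would trigger. */
int model_policy_exit(const int *cpus, int n, bool buggy)
{
	unknown_object_warnings = 0;
	for (int i = 0; i < n; i++)
		req_active[cpus[i]] = true;

	for (int i = 0; i < n; i++)
		model_remove_request(buggy ? cpus[n - 1] : cpus[i]);

	return unknown_object_warnings;
}
```

For a policy spanning four CPUs, the buggy loop removes the last CPU's request once and then warns three times; the fixed loop removes each CPU's request exactly once.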
      
      Fixes: a1bb46c3 ("ACPI: processor: Add QoS requests for all CPUs")
      Reported-by: Jeremy Linton <Jeremy.Linton@arm.com>
      Tested-by: Jeremy Linton <jeremy.linton@arm.com>
      Signed-off-by: Riwen Lu <luriwen@kylinos.cn>
      Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d5f8a4f
    • Brian Foster
      s390: fix double free of GS and RI CBs on fork() failure · 297ae7e8
      Brian Foster authored
      commit 13cccafe upstream.
      
      The pointers for guarded storage and runtime instrumentation control
      blocks are stored in the thread_struct of the associated task. These
      pointers are initially copied on fork() via arch_dup_task_struct()
      and then cleared via copy_thread() before fork() returns. If fork()
      happens to fail after the initial task dup and before copy_thread(),
      the newly allocated task and associated thread_struct memory are
      freed via free_task() -> arch_release_task_struct(). This results in
      a double free of the guarded storage and runtime info structs
      because the fields in the failed task still refer to memory
      associated with the source task.
      
      This problem can manifest as a BUG_ON() in set_freepointer() (with
      CONFIG_SLAB_FREELIST_HARDENED enabled) or KASAN splat (if enabled)
      when running trinity syscall fuzz tests on s390x. To avoid this
      problem, clear the associated pointer fields in
      arch_dup_task_struct() immediately after the new task is copied.
      Note that the RI flag is still cleared in copy_thread() because it
      resides in thread stack memory and that is where stack info is
      copied.
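The fix follows a general ownership pattern: after a struct copy duplicates pointers, clear them in the copy right away so that tearing down a half-built copy cannot free the source's memory. A hypothetical user-space model (the struct, its two fields and both functions are illustrative stand-ins for the thread_struct control block pointers, not s390 code):

```c
#include <stdlib.h>
#include <string.h>

/* Simplified thread state owning two control blocks. */
struct model_thread {
	void *gs_cb;   /* stands in for the guarded storage control block */
	void *ri_cb;   /* stands in for the runtime instrumentation block */
};

/* Frees whatever the thread owns, like a release hook for a failed fork. */
void model_release(struct model_thread *t)
{
	free(t->gs_cb);
	free(t->ri_cb);
	t->gs_cb = NULL;
	t->ri_cb = NULL;
}

/* Struct copy followed immediately by clearing the copied pointers, so
 * that dst never refers to memory owned by src. */
void model_dup(struct model_thread *dst, const struct model_thread *src)
{
	memcpy(dst, src, sizeof(*dst));
	dst->gs_cb = NULL;
	dst->ri_cb = NULL;
}
```

Releasing the copy on a failure path then frees nothing (free(NULL) is a no-op), and the parent's blocks are freed exactly once when the parent itself is released.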
      
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Fixes: 8d9047f8 ("s390/runtime instrumentation: simplify task exit handling")
      Fixes: 7b83c629 ("s390/guarded storage: simplify task exit handling")
      Cc: <stable@vger.kernel.org> # 4.15
      Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220816155407.537372-1-bfoster@redhat.com
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      297ae7e8
    • Quanyang Wang
      asm-generic: sections: refactor memory_intersects · c60ae878
      Quanyang Wang authored
      commit 0c7d7cc2 upstream.
      
      There are two problems with the current code of memory_intersects:
      
      First, it doesn't check whether the region (begin, end) falls inside the
      region (virt, vend), that is (virt < begin && vend > end).
      
      The second problem is that if vend is equal to begin, it returns true.
      This is wrong because vend (virt + size) is not the last address of
      the memory region; (virt + size - 1) is. This wrong result triggers
      misreporting when check_for_illegal_area() calls memory_intersects()
      to check whether the DMA region intersects the stext region.
      
      The misreporting is as below (stext is at 0x80100000):
       WARNING: CPU: 0 PID: 77 at kernel/dma/debug.c:1073 check_for_illegal_area+0x130/0x168
       DMA-API: chipidea-usb2 e0002000.usb: device driver maps memory from kernel text or rodata [addr=800f0000] [len=65536]
       Modules linked in:
       CPU: 1 PID: 77 Comm: usb-storage Not tainted 5.19.0-yocto-standard #5
       Hardware name: Xilinx Zynq Platform
        unwind_backtrace from show_stack+0x18/0x1c
        show_stack from dump_stack_lvl+0x58/0x70
        dump_stack_lvl from __warn+0xb0/0x198
        __warn from warn_slowpath_fmt+0x80/0xb4
        warn_slowpath_fmt from check_for_illegal_area+0x130/0x168
        check_for_illegal_area from debug_dma_map_sg+0x94/0x368
        debug_dma_map_sg from __dma_map_sg_attrs+0x114/0x128
        __dma_map_sg_attrs from dma_map_sg_attrs+0x18/0x24
        dma_map_sg_attrs from usb_hcd_map_urb_for_dma+0x250/0x3b4
        usb_hcd_map_urb_for_dma from usb_hcd_submit_urb+0x194/0x214
        usb_hcd_submit_urb from usb_sg_wait+0xa4/0x118
        usb_sg_wait from usb_stor_bulk_transfer_sglist+0xa0/0xec
        usb_stor_bulk_transfer_sglist from usb_stor_bulk_srb+0x38/0x70
        usb_stor_bulk_srb from usb_stor_Bulk_transport+0x150/0x360
        usb_stor_Bulk_transport from usb_stor_invoke_transport+0x38/0x440
        usb_stor_invoke_transport from usb_stor_control_thread+0x1e0/0x238
        usb_stor_control_thread from kthread+0xf8/0x104
        kthread from ret_from_fork+0x14/0x2c
      
      Refactor memory_intersects to fix the two problems above.
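Treating both ranges as half-open intervals handles both problems with a single comparison pair. The following is a sketch consistent with the description above, using integer addresses; model_memory_intersects() is a hypothetical name, not the kernel function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * [begin, end) is the section; [virt, virt + size) is the region being
 * checked. A region that ends exactly where the section begins does not
 * intersect it, and a region that fully contains the section does.
 */
bool model_memory_intersects(uintptr_t begin, uintptr_t end,
			     uintptr_t virt, size_t size)
{
	uintptr_t vend = virt + size;   /* one past the region's last byte */

	return virt < end && vend > begin;
}
```

With the values from the report (region at 0x800f0000, length 65536, stext at 0x80100000), vend equals begin, so the function returns false. The section end of 0x80200000 used in such a check would be a made-up placeholder, since the report does not give it.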
      
      Before commit 1d7db834 ("dma-debug: use memory_intersects()
      directly"), memory_intersects was called only by printk_late_init:

      printk_late_init -> init_section_intersects -> memory_intersects.

      There were few places where memory_intersects was called.
      
      When commit 1d7db834 ("dma-debug: use memory_intersects()
      directly") was merged and CONFIG_DMA_API_DEBUG is enabled, the DMA
      subsystem uses it to check for an illegal area and the calltrace above
      is triggered.
      
      [akpm@linux-foundation.org: fix nearby comment typo]
      Link: https://lkml.kernel.org/r/20220819081145.948016-1-quanyang.wang@windriver.com
      Fixes: 97955936 ("asm/sections: add helpers to check for section data")
      Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c60ae878
    • Siddh Raman Pant
      loop: Check for overflow while configuring loop · 68589331
      Siddh Raman Pant authored
      commit c490a0b5 upstream.
      
      The userspace can configure a loop using an ioctl call, wherein
      a configuration of type loop_config is passed (see lo_ioctl()'s
      case on line 1550 of drivers/block/loop.c). This proceeds to call
      loop_configure() which in turn calls loop_set_status_from_info()
      (see line 1050 of loop.c), passing &config->info which is of type
      loop_info64*. This function then sets the appropriate values, like
      the offset.
      
      loop_device has lo_offset of type loff_t (see line 52 of loop.c),
      which is typedef-chained to long long, whereas loop_info64 has
      lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h).
      
      The function directly copies offset from info to the device as
      follows (See line 980 of loop.c):
      	lo->lo_offset = info->lo_offset;
      
      This results in an overflow, which triggers a warning in iomap_iter()
      due to a call to iomap_iter_done() which has:
      	WARN_ON_ONCE(iter->iomap.offset > iter->pos);
      
      Thus, check for negative values during loop_set_status_from_info().
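Because loff_t is signed (long long) while the uapi field is __u64, any value above LLONG_MAX turns negative when stored. A minimal sketch of rejecting such values before the assignment; model_set_offset() is hypothetical, and the -EOVERFLOW error code is an assumption made for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Copy an unsigned 64-bit offset into a signed 64-bit field, refusing
 * values that would become negative. */
int model_set_offset(long long *lo_offset, uint64_t info_offset)
{
	if (info_offset > (uint64_t)LLONG_MAX)
		return -EOVERFLOW;
	*lo_offset = (long long)info_offset;
	return 0;
}
```

A rejected value leaves the destination untouched, so the device never sees a negative offset.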
      
      Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29e
      
      Reported-and-tested-by: syzbot+a8e049cd3abd342936b6@syzkaller.appspotmail.com
      Cc: stable@vger.kernel.org
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Siddh Raman Pant <code@siddh.me>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220823160810.181275-1-code@siddh.me
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      68589331
    • Pawan Gupta
      x86/bugs: Add "unknown" reporting for MMIO Stale Data · 14cbbb9c
      Pawan Gupta authored
      commit 7df54884 upstream.
      
      Older Intel CPUs that are not in the affected processor list for MMIO
      Stale Data vulnerabilities currently report "Not affected" in sysfs,
      which may not be correct. Vulnerability status for these older CPUs is
      unknown.
      
      Add known-not-affected CPUs to the whitelist. Report "unknown"
      mitigation status for CPUs that are in neither the blacklist nor the
      whitelist and that also don't enumerate the MSR ARCH_CAPABILITIES bits
      that reflect hardware immunity to MMIO Stale Data vulnerabilities.
      
      Mitigation is not deployed when the status is unknown.
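The reporting rule above can be summarised as a small decision function. This is a hypothetical model: the three booleans stand in for the blacklist lookup, the whitelist lookup and the ARCH_CAPABILITIES immunity bits, and checking the immunity bits first is an assumption of this sketch.

```c
#include <stdbool.h>

enum model_mmio_status {
	MODEL_MMIO_AFFECTED,
	MODEL_MMIO_NOT_AFFECTED,
	MODEL_MMIO_UNKNOWN,
};

enum model_mmio_status model_mmio_report(bool in_blacklist,
					 bool in_whitelist,
					 bool caps_report_immunity)
{
	if (caps_report_immunity)
		return MODEL_MMIO_NOT_AFFECTED;
	if (in_blacklist)
		return MODEL_MMIO_AFFECTED;
	if (in_whitelist)
		return MODEL_MMIO_NOT_AFFECTED;
	/* No list entry and no immunity bits: report "unknown" and
	 * deploy no mitigation. */
	return MODEL_MMIO_UNKNOWN;
}
```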
      
        [ bp: Massage, fixup. ]
      
      Fixes: 8d50cdf8 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data")
      Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Suggested-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.1657846269.git.pawan.kumar.gupta@linux.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      14cbbb9c
    • Chen Zhongjin
      x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry · e3e0d117
      Chen Zhongjin authored
      commit fc2e426b upstream.
      
      When the ORC unwinder meets an ftrace trampoline, it uses the address
      of ftrace_{regs_}call to find the ORC entry, which places the next
      frame at sp+176.

      If an IRQ hits at "sub $0xa8,%rsp", the next frame should be at sp+8
      instead of sp+176. This makes the unwinder skip the correct frame and
      throw warnings such as "wrong direction" or "can't access registers",
      depending on the content of the incorrect frame address.

      By adding the offset *ip - ops->trampoline* to the base address
      ftrace_{regs_}caller, we get the correct address for the ORC entry
      lookup.
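The lookup amounts to translating an address inside the dynamically allocated trampoline back to the same offset in the static ftrace_{regs_}caller code, whose ORC data is known. (The numbers above are consistent: 0xa8 is 168 bytes, and adding the 8-byte return address gives 176.) A minimal sketch with plain integers; model_tramp_lookup_addr() and its parameters are hypothetical names, not the kernel's.

```c
#include <stdint.h>

/* Map an IP inside a trampoline copy to the matching address in the
 * original caller code. */
uintptr_t model_tramp_lookup_addr(uintptr_t ip, uintptr_t trampoline,
				  uintptr_t caller_base)
{
	uintptr_t offset = ip - trampoline;   /* position inside the copy */

	return caller_base + offset;          /* same position in the original */
}
```

For example, with made-up addresses, an IP 0x10 bytes into a trampoline at 0x5000 maps to 0x1010 when the caller code starts at 0x1000.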
      
      Also change "caller" to "tramp_addr" to make variable name conform to
      its content.
      
      [ mingo: Clarified the changelog a bit. ]
      
      Fixes: 6be7fa3c ("ftrace, orc, x86: Handle ftrace dynamically allocated trampolines")
      Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20220819084334.244016-1-chenzhongjin@huawei.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e3e0d117
    • Kan Liang
      perf/x86/lbr: Enable the branch type for the Arch LBR by default · 090f0ac1
      Kan Liang authored
      commit 32ba156d upstream.
      
      On platforms with Arch LBR, the HW raw branch type encoding may leak
      to the perf tool when the SAVE_TYPE option is not set.

      In intel_pmu_store_lbr(), the HW raw branch type is stored in
      lbr_entries[].type. If the SAVE_TYPE option is set, the
      lbr_entries[].type will be converted into the generic PERF_BR_* type
      in intel_pmu_lbr_filter() and exposed to the user tools.
      But if the SAVE_TYPE option is NOT set by the user, the current perf
      kernel doesn't clear the field, so the HW raw branch type leaks.
      
      There are two solutions to fix the issue for the Arch LBR.
      One is to clear the field if the SAVE_TYPE option is NOT set.
      The other solution is to unconditionally convert the branch type and
      expose the generic type to the user tools.
      
      The latter is implemented here, because
      - The branch type is valuable information. I don't see a case where
        you would not benefit from the branch type. (Stephane Eranian)
      - Not having the branch type DOES NOT save any space in the
        branch record (Stephane Eranian)
      - The Arch LBR HW can retrieve the common branch types from the
        LBR_INFO. It doesn't require high-overhead software disassembly.
      
      Fixes: 47125db2 ("perf/x86/intel/lbr: Support Architectural LBR")
      Reported-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20220816125612.2042397-1-kan.liang@linux.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      090f0ac1
    • Goldwyn Rodrigues
      btrfs: check if root is readonly while setting security xattr · d2bd18d5
      Goldwyn Rodrigues authored
      commit b5111127 upstream.
      
      For a filesystem which has btrfs read-only property set to true, all
      write operations including xattr should be denied. However, security
      xattr can still be changed even if btrfs ro property is true.
      
      This happens because xattr_permission() does not place any VFS-level
      restrictions on security.*, system.* and, in some cases, trusted.*;
      the decision is left to the underlying filesystem. See the comments in
      xattr_permission() for more details.
      
      This patch checks if the root is read-only before performing the set
      xattr operation.
      
      Testcase:
      
        DEV=/dev/vdb
        MNT=/mnt
      
        mkfs.btrfs -f $DEV
        mount $DEV $MNT
        echo "file one" > $MNT/f1
      
        setfattr -n "security.one" -v 2 $MNT/f1
        btrfs property set $MNT ro true
      
        setfattr -n "security.one" -v 1 $MNT/f1
      
        umount $MNT
      
      CC: stable@vger.kernel.org # 4.9+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2bd18d5
    • Anand Jain
      btrfs: add info when mount fails due to stale replace target · dcac6293
      Anand Jain authored
      commit f2c3bec2 upstream.
      
      If the replace target device reappears after the suspended replace is
      cancelled, it blocks the mount operation because the matching
      replace-item can't be found in the metadata, as shown below:
      
         BTRFS error (device sda5): replace devid present without an active replace item
      
      To overcome this situation, the user can run the command
      
         btrfs device scan --forget <replace target device>
      
      and try the mount command again. To avoid repeating the issue, the
      superblock on devid=0 must also be wiped:
      
         wipefs -a device-path-to-devid=0.
      
      This patch adds some info when this situation occurs.
      
      Reported-by: Samuel Greiner <samuel@balkonien.org>
      Link: https://lore.kernel.org/linux-btrfs/b4f62b10-b295-26ea-71f9-9a5c9299d42c@balkonien.org/T/
      CC: stable@vger.kernel.org # 5.0+
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: replace: drop assert for suspended replace · b2d352ed
      Anand Jain authored
      commit 59a39919 upstream.
      
      If the filesystem is mounted with the replace operation in a suspended
      state and we then try to cancel the suspended replace operation, we hit
      the assert. The assert was added by commit fe97e2e1 ("btrfs: dev-replace:
      replace's scrub must not be running in suspended state") but was not
      actually required, so just remove it.
      
       $ mount /dev/sda5 /btrfs
      
          BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
          BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'
      
       $ mount -o degraded /dev/sda5 /btrfs <-- success.
      
       $ btrfs replace cancel /btrfs
      
          kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131
          kernel: ------------[ cut here ]------------
          kernel: kernel BUG at fs/btrfs/ctree.h:3750!
      
      After the patch:
      
       $ btrfs replace cancel /btrfs
      
          BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled
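      The failing cancel path can be modeled in user space, assuming (as the
      assertion message above suggests) that cancelling a scrub when none is
      running reports -ENOTCONN. All names below are hypothetical; the sketch
      only shows why asserting ret != -ENOTCONN is wrong for a suspended
      replace, where no scrub is running.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical replace states. */
enum model_replace_state {
	MODEL_REPLACE_NOT_STARTED,
	MODEL_REPLACE_STARTED,
	MODEL_REPLACE_SUSPENDED,
};

struct model_fs {
	enum model_replace_state state;
	bool scrub_running;
};

/* Returns -ENOTCONN when there is no running scrub to cancel. */
static int model_scrub_cancel(struct model_fs *fs)
{
	if (!fs->scrub_running)
		return -ENOTCONN;
	fs->scrub_running = false;
	return 0;
}

static int model_replace_cancel(struct model_fs *fs)
{
	int ret;

	if (fs->state == MODEL_REPLACE_NOT_STARTED)
		return -ENOTCONN;   /* nothing to cancel; arbitrary for this model */

	ret = model_scrub_cancel(fs);
	/*
	 * The removed assert amounted to assert(ret != -ENOTCONN) in the
	 * suspended case. No scrub runs while suspended, so -ENOTCONN is the
	 * expected result there and is accepted instead.
	 */
	if (fs->state == MODEL_REPLACE_SUSPENDED && ret == -ENOTCONN)
		ret = 0;

	if (ret == 0)
		fs->state = MODEL_REPLACE_NOT_STARTED;
	return ret;
}
```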
      
      Fixes: fe97e2e1 ("btrfs: dev-replace: replace's scrub must not be running in suspended state")
      CC: stable@vger.kernel.org # 5.0+
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix silent failure when deleting root reference · 2fc3c168
      Filipe Manana authored
      commit 47bf225a upstream.
      
      At btrfs_del_root_ref(), if btrfs_search_slot() returns an error, we end
      up returning from the function with a value of 0 (success). This happens
      because the function returns the value stored in the variable 'err',
      which is 0, while the error value we got from btrfs_search_slot() is
      stored in the 'ret' variable.
      
      So fix it by setting 'err' to the error value.
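      The two-variable bug pattern is easy to reproduce in isolation. This is a
      minimal sketch, not the btrfs code: model_search_slot() and
      model_del_root_ref() are hypothetical stand-ins, and the apply_fix flag
      exists only to compare the behavior before and after the fix.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a search call: negative on error. */
static int model_search_slot(int simulated_result)
{
	return simulated_result;
}

/*
 * The function returns 'err', but the search result lands in 'ret'.
 * Without the fix, an error in 'ret' is dropped and 0 is returned.
 */
static int model_del_root_ref(int simulated_result, int apply_fix)
{
	int ret;
	int err = 0;

	ret = model_search_slot(simulated_result);
	if (ret < 0) {
		if (apply_fix)
			err = ret;   /* the fix: propagate the error */
		goto out;
	}

	/* ... the actual deletion would happen here ... */
out:
	return err;
}
```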
      
      Fixes: 8289ed9f ("btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error handling")
      CC: stable@vger.kernel.org # 5.16+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ionic: fix up issues with handling EAGAIN on FW cmds · 3a351b56
      Shannon Nelson authored
      [ Upstream commit 0fc4dd45 ]
      
      While looping on FW update tests we occasionally see the
      FW_ACTIVATE_STATUS command fail while it is in its EAGAIN loop
      waiting for the FW activate step to finish inside the FW.  The
      firmware complains that the done bit is already set when a new
      dev_cmd is about to be processed.
      
      Cleaning the cmd registers and doorbell before exiting the
      wait-for-done loop, and clearing the done bit before the sleep,
      prevents this from occurring.
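      A simplified, single-threaded model of the retry loop, assuming the
      firmware simply records whether it ever sees a stale done bit when a new
      command arrives. model_dev_cmd_regs, model_fw and model_dev_cmd_wait()
      are hypothetical; a real driver writes memory-mapped registers, polls the
      done bit and sleeps between retries instead of calling the firmware
      synchronously.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the device-command register block. */
struct model_dev_cmd_regs {
	unsigned int doorbell;
	unsigned int done;
	unsigned int cmd;
};

/* Hypothetical firmware: answers -EAGAIN for 'busy_rounds' calls. */
struct model_fw {
	int busy_rounds;
	bool saw_stale_done;    /* the complaint described above */
};

static int model_fw_process(struct model_fw *fw, struct model_dev_cmd_regs *regs)
{
	if (regs->done)
		fw->saw_stale_done = true;
	regs->done = 1;
	if (fw->busy_rounds > 0) {
		fw->busy_rounds--;
		return -EAGAIN;
	}
	return 0;
}

static int model_dev_cmd_wait(struct model_fw *fw, struct model_dev_cmd_regs *regs,
			      unsigned int cmd, bool apply_fix)
{
	int err;

	for (;;) {
		regs->cmd = cmd;                    /* post the command */
		regs->doorbell = 1;
		err = model_fw_process(fw, regs);   /* firmware sets 'done' */
		if (err != -EAGAIN)
			break;
		if (apply_fix)
			regs->done = 0;   /* clear 'done' before the (omitted) sleep */
	}

	if (apply_fix) {
		/* clean the command registers and doorbell before exiting */
		regs->cmd = 0;
		regs->doorbell = 0;
		regs->done = 0;
	}
	return err;
}
```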
      
      Fixes: fbfb8031 ("ionic: Add hardware init and device commands")
      Signed-off-by: Shannon Nelson <snelson@pensando.io>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • rxrpc: Fix locking in rxrpc's sendmsg · 79e2ca7a
      David Howells authored
      [ Upstream commit b0f571ec ]
      
      Fix three bugs in rxrpc's sendmsg implementation:
      
       (1) rxrpc_new_client_call() should release the socket lock when returning
           an error from rxrpc_get_call_slot().
      
       (2) rxrpc_wait_for_tx_window_intr() will return without the call mutex
           held in the event that we're interrupted by a signal whilst waiting
           for tx space on the socket or relocking the call mutex afterwards.
      
           Fix this by: (a) moving the unlock/lock of the call mutex up to
           rxrpc_send_data() such that the lock is not held around all of
           rxrpc_wait_for_tx_window*() and (b) indicating to higher callers
           whether we're returning with the lock dropped.  Note that this means
           recvmsg() will not block on this call whilst we're waiting.
      
       (3) After dropping and regaining the call mutex, rxrpc_send_data() needs
           to go and recheck the state of the tx_pending buffer and the
           tx_total_len check in case we raced with another sendmsg() on the same
           call.
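      A single-threaded sketch of fix (2), using a boolean in place of the real
      call mutex so that an unbalanced unlock is counted rather than reported by
      lockdep. All names are hypothetical and the model is not thread-safe. It
      shows the unlock/relock living in the send-data step and the top-level
      caller skipping its unlock when told the lock was already dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the call and its user_mutex. */
struct model_call {
	bool mutex_held;
	int unbalanced_unlocks;   /* counts "bad unlock balance" events */
};

static void model_lock(struct model_call *call)
{
	call->mutex_held = true;
}

static void model_unlock(struct model_call *call)
{
	if (!call->mutex_held) {
		call->unbalanced_unlocks++;
		return;
	}
	call->mutex_held = false;
}

/* Wait for tx space with the mutex not held (fix 2a); a signal is
 * simulated by 'interrupted'. */
static int model_wait_for_tx_window(bool interrupted)
{
	return interrupted ? -EINTR : 0;
}

/* Drop the mutex around the wait and relock afterwards. If interrupted,
 * return with the mutex still dropped and report it via *dropped (fix 2b). */
static int model_send_data(struct model_call *call, bool interrupted,
			   bool *dropped)
{
	int ret;

	model_unlock(call);
	ret = model_wait_for_tx_window(interrupted);
	if (ret < 0) {
		*dropped = true;
		return ret;
	}
	model_lock(call);
	*dropped = false;
	/* Fix (3) would recheck the tx state here; this model has none. */
	return 0;
}

/* Top-level caller: unlock only if the mutex was not already dropped. */
static int model_do_sendmsg(struct model_call *call, bool interrupted)
{
	bool dropped = false;
	int ret;

	model_lock(call);
	ret = model_send_data(call, interrupted, &dropped);
	if (!dropped)
		model_unlock(call);
	return ret;
}
```

      Without the *dropped indication, the interrupted path would unlock a
      mutex it no longer holds, which is the imbalance in the lockdep report
      below.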
      
      Thinking on this some more, it might make sense to have different locks for
      sendmsg() and recvmsg().  There's probably no need to make recvmsg() wait
      for sendmsg().  It does mean that recvmsg() can return MSG_EOR indicating
      that a call is dead before a sendmsg() to that call returns - but that can
      currently happen anyway.
      
      Without fix (2), something like the following can be induced:
      
      	WARNING: bad unlock balance detected!
      	5.16.0-rc6-syzkaller #0 Not tainted
      	-------------------------------------
      	syz-executor011/3597 is trying to release lock (&call->user_mutex) at:
      	[<ffffffff885163a3>] rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
      	but there are no more locks to release!
      
      	other info that might help us debug this:
      	no locks held by syz-executor011/3597.
      	...
      	Call Trace:
      	 <TASK>
      	 __dump_stack lib/dump_stack.c:88 [inline]
      	 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
      	 print_unlock_imbalance_bug include/trace/events/lock.h:58 [inline]
      	 __lock_release kernel/locking/lockdep.c:5306 [inline]
      	 lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5657
      	 __mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:900
      	 rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
      	 rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:561
      	 sock_sendmsg_nosec net/socket.c:704 [inline]
      	 sock_sendmsg+0xcf/0x120 net/socket.c:724
      	 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
      	 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
      	 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
      	 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      	 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      	 entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [Thanks to Hawkins Jiawei and Khalid Masum for their attempts to fix this]
      
      Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
      Reported-by: <syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
      Tested-by: <syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com>
      cc: Hawkins Jiawei <yin31149@gmail.com>
      cc: Khalid Masum <khalid.masum.92@gmail.com>
      cc: Dan Carpenter <dan.carpenter@oracle.com>
      cc: linux-afs@lists.infradead.org
      Link: https://lore.kernel.org/r/166135894583.600315.7170979436768124075.stgit@warthog.procyon.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>