  1. Aug 04, 2021
    • blk-iocost: fix operation ordering in iocg_wake_fn() · caed0df2
      Tejun Heo authored
      commit 5ab189cf upstream.
      
      iocg_wake_fn() open-codes wait_queue_entry removal and wakeup because it
      wants the wq_entry to be always removed whether it ended up waking the
      task or not. finish_wait() tests whether wq_entry needs removal without
      grabbing the wait_queue lock and expects the waker to use
      list_del_init_careful() after all waking operations are complete, which
      iocg_wake_fn() didn't do. The operation order was wrong and the regular
      list_del_init() was used.
      
      The result is that if a waiter wakes up racing the waker, it can pop
      the wq_entry off the stack while the waker is still looking at it, which
      can lead to a backtrace like the following.
      
        [7312084.588951] general protection fault, probably for non-canonical address 0x586bf4005b2b88: 0000 [#1] SMP
        ...
        [7312084.647079] RIP: 0010:queued_spin_lock_slowpath+0x171/0x1b0
        ...
        [7312084.858314] Call Trace:
        [7312084.863548]  _raw_spin_lock_irqsave+0x22/0x30
        [7312084.872605]  try_to_wake_up+0x4c/0x4f0
        [7312084.880444]  iocg_wake_fn+0x71/0x80
        [7312084.887763]  __wake_up_common+0x71/0x140
        [7312084.895951]  iocg_kick_waitq+0xe8/0x2b0
        [7312084.903964]  ioc_rqos_throttle+0x275/0x650
        [7312084.922423]  __rq_qos_throttle+0x20/0x30
        [7312084.930608]  blk_mq_make_request+0x120/0x650
        [7312084.939490]  generic_make_request+0xca/0x310
        [7312084.957600]  submit_bio+0x173/0x200
        [7312084.981806]  swap_readpage+0x15c/0x240
        [7312084.989646]  read_swap_cache_async+0x58/0x60
        [7312084.998527]  swap_cluster_readahead+0x201/0x320
        [7312085.023432]  swapin_readahead+0x2df/0x450
        [7312085.040672]  do_swap_page+0x52f/0x820
        [7312085.058259]  handle_mm_fault+0xa16/0x1420
        [7312085.066620]  do_page_fault+0x2c6/0x5c0
        [7312085.074459]  page_fault+0x2f/0x40
      
      Fix it by switching to list_del_init_careful() and putting it at the end.
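
The ordering described above can be sketched in user space. This is a minimal model, not the kernel code: `struct node`, `struct waiter`, and `wake_one()` are hypothetical stand-ins, and C11 atomics approximate the release/acquire pairing between list_del_init_careful() and the lockless check in finish_wait().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct list_head; not the kernel type. */
struct node {
    struct node *_Atomic next;
    struct node *prev;
};

static void node_init(struct node *n)
{
    atomic_store_explicit(&n->next, n, memory_order_relaxed);
    n->prev = n;
}

static void node_add(struct node *n, struct node *head)
{
    struct node *first = atomic_load_explicit(&head->next, memory_order_relaxed);

    n->prev = head;
    atomic_store_explicit(&n->next, first, memory_order_relaxed);
    first->prev = n;
    atomic_store_explicit(&head->next, n, memory_order_relaxed);
}

/* Unlink the entry, then publish "the waker is done" with a release store. */
static void node_del_init_careful(struct node *n)
{
    struct node *next = atomic_load_explicit(&n->next, memory_order_relaxed);

    atomic_store_explicit(&n->prev->next, next, memory_order_relaxed);
    next->prev = n->prev;
    n->prev = n;
    atomic_store_explicit(&n->next, n, memory_order_release);
}

/* Lockless check on the waiter side; pairs with the release store above. */
static bool node_empty_careful(struct node *n)
{
    struct node *next = atomic_load_explicit(&n->next, memory_order_acquire);

    return next == n && n->prev == n;
}

/* Hypothetical waiter record that would live on the waiter's stack. */
struct waiter {
    struct node entry;
    bool committed;
    int wake_count; /* stands in for actually waking the task */
};

/* Waker side: every access to *w happens before the entry is released. */
static void wake_one(struct waiter *w)
{
    w->committed = true;              /* 1. record the outcome */
    w->wake_count++;                  /* 2. wake the task */
    node_del_init_careful(&w->entry); /* 3. last: waiter may now reuse w */
}
```

Once the waiter observes the entry as empty it may return and reuse its stack frame, so the removal has to be the waker's final access to the entry.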
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Rik van Riel <riel@surriel.com>
      Fixes: 7caa4715 ("blkcg: implement blk-iocost")
      Cc: stable@vger.kernel.org # v5.4+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/amdgpu: Fix resource leak on probe error path · 749abc8d
      Jiri Kosina authored
      commit d47255d3 upstream.
      
      This reverts commit 4192f7b5.
      
      It is not true (as stated in the reverted commit changelog) that we never
      unmap the BAR on failure; it actually does happen properly on
      amdgpu_driver_load_kms() -> amdgpu_driver_unload_kms() ->
      amdgpu_device_fini() error path.
      
      What's worse, this commit actually completely breaks resource freeing on
      probe failure (like e.g. failure to load microcode), as
      amdgpu_driver_unload_kms() notices adev->rmmio being NULL and bails too
      early, leaving all the resources that'd normally be freed in
      amdgpu_acpi_fini() and amdgpu_device_fini() still hanging around, leading
      to all sorts of oopses when someone tries to, for example, access the
      sysfs and procfs resources which are still around while the driver is
      gone.
      
      Fixes: 4192f7b5 ("drm/amdgpu: unmap register bar on device init failure")
      Reported-by: Vojtech Pavlik <vojtech@ucw.cz>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/amdgpu: Avoid printing of stack contents on firmware load error · 070f46bc
      Jiri Kosina authored
      commit 6aade587 upstream.
      
      In case when psp_init_asd_microcode() fails to load ASD microcode file,
      psp_v12_0_init_microcode() tries to print the firmware filename that
      failed to load before bailing out.
      
      This is wrong because:
      
      - the firmware filename it would want to print is an incorrect one as
        psp_init_asd_microcode() and psp_v12_0_init_microcode() are loading
        different filenames
      - it tries to print fw_name, but that's not yet been initialized by that
        time, so it prints random stack contents, e.g.
      
          amdgpu 0000:04:00.0: Direct firmware load for amdgpu/renoir_asd.bin failed with error -2
          amdgpu 0000:04:00.0: amdgpu: fail to initialize asd microcode
          amdgpu 0000:04:00.0: amdgpu: psp v12.0: Failed to load firmware "\xfeTO\x8e\xff\xff"
      
      Fix that by bailing out immediately, instead of printing the bogus error
      message.
      
      Reported-by: Vojtech Pavlik <vojtech@ucw.cz>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/amdgpu: Check pmops for desired suspend state · 4e7961b3
      Pratik Vishwakarma authored
      commit 91e27371 upstream.
      
      [Why]
      User might change the suspend behaviour from OS.
      
      [How]
      Check with pm for target suspend state and set s0ix
      flag only for s2idle state.
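
The decision in [How] amounts to a single comparison against the target suspend state. The sketch below is a user-space model under assumptions: `enum suspend_target` and `want_s0ix()` are hypothetical names, standing in for the kernel's target suspend state and the driver's s0ix flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's suspend target states. */
enum suspend_target {
    TARGET_ON,
    TARGET_S2IDLE,
    TARGET_STANDBY,
    TARGET_MEM
};

/*
 * Only use the s0ix path when the platform supports it AND the user's
 * chosen target state is suspend-to-idle; any other target (for example
 * suspend-to-RAM) must take the regular suspend path.
 */
static bool want_s0ix(bool platform_supports_s0ix, enum suspend_target target)
{
    return platform_supports_s0ix && target == TARGET_S2IDLE;
}
```

Checking the target state at suspend time, rather than a default captured earlier, is what lets the user's change of suspend behaviour take effect.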
      
      v2: User might change default suspend state, use target state
      v3: squash in build fix
      
      Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
      Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/amd/display: ensure dentist display clock update finished in DCN20 · 0652b1ea
      Dale Zhao authored
      commit b53e041d upstream.
      
      [Why]
      We don't check DENTIST_DISPCLK_CHG_DONE to ensure the dentist
      display clock is updated to the target value. In some scenarios with a
      large display clock margin, the driver proceeds with an unfinished display
      clock update, causing issues such as a black screen.
      
      [How]
      Check DENTIST_DISPCLK_CHG_DONE to ensure the display clock
      has been updated to the target value before the driver performs other
      clock-related actions.
      
      Reviewed-by: Cyr Aric <aric.cyr@amd.com>
      Acked-by: Solomon Chiu <solomon.chiu@amd.com>
      Signed-off-by: Dale Zhao <dale.zhao@amd.com>
      Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NIU: fix incorrect error return, missed in previous revert · 9c2cae70
      Paul Jakma authored
      commit 15bbf8bb upstream.
      
      Commit 7930742d, reverting 26fd962b, missed out on reverting an incorrect
      change to a return value.  The niu_pci_vpd_scan_props(..) == 1 case appears
      to be a normal path - treating it as an error and returning -EINVAL was
      breaking VPD_SCAN and causing the driver to fail to load.
      
      Fix, so my Neptune card works again.
      
      Cc: Kangjie Lu <kjlu@umn.edu>
      Cc: Shannon Nelson <shannon.lee.nelson@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: stable <stable@vger.kernel.org>
      Fixes: 7930742d ('Revert "niu: fix missing checks of niu_pci_eeprom_read"')
      Signed-off-by: Paul Jakma <paul@jakma.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: stmmac: add est_irq_status callback function for GMAC 4.10 and 5.10 · 633799dd
      Mohammad Athari Bin Ismail authored
      commit 94cbe7db upstream.
      
      Assign dwmac5_est_irq_status to est_irq_status callback function for
      GMAC 4.10 and 5.10. With this, EST related interrupts could be handled
      properly.
      
      Fixes: e49aa315 ("net: stmmac: EST interrupts handling and error reporting")
      Cc: <stable@vger.kernel.org> # 5.13.x
      Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
      Acked-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • HID: wacom: Re-enable touch by default for Cintiq 24HDT / 27QHDT · fa1c5eff
      Jason Gerecke authored
      commit 6ca2350e upstream.
      
      Commit 670e9092 ("HID: wacom: support named keys on older devices")
      added support for sending named events from the soft buttons on the
      24HDT and 27QHDT. In the process, however, it inadvertently disabled the
      touchscreen of the 24HDT and 27QHDT by default. The
      `wacom_set_shared_values` function would normally enable touch by default
      but because it checks the state of the non-shared `has_mute_touch_switch`
      flag and `wacom_setup_touch_input_capabilities` sets the state of the
      /shared/ version, touch ends up being disabled by default.
      
      This patch sets the non-shared flag, letting `wacom_set_shared_values`
      take care of copying the value over to the shared version and setting
      the default touch state to "on".
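
The flag mix-up can be modelled in a few lines. This is a simplified sketch, not the driver code: the structs and `set_shared_values()` are hypothetical reductions that keep only the two flags discussed above.

```c
#include <assert.h>
#include <stdbool.h>

/* State shared between the pen and touch halves of one tablet (model). */
struct shared_state {
    bool has_mute_touch_switch;
    bool is_touch_on;
};

/* Per-interface state with its own, non-shared copy of the flag (model). */
struct device_state {
    bool has_mute_touch_switch;
    struct shared_state *shared;
};

/*
 * Copies the non-shared flag to the shared state and turns touch on by
 * default. It only looks at the NON-shared flag, so setup code that sets
 * only the shared flag never reaches the is_touch_on assignment.
 */
static void set_shared_values(struct device_state *dev)
{
    if (dev->has_mute_touch_switch) {
        dev->shared->has_mute_touch_switch = true;
        dev->shared->is_touch_on = true;
    }
}
```

Setting the non-shared flag during setup lets this one function remain the single place where the shared copy and the default touch state are established.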
      
      Fixes: 670e9092 ("HID: wacom: support named keys on older devices")
      CC: stable@vger.kernel.org # 5.4+
      Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
      Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • alpha: register early reserved memory in memblock · 892ced35
      Mike Rapoport authored
      commit 640b7ea5 upstream.
      
      The memory reserved by console/PALcode or non-volatile memory is not added
      to memblock.memory.
      
      Since commit fa3354e4 (mm: free_area_init: use maximal zone PFNs rather
      than zone sizes) the initialization of the memory map relies on the
      accuracy of memblock.memory to properly calculate zone sizes. The holes in
      memblock.memory caused by absent regions reserved by the firmware cause
      incorrect initialization of struct pages which leads to BUG() during the
      initial page freeing:
      
      BUG: Bad page state in process swapper  pfn:2ffc53
      page:fffffc000ecf14c0 refcount:0 mapcount:1 mapping:0000000000000000 index:0x0
      flags: 0x0()
      raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      page dumped because: nonzero mapcount
      Modules linked in:
      CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-03841-gfa3354e4ea39-dirty #26
             fffffc0001b5bd68 fffffc0001b5be80 fffffc00011cd148 fffffc000ecf14c0
             fffffc00019803df fffffc0001b5be80 fffffc00011ce340 fffffc000ecf14c0
             0000000000000000 fffffc0001b5be80 fffffc0001b482c0 fffffc00027d6618
             fffffc00027da7d0 00000000002ff97a 0000000000000000 fffffc0001b5be80
             fffffc00011d1abc fffffc000ecf14c0 fffffc0002d00000 fffffc0001b5be80
             fffffc0001b2350c 0000000000300000 fffffc0001b48298 fffffc0001b482c0
      Trace:
      [<fffffc00011cd148>] bad_page+0x168/0x1b0
      [<fffffc00011ce340>] free_pcp_prepare+0x1e0/0x290
      [<fffffc00011d1abc>] free_unref_page+0x2c/0xa0
      [<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30
      [<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30
      [<fffffc000101001c>] _stext+0x1c/0x20
      
      Fix this by registering the reserved ranges in memblock.memory.
      
      Link: https://lore.kernel.org/lkml/20210726192311.uffqnanxw3ac5wwi@ivybridge
      Fixes: fa3354e4 ("mm: free_area_init: use maximal zone PFNs rather than zone sizes")
      Reported-by: Matt Turner <mattst88@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Matt Turner <mattst88@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • can: esd_usb2: fix memory leak · a63d311c
      Pavel Skripkin authored
      commit 928150fa upstream.
      
      In esd_usb2_setup_rx_urbs() MAX_RX_URBS coherent buffers are allocated
      and nothing frees them:
      
      1) In callback function the urb is resubmitted and that's all
      2) In disconnect function urbs are simply killed, but URB_FREE_BUFFER
         is not set (see esd_usb2_setup_rx_urbs) and this flag cannot be used
         with coherent buffers.
      
      So, all allocated buffers should be freed with usb_free_coherent()
      explicitly.
      
      Side note: This code looks like a copy-paste of other can drivers. The
      same patch was applied to mcba_usb driver and it works nice with real
      hardware. There is no change in functionality, only clean-up code for
      coherent buffers.
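
The clean-up pattern described above (remember every receive buffer at setup, free each one explicitly at teardown) can be sketched in user space. This is a model under assumptions: malloc()/free() stand in for usb_alloc_coherent()/usb_free_coherent(), and `struct dev_model`, `setup_rx()`, `teardown_rx()`, and the constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_RX_URBS    4   /* illustrative value, not the driver's */
#define RX_BUFFER_SIZE 64  /* illustrative value, not the driver's */

/* Per-device state: keeps a pointer to every receive buffer it owns. */
struct dev_model {
    void *rxbuf[MAX_RX_URBS];
};

static int live_buffers; /* counts outstanding allocations for the demo */

static void *model_alloc(size_t size)
{
    void *p = malloc(size);

    if (p)
        live_buffers++;
    return p;
}

static void model_free(void *p)
{
    if (p) {
        free(p);
        live_buffers--;
    }
}

/* Setup: allocate the buffers and record each one in the device state. */
static int setup_rx(struct dev_model *dev)
{
    for (int i = 0; i < MAX_RX_URBS; i++) {
        dev->rxbuf[i] = model_alloc(RX_BUFFER_SIZE);
        if (!dev->rxbuf[i])
            return -1; /* caller runs teardown_rx() to undo partial setup */
    }
    return 0;
}

/* Teardown: free every recorded buffer; unset slots are skipped. */
static void teardown_rx(struct dev_model *dev)
{
    for (int i = 0; i < MAX_RX_URBS; i++) {
        model_free(dev->rxbuf[i]);
        dev->rxbuf[i] = NULL;
    }
}
```

The device state must start zeroed so that teardown after a partial setup only frees slots that were actually filled.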
      
      Fixes: 96d8e903 ("can: Add driver for esd CAN-USB/2 device")
      Link: https://lore.kernel.org/r/b31b096926dcb35998ad0271aac4b51770ca7cc8.1627404470.git.paskripkin@gmail.com
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • can: ems_usb: fix memory leak · d23e7c01
      Pavel Skripkin authored
      commit 9969e3c5 upstream.
      
      In ems_usb_start() MAX_RX_URBS coherent buffers are allocated and
      nothing frees them:
      
      1) In callback function the urb is resubmitted and that's all
      2) In disconnect function urbs are simply killed, but URB_FREE_BUFFER
         is not set (see ems_usb_start) and this flag cannot be used with
         coherent buffers.
      
      So, all allocated buffers should be freed with usb_free_coherent()
      explicitly.
      
      Side note: This code looks like a copy-paste of other can drivers. The
      same patch was applied to mcba_usb driver and it works nice with real
      hardware. There is no change in functionality, only clean-up code for
      coherent buffers.
      
      Fixes: 702171ad ("ems_usb: Added support for EMS CPC-USB/ARM7 CAN/USB interface")
      Link: https://lore.kernel.org/r/59aa9fbc9a8cbf9af2bbd2f61a659c480b415800.1627404470.git.paskripkin@gmail.com
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • can: usb_8dev: fix memory leak · 62365842
      Pavel Skripkin authored
      commit 0e865f0c upstream.
      
      In usb_8dev_start() MAX_RX_URBS coherent buffers are allocated and
      nothing frees them:
      
      1) In callback function the urb is resubmitted and that's all
      2) In disconnect function urbs are simply killed, but URB_FREE_BUFFER
         is not set (see usb_8dev_start) and this flag cannot be used with
         coherent buffers.
      
      So, all allocated buffers should be freed with usb_free_coherent()
      explicitly.
      
      Side note: This code looks like a copy-paste of other can drivers. The
      same patch was applied to mcba_usb driver and it works nice with real
      hardware. There is no change in functionality, only clean-up code for
      coherent buffers.
      
      Fixes: 0024d8ad ("can: usb_8dev: Add support for USB2CAN interface from 8 devices")
      Link: https://lore.kernel.org/r/d39b458cd425a1cf7f512f340224e6e9563b07bd.1627404470.git.paskripkin@gmail.com
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • can: mcba_usb_start(): add missing urb->transfer_dma initialization · 78673a83
      Pavel Skripkin authored
      commit fc43fb69 upstream.
      
      Yasushi reported, that his Microchip CAN Analyzer stopped working
      since commit 91c02557 ("can: mcba_usb: fix memory leak in
      mcba_usb"). The problem was in missing urb->transfer_dma
      initialization.
      
      In my previous patch to this driver I refactored mcba_usb_start() code
      to avoid leaking usb coherent buffers. To achieve it, I passed local
      stack variable to usb_alloc_coherent() and then saved it to private
      array to correctly free all coherent buffers on ->close() call. But I
      forgot to initialize urb->transfer_dma with variable passed to
      usb_alloc_coherent().
      
      All of this caused the device to not work, since DMA address 0 is not
      valid. The following log from the bug report page points exactly to
      the problem described above.
      
      | DMAR: [DMA Write] Request device [00:14.0] PASID ffffffff fault addr 0 [fault reason 05] PTE Write access is not set
      
      Fixes: 91c02557 ("can: mcba_usb: fix memory leak in mcba_usb")
      Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=990850
      Link: https://lore.kernel.org/r/20210725103630.23864-1-paskripkin@gmail.com
      Cc: linux-stable <stable@vger.kernel.org>
      Reported-by: Yasushi SHOJI <yasushi.shoji@gmail.com>
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Tested-by: Yasushi SHOJI <yashi@spacecubics.com>
      [mkl: fixed typos in commit message - thanks Yasushi SHOJI]
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • can: peak_usb: pcan_usb_handle_bus_evt(): fix reading rxerr/txerr values · 87d268fe
      Stephane Grosjean authored
      commit 590eb2b7 upstream.
      
      This patch fixes an incorrect way of reading error counters in messages
      received for this purpose from the PCAN-USB interface. These messages
      inform about the increase or decrease of the error counters, whose values
      are placed in bytes 1 and 2 of the message data (not 0 and 1).
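
As a small illustration of the corrected offsets: the sketch below reads the counters from bytes 1 and 2 of the message data. `struct err_counters` and `read_err_counters()` are hypothetical names, and which of the two bytes carries rxerr versus txerr is an assumption here, since the text above does not say.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two CAN error counters. */
struct err_counters {
    uint8_t rxerr;
    uint8_t txerr;
};

/*
 * Extract the counters from an error-counter message. The values start at
 * byte 1, not byte 0. Assumption: byte 1 is rxerr and byte 2 is txerr.
 * The caller must pass at least 3 bytes of message data.
 */
static struct err_counters read_err_counters(const uint8_t *data)
{
    struct err_counters c = { .rxerr = data[1], .txerr = data[2] };

    return c;
}
```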
      
      Fixes: ea8b33bd ("can: pcan_usb: add support of rxerr/txerr counters")
      Link: https://lore.kernel.org/r/20210625130931.27438-4-s.grosjean@peak-system.com
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • can: raw: raw_setsockopt(): fix raw_rcv panic for sock UAF · aec236c7
      Ziyang Xuan authored
      commit 54f93336 upstream.
      
      We get a bug during the ltp can_filter test as follows.
      
      ===========================================
      [60919.264984] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      [60919.265223] PGD 8000003dda726067 P4D 8000003dda726067 PUD 3dda727067 PMD 0
      [60919.265443] Oops: 0000 [#1] SMP PTI
      [60919.265550] CPU: 30 PID: 3638365 Comm: can_filter Kdump: loaded Tainted: G        W         4.19.90+ #1
      [60919.266068] RIP: 0010:selinux_socket_sock_rcv_skb+0x3e/0x200
      [60919.293289] RSP: 0018:ffff8d53bfc03cf8 EFLAGS: 00010246
      [60919.307140] RAX: 0000000000000000 RBX: 000000000000001d RCX: 0000000000000007
      [60919.320756] RDX: 0000000000000001 RSI: ffff8d5104a8ed00 RDI: ffff8d53bfc03d30
      [60919.334319] RBP: ffff8d9338056800 R08: ffff8d53bfc29d80 R09: 0000000000000001
      [60919.347969] R10: ffff8d53bfc03ec0 R11: ffffb8526ef47c98 R12: ffff8d53bfc03d30
      [60919.350320] perf: interrupt took too long (3063 > 2500), lowering kernel.perf_event_max_sample_rate to 65000
      [60919.361148] R13: 0000000000000001 R14: ffff8d53bcf90000 R15: 0000000000000000
      [60919.361151] FS:  00007fb78b6b3600(0000) GS:ffff8d53bfc00000(0000) knlGS:0000000000000000
      [60919.400812] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [60919.413730] CR2: 0000000000000010 CR3: 0000003e3f784006 CR4: 00000000007606e0
      [60919.426479] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [60919.439339] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [60919.451608] PKRU: 55555554
      [60919.463622] Call Trace:
      [60919.475617]  <IRQ>
      [60919.487122]  ? update_load_avg+0x89/0x5d0
      [60919.498478]  ? update_load_avg+0x89/0x5d0
      [60919.509822]  ? account_entity_enqueue+0xc5/0xf0
      [60919.520709]  security_sock_rcv_skb+0x2a/0x40
      [60919.531413]  sk_filter_trim_cap+0x47/0x1b0
      [60919.542178]  ? kmem_cache_alloc+0x38/0x1b0
      [60919.552444]  sock_queue_rcv_skb+0x17/0x30
      [60919.562477]  raw_rcv+0x110/0x190 [can_raw]
      [60919.572539]  can_rcv_filter+0xbc/0x1b0 [can]
      [60919.582173]  can_receive+0x6b/0xb0 [can]
      [60919.591595]  can_rcv+0x31/0x70 [can]
      [60919.600783]  __netif_receive_skb_one_core+0x5a/0x80
      [60919.609864]  process_backlog+0x9b/0x150
      [60919.618691]  net_rx_action+0x156/0x400
      [60919.627310]  ? sched_clock_cpu+0xc/0xa0
      [60919.635714]  __do_softirq+0xe8/0x2e9
      [60919.644161]  do_softirq_own_stack+0x2a/0x40
      [60919.652154]  </IRQ>
      [60919.659899]  do_softirq.part.17+0x4f/0x60
      [60919.667475]  __local_bh_enable_ip+0x60/0x70
      [60919.675089]  __dev_queue_xmit+0x539/0x920
      [60919.682267]  ? finish_wait+0x80/0x80
      [60919.689218]  ? finish_wait+0x80/0x80
      [60919.695886]  ? sock_alloc_send_pskb+0x211/0x230
      [60919.702395]  ? can_send+0xe5/0x1f0 [can]
      [60919.708882]  can_send+0xe5/0x1f0 [can]
      [60919.715037]  raw_sendmsg+0x16d/0x268 [can_raw]
      
      This happens because raw_setsockopt() runs concurrently with
      unregister_netdevice_many(). The concurrent scenario is as follows.
      
      	cpu0						cpu1
      raw_bind
      raw_setsockopt					unregister_netdevice_many
      						unlist_netdevice
      dev_get_by_index				raw_notifier
      raw_enable_filters				......
      can_rx_register
      can_rcv_list_find(..., net->can.rx_alldev_list)
      
      ......
      
      sock_close
      raw_release(sock_a)
      
      ......
      
      can_receive
      can_rcv_filter(net->can.rx_alldev_list, ...)
      raw_rcv(skb, sock_a)
      BUG
      
      After unlist_netdevice(), dev_get_by_index() returns NULL in
      raw_setsockopt(). Function raw_enable_filters() then adds the sock
      and can_filter to net->can.rx_alldev_list, and the sock is closed.
      Next, we sock_sendmsg() to a new vcan device using the same
      can_filter. In can_rcv_filter(), the protocol stack matches the old
      receiver on net->can.rx_alldev_list, whose sock has already been
      released. Function raw_rcv() then uses the freed sock and the UAF
      bug is triggered.
      
      The key issue is that net_device is not protected in
      raw_setsockopt(). Use rtnl_lock to protect net_device
      in raw_setsockopt().
      
      Fixes: c18ce101 ("[CAN]: Add raw protocol")
      Link: https://lore.kernel.org/r/20210722070819.1048263-1-william.xuanziyang@huawei.com
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • can: j1939: j1939_xtp_rx_dat_one(): fix rxtimer value between consecutive TP.DT to 750ms · ea9e6fc2
      Zhang Changzhong authored
      commit c6eea1c8 upstream.
      
      For receive side, the max time interval between two consecutive TP.DT
      should be 750ms.
      
      Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
      Link: https://lore.kernel.org/r/1625569210-47506-1-git-send-email-zhangchangzhong@huawei.com
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
      Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook() · 9293727a
      Wang Hai authored
      commit 121dffe2 upstream.
      
      When I use kfree_rcu() to free a large memory allocated by kmalloc_node(),
      the following dump occurs.
      
        BUG: kernel NULL pointer dereference, address: 0000000000000020
        [...]
        Oops: 0000 [#1] SMP
        [...]
        Workqueue: events kfree_rcu_work
        RIP: 0010:__obj_to_index include/linux/slub_def.h:182 [inline]
        RIP: 0010:obj_to_index include/linux/slub_def.h:191 [inline]
        RIP: 0010:memcg_slab_free_hook+0x120/0x260 mm/slab.h:363
        [...]
        Call Trace:
          kmem_cache_free_bulk+0x58/0x630 mm/slub.c:3293
          kfree_bulk include/linux/slab.h:413 [inline]
          kfree_rcu_work+0x1ab/0x200 kernel/rcu/tree.c:3300
          process_one_work+0x207/0x530 kernel/workqueue.c:2276
          worker_thread+0x320/0x610 kernel/workqueue.c:2422
          kthread+0x13d/0x160 kernel/kthread.c:313
          ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      
      When kmalloc_node() allocates a large memory block, a page is allocated,
      not a slab object. So when freeing that memory via kfree_rcu(), it should
      not be processed by memcg_slab_free_hook(), because
      memcg_slab_free_hook() is used for slab objects.
      
      Use page_objcgs_check() instead of page_objcgs() in
      memcg_slab_free_hook() to fix this bug.
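
The idea behind the checked accessor can be sketched as a tagged-pointer test. This is a user-space model under assumptions: `struct page_model`, `objcgs_check()`, and the flag values are hypothetical, chosen only to show that a page whose memcg data is not tagged as an object-cgroup vector yields NULL instead of a bogus vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag bits stored in the low bits of an aligned pointer. */
#define DATA_OBJCGS 0x1UL /* memcg_data points to an objcg vector (slab) */
#define DATA_KMEM   0x2UL /* memcg_data points elsewhere (non-slab page) */
#define DATA_FLAGS  (DATA_OBJCGS | DATA_KMEM)

/* Minimal model of a page: only the tagged memcg_data word. */
struct page_model {
    uintptr_t memcg_data;
};

/*
 * Checked accessor: return the objcg vector only when the page really
 * carries one. Non-slab pages (no DATA_OBJCGS tag) and uncharged pages
 * (memcg_data == 0) yield NULL, so the caller can skip them.
 */
static void **objcgs_check(const struct page_model *page)
{
    uintptr_t data = page->memcg_data;

    if (!data || !(data & DATA_OBJCGS))
        return NULL;
    return (void **)(data & ~DATA_FLAGS);
}
```

With this shape, a free hook that skips pages for which the accessor returns NULL never touches slab-only metadata on a large, page-backed allocation.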
      
      Link: https://lkml.kernel.org/r/20210728145655.274476-1-wanghai38@huawei.com
      Fixes: 270c6a71 ("mm: memcontrol/slab: Use helpers to access slab page's memcg_data")
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Roman Gushchin <guro@fb.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code · 87370a9d
      Johannes Weiner authored
      commit 30def935 upstream.
      
      Dan Carpenter reports:
      
          The patch 2d146aa3: "mm: memcontrol: switch to rstat" from Apr
          29, 2021, leads to the following static checker warning:
      
      	    kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
      	    warn: sleeping in atomic context
      
          mm/memcontrol.c
            3572  static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
            3573  {
            3574          unsigned long val;
            3575
            3576          if (mem_cgroup_is_root(memcg)) {
            3577                  cgroup_rstat_flush(memcg->css.cgroup);
      			    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      
          This is from static analysis and potentially a false positive.  The
          problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
          which holds an rcu_read_lock().  And the cgroup_rstat_flush() function
          can sleep.
      
            3578                  val = memcg_page_state(memcg, NR_FILE_PAGES) +
            3579                          memcg_page_state(memcg, NR_ANON_MAPPED);
            3580                  if (swap)
            3581                          val += memcg_page_state(memcg, MEMCG_SWAP);
            3582          } else {
            3583                  if (!swap)
            3584                          val = page_counter_read(&memcg->memory);
            3585                  else
            3586                          val = page_counter_read(&memcg->memsw);
            3587          }
            3588          return val;
            3589  }
      
      __mem_cgroup_threshold() indeed holds the rcu lock.  In addition, the
      thresholding code is invoked during stat changes, and those contexts
      have irqs disabled as well.  If the lock breaking occurs inside the
      flush function, it will result in a sleep from an atomic context.
      
      Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
      
      Link: https://lkml.kernel.org/r/20210726150019.251820-1-hannes@cmpxchg.org
      Fixes: 2d146aa3 ("mm: memcontrol: switch to rstat")
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Chris Down <chris@chrisdown.name>
      Reviewed-by: Rik van Riel <riel@surriel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Junxiao Bi's avatar
      ocfs2: issue zeroout to EOF blocks · 3df2bd99
      Junxiao Bi authored
      commit 9449ad33 upstream.
      
      When punching holes in EOF blocks, fallocate used a buffered write to
      zero the EOF blocks in the last cluster.  But since ->writepage ignores
      EOF pages, those zeros are never flushed.
      
      This "looks" ok, as commit 6bba4471 ("ocfs2: fix data corruption by
      fallocate") zeroes the EOF blocks when extending the file size, but it
      isn't.  The problem lies in those EOF pages: before writeback, the
      pages had the DIRTY flag set and all buffer_heads in them also had the
      DIRTY flag set.  When writeback was run by write_cache_pages(), the
      DIRTY flag on the page was cleared, but the DIRTY flag on the
      buffer_heads was not.
      
      When the next write hit those EOF pages, the buffer_heads already had
      the DIRTY flag set, so the page was not marked DIRTY again.  That made
      writeback ignore them forever, which causes data corruption.  Even a
      direct-io write can't work, because it fails when trying to drop the
      page cache before direct io: it finds that the buffer_heads for those
      pages still have the DIRTY flag set, and then falls back to buffered
      io mode.
      
      To summarize the issue: as writeback ignores EOF pages, once any EOF
      page is generated, any write to it only goes to the page cache and is
      never flushed to disk, even after the file size is extended and that
      page is no longer an EOF page.  The fix is to avoid zeroing EOF blocks
      with a buffered write.
      
      The following code snippet from qemu-img could trigger the corruption.
      
        656   open("6b3711ae-3306-4bdd-823c-cf1c0060a095.conv.2", O_RDWR|O_DIRECT|O_CLOEXEC) = 11
        ...
        660   fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2275868672, 327680 <unfinished ...>
        660   fallocate(11, 0, 2275868672, 327680) = 0
        658   pwrite64(11, "
      
      Link: https://lkml.kernel.org/r/20210722054923.24389-2-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3df2bd99
    • Junxiao Bi's avatar
      ocfs2: fix zero out valid data · c9302ab3
      Junxiao Bi authored
      commit f267aeb6 upstream.
      
      If the append-dio feature is enabled, a direct-io write and fallocate
      can run in parallel to extend the file size.  fallocate used
      "orig_isize" to record i_size before taking "ip_alloc_sem", so by the
      time ocfs2_zeroout_partial_cluster() zeroes out the EOF blocks, i_size
      may already have been extended by ocfs2_dio_end_io_write(), which
      causes valid data to be zeroed out.
      
      Link: https://lkml.kernel.org/r/20210722054923.24389-1-junxiao.bi@oracle.com
      Fixes: 6bba4471 ("ocfs2: fix data corruption by fallocate")
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9302ab3
    • Paolo Bonzini's avatar
      KVM: add missing compat KVM_CLEAR_DIRTY_LOG · a9f2d088
      Paolo Bonzini authored
      commit 8750f9bb upstream.
      
      The arguments to the KVM_CLEAR_DIRTY_LOG ioctl include a pointer,
      therefore it needs a compat ioctl implementation.  Otherwise,
      32-bit userspace fails to invoke it on 64-bit kernels; for x86
      it might work fine by chance if the padding is zero, but not
      on big-endian architectures.
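      
      The byte-order effect described above can be sketched in user space.
      This is only a simulation under stated assumptions: the pointer is
      assumed to share an 8-byte slot with a 64-bit padding field, and
      32-bit userspace is assumed to fill in only the first 4 bytes.  The
      helper names (store_ptr32, load_ptr64, misread_ptr) are hypothetical
      and are not part of KVM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store a 32-bit pointer value into the first 4 bytes
 * of an 8-byte slot in the given byte order. The last 4 bytes are left
 * untouched, as a 32-bit caller would leave them.
 */
static void store_ptr32(unsigned char slot[8], uint32_t ptr, int big_endian)
{
	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? (3 - i) * 8 : i * 8;

		slot[i] = (unsigned char)(ptr >> shift);
	}
}

/*
 * Hypothetical helper: read all 8 bytes of the slot as one 64-bit value,
 * the way a 64-bit reader without a compat path would.
 */
static uint64_t load_ptr64(const unsigned char slot[8], int big_endian)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++) {
		int shift = big_endian ? (7 - i) * 8 : i * 8;

		v |= (uint64_t)slot[i] << shift;
	}
	return v;
}

/* Value the 64-bit reader sees when the padding bytes happen to be zero. */
static uint64_t misread_ptr(uint32_t ptr, int big_endian)
{
	unsigned char slot[8];

	memset(slot, 0, sizeof(slot));
	store_ptr32(slot, ptr, big_endian);
	return load_ptr64(slot, big_endian);
}
```

      In this model the little-endian reader recovers the original pointer
      only because the padding is zero, while the big-endian reader sees the
      pointer shifted into the upper 32 bits.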
      
      Reported-by: Thomas Sattler
      Cc: stable@vger.kernel.org
      Fixes: 2a31b9db ("kvm: introduce manual dirty log reprotect")
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a9f2d088
    • Juergen Gross's avatar
      x86/kvm: fix vcpu-id indexed array sizes · a80e3243
      Juergen Gross authored
      commit 76b4f357 upstream.
      
      KVM_MAX_VCPU_ID is the maximum vcpu-id of a guest, and not the number
      of vcpu-ids. Fix array indexed by vcpu-id to have KVM_MAX_VCPU_ID+1
      elements.
      
      Note that this is currently no real problem, as KVM_MAX_VCPU_ID is
      an odd number, resulting in always enough padding being available at
      the end of those arrays.
      
      Nevertheless this should be fixed in order to avoid rare problems in
      case someone is using an even number for KVM_MAX_VCPU_ID.
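      
      A minimal sketch of the padding effect for a bitmap indexed by
      vcpu-id, assuming the bitmap is stored in whole unsigned longs.  The
      helper names and the sizes used in the usage note are illustrative
      assumptions, not values taken from this patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned longs needed to hold nbits bits, rounded up. */
static size_t words_for_bits(size_t nbits)
{
	return (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Does a bitmap declared for nbits bits have storage for bit index id? */
static int bitmap_has_room(size_t nbits, size_t id)
{
	return id < words_for_bits(nbits) * BITS_PER_WORD;
}
```

      With a maximum id of 1023, a bitmap declared with 1023 bits is rounded
      up to 1024 bits, so index 1023 still fits by accident.  With a maximum
      id of 1024, a bitmap declared with 1024 bits has no room for index
      1024, while one declared with 1024 + 1 bits does.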
      
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Message-Id: <20210701154105.23215-2-jgross@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a80e3243
    • Srinivas Pandruvada's avatar
      ACPI: DPTF: Fix reading of attributes · 3c82e279
      Srinivas Pandruvada authored
      commit 41a8457f upstream.
      
      The current assumption that the methods used to read PCH FIVR
      attributes return an integer is not correct. There is no good way to
      return a plain integer, as negative numbers are also valid.
      
      These read methods return a package of integers. The first integer is
      the status, which is 0 on success and any other value on failure. When
      the returned status is zero, the second integer holds the actual value.
      
      This change fixes the issue by replacing acpi_evaluate_integer() with
      acpi_evaluate_object() and using acpi_extract_package() to extract the
      results.
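      
      The status-then-value convention can be sketched as follows.  The
      struct and function names are hypothetical stand-ins for the two
      integers extracted from the package, and the choice of -EINVAL as the
      error code is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the two integers extracted from the package. */
struct fivr_result {
	uint64_t status;	/* 0 on success, any other value on failure */
	uint64_t value;		/* meaningful only when status == 0 */
};

/* Returns 0 and stores the value on success, -EINVAL on a failure status. */
static int fivr_read_value(const struct fivr_result *res, uint64_t *out)
{
	if (res->status != 0)
		return -EINVAL;

	*out = res->value;
	return 0;
}
```

      Keeping the status separate from the value means every integer stays
      available as a valid reading.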
      
      Fixes: 2ce6324e ("ACPI: DPTF: Add PCH FIVR participant driver")
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c82e279
    • Hui Wang's avatar
      Revert "ACPI: resources: Add checks for ACPI IRQ override" · cf90e1c4
      Hui Wang authored
      commit e0eef369 upstream.
      
      Commit 0ec4e55e ("ACPI: resources: Add checks for ACPI IRQ
      override") introduces a regression on some platforms: on at least two
      different platforms the UART can't get the correct irq setting, and
      the kernel can't boot on them.
      
      This reverts commit 0ec4e55e.
      
      Regression-discuss: https://bugzilla.kernel.org/show_bug.cgi?id=213031
      Reported-by: PGNd <pgnet.dev@gmail.com>
      Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
      Signed-off-by: Hui Wang <hui.wang@canonical.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf90e1c4
    • Goldwyn Rodrigues's avatar
      btrfs: mark compressed range uptodate only if all bio succeed · 1d381aca
      Goldwyn Rodrigues authored
      commit 240246f6 upstream.
      
      In the compression write endio sequence, the range that the
      compressed_bio writes is marked as uptodate if the last of the
      compressed (sub)bios completed successfully. An earlier bio may have
      failed, however, and such failures are recorded in cb->errors.
      
      Set the writeback range as uptodate only if cb->errors is zero, as opposed
      to checking only the last bio's status.
      
      Backporting notes: in all versions up to 4.4 the last argument is always
      replaced by "!cb->errors".
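      
      A simplified model of the rule, assuming a write split into several
      sub-bios with a shared error counter.  The struct and function names
      are hypothetical and do not mirror the btrfs data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one compressed write made of several sub-bios. */
struct compressed_write {
	int pending;	/* sub-bios still in flight */
	int errors;	/* number of sub-bios that have failed so far */
};

/*
 * Called once per completed sub-bio. Returns true only for the last
 * sub-bio; in that case *uptodate reflects all sub-bios, not just the
 * one that happened to finish last.
 */
static bool sub_bio_done(struct compressed_write *cw, bool failed,
			 bool *uptodate)
{
	if (failed)
		cw->errors++;
	if (--cw->pending > 0)
		return false;

	*uptodate = (cw->errors == 0);
	return true;
}
```

      If the first of two sub-bios fails and the second succeeds, the range
      is still reported as not uptodate.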
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d381aca
    • Desmond Cheong Zhi Xi's avatar
      btrfs: fix rw device counting in __btrfs_free_extra_devids · c543bced
      Desmond Cheong Zhi Xi authored
      commit b2a61667 upstream.
      
      When removing a writeable device in __btrfs_free_extra_devids, the rw
      device count should be decremented.
      
      This error was caught by Syzbot which reported a warning in
      close_fs_devices:
      
        WARNING: CPU: 1 PID: 9355 at fs/btrfs/volumes.c:1168 close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168
        Modules linked in:
        CPU: 0 PID: 9355 Comm: syz-executor552 Not tainted 5.13.0-rc1-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168
        RSP: 0018:ffffc9000333f2f0 EFLAGS: 00010293
        RAX: ffffffff8365f5c3 RBX: 0000000000000001 RCX: ffff888029afd4c0
        RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
        RBP: ffff88802846f508 R08: ffffffff8365f525 R09: ffffed100337d128
        R10: ffffed100337d128 R11: 0000000000000000 R12: dffffc0000000000
        R13: ffff888019be8868 R14: 1ffff1100337d10d R15: 1ffff1100337d10a
        FS:  00007f6f53828700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 000000000047c410 CR3: 00000000302a6000 CR4: 00000000001506f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         btrfs_close_devices+0xc9/0x450 fs/btrfs/volumes.c:1180
         open_ctree+0x8e1/0x3968 fs/btrfs/disk-io.c:3693
         btrfs_fill_super fs/btrfs/super.c:1382 [inline]
         btrfs_mount_root+0xac5/0xc60 fs/btrfs/super.c:1749
         legacy_get_tree+0xea/0x180 fs/fs_context.c:592
         vfs_get_tree+0x86/0x270 fs/super.c:1498
         fc_mount fs/namespace.c:993 [inline]
         vfs_kern_mount+0xc9/0x160 fs/namespace.c:1023
         btrfs_mount+0x3d3/0xb50 fs/btrfs/super.c:1809
         legacy_get_tree+0xea/0x180 fs/fs_context.c:592
         vfs_get_tree+0x86/0x270 fs/super.c:1498
         do_new_mount fs/namespace.c:2905 [inline]
         path_mount+0x196f/0x2be0 fs/namespace.c:3235
         do_mount fs/namespace.c:3248 [inline]
         __do_sys_mount fs/namespace.c:3456 [inline]
         __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433
         do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The warning triggered because fs_devices->rw_devices was not 0 after
      closing all devices. Here is the call trace that was observed:
      
        btrfs_mount_root():
          btrfs_scan_one_device():
            device_list_add();   <---------------- device added
          btrfs_open_devices():
            open_fs_devices():
              btrfs_open_one_device();   <-------- writable device opened,
      	                                     rw device count ++
          btrfs_fill_super():
            open_ctree():
              btrfs_free_extra_devids():
      	  __btrfs_free_extra_devids();  <--- writable device removed,
      	                              rw device count not decremented
      	  fail_tree_roots:
      	    btrfs_close_devices():
      	      close_fs_devices();   <------- rw device count off by 1
      
      As a note, prior to commit cf89af14 ("btrfs: dev-replace: fail
      mount if we don't have replace item with target device"), rw_devices
      was decremented on removing a writable device in
      __btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit
      was not set for the device. However, this check does not need to be
      reinstated as it is now redundant and incorrect.
      
      In __btrfs_free_extra_devids, we skip removing the device if it is the
      target for replacement. This is done by checking whether device->devid
      == BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set
      only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices
      should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check,
      and so it's redundant to test for that bit.
      
      Additionally, following commit 82372bc8 ("Btrfs: make the logic of
      source device removing more clear"), rw_devices is incremented
      whenever a writeable device is added to the alloc list (including the
      target device in btrfs_dev_replace_finishing), so all removals of
      writable devices from the alloc list should also be accompanied by a
      decrement to rw_devices.
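      
      The pairing rule can be sketched with a toy model, assuming the
      counter is meant to track the number of entries on the alloc list.
      All names here are hypothetical and only illustrate the invariant.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: writable devices on the alloc list and their count. */
struct fs_devices_model {
	int alloc_list_len;	/* entries currently on the alloc list */
	int rw_devices;		/* counter that must track the list length */
};

/* Adding a writable device to the alloc list bumps the counter. */
static void add_writable(struct fs_devices_model *m)
{
	m->alloc_list_len++;
	m->rw_devices++;
}

/* Removing a writable device must drop the counter as well. */
static void remove_writable(struct fs_devices_model *m)
{
	m->alloc_list_len--;
	m->rw_devices--;
}

/* The invariant a close-time check would rely on in this model. */
static bool counts_consistent(const struct fs_devices_model *m)
{
	return m->rw_devices == m->alloc_list_len;
}
```

      In this model, skipping the decrement on removal leaves the counter
      at 1 with an empty list, which matches the off-by-one in the trace.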
      
      Reported-by: <syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com>
      Fixes: cf89af14 ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
      CC: stable@vger.kernel.org # 5.10+
      Tested-by: <syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c543bced
    • Filipe Manana's avatar
      btrfs: fix lost inode on log replay after mix of fsync, rename and inode eviction · 9e4417af
      Filipe Manana authored
      commit ecc64fab upstream.
      
      When checking if we need to log the new name of a renamed inode, we are
      checking if the inode and its parent inode have been logged before, and if
      not we don't log the new name. The check however is buggy, as it directly
      compares the logged_trans field of the inodes versus the ID of the current
      transaction. The problem is that logged_trans is a transient field, only
      stored in memory and never persisted in the inode item, so if an inode
      was logged before, evicted and reloaded, its logged_trans field is set to
      a value of 0, meaning the check will return false and the new name of the
      renamed inode is not logged. If the old parent directory was previously
      fsynced and we deleted the logged directory entries corresponding to the
      old name, we end up with a log that when replayed will delete the renamed
      inode.
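      
      A rough model of why the direct comparison fails after eviction.  It
      assumes a second in-memory field, called last_trans here, that records
      the transaction in which the inode was last modified.  The struct,
      both function names, and the exact conditions are simplifications for
      illustration; the real inode_logged() helper checks additional state
      that is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of the in-memory inode state. */
struct inode_model {
	uint64_t logged_trans;	/* transient; 0 again after evict and reload */
	uint64_t last_trans;	/* assumed: transaction of last modification */
};

/* Naive check: trusts logged_trans even though eviction resets it to 0. */
static bool naive_was_logged(const struct inode_model *inode, uint64_t transid)
{
	return inode->logged_trans == transid;
}

/*
 * Conservative check: if logged_trans was lost but the inode was modified
 * in the current transaction, assume it may have been logged.
 */
static bool conservative_was_logged(const struct inode_model *inode,
				    uint64_t transid)
{
	if (inode->logged_trans == transid)
		return true;
	return inode->logged_trans == 0 && inode->last_trans == transid;
}
```

      For an inode that was logged in transaction 7 and then evicted and
      reloaded, the naive check reports "not logged" while the conservative
      one reports "possibly logged".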
      
      The following example triggers the problem:
      
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt
      
        $ mkdir /mnt/A
        $ mkdir /mnt/B
        $ echo -n "hello world" > /mnt/A/foo
      
        $ sync
      
        # Add some new file to A and fsync directory A.
        $ touch /mnt/A/bar
        $ xfs_io -c "fsync" /mnt/A
      
        # Now trigger inode eviction. We are only interested in triggering
        # eviction for the inode of directory A.
        $ echo 2 > /proc/sys/vm/drop_caches
      
        # Move foo from directory A to directory B.
        # This deletes the directory entries for foo in A from the log, and
        # does not add the new name for foo in directory B to the log, because
        # logged_trans of A is 0, which is less than the current transaction ID.
        $ mv /mnt/A/foo /mnt/B/foo
      
        # Now make an fsync to anything except A, B or any file inside them,
        # like for example create a file at the root directory and fsync this
        # new file. This syncs the log that contains all the changes done by
        # previous rename operation.
        $ touch /mnt/baz
        $ xfs_io -c "fsync" /mnt/baz
      
        <power fail>
      
        # Mount the filesystem and replay the log.
        $ mount /dev/sdc /mnt
      
        # Check the filesystem content.
        $ ls -1R /mnt
        /mnt/:
        A
        B
        baz
      
        /mnt/A:
        bar
      
        /mnt/B:
        $
      
        # File foo is gone, it's neither in A/ nor in B/.
      
      Fix this by using the inode_logged() helper at btrfs_log_new_name(), which
      safely checks if an inode was logged before in the current transaction.
      
      A test case for fstests will follow soon.
      
      CC: stable@vger.kernel.org # 4.14+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e4417af
    • Javier Pello's avatar
      fs/ext2: Avoid page_address on pages returned by ext2_get_page · 89e34995
      Javier Pello authored
      commit 728d392f upstream.
      
      Commit 782b76d7 ("fs/ext2: Replace
      kmap() with kmap_local_page()") replaced the kmap/kunmap calls in
      ext2_get_page/ext2_put_page with kmap_local_page/kunmap_local for
      efficiency reasons. As a necessary side change, the commit also
      made ext2_get_page (and ext2_find_entry and ext2_dotdot) return
      the mapping address along with the page itself, as it is required
      for kunmap_local, and converted uses of page_address on such pages
      to use the newly returned address instead. However, uses of
      page_address on such pages were missed in ext2_check_page and
      ext2_delete_entry, which triggers oopses if kmap_local_page happens
      to return an address from high memory. Fix this now by converting
      the remaining uses of page_address to use the right address, as
      returned by kmap_local_page.
      
      Link: https://lore.kernel.org/r/20210714185448.8707ac239e9f12b3a7f5b9f9@urjc.es
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Signed-off-by: Javier Pello <javier.pello@urjc.es>
      Fixes: 782b76d7 ("fs/ext2: Replace kmap() with kmap_local_page()")
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      89e34995
    • Linus Torvalds's avatar
      pipe: make pipe writes always wake up readers · f0aa1bc3
      Linus Torvalds authored
      commit 3a34b13a upstream.
      
      Since commit 1b6b26ae ("pipe: fix and clarify pipe write wakeup
      logic") we have sanitized the pipe write logic, and would only try to
      wake up readers if they needed it.
      
      In particular, if the pipe already had data in it before the write,
      there was no point in trying to wake up a reader, since any existing
      readers must have been aware of the pre-existing data already.  Doing
      extraneous wakeups will only cause potential thundering herd problems.
      
      However, it turns out that some Android libraries have misused the EPOLL
      interface, and expected "edge triggered" to be "any new write will
      trigger it".  Even if there was no edge in sight.
      
      Quoting Sandeep Patil:
       "The commit 1b6b26ae ('pipe: fix and clarify pipe write wakeup
        logic') changed pipe write logic to wakeup readers only if the pipe
        was empty at the time of write. However, there are libraries that
        relied upon the older behavior for notification scheme similar to
        what's described in [1]
      
        One such library 'realm-core'[2] is used by numerous Android
        applications. The library uses a similar notification mechanism as GNU
        Make but it never drains the pipe until it is full. When Android moved
        to v5.10 kernel, all applications using this library stopped working.
      
        The library has since been fixed[3] but it will be a while before all
        applications incorporate the updated library"
      
      Our regression rule for the kernel is that if applications break from
      new behavior, it's a regression, even if it was because the application
      did something patently wrong.  Also note the original report [4] by
      Michael Kerrisk about a test for this epoll behavior - but at that point
      we didn't know of any actual broken use case.
      
      So add the extraneous wakeup, to approximate the old behavior.
      
      [ I say "approximate", because the exact old behavior was to do a wakeup
        not for each write(), but for each pipe buffer chunk that was filled
        in. The behavior introduced by this change is not that - this is just
        "every write will cause a wakeup, whether necessary or not", which
        seems to be sufficient for the broken library use. ]
      
      It's worth noting that this adds the extraneous wakeup only for the
      write side, while the read side still considers the "edge" to be purely
      about reading enough from the pipe to allow further writes.
      
      See commit f467a6a6 ("pipe: fix and clarify pipe read wakeup logic")
      for the pipe read case, which remains that "only wake up if the pipe was
      full, and we read something from it".
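      
      The difference between the two write-side policies can be sketched
      with a toy counter.  It assumes a pipe that starts empty and that
      nobody drains; the enum and function names are hypothetical and this
      is not the pipe_write() logic itself.

```c
#include <assert.h>
#include <stdbool.h>

enum wake_policy {
	WAKE_IF_WAS_EMPTY,	/* only wake readers if the pipe was empty */
	WAKE_ALWAYS,		/* wake readers on every write */
};

/*
 * Count reader wakeups for nwrites writes into an initially empty pipe
 * that is never read from.
 */
static int count_wakeups(enum wake_policy policy, int nwrites)
{
	int buffered = 0;
	int wakeups = 0;

	for (int i = 0; i < nwrites; i++) {
		bool was_empty = (buffered == 0);

		buffered++;
		if (policy == WAKE_ALWAYS || was_empty)
			wakeups++;
	}
	return wakeups;
}
```

      A reader that waits for a notification per write but never drains the
      pipe sees one wakeup for three writes under the first policy and three
      under the second.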
      
      Link: https://lore.kernel.org/lkml/CAHk-=wjeG0q1vgzu4iJhW5juPkTsjTYmiqiMUYAebWW+0bam6w@mail.gmail.com/ [1]
      Link: https://github.com/realm/realm-core [2]
      Link: https://github.com/realm/realm-core/issues/4666 [3]
      Link: https://lore.kernel.org/lkml/CAKgNAkjMBGeAwF=2MKK758BhxvW58wYTgYKB2V-gY1PwXxrH+Q@mail.gmail.com/ [4]
      Link: https://lore.kernel.org/lkml/20210729222635.2937453-1-sspatil@android.com/
      Reported-by: Sandeep Patil <sspatil@android.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f0aa1bc3
    • Greg Kroah-Hartman's avatar
      selftest: fix build error in tools/testing/selftests/vm/userfaultfd.c · 5a5aaf41
      Greg Kroah-Hartman authored
      When backporting 0db282ba ("selftest: use mmap instead of
      posix_memalign to allocate memory") to this stable branch, I forgot a {
      breaking the build.
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5a5aaf41
  2. Jul 31, 2021