  1. Sep 08, 2022
  2. Sep 05, 2022
    • Yang Yingliang's avatar
      net: neigh: don't call kfree_skb() under spin_lock_irqsave() · 572b646c
      Yang Yingliang authored
      commit d5485d9d upstream.
      
      It is not allowed to call kfree_skb() from hardware interrupt
      context or with interrupts disabled. So add all skbs to a
      temporary list, then free them all at once after
      spin_unlock_irqrestore().
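
      The pattern described above can be sketched in userspace C. This is
      only an illustration under stated assumptions: a pthread mutex stands
      in for the irqsave spinlock, free() stands in for kfree_skb(), and
      struct item, struct queue, queue_push(), queue_purge() and queue_len()
      are hypothetical names that do not appear in the kernel patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb sitting on a shared queue. */
struct item {
    struct item *next;
    int ifindex;            /* pretend device the item belongs to */
};

struct queue {
    pthread_mutex_t lock;   /* plays the role of the irqsave spinlock */
    struct item *head;
};

/* Add one item to the queue; returns 0 on success, -1 on allocation failure. */
int queue_push(struct queue *q, int ifindex)
{
    struct item *it = malloc(sizeof(*it));

    if (!it)
        return -1;
    it->ifindex = ifindex;
    pthread_mutex_lock(&q->lock);
    it->next = q->head;
    q->head = it;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/*
 * Unlink every item that matches ifindex while the lock is held, but
 * release the memory only after the lock has been dropped.
 * Returns the number of items freed.
 */
int queue_purge(struct queue *q, int ifindex)
{
    struct item *tmp = NULL;    /* private list, invisible to other threads */
    struct item **pp;
    int freed = 0;

    assert(q != NULL);

    pthread_mutex_lock(&q->lock);
    pp = &q->head;
    while (*pp) {
        struct item *it = *pp;

        if (it->ifindex == ifindex) {
            *pp = it->next;     /* unlink from the shared queue */
            it->next = tmp;     /* park it on the temporary list */
            tmp = it;
        } else {
            pp = &it->next;
        }
    }
    pthread_mutex_unlock(&q->lock);

    /* The lock is no longer held, so freeing is safe here. */
    while (tmp) {
        struct item *it = tmp;

        tmp = it->next;
        free(it);
        freed++;
    }
    return freed;
}

/* Count the items still queued. */
int queue_len(struct queue *q)
{
    int n = 0;

    pthread_mutex_lock(&q->lock);
    for (struct item *it = q->head; it; it = it->next)
        n++;
    pthread_mutex_unlock(&q->lock);
    return n;
}
```

      The point of the design is that the critical section only relinks
      pointers; the potentially expensive or context-sensitive release
      happens outside it.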
      
      Fixes: 66ba215c ("neigh: fix possible DoS due to net iface start/stop loop")
      Suggested-by: Denis V. Lunev <den@openvz.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      572b646c
    • Zhengchao Shao's avatar
      net/af_packet: check len when min_header_len equals to 0 · facf99bc
      Zhengchao Shao authored
      commit dc633700 upstream.
      
      A user can use an AF_PACKET socket to send packets of length 0.
      When min_header_len equals 0, packet_snd() calls __dev_queue_xmit()
      to send such packets, and sock->type can be any type.
      
      Reported-by: <syzbot+5ea725c25d06fb9114c4@syzkaller.appspotmail.com>
      Fixes: fd189422 ("bpf: Don't redirect packets with invalid pkt_len")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      facf99bc
    • Liam Howlett's avatar
      android: binder: fix lockdep check on clearing vma · 591a98b8
      Liam Howlett authored
      commit b0cab80e upstream.
      
      When munmapping a vma, the mmap_lock can be downgraded to a read lock
      before calling close() on the file handle.  The binder close() function
      calls binder_alloc_set_vma() to clear the vma address, which now has a
      lockdep check for writing on the mmap_lock.  Change the lockdep check to
      ensure the read lock is held while clearing and keep the write check
      while writing.
      
      Link: https://lkml.kernel.org/r/20220627151857.2316964-1-Liam.Howlett@oracle.com
      Fixes: 472a68df605b ("android: binder: stop saving a pointer to the VMA")
      Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Reported-by: <syzbot+da54fa8d793ca89c741f@syzkaller.appspotmail.com>
      Acked-by: Todd Kjos <tkjos@google.com>
      Cc: "Arve Hjønnevåg" <arve@android.com>
      Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hridya Valsaraju <hridya@google.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Martijn Coenen <maco@android.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      591a98b8
    • Omar Sandoval's avatar
      btrfs: fix space cache corruption and potential double allocations · 92dc4c1a
      Omar Sandoval authored
      commit ced8ecf0 upstream.
      
      When testing space_cache v2 on a large set of machines, we encountered a
      few symptoms:
      
      1. "unable to add free space :-17" (EEXIST) errors.
      2. Missing free space info items, sometimes caught with a "missing free
         space info for X" error.
      3. Double-accounted space: ranges that were allocated in the extent tree
         and also marked as free in the free space tree, ranges that were
         marked as allocated twice in the extent tree, or ranges that were
         marked as free twice in the free space tree. If the latter made it
         onto disk, the next reboot would hit the BUG_ON() in
         add_new_free_space().
      4. On some hosts with no on-disk corruption or error messages, the
         in-memory space cache (dumped with drgn) disagreed with the free
         space tree.
      
      All of these symptoms have the same underlying cause: a race between
      caching the free space for a block group and returning free space to the
      in-memory space cache for pinned extents causes us to double-add a free
      range to the space cache. This race exists when free space is cached
      from the free space tree (space_cache=v2) or the extent tree
      (nospace_cache, or space_cache=v1 if the cache needs to be regenerated).
      struct btrfs_block_group::last_byte_to_unpin and struct
      btrfs_block_group::progress are supposed to protect against this race,
      but commit d0c2f4fa ("btrfs: make concurrent fsyncs wait less when
      waiting for a transaction commit") subtly broke this by allowing
      multiple transactions to be unpinning extents at the same time.
      
      Specifically, the race is as follows:
      
      1. An extent is deleted from an uncached block group in transaction A.
      2. btrfs_commit_transaction() is called for transaction A.
      3. btrfs_run_delayed_refs() -> __btrfs_free_extent() runs the delayed
         ref for the deleted extent.
      4. __btrfs_free_extent() -> do_free_extent_accounting() ->
         add_to_free_space_tree() adds the deleted extent back to the free
         space tree.
      5. do_free_extent_accounting() -> btrfs_update_block_group() ->
         btrfs_cache_block_group() queues up the block group to get cached.
         block_group->progress is set to block_group->start.
      6. btrfs_commit_transaction() for transaction A calls
         switch_commit_roots(). It sets block_group->last_byte_to_unpin to
         block_group->progress, which is block_group->start because the block
         group hasn't been cached yet.
      7. The caching thread gets to our block group. Since the commit roots
         were already switched, load_free_space_tree() sees the deleted extent
         as free and adds it to the space cache. It finishes caching and sets
         block_group->progress to U64_MAX.
      8. btrfs_commit_transaction() advances transaction A to
         TRANS_STATE_SUPER_COMMITTED.
      9. fsync calls btrfs_commit_transaction() for transaction B. Since
         transaction A is already in TRANS_STATE_SUPER_COMMITTED and the
         commit is for fsync, it advances.
      10. btrfs_commit_transaction() for transaction B calls
          switch_commit_roots(). This time, the block group has already been
          cached, so it sets block_group->last_byte_to_unpin to U64_MAX.
      11. btrfs_commit_transaction() for transaction A calls
          btrfs_finish_extent_commit(), which calls unpin_extent_range() for
          the deleted extent. It sees last_byte_to_unpin set to U64_MAX (by
          transaction B!), so it adds the deleted extent to the space cache
          again!
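
      The ordering in steps 5 to 11 can be replayed with a small toy model.
      This is a sketch only: struct toy_block_group and the toy_*() helpers
      are simplified, hypothetical stand-ins rather than the btrfs code, the
      space cache is reduced to a single flag for the one deleted extent, and
      the comparison in toy_unpin_extent() is an assumption about how
      last_byte_to_unpin gates the re-add.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one block group and one deleted extent (hypothetical). */
struct toy_block_group {
    uint64_t start;
    uint64_t progress;            /* how far caching has progressed */
    uint64_t last_byte_to_unpin;  /* snapshot taken when commit roots switch */
    bool extent_in_space_cache;   /* is the deleted extent cached as free? */
};

/* Adding a range that is already present fails, as in the first symptom. */
static int toy_add_free_space(struct toy_block_group *bg)
{
    if (bg->extent_in_space_cache)
        return -EEXIST;
    bg->extent_in_space_cache = true;
    return 0;
}

/* Steps 6 and 10: remember how far caching had got at this point. */
static void toy_switch_commit_roots(struct toy_block_group *bg)
{
    bg->last_byte_to_unpin = bg->progress;
}

/* Step 7: the caching thread sees the extent as free and finishes. */
static int toy_caching_thread(struct toy_block_group *bg)
{
    int ret = toy_add_free_space(bg);

    bg->progress = UINT64_MAX;
    return ret;
}

/* Step 11: unpinning re-adds the extent only below last_byte_to_unpin. */
static int toy_unpin_extent(struct toy_block_group *bg, uint64_t extent_start)
{
    if (extent_start < bg->last_byte_to_unpin)
        return toy_add_free_space(bg);
    return 0;
}

/*
 * Replay the sequence. When txn_b_switches_roots is true, transaction B
 * overwrites last_byte_to_unpin before transaction A unpins.
 * Returns the result of transaction A's unpin.
 */
int replay_race(bool txn_b_switches_roots)
{
    struct toy_block_group bg = { .start = 4096 };
    uint64_t extent = 8192;
    int ret;

    bg.progress = bg.start;                 /* step 5 */
    toy_switch_commit_roots(&bg);           /* step 6, transaction A */
    ret = toy_caching_thread(&bg);          /* step 7 */
    assert(ret == 0);
    if (txn_b_switches_roots)
        toy_switch_commit_roots(&bg);       /* step 10, transaction B */
    return toy_unpin_extent(&bg, extent);   /* step 11, transaction A */
}
```

      In this model, replay_race(false) leaves the extent alone because
      transaction A's own snapshot still equals the block group start, while
      replay_race(true) attempts the second add and gets -EEXIST.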
      
      This explains all of our symptoms above:
      
      * If the sequence of events is exactly as described above, when the free
        space is re-added in step 11, it will fail with EEXIST.
      * If another thread reallocates the deleted extent in between steps 7
        and 11, then step 11 will silently re-add that space to the space
        cache as free even though it is actually allocated. Then, if that
        space is allocated *again*, the free space tree will be corrupted
        (namely, the wrong item will be deleted).
      * If we don't catch this free space tree corruption, it will continue
        to get worse as extents are deleted and reallocated.
      
      The v1 space_cache is synchronously loaded when an extent is deleted
      (btrfs_update_block_group() with alloc=0 calls btrfs_cache_block_group()
      with load_cache_only=1), so it is not normally affected by this bug.
      However, as noted above, if we fail to load the space cache, we will
      fall back to caching from the extent tree and may hit this bug.
      
      The easiest fix for this race is to also make caching from the free
      space tree or extent tree synchronous. Josef tested this and found no
      performance regressions.
      
      A few extra changes fall out of this change. Namely, this fix does the
      following, with step 2 being the crucial fix:
      
      1. Factor btrfs_caching_ctl_wait_done() out of
         btrfs_wait_block_group_cache_done() to allow waiting on a caching_ctl
         that we already hold a reference to.
      2. Change the call in btrfs_cache_block_group() of
         btrfs_wait_space_cache_v1_finished() to
         btrfs_caching_ctl_wait_done(), which makes us wait regardless of the
         space_cache option.
      3. Delete the now unused btrfs_wait_space_cache_v1_finished() and
         space_cache_v1_done().
      4. Change btrfs_cache_block_group()'s `int load_cache_only` parameter to
         `bool wait` to more accurately describe its new meaning.
      5. Change a few callers which had a separate call to
         btrfs_wait_block_group_cache_done() to use wait = true instead.
      6. Make btrfs_wait_block_group_cache_done() static now that it's not
         used outside of block-group.c anymore.
      
      Fixes: d0c2f4fa ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit")
      CC: stable@vger.kernel.org # 5.12+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      92dc4c1a
    • Kuniyuki Iwashima's avatar
      kprobes: don't call disarm_kprobe() for disabled kprobes · 55c7a915
      Kuniyuki Iwashima authored
      commit 9c80e799 upstream.
      
      The assumption in __disable_kprobe() is wrong, and it could try to disarm
      an already disarmed kprobe and fire the WARN_ONCE() below. [0]  We can
      easily reproduce this issue.
      
      1. Write 0 to /sys/kernel/debug/kprobes/enabled.
      
        # echo 0 > /sys/kernel/debug/kprobes/enabled
      
      2. Run execsnoop.  At this time, one kprobe is disabled.
      
        # /usr/share/bcc/tools/execsnoop &
        [1] 2460
        PCOMM            PID    PPID   RET ARGS
      
        # cat /sys/kernel/debug/kprobes/list
        ffffffff91345650  r  __x64_sys_execve+0x0    [FTRACE]
        ffffffff91345650  k  __x64_sys_execve+0x0    [DISABLED][FTRACE]
      
      3. Write 1 to /sys/kernel/debug/kprobes/enabled, which changes
         kprobes_all_disarmed to false but does not arm the disabled kprobe.
      
        # echo 1 > /sys/kernel/debug/kprobes/enabled
      
        # cat /sys/kernel/debug/kprobes/list
        ffffffff91345650  r  __x64_sys_execve+0x0    [FTRACE]
        ffffffff91345650  k  __x64_sys_execve+0x0    [DISABLED][FTRACE]
      
      4. Kill execsnoop, at which point __disable_kprobe() calls
         disarm_kprobe() for the disabled kprobe and hits the WARN_ONCE() in
         __disarm_kprobe_ftrace().
      
        # fg
        /usr/share/bcc/tools/execsnoop
        ^C
      
      Actually, WARN_ONCE() is fired twice, and __unregister_kprobe_top() misses
      some cleanups and leaves the aggregated kprobe in the hash table.  Then,
      __unregister_trace_kprobe() initialises tk->rp.kp.list and creates an
      infinite loop like this.
      
        aggregated kprobe.list -> kprobe.list -.
                                           ^    |
                                           '.__.'
      
      In this situation, these commands fall into the infinite loop and result
      in RCU stall or soft lockup.
      
        cat /sys/kernel/debug/kprobes/list : show_kprobe_addr() enters into the
                                             infinite loop with RCU.
      
        /usr/share/bcc/tools/execsnoop : warn_kprobe_rereg() holds kprobe_mutex,
                                         and __get_valid_kprobe() is stuck in
                                         the loop.
      
      To avoid the issue, make sure we don't call disarm_kprobe() for disabled
      kprobes.
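
      The guard can be illustrated with a toy model. This is a sketch under
      loud assumptions: struct toy_kprobe, toy_disarm() and toy_disable() are
      hypothetical and far simpler than kernel/kprobes.c, and the
      check_disabled switch exists only to compare the behaviour with and
      without the extra check. The -ENOENT return mirrors the "error -2" in
      the warning quoted in [0].

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy probe state (hypothetical, not the kernel's struct kprobe). */
struct toy_kprobe {
    bool disabled;  /* the probe carries the DISABLED flag */
    bool armed;     /* the probe is currently installed */
};

/* Disarming a probe that was never armed fails. */
int toy_disarm(struct toy_kprobe *p)
{
    if (!p->armed)
        return -ENOENT;
    p->armed = false;
    return 0;
}

/*
 * Disable a probe. With check_disabled set, a probe that is already
 * disabled is never passed to toy_disarm().
 */
int toy_disable(struct toy_kprobe *p, bool all_disarmed, bool check_disabled)
{
    int ret = 0;

    if (!all_disarmed && !(check_disabled && p->disabled))
        ret = toy_disarm(p);
    p->disabled = true;
    return ret;
}
```

      With the check in place, disabling an already disabled and unarmed
      probe becomes a no-op instead of an error, while an enabled, armed
      probe is still disarmed as before.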
      
      [0]
      Failed to disarm kprobe-ftrace at __x64_sys_execve+0x0/0x40 (error -2)
      WARNING: CPU: 6 PID: 2460 at kernel/kprobes.c:1130 __disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129)
      Modules linked in: ena
      CPU: 6 PID: 2460 Comm: execsnoop Not tainted 5.19.0+ #28
      Hardware name: Amazon EC2 c5.2xlarge/, BIOS 1.0 10/16/2017
      RIP: 0010:__disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129)
      Code: 24 8b 02 eb c1 80 3d c4 83 f2 01 00 75 d4 48 8b 75 00 89 c2 48 c7 c7 90 fa 0f 92 89 04 24 c6 05 ab 83 01 e8 e4 94 f0 ff <0f> 0b 8b 04 24 eb b1 89 c6 48 c7 c7 60 fa 0f 92 89 04 24 e8 cc 94
      RSP: 0018:ffff9e6ec154bd98 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: ffffffff930f7b00 RCX: 0000000000000001
      RDX: 0000000080000001 RSI: ffffffff921461c5 RDI: 00000000ffffffff
      RBP: ffff89c504286da8 R08: 0000000000000000 R09: c0000000fffeffff
      R10: 0000000000000000 R11: ffff9e6ec154bc28 R12: ffff89c502394e40
      R13: ffff89c502394c00 R14: ffff9e6ec154bc00 R15: 0000000000000000
      FS:  00007fe800398740(0000) GS:ffff89c812d80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000c00057f010 CR3: 0000000103b54006 CR4: 00000000007706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
      <TASK>
       __disable_kprobe (kernel/kprobes.c:1716)
       disable_kprobe (kernel/kprobes.c:2392)
       __disable_trace_kprobe (kernel/trace/trace_kprobe.c:340)
       disable_trace_kprobe (kernel/trace/trace_kprobe.c:429)
       perf_trace_event_unreg.isra.2 (./include/linux/tracepoint.h:93 kernel/trace/trace_event_perf.c:168)
       perf_kprobe_destroy (kernel/trace/trace_event_perf.c:295)
       _free_event (kernel/events/core.c:4971)
       perf_event_release_kernel (kernel/events/core.c:5176)
       perf_release (kernel/events/core.c:5186)
       __fput (fs/file_table.c:321)
       task_work_run (./include/linux/sched.h:2056 (discriminator 1) kernel/task_work.c:179 (discriminator 1))
       exit_to_user_mode_prepare (./include/linux/resume_user_mode.h:49 kernel/entry/common.c:169 kernel/entry/common.c:201)
       syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:384 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:133 kernel/entry/common.c:296)
       do_syscall_64 (arch/x86/entry/common.c:87)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      RIP: 0033:0x7fe7ff210654
      Code: 15 79 89 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb be 0f 1f 00 8b 05 9a cd 20 00 48 63 ff 85 c0 75 11 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3a f3 c3 48 83 ec 18 48 89 7c 24 08 e8 34 fc
      RSP: 002b:00007ffdbd1d3538 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
      RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007fe7ff210654
      RDX: 0000000000000000 RSI: 0000000000002401 RDI: 0000000000000008
      RBP: 0000000000000000 R08: 94ae31d6fda838a4 R09: 00007fe8001c9d30
      R10: 00007ffdbd1d34b0 R11: 0000000000000246 R12: 00007ffdbd1d3600
      R13: 0000000000000000 R14: fffffffffffffffc R15: 00007ffdbd1d3560
      </TASK>
      
      Link: https://lkml.kernel.org/r/20220813020509.90805-1-kuniyu@amazon.com
      Fixes: 69d54b91 ("kprobes: makes kprobes/enabled works correctly for optimized kprobes.")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reported-by: Ayushman Dutta <ayudutta@amazon.com>
      Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
      Cc: Kuniyuki Iwashima <kuni1840@gmail.com>
      Cc: Ayushman Dutta <ayudutta@amazon.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      55c7a915
    • Josef Bacik's avatar
      btrfs: tree-checker: check for overlapping extent items · 6a27997c
      Josef Bacik authored
      [ Upstream commit 899b7f69 ]
      
      We're seeing a weird problem in production where we have overlapping
      extent items in the extent tree.  It's unclear where these are coming
      from, and in debugging we realized there's no check in the tree checker
      for this sort of problem.  Add a check to the tree-checker to make sure
      that the extents do not overlap each other.
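
      A minimal sketch of such an overlap check follows. It assumes the items
      are already sorted by start offset, as keys in a tree leaf would be;
      struct toy_extent and first_overlap() are hypothetical names, not the
      tree-checker's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent item: a byte range [start, start + len). */
struct toy_extent {
    uint64_t start;
    uint64_t len;
};

/*
 * Walk items sorted by start and compare each one with its predecessor.
 * Returns the index of the first item that begins before the previous
 * item ends, or -1 when no two neighbours overlap.
 */
long first_overlap(const struct toy_extent *items, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        uint64_t prev_end = items[i - 1].start + items[i - 1].len;

        if (prev_end > items[i].start)
            return (long)i;
    }
    return -1;
}
```

      Because the input is sorted, comparing neighbours is enough to catch
      any overlap between adjacent items, and the check stays linear in the
      number of items.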
      
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6a27997c
    • Josef Bacik's avatar
      btrfs: fix lockdep splat with reloc root extent buffers · 1b2a7dde
      Josef Bacik authored
      [ Upstream commit b40130b2 ]
      
      We have been hitting the following lockdep splat with btrfs/187 recently:
      
        WARNING: possible circular locking dependency detected
        5.19.0-rc8+ #775 Not tainted
        ------------------------------------------------------
        btrfs/752500 is trying to acquire lock:
        ffff97e1875a97b8 (btrfs-treloc-02#2){+.+.}-{3:3}, at: __btrfs_tree_lock+0x24/0x110
      
        but task is already holding lock:
        ffff97e1875a9278 (btrfs-tree-01/1){+.+.}-{3:3}, at: __btrfs_tree_lock+0x24/0x110
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #2 (btrfs-tree-01/1){+.+.}-{3:3}:
      	 down_write_nested+0x41/0x80
      	 __btrfs_tree_lock+0x24/0x110
      	 btrfs_init_new_buffer+0x7d/0x2c0
      	 btrfs_alloc_tree_block+0x120/0x3b0
      	 __btrfs_cow_block+0x136/0x600
      	 btrfs_cow_block+0x10b/0x230
      	 btrfs_search_slot+0x53b/0xb70
      	 btrfs_lookup_inode+0x2a/0xa0
      	 __btrfs_update_delayed_inode+0x5f/0x280
      	 btrfs_async_run_delayed_root+0x24c/0x290
      	 btrfs_work_helper+0xf2/0x3e0
      	 process_one_work+0x271/0x590
      	 worker_thread+0x52/0x3b0
      	 kthread+0xf0/0x120
      	 ret_from_fork+0x1f/0x30
      
        -> #1 (btrfs-tree-01){++++}-{3:3}:
      	 down_write_nested+0x41/0x80
      	 __btrfs_tree_lock+0x24/0x110
      	 btrfs_search_slot+0x3c3/0xb70
      	 do_relocation+0x10c/0x6b0
      	 relocate_tree_blocks+0x317/0x6d0
      	 relocate_block_group+0x1f1/0x560
      	 btrfs_relocate_block_group+0x23e/0x400
      	 btrfs_relocate_chunk+0x4c/0x140
      	 btrfs_balance+0x755/0xe40
      	 btrfs_ioctl+0x1ea2/0x2c90
      	 __x64_sys_ioctl+0x88/0xc0
      	 do_syscall_64+0x38/0x90
      	 entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
        -> #0 (btrfs-treloc-02#2){+.+.}-{3:3}:
      	 __lock_acquire+0x1122/0x1e10
      	 lock_acquire+0xc2/0x2d0
      	 down_write_nested+0x41/0x80
      	 __btrfs_tree_lock+0x24/0x110
      	 btrfs_lock_root_node+0x31/0x50
      	 btrfs_search_slot+0x1cb/0xb70
      	 replace_path+0x541/0x9f0
      	 merge_reloc_root+0x1d6/0x610
      	 merge_reloc_roots+0xe2/0x260
      	 relocate_block_group+0x2c8/0x560
      	 btrfs_relocate_block_group+0x23e/0x400
      	 btrfs_relocate_chunk+0x4c/0x140
      	 btrfs_balance+0x755/0xe40
      	 btrfs_ioctl+0x1ea2/0x2c90
      	 __x64_sys_ioctl+0x88/0xc0
      	 do_syscall_64+0x38/0x90
      	 entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
        other info that might help us debug this:
      
        Chain exists of:
          btrfs-treloc-02#2 --> btrfs-tree-01 --> btrfs-tree-01/1
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(btrfs-tree-01/1);
      				 lock(btrfs-tree-01);
      				 lock(btrfs-tree-01/1);
          lock(btrfs-treloc-02#2);
      
         *** DEADLOCK ***
      
        7 locks held by btrfs/752500:
         #0: ffff97e292fdf460 (sb_writers#12){.+.+}-{0:0}, at: btrfs_ioctl+0x208/0x2c90
         #1: ffff97e284c02050 (&fs_info->reclaim_bgs_lock){+.+.}-{3:3}, at: btrfs_balance+0x55f/0xe40
         #2: ffff97e284c00878 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x236/0x400
         #3: ffff97e292fdf650 (sb_internal#2){.+.+}-{0:0}, at: merge_reloc_root+0xef/0x610
         #4: ffff97e284c02378 (btrfs_trans_num_writers){++++}-{0:0}, at: join_transaction+0x1a8/0x5a0
         #5: ffff97e284c023a0 (btrfs_trans_num_extwriters){++++}-{0:0}, at: join_transaction+0x1a8/0x5a0
         #6: ffff97e1875a9278 (btrfs-tree-01/1){+.+.}-{3:3}, at: __btrfs_tree_lock+0x24/0x110
      
        stack backtrace:
        CPU: 1 PID: 752500 Comm: btrfs Not tainted 5.19.0-rc8+ #775
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
        Call Trace:
      
         dump_stack_lvl+0x56/0x73
         check_noncircular+0xd6/0x100
         ? lock_is_held_type+0xe2/0x140
         __lock_acquire+0x1122/0x1e10
         lock_acquire+0xc2/0x2d0
         ? __btrfs_tree_lock+0x24/0x110
         down_write_nested+0x41/0x80
         ? __btrfs_tree_lock+0x24/0x110
         __btrfs_tree_lock+0x24/0x110
         btrfs_lock_root_node+0x31/0x50
         btrfs_search_slot+0x1cb/0xb70
         ? lock_release+0x137/0x2d0
         ? _raw_spin_unlock+0x29/0x50
         ? release_extent_buffer+0x128/0x180
         replace_path+0x541/0x9f0
         merge_reloc_root+0x1d6/0x610
         merge_reloc_roots+0xe2/0x260
         relocate_block_group+0x2c8/0x560
         btrfs_relocate_block_group+0x23e/0x400
         btrfs_relocate_chunk+0x4c/0x140
         btrfs_balance+0x755/0xe40
         btrfs_ioctl+0x1ea2/0x2c90
         ? lock_is_held_type+0xe2/0x140
         ? lock_is_held_type+0xe2/0x140
         ? __x64_sys_ioctl+0x88/0xc0
         __x64_sys_ioctl+0x88/0xc0
         do_syscall_64+0x38/0x90
         entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      This isn't necessarily new, it's just tricky to hit in practice.  There
      are two competing things going on here.  With relocation we create a
      snapshot of every fs tree with a reloc tree.  Any extent buffers that
      get initialized here are initialized with the reloc root lockdep key.
      However since it is a snapshot, any blocks that are currently in cache
      that originally belonged to the fs tree will have the normal tree
      lockdep key set.  This creates the lock dependency of
      
        reloc tree -> normal tree
      
      for the extent buffer locking during the first phase of the relocation
      as we walk down the reloc root to relocate blocks.
      
      However this is problematic because the final phase of the relocation is
      merging the reloc root into the original fs root.  This involves
      searching down to any keys that exist in the original fs root and then
      swapping the relocated block and the original fs root block.  We have to
      search down to the fs root first, and then go search the reloc root for
      the block we need to replace.  This creates the dependency of
      
        normal tree -> reloc tree
      
      which is why lockdep complains.
      
      Additionally even if we were to fix this particular mismatch with a
      different nesting for the merge case, we're still slotting in a block
      that has an owner of the reloc root objectid into a normal tree, so that
      block will have its lockdep key set to the tree reloc root, and create a
      lockdep splat later on when we wander into that block from the fs root.
      
      Unfortunately the only solution here is to make sure we do not set the
      lockdep key to the reloc tree lockdep key normally, and then reset any
      blocks we wander into from the reloc root when we're doing the merge.
      
      This solves the problem of having mixed tree reloc keys intermixed with
      normal tree keys, and then allows us to make sure in the merge case we
      maintain the lock order of
      
        normal tree -> reloc tree
      
      We handle this by setting a bit on the reloc root when we do the search
      for the block we want to relocate, and any block we search into or COW
      at that point gets set to the reloc tree key.  This works correctly
      because we only ever COW down to the parent node, so we aren't resetting
      the key for the block we're linking into the fs root.
      
      With this patch we no longer have the lockdep splat in btrfs/187.
      
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1b2a7dde
    • Josef Bacik's avatar
      btrfs: move lockdep class helpers to locking.c · 98dfad7f
      Josef Bacik authored
      [ Upstream commit 0a27a047 ]
      
      These definitions exist in disk-io.c, which is not related to
      locking.  Move them over to locking.h/c where they make more sense.
      
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      98dfad7f
    • Florian Westphal's avatar
      testing: selftests: nft_flowtable.sh: use random netns names · a74fc94f
      Florian Westphal authored
      [ Upstream commit b71b7bfe ]

      "ns1" is too generic a name, so use a random suffix to avoid
      errors when such a netns already exists.  This also allows running
      multiple instances of the script in parallel.
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a74fc94f
    • Geert Uytterhoeven's avatar
      netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y · 1d8b5d25
      Geert Uytterhoeven authored
      [ Upstream commit aa5762c3 ]
      
      NF_CONNTRACK_PROCFS was marked obsolete in commit 54b07dca
      ("netfilter: provide config option to disable ancient procfs parts") in
      v3.3.
      
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1d8b5d25
    • Charlene Liu's avatar
      drm/amd/display: avoid doing vm_init multiple time · 85dd24ff
      Charlene Liu authored
      [ Upstream commit 5544a7b5 ]
      
      [why]
      This ensures that the driver will not reprogram hvm_prefetch_req again
      if it has already been done.
      
      Reviewed-by: Martin Leung <Martin.Leung@amd.com>
      Acked-by: Brian Chang <Brian.Chang@amd.com>
      Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
      Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      85dd24ff
    • Dusica Milinkovic's avatar
      drm/amdgpu: Increase tlb flush timeout for sriov · 898467ac
      Dusica Milinkovic authored
      [ Upstream commit 373008bf ]
      
      [Why]
      A KIQ timeout error was observed while running a benchmark (Luxmark)
      on multiple VFs. It happens because all of the VFs do the TLB
      invalidation at the same time. Although each VF has the invalidate
      register set, on the hardware side the invalidate requests are queued
      for execution.

      [How]
      In the case of 12 VFs, increase the timeout to 12*100ms.
      
      Signed-off-by: Dusica Milinkovic <Dusica.Milinkovic@amd.com>
      Acked-by: Shaoyun Liu <shaoyun.liu@amd.com>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      898467ac
    • Ilya Bakoulin's avatar
      drm/amd/display: Fix pixel clock programming · 4df54c49
      Ilya Bakoulin authored
      [ Upstream commit 04fb918b ]
      
      [Why]
      Some pixel clock values could cause HDMI TMDS SSCPs to be misaligned
      between different HDMI lanes when using YCbCr420 10-bit pixel format.
      
      BIOS functions for transmitter/encoder control take pixel clock in kHz
      increments, whereas the function for setting the pixel clock is in 100Hz
      increments. Setting pixel clock to a value that is not on a kHz boundary
      will cause the issue.
      
      [How]
      Round pixel clock down to nearest kHz in 10/12-bpc cases.
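
      The rounding itself is simple integer arithmetic: one kHz is ten units
      of 100Hz, so dividing by ten and multiplying back drops the sub-kHz
      remainder. The sketch below is illustrative only; UNITS_PER_KHZ and
      adjust_pix_clk_100hz() are hypothetical names and not the driver's
      code.

```c
#include <assert.h>
#include <stdint.h>

/* One kHz expressed in the 100Hz units used for the pixel clock. */
#define UNITS_PER_KHZ 10u

/*
 * Round a pixel clock given in 100Hz units down to a whole kHz for
 * 10-bpc and 12-bpc formats; leave other bit depths untouched.
 */
uint32_t adjust_pix_clk_100hz(uint32_t pix_clk_100hz, unsigned int bpc)
{
    if (bpc == 10 || bpc == 12)
        return pix_clk_100hz / UNITS_PER_KHZ * UNITS_PER_KHZ;
    return pix_clk_100hz;
}
```

      For example, a clock of 5940007 units (594000.7 kHz) becomes 5940000
      units (594000 kHz) at 10 bpc, which both the 100Hz-based and the
      kHz-based interfaces can represent exactly.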
      
      Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
      Acked-by: Brian Chang <Brian.Chang@amd.com>
      Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com>
      Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4df54c49
    • Evan Quan's avatar
      drm/amd/pm: add missing ->fini_microcode interface for Sienna Cichlid · a89e753d
      Evan Quan authored
      [ Upstream commit 0a2d922a ]
      
      To avoid any potential memory leak.
      
      Signed-off-by: Evan Quan <evan.quan@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a89e753d
    • Namjae Jeon's avatar
      ksmbd: don't remove dos attribute xattr on O_TRUNC open · a2ede313
      Namjae Jeon authored
      [ Upstream commit 17661ecf ]
      
      When an SMB client opens a file in a ksmbd share with O_TRUNC, the dos
      attribute xattr is removed along with the data in the file. This causes
      the FSCTL_SET_SPARSE request from the client to fail because ksmbd can't
      update the dos attribute after setting ATTR_SPARSE_FILE. This patch also
      fixes the xfstests generic/469 test.
      
      Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
      Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a2ede313
    • Juergen Gross's avatar
      s390/hypfs: avoid error message under KVM · a7ada939
      Juergen Gross authored
      [ Upstream commit 7b6670b0 ]
      
      When booting under KVM the following error messages are issued:
      
      hypfs.7f5705: The hardware system does not support hypfs
      hypfs.7a79f0: Initialization of hypfs failed with rc=-61
      
      Demote the severity of the first message from "error" to "info" and
      issue the second message only in other error cases.
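
      The intended behaviour can be illustrated with a small userspace
      sketch. It is not the kernel code: the function names are
      hypothetical, and it only assumes that rc=-61 is -ENODATA (which is
      the case on Linux) and that this is the "not supported" case.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the backend probe: returns -ENODATA when the
 * platform (for example a KVM guest) offers no hypfs data.
 */
static int hypfs_probe_sketch(int supported)
{
	if (!supported) {
		/* Expected under KVM, so reported as "info", not "error". */
		fprintf(stderr, "info: The hardware system does not support hypfs\n");
		return -ENODATA;
	}
	return 0;
}

/*
 * Decide whether the "Initialization of hypfs failed" message should be
 * printed. rc follows the kernel convention: 0 or a negative errno.
 */
static int should_report_init_failure(int rc)
{
	assert(rc <= 0);
	return rc != 0 && rc != -ENODATA;
}
```

      With this split, the unsupported case produces a single informational
      line, while any other failure (for example -ENOMEM) is still reported
      as an initialization error.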
      
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220620094534.18967-1-jgross@suse.com
      
      
      [arch/s390/hypfs/hypfs_diag.c changed description]
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a7ada939
    • Denis V. Lunev's avatar
      neigh: fix possible DoS due to net iface start/stop loop · db6fa03d
      Denis V. Lunev authored
      [ Upstream commit 66ba215c ]
      
      Normal processing of an ARP request (usually an Ethernet broadcast
      packet) arriving at the host looks like this:
      * the packet reaches arp_process() and is passed through the routing
        procedure
      * the request is put into the queue using pneigh_enqueue() if the
        corresponding ARP record is not local (the common case for container
        records on the host)
      * the request is processed by a timer (within 80 jiffies by default)
        and the ARP reply is sent from the same arp_process() using the
        NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED condition (the flag is set
        inside pneigh_enqueue())
      
      And here is the problem. The Linux kernel calls pneigh_queue_purge(),
      which destroys the whole queue of ARP requests, on ANY network
      interface start/stop event through __neigh_ifdown().
      
      This was not a problem originally, as network interface start/stop
      was accessible only to the host 'root', which could do more
      destructive things anyway. But the world has changed and Linux
      containers are now available. A container's 'root' has access to this
      API and should be considered an untrusted user in the hosting
      (container's) world.
      
      Thus there is an attack vector against other containers on the node:
      a container's root can endlessly start/stop interfaces. We have
      observed a similar situation on a real production node, where a
      docker container was doing exactly this and other containers on the
      node became inaccessible.
      
      The proposed patch does a very simple thing. In pneigh_queue_purge()
      it drops only packets from the same namespace as the one where the
      network interface state change is detected. This is enough to prevent
      the problem for the whole node while preserving the original
      semantics of the code.
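
      The idea of a namespace-scoped purge can be sketched in userspace C.
      This is only an illustration, not the kernel implementation: struct
      pkt, queue_push() and queue_purge_ns() are hypothetical names, a plain
      linked list stands in for the skb queue, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queued ARP request (an skb in the kernel). */
struct pkt {
	int netns_id;		/* namespace the packet belongs to */
	struct pkt *next;
};

/* Push a packet for namespace netns_id; returns the new queue head. */
static struct pkt *queue_push(struct pkt *head, int netns_id)
{
	struct pkt *p = malloc(sizeof(*p));

	assert(p != NULL);
	p->netns_id = netns_id;
	p->next = head;
	return p;
}

/*
 * Drop only the packets that belong to netns_id and leave packets from
 * other namespaces queued. Returns the number of packets still queued.
 */
static size_t queue_purge_ns(struct pkt **head, int netns_id)
{
	size_t remaining = 0;
	struct pkt **link = head;

	while (*link) {
		struct pkt *p = *link;

		if (p->netns_id == netns_id) {
			*link = p->next;	/* unlink, then free */
			free(p);
		} else {
			remaining++;
			link = &p->next;
		}
	}
	return remaining;
}
```

      In this sketch an interface flap in namespace 1 removes only the
      requests queued for namespace 1. A return value of 0 marks the case
      where the queue is empty after the purge.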
      
      v2:
      	- do del_timer_sync() if queue is empty after pneigh_queue_purge()
      v3:
      	- rebase to net tree
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Yajun Deng <yajun.deng@linux.dev>
      Cc: Roopa Prabhu <roopa@nvidia.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
      Cc: Konstantin Khorenko <khorenko@virtuozzo.com>
      Cc: kernel@openvz.org
      Cc: devel@openvz.org
      Investigated-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      db6fa03d
    • Namjae Jeon's avatar
      ksmbd: return STATUS_BAD_NETWORK_NAME error status if share is not configured · 857048ea
      Namjae Jeon authored
      [ Upstream commit fe54833d ]
      
      If a share is not configured in smb.conf, SMB2 tree connect should
      return STATUS_BAD_NETWORK_NAME instead of STATUS_BAD_NETWORK_PATH.
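
      A minimal userspace sketch of the expected status selection follows.
      The function name, the share table and the exact-match lookup are
      assumptions made for illustration; only the two NTSTATUS codes and
      the rule itself come from the protocol and the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* NTSTATUS codes as defined by the Windows protocol documentation. */
#define STATUS_SUCCESS          0x00000000u
#define STATUS_BAD_NETWORK_PATH 0xC00000BEu	/* no longer used for this case */
#define STATUS_BAD_NETWORK_NAME 0xC00000CCu

/*
 * Hypothetical lookup: shares[] stands in for the shares configured in
 * smb.conf. An unknown share name yields STATUS_BAD_NETWORK_NAME.
 */
static uint32_t tree_connect_status_sketch(const char *const *shares,
					   size_t nshares, const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < nshares; i++) {
		if (strcmp(shares[i], name) == 0)
			return STATUS_SUCCESS;
	}
	return STATUS_BAD_NETWORK_NAME;
}
```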
      
      Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
      Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      857048ea
    • Fudong Wang's avatar
      drm/amd/display: clear optc underflow before turn off odm clock · 5ee30bcf
      Fudong Wang authored
      [ Upstream commit b2a93490 ]
      
      [Why]
      After the ODM clock is turned off, the optc underflow bit stays set
      and can no longer be cleared.
      We need to clear it before turning the clock off.

      [How]
      Clear the underflow bit, if it is set, when turning the clock off.
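
      The ordering constraint can be modelled with a tiny simulation. This
      is a sketch under stated assumptions, not driver code: struct
      optc_sim and both functions are hypothetical, and the model simply
      assumes that the underflow bit can only be cleared while the clock
      is running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the relevant OPTC state. */
struct optc_sim {
	bool clock_on;		/* ODM clock running */
	bool underflow;		/* sticky underflow status bit */
};

/* Clearing only takes effect while the clock is on. */
static bool optc_clear_underflow(struct optc_sim *optc)
{
	assert(optc != NULL);
	if (!optc->clock_on)
		return false;
	optc->underflow = false;
	return true;
}

/* Clear a pending underflow first, then turn the clock off. */
static void optc_odm_clock_off(struct optc_sim *optc)
{
	assert(optc != NULL);
	if (optc->underflow)
		optc_clear_underflow(optc);
	optc->clock_on = false;
}
```

      In this model, calling optc_clear_underflow() after the clock is
      already off leaves the bit set, which is the stuck state described
      under [Why].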
      
      Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
      Acked-by: Tom Chung <chiahsuan.chung@amd.com>
      Signed-off-by: Fudong Wang <Fudong.Wang@amd.com>
      Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5ee30bcf
    • Alvin Lee's avatar
      drm/amd/display: For stereo keep "FLIP_ANY_FRAME" · e407e04a
      Alvin Lee authored
      [ Upstream commit 84ef99c7 ]
      
      [Description]
      Observed in stereo mode that programming FLIP_LEFT_EYE
      can cause hangs. Keep FLIP_ANY_FRAME in stereo mode so
      the surface flip can take place before the left or right eye.
      
      Reviewed-by: Martin Leung <Martin.Leung@amd.com>
      Acked-by: Tom Chung <chiahsuan.chung@amd.com>
      Signed-off-by: Alvin Lee <alvin.lee2@amd.com>
      Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e407e04a
    • Leo Ma's avatar
      drm/amd/display: Fix HDMI VSIF V3 incorrect issue · 2cddd3d0
      Leo Ma authored
      [ Upstream commit 05911836 ]
      
      [Why]
      A customer reported that the checksum in AMD VSIF V3 is incorrect,
      causing a blank screen.

      [How]
      Fix the packet length in AMD HDMI VSIF V3.
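
      One plausible way a wrong packet length produces a wrong checksum
      can be shown with the generic HDMI InfoFrame rule: the checksum byte
      is chosen so that header, checksum and payload sum to zero modulo
      256. The sketch below is not the AMD driver code; the function names
      are hypothetical and the payload bytes in the usage note are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Checksum over the three header bytes plus `length` payload bytes. */
static uint8_t infoframe_checksum(uint8_t type, uint8_t version,
				  uint8_t length, const uint8_t *payload)
{
	uint8_t sum = (uint8_t)(type + version + length);

	assert(payload != NULL || length == 0);
	for (size_t i = 0; i < length; i++)
		sum = (uint8_t)(sum + payload[i]);
	return (uint8_t)(0x100 - sum);
}

/* What a sink checks: all bytes including the checksum sum to 0 mod 256. */
static bool infoframe_valid(uint8_t type, uint8_t version, uint8_t length,
			    uint8_t checksum, const uint8_t *payload)
{
	uint8_t expected = infoframe_checksum(type, version, length, payload);

	return checksum == expected;
}
```

      For a five-byte payload sent in a vendor-specific InfoFrame (type
      0x81), a checksum computed with length 4 fails validation against
      the five bytes actually transmitted, so the sink may reject the
      packet.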
      
      Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
      Acked-by: Tom Chung <chiahsuan.chung@amd.com>
      Signed-off-by: Leo Ma <hanghong.ma@amd.com>
      Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2cddd3d0