Skip to content
  1. Sep 15, 2022
  2. Sep 05, 2022
    • Greg Kroah-Hartman's avatar
      Linux 5.4.212 · d6deb370
      Greg Kroah-Hartman authored
      
      
      Link: https://lore.kernel.org/r/20220902121403.569927325@linuxfoundation.org
      Tested-by: default avatarJon Hunter <jonathanh@nvidia.com>
      Tested-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Tested-by: default avatarShuah Khan <skhan@linuxfoundation.org>
      Tested-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Tested-by: default avatarSudip Mukherjee <sudip.mukherjee@codethink.co.uk>
      Tested-by: default avatarLinux Kernel Functional Testing <lkft@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      v5.4.212
      d6deb370
    • Yang Yingliang's avatar
      net: neigh: don't call kfree_skb() under spin_lock_irqsave() · 00523483
      Yang Yingliang authored
      commit d5485d9d upstream.
      
      It is not allowed to call kfree_skb() from hardware interrupt
      context or with interrupts being disabled. So add all skb to
      a tmp list, then free them after spin_unlock_irqrestore() at
      once.
      
      Fixes: 66ba215c
      
       ("neigh: fix possible DoS due to net iface start/stop loop")
      Suggested-by: default avatarDenis V. Lunev <den@openvz.org>
      Signed-off-by: default avatarYang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      00523483
    • Zhengchao Shao's avatar
      net/af_packet: check len when min_header_len equals to 0 · 25a80e72
      Zhengchao Shao authored
      commit dc633700
      
       upstream.
      
      User can use AF_PACKET socket to send packets with the length of 0.
      When min_header_len equals to 0, packet_snd will call __dev_queue_xmit
      to send packets, and sock->type can be any type.
      
      Reported-by: default avatar <syzbot+5ea725c25d06fb9114c4@syzkaller.appspotmail.com>
      Fixes: fd189422
      
       ("bpf: Don't redirect packets with invalid pkt_len")
      Signed-off-by: default avatarZhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      25a80e72
    • Pavel Begunkov's avatar
      io_uring: disable polling pollfree files · fc78b2fc
      Pavel Begunkov authored
      Older kernels lack io_uring POLLFREE handling. As only affected files
      are signalfd and android binder the safest option would be to disable
      polling those files via io_uring and hope there are no users.
      
      Fixes: 221c5eb2
      
       ("io_uring: add support for IORING_OP_POLL")
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc78b2fc
    • Kuniyuki Iwashima's avatar
      kprobes: don't call disarm_kprobe() for disabled kprobes · b474ff1b
      Kuniyuki Iwashima authored
      commit 9c80e799 upstream.
      
      The assumption in __disable_kprobe() is wrong, and it could try to disarm
      an already disarmed kprobe and fire the WARN_ONCE() below. [0]  We can
      easily reproduce this issue.
      
      1. Write 0 to /sys/kernel/debug/kprobes/enabled.
      
        # echo 0 > /sys/kernel/debug/kprobes/enabled
      
      2. Run execsnoop.  At this time, one kprobe is disabled.
      
        # /usr/share/bcc/tools/execsnoop &
        [1] 2460
        PCOMM            PID    PPID   RET ARGS
      
        # cat /sys/kernel/debug/kprobes/list
        ffffffff91345650  r  __x64_sys_execve+0x0    [FTRACE]
        ffffffff91345650  k  __x64_sys_execve+0x0    [DISABLED][FTRACE]
      
      3. Write 1 to /sys/kernel/debug/kprobes/enabled, which changes
         kprobes_all_disarmed to false but does not arm the disabled kprobe.
      
        # echo 1 > /sys/kernel/debug/kprobes/enabled
      
        # cat /sys/kernel/debug/kprobes/list
        ffffffff91345650  r  __x64_sys_execve+0x0    [FTRACE]
        ffffffff91345650  k  __x64_sys_execve+0x0    [DISABLED][FTRACE]
      
      4. Kill execsnoop, when __disable_kprobe() calls disarm_kprobe() for the
         disabled kprobe and hits the WARN_ONCE() in __disarm_kprobe_ftrace().
      
        # fg
        /usr/share/bcc/tools/execsnoop
        ^C
      
      Actually, WARN_ONCE() is fired twice, and __unregister_kprobe_top() misses
      some cleanups and leaves the aggregated kprobe in the hash table.  Then,
      __unregister_trace_kprobe() initialises tk->rp.kp.list and creates an
      infinite loop like this.
      
        aggregated kprobe.list -> kprobe.list -.
                                           ^    |
                                           '.__.'
      
      In this situation, these commands fall into the infinite loop and result
      in RCU stall or soft lockup.
      
        cat /sys/kernel/debug/kprobes/list : show_kprobe_addr() enters into the
                                             infinite loop with RCU.
      
        /usr/share/bcc/tools/execsnoop : warn_kprobe_rereg() holds kprobe_mutex,
                                         and __get_valid_kprobe() is stuck in
      				   the loop.
      
      To avoid the issue, make sure we don't call disarm_kprobe() for disabled
      kprobes.
      
      [0]
      Failed to disarm kprobe-ftrace at __x64_sys_execve+0x0/0x40 (error -2)
      WARNING: CPU: 6 PID: 2460 at kernel/kprobes.c:1130 __disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129)
      Modules linked in: ena
      CPU: 6 PID: 2460 Comm: execsnoop Not tainted 5.19.0+ #28
      Hardware name: Amazon EC2 c5.2xlarge/, BIOS 1.0 10/16/2017
      RIP: 0010:__disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129)
      Code: 24 8b 02 eb c1 80 3d c4 83 f2 01 00 75 d4 48 8b 75 00 89 c2 48 c7 c7 90 fa 0f 92 89 04 24 c6 05 ab 83 01 e8 e4 94 f0 ff <0f> 0b 8b 04 24 eb b1 89 c6 48 c7 c7 60 fa 0f 92 89 04 24 e8 cc 94
      RSP: 0018:ffff9e6ec154bd98 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: ffffffff930f7b00 RCX: 0000000000000001
      RDX: 0000000080000001 RSI: ffffffff921461c5 RDI: 00000000ffffffff
      RBP: ffff89c504286da8 R08: 0000000000000000 R09: c0000000fffeffff
      R10: 0000000000000000 R11: ffff9e6ec154bc28 R12: ffff89c502394e40
      R13: ffff89c502394c00 R14: ffff9e6ec154bc00 R15: 0000000000000000
      FS:  00007fe800398740(0000) GS:ffff89c812d80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000c00057f010 CR3: 0000000103b54006 CR4: 00000000007706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
      <TASK>
       __disable_kprobe (kernel/kprobes.c:1716)
       disable_kprobe (kernel/kprobes.c:2392)
       __disable_trace_kprobe (kernel/trace/trace_kprobe.c:340)
       disable_trace_kprobe (kernel/trace/trace_kprobe.c:429)
       perf_trace_event_unreg.isra.2 (./include/linux/tracepoint.h:93 kernel/trace/trace_event_perf.c:168)
       perf_kprobe_destroy (kernel/trace/trace_event_perf.c:295)
       _free_event (kernel/events/core.c:4971)
       perf_event_release_kernel (kernel/events/core.c:5176)
       perf_release (kernel/events/core.c:5186)
       __fput (fs/file_table.c:321)
       task_work_run (./include/linux/sched.h:2056 (discriminator 1) kernel/task_work.c:179 (discriminator 1))
       exit_to_user_mode_prepare (./include/linux/resume_user_mode.h:49 kernel/entry/common.c:169 kernel/entry/common.c:201)
       syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:384 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:133 kernel/entry/common.c:296)
       do_syscall_64 (arch/x86/entry/common.c:87)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      RIP: 0033:0x7fe7ff210654
      Code: 15 79 89 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb be 0f 1f 00 8b 05 9a cd 20 00 48 63 ff 85 c0 75 11 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3a f3 c3 48 83 ec 18 48 89 7c 24 08 e8 34 fc
      RSP: 002b:00007ffdbd1d3538 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
      RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007fe7ff210654
      RDX: 0000000000000000 RSI: 0000000000002401 RDI: 0000000000000008
      RBP: 0000000000000000 R08: 94ae31d6fda838a4 R0900007fe8001c9d30
      R10: 00007ffdbd1d34b0 R11: 0000000000000246 R12: 00007ffdbd1d3600
      R13: 0000000000000000 R14: fffffffffffffffc R15: 00007ffdbd1d3560
      </TASK>
      
      Link: https://lkml.kernel.org/r/20220813020509.90805-1-kuniyu@amazon.com
      Fixes: 69d54b91
      
       ("kprobes: makes kprobes/enabled works correctly for optimized kprobes.")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reported-by: default avatarAyushman Dutta <ayudutta@amazon.com>
      Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
      Cc: Kuniyuki Iwashima <kuni1840@gmail.com>
      Cc: Ayushman Dutta <ayudutta@amazon.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b474ff1b
    • Andrei Vagin's avatar
      lib/vdso: Mark do_hres() and do_coarse() as __always_inline · 6fbc49b7
      Andrei Vagin authored
      [ Upstream commit c966533f
      
       ]
      
      Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
      (more clock_gettime() cycles - the better):
      
      clock            | before     | after      | diff
      ----------------------------------------------------------
      monotonic        |  153222105 |  166775025 | 8.8%
      monotonic-coarse |  671557054 |  691513017 | 3.0%
      monotonic-raw    |  147116067 |  161057395 | 9.5%
      boottime         |  153446224 |  166962668 | 9.1%
      
      The improvement for arm64 for monotonic and boottime is around 3.5%.
      
      clock            | before     | after      | diff
      ==================================================
      monotonic          17326692     17951770     3.6%
      monotonic-coarse   43624027     44215292     1.3%
      monotonic-raw      17541809     17554932     0.1%
      boottime           17334982     17954361     3.5%
      
      [ tglx: Avoid the goto ]
      
      Signed-off-by: default avatarAndrei Vagin <avagin@gmail.com>
      Signed-off-by: default avatarDmitry Safonov <dima@arista.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20191112012724.250792-3-dima@arista.com
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6fbc49b7
    • Christophe Leroy's avatar
      lib/vdso: Let do_coarse() return 0 to simplify the callsite · 2161d3c1
      Christophe Leroy authored
      [ Upstream commit 8463cf80
      
       ]
      
      do_coarse() is similar to do_hres() except that it never fails.
      
      Change its type to int instead of void and let it always return success (0)
      to simplify the call site.
      
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/21e8afa38c02ca8672c2690307383507fe63b454.1577111367.git.christophe.leroy@c-s.fr
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2161d3c1
    • Josef Bacik's avatar
      btrfs: tree-checker: check for overlapping extent items · 06ebb40b
      Josef Bacik authored
      [ Upstream commit 899b7f69
      
       ]
      
      We're seeing a weird problem in production where we have overlapping
      extent items in the extent tree.  It's unclear where these are coming
      from, and in debugging we realized there's no check in the tree checker
      for this sort of problem.  Add a check to the tree-checker to make sure
      that the extents do not overlap each other.
      
      Reviewed-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      06ebb40b
    • Geert Uytterhoeven's avatar
      netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y · 63c79058
      Geert Uytterhoeven authored
      [ Upstream commit aa5762c3 ]
      
      NF_CONNTRACK_PROCFS was marked obsolete in commit 54b07dca
      
      
      ("netfilter: provide config option to disable ancient procfs parts") in
      v3.3.
      
      Signed-off-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      63c79058
    • Ilya Bakoulin's avatar
      drm/amd/display: Fix pixel clock programming · 5c5cd52a
      Ilya Bakoulin authored
      [ Upstream commit 04fb918b
      
       ]
      
      [Why]
      Some pixel clock values could cause HDMI TMDS SSCPs to be misaligned
      between different HDMI lanes when using YCbCr420 10-bit pixel format.
      
      BIOS functions for transmitter/encoder control take pixel clock in kHz
      increments, whereas the function for setting the pixel clock is in 100Hz
      increments. Setting pixel clock to a value that is not on a kHz boundary
      will cause the issue.
      
      [How]
      Round pixel clock down to nearest kHz in 10/12-bpc cases.
      
      Reviewed-by: default avatarAric Cyr <Aric.Cyr@amd.com>
      Acked-by: default avatarBrian Chang <Brian.Chang@amd.com>
      Signed-off-by: default avatarIlya Bakoulin <Ilya.Bakoulin@amd.com>
      Tested-by: default avatarDaniel Wheeler <daniel.wheeler@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5c5cd52a
    • Juergen Gross's avatar
      s390/hypfs: avoid error message under KVM · c570198c
      Juergen Gross authored
      [ Upstream commit 7b6670b0
      
       ]
      
      When booting under KVM the following error messages are issued:
      
      hypfs.7f5705: The hardware system does not support hypfs
      hypfs.7a79f0: Initialization of hypfs failed with rc=-61
      
      Demote the severity of first message from "error" to "info" and issue
      the second message only in other error cases.
      
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Acked-by: default avatarHeiko Carstens <hca@linux.ibm.com>
      Acked-by: default avatarChristian Borntraeger <borntraeger@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220620094534.18967-1-jgross@suse.com
      [arch/s390/hypfs/hypfs_diag.c changed description]
      Signed-off-by: default avatarAlexander Gordeev <agordeev@linux.ibm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c570198c
    • Denis V. Lunev's avatar
      neigh: fix possible DoS due to net iface start/stop loop · 51be9dd3
      Denis V. Lunev authored
      [ Upstream commit 66ba215c
      
       ]
      
      Normal processing of ARP request (usually this is Ethernet broadcast
      packet) coming to the host is looking like the following:
      * the packet comes to arp_process() call and is passed through routing
        procedure
      * the request is put into the queue using pneigh_enqueue() if
        corresponding ARP record is not local (common case for container
        records on the host)
      * the request is processed by timer (within 80 jiffies by default) and
        ARP reply is sent from the same arp_process() using
        NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED condition (flag is set inside
        pneigh_enqueue())
      
      And here the problem comes. Linux kernel calls pneigh_queue_purge()
      which destroys the whole queue of ARP requests on ANY network interface
      start/stop event through __neigh_ifdown().
      
      This is actually not a problem within the original world as network
      interface start/stop was accessible to the host 'root' only, which
      could do more destructive things. But the world is changed and there
      are Linux containers available. Here container 'root' has an access
      to this API and could be considered as untrusted user in the hosting
      (container's) world.
      
      Thus there is an attack vector to other containers on node when
      container's root will endlessly start/stop interfaces. We have observed
      similar situation on a real production node when docker container was
      doing such activity and thus other containers on the node become not
      accessible.
      
      The patch proposed doing very simple thing. It drops only packets from
      the same namespace in the pneigh_queue_purge() where network interface
      state change is detected. This is enough to prevent the problem for the
      whole node preserving original semantics of the code.
      
      v2:
      	- do del_timer_sync() if queue is empty after pneigh_queue_purge()
      v3:
      	- rebase to net tree
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Yajun Deng <yajun.deng@linux.dev>
      Cc: Roopa Prabhu <roopa@nvidia.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
      Cc: Konstantin Khorenko <khorenko@virtuozzo.com>
      Cc: kernel@openvz.org
      Cc: devel@openvz.org
      Investigated-by: default avatarAlexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
      Signed-off-by: default avatarDenis V. Lunev <den@openvz.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      51be9dd3
    • Fudong Wang's avatar
      drm/amd/display: clear optc underflow before turn off odm clock · 814b756d
      Fudong Wang authored
      [ Upstream commit b2a93490
      
       ]
      
      [Why]
      After ODM clock off, optc underflow bit will be kept there always and clear not work.
      We need to clear that before clock off.
      
      [How]
      Clear that if have when clock off.
      
      Reviewed-by: default avatarAlvin Lee <alvin.lee2@amd.com>
      Acked-by: default avatarTom Chung <chiahsuan.chung@amd.com>
      Signed-off-by: default avatarFudong Wang <Fudong.Wang@amd.com>
      Tested-by: default avatarDaniel Wheeler <daniel.wheeler@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      814b756d
    • Josip Pavic's avatar
      drm/amd/display: Avoid MPC infinite loop · a06e4eb6
      Josip Pavic authored
      [ Upstream commit 8de297dc
      
       ]
      
      [why]
      In some cases MPC tree bottom pipe ends up point to itself.  This causes
      iterating from top to bottom to hang the system in an infinite loop.
      
      [how]
      When looping to next MPC bottom pipe, check that the pointer is not same
      as current to avoid infinite loop.
      
      Reviewed-by: default avatarJosip Pavic <Josip.Pavic@amd.com>
      Reviewed-by: default avatarJun Lei <Jun.Lei@amd.com>
      Acked-by: default avatarAlex Hung <alex.hung@amd.com>
      Signed-off-by: default avatarAric Cyr <aric.cyr@amd.com>
      Tested-by: default avatarDaniel Wheeler <daniel.wheeler@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      a06e4eb6
    • Filipe Manana's avatar
      btrfs: unify lookup return value when dir entry is missing · 2608885a
      Filipe Manana authored
      [ Upstream commit 8dcbc261
      
       ]
      
      btrfs_lookup_dir_index_item() and btrfs_lookup_dir_item() lookup for dir
      entries and both are used during log replay or when updating a log tree
      during an unlink.
      
      However when the dir item does not exists, btrfs_lookup_dir_item() returns
      NULL while btrfs_lookup_dir_index_item() returns PTR_ERR(-ENOENT), and if
      the dir item exists but there is no matching entry for a given name or
      index, both return NULL. This makes the call sites during log replay to
      be more verbose than necessary and it makes it easy to miss this slight
      difference. Since we don't need to distinguish between those two cases,
      make btrfs_lookup_dir_index_item() always return NULL when there is no
      matching directory entry - either because there isn't any dir entry or
      because there is one but it does not match the given name and index.
      
      Also rename the argument 'objectid' of btrfs_lookup_dir_index_item() to
      'index' since it is supposed to match an index number, and the name
      'objectid' is not very good because it can easily be confused with an
      inode number (like the inode number a dir entry points to).
      
      CC: stable@vger.kernel.org # 4.14+
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2608885a
    • Filipe Manana's avatar
      btrfs: do not pin logs too early during renames · 1fe3375c
      Filipe Manana authored
      [ Upstream commit bd54f381
      
       ]
      
      During renames we pin the logs of the roots a bit too early, before the
      calls to btrfs_insert_inode_ref(). We can pin the logs after those calls,
      since those will not change anything in a log tree.
      
      In a scenario where we have multiple and diverse filesystem operations
      running in parallel, those calls can take a significant amount of time,
      due to lock contention on extent buffers, and delay log commits from other
      tasks for longer than necessary.
      
      So just pin logs after calls to btrfs_insert_inode_ref() and right before
      the first operation that can update a log tree.
      
      The following script that uses dbench was used for testing:
      
        $ cat dbench-test.sh
        #!/bin/bash
      
        DEV=/dev/nvme0n1
        MNT=/mnt/nvme0n1
        MOUNT_OPTIONS="-o ssd"
        MKFS_OPTIONS="-m single -d single"
      
        echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
      
        umount $DEV &> /dev/null
        mkfs.btrfs -f $MKFS_OPTIONS $DEV
        mount $MOUNT_OPTIONS $DEV $MNT
      
        dbench -D $MNT -t 120 16
      
        umount $MNT
      
      The tests were run on a machine with 12 cores, 64G of RAN, a NVMe device
      and using a non-debug kernel config (Debian's default config).
      
      The results compare a branch without this patch and without the previous
      patch in the series, that has the subject:
      
       "btrfs: eliminate some false positives when checking if inode was logged"
      
      Versus the same branch with these two patches applied.
      
      dbench with 8 clients, results before:
      
       Operation      Count    AvgLat    MaxLat
       ----------------------------------------
       NTCreateX    4391359     0.009   249.745
       Close        3225882     0.001     3.243
       Rename        185953     0.065   240.643
       Unlink        886669     0.049   249.906
       Deltree          112     2.455   217.433
       Mkdir             56     0.002     0.004
       Qpathinfo    3980281     0.004     3.109
       Qfileinfo     697579     0.001     0.187
       Qfsinfo       729780     0.002     2.424
       Sfileinfo     357764     0.004     1.415
       Find         1538861     0.016     4.863
       WriteX       2189666     0.010     3.327
       ReadX        6883443     0.002     0.729
       LockX          14298     0.002     0.073
       UnlockX        14298     0.001     0.042
       Flush         307777     2.447   303.663
      
      Throughput 1149.6 MB/sec  8 clients  8 procs  max_latency=303.666 ms
      
      dbench with 8 clients, results after:
      
       Operation      Count    AvgLat    MaxLat
       ----------------------------------------
       NTCreateX    4269920     0.009   213.532
       Close        3136653     0.001     0.690
       Rename        180805     0.082   213.858
       Unlink        862189     0.050   172.893
       Deltree          112     2.998   218.328
       Mkdir             56     0.002     0.003
       Qpathinfo    3870158     0.004     5.072
       Qfileinfo     678375     0.001     0.194
       Qfsinfo       709604     0.002     0.485
       Sfileinfo     347850     0.004     1.304
       Find         1496310     0.017     5.504
       WriteX       2129613     0.010     2.882
       ReadX        6693066     0.002     1.517
       LockX          13902     0.002     0.075
       UnlockX        13902     0.001     0.055
       Flush         299276     2.511   220.189
      
      Throughput 1187.33 MB/sec  8 clients  8 procs  max_latency=220.194 ms
      
      +3.2% throughput, -31.8% max latency
      
      dbench with 16 clients, results before:
      
       Operation      Count    AvgLat    MaxLat
       ----------------------------------------
       NTCreateX    5978334     0.028   156.507
       Close        4391598     0.001     1.345
       Rename        253136     0.241   155.057
       Unlink       1207220     0.182   257.344
       Deltree          160     6.123    36.277
       Mkdir             80     0.003     0.005
       Qpathinfo    5418817     0.012     6.867
       Qfileinfo     949929     0.001     0.941
       Qfsinfo       993560     0.002     1.386
       Sfileinfo     486904     0.004     2.829
       Find         2095088     0.059     8.164
       WriteX       2982319     0.017     9.029
       ReadX        9371484     0.002     4.052
       LockX          19470     0.002     0.461
       UnlockX        19470     0.001     0.990
       Flush         418936     2.740   347.902
      
      Throughput 1495.31 MB/sec  16 clients  16 procs  max_latency=347.909 ms
      
      dbench with 16 clients, results after:
      
       Operation      Count    AvgLat    MaxLat
       ----------------------------------------
       NTCreateX    5711833     0.029   131.240
       Close        4195897     0.001     1.732
       Rename        241849     0.204   147.831
       Unlink       1153341     0.184   231.322
       Deltree          160     6.086    30.198
       Mkdir             80     0.003     0.021
       Qpathinfo    5177011     0.012     7.150
       Qfileinfo     907768     0.001     0.793
       Qfsinfo       949205     0.002     1.431
       Sfileinfo     465317     0.004     2.454
       Find         2001541     0.058     7.819
       WriteX       2850661     0.017     9.110
       ReadX        8952289     0.002     3.991
       LockX          18596     0.002     0.655
       UnlockX        18596     0.001     0.179
       Flush         400342     2.879   293.607
      
      Throughput 1565.73 MB/sec  16 clients  16 procs  max_latency=293.611 ms
      
      +4.6% throughput, -16.9% max latency
      
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1fe3375c