  1. Jan 18, 2024
  2. Jan 17, 2024
  3. Jan 16, 2024
  4. Jan 12, 2024
  5. Jan 11, 2024
    • Revert "interconnect: qcom: sm8250: Enable sync_state" · 2dbe25ae
      Amit Pundir authored
      This reverts commit 3637f6bd which is
      commit bfc7db1c upstream.
      
      This resulted in a boot regression on RB5 (sm8250), causing the device
      to hard crash into USB crash dump mode every time.
      
      Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
      Link: https://lkft.validation.linaro.org/scheduler/job/7151629#L4239
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • smb3: Replace smb2pdu 1-element arrays with flex-arrays · f73a374c
      Kees Cook authored
      commit eb3e28c1 upstream.
      
      The kernel is globally removing the ambiguous 0-length and 1-element
      arrays in favor of flexible arrays, so that we can gain both compile-time
      and run-time array bounds checking[1].
      
      Replace the trailing 1-element array with a flexible array in the
      following structures:
      
      	struct smb2_err_rsp
      	struct smb2_tree_connect_req
      	struct smb2_negotiate_rsp
      	struct smb2_sess_setup_req
      	struct smb2_sess_setup_rsp
      	struct smb2_read_req
      	struct smb2_read_rsp
      	struct smb2_write_req
      	struct smb2_write_rsp
      	struct smb2_query_directory_req
      	struct smb2_query_directory_rsp
      	struct smb2_set_info_req
      	struct smb2_change_notify_rsp
      	struct smb2_create_rsp
      	struct smb2_query_info_req
      	struct smb2_query_info_rsp
      
      Replace the trailing 1-element array with a flexible array, but leave
      the existing structure padding:
      
      	struct smb2_file_all_info
      	struct smb2_lock_req
      
      Adjust all related size calculations to match the changes to sizeof().
      
      No machine code output or .data section differences are produced after
      these changes.
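
The sizeof() adjustment mentioned above can be illustrated with a minimal sketch. The two structures below are hypothetical and are not taken from smb2pdu.h; they use only byte-sized fields so that no padding is involved. The point is that a trailing 1-element array contributes one byte to sizeof(), which old size calculations had to subtract, while a flexible array member contributes nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration; these are NOT real smb2pdu structs. */
struct demo_rsp_old {
	uint8_t flags;
	uint8_t reserved;
	uint8_t buffer[1];	/* deprecated trailing 1-element array */
};

struct demo_rsp_new {
	uint8_t flags;
	uint8_t reserved;
	uint8_t buffer[];	/* C99 flexible array member */
};

/* Old style: sizeof() already counts one payload byte, so subtract it. */
static size_t demo_old_size(size_t payload_len)
{
	return sizeof(struct demo_rsp_old) - 1 + payload_len;
}

/* New style: sizeof() covers the header only, so no "- 1" adjustment. */
static size_t demo_new_size(size_t payload_len)
{
	assert(payload_len <= SIZE_MAX - sizeof(struct demo_rsp_new));
	return sizeof(struct demo_rsp_new) + payload_len;
}
```

Both helpers return the same total for a given payload length, which is the kind of equivalence the adjusted calculations are meant to preserve.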
      
      [1] For lots of details, see both:
          https://docs.kernel.org/process/deprecated.html#zero-length-and-one-element-arrays
          https://people.kernel.org/kees/bounded-flexible-arrays-in-c
      
      
      
      Cc: Steve French <sfrench@samba.org>
      Cc: Paulo Alcantara <pc@cjr.nz>
      Cc: Ronnie Sahlberg <lsahlber@redhat.com>
      Cc: Shyam Prasad N <sprasad@microsoft.com>
      Cc: Tom Talpey <tom@talpey.com>
      Cc: Namjae Jeon <linkinjeon@kernel.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: linux-cifs@vger.kernel.org
      Cc: samba-technical@lists.samba.org
      Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • media: qcom: camss: Comment CSID dt_id field · ec162546
      Bryan O'Donoghue authored
      commit f910d3ba upstream.
      
      Digging into the documentation we find that the DT_ID bitfield is used to
      map the six bit DT to a two bit ID code. This value is concatenated to the
      VC bitfield to create a CID value. DT_ID is the two least significant bits
      of CID and VC the most significant bits.
      
      Originally we set dt_id = vc * 4 and then subsequently set dt_id = vc.
      
      commit 3c4ed72a ("media: camss: sm8250: Virtual channels for CSID")
      silently fixed the multiplication by four which would give a better
      value for the generated CID without mentioning what was being done or why.
      
      Next up I haplessly changed the value back to "dt_id = vc * 4" since there
      didn't appear to be any logic behind it.
      
      Hans asked what the change was for and I honestly couldn't remember the
      provenance of it, so I dug in.
      
      Link: https://lore.kernel.org/linux-arm-msm/edd4bf9b-0e1b-883c-1a4d-50f4102c3924@xs4all.nl/
      
      
      
      Add a comment so the next hapless programmer doesn't make this same
      mistake.
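
As a rough illustration of the bit layout described above (DT_ID in the two least significant bits of CID, VC in the bits above it), the following sketch composes a CID value. The helper name and macros are hypothetical and are not part of the camss driver; the width of the VC field is not stated here, so it is left unmasked.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the description: DT_ID occupies the two low bits of CID. */
#define DEMO_DT_ID_BITS 2u
#define DEMO_DT_ID_MASK ((1u << DEMO_DT_ID_BITS) - 1u)

/* Hypothetical helper: concatenate VC (high bits) and DT_ID (low bits). */
static uint32_t demo_cid(uint32_t vc, uint32_t dt_id)
{
	assert(dt_id <= DEMO_DT_ID_MASK);	/* DT_ID must fit in two bits */
	return (vc << DEMO_DT_ID_BITS) | dt_id;
}
```

For example, demo_cid(1, 1) places VC 1 above DT_ID 1 in the resulting value.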
      
      Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
      Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bpf: syzkaller found null ptr deref in unix_bpf proto add · a5c3f2b4
      John Fastabend authored
      commit 8d665064 upstream.
      
      I added logic to track the sock pair for stream_unix sockets so that we
      ensure lifetime of the sock matches the time a sockmap could reference
      the sock (see fixes tag). I forgot though that we allow af_unix unconnected
      sockets into a sock{map|hash} map.
      
      This is problematic because the previous fix expected sk_pair() to exist
      and did not NULL check it. Because unconnected sockets have a NULL
      sk_pair this resulted in the NULL ptr dereference found by syzkaller.
      
      BUG: KASAN: null-ptr-deref in unix_stream_bpf_update_proto+0x72/0x430 net/unix/unix_bpf.c:171
      Write of size 4 at addr 0000000000000080 by task syz-executor360/5073
      Call Trace:
       <TASK>
       ...
       sock_hold include/net/sock.h:777 [inline]
       unix_stream_bpf_update_proto+0x72/0x430 net/unix/unix_bpf.c:171
       sock_map_init_proto net/core/sock_map.c:190 [inline]
       sock_map_link+0xb87/0x1100 net/core/sock_map.c:294
       sock_map_update_common+0xf6/0x870 net/core/sock_map.c:483
       sock_map_update_elem_sys+0x5b6/0x640 net/core/sock_map.c:577
       bpf_map_update_value+0x3af/0x820 kernel/bpf/syscall.c:167
      
      We considered just checking for the null ptr and skipping taking a ref
      on the NULL peer sock. But, if the socket is then connected() after
      being added to the sockmap we can cause the original issue again. So
      instead this patch blocks adding af_unix sockets that are not in the
      ESTABLISHED state.
      
      Reported-by: Eric Dumazet <edumazet@google.com>
      Reported-by: <syzbot+e8030702aefd3444fb9e@syzkaller.appspotmail.com>
      Fixes: 8866730a ("bpf, sockmap: af_unix stream sockets need to hold ref for pair sock")
      Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20231201180139.328529-2-john.fastabend@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bpf: Fix a verifier bug due to incorrect branch offset comparison with cpu=v4 · 15db6829
      Yonghong Song authored
      commit dfce9cb3 upstream.
      
      Bpf cpu=v4 support is introduced in [1] and Commit 4cd58e9a
      ("bpf: Support new 32bit offset jmp instruction") added support for new
      32bit offset jmp instruction. Unfortunately, in function
      bpf_adj_delta_to_off(), for new branch insn with 32bit offset, the offset
      (plus/minus a small delta) is compared against the 16-bit offset bound
      [S16_MIN, S16_MAX], which caused the following verification failure:
        $ ./test_progs-cpuv4 -t verif_scale_pyperf180
        ...
        insn 10 cannot be patched due to 16-bit range
        ...
        libbpf: failed to load object 'pyperf180.bpf.o'
        scale_test:FAIL:expect_success unexpected error: -12 (errno 12)
        #405     verif_scale_pyperf180:FAIL
      
      Note that due to recent llvm18 development, the patch [2] (already applied
      in bpf-next) needs to be applied to bpf tree for testing purpose.
      
      The fix is rather simple. For 32bit offset branch insn, the adjusted
      offset compares to [S32_MIN, S32_MAX] and then verification succeeded.
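
The range check described above can be sketched as follows. This is a simplified user-space model, not the kernel's bpf_adj_delta_to_off(); the function name and parameters are hypothetical. It only shows that the bound must be chosen by the width of the branch's offset field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: return true when the adjusted branch offset still
 * fits the offset field of the instruction (16-bit or 32-bit).
 */
static bool demo_adj_fits(int64_t cur_off, int64_t delta, bool is_32bit_off)
{
	const int64_t min = is_32bit_off ? INT32_MIN : INT16_MIN;
	const int64_t max = is_32bit_off ? INT32_MAX : INT16_MAX;
	int64_t adjusted;

	/* Keep inputs small enough that the addition cannot overflow. */
	assert(cur_off >= INT32_MIN && cur_off <= INT32_MAX);
	assert(delta >= INT32_MIN && delta <= INT32_MAX);

	adjusted = cur_off + delta;
	return adjusted >= min && adjusted <= max;
}
```

An offset of 32767 adjusted by 1 is rejected against the 16-bit bound but accepted against the 32-bit bound, which mirrors the failure mode reported by the selftest.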
      
        [1] https://lore.kernel.org/all/20230728011143.3710005-1-yonghong.song@linux.dev
        [2] https://lore.kernel.org/bpf/20231110193644.3130906-1-yonghong.song@linux.dev
      
      Fixes: 4cd58e9a ("bpf: Support new 32bit offset jmp instruction")
      Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20231201024640.3417057-1-yonghong.song@linux.dev
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/sched: act_ct: Always fill offloading tuple iifidx · 7cbdf36e
      Vlad Buslov authored
      commit 9bc64bd0 upstream.
      
      Referenced commit doesn't always set iifidx when offloading the flow to
      hardware. Fix the following cases:
      
      - nf_conn_act_ct_ext_fill() is called before extension is created with
      nf_conn_act_ct_ext_add() in tcf_ct_act(). This can cause rule offload with
      unspecified iifidx when connection is offloaded after only single
      original-direction packet has been processed by tc data path. Always fill
      the new nf_conn_act_ct_ext instance after creating it in
      nf_conn_act_ct_ext_add().
      
      - Offloading of unidirectional UDP NEW connections is now supported, but ct
      flow iifidx field is not updated when connection is promoted to
      bidirectional, which can result in reply-direction iifidx being zero when
      refreshing the connection. Fill in the extension and update flow iifidx
      before calling flow_offload_refresh().
      
      Fixes: 9795ded7 ("net/sched: act_ct: Fill offloading tuple iifidx")
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Fixes: 6a9bad00 ("net/sched: act_ct: offload UDP NEW connections")
      Link: https://lore.kernel.org/r/20231103151410.764271-1-vladbu@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/sched: act_ct: additional checks for outdated flows · 2be4e8ac
      Vlad Buslov authored
      commit a63b6622 upstream.
      
      Current nf_flow_is_outdated() implementation considers any flow table flow
      whose state diverged from its underlying CT connection status for teardown,
      which can be problematic in the following cases:
      
      - Flow has never been offloaded to hardware in the first place either
      because flow table has hardware offload disabled (flag
      NF_FLOWTABLE_HW_OFFLOAD is not set) or because it is still pending on 'add'
      workqueue to be offloaded for the first time. The former is incorrect, the
      latter generates excessive deletions and additions of flows.
      
      - Flow is already pending to be updated on the workqueue. Tearing down such
      flows will also generate excessive removals from the flow table, especially
      on highly loaded system where the latency to re-offload a flow via 'add'
      workqueue can be quite high.
      
      When considering a flow for teardown as outdated verify that it is both
      offloaded to hardware and doesn't have any pending updates.
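
The combined condition can be sketched as a small predicate. The structure and field names below are hypothetical stand-ins for the real flow and conntrack flag bits, which are not reproduced here; the sketch only captures the rule stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of the state consulted for teardown. */
struct demo_flow {
	bool diverged_from_ct;	/* flow state no longer matches CT status */
	bool hw_offloaded;	/* flow has actually reached hardware */
	bool update_pending;	/* an add/update is queued on a workqueue */
};

/* Outdated only if diverged, already in hardware, and nothing pending. */
static bool demo_flow_is_outdated(const struct demo_flow *flow)
{
	assert(flow != NULL);
	return flow->diverged_from_ct &&
	       flow->hw_offloaded &&
	       !flow->update_pending;
}
```

Under this model a diverged flow that was never offloaded, or that still has an update queued, is left alone instead of being torn down.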
      
      Fixes: 41f2c7c3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple")
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • f2fs: compress: fix to assign compress_level for lz4 correctly · 87318b7e
      Chao Yu authored
      commit 091a4dfb upstream.
      
      After remount, F2FS_OPTION().compress_level was incorrectly assigned
      LZ4HC_DEFAULT_CLEVEL, which resulted in lz4hc:9 being enabled. Fix it.
      
      1. mount /dev/vdb
      /dev/vdb on /mnt/f2fs type f2fs (...,compress_algorithm=lz4,compress_log_size=2,...)
      2. mount -t f2fs -o remount,compress_log_size=3 /mnt/f2fs/
      3. mount|grep f2fs
      /dev/vdb on /mnt/f2fs type f2fs (...,compress_algorithm=lz4:9,compress_log_size=3,...)
      
      Fixes: 00e120b5 ("f2fs: assign default compression level")
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • genirq/affinity: Only build SMP-only helper functions on SMP kernels · 397f7190
      Ingo Molnar authored
      commit 188a5696 upstream.
      
      allnoconfig grew these new build warnings in lib/group_cpus.c:
      
        lib/group_cpus.c:247:12: warning: ‘__group_cpus_evenly’ defined but not used [-Wunused-function]
        lib/group_cpus.c:75:13: warning: ‘build_node_to_cpumask’ defined but not used [-Wunused-function]
        lib/group_cpus.c:66:13: warning: ‘free_node_to_cpumask’ defined but not used [-Wunused-function]
        lib/group_cpus.c:43:23: warning: ‘alloc_node_to_cpumask’ defined but not used [-Wunused-function]
      
      Widen the #ifdef CONFIG_SMP block to not expose unused helpers on
      non-SMP builds.
      
      Also annotate the preprocessor branches for better readability.
      
      Fixes: f7b3ea8c ("genirq/affinity: Move group_cpus_evenly() into lib/")
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20221227022905.352674-6-ming.lei@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mmc: sdhci-sprd: Fix eMMC init failure after hw reset · 28c9222e
      Wenchao Chen authored
      commit 8abf77c8 upstream.
      
      Some eMMC devices that do not close the auto clk gate after hw reset will
      cause eMMC initialization to fail. Let's fix this.
      
      Signed-off-by: Wenchao Chen <wenchao.chen@unisoc.com>
      Fixes: ff874dbc ("mmc: sdhci-sprd: Disable CLK_AUTO when the clock is less than 400K")
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20231204064934.21236-1-wenchao.chen@unisoc.com
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mmc: core: Cancel delayed work before releasing host · 2813a434
      Geert Uytterhoeven authored
      commit 1036f69e upstream.
      
      On RZ/Five SMARC EVK, where probing of SDHI is deferred due to probe
      deferral of the vqmmc-supply regulator:
      
          ------------[ cut here ]------------
          WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:1738 __run_timers.part.0+0x1d0/0x1e8
          Modules linked in:
          CPU: 0 PID: 0 Comm: swapper Not tainted 6.7.0-rc4 #101
          Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
          epc : __run_timers.part.0+0x1d0/0x1e8
           ra : __run_timers.part.0+0x134/0x1e8
          epc : ffffffff800771a4 ra : ffffffff80077108 sp : ffffffc800003e60
           gp : ffffffff814f5028 tp : ffffffff8140c5c0 t0 : ffffffc800000000
           t1 : 0000000000000001 t2 : ffffffff81201300 s0 : ffffffc800003f20
           s1 : ffffffd8023bc4a0 a0 : 00000000fffee6b0 a1 : 0004010000400000
           a2 : ffffffffc0000016 a3 : ffffffff81488640 a4 : ffffffc800003e60
           a5 : 0000000000000000 a6 : 0000000004000000 a7 : ffffffc800003e68
           s2 : 0000000000000122 s3 : 0000000000200000 s4 : 0000000000000000
           s5 : ffffffffffffffff s6 : ffffffff81488678 s7 : ffffffff814886c0
           s8 : ffffffff814f49c0 s9 : ffffffff81488640 s10: 0000000000000000
           s11: ffffffc800003e60 t3 : 0000000000000240 t4 : 0000000000000a52
           t5 : ffffffd8024ae018 t6 : ffffffd8024ae038
          status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
          [<ffffffff800771a4>] __run_timers.part.0+0x1d0/0x1e8
          [<ffffffff800771e0>] run_timer_softirq+0x24/0x4a
          [<ffffffff80809092>] __do_softirq+0xc6/0x1fa
          [<ffffffff80028e4c>] irq_exit_rcu+0x66/0x84
          [<ffffffff80800f7a>] handle_riscv_irq+0x40/0x4e
          [<ffffffff80808f48>] call_on_irq_stack+0x1c/0x28
          ---[ end trace 0000000000000000 ]---
      
      What happens?
      
          renesas_sdhi_probe()
          {
              tmio_mmc_host_alloc()
                  mmc_alloc_host()
                      INIT_DELAYED_WORK(&host->detect, mmc_rescan);

              devm_request_irq(tmio_mmc_irq);

              /*
               * After this, the interrupt handler may be invoked at any time
               *
               *  tmio_mmc_irq()
               *  {
               *      __tmio_mmc_card_detect_irq()
               *          mmc_detect_change()
               *              _mmc_detect_change()
               *                  mmc_schedule_delayed_work(&host->detect, delay);
               *  }
               */

              tmio_mmc_host_probe()
                  tmio_mmc_init_ocr()
                      -EPROBE_DEFER

              tmio_mmc_host_free()
                  mmc_free_host()
          }
      
      When expire_timers() runs later, it warns because the MMC host structure
      containing the delayed work was freed, and now contains an invalid work
      function pointer.
      
      Fix this by cancelling any pending delayed work before releasing the
      MMC host structure.
      
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/205dc4c91b47e31b64392fe2498c7a449e717b4b.1701689330.git.geert+renesas@glider.be
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mmc: rpmb: fixes pause retune on all RPMB partitions. · 575e1270
      Jorge Ramirez-Ortiz authored
      commit e7794c14 upstream.
      
      When RPMB was converted to a character device, it added support for
      multiple RPMB partitions (Commit 97548575 ("mmc: block: Convert RPMB to
      a character device")).
      
      One of the changes in this commit was transforming the variable target_part
      defined in __mmc_blk_ioctl_cmd into a bitmask. This inadvertently regressed
      the validation check done in mmc_blk_part_switch_pre() and
      mmc_blk_part_switch_post(), so let's fix it.
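
To illustrate why a plain equality test can stop matching once a value carries extra bits, here is a minimal sketch. The constants are assumptions chosen for the example (an RPMB access value of 0x3 plus one hypothetical extra bit) and the helper names are invented; the sketch does not reproduce the actual mmc block code.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration only. */
#define DEMO_ACC_RPMB	0x3u	/* partition-access value for RPMB */
#define DEMO_EXTRA_BIT	0x10u	/* hypothetical extra bit in target_part */

/* Equality check: only matches when no other bits are set. */
static bool demo_is_rpmb_eq(unsigned int part_type)
{
	return part_type == DEMO_ACC_RPMB;
}

/* Mask check: matches whenever the RPMB access bits are all set. */
static bool demo_is_rpmb_mask(unsigned int part_type)
{
	const unsigned int mask = DEMO_ACC_RPMB;

	return (part_type & mask) == mask;
}
```

With an extra bit set, the equality form misses the RPMB case while the masked form still recognises it.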
      
      Fixes: 97548575 ("mmc: block: Convert RPMB to a character device")
      Signed-off-by: Jorge Ramirez-Ortiz <jorge@foundries.io>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20231201153143.1449753-1-jorge@foundries.io
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mmc: meson-mx-sdhc: Fix initialization frozen issue · 9c5efaa0
      Ziyang Huang authored
      commit 8c124d99 upstream.
      
      Commit 4bc31ede ("mmc: core: Set HS clock speed before sending
      HS CMD13") set HS clock (52MHz) before switching to HS mode. For this
      freq, FCLK_DIV5 will be selected and div value is 10 (reg value is 9).
      Then we set rx_clk_phase to 11 or 15, which is out of range and freezes
      the hardware. After we send a command request, no irq is raised and the
      mmc driver keeps waiting for the request to finish, even during reboot.

      So let's set it to Phase 90, which should work in most cases. Then let
      meson_mx_sdhc_execute_tuning() find the accurate value for data
      transfer.

      If this doesn't work, a factor may need to be defined in dts.
      
      Fixes: e4bf1b09 ("mmc: host: meson-mx-sdhc: new driver for the Amlogic Meson SDHC host")
      Signed-off-by: Ziyang Huang <hzyitc@outlook.com>
      Tested-by: Anand Moon <linux.amoon@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/TYZPR01MB5556A3E71554A2EC08597EA4C9CDA@TYZPR01MB5556.apcprd01.prod.exchangelabs.com
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/amd/display: add nv12 bounding box · 48e1d426
      Alex Deucher authored
      commit 7e725c20 upstream.
      
      This was included in gpu_info firmware, move it into the
      driver for consistency with other nv1x parts.
      
      Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2318
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/amdgpu: skip gpu_info fw loading on navi12 · 11c3510d
      Alex Deucher authored
      commit 21f6137c upstream.
      
      It's no longer required.
      
      Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2318
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: fix unmap_mapping_range high bits shift bug · dafdeb7b
      Jiajun Xie authored
      commit 9eab0421 upstream.
      
      The bug happens when the highest bit of holebegin is 1. Suppose holebegin
      is 0x8000000111111000: after the shift, hba would be 0xfff8000000111111,
      and vma_interval_tree_foreach would then fail the lookup or return the
      wrong result.
      
      error call seq e.g.:
      - mmap(..., offset=0x8000000111111000)
        |- syscall(mmap, ... unsigned long, off):
           |- ksys_mmap_pgoff( ... , off >> PAGE_SHIFT);
      
        here pgoff is correctly shifted to 0x8000000111111,
        but pass 0x8000000111111000 as holebegin to unmap
        would then cause terrible result, as shown below:
      
      - unmap_mapping_range(..., loff_t const holebegin)
        |- pgoff_t hba = holebegin >> PAGE_SHIFT;
                /* hba = 0xfff8000000111111 unexpectedly */
      
      The issue happens in Heterogeneous computing, where the device(e.g.
      gpu) and host share the same virtual address space.
      
      A simple workflow pattern which hit the issue is:
              /* host */
          1. userspace first mmap a file backed VA range with specified offset.
                              e.g. (offset=0x800..., mmap return: va_a)
          2. write some data to the corresponding sys page
                               e.g. (va_a = 0xAABB)
              /* device */
          3. gpu workload touches VA, triggers gpu fault and notify the host.
              /* host */
          4. received gpu fault notification, then it will:
                  4.1 unmap host pages and also takes care of cpu tlb
                        (use unmap_mapping_range with offset=0x800...)
                  4.2 migrate sys page to device
                  4.3 setup device page table and resolve device fault.
              /* device */
          5. gpu workload continued, it accessed va_a and got 0xAABB.
          6. gpu workload continued, it wrote 0xBBCC to va_a.
              /* host */
          7. userspace access va_a, as expected, it will:
                  7.1 trigger cpu vm fault.
                  7.2 driver handling fault to migrate gpu local page to host.
          8. userspace then could correctly get 0xBBCC from va_a
          9. done
      
      But in step 4.1, if we hit the bug this patch mentioned, then userspace
      would never trigger cpu fault, and still get the old value: 0xAABB.
      
      Making holebegin unsigned first fixes the bug.
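
The sign-extension effect can be reproduced in user space with a minimal sketch. The helper names and DEMO_PAGE_SHIFT (assumed to be 12, matching the example values above) are hypothetical. Right-shifting a negative signed value is implementation-defined in C; the sketch assumes a compiler that shifts arithmetically, as GCC and Clang do.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Signed shift: the sign bit is replicated into the high bits. */
static uint64_t demo_hba_signed(int64_t holebegin)
{
	return (uint64_t)(holebegin >> DEMO_PAGE_SHIFT);
}

/* Cast to unsigned first: the high bits are filled with zeros. */
static uint64_t demo_hba_unsigned(int64_t holebegin)
{
	return (uint64_t)holebegin >> DEMO_PAGE_SHIFT;
}
```

Feeding both helpers the bit pattern 0x8000000111111000 shows the difference: the signed variant yields a page offset with the top bits set, while the unsigned variant yields the page offset that mmap() originally recorded.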
      
      Link: https://lkml.kernel.org/r/20231220052839.26970-1-jiajun.xie.sh@gmail.com
      Signed-off-by: Jiajun Xie <jiajun.xie.sh@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • i2c: core: Fix atomic xfer check for non-preempt config · 08038069
      Benjamin Bara authored
      commit a3368e11 upstream.
      
      Since commit aa49c908 ("i2c: core: Run atomic i2c xfer when
      !preemptible"), the whole reboot/power off sequence on non-preempt kernels
      is using atomic i2c xfer, as !preemptible() always evaluates to 1.
      
      During device_shutdown(), the i2c might be used a lot and not all busses
      have implemented an atomic xfer handler. This results in a lot of
      avoidable noise, like:
      
      [   12.687169] No atomic I2C transfer handler for 'i2c-0'
      [   12.692313] WARNING: CPU: 6 PID: 275 at drivers/i2c/i2c-core.h:40 i2c_smbus_xfer+0x100/0x118
      ...
      
      Fix this by allowing non-atomic xfer when the interrupts are enabled, as
      it was before.
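
The behavioural difference on a non-preempt kernel can be modelled with a small sketch. This is a simplified user-space model with hypothetical names, not the i2c core code: in_shutdown stands for the reboot/power-off sequence, and the "before" helper hard-codes the observation that !preemptible() is always 1 on such kernels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the context an i2c transfer is issued from. */
struct demo_ctx {
	bool in_shutdown;	/* inside the reboot/power-off sequence */
	bool irqs_disabled;	/* interrupts are currently disabled */
};

/* Before: on non-preempt kernels !preemptible() is constantly true. */
static bool demo_use_atomic_before(const struct demo_ctx *ctx)
{
	assert(ctx != NULL);
	return ctx->in_shutdown && true;
}

/* After: fall back to atomic transfers only when irqs are disabled. */
static bool demo_use_atomic_after(const struct demo_ctx *ctx)
{
	assert(ctx != NULL);
	return ctx->in_shutdown && ctx->irqs_disabled;
}
```

In this model a shutdown-time transfer issued with interrupts enabled takes the atomic path before the change and the regular path after it, which is what removes the warning for busses without an atomic handler.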
      
      Link: https://lore.kernel.org/r/20231222230106.73f030a5@yea
      Link: https://lore.kernel.org/r/20240102150350.3180741-1-mwalle@kernel.org
      Link: https://lore.kernel.org/linux-i2c/13271b9b-4132-46ef-abf8-2c311967bb46@mailbox.org/
      Fixes: aa49c908 ("i2c: core: Run atomic i2c xfer when !preemptible")
      Cc: stable@vger.kernel.org # v5.2+
      Signed-off-by: Benjamin Bara <benjamin.bara@skidata.com>
      Tested-by: Michael Walle <mwalle@kernel.org>
      Tested-by: Tor Vic <torvic9@mailbox.org>
      [wsa: removed a comment which needs more work, code is ok]
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/kprobes: fix incorrect return address calculation in kprobe_emulate_call_indirect · 53b42cb3
      Jinghao Jia authored
      commit f5d03da4 upstream.
      
      kprobe_emulate_call_indirect currently uses int3_emulate_call to emulate
      indirect calls. However, int3_emulate_call always assumes the size of
      the call to be 5 bytes when calculating the return address. This is
      incorrect for register-based indirect calls in x86, which can be either
      2 or 3 bytes depending on whether REX prefix is used. At kprobe runtime,
      the incorrect return address causes control flow to land in the wrong
      place after return, possibly not on a valid instruction boundary. This
      can lead to a panic like the following:
      
      [    7.308204][    C1] BUG: unable to handle page fault for address: 000000000002b4d8
      [    7.308883][    C1] #PF: supervisor read access in kernel mode
      [    7.309168][    C1] #PF: error_code(0x0000) - not-present page
      [    7.309461][    C1] PGD 0 P4D 0
      [    7.309652][    C1] Oops: 0000 [#1] SMP
      [    7.309929][    C1] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.7.0-rc5-trace-for-next #6
      [    7.310397][    C1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
      [    7.311068][    C1] RIP: 0010:__common_interrupt+0x52/0xc0
      [    7.311349][    C1] Code: 01 00 4d 85 f6 74 39 49 81 fe 00 f0 ff ff 77 30 4c 89 f7 4d 8b 5e 68 41 ba 91 76 d8 42 45 03 53 fc 74 02 0f 0b cc ff d3 65 48 <8b> 05 30 c7 ff 7e 65 4c 89 3d 28 c7 ff 7e 5b 41 5c 41 5e 41 5f c3
      [    7.312512][    C1] RSP: 0018:ffffc900000e0fd0 EFLAGS: 00010046
      [    7.312899][    C1] RAX: 0000000000000001 RBX: 0000000000000023 RCX: 0000000000000001
      [    7.313334][    C1] RDX: 00000000000003cd RSI: 0000000000000001 RDI: ffff888100d302a4
      [    7.313702][    C1] RBP: 0000000000000001 R08: 0ef439818636191f R09: b1621ff338a3b482
      [    7.314146][    C1] R10: ffffffff81e5127b R11: ffffffff81059810 R12: 0000000000000023
      [    7.314509][    C1] R13: 0000000000000000 R14: ffff888100d30200 R15: 0000000000000000
      [    7.314951][    C1] FS:  0000000000000000(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000
      [    7.315396][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    7.315691][    C1] CR2: 000000000002b4d8 CR3: 0000000003028003 CR4: 0000000000370ef0
      [    7.316153][    C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [    7.316508][    C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [    7.316948][    C1] Call Trace:
      [    7.317123][    C1]  <IRQ>
      [    7.317279][    C1]  ? __die_body+0x64/0xb0
      [    7.317482][    C1]  ? page_fault_oops+0x248/0x370
      [    7.317712][    C1]  ? __wake_up+0x96/0xb0
      [    7.317964][    C1]  ? exc_page_fault+0x62/0x130
      [    7.318211][    C1]  ? asm_exc_page_fault+0x22/0x30
      [    7.318444][    C1]  ? __cfi_native_send_call_func_single_ipi+0x10/0x10
      [    7.318860][    C1]  ? default_idle+0xb/0x10
      [    7.319063][    C1]  ? __common_interrupt+0x52/0xc0
      [    7.319330][    C1]  common_interrupt+0x78/0x90
      [    7.319546][    C1]  </IRQ>
      [    7.319679][    C1]  <TASK>
      [    7.319854][    C1]  asm_common_interrupt+0x22/0x40
      [    7.320082][    C1] RIP: 0010:default_idle+0xb/0x10
      [    7.320309][    C1] Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 b8 0c 67 40 a5 66 90 0f 00 2d 09 b9 3b 00 fb f4 <fa> c3 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 b8 0c 67 40 a5 e9
      [    7.321449][    C1] RSP: 0018:ffffc9000009bee8 EFLAGS: 00000256
      [    7.321808][    C1] RAX: ffff88813bca8b68 RBX: 0000000000000001 RCX: 000000000001ef0c
      [    7.322227][    C1] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000001ef0c
      [    7.322656][    C1] RBP: ffffc9000009bef8 R08: 8000000000000000 R09: 00000000000008c2
      [    7.323083][    C1] R10: 0000000000000000 R11: ffffffff81058e70 R12: 0000000000000000
      [    7.323530][    C1] R13: ffff8881002b30c0 R14: 0000000000000000 R15: 0000000000000000
      [    7.323948][    C1]  ? __cfi_lapic_next_deadline+0x10/0x10
      [    7.324239][    C1]  default_idle_call+0x31/0x50
      [    7.324464][    C1]  do_idle+0xd3/0x240
      [    7.324690][    C1]  cpu_startup_entry+0x25/0x30
      [    7.324983][    C1]  start_secondary+0xb4/0xc0
      [    7.325217][    C1]  secondary_startup_64_no_verify+0x179/0x17b
      [    7.325498][    C1]  </TASK>
      [    7.325641][    C1] Modules linked in:
      [    7.325906][    C1] CR2: 000000000002b4d8
      [    7.326104][    C1] ---[ end trace 0000000000000000 ]---
      [    7.326354][    C1] RIP: 0010:__common_interrupt+0x52/0xc0
      [    7.326614][    C1] Code: 01 00 4d 85 f6 74 39 49 81 fe 00 f0 ff ff 77 30 4c 89 f7 4d 8b 5e 68 41 ba 91 76 d8 42 45 03 53 fc 74 02 0f 0b cc ff d3 65 48 <8b> 05 30 c7 ff 7e 65 4c 89 3d 28 c7 ff 7e 5b 41 5c 41 5e 41 5f c3
      [    7.327570][    C1] RSP: 0018:ffffc900000e0fd0 EFLAGS: 00010046
      [    7.327910][    C1] RAX: 0000000000000001 RBX: 0000000000000023 RCX: 0000000000000001
      [    7.328273][    C1] RDX: 00000000000003cd RSI: 0000000000000001 RDI: ffff888100d302a4
      [    7.328632][    C1] RBP: 0000000000000001 R08: 0ef439818636191f R09: b1621ff338a3b482
      [    7.329223][    C1] R10: ffffffff81e5127b R11: ffffffff81059810 R12: 0000000000000023
      [    7.329780][    C1] R13: 0000000000000000 R14: ffff888100d30200 R15: 0000000000000000
      [    7.330193][    C1] FS:  0000000000000000(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000
      [    7.330632][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    7.331050][    C1] CR2: 000000000002b4d8 CR3: 0000000003028003 CR4: 0000000000370ef0
      [    7.331454][    C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [    7.331854][    C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [    7.332236][    C1] Kernel panic - not syncing: Fatal exception in interrupt
      [    7.332730][    C1] Kernel Offset: disabled
      [    7.333044][    C1] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      The relevant assembly code is (from objdump, faulting address
      highlighted):
      
      ffffffff8102ed9d:       41 ff d3                  call   *%r11
      ffffffff8102eda0:       65 48 <8b> 05 30 c7 ff    mov    %gs:0x7effc730(%rip),%rax
      
      The emulation incorrectly sets the return address to be ffffffff8102ed9d
      + 0x5 = ffffffff8102eda2, which is the 8b byte in the middle of the next
      mov. This in turn causes incorrect subsequent instruction decoding and
      eventually triggers the page fault above.
      
      Instead of invoking int3_emulate_call, perform push and jmp emulation
      directly in kprobe_emulate_call_indirect. At this point we can obtain
      the instruction size from p->ainsn.size so that we can calculate the
      correct return address.
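
      The address arithmetic behind the bug can be sketched in a few lines. This
      is an illustration only, not the kernel code: sketch_return_addr() and
      SKETCH_CALL_REL32_SIZE are hypothetical names, and the 5-byte figure is the
      size of a direct near call (opcode e8 plus a 32-bit displacement).

```c
#include <stdint.h>

/* Size of a direct near call (e8 + rel32). Used here only to show what
 * goes wrong when this size is assumed for every call instruction. */
#define SKETCH_CALL_REL32_SIZE 5

/* The return address pushed by an emulated call is the address of the
 * call instruction plus the real length of that instruction. */
static uint64_t sketch_return_addr(uint64_t call_ip, unsigned int insn_size)
{
	return call_ip + insn_size;
}
```

      With the addresses from the objdump output above, a size of 5 yields
      ffffffff8102eda2 (inside the following mov), while the real size of
      "call *%r11" (41 ff d3, 3 bytes) yields ffffffff8102eda0.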
      
      Link: https://lore.kernel.org/all/20240102233345.385475-1-jinghao7@illinois.edu/
      
      Fixes: 6256e668 ("x86/kprobes: Use int3 instead of debug trap for single-step")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      53b42cb3
    • Takashi Sakamoto's avatar
      firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and... · d1db1ef5
      Takashi Sakamoto authored
      firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards
      
      commit ac9184fb upstream.
      
      VIA VT6306/6307/6308 provides a PCI interface compliant with 1394 OHCI. When
      the hardware is combined with an Asmedia ASM1083/1085 PCIe-to-PCI bus bridge,
      accesses to its 'Isochronous Cycle Timer' register (offset 0xf0 in PCI
      memory space) often cause an unexpected system reboot on any type of AMD
      Ryzen machine (both 0x17 and 0x19 families). The issue does not appear on
      other types of machine (at least AMD pre-Ryzen and Intel machines), or
      with other OHCI 1394 hardware (e.g. Texas Instruments).
      
      The issue became apparent with commit dcadfd7f ("firewire: core: use
      union for callback of transaction completion"), added in the v6.5 kernel.
      It changed the 1394 OHCI driver to access the register every time it
      dispatches a local asynchronous transaction. However, the issue also exists
      in older kernel versions when running on an AMD Ryzen machine, since
      access to the register is required to maintain bus time. Users can
      therefore experience the unexpected system reboot when a bus reset is
      generated by plugging in any device, or when time-aware application
      programs (e.g. audio sample processing) read the register.
      
      This commit suppresses the unexpected system reboot for this combination of
      hardware by avoiding the access itself. As a result, the software stack can
      no longer provide the hardware time to unit drivers, userspace
      applications, and nodes on the same IEEE 1394 bus. This is a clear
      disadvantage, since time-aware application programs require it, but
      time-unaware applications (e.g. sbp2) become usable again.
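
      The idea of avoiding the access can be sketched as a quirk check placed in
      front of the register read. This is a minimal sketch under assumptions, not
      the driver code: the quirk bit, the struct, and the accessor callback are
      hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical quirk bit, assumed to be set at probe time when the
 * affected controller, bridge, and CPU combination is detected. */
#define SKETCH_QUIRK_NO_CYCLE_TIMER_READ (1u << 0)

struct sketch_ohci {
	unsigned int quirks;
	/* Hypothetical accessor standing in for the MMIO register read. */
	uint32_t (*read_cycle_timer)(void *ctx);
	void *ctx;
};

/* Fake accessor for demonstration: counts how often it is called. */
static uint32_t sketch_fake_read(void *ctx)
{
	unsigned int *calls = ctx;

	(*calls)++;
	return 0x1234;
}

/* Returns false without touching the register when the quirk is set,
 * so callers must cope with the hardware time being unavailable. */
static bool sketch_get_cycle_time(const struct sketch_ohci *ohci, uint32_t *value)
{
	if (ohci->quirks & SKETCH_QUIRK_NO_CYCLE_TIMER_READ)
		return false;
	*value = ohci->read_cycle_timer(ohci->ctx);
	return true;
}
```

      The cost shows up in the return value: on quirky hardware every caller
      gets "no time available" instead of a register value.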
      
      Cc: stable@vger.kernel.org
      Reported-by: Jiri Slaby <jirislaby@kernel.org>
      Closes: https://bugzilla.suse.com/show_bug.cgi?id=1215436
      Reported-by: Mario Limonciello <mario.limonciello@amd.com>
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217994
      Reported-by: Tobias Gruetzmacher <tobias-lists@23.gs>
      Closes: https://sourceforge.net/p/linux1394/mailman/message/58711901/
      Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2240973
      Closes: https://bugs.launchpad.net/linux/+bug/2043905
      Link: https://lore.kernel.org/r/20240102110150.244475-1-o-takashi@sakamocchi.jp
      Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1db1ef5
    • Mathieu Desnoyers's avatar
      ring-buffer: Fix 32-bit rb_time_read() race with rb_time_cmpxchg() · 09a44d99
      Mathieu Desnoyers authored
      [ Upstream commit dec89008 ]
      
      The following race can cause rb_time_read() to observe a corrupted time
      stamp:
      
      rb_time_cmpxchg()
      [...]
              if (!rb_time_read_cmpxchg(&t->msb, msb, msb2))
                      return false;
              if (!rb_time_read_cmpxchg(&t->top, top, top2))
                      return false;
      <interrupted before updating bottom>
      __rb_time_read()
      [...]
              do {
                      c = local_read(&t->cnt);
                      top = local_read(&t->top);
                      bottom = local_read(&t->bottom);
                      msb = local_read(&t->msb);
              } while (c != local_read(&t->cnt));
      
              *cnt = rb_time_cnt(top);
      
              /* If top and msb counts don't match, this interrupted a write */
              if (*cnt != rb_time_cnt(msb))
                      return false;
                ^ this check fails to catch that "bottom" is still not updated.
      
      So the old "bottom" value is returned, which is wrong.
      
      Fix this by checking that all three of msb, top, and bottom 2-bit cnt
      values match.
      
      Checking all three fields is favored over requiring a specific update
      order for both rb_time_set() and rb_time_cmpxchg() because it is more
      robust against partial failures of rb_time_cmpxchg() when interrupted by
      a nested rb_time_set().
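
      The three-way check can be sketched as follows. This is an illustration
      under assumptions rather than the ring-buffer code itself: it assumes each
      32-bit word keeps its 2-bit counter in the top two bits, and the sketch_*
      names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for illustration: 2-bit update counter in the top two
 * bits of each 32-bit word, payload in the 30 bits below. */
#define SKETCH_CNT_SHIFT 30

static unsigned int sketch_cnt(uint32_t word)
{
	return (word >> SKETCH_CNT_SHIFT) & 3;
}

/* Returns true only if msb, top, and bottom carry the same counter,
 * i.e. all three words were written by the same update. */
static bool sketch_read_consistent(uint32_t msb, uint32_t top, uint32_t bottom,
				   unsigned int *cnt)
{
	*cnt = sketch_cnt(top);

	/* The pre-existing check: top and msb must agree. */
	if (*cnt != sketch_cnt(msb))
		return false;

	/* The added check: bottom must agree as well, which catches a
	 * writer interrupted before it updated bottom. */
	if (*cnt != sketch_cnt(bottom))
		return false;

	return true;
}
```

      A reader that sees msb and top at counter 2 but bottom still at counter 1
      is rejected here, whereas the two-field check would have accepted it.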
      
      Link: https://lore.kernel.org/lkml/20231211201324.652870-1-mathieu.desnoyers@efficios.com/
      Link: https://lore.kernel.org/linux-trace-kernel/20231212193049.680122-1-mathieu.desnoyers@efficios.com
      
      Fixes: f458a145 ("ring-buffer: Test last update in 32bit version of __rb_time_read()")
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      09a44d99
    • Christoph Hellwig's avatar
      btrfs: mark the len field in struct btrfs_ordered_sum as unsigned · 820a7802
      Christoph Hellwig authored
      [ Upstream commit 6e4b2479 ]

      len can't ever be negative, so mark it as a u32 instead of int.

      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Stable-dep-of: 9e65bfca ("btrfs: fix qgroup_free_reserved_data int overflow")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      820a7802
    • Boris Burkov's avatar
      btrfs: fix qgroup_free_reserved_data int overflow · ab220f4f
      Boris Burkov authored
      [ Upstream commit 9e65bfca ]

      The reserved data counter and input parameter are both u64, but we
      inadvertently accumulate the value in an int. Overflowing that int results
      in freeing the wrong amount of data and breaking reserve accounting.
      
      Unfortunately, this overflow rot spreads from there, as the qgroup
      release/free functions rely on returning an int to take advantage of
      negative values for error codes.
      
      Therefore, the full fix is to return the "released" or "freed" amount by
      a u64 argument and to return 0 or negative error code via the return
      value.
      
      Most of the call sites simply ignore the return value, though some
      of them handle the error and count the returned bytes. Change all of
      them accordingly.
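
      The calling convention described above (byte count through a u64 output
      argument, 0 or a negative error code as the return value) can be sketched
      like this. It is an illustration only: struct sketch_range and
      sketch_free_reserved() are hypothetical and do not mirror the btrfs
      functions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one reserved byte range. */
struct sketch_range {
	uint64_t len;
};

/* Sums the range lengths in a 64-bit accumulator and hands the total back
 * through *freed. The int return value carries only 0 or -errno, so it can
 * no longer truncate a large byte count. */
static int sketch_free_reserved(const struct sketch_range *ranges, size_t n,
				uint64_t *freed)
{
	uint64_t total = 0;
	size_t i;

	if (!ranges || !freed)
		return -EINVAL;

	for (i = 0; i < n; i++)
		total += ranges[i].len;

	*freed = total;
	return 0;
}
```

      Two ranges of 3 GiB each give a total of 6 GiB, which does not fit in a
      32-bit int but is reported correctly through the output argument.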
      
      CC: stable@vger.kernel.org # 6.1+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Boris Burkov <boris@bur.io>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ab220f4f
    • Rakesh Babu Saladi's avatar
      octeontx2-af: Support variable number of lmacs · 0f74dde5
      Rakesh Babu Saladi authored
      [ Upstream commit f2e664ad ]

      Most of the code in the CGX/RPM driver assumes that the maximum number of
      lmacs per MAC is always 4 and that the number of MAC blocks is also 4.
      With this assumption, the max number of interfaces supported is
      hardcoded to 16. This creates a problem, as the next-gen CN10KB silicon
      MAC supports 8 lmacs per MAC block.

      This patch solves the problem by using the "max lmac per MAC block"
      value from constant csrs and by using the cgx_cnt_max value, which is
      populated based on the number of MAC blocks supported by the silicon.
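
      The effect of replacing the hardcoded 4 x 4 layout can be sketched as
      below. This is a simplified illustration, not the driver code: the struct
      and function names are hypothetical, and in the driver the two values
      would come from the hardware rather than from an initializer.

```c
/* Hypothetical container for the two values that used to be hardcoded. */
struct sketch_mac_caps {
	unsigned int cgx_cnt_max;      /* number of MAC blocks */
	unsigned int max_lmac_per_mac; /* lmacs per MAC block */
};

/* Upper bound on interfaces, derived instead of hardcoded to 16. */
static unsigned int sketch_max_interfaces(const struct sketch_mac_caps *caps)
{
	return caps->cgx_cnt_max * caps->max_lmac_per_mac;
}

/* Flat index of (cgx_id, lmac_id), or -1 if either id is out of range. */
static int sketch_interface_index(const struct sketch_mac_caps *caps,
				  unsigned int cgx_id, unsigned int lmac_id)
{
	if (cgx_id >= caps->cgx_cnt_max || lmac_id >= caps->max_lmac_per_mac)
		return -1;
	return (int)(cgx_id * caps->max_lmac_per_mac + lmac_id);
}
```

      With 4 blocks of 4 lmacs this reproduces the old limit of 16, and lmac 4
      is rejected. With 8 lmacs per block the same lmac id becomes valid. The
      block count used for the 8-lmac case in any test is an assumption, since
      the text above only states the per-block figure.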
      
      Signed-off-by: Rakesh Babu Saladi <rsaladi2@marvell.com>
      Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
      Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Stable-dep-of: e307b5a8 ("octeontx2-af: Fix pause frame configuration")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0f74dde5
    • Hariprasad Kelam's avatar
      octeontx2-af: Fix pause frame configuration · 7d391261
      Hariprasad Kelam authored
      [ Upstream commit e307b5a8 ]
      
      The current implementation's default Pause Forward setting is causing
      unnecessary network traffic. This patch disables Pause Forward to
      address this issue.
      
      Fixes: 1121f6b0 ("octeontx2-af: Priority flow control configuration support")
      Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
      Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7d391261
    • Vlad Buslov's avatar
      net/sched: act_ct: Take per-cb reference to tcf_ct_flow_table · a29b15cc
      Vlad Buslov authored
      [ Upstream commit 125f1c7f ]
      
      The referenced change added custom cleanup code to act_ct to delete any
      callbacks registered on the parent block when deleting the
      tcf_ct_flow_table instance. However, the underlying issue is that the
      drivers don't take a reference to the tcf_ct_flow_table instance when
      registering callbacks. This means not only that driver callbacks may still
      be on the table when it is deleted, but also that the driver can still hold
      pointers to its internal nf_flowtable and use it concurrently, which
      results in either a warning in netfilter[0] or a use-after-free.

      Fix the issue by taking a reference to the underlying struct
      tcf_ct_flow_table instance when registering the callback and releasing the
      reference when unregistering. Expose the new API required for such reference
      counting by adding two new callbacks to nf_flowtable_type and implementing
      them for the act_ct flowtable_ct type. This fixes the issue by extending the
      lifetime of nf_flowtable until all users have unregistered.
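
      The lifetime rule can be sketched with a plain counter. This is a
      single-threaded illustration with hypothetical names; kernel code would
      use an atomic reference count type and free the object where this sketch
      only sets a flag.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the flow table instance. */
struct sketch_flow_table {
	int refcnt;
	bool released; /* set where real code would free the table */
};

static void sketch_ft_get(struct sketch_flow_table *ft)
{
	ft->refcnt++;
}

static void sketch_ft_put(struct sketch_flow_table *ft)
{
	if (--ft->refcnt == 0)
		ft->released = true;
}

/* Registering a callback pins the table ... */
static void sketch_cb_register(struct sketch_flow_table *ft)
{
	sketch_ft_get(ft);
	/* ... the callback would be added to the block here ... */
}

/* ... and unregistering it drops that pin again. */
static void sketch_cb_unregister(struct sketch_flow_table *ft)
{
	/* ... the callback would be removed from the block here ... */
	sketch_ft_put(ft);
}
```

      If the owner drops its reference while a callback is still registered,
      the table stays alive until the callback is unregistered, which is the
      ordering the fix relies on.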
      
      [0]:
      [106170.938634] ------------[ cut here ]------------
      [106170.939111] WARNING: CPU: 21 PID: 3688 at include/net/netfilter/nf_flow_table.h:262 mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core]
      [106170.940108] Modules linked in: act_ct nf_flow_table act_mirred act_skbedit act_tunnel_key vxlan cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vhost_iotlb vdpa bonding openvswitch nsh rpcrdma rdma_ucm
      ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_regis
      try overlay mlx5_core
      [106170.943496] CPU: 21 PID: 3688 Comm: kworker/u48:0 Not tainted 6.6.0-rc7_for_upstream_min_debug_2023_11_01_13_02 #1
      [106170.944361] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [106170.945292] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core]
      [106170.945846] RIP: 0010:mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core]
      [106170.946413] Code: 89 ef 48 83 05 71 a4 14 00 01 e8 f4 06 04 e1 48 83 05 6c a4 14 00 01 48 83 c4 28 5b 5d 41 5c 41 5d c3 48 83 05 d1 8b 14 00 01 <0f> 0b 48 83 05 d7 8b 14 00 01 e9 96 fe ff ff 48 83 05 a2 90 14 00
      [106170.947924] RSP: 0018:ffff88813ff0fcb8 EFLAGS: 00010202
      [106170.948397] RAX: 0000000000000000 RBX: ffff88811eabac40 RCX: ffff88811eabad48
      [106170.949040] RDX: ffff88811eab8000 RSI: ffffffffa02cd560 RDI: 0000000000000000
      [106170.949679] RBP: ffff88811eab8000 R08: 0000000000000001 R09: ffffffffa0229700
      [106170.950317] R10: ffff888103538fc0 R11: 0000000000000001 R12: ffff88811eabad58
      [106170.950969] R13: ffff888110c01c00 R14: ffff888106b40000 R15: 0000000000000000
      [106170.951616] FS:  0000000000000000(0000) GS:ffff88885fd40000(0000) knlGS:0000000000000000
      [106170.952329] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [106170.952834] CR2: 00007f1cefd28cb0 CR3: 000000012181b006 CR4: 0000000000370ea0
      [106170.953482] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [106170.954121] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [106170.954766] Call Trace:
      [106170.955057]  <TASK>
      [106170.955315]  ? __warn+0x79/0x120
      [106170.955648]  ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core]
      [106170.956172]  ? report_bug+0x17c/0x190
      [106170.956537]  ? handle_bug+0x3c/0x60
      [106170.956891]  ? exc_invalid_op+0x14/0x70
      [106170.957264]  ? asm_exc_invalid_op+0x16/0x20
      [106170.957666]  ? mlx5_del_flow_rules+0x10/0x310 [mlx5_core]
      [106170.958172]  ? mlx5_tc_ct_block_flow_offload_add+0x1240/0x1240 [mlx5_core]
      [106170.958788]  ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core]
      [106170.959339]  ? mlx5_tc_ct_del_ft_cb+0xc6/0x2b0 [mlx5_core]
      [106170.959854]  ? mapping_remove+0x154/0x1d0 [mlx5_core]
      [106170.960342]  ? mlx5e_tc_action_miss_mapping_put+0x4f/0x80 [mlx5_core]
      [106170.960927]  mlx5_tc_ct_delete_flow+0x76/0xc0 [mlx5_core]
      [106170.961441]  mlx5_free_flow_attr_actions+0x13b/0x220 [mlx5_core]
      [106170.962001]  mlx5e_tc_del_fdb_flow+0x22c/0x3b0 [mlx5_core]
      [106170.962524]  mlx5e_tc_del_flow+0x95/0x3c0 [mlx5_core]
      [106170.963034]  mlx5e_flow_put+0x73/0xe0 [mlx5_core]
      [106170.963506]  mlx5e_put_flow_list+0x38/0x70 [mlx5_core]
      [106170.964002]  mlx5e_rep_update_flows+0xec/0x290 [mlx5_core]
      [106170.964525]  mlx5e_rep_neigh_update+0x1da/0x310 [mlx5_core]
      [106170.965056]  process_one_work+0x13a/0x2c0
      [106170.965443]  worker_thread+0x2e5/0x3f0
      [106170.965808]  ? rescuer_thread+0x410/0x410
      [106170.966192]  kthread+0xc6/0xf0
      [106170.966515]  ? kthread_complete_and_exit+0x20/0x20
      [106170.966970]  ret_from_fork+0x2d/0x50
      [106170.967332]  ? kthread_complete_and_exit+0x20/0x20
      [106170.967774]  ret_from_fork_asm+0x11/0x20
      [106170.970466]  </TASK>
      [106170.970726] ---[ end trace 0000000000000000 ]---
      
      Fixes: 77ac5e40 ("net/sched: act_ct: remove and free nf_table callbacks")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a29b15cc
    • Pablo Neira Ayuso's avatar
      netfilter: flowtable: GC pushes back packets to classic path · 2bb4ecb3
      Pablo Neira Ayuso authored
      [ Upstream commit 735795f6 ]
      
      Since 41f2c7c3 ("net/sched: act_ct: Fix promotion of offloaded
      unreplied tuple"), flowtable GC pushes flows with IPS_SEEN_REPLY
      back to the classic path on every run, i.e. every second. This is because
      of a new check for NF_FLOW_HW_ESTABLISHED, which is specific to
      sched/act_ct.

      In Netfilter's flowtable case, NF_FLOW_HW_ESTABLISHED never gets set,
      and IPS_SEEN_REPLY is unreliable because users decide when to offload the
      flow, so that bit might only be set at a later stage.
      
      Fix it by adding a custom .gc handler that sched/act_ct can use to
      deal with its NF_FLOW_HW_ESTABLISHED bit.
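
      The shape of such an optional per-type hook can be sketched as follows.
      This is an illustration under assumptions, not the netfilter code: the
      structs, the boolean fields standing in for the status bits, and the
      policy in sketch_act_ct_gc() are hypothetical simplifications.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_flow {
	bool seen_reply;     /* stands in for IPS_SEEN_REPLY */
	bool hw_established; /* stands in for NF_FLOW_HW_ESTABLISHED */
	bool torn_down;      /* set when GC pushes the flow back */
};

struct sketch_flowtable_type {
	/* Optional hook: returns true if GC should push the flow back to
	 * the classic path. NULL means the type has no extra policy. */
	bool (*gc)(const struct sketch_flow *flow);
};

/* Hypothetical act_ct-style policy, kept out of the generic GC loop. */
static bool sketch_act_ct_gc(const struct sketch_flow *flow)
{
	return flow->seen_reply && !flow->hw_established;
}

/* Generic GC step: consults the hook only if the type provides one. */
static void sketch_gc_step(const struct sketch_flowtable_type *type,
			   struct sketch_flow *flow)
{
	if (type->gc && type->gc(flow))
		flow->torn_down = true;
}
```

      A type that leaves the hook NULL is never affected by the act_ct-specific
      bit, which is the separation the fix aims for.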
      
      Fixes: 41f2c7c3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple")
      Reported-by: Vladimir Smelhaus <vl.sm@email.cz>
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Stable-dep-of: 125f1c7f ("net/sched: act_ct: Take per-cb reference to tcf_ct_flow_table")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2bb4ecb3
    • Paul Blakey's avatar
      net/sched: act_ct: Fix promotion of offloaded unreplied tuple · df01de08
      Paul Blakey authored
      [ Upstream commit 41f2c7c3 ]
      
      Currently UNREPLIED and UNASSURED connections are added to the nf flow
      table. This causes the following connection packets to be processed
      by the flow table, which then skips conntrack_in(), and thus such
      connections will remain UNREPLIED and UNASSURED even if reply traffic
      is then seen. Even so, the unoffloaded reply packets are the ones
      triggering the hardware update from new to established state, and if
      there aren't any to trigger an update and/or a previous update was
      missed, hardware can get out of sync with sw and still mark
      packets as new.

      Fix the above by:
      1) Not skipping conntrack_in() for UNASSURED packets, but still
         refreshing for hardware, as before the cited patch.
      2) Trying to force a refresh by reply-direction packets that update
         the hardware rules from new to established state.
      3) Removing any bidirectional flows that failed to update in
         hardware, for re-insertion as bidirectional once any new packet
         arrives.
      
      Fixes: 6a9bad00 ("net/sched: act_ct: offload UDP NEW connections")
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Link: https://lore.kernel.org/r/1686313379-117663-1-git-send-email-paulb@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Stable-dep-of: 125f1c7f ("net/sched: act_ct: Take per-cb reference to tcf_ct_flow_table")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      df01de08
    • Vlad Buslov's avatar
      net/sched: act_ct: offload UDP NEW connections · 87466a37
      Vlad Buslov authored
      [ Upstream commit 6a9bad00 ]
      
      Modify the offload algorithm of UDP connections to the following:
      
      - Offload NEW connection as unidirectional.
      
      - When connection state changes to ESTABLISHED also update the hardware
      flow. However, in order to prevent act_ct from spamming offload add wq for
      every packet coming in reply direction in this state, verify whether the
      connection has already been updated to ESTABLISHED in the drivers. If that
      is the case, then skip flow_table and let conntrack handle such packets,
      which will also allow conntrack to potentially promote the connection to
      ASSURED.
      
      - When connection state changes to ASSURED set the flow_table flow
      NF_FLOW_HW_BIDIRECTIONAL flag which will cause refresh mechanism to offload
      the reply direction.
      
      All other protocols have their offload algorithm preserved and are always
      offloaded as bidirectional.
      
      Note that this change tries to minimize the load on the flow_table add
      workqueue. First, it tracks the last ctinfo that was offloaded by using the
      new flow 'NF_FLOW_HW_ESTABLISHED' flag and doesn't schedule the refresh for
      reply direction packets when the offloads have already been updated with
      the current ctinfo. Second, when the 'add' task executes on the workqueue it
      always updates the offload with the current flow state (by checking the
      'bidirectional' flow flag and obtaining the actual ctinfo/cookie through the
      meta action instead of caching any of these from the moment of scheduling
      the 'add' work), avoiding the need to schedule more updates if state
      changed concurrently while the 'add' work was pending on the workqueue.
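
      The per-state decision for UDP described above can be sketched as a small
      function. This is a deliberately simplified illustration with hypothetical
      enum and function names; it models only the three listed cases and ignores
      workqueue scheduling and other protocols.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the conntrack state of a UDP flow. */
enum sketch_ct_state {
	SKETCH_CT_NEW,
	SKETCH_CT_ESTABLISHED,
	SKETCH_CT_ASSURED,
};

enum sketch_action {
	SKETCH_OFFLOAD_UNIDIRECTIONAL, /* NEW: offload one direction only */
	SKETCH_UPDATE_HW_FLOW,         /* ESTABLISHED, hardware not yet updated */
	SKETCH_SKIP_FLOW_TABLE,        /* ESTABLISHED, already updated: let conntrack run */
	SKETCH_OFFLOAD_BIDIRECTIONAL,  /* ASSURED: also offload the reply direction */
};

/* hw_established models the 'NF_FLOW_HW_ESTABLISHED' flag: whether the
 * drivers have already been updated with the ESTABLISHED state. */
static enum sketch_action sketch_udp_offload(enum sketch_ct_state state,
					     bool hw_established)
{
	switch (state) {
	case SKETCH_CT_NEW:
		return SKETCH_OFFLOAD_UNIDIRECTIONAL;
	case SKETCH_CT_ESTABLISHED:
		return hw_established ? SKETCH_SKIP_FLOW_TABLE
				      : SKETCH_UPDATE_HW_FLOW;
	case SKETCH_CT_ASSURED:
		return SKETCH_OFFLOAD_BIDIRECTIONAL;
	}
	return SKETCH_SKIP_FLOW_TABLE;
}
```

      The hw_established check is what keeps reply-direction packets from
      scheduling a refresh for every packet once the drivers are up to date.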
      
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Stable-dep-of: 125f1c7f ("net/sched: act_ct: Take per-cb reference to tcf_ct_flow_table")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      87466a37