  1. Feb 22, 2023
    • alarmtimer: Prevent starvation by small intervals and SIG_IGN · 68c2db8e
      Thomas Gleixner authored
      commit d125d134 upstream.
      
      syzbot reported an RCU stall which is caused by setting up an alarmtimer
      with a very small interval and ignoring the signal. The reproducer arms the
      alarm timer with a relative expiry of 8ns and an interval of 9ns. Not a
      problem per se, but that's an issue when the signal is ignored because then
      the timer is immediately rearmed because there is no way to delay that
      rearming to the signal delivery path.  See posix_timer_fn() and commit
      58229a18 ("posix-timers: Prevent softirq starvation by small intervals
      and SIG_IGN") for details.
      
      The reproducer does not set SIG_IGN explicitly, but it sets up the timer's
      signal with SIGCONT. That has the same effect as explicitly setting
      SIG_IGN for a signal, as SIGCONT is ignored if there is no handler set and
      the task is not ptraced.
      
      The log clearly shows that:
      
         [pid  5102] --- SIGCONT {si_signo=SIGCONT, si_code=SI_TIMER, si_timerid=0, si_overrun=316014, si_int=0, si_ptr=NULL} ---
      
      It works because the tasks are traced and therefore the signal is queued so
      the tracer can see it, which delays the restart of the timer to the signal
      delivery path. But then the tracer is killed:
      
         [pid  5087] kill(-5102, SIGKILL <unfinished ...>
         ...
         ./strace-static-x86_64: Process 5107 detached
      
      and after it's gone the stall can be observed:
      
         syzkaller login: [   79.439102][    C0] hrtimer: interrupt took 68471 ns
         [  184.460538][    C1] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
         ...
         [  184.658237][    C1] rcu: Stack dump where RCU GP kthread last ran:
         [  184.664574][    C1] Sending NMI from CPU 1 to CPUs 0:
         [  184.669821][    C0] NMI backtrace for cpu 0
         [  184.669831][    C0] CPU: 0 PID: 5108 Comm: syz-executor192 Not tainted 6.2.0-rc6-next-20230203-syzkaller #0
         ...
         [  184.670036][    C0] Call Trace:
         [  184.670041][    C0]  <IRQ>
         [  184.670045][    C0]  alarmtimer_fired+0x327/0x670
      
      posix_timer_fn() prevents that by checking whether the interval for
      timers which have the signal ignored is smaller than a jiffy and
      artificially delaying it by shifting the next expiry out by a jiffy. That's
      accurate vs. the overrun accounting, but slightly inaccurate
      vs. timer_gettime(2).
      
      The comment in that function says what needs to be done and there was a fix
      available for the regular userspace induced SIG_IGN mechanism, but that did
      not work due to the implicit ignore for SIGCONT and similar signals. This
      needs to be worked on, but for now the only available workaround is to do
      exactly what posix_timer_fn() does:
      
      Increase the interval of self-rearming timers, which have their signal
      ignored, to at least a jiffy.
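
The clamping rule described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel change itself: the function name is hypothetical and SKETCH_TICK_NSEC assumes HZ=1000 (the kernel uses its own tick length).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: HZ=1000, so one jiffy lasts 1,000,000 ns. */
#define SKETCH_TICK_NSEC 1000000LL

/*
 * Hypothetical helper: pick the interval used when a periodic timer is
 * rearmed. If the signal is ignored, the rearm cannot be deferred to the
 * signal delivery path, so tiny intervals are stretched to one jiffy.
 */
int64_t sketch_rearm_interval(int64_t interval_ns, bool signal_ignored)
{
	assert(interval_ns > 0);
	if (signal_ignored && interval_ns < SKETCH_TICK_NSEC)
		return SKETCH_TICK_NSEC;
	return interval_ns;
}
```

With the reproducer's 9ns interval and an ignored signal, the sketch rearms at one jiffy; intervals that are already longer are left untouched.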
      
      Interestingly this has been fixed before via commit ff86bf0c
      ("alarmtimer: Rate limit periodic intervals") already, but that fix got
      lost in a later rework.
      
      Reported-by: <syzbot+b9564ba6e8e00694511b@syzkaller.appspotmail.com>
      Fixes: f2c45807 ("alarmtimer: Switch over to generic set/get/rearm routine")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: John Stultz <jstultz@google.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/87k00q1no2.ffs@tglx
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • kvm: initialize all of the kvm_debugregs structure before sending it to userspace · 35351e30
      Greg Kroah-Hartman authored
      commit 2c10b614 upstream.
      
      When calling the KVM_GET_DEBUGREGS ioctl, on some configurations, there
      might be some uninitialized portions of the kvm_debugregs structure that
      could be copied to userspace.  Prevent this as is done in the other kvm
      ioctls, by setting the whole structure to 0 before copying anything into
      it.
      
      Bonus is that this reduces the lines of code as the explicit flag
      setting and reserved space zeroing out can be removed.
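
A minimal userspace sketch of the "zero everything first" pattern. The struct below is a stand-in for kvm_debugregs and the function name is hypothetical; the point is only that one memset() covers flags, reserved space and any padding before the live values are filled in.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct kvm_debugregs; treat the layout as illustrative. */
struct sketch_debugregs {
	uint64_t db[4];
	uint64_t dr6;
	uint64_t dr7;
	uint64_t flags;
	uint64_t reserved[9];
};

/* Hypothetical helper: fill the structure that would be copied to userspace. */
void sketch_get_debugregs(struct sketch_debugregs *out, const uint64_t db[4],
			  uint64_t dr6, uint64_t dr7)
{
	assert(out != NULL);
	/* Clear the whole object so no stale bytes can leak out. */
	memset(out, 0, sizeof(*out));
	memcpy(out->db, db, sizeof(out->db));
	out->dr6 = dr6;
	out->dr7 = dr7;
	/* No explicit flags or reserved[] zeroing is needed any more. */
}
```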
      
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: <x86@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: stable <stable@kernel.org>
      Reported-by: Xingyuan Mo <hdthky0@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Message-Id: <20230214103304.3689213-1-gregkh@linuxfoundation.org>
      Tested-by: Xingyuan Mo <hdthky0@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/sched: tcindex: search key must be 16 bits · 1cbb51d8
      Pedro Tammela authored
      [ Upstream commit 42018a32 ]
      
      Syzkaller found an issue where a handle greater than 16 bits would trigger
      a null-ptr-deref in the imperfect hash area update.
      
      general protection fault, probably for non-canonical address
      0xdffffc0000000015: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x00000000000000a8-0x00000000000000af]
      CPU: 0 PID: 5070 Comm: syz-executor456 Not tainted
      6.2.0-rc7-syzkaller-00112-gc68f345b7c42 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine,
      BIOS Google 01/21/2023
      RIP: 0010:tcindex_set_parms+0x1a6a/0x2990 net/sched/cls_tcindex.c:509
      Code: 01 e9 e9 fe ff ff 4c 8b bd 28 fe ff ff e8 0e 57 7d f9 48 8d bb
      a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c
      02 00 0f 85 94 0c 00 00 48 8b 85 f8 fd ff ff 48 8b 9b a8 00
      RSP: 0018:ffffc90003d3ef88 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000015 RSI: ffffffff8803a102 RDI: 00000000000000a8
      RBP: ffffc90003d3f1d8 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000000 R12: ffff88801e2b10a8
      R13: dffffc0000000000 R14: 0000000000030000 R15: ffff888017b3be00
      FS: 00005555569af300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000056041c6d2000 CR3: 000000002bfca000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <TASK>
      tcindex_change+0x1ea/0x320 net/sched/cls_tcindex.c:572
      tc_new_tfilter+0x96e/0x2220 net/sched/cls_api.c:2155
      rtnetlink_rcv_msg+0x959/0xca0 net/core/rtnetlink.c:6132
      netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
      netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
      netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
      netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1942
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xd3/0x120 net/socket.c:734
      ____sys_sendmsg+0x334/0x8c0 net/socket.c:2476
      ___sys_sendmsg+0x110/0x1b0 net/socket.c:2530
      __sys_sendmmsg+0x18f/0x460 net/socket.c:2616
      __do_sys_sendmmsg net/socket.c:2645 [inline]
      __se_sys_sendmmsg net/socket.c:2642 [inline]
      __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2642
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
      
      Fixes: ee059170 ("net/sched: tcindex: update imperfect hash filters respecting rcu")
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • i40e: Add checking for null for nlmsg_find_attr() · cd956906
      Natalia Petrova authored
      [ Upstream commit 7fa0b526 ]
      
      The result of nlmsg_find_attr(), 'br_spec', is dereferenced in
      nla_for_each_nested(), but it can be NULL when nla_find() finds no
      matching attribute, which would result in a NULL pointer dereference.
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE.
      
      Fixes: 51616018 ("i40e: Add support for getlink, setlink ndo ops")
      Signed-off-by: Natalia Petrova <n.petrova@fintech.ru>
      Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20230209172833.3596034-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/sched: act_ctinfo: use percpu stats · 290e7084
      Pedro Tammela authored
      [ Upstream commit 21c167aa ]
      
      The tc action act_ctinfo was using shared stats. Fix it to use percpu stats,
      since bstats_update() must be called with locks held or with a percpu pointer
      argument.
      
      tdc results:
      1..12
      ok 1 c826 - Add ctinfo action with default setting
      ok 2 0286 - Add ctinfo action with dscp
      ok 3 4938 - Add ctinfo action with valid cpmark and zone
      ok 4 7593 - Add ctinfo action with drop control
      ok 5 2961 - Replace ctinfo action zone and action control
      ok 6 e567 - Delete ctinfo action with valid index
      ok 7 6a91 - Delete ctinfo action with invalid index
      ok 8 5232 - List ctinfo actions
      ok 9 7702 - Flush ctinfo actions
      ok 10 3201 - Add ctinfo action with duplicate index
      ok 11 8295 - Add ctinfo action with invalid index
      ok 12 3964 - Replace ctinfo action with invalid goto_chain control
      
      Fixes: 24ec483c ("net: sched: Introduce act_ctinfo action")
      Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
      Link: https://lore.kernel.org/r/20230210200824.444856-1-pctammela@mojatatu.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • flow_offload: fill flags to action structure · 22d0cb47
      Baowen Zheng authored
      [ Upstream commit 40bd094d ]
      
      Fill in the flags of the action structure to let the user control whether
      the action should be offloaded to hardware or not.
      
      Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Stable-dep-of: 21c167aa ("net/sched: act_ctinfo: use percpu stats")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/i915/gen11: Wa_1408615072/Wa_1407596294 should be on GT list · d53360d4
      Matt Roper authored
      [ Upstream commit d5a1224a ]
      
      The UNSLICE_UNIT_LEVEL_CLKGATE register programmed by this workaround
      has 'BUS' style reset, indicating that it does not lose its value on
      engine resets.  Furthermore, this register is part of the GT forcewake
      domain rather than the RENDER domain, so it should not be impacted by
      RCS engine resets.  As such, we should implement this on the GT
      workaround list rather than an engine list.
      
      Bspec: 19219
      Fixes: 3551ff92 ("drm/i915/gen11: Moving WAs to rcs_engine_wa_init()")
      Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
      Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230201222831.608281-2-matthew.d.roper@intel.com
      (cherry picked from commit 5f21dc07)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/i915/gen11: Moving WAs to icl_gt_workarounds_init() · 8174915c
      Raviteja Goud Talla authored
      [ Upstream commit 67b858dd ]
      
      The Bspec page says "Reset: BUS". Accordingly, move the workarounds
      Wa_1407352427 and Wa_1406680159 to the proper function,
      icl_gt_workarounds_init(), which resolves the GuC enabling error.
      
      v2:
        - The previous patch rev2 was created by an email client, which caused
          a build failure. This v2 resolves the previously broken series.
      
      Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: Raviteja Goud Talla <ravitejax.goud.talla@intel.com>
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211203145603.4006937-1-ravitejax.goud.talla@intel.com
      Stable-dep-of: d5a1224a ("drm/i915/gen11: Wa_1408615072/Wa_1407596294 should be on GT list")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mm/filemap: fix page end in filemap_get_read_batch · 43dd56f7
      Qian Yingjin authored
      commit 5956592c upstream.
      
      I was running traces of the read code against a RAID storage system to
      understand why read requests were being misaligned against the underlying
      RAID strips.  I found that the page end offset calculation in
      filemap_get_read_batch() was off by one.
      
      When a read is submitted with end offset 1048575, the end page for the
      read is calculated as 256 when it should be 255.  "last_index" is the index
      of the page beyond the end of the read, and it should be skipped when
      getting a batch of pages for the read in filemap_get_read_batch().
      
      The below simple patch fixes the problem.  This code was introduced in
      kernel 5.12.
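
The arithmetic behind the off-by-one can be worked through in a small userspace sketch. It assumes 4 KiB pages and a read that starts at offset 0 with a length of 1 MiB (so the last byte is at offset 1048575); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Index of the first page beyond the read: (pos + count) rounded up to pages. */
uint64_t sketch_last_index(uint64_t pos, uint64_t count)
{
	assert(count > 0);
	return (pos + count + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Index of the final page the read actually touches; a batch should end here. */
uint64_t sketch_final_page(uint64_t pos, uint64_t count)
{
	return sketch_last_index(pos, count) - 1;
}
```

For that 1 MiB read, sketch_last_index() yields 256, which is one past the data, while the last page that belongs in the batch is 255.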
      
      Link: https://lkml.kernel.org/r/20230208022400.28962-1-coolqyj@163.com
      Fixes: cbd59c48 ("mm/filemap: use head pages in generic_file_buffered_read")
      Signed-off-by: Qian Yingjin <qian@ddn.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nilfs2: fix underflow in second superblock position calculations · a158782b
      Ryusuke Konishi authored
      commit 99b9402a upstream.
      
      Macro NILFS_SB2_OFFSET_BYTES, which computes the position of the second
      superblock, underflows when the argument device size is less than 4096
      bytes.  Therefore, when using this macro, it is necessary to check in
      advance that the device size is not less than a lower limit, or at least
      that underflow does not occur.
      
      The current nilfs2 implementation lacks this check, causing out-of-bounds
      block access when mounting devices smaller than 4096 bytes:
      
       I/O error, dev loop0, sector 36028797018963960 op 0x0:(READ) flags 0x0
       phys_seg 1 prio class 2
       NILFS (loop0): unable to read secondary superblock (blocksize = 1024)
      
      In addition, when trying to resize the filesystem to a size below 4096
      bytes, this underflow occurs in nilfs_resize_fs(), passing a huge number
      of segments to nilfs_sufile_resize(), corrupting parameters such as the
      number of segments in superblocks.  This causes excessive loop iterations
      in nilfs_sufile_resize() during a subsequent resize ioctl, causing
      semaphore ns_segctor_sem to block for a long time and hang the writer
      thread:
      
       INFO: task segctord:5067 blocked for more than 143 seconds.
            Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0
       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       task:segctord        state:D stack:23456 pid:5067  ppid:2
       flags:0x00004000
       Call Trace:
        <TASK>
        context_switch kernel/sched/core.c:5293 [inline]
        __schedule+0x1409/0x43f0 kernel/sched/core.c:6606
        schedule+0xc3/0x190 kernel/sched/core.c:6682
        rwsem_down_write_slowpath+0xfcf/0x14a0 kernel/locking/rwsem.c:1190
        nilfs_transaction_lock+0x25c/0x4f0 fs/nilfs2/segment.c:357
        nilfs_segctor_thread_construct fs/nilfs2/segment.c:2486 [inline]
        nilfs_segctor_thread+0x52f/0x1140 fs/nilfs2/segment.c:2570
        kthread+0x270/0x300 kernel/kthread.c:376
        ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
        </TASK>
       ...
       Call Trace:
        <TASK>
        folio_mark_accessed+0x51c/0xf00 mm/swap.c:515
        __nilfs_get_page_block fs/nilfs2/page.c:42 [inline]
        nilfs_grab_buffer+0x3d3/0x540 fs/nilfs2/page.c:61
        nilfs_mdt_submit_block+0xd7/0x8f0 fs/nilfs2/mdt.c:121
        nilfs_mdt_read_block+0xeb/0x430 fs/nilfs2/mdt.c:176
        nilfs_mdt_get_block+0x12d/0xbb0 fs/nilfs2/mdt.c:251
        nilfs_sufile_get_segment_usage_block fs/nilfs2/sufile.c:92 [inline]
        nilfs_sufile_truncate_range fs/nilfs2/sufile.c:679 [inline]
        nilfs_sufile_resize+0x7a3/0x12b0 fs/nilfs2/sufile.c:777
        nilfs_resize_fs+0x20c/0xed0 fs/nilfs2/super.c:422
        nilfs_ioctl_resize fs/nilfs2/ioctl.c:1033 [inline]
        nilfs_ioctl+0x137c/0x2440 fs/nilfs2/ioctl.c:1301
        ...
      
      This fixes these issues by inserting appropriate minimum device size
      checks or anti-underflow checks, depending on where the macro is used.
      
      Link: https://lkml.kernel.org/r/0000000000004e1dfa05f4a48e6b@google.com
      Link: https://lkml.kernel.org/r/20230214224043.24141-1-konishi.ryusuke@gmail.com
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: <syzbot+f0c4082ce5ebebdac63b@syzkaller.appspotmail.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: Fix tcp socket connection with DSCP. · 13bc7dd5
      Guillaume Nault authored
      commit 8230680f upstream.
      
      Take into account the IPV6_TCLASS socket option (DSCP) in
      tcp_v6_connect(). Otherwise fib6_rule_match() can't properly
      match the DSCP value, resulting in invalid route lookup.
      
      For example:
      
        ip route add unreachable table main 2001:db8::10/124
      
        ip route add table 100 2001:db8::10/124 dev eth0
        ip -6 rule add dsfield 0x04 table 100
      
        echo test | socat - TCP6:[2001:db8::11]:54321,ipv6-tclass=0x04
      
      Without this patch, socat fails at connect() time ("No route to host")
      because the fib-rule doesn't jump to table 100 and the lookup ends up
      being done in the main table.
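
Why the traffic class has to reach the route lookup can be illustrated with a small sketch of the IPv6 flow information word, where the traffic class sits directly above the 20-bit flow label. The helpers below are hypothetical and work in host byte order for readability; kernel code handles this field in network byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Combine traffic class (bits 20..27) and flow label (bits 0..19). */
uint32_t sketch_make_flowinfo(uint8_t tclass, uint32_t flowlabel)
{
	assert((flowlabel & ~0xFFFFFu) == 0);
	return ((uint32_t)tclass << 20) | flowlabel;
}

/* Extract the traffic class that a "dsfield" rule would be compared against. */
uint8_t sketch_flowinfo_tclass(uint32_t flowinfo)
{
	return (uint8_t)((flowinfo >> 20) & 0xFF);
}
```

If only the flow label is placed in the lookup key, the extracted traffic class is 0 and a rule such as "dsfield 0x04" cannot match, which corresponds to the failure described above.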
      
      Fixes: 2cc67cc7 ("[IPV6] ROUTE: Routing by Traffic Class.")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: Fix datagram socket connection with DSCP. · f3326fa5
      Guillaume Nault authored
      commit e010ae08 upstream.
      
      Take into account the IPV6_TCLASS socket option (DSCP) in
      ip6_datagram_flow_key_init(). Otherwise fib6_rule_match() can't
      properly match the DSCP value, resulting in invalid route lookup.
      
      For example:
      
        ip route add unreachable table main 2001:db8::10/124
      
        ip route add table 100 2001:db8::10/124 dev eth0
        ip -6 rule add dsfield 0x04 table 100
      
        echo test | socat - UDP6:[2001:db8::11]:54321,ipv6-tclass=0x04
      
      Without this patch, socat fails at connect() time ("No route to host")
      because the fib-rule doesn't jump to table 100 and the lookup ends up
      being done in the main table.
      
      Fixes: 2cc67cc7 ("[IPV6] ROUTE: Routing by Traffic Class.")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ixgbe: add double of VLAN header when computing the max MTU · 9c35c81f
      Jason Xing authored
      commit 0967bf83 upstream.
      
      Take the length of a second VLAN header (VLAN_HLEN) into account when
      computing the maximum MTU size, as other drivers do.
      
      Fixes: fabf1bce ("ixgbe: Prevent unsupported configurations with XDP")
      Signed-off-by: Jason Xing <kernelxing@tencent.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: mpls: fix stale pointer if allocation fails during device rename · 59a74da8
      Jakub Kicinski authored
      commit fda6c89f upstream.
      
      lianhui reports that when MPLS fails to register the sysctl table
      under new location (during device rename) the old pointers won't
      get overwritten and may be freed again (double free).
      
      Handle this gracefully. The best option would be unregistering
      the MPLS from the device completely on failure, but unfortunately
      mpls_ifdown() can fail. So failing fully is also unreliable.
      
      Another option is to register the new table first then only
      remove old one if the new one succeeds. That requires more
      code, changes order of notifications and two tables may be
      visible at the same time.
      
      The sysctl pointer is not used in the rest of the code, so set it to NULL
      on failure and skip the unregister if it is already NULL.
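
The "NULL on failure, skip unregister if NULL" idea can be sketched in userspace C. Everything here is hypothetical: the struct, the function names and the simulate_failure flag stand in for the real sysctl registration, and malloc()/free() stand in for the table allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-device state holding the registered table. */
struct sketch_dev {
	void *sysctl;
};

/* Register a table; on failure the pointer is left as NULL. */
bool sketch_sysctl_register(struct sketch_dev *dev, bool simulate_failure)
{
	assert(dev->sysctl == NULL);
	if (simulate_failure)
		return false;
	dev->sysctl = malloc(16);
	return dev->sysctl != NULL;
}

/* Unregister is a no-op when nothing is registered, so it is safe to repeat. */
void sketch_sysctl_unregister(struct sketch_dev *dev)
{
	if (!dev->sysctl)
		return;
	free(dev->sysctl);
	dev->sysctl = NULL;
}

/* Rename: drop the old table, then try to register under the new name. */
bool sketch_rename(struct sketch_dev *dev, bool simulate_failure)
{
	sketch_sysctl_unregister(dev);
	return sketch_sysctl_register(dev, simulate_failure);
}
```

After a failed rename the pointer is NULL, so a later unregister returns early instead of freeing the old table a second time.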
      
      Reported-by: lianhui tang <bluetlh@gmail.com>
      Fixes: 0fae3bf0 ("mpls: handle device renames for per-device sysctls")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: stmmac: Restrict warning on disabling DMA store and fwd mode · bf8b820e
      Cristian Ciocaltea authored
      commit 05d7623a upstream.
      
      When setting the 'snps,force_thresh_dma_mode' DT property, the following
      warning is always emitted, regardless of the status of force_sf_dma_mode:
      
      dwmac-starfive 10020000.ethernet: force_sf_dma_mode is ignored if force_thresh_dma_mode is set.
      
      Do not print the rather misleading message when DMA store and forward
      mode is already disabled.
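
A minimal sketch of the restricted warning condition, assuming the two properties end up as booleans in a platform data structure. The struct and function names are hypothetical; the function returns whether the warning would be printed instead of printing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the platform data parsed from the device tree. */
struct sketch_plat {
	bool force_thresh_dma_mode;
	bool force_sf_dma_mode;
};

/*
 * Threshold mode wins over store-and-forward mode. Warn only when
 * store-and-forward was actually requested and is being overridden.
 */
bool sketch_resolve_dma_mode(struct sketch_plat *plat)
{
	assert(plat != NULL);
	if (plat->force_thresh_dma_mode && plat->force_sf_dma_mode) {
		plat->force_sf_dma_mode = false;
		return true;	/* the warning would be emitted here */
	}
	return false;
}
```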
      
      Fixes: e2a240c7 ("driver:net:stmmac: Disable DMA store and forward mode if platform data force_thresh_dma_mode is set.")
      Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
      Link: https://lore.kernel.org/r/20230210202126.877548-1-cristian.ciocaltea@collabora.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bnxt_en: Fix mqprio and XDP ring checking logic · 269520be
      Michael Chan authored
      commit 2038cc59 upstream.
      
      In bnxt_reserve_rings(), there is logic to check that the number of TX
      rings reserved is enough to cover all the mqprio TCs, but it fails to
      account for the TX XDP rings.  So the check will always fail if there
      are mqprio TCs and TX XDP rings.  As a result, the driver always fails
      to initialize after the XDP program is attached and the device will be
      brought down.  A subsequent ifconfig up will also fail because the
      number of TX rings is set to an inconsistent number.  Fix the check to
      properly account for TX XDP rings.  If the check fails, set the number
      of TX rings back to a consistent number after calling netdev_reset_tc().
      
      Fixes: 674f50a5 ("bnxt_en: Implement new method to reserve rings.")
      Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: stmmac: fix order of dwmac5 FlexPPS parametrization sequence · 0428aabb
      Johannes Zink authored
      commit 4562c65e upstream.
      
      So far changing the period by just setting new period values while
      running did not work.
      
      The publicly available reference manual of the i.MX8MP [1] specifies the
      following sequence:
      
       * initiate the programming sequence
       * set the values for PPS period and start time
       * start the pulse train generation.
      
      This is currently not used in dwmac5_flex_pps_config(), which instead does:
      
       * initiate the programming sequence and immediately start the pulse train generation
       * set the values for PPS period and start time
      
      This caused the period values written not to take effect until the FlexPPS output was
      disabled and re-enabled again.
      
      This patch fixes the order and allows the period to be set immediately.
      
      [1] https://www.nxp.com/webapp/Download?colCode=IMX8MPRM
      
      Fixes: 9a8a02c9 ("net: stmmac: Add Flexible PPS support")
      Signed-off-by: Johannes Zink <j.zink@pengutronix.de>
      Link: https://lore.kernel.org/r/20230210143937.3427483-1-j.zink@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: openvswitch: fix possible memory leak in ovs_meter_cmd_set() · 1563e998
      Hangyu Hua authored
      commit 2fa28f5c upstream.
      
      old_meter needs to be freed after it is detached, regardless of whether
      the new meter is successfully attached.
      
      Fixes: c7c4c44c ("net: openvswitch: expand the meters supported number")
      Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
      Acked-by: Eelco Chaudron <echaudro@redhat.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/usb: kalmia: Don't pass act_len in usb_bulk_msg error path · 338f826d
      Miko Larsson authored
      commit c68f345b upstream.
      
      syzbot reported that act_len in kalmia_send_init_packet() is
      uninitialized when passing it to the first usb_bulk_msg error path. Jiri
      Pirko noted that it's pointless to pass it in the error path, and that
      the value that would be printed in the second error path would be the
      value of act_len from the first call to usb_bulk_msg.[1]
      
      With this in mind, let's just not pass act_len to the usb_bulk_msg error
      paths.
      
      1: https://lore.kernel.org/lkml/Y9pY61y1nwTuzMOa@nanopsycho/
      
      Fixes: d4026123 ("net/usb: Add Samsung Kalmia driver for Samsung GT-B3730")
      Reported-and-tested-by: <syzbot+cd80c5ef5121bfe85b55@syzkaller.appspotmail.com>
      Signed-off-by: Miko Larsson <mikoxyzzz@gmail.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dccp/tcp: Avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions. · 59e30d2b
      Kuniyuki Iwashima authored
      commit ca43ccf4 upstream.
      
      Eric Dumazet pointed out [0] that when we call skb_set_owner_r()
      for ipv6_pinfo.pktoptions, sk_rmem_schedule() has not been called,
      resulting in a negative sk_forward_alloc.
      
      We add a new helper which clones a skb and sets its owner only
      when sk_rmem_schedule() succeeds.
      
      Note that we move skb_set_owner_r() forward in (dccp|tcp)_v6_do_rcv()
      because tcp_send_synack() can make sk_forward_alloc negative before
      ipv6_opt_accepted() in the crossed SYN-ACK or self-connect() cases.
      
      [0]: https://lore.kernel.org/netdev/CANn89iK9oc20Jdi_41jb9URdF210r7d1Y-+uypbMSbOfY6jqrg@mail.gmail.com/
      
      Fixes: 323fbd0e ("net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv()")
      Fixes: 3df80d93 ("[DCCP]: Introduce DCCPv6")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/sched: tcindex: update imperfect hash filters respecting rcu · becf5539
      Pedro Tammela authored
      commit ee059170 upstream.
      
      The imperfect hash area can be updated while packets are traversing,
      which will cause a use-after-free when 'tcf_exts_exec()' is called
      with the destroyed tcf_ext.
      
      CPU 0:               CPU 1:
      tcindex_set_parms    tcindex_classify
      tcindex_lookup
                           tcindex_lookup
      tcf_exts_change
                           tcf_exts_exec [UAF]
      
      Stop operating on the shared area directly, by using a local copy,
      and update the filter with 'rcu_replace_pointer()'. Delete the old
      filter version only after an RCU grace period has elapsed.
      
      Fixes: 9b0d4446 ("net: sched: avoid atomic swap in tcf_exts_change")
      Reported-by: valis <sec@valis.email>
      Suggested-by: valis <sec@valis.email>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
      Link: https://lore.kernel.org/r/20230209143739.279867-1-pctammela@mojatatu.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      becf5539
    • Pietro Borrello's avatar
      sctp: sctp_sock_filter(): avoid list_entry() on possibly empty list · 3d5f95be
      Pietro Borrello authored
      commit a1221703 upstream.
      
      Use list_is_first() to check whether tsp->asoc matches the first
      element of ep->asocs, as the list is not guaranteed to have an entry.
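The hazard can be illustrated with a minimal userspace sketch. The list type below is a stand-in for the kernel's struct list_head, and struct assoc is a hypothetical placeholder for the association type. Only list_is_first() mirrors the semantics of the kernel helper of the same name. On an empty list, head->next points back at the head itself, so converting it with list_entry() yields a pointer that is not a real entry. list_is_first() compares link pointers only and never touches a possibly bogus container.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* True only when 'entry' is linked directly after 'head'. */
static bool list_is_first(const struct list_head *entry,
			  const struct list_head *head)
{
	return entry->prev == head;
}

/* Hypothetical association type; only the embedded list node matters here. */
struct assoc {
	int id;
	struct list_head asocs;
};
```

With an empty head the check simply returns false, so no entry is ever dereferenced.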
      
      Fixes: 8f840e47 ("sctp: add the sctp_diag.c file")
      Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it>
      Acked-by: Xin Long <lucien.xin@gmail.com>
      Link: https://lore.kernel.org/r/20230208-sctp-filter-v2-1-6e1f4017f326@diag.uniroma1.it
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d5f95be
    • Siddharth Vadapalli's avatar
      net: ethernet: ti: am65-cpsw: Add RX DMA Channel Teardown Quirk · fa56f164
      Siddharth Vadapalli authored
      commit 0ed577e7 upstream.
      
      In TI's AM62x/AM64x SoCs, successful teardown of RX DMA Channel raises an
      interrupt. The process of servicing this interrupt involves flushing all
      pending RX DMA descriptors and clearing the teardown completion marker
      (TDCM). The am65_cpsw_nuss_rx_packets() function invoked from the RX
      NAPI callback services the interrupt. Thus, it is necessary to wait for
      this handler to run, drain all packets and clear TDCM, before calling
      napi_disable() in am65_cpsw_nuss_common_stop() function post channel
      teardown. If napi_disable() executes before ensuring that TDCM is
      cleared, the TDCM remains set when the interfaces are down, resulting in
      an interrupt storm when the interfaces are brought up again.
      
      Since the interrupt raised to indicate the RX DMA Channel teardown is
      specific to the AM62x and AM64x SoCs, add a quirk for it.
      
      Fixes: 4f7cce27 ("net: ethernet: ti: am65-cpsw: add support for am64x cpsw3g")
      Co-developed-by: Vignesh Raghavendra <vigneshr@ti.com>
      Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
      Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
      Reviewed-by: Roger Quadros <rogerq@kernel.org>
      Link: https://lore.kernel.org/r/20230209084432.189222-1-s-vadapalli@ti.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa56f164
    • Rafał Miłecki's avatar
      net: bgmac: fix BCM5358 support by setting correct flags · 2603a5ca
      Rafał Miłecki authored
      commit d61615c3 upstream.
      
      Code blocks handling BCMA_CHIP_ID_BCM5357 and BCMA_CHIP_ID_BCM53572 were
      incorrectly unified. Chip package values are not unique and cannot be
      checked independently. They are meaningful only in the context of a given
      chip.
      
      Packages BCM5358 and BCM47188 share the same value but then belong to
      different chips. Code unification resulted in treating BCM5358 as
      BCM47188 and broke its initialization.
      
      Link: https://github.com/openwrt/openwrt/issues/8278
      Fixes: cb1b0f90 ("net: ethernet: bgmac: unify code of the same family")
      Cc: Jon Mason <jdmason@kudzu.us>
      Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20230208091637.16291-1-zajec5@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2603a5ca
    • Jason Xing's avatar
      i40e: add double of VLAN header when computing the max MTU · a5e4f2b2
      Jason Xing authored
      commit ce45ffb8 upstream.
      
      Take the second VLAN header length (VLAN_HLEN) into account when
      computing the maximum MTU size, as other drivers do.
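The arithmetic can be sketched as follows. The three constants carry the standard Ethernet values found in the kernel headers. The helper name max_mtu_for_frame() and the 3072-byte frame budget used in the note below are assumptions made for illustration; they are not the driver's actual code or limits.

```c
/* Standard Ethernet framing constants, with the values the kernel uses. */
#define ETH_HLEN     14	/* destination MAC + source MAC + EtherType */
#define ETH_FCS_LEN   4	/* frame check sequence */
#define VLAN_HLEN     4	/* one 802.1Q tag */

/*
 * Hypothetical helper: largest MTU that still fits a given frame budget
 * when the frame may carry two stacked VLAN tags (QinQ).
 */
static int max_mtu_for_frame(int max_frame_size)
{
	return max_frame_size - (ETH_HLEN + ETH_FCS_LEN + 2 * VLAN_HLEN);
}
```

For an assumed 3072-byte budget this gives 3072 - 26 = 3046. Reserving only one VLAN header would allow an MTU four bytes larger than a double-tagged frame can actually carry.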
      
      Fixes: 0c8493d9 ("i40e: add XDP support for pass and drop actions")
      Signed-off-by: Jason Xing <kernelxing@tencent.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a5e4f2b2
    • Jason Xing's avatar
      ixgbe: allow to increase MTU to 3K with XDP enabled · 1f23ca5d
      Jason Xing authored
      commit f9cd6a44 upstream.
      
      Recently I found that the MTU cannot be increased directly from 1500
      to a much larger value with XDP enabled on servers equipped with an
      IXGBE card. This affected thousands of servers in our production
      environment. After applying this patch, the maximum MTU size can be
      set to 3K.
      
      This patch follows the MTU-change behavior of i40e and ice.
      
      References:
      [1] commit 23b44513 ("ice: allow 3k MTU for XDP")
      [2] commit 0c8493d9 ("i40e: add XDP support for pass and drop actions")
      
      Fixes: fabf1bce ("ixgbe: Prevent unsupported configurations with XDP")
      Signed-off-by: Jason Xing <kernelxing@tencent.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1f23ca5d
    • Andrew Morton's avatar
      revert "squashfs: harden sanity check in squashfs_read_xattr_id_table" · 65d07ae6
      Andrew Morton authored
      commit a5b21d8d upstream.
      
      This fix was nacked by Phillip, for reasons identified in the email linked
      below.
      
      Link: https://lkml.kernel.org/r/68f15d67-8945-2728-1f17-5b53a80ec52d@squashfs.org.uk
      Fixes: 72e544b1 ("squashfs: harden sanity check in squashfs_read_xattr_id_table")
      Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
      Cc: Fedor Pchelkin <pchelkin@ispras.ru>
      Cc: Phillip Lougher <phillip@squashfs.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      65d07ae6
    • Felix Riemann's avatar
      net: Fix unwanted sign extension in netdev_stats_to_stats64() · 50267cf3
      Felix Riemann authored
      commit 9b55d3f0 upstream.
      
      When converting net_device_stats to rtnl_link_stats64 sign extension
      is triggered on ILP32 machines as 6c1c5097 changed the previous
      "ulong -> u64" conversion to "long -> u64" by accessing the
      net_device_stats fields through a (signed) atomic_long_t.
      
      This causes, for example, the received bytes counter to jump to 16EiB after
      having received 2^31 bytes. Casting the atomic value to "unsigned long"
      before converting it into u64 avoids this.
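The effect can be reproduced on any host with a small sketch. Here int32_t and uint32_t model "long" and "unsigned long" on an ILP32 machine, and the function names are illustrative, not kernel code. Converting a negative signed 32-bit value to a 64-bit unsigned type sign-extends it, while going through the unsigned 32-bit type first zero-extends it.

```c
#include <stdint.h>

/*
 * int32_t / uint32_t stand in for "long" / "unsigned long" on ILP32.
 * The uint32_t -> int32_t conversion of large values is implementation
 * defined in C; mainstream compilers wrap modulo 2^32.
 */
static uint64_t widen_signed(uint32_t raw_counter)
{
	int32_t as_long = (int32_t)raw_counter;	/* value as read via a signed long */

	return (uint64_t)as_long;		/* sign-extends when bit 31 is set */
}

static uint64_t widen_unsigned(uint32_t raw_counter)
{
	int32_t as_long = (int32_t)raw_counter;

	return (uint64_t)(uint32_t)as_long;	/* unsigned first: zero-extends */
}
```

A counter that has just reached 2^31 widens to 0xFFFFFFFF80000000 on the signed path, which is just below 16 EiB, and to 0x80000000 on the unsigned path.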
      
      Fixes: 6c1c5097 ("net: add atomic_long_t to net_device_stats fields")
      Signed-off-by: Felix Riemann <felix.riemann@sma.de>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      50267cf3
    • Aaron Thompson's avatar
      Revert "mm: Always release pages to the buddy allocator in memblock_free_late()." · 3775c95f
      Aaron Thompson authored
      commit 647037ad upstream.
      
      This reverts commit 115d9d77.
      
      The pages being freed by memblock_free_late() have already been
      initialized, but if they are in the deferred init range,
      __free_one_page() might access nearby uninitialized pages when trying to
      coalesce buddies. This can, for example, trigger this BUG:
      
        BUG: unable to handle page fault for address: ffffe964c02580c8
        RIP: 0010:__list_del_entry_valid+0x3f/0x70
         <TASK>
         __free_one_page+0x139/0x410
         __free_pages_ok+0x21d/0x450
         memblock_free_late+0x8c/0xb9
         efi_free_boot_services+0x16b/0x25c
         efi_enter_virtual_mode+0x403/0x446
         start_kernel+0x678/0x714
         secondary_startup_64_no_verify+0xd2/0xdb
         </TASK>
      
      A proper fix will be more involved so revert this change for the time
      being.
      
      Fixes: 115d9d77 ("mm: Always release pages to the buddy allocator in memblock_free_late().")
      Signed-off-by: Aaron Thompson <dev@aaront.org>
      Link: https://lore.kernel.org/r/20230207082151.1303-1-dev@aaront.org
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3775c95f
    • Misono Tomohiro's avatar
      selftest/lkdtm: Skip stack-entropy test if lkdtm is not available · 57081f83
      Misono Tomohiro authored
      commit 90091c36 upstream.
      
      Exit with return code 4 if lkdtm is not available, as other tests do,
      so that the test is properly skipped.
      
      Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20210805101236.1140381-1-misono.tomohiro@jp.fujitsu.com
      Cc: Andrew Paniakin <apanyaki@amazon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      57081f83
    • Isaac J. Manjarres's avatar
      of: reserved_mem: Have kmemleak ignore dynamically allocated reserved mem · 9197daee
      Isaac J. Manjarres authored
      commit ce4d9a1e upstream.
      
      Patch series "Fix kmemleak crashes when scanning CMA regions", v2.
      
      When trying to boot a device with an ARM64 kernel with the following
      config options enabled:
      
      CONFIG_DEBUG_PAGEALLOC=y
      CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
      CONFIG_DEBUG_KMEMLEAK=y
      
      a crash is encountered when kmemleak starts to scan the list of gray
      or allocated objects that it maintains. Upon closer inspection, it was
      observed that these page-faults always occurred when kmemleak attempted
      to scan a CMA region.
      
      At the moment, kmemleak is made aware of CMA regions that are specified
      through the devicetree to be dynamically allocated within a range of
      addresses. However, kmemleak should not need to scan CMA regions or any
      reserved memory region, as those regions can be used for DMA transfers
      between drivers and peripherals, and thus wouldn't contain anything
      useful for kmemleak.
      
      Additionally, since CMA regions are unmapped from the kernel's address
      space when they are freed to the buddy allocator at boot when
      CONFIG_DEBUG_PAGEALLOC is enabled, kmemleak shouldn't attempt to access
      those memory regions, as that will trigger a crash. Thus, kmemleak
      should ignore all dynamically allocated reserved memory regions.
      
      
      This patch (of 1):
      
      Currently, kmemleak ignores dynamically allocated reserved memory regions
      that don't have a kernel mapping.  However, regions that do retain a
      kernel mapping (e.g.  CMA regions) do get scanned by kmemleak.
      
      This is not ideal for two reasons:
      
      1  kmemleak works by scanning memory regions for pointers to allocated
         objects to determine if those objects have been leaked or not.
         However, reserved memory regions can be used between drivers and
         peripherals for DMA transfers, and thus, would not contain pointers to
         allocated objects, making it unnecessary for kmemleak to scan these
         reserved memory regions.
      
      2  When CONFIG_DEBUG_PAGEALLOC is enabled, along with kmemleak, the
         CMA reserved memory regions are unmapped from the kernel's address
         space when they are freed to buddy at boot.  These CMA reserved regions
         are still tracked by kmemleak, however, and when kmemleak attempts to
         scan them, a crash will happen, as accessing the CMA region will result
         in a page-fault, since the regions are unmapped.
      
      Thus, use kmemleak_ignore_phys() for all dynamically allocated reserved
      memory regions, instead of those that do not have a kernel mapping
      associated with them.
      
      Link: https://lkml.kernel.org/r/20230208232001.2052777-1-isaacmanjarres@google.com
      Link: https://lkml.kernel.org/r/20230208232001.2052777-2-isaacmanjarres@google.com
      Fixes: a7259df7 ("memblock: make memblock_find_in_range method private")
      Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
      Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Cc: Kirill A. Shutemov <kirill.shtuemov@linux.intel.com>
      Cc: Nick Kossifidis <mick@ics.forth.gr>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Cc: Saravana Kannan <saravanak@google.com>
      Cc: <stable@vger.kernel.org>	[5.15+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9197daee
    • Mike Kravetz's avatar
      hugetlb: check for undefined shift on 32 bit architectures · 8b29a186
      Mike Kravetz authored
      commit ec4288fe upstream.
      
      Users can specify the hugetlb page size in the mmap, shmget and
      memfd_create system calls.  This is done by using 6 bits within the flags
      argument to encode the base-2 logarithm of the desired page size.  The
      routine hstate_sizelog() uses the log2 value to find the corresponding
      hugetlb hstate structure.  Converting the log2 value (page_size_log) to
      potential hugetlb page size is the simple statement:
      
      	1UL << page_size_log
      
      Because only 6 bits are used for page_size_log, the left shift can not be
      greater than 63.  This is fine on 64 bit architectures where a long is 64
      bits.  However, if a value greater than 31 is passed on a 32 bit
      architecture (where long is 32 bits) the shift will result in undefined
      behavior.  This was generally not an issue as the result of the undefined
      shift had to exactly match hugetlb page size to proceed.
      
      Recent improvements in runtime checking have resulted in this undefined
      behavior throwing errors such as reported below.
      
      Fix by comparing page_size_log to BITS_PER_LONG before doing shift.
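The guard can be sketched in plain C as follows. The helper name page_size_from_log() and the local BITS_PER_LONG definition are assumptions made so that the example builds in userspace; the kernel routine hstate_sizelog() returns an hstate pointer rather than a size. The point shown is only the ordering: the shift count is compared against the width of long before the shift is evaluated.

```c
#include <limits.h>
#include <stdbool.h>

/* Userspace equivalent of the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Hypothetical helper: compute 1UL << page_size_log only when the shift is
 * defined, i.e. when the shift count is smaller than the width of long.
 */
static bool page_size_from_log(unsigned int page_size_log, unsigned long *size)
{
	if (page_size_log >= BITS_PER_LONG)
		return false;	/* shifting would be undefined behavior */

	*size = 1UL << page_size_log;
	return true;
}
```

A log2 value of 21 yields a 2 MiB page size on both 32 bit and 64 bit builds, while any value at or above the width of long is rejected instead of being shifted.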
      
      Link: https://lkml.kernel.org/r/20230216013542.138708-1-mike.kravetz@oracle.com
      Link: https://lore.kernel.org/lkml/CA+G9fYuei_Tr-vN9GS7SfFyU1y9hNysnf=PB7kT0=yv4MiPgVg@mail.gmail.com/
      Fixes: 42d7395f ("mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Reviewed-by: Jesper Juhl <jesperjuhl76@gmail.com>
      Acked-by: Muchun Song <songmuchun@bytedance.com>
      Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
      Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Cc: Anders Roxell <anders.roxell@linaro.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b29a186
    • Munehisa Kamata's avatar
      sched/psi: Fix use-after-free in ep_remove_wait_queue() · cca2b3fe
      Munehisa Kamata authored
      commit c2dbe32d upstream.
      
      If a non-root cgroup gets removed when there is a thread that registered
      a trigger and is polling on a pressure file within the cgroup, the polling
      waitqueue gets freed in the following path:
      
       do_rmdir
         cgroup_rmdir
           kernfs_drain_open_files
             cgroup_file_release
               cgroup_pressure_release
                 psi_trigger_destroy
      
      However, the polling thread still has a reference to the pressure file and
      will access the freed waitqueue when the file is closed or upon exit:
      
       fput
         ep_eventpoll_release
           ep_free
             ep_remove_wait_queue
               remove_wait_queue
      
      This results in use-after-free as pasted below.
      
      The fundamental problem here is that cgroup_file_release() (and
      consequently waitqueue's lifetime) is not tied to the file's real lifetime.
      Using wake_up_pollfree() here might be less than ideal, but it is in line
      with the comment at commit 42288cb4 ("wait: add wake_up_pollfree()")
      since the waitqueue's lifetime is not tied to file's one and can be
      considered as another special case. While this would be fixable by somehow
      making cgroup_file_release() be tied to the fput(), it would require
      sizable refactoring at cgroups or higher layer which might be more
      justifiable if we identify more cases like this.
      
        BUG: KASAN: use-after-free in _raw_spin_lock_irqsave+0x60/0xc0
        Write of size 4 at addr ffff88810e625328 by task a.out/4404
      
      	CPU: 19 PID: 4404 Comm: a.out Not tainted 6.2.0-rc6 #38
      	Hardware name: Amazon EC2 c5a.8xlarge/, BIOS 1.0 10/16/2017
      	Call Trace:
      	<TASK>
      	dump_stack_lvl+0x73/0xa0
      	print_report+0x16c/0x4e0
      	kasan_report+0xc3/0xf0
      	kasan_check_range+0x2d2/0x310
      	_raw_spin_lock_irqsave+0x60/0xc0
      	remove_wait_queue+0x1a/0xa0
      	ep_free+0x12c/0x170
      	ep_eventpoll_release+0x26/0x30
      	__fput+0x202/0x400
      	task_work_run+0x11d/0x170
      	do_exit+0x495/0x1130
      	do_group_exit+0x100/0x100
      	get_signal+0xd67/0xde0
      	arch_do_signal_or_restart+0x2a/0x2b0
      	exit_to_user_mode_prepare+0x94/0x100
      	syscall_exit_to_user_mode+0x20/0x40
      	do_syscall_64+0x52/0x90
      	entry_SYSCALL_64_after_hwframe+0x63/0xcd
      	</TASK>
      
       Allocated by task 4404:
      
      	kasan_set_track+0x3d/0x60
      	__kasan_kmalloc+0x85/0x90
      	psi_trigger_create+0x113/0x3e0
      	pressure_write+0x146/0x2e0
      	cgroup_file_write+0x11c/0x250
      	kernfs_fop_write_iter+0x186/0x220
      	vfs_write+0x3d8/0x5c0
      	ksys_write+0x90/0x110
      	do_syscall_64+0x43/0x90
      	entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
       Freed by task 4407:
      
      	kasan_set_track+0x3d/0x60
      	kasan_save_free_info+0x27/0x40
      	____kasan_slab_free+0x11d/0x170
      	slab_free_freelist_hook+0x87/0x150
      	__kmem_cache_free+0xcb/0x180
      	psi_trigger_destroy+0x2e8/0x310
      	cgroup_file_release+0x4f/0xb0
      	kernfs_drain_open_files+0x165/0x1f0
      	kernfs_drain+0x162/0x1a0
      	__kernfs_remove+0x1fb/0x310
      	kernfs_remove_by_name_ns+0x95/0xe0
      	cgroup_addrm_files+0x67f/0x700
      	cgroup_destroy_locked+0x283/0x3c0
      	cgroup_rmdir+0x29/0x100
      	kernfs_iop_rmdir+0xd1/0x140
      	vfs_rmdir+0xfe/0x240
      	do_rmdir+0x13d/0x280
      	__x64_sys_rmdir+0x2c/0x30
      	do_syscall_64+0x43/0x90
      	entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: 0e94682b ("psi: introduce psi monitor")
      Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
      Signed-off-by: Mengchi Cheng <mengcc@amazon.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Suren Baghdasaryan <surenb@google.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/lkml/20230106224859.4123476-1-kamatam@amazon.com/
      Link: https://lore.kernel.org/r/20230214212705.4058045-1-kamatam@amazon.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cca2b3fe
    • Kailang Yang's avatar
      ALSA: hda/realtek - fixed wrong gpio assigned · c5f2151a
      Kailang Yang authored
      commit 2bdccfd2 upstream.
      
      The GPIO2 pin is used for output, so the mask, direction and data values
      must be set to 0x4, not 0x3. This fix is for a Lenovo desktop (0x17aa1056),
      where GPIO2 is used to enable the AMP.
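The difference between the two values is plain bit arithmetic, sketched below. This assumes GPIO pins are numbered from zero with one bit per pin, which matches the 0x4 value given above. The helper gpio_mask() is illustrative and not part of the driver.

```c
/* Bit mask selecting a single GPIO pin, counting pins from zero. */
static unsigned int gpio_mask(unsigned int pin)
{
	return 1u << pin;
}
```

Under that assumption GPIO2 alone is 0x4, whereas 0x3 selects GPIO0 and GPIO1 together and leaves GPIO2 untouched.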
      
      Signed-off-by: Kailang Yang <kailang@realtek.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/8d02bb9ac8134f878cd08607fdf088fd@realtek.com
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c5f2151a
    • Bo Liu's avatar
      ALSA: hda/conexant: add a new hda codec SN6180 · 1a3f8c85
      Bo Liu authored
      commit 18d7e16c upstream.
      
      The current kernel does not support the SN6180 codec chip.
      Add the SN6180 codec configuration item to the kernel.
      
      Signed-off-by: Bo Liu <bo.liu@senarytech.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/1675908828-1012-1-git-send-email-bo.liu@senarytech.com
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a3f8c85
    • Yang Yingliang's avatar
      mmc: mmc_spi: fix error handling in mmc_spi_probe() · ecad2faf
      Yang Yingliang authored
      commit cf4c9d2a upstream.
      
      If mmc_add_host() fails, mmc_remove_host() must not be called: it
      would delete a device that was never added and cause a
      null-ptr-deref.
      
      To fix this, goto label 'fail_glue_init', if mmc_add_host() fails,
      and change the label 'fail_add_host' to 'fail_gpiod_request'.
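The resulting unwind order can be modelled with a small sketch. Only the two label names come from the text above. The trace structure and the fake_* helpers are hypothetical stand-ins that record which steps ran, and the real probe function performs more work than shown. The point is that a failed add jumps past the remove step, while a later failure unwinds through it.

```c
#include <stdbool.h>

/* Hypothetical trace of which steps ran; not the driver's data structure. */
struct probe_trace {
	bool host_added;
	bool host_removed;
	bool glue_released;
};

static int fake_add_host(struct probe_trace *t, bool fail)
{
	if (fail)
		return -1;
	t->host_added = true;
	return 0;
}

static int fake_gpiod_request(bool fail)
{
	return fail ? -1 : 0;
}

static int probe(struct probe_trace *t, bool add_fails, bool gpiod_fails)
{
	int status;

	status = fake_add_host(t, add_fails);
	if (status)
		goto fail_glue_init;		/* never added: skip the remove */

	status = fake_gpiod_request(gpiod_fails);
	if (status)
		goto fail_gpiod_request;	/* added: must be removed again */

	return 0;

fail_gpiod_request:
	t->host_removed = true;			/* models mmc_remove_host() */
fail_glue_init:
	t->glue_released = true;		/* models releasing earlier setup */
	return status;
}
```

Labels named after the step that failed make it easier to see that each jump target undoes only what has actually been set up.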
      
      Fixes: 15a0580c ("mmc_spi host driver")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20230131013835.3564011-1-yangyingliang@huawei.com
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ecad2faf
    • Yang Yingliang's avatar
      mmc: sdio: fix possible resource leaks in some error paths · 1e06cf04
      Yang Yingliang authored
      commit 605d9fb9 upstream.
      
      If sdio_add_func() or sdio_init_func() fails, sdio_remove_func() can
      not release the resources: the sdio function is not present in these
      two cases, so it won't call of_node_put() or put_device().
      
      To fix these leaks, make sdio_func_present() only control whether
      device_del() needs to be called or not, then always call of_node_put()
      and put_device().
      
      In the error case in sdio_init_func(), the reference to 'card->dev' has
      not been taken. To avoid a redundant put in sdio_free_func_cis(), move
      the get_device() to sdio_alloc_func() and the put_device() to
      sdio_release_func(), which keeps the get/put calls balanced.
      
      Without this patch, fault injection testing produces the following
      leak reports. With this fix, the leaks are gone.
      
      unreferenced object 0xffff888112514000 (size 2048):
        comm "kworker/3:2", pid 65, jiffies 4294741614 (age 124.774s)
        hex dump (first 32 bytes):
          00 e0 6f 12 81 88 ff ff 60 58 8d 06 81 88 ff ff  ..o.....`X......
          10 40 51 12 81 88 ff ff 10 40 51 12 81 88 ff ff  .@Q......@Q.....
        backtrace:
          [<000000009e5931da>] kmalloc_trace+0x21/0x110
          [<000000002f839ccb>] mmc_alloc_card+0x38/0xb0 [mmc_core]
          [<0000000004adcbf6>] mmc_sdio_init_card+0xde/0x170 [mmc_core]
          [<000000007538fea0>] mmc_attach_sdio+0xcb/0x1b0 [mmc_core]
          [<00000000d4fdeba7>] mmc_rescan+0x54a/0x640 [mmc_core]
      
      unreferenced object 0xffff888112511000 (size 2048):
        comm "kworker/3:2", pid 65, jiffies 4294741623 (age 124.766s)
        hex dump (first 32 bytes):
          00 40 51 12 81 88 ff ff e0 58 8d 06 81 88 ff ff  .@Q......X......
          10 10 51 12 81 88 ff ff 10 10 51 12 81 88 ff ff  ..Q.......Q.....
        backtrace:
          [<000000009e5931da>] kmalloc_trace+0x21/0x110
          [<00000000fcbe706c>] sdio_alloc_func+0x35/0x100 [mmc_core]
          [<00000000c68f4b50>] mmc_attach_sdio.cold.18+0xb1/0x395 [mmc_core]
          [<00000000d4fdeba7>] mmc_rescan+0x54a/0x640 [mmc_core]
      
      Fixes: 3d10a1ba ("sdio: fix reference counting in sdio_remove_func()")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20230130125808.3471254-1-yangyingliang@huawei.com
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e06cf04
    • Paul Cercueil's avatar
      mmc: jz4740: Work around bug on JZ4760(B) · 732e3b29
      Paul Cercueil authored
      commit 3f18c504 upstream.
      
      On JZ4760 and JZ4760B, SD cards fail to run if the maximum clock
      rate is set to 50 MHz, even though the controller officially does
      support it.
      
      Until the actual bug is found and fixed, limit the maximum clock rate to
      24 MHz.
      
      Signed-off-by: Paul Cercueil <paul@crapouillou.net>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20230131210229.68129-1-paul@crapouillou.net
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      732e3b29
    • Kuniyuki Iwashima's avatar
      tcp: Fix listen() regression in 5.15.88. · fdaf8853
      Kuniyuki Iwashima authored
      When we backported dadd0dca ("net/ulp: prevent ULP without clone op from
      entering the LISTEN status"), we accidentally backported a part of
      7a7160ed ("net: Return errno in sk->sk_prot->get_port().") and removed
      err = -EADDRINUSE in inet_csk_listen_start().
      
      Thus, listen() no longer returns -EADDRINUSE even if ->get_port() failed
      as reported in [0].
      
      We set -EADDRINUSE to err just before ->get_port() to fix the regression.
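The pattern can be sketched in isolation as follows. The callback type and the listen_start() function are hypothetical simplifications, not the kernel's inet_csk_listen_start(). The sketch assumes a get_port callback that reports failure with a nonzero value rather than an errno, so the caller has to supply the error code itself.

```c
#include <errno.h>

/* Hypothetical port-binding callback: returns nonzero when the port is taken. */
typedef int (*get_port_fn)(unsigned short port);

static int port_busy(unsigned short port) { (void)port; return 1; }
static int port_free(unsigned short port) { (void)port; return 0; }

static int listen_start(get_port_fn get_port, unsigned short port)
{
	int err;

	err = -EADDRINUSE;	/* set just before the call, as described above */
	if (!get_port(port))
		err = 0;	/* port acquired: report success */

	return err;
}
```

Without the assignment before the call, a failing get_port would leave err at whatever value it held earlier, and the caller would not see -EADDRINUSE.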
      
      [0]: https://lore.kernel.org/stable/EF8A45D0-768A-4CD5-9A8A-0FA6E610ABF7@winter.cafe/
      
      Reported-by: Winter <winter@winter.cafe>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fdaf8853
    • Florian Westphal's avatar
      netfilter: nft_tproxy: restrict to prerouting hook · 9a1d92cb
      Florian Westphal authored
      commit 18bbc321 upstream.
      
      TPROXY is only allowed from prerouting, but nft_tproxy doesn't check this.
      This fixes a crash (null dereference) when using tproxy from e.g. output.
      
      Fixes: 4ed8eb65 ("netfilter: nf_tables: Add native tproxy support")
      Reported-by: Shell Chen <xierch@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Qingfang DENG <dqfext@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9a1d92cb