  1. Aug 03, 2022
    • xfs: prevent UAF in xfs_log_item_in_current_chkpt · 17c8097f
      Darrick J. Wong authored
      commit f8d92a66 upstream.
      
      While I was running with KASAN and lockdep enabled, I stumbled upon a
      KASAN report about a UAF to a freed CIL checkpoint.  Looking at the
      comment for xfs_log_item_in_current_chkpt, it seems pretty obvious to me
      that the original patch to xfs_defer_finish_noroll should have done
      something to lock the CIL to prevent it from switching the CIL contexts
      while the predicate runs.
      
      For upper level code that needs to know if a given log item is new
      enough not to need relogging, add a new wrapper that takes the CIL
      context lock long enough to sample the current CIL context.  This is
      kind of racy in that the CIL can switch the contexts immediately after
      sampling, but that's ok because the consequence is that the defer ops
      code is a little slow to relog items.
      
       ==================================================================
       BUG: KASAN: use-after-free in xfs_log_item_in_current_chkpt+0x139/0x160 [xfs]
       Read of size 8 at addr ffff88804ea5f608 by task fsstress/527999
      
       CPU: 1 PID: 527999 Comm: fsstress Tainted: G      D      5.16.0-rc4-xfsx #rc4
       Call Trace:
        <TASK>
        dump_stack_lvl+0x45/0x59
        print_address_description.constprop.0+0x1f/0x140
        kasan_report.cold+0x83/0xdf
        xfs_log_item_in_current_chkpt+0x139/0x160
        xfs_defer_finish_noroll+0x3bb/0x1e30
        __xfs_trans_commit+0x6c8/0xcf0
        xfs_reflink_remap_extent+0x66f/0x10e0
        xfs_reflink_remap_blocks+0x2dd/0xa90
        xfs_file_remap_range+0x27b/0xc30
        vfs_dedupe_file_range_one+0x368/0x420
        vfs_dedupe_file_range+0x37c/0x5d0
        do_vfs_ioctl+0x308/0x1260
        __x64_sys_ioctl+0xa1/0x170
        do_syscall_64+0x35/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7f2c71a2950b
       Code: 0f 1e fa 48 8b 05 85 39 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff
      ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01
      f0 ff ff 73 01 c3 48 8b 0d 55 39 0d 00 f7 d8 64 89 01 48
       RSP: 002b:00007ffe8c0e03c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
       RAX: ffffffffffffffda RBX: 00005600862a8740 RCX: 00007f2c71a2950b
       RDX: 00005600862a7be0 RSI: 00000000c0189436 RDI: 0000000000000004
       RBP: 000000000000000b R08: 0000000000000027 R09: 0000000000000003
       R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000005a
       R13: 00005600862804a8 R14: 0000000000016000 R15: 00005600862a8a20
        </TASK>
      
       Allocated by task 464064:
        kasan_save_stack+0x1e/0x50
        __kasan_kmalloc+0x81/0xa0
        kmem_alloc+0xcd/0x2c0 [xfs]
        xlog_cil_ctx_alloc+0x17/0x1e0 [xfs]
        xlog_cil_push_work+0x141/0x13d0 [xfs]
        process_one_work+0x7f6/0x1380
        worker_thread+0x59d/0x1040
        kthread+0x3b0/0x490
        ret_from_fork+0x1f/0x30
      
       Freed by task 51:
        kasan_save_stack+0x1e/0x50
        kasan_set_track+0x21/0x30
        kasan_set_free_info+0x20/0x30
        __kasan_slab_free+0xed/0x130
        slab_free_freelist_hook+0x7f/0x160
        kfree+0xde/0x340
        xlog_cil_committed+0xbfd/0xfe0 [xfs]
        xlog_cil_process_committed+0x103/0x1c0 [xfs]
        xlog_state_do_callback+0x45d/0xbd0 [xfs]
        xlog_ioend_work+0x116/0x1c0 [xfs]
        process_one_work+0x7f6/0x1380
        worker_thread+0x59d/0x1040
        kthread+0x3b0/0x490
        ret_from_fork+0x1f/0x30
      
       Last potentially related work creation:
        kasan_save_stack+0x1e/0x50
        __kasan_record_aux_stack+0xb7/0xc0
        insert_work+0x48/0x2e0
        __queue_work+0x4e7/0xda0
        queue_work_on+0x69/0x80
        xlog_cil_push_now.isra.0+0x16b/0x210 [xfs]
        xlog_cil_force_seq+0x1b7/0x850 [xfs]
        xfs_log_force_seq+0x1c7/0x670 [xfs]
        xfs_file_fsync+0x7c1/0xa60 [xfs]
        __x64_sys_fsync+0x52/0x80
        do_syscall_64+0x35/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
       The buggy address belongs to the object at ffff88804ea5f600
        which belongs to the cache kmalloc-256 of size 256
       The buggy address is located 8 bytes inside of
        256-byte region [ffff88804ea5f600, ffff88804ea5f700)
       The buggy address belongs to the page:
       page:ffffea00013a9780 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88804ea5ea00 pfn:0x4ea5e
       head:ffffea00013a9780 order:1 compound_mapcount:0
       flags: 0x4fff80000010200(slab|head|node=1|zone=1|lastcpupid=0xfff)
       raw: 04fff80000010200 ffffea0001245908 ffffea00011bd388 ffff888004c42b40
       raw: ffff88804ea5ea00 0000000000100009 00000001ffffffff 0000000000000000
       page dumped because: kasan: bad access detected
      
       Memory state around the buggy address:
        ffff88804ea5f500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        ffff88804ea5f580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       >ffff88804ea5f600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                             ^
        ffff88804ea5f680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ffff88804ea5f700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ==================================================================
      
      Fixes: 4e919af7 ("xfs: periodically relog deferred intent items")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Acked-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xfs: xfs_log_force_lsn isn't passed a LSN · 6d3605f8
      Dave Chinner authored
      commit 5f9b4b0d upstream.
      
      [backported from CIL scalability series for dependency]
      
      In doing an investigation into AIL push stalls, I was looking at the
      log force code to see if an async CIL push could be done instead.
      This led me to xfs_log_force_lsn() and looking at how it works.
      
      xfs_log_force_lsn() is only called from inode synchronisation
      contexts such as fsync(), and it takes the ip->i_itemp->ili_last_lsn
      value as the LSN to sync the log to. This gets passed to
      xlog_cil_force_lsn() via xfs_log_force_lsn() to flush the CIL to the
      journal, and then used by xfs_log_force_lsn() to flush the iclogs to
      the journal.
      
      The problem is that ip->i_itemp->ili_last_lsn does not store a
      log sequence number. What it stores is passed to it from the
      ->iop_committing method, which is called by xfs_log_commit_cil().
      The value this passes to the iop_committing method is the CIL
      context sequence number that the item was committed to.
      
      As it turns out, xlog_cil_force_lsn() converts the sequence to an
      actual commit LSN for the related context and returns that to
      xfs_log_force_lsn(). xfs_log_force_lsn() overwrites its "lsn"
      variable that contained a sequence with an actual LSN and then uses
      that to sync the iclogs.
      
      This caused me some confusion for a while, even though I originally
      wrote all this code a decade ago. ->iop_committing is only used by
      a couple of log item types, and only inode items use the sequence
      number it is passed.
      
      Let's clean up the API, CIL structures and inode log item to call it
      a sequence number, and make it clear that the high level code is
      using CIL sequence numbers and not on-disk LSNs for integrity
      synchronisation purposes.
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Acked-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xfs: refactor xfs_file_fsync · 41fbfdab
      Christoph Hellwig authored
      commit f22c7f87 upstream.
      
      [backported for dependency]
      
      Factor out the log syncing logic into two helpers to make the code easier
      to read and more maintainable.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Acked-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed · aadc39fd
      Eiichi Tsukata authored
      commit ea304a8b upstream.
      
      Updates descriptions for "mitigations=off" and "mitigations=auto,nosmt"
      with the respective retbleed= settings.
      
      Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: corbet@lwn.net
      Link: https://lore.kernel.org/r/20220728043907.165688-1-eiichi.tsukata@nutanix.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • EDAC/ghes: Set the DIMM label unconditionally · c4cd52ab
      Toshi Kani authored
      commit 5e2805d5 upstream.
      
      The commit
      
        cb51a371 ("EDAC/ghes: Setup DIMM label from DMI and use it in error reports")
      
      enforced that both the bank and device strings passed to
      dimm_setup_label() are not NULL.
      
      However, there are BIOSes, for example on a
      
        HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 03/15/2019
      
      which don't populate both strings:
      
        Handle 0x0020, DMI type 17, 84 bytes
        Memory Device
                Array Handle: 0x0013
                Error Information Handle: Not Provided
                Total Width: 72 bits
                Data Width: 64 bits
                Size: 32 GB
                Form Factor: DIMM
                Set: None
                Locator: PROC 1 DIMM 1        <===== device
                Bank Locator: Not Specified   <===== bank
      
      This results in a buffer overflow because ghes_edac_register() calls
      strlen() on an uninitialized label, which had non-zero values left over
      from krealloc_array():
      
        detected buffer overflow in __fortify_strlen
         ------------[ cut here ]------------
         kernel BUG at lib/string_helpers.c:983!
         invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
         CPU: 1 PID: 1 Comm: swapper/0 Tainted: G          I       5.18.6-200.fc36.x86_64 #1
         Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 03/15/2019
         RIP: 0010:fortify_panic
         ...
         Call Trace:
          <TASK>
          ghes_edac_register.cold
          ghes_probe
          platform_probe
          really_probe
          __driver_probe_device
          driver_probe_device
          __driver_attach
          ? __device_attach_driver
          bus_for_each_dev
          bus_add_driver
          driver_register
          acpi_ghes_init
          acpi_init
          ? acpi_sleep_proc_init
          do_one_initcall
      
      The label contains garbage because the commit in Fixes reallocs the
      DIMMs array while scanning the system but doesn't clear the newly
      allocated memory.
      
      Change dimm_setup_label() to always initialize the label to fix the
      issue. Set it to the empty string in case BIOS does not provide both
      bank and device so that ghes_edac_register() can keep the default label
      given by edac_mc_alloc_dimms().
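
The "always initialize" idea can be sketched in plain C. This is a minimal userspace illustration, not the driver code: setup_label() and its parameters are hypothetical names, and the real dimm_setup_label() obtains the bank and device strings from DMI rather than taking them as arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's label setup; not the kernel code. */
static void setup_label(char *label, size_t len,
                        const char *bank, const char *device)
{
    assert(label != NULL && len > 0);

    /* Always start from a valid empty string so a later strlen() is safe,
     * even if the buffer held leftover bytes from a reallocation. */
    label[0] = '\0';

    /* Only build a "bank device" label when both strings are present. */
    if (bank && *bank && device && *device)
        snprintf(label, len, "%s %s", bank, device);
}
```

With a missing bank string the label ends up empty instead of uninitialized, so a caller can detect that and keep whatever default label it already has.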
      
        [ bp: Rewrite commit message. ]
      
      Fixes: b9cae277 ("EDAC/ghes: Scan the system once on driver init")
      Co-developed-by: Robert Richter <rric@kernel.org>
      Signed-off-by: Robert Richter <rric@kernel.org>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Robert Elliott <elliott@hpe.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20220719220124.760359-1-toshi.kani@hpe.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow · c4546391
      Florian Fainelli authored
      [ Upstream commit fb0fd346 ]
      
      Commit 26f09e9b ("mm/memblock: add memblock memory allocation apis")
      added a check to determine whether arm_dma_zone_size is exceeding the
      amount of kernel virtual address space available between the upper 4GB
      virtual address limit and PAGE_OFFSET in order to provide a suitable
      definition of MAX_DMA_ADDRESS that should fit within the 32-bit virtual
      address space. The quantity used for comparison was off by a missing
      trailing 0, causing MAX_DMA_ADDRESS to overflow a 32-bit quantity.
      
      This was caught thanks to CONFIG_DEBUG_VIRTUAL on the bcm2711 platform
      where we define a dma_zone_size of 1GB and we have a PAGE_OFFSET value
      of 0xc000_0000 (CONFIG_VMSPLIT_3G) leading to MAX_DMA_ADDRESS being
      0x1_0000_0000 which overflows the unsigned long type used throughout
      __pa() and then __virt_addr_valid(). Because the virtual address passed
      to __virt_addr_valid() would now be 0, the function would loudly warn
      and flood the kernel log, thus making the platform unable to boot
      properly.
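
The arithmetic behind the overflow can be reproduced in userspace. The sketch below is illustrative only: max_dma_naive() and max_dma_guarded() are hypothetical helpers, uint32_t stands in for the 32-bit unsigned long, and the guard simply compares against the full 4 GiB limit rather than reproducing the kernel macro.

```c
#include <assert.h>
#include <stdint.h>

/* Naive 32-bit sum: wraps modulo 2^32, like unsigned long on 32-bit ARM. */
static uint32_t max_dma_naive(uint32_t page_offset, uint32_t dma_zone_size)
{
    return page_offset + dma_zone_size;
}

/* Guarded variant: only add when the zone fits below the 4 GiB limit. */
static uint32_t max_dma_guarded(uint32_t page_offset, uint32_t dma_zone_size)
{
    const uint64_t limit = UINT64_C(0x100000000);  /* 4 GiB */

    if (dma_zone_size && dma_zone_size < limit - page_offset) {
        uint64_t end = (uint64_t)page_offset + dma_zone_size;

        /* The guard above guarantees the sum fits in 32 bits. */
        assert(end <= UINT32_MAX);
        return (uint32_t)end;
    }
    return UINT32_MAX;  /* fall back to the top of the address space */
}
```

With the values from the report (PAGE_OFFSET 0xc000_0000, a 1GB zone), the naive sum is 0x1_0000_0000 truncated to 0, while the guarded variant falls back to 0xffff_ffff.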
      
      Fixes: 26f09e9b ("mm/memblock: add memblock memory allocation apis")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mt7601u: add USB device ID for some versions of XiaoDu WiFi Dongle. · e500aa9f
      Wei Mingzhi authored
      commit 829eea7c upstream.
      
      USB device ID of some versions of XiaoDu WiFi Dongle is 2955:1003
      instead of 2955:1001. Both are the same mt7601u hardware.
      
      Signed-off-by: Wei Mingzhi <whistler@member.fsf.org>
      Acked-by: Jakub Kicinski <kubakici@wp.pl>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20210618160840.305024-1-whistler@member.fsf.org
      Cc: Yan Xinyu <sdlyyxy@bupt.edu.cn>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • page_alloc: fix invalid watermark check on a negative value · 2670f76a
      Jaewon Kim authored
      commit 9282012f upstream.
      
      There was a report that a task was waiting in
      throttle_direct_reclaim. The pgscan_direct_throttle counter in vmstat
      was increasing.

      This is a bug where zone_watermark_fast returns true even when the
      number of free pages is very low. The commit f27ce0e1 ("page_alloc:
      consider highatomic reserve in watermark fast") changed the watermark
      fast path to consider the highatomic reserve. But it did not handle the
      negative value case, which can happen when the reserved_highatomic
      pageblock is bigger than the actual free pages.

      If the watermark is considered ok for the negative value, allocating
      contexts for order-0 will consume all free pages without direct reclaim,
      and eventually free pages may become depleted except for highatomic
      free pages.

      Then allocating contexts may fall into throttle_direct_reclaim. This
      symptom may easily happen in a system where wmark min is low and other
      reclaimers like kswapd do not free pages quickly.
      
      Handle the negative case by using MIN.
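
A small userspace sketch of the clamping idea follows. It rests on an assumption not stated in the message: that the negative "usable free" value ends up compared against an unsigned watermark, which is how a negative count can pass the check. The function names and the explicit cast are illustrative, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Unclamped: usable can go negative; converted to unsigned it looks huge. */
static bool watermark_ok_unclamped(long free_pages, long reserved,
                                   unsigned long mark)
{
    long usable = free_pages - reserved;

    return (unsigned long)usable > mark;
}

/* Clamped: never deduct more than is actually free (the MIN approach). */
static bool watermark_ok_clamped(long free_pages, long reserved,
                                 unsigned long mark)
{
    assert(free_pages >= 0 && reserved >= 0);

    long deduct = reserved < free_pages ? reserved : free_pages;
    long usable = free_pages - deduct;   /* always >= 0 */

    return (unsigned long)usable > mark;
}
```

With 100 free pages, a reserve of 300 and a watermark of 1000, the unclamped check passes because -200 wraps to a very large unsigned value, while the clamped check correctly fails.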
      
      Link: https://lkml.kernel.org/r/20220725095212.25388-1-jaewon31.kim@samsung.com
      Fixes: f27ce0e1 ("page_alloc: consider highatomic reserve in watermark fast")
      Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
      Reported-by: GyeongHwan Hong <gh21.hong@samsung.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Yong-Taek Lee <ytk.lee@samsung.com>
      Cc: <stable@vger.kerenl.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM: crypto: comment out gcc warning that breaks clang builds · 80142466
      Greg Kroah-Hartman authored
      
      The gcc build warning prevents all clang-built kernels from working
      properly, so comment it out to fix the build.
      
      This is a -stable kernel only patch for now, it will be resolved
      differently in mainline releases in the future.
      
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: "Justin M. Forbes" <jforbes@fedoraproject.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sctp: leave the err path free in sctp_stream_init to sctp_stream_free · 6f350558
      Xin Long authored
      [ Upstream commit 181d8d20 ]
      
      A NULL pointer dereference was reported by Wei Chen:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000000
        RIP: 0010:__list_del_entry_valid+0x26/0x80
        Call Trace:
         <TASK>
         sctp_sched_dequeue_common+0x1c/0x90
         sctp_sched_prio_dequeue+0x67/0x80
         __sctp_outq_teardown+0x299/0x380
         sctp_outq_free+0x15/0x20
         sctp_association_free+0xc3/0x440
         sctp_do_sm+0x1ca7/0x2210
         sctp_assoc_bh_rcv+0x1f6/0x340
      
      This happens when calling sctp_sendmsg without connecting to the server
      first. In this case, a data chunk is already queued in the send queue
      on the client side when processing the INIT_ACK from the server in
      sctp_process_init(), which calls sctp_stream_init() to allocate
      stream_in. If it fails to allocate stream_in, all stream_out will be
      freed in sctp_stream_init's err path. Then, when freeing the asoc, it
      will crash when dequeuing this data chunk as stream_out is missing.
      
      As we can't free stream_out before dequeuing all data from the send
      queue, this patch fixes it by moving the err path stream_out/in freeing
      in sctp_stream_init() to sctp_stream_free(), which is eventually called
      when freeing the asoc in sctp_association_free(). This fix also makes
      the code in sctp_process_init() clearer.
      
      Note that in sctp_association_init() when it fails in sctp_stream_init(),
      sctp_association_free() will not be called, and in that case it should
      go to 'stream_free' err path to free stream instead of 'fail_init'.
      
      Fixes: 5bbbbe32 ("sctp: introduce stream scheduler foundations")
      Reported-by: Wei Chen <harperchen1110@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Link: https://lore.kernel.org/r/831a3dc100c4908ff76e5bcc363be97f2778bc0b.1658787066.git.lucien.xin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sfc: disable softirqs for ptp TX · 510e5b37
      Alejandro Lucero authored
      [ Upstream commit 67c3b611 ]
      
      Sending a PTP packet can mean using the normal TX driver datapath, but
      invoked from the driver's ptp worker. The kernel generic TX code
      disables softirqs and preemption before calling specific driver TX code,
      but the ptp worker does not. Although current ptp driver functionality
      does not require it, there are several reasons for doing so:
      
         1) The invoked code is always executed with softirqs disabled for non
            PTP packets.
         2) Better if a ptp packet transmission is not interrupted by softirq
            handling which could lead to high latencies.
         3) netdev_xmit_more used by the TX code requires preemption to be
            disabled.
      
      Indeed a solution for dealing with kernel preemption state based on static
      kernel configuration is not possible since the introduction of dynamic
      preemption level configuration at boot time using the static calls
      functionality.
      
      Fixes: f79c957a ("drivers: net: sfc: use netdev_xmit_more helper")
      Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com>
      Link: https://lore.kernel.org/r/20220726064504.49613-1-alejandro.lucero-palau@amd.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • perf symbol: Correct address for bss symbols · 3ec42508
      Leo Yan authored
      [ Upstream commit 2d86612a ]
      
      When using 'perf mem' and 'perf c2c', the tool reports the wrong
      offset for global data symbols.  This is a common issue on both x86
      and Arm64 platforms.
      
      Let's see an example, for a test program, below is the disassembly for
      its .bss section which is dumped with objdump:
      
        ...
      
        Disassembly of section .bss:
      
        0000000000004040 <completed.0>:
        	...
      
        0000000000004080 <buf1>:
        	...
      
        00000000000040c0 <buf2>:
        	...
      
        0000000000004100 <thread>:
        	...
      
      First we used 'perf mem record' to run the test program and then used
      'perf --debug verbose=4 mem report' to observe what's the symbol info
      for 'buf1' and 'buf2' structures.
      
        # ./perf mem record -e ldlat-loads,ldlat-stores -- false_sharing.exe 8
        # ./perf --debug verbose=4 mem report
          ...
          dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 sh_addr: 0x4040 sh_offset: 0x3028
          symbol__new: buf2 0x30a8-0x30e8
          ...
          dso__load_sym_internal: adjusting symbol: st_value: 0x4080 sh_addr: 0x4040 sh_offset: 0x3028
          symbol__new: buf1 0x3068-0x30a8
          ...
      
      The perf tool relies on libelf to parse symbols. In executable and
      shared object files, 'st_value' holds a virtual address, 'sh_addr' is
      the address at which the section's first byte should reside in memory,
      and 'sh_offset' is the byte offset from the beginning of the file to
      the first byte in the section.  The perf tool uses the formula below to
      convert a symbol's memory address to a file address:
      
        file_address = st_value - sh_addr + sh_offset
                          ^
                          ` Memory address
      
      We can see the final adjusted address ranges for buf1 and buf2 are
      [0x3068-0x30a8) and [0x30a8-0x30e8) respectively. This is clearly
      incorrect: in the code, the 'buf1' and 'buf2' structures are declared
      with a 64-byte alignment attribute, yet neither range starts on a
      64-byte boundary.
      
      The problem happens for 'sh_offset', libelf returns it as 0x3028 which
      is not 64-byte aligned, combining with disassembly, it's likely libelf
      doesn't respect the alignment for .bss section, therefore, it doesn't
      return the aligned value for 'sh_offset'.
      
      As suggested by Fangrui Song, an ELF file contains program headers,
      which include PT_LOAD segments; the p_vaddr and p_offset fields in
      PT_LOAD segments contain the execution info.  A better choice for
      converting a memory address to a file address is the formula:
      
        file_address = st_value - p_vaddr + p_offset
      
      This patch introduces elf_read_program_header(), which returns the
      program header based on the passed 'st_value'; the formula above is
      then used to calculate the symbol file address, and the debugging log
      is updated accordingly.
      
      After applying the change:
      
        # ./perf --debug verbose=4 mem report
          ...
          dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 p_vaddr: 0x3d28 p_offset: 0x2d28
          symbol__new: buf2 0x30c0-0x3100
          ...
          dso__load_sym_internal: adjusting symbol: st_value: 0x4080 p_vaddr: 0x3d28 p_offset: 0x2d28
          symbol__new: buf1 0x3080-0x30c0
          ...
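
The two conversions can be checked by hand with the values from the debug logs above. The helper below is a hypothetical illustration (to_file_address() is not a perf function); it applies the same subtraction and addition, once with the section header values and once with the PT_LOAD values.

```c
#include <assert.h>
#include <stdint.h>

/* file_address = st_value - base_vaddr + base_offset
 * base_* is either (sh_addr, sh_offset) or (p_vaddr, p_offset). */
static uint64_t to_file_address(uint64_t st_value, uint64_t base_vaddr,
                                uint64_t base_offset)
{
    /* The symbol must lie at or above the base it is rebased against. */
    assert(st_value >= base_vaddr);
    return st_value - base_vaddr + base_offset;
}
```

For buf2 (st_value 0x40c0), the section header values (0x4040, 0x3028) give 0x30a8, while the program header values (0x3d28, 0x2d28) give 0x30c0, matching the 'symbol__new' lines before and after the change.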
      
      Fixes: f17e04af ("perf report: Fix ELF symbol parsing")
      Reported-by: Chang Rui <changruinj@gmail.com>
      Suggested-by: Fangrui Song <maskray@google.com>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220724060013.171050-2-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • virtio-net: fix the race between refill work and close · 68078976
      Jason Wang authored
      [ Upstream commit 5a159128 ]
      
      We try using cancel_delayed_work_sync() to prevent the work from
      enabling NAPI. This is insufficient since we don't disable the source
      of the refill work scheduling. This means a NAPI poll callback running
      after cancel_delayed_work_sync() can schedule the refill work, which
      can then re-enable NAPI, leading to a use-after-free [1].
      
      Since the work can enable NAPI, we can't simply disable NAPI before
      calling cancel_delayed_work_sync(). So fix this by introducing a
      dedicated boolean to control whether or not the work could be
      scheduled from NAPI.
      
      [1]
      ==================================================================
      BUG: KASAN: use-after-free in refill_work+0x43/0xd4
      Read of size 2 at addr ffff88810562c92e by task kworker/2:1/42
      
      CPU: 2 PID: 42 Comm: kworker/2:1 Not tainted 5.19.0-rc1+ #480
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      Workqueue: events refill_work
      Call Trace:
       <TASK>
       dump_stack_lvl+0x34/0x44
       print_report.cold+0xbb/0x6ac
       ? _printk+0xad/0xde
       ? refill_work+0x43/0xd4
       kasan_report+0xa8/0x130
       ? refill_work+0x43/0xd4
       refill_work+0x43/0xd4
       process_one_work+0x43d/0x780
       worker_thread+0x2a0/0x6f0
       ? process_one_work+0x780/0x780
       kthread+0x167/0x1a0
       ? kthread_exit+0x50/0x50
       ret_from_fork+0x22/0x30
       </TASK>
      ...
      
      Fixes: b2baed69 ("virtio_net: set/cancel work on ndo_open/ndo_stop")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • netfilter: nf_queue: do not allow packet truncation below transport header offset · 440dccd8
      Florian Westphal authored
      [ Upstream commit 99a63d36 ]
      
      Domingo Dirutigliano and Nicola Guerrera report a kernel panic when
      sending an nf_queue verdict with a 1-byte nfta_payload attribute.
      
      The IP/IPv6 stack pulls the IP(v6) header from the packet after the
      input hook.
      
      If the user truncates the packet below the header size, this skb_pull()
      will result in a malformed skb (skb->len < 0).
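
The check implied by the commit title can be sketched as a simple length validation. This is a userspace illustration under stated assumptions: check_truncation() and its parameters are hypothetical, and the real code works on an skb and its transport header offset rather than plain integers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Reject a replacement payload length that would cut into the headers.
 * transport_offset: bytes from packet start to the transport header. */
static int check_truncation(size_t pkt_len, size_t transport_offset,
                            size_t new_len)
{
    /* Precondition: the transport header starts inside the packet. */
    assert(transport_offset <= pkt_len);

    if (new_len < transport_offset)
        return -EINVAL;   /* would truncate below the transport header */
    return 0;
}
```

A 1-byte payload for a packet whose transport header starts at offset 20 is refused, so a later header pull cannot run past the end of the data.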
      
      Fixes: 7af4cc3f ("[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink")
      Reported-by: Domingo Dirutigliano <pwnzer0tt1@proton.me>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sctp: fix sleep in atomic context bug in timer handlers · aeb2ff9f
      Duoming Zhou authored
      [ Upstream commit b89fc26f ]
      
      There are sleep-in-atomic-context bugs in the timer handlers of sctp,
      such as sctp_generate_t3_rtx_event(), sctp_generate_probe_event(),
      sctp_generate_t1_init_event(), sctp_generate_timeout_event() and so on.
      
      The root cause is that sctp_sched_prio_init_sid(), called with the
      GFP_KERNEL parameter and therefore able to sleep, can be reached from
      different timer handlers that run in interrupt context.
      
      One of the call paths that could trigger the bug is shown below:
      
            (interrupt context)
      sctp_generate_probe_event
        sctp_do_sm
          sctp_side_effects
            sctp_cmd_interpreter
              sctp_outq_teardown
                sctp_outq_init
                  sctp_sched_set_sched
                    n->init_sid(..,GFP_KERNEL)
                      sctp_sched_prio_init_sid //may sleep
      
      This patch changes gfp_t parameter of init_sid in sctp_sched_set_sched()
      from GFP_KERNEL to GFP_ATOMIC in order to prevent sleep in atomic
      context bugs.
      
      Fixes: 5bbbbe32 ("sctp: introduce stream scheduler foundations")
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Link: https://lore.kernel.org/r/20220723015809.11553-1-duoming@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • i40e: Fix interface init with MSI interrupts (no MSI-X) · fad6caf9
      Michal Maloszewski authored
      [ Upstream commit 5fcbb711 ]
      
      Fix the inability to bring an interface up on a setup with
      only MSI interrupts enabled (no MSI-X).
      The solution is to add a default number of QPs = 1. This is enough,
      since without MSI-X support the driver enables only a basic feature set.
      
      Fixes: bc6d33c8 ("i40e: Fix the number of queues available to be mapped for use")
      Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com>
      Signed-off-by: Michal Maloszewski <michal.maloszewski@intel.com>
      Tested-by: Dave Switzer <david.switzer@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20220722175401.112572-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp: Fix data-races around sysctl_tcp_reflect_tos. · e4a7acd6
      Kuniyuki Iwashima authored
      [ Upstream commit 870e3a63 ]
      
      While reading sysctl_tcp_reflect_tos, it can be changed concurrently.
      Thus, we need to add READ_ONCE() to its readers.
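      
      A minimal userspace sketch of the pattern, assuming GCC or Clang.
      READ_ONCE_SKETCH, sysctl_reflect_tos_sketch and reflect_tos_enabled()
      are hypothetical stand-ins, not the kernel's definitions; the real
      READ_ONCE() lives in the kernel headers and covers more cases.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's READ_ONCE(): performs the load
 * through a volatile-qualified pointer. Scalar types only. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical tunable that another thread may rewrite at any time. */
static int sysctl_reflect_tos_sketch;

static int reflect_tos_enabled(void)
{
	/* Snapshot the tunable once, then decide only on the local copy. */
	int val = READ_ONCE_SKETCH(sysctl_reflect_tos_sketch);

	return val != 0;
}
```

      The point of the local copy is that every later use sees the same
      value, even if the tunable is rewritten in between.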
      
      Fixes: ac8f1710 ("tcp: reflect tos value received in SYN to the socket")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Acked-by: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp: Fix a data-race around sysctl_tcp_comp_sack_nr. · f310fb69
      Kuniyuki Iwashima authored
      [ Upstream commit 79f55473 ]
      
      While reading sysctl_tcp_comp_sack_nr, it can be changed concurrently.
      Thus, we need to add READ_ONCE() to its reader.
      
      Fixes: 9c21d2fc ("tcp: add tcp_comp_sack_nr sysctl")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns. · d2476f20
      Kuniyuki Iwashima authored
      [ Upstream commit 22396941 ]
      
      While reading sysctl_tcp_comp_sack_slack_ns, it can be changed
      concurrently.  Thus, we need to add READ_ONCE() to its reader.
      
      Fixes: a70437cc ("tcp: add hrtimer slack to sack compression")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns. · 48323978
      Kuniyuki Iwashima authored
      [ Upstream commit 4866b2b0 ]
      
      While reading sysctl_tcp_comp_sack_delay_ns, it can be changed
      concurrently.  Thus, we need to add READ_ONCE() to its reader.
      
      Fixes: 6d82aa24 ("tcp: add tcp_comp_sack_delay_ns sysctl")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: macsec: fix potential resource leak in macsec_add_rxsa() and macsec_add_txsa() · 530a4da3
      Jianglei Nie authored
      [ Upstream commit c7b205fb ]
      
      init_rx_sa() allocates resources for rx_sa->stats and
      rx_sa->key.tfm with alloc_percpu() and macsec_alloc_tfm(). When an
      error occurs after init_rx_sa() is called in macsec_add_rxsa(), the
      function releases rx_sa with kfree() without releasing rx_sa->stats
      and rx_sa->key.tfm, which leads to a resource leak.
      
      We should call macsec_rxsa_put() instead of kfree() to decrease the ref
      count of rx_sa and release the relevant resource if the refcount is 0.
      The same bug exists in macsec_add_txsa() for tx_sa as well. This patch
      fixes the above two bugs.
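      
      The difference between a bare free and a refcounted put can be
      sketched in userspace as follows. struct sa_sketch, sa_create(),
      sa_put() and live_stats are hypothetical names for illustration and
      do not mirror the macsec structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an SA that owns a secondary allocation. */
struct sa_sketch {
	int refcnt;
	long *stats;
};

/* Outstanding stats allocations, tracked only to make leaks visible. */
static int live_stats;

static struct sa_sketch *sa_create(void)
{
	struct sa_sketch *sa = calloc(1, sizeof(*sa));

	if (!sa)
		return NULL;
	sa->stats = calloc(1, sizeof(*sa->stats));
	if (!sa->stats) {
		free(sa);
		return NULL;
	}
	sa->refcnt = 1;
	live_stats++;
	return sa;
}

/* Drop one reference; the last put releases everything the object owns. */
static void sa_put(struct sa_sketch *sa)
{
	assert(sa->refcnt > 0);
	if (--sa->refcnt == 0) {
		free(sa->stats);
		live_stats--;
		free(sa);
	}
}
```

      Calling free(sa) directly on an error path would leave sa->stats
      allocated, which is the shape of the leak described above.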
      
      Fixes: 3cf3227a ("net: macsec: hardware offloading infrastructure")
      Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • macsec: always read MACSEC_SA_ATTR_PN as a u64 · 6e0e0464
      Sabrina Dubroca authored
      [ Upstream commit c630d1fe ]
      
      Currently, MACSEC_SA_ATTR_PN is handled inconsistently, sometimes as a
      u32, sometimes forced into a u64 without checking the actual length of
      the attribute. Instead, we can use nla_get_u64 everywhere, which will
      read up to 64 bits into a u64, capped by the actual length of the
      attribute coming from userspace.
      
      This fixes several issues:
       - the check in validate_add_rxsa doesn't work with 32-bit attributes
       - the checks in validate_add_txsa and validate_upd_sa incorrectly
         reject X << 32 (with X != 0)
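      
      A userspace sketch of a length-capped 64-bit read, to illustrate the
      behaviour described above. read_u64_capped() is a hypothetical helper
      and not the netlink API; it only shows how a short payload ends up in
      a zero-padded u64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy at most 8 bytes of the payload into a zeroed u64. */
static uint64_t read_u64_capped(const void *payload, size_t len)
{
	uint64_t val = 0;
	size_t n = len < sizeof(val) ? len : sizeof(val);

	assert(payload != NULL || len == 0);
	if (n)
		memcpy(&val, payload, n);
	return val;
}
```

      On a little-endian host a 4-byte payload comes back as the same 32-bit
      value, zero-extended, while an 8-byte payload keeps its upper bits.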
      
      Fixes: 48ef50fa ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • macsec: limit replay window size with XPN · 2daf0a12
      Sabrina Dubroca authored
      [ Upstream commit b07a0e20 ]
      
      IEEE 802.1AEbw-2013 (section 10.7.8) specifies that the maximum value
      of the replay window is 2^30-1, to help with recovery of the upper
      bits of the PN.
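      
      The limit can be sketched as a small check. The macro and function
      names below are hypothetical, and the sketch assumes that only XPN
      configurations are subject to this particular cap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 2^30 - 1, the XPN limit quoted above; the macro name is made up here. */
#define XPN_MAX_REPLAY_WINDOW_SKETCH ((UINT32_C(1) << 30) - 1)

static_assert(XPN_MAX_REPLAY_WINDOW_SKETCH == 0x3fffffff, "2^30 - 1");

/* Reject a window larger than the cap when XPN is in use. */
static bool replay_window_valid(bool xpn, uint32_t window)
{
	if (xpn && window > XPN_MAX_REPLAY_WINDOW_SKETCH)
		return false;
	return true;
}
```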
      
      To avoid leaving the existing macsec device in an inconsistent state
      if this test fails during changelink, reuse the cleanup mechanism
      introduced for HW offload. This wasn't needed until now because
      macsec_changelink_common could not fail during changelink, as
      modifying the cipher suite was not allowed.
      
      Finally, this must happen after handling IFLA_MACSEC_CIPHER_SUITE so
      that secy->xpn is set.
      
      Fixes: 48ef50fa ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • macsec: fix error message in macsec_add_rxsa and _txsa · 0755c9d0
      Sabrina Dubroca authored
      [ Upstream commit 3240eac4 ]
      
      The expected length is MACSEC_SALT_LEN, not MACSEC_SA_ATTR_SALT.
      
      Fixes: 48ef50fa ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • macsec: fix NULL deref in macsec_add_rxsa · 54c295a3
      Sabrina Dubroca authored
      [ Upstream commit f46040ee ]
      
      Commit 48ef50fa added a test on tb_sa[MACSEC_SA_ATTR_PN], but
      nothing guarantees that it's not NULL at this point. The same code was
      added to macsec_add_txsa, but there it's not a problem because
      validate_add_txsa checks that the MACSEC_SA_ATTR_PN attribute is
      present.
      
      Note: it's not possible to reproduce with iproute, because iproute
      doesn't allow creating an SA without specifying the PN.
      
      Fixes: 48ef50fa ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=208315
      Reported-by: Frantisek Sumsal <fsumsal@redhat.com>
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Documentation: fix sctp_wmem in ip-sysctl.rst · 034bfadc
      Xin Long authored
      [ Upstream commit aa709da0 ]
      
      Since commit 1033990a ("sctp: implement memory accounting on tx path"),
      SCTP has supported memory accounting on tx path where 'sctp_wmem' is used
      by sk_wmem_schedule(). So we should fix the description for this option in
      ip-sysctl.rst accordingly.
      
      v1->v2:
        - Improve the description as Marcelo suggested.
      
      Fixes: 1033990a ("sctp: implement memory accounting on tx path")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit. · 4aea33f4
      Kuniyuki Iwashima authored
      [ Upstream commit 2afdbe7b ]
      
      While reading sysctl_tcp_invalid_ratelimit, it can be changed
      concurrently.  Thus, we need to add READ_ONCE() to its reader.
      
      Fixes: 032ee423 ("tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp: Fix a data-race around sysctl_tcp_autocorking. · c4e6029a
      Kuniyuki Iwashima authored
      [ Upstream commit 85225e6f ]
      
      While reading sysctl_tcp_autocorking, it can be changed concurrently.
      Thus, we need to add READ_ONCE() to its reader.
      
      Fixes: f54b3111 ("tcp: auto corking")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen. · 83edb788
      Kuniyuki Iwashima authored
      [ Upstream commit 1330ffac ]
      
      While reading sysctl_tcp_min_rtt_wlen, it can be changed concurrently.
      Thus, we need to add READ_ONCE() to its reader.
      
      Fixes: f6722583 ("tcp: track min RTT using windowed min-filter")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp: Fix a data-race around sysctl_tcp_min_tso_segs. · f47e7e5b
      Kuniyuki Iwashima authored
      [ Upstream commit e0bb4ab9 ]
      
      While reading sysctl_tcp_min_tso_segs, it can be changed concurrently.
      Thus, we need to add READ_ONCE() to its reader.
      
      Fixes: 95bd09eb ("tcp: TSO packets automatic sizing")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: sungem_phy: Add of_node_put() for reference returned by of_get_parent() · 5584fe97
      Liang He authored
      [ Upstream commit ebbbe23f ]
      
      In bcm5421_init(), we should call of_node_put() for the reference
      returned by of_get_parent() which has increased the refcount.
      
      Fixes: 3c326fe9 ("[PATCH] ppc64: Add new PHY to sungem")
      Signed-off-by: Liang He <windhl@126.com>
      Link: https://lore.kernel.org/r/20220720131003.1287426-1-windhl@126.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • igmp: Fix data-races around sysctl_igmp_qrv. · b399ffaf
      Kuniyuki Iwashima authored
      [ Upstream commit 8ebcc62c ]
      
      While reading sysctl_igmp_qrv, it can be changed concurrently.
      Thus, we need to add READ_ONCE() to its readers.
      
      This test can be packed into a helper, so such changes will be in the
      follow-up series after net is merged into net-next.
      
        qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv);
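      
      One possible shape for such a helper, sketched in userspace and
      assuming GCC or Clang for the "?:" shorthand. effective_qrv(),
      READ_ONCE_SKETCH and sysctl_igmp_qrv_sketch are hypothetical names,
      and the initial value is arbitrary.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's READ_ONCE(); scalar types only. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical tunable; the value is chosen only for illustration. */
static int sysctl_igmp_qrv_sketch = 2;

/* A non-zero qrv wins; otherwise fall back to one snapshot of the tunable. */
static int effective_qrv(int qrv)
{
	/* "a ?: b" is the GNU shorthand for "a ? a : b", evaluating a once. */
	return qrv ?: READ_ONCE_SKETCH(sysctl_igmp_qrv_sketch);
}
```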
      
      Fixes: a9fe8e29 ("ipv4: implement igmp_qrv sysctl to tune igmp robustness variable")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/tls: Remove the context from the list in tls_device_down · 4c1318da
      Maxim Mikityanskiy authored
      commit f6336724 upstream.
      
      tls_device_down takes a reference on all contexts it's going to move to
      the degraded state (software fallback). If sk_destruct runs afterwards,
      it can reduce the reference counter back to 1 and return early without
      destroying the context. Then tls_device_down will release the reference
      it took and call tls_device_free_ctx. However, the context will still
      stay in tls_device_down_list forever. The list will then contain an
      item whose memory has been freed, making memory corruption possible.
      
      Fix the above bug by properly removing the context from all lists before
      any call to tls_device_free_ctx.
      
      Fixes: 3740651b ("tls: Fix context leak on tls_device_down")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr · 8008e797
      Ziyang Xuan authored
      commit 85f0173d upstream.
      
      Changing a net device's MTU to less than IPV6_MIN_MTU, or unregistering
      the device, while a route is being matched may, with some probability,
      trigger a null-ptr-deref on ip6_ptr, as follows.
      
      =========================================================
      BUG: KASAN: null-ptr-deref in find_match.part.0+0x70/0x134
      Read of size 4 at addr 0000000000000308 by task ping6/263
      
      CPU: 2 PID: 263 Comm: ping6 Not tainted 5.19.0-rc7+ #14
      Call trace:
       dump_backtrace+0x1a8/0x230
       show_stack+0x20/0x70
       dump_stack_lvl+0x68/0x84
       print_report+0xc4/0x120
       kasan_report+0x84/0x120
       __asan_load4+0x94/0xd0
       find_match.part.0+0x70/0x134
       __find_rr_leaf+0x408/0x470
       fib6_table_lookup+0x264/0x540
       ip6_pol_route+0xf4/0x260
       ip6_pol_route_output+0x58/0x70
       fib6_rule_lookup+0x1a8/0x330
       ip6_route_output_flags_noref+0xd8/0x1a0
       ip6_route_output_flags+0x58/0x160
       ip6_dst_lookup_tail+0x5b4/0x85c
       ip6_dst_lookup_flow+0x98/0x120
       rawv6_sendmsg+0x49c/0xc70
       inet_sendmsg+0x68/0x94
      
      Reproducer:
      First, prepare the environment:
      $ ip netns add ns1
      $ ip netns add ns2
      $ ip link add veth1 type veth peer name veth2
      $ ip link set veth1 netns ns1
      $ ip link set veth2 netns ns2
      $ ip netns exec ns1 ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1
      $ ip netns exec ns2 ip -6 addr add 2001:0db8:0:f101::2/64 dev veth2
      $ ip netns exec ns1 ifconfig veth1 up
      $ ip netns exec ns2 ifconfig veth2 up
      $ ip netns exec ns1 ip -6 route add 2000::/64 dev veth1 metric 1
      $ ip netns exec ns2 ip -6 route add 2001::/64 dev veth2 metric 1
      
      Second, run the following two command sequences in two separate ssh
      windows:
      $ ip netns exec ns1 sh
      $ while true; do ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1; ip -6 route add 2000::/64 dev veth1 metric 1; ping6 2000::2; done
      
      $ ip netns exec ns1 sh
      $ while true; do ip link set veth1 mtu 1000; ip link set veth1 mtu 1500; sleep 5; done
      
      This happens because addrconf_ifdown() first sets ip6_ptr to NULL, and
      then ip6_ignore_linkdown() accesses ip6_ptr directly without a NULL check.
      
      	cpu0			cpu1
      fib6_table_lookup
      __find_rr_leaf
      			addrconf_notify [ NETDEV_CHANGEMTU ]
      			addrconf_ifdown
      			RCU_INIT_POINTER(dev->ip6_ptr, NULL)
      find_match
      ip6_ignore_linkdown
      
      So add a NULL check for ip6_ptr before it is used in
      ip6_ignore_linkdown() to fix the null-ptr-deref bug.
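      
      A userspace sketch of the guarded access. The struct and function
      names are hypothetical, and returning true when the pointer is NULL
      is an assumption made for this sketch, not a statement about the
      actual patch. Kernel code would also use the RCU accessors, which are
      omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-device IPv6 state and the device. */
struct idev_sketch {
	int ignore_routes_with_linkdown;
};

struct netdev_sketch {
	struct idev_sketch *ip6_ptr; /* may be cleared by a concurrent teardown */
};

static bool ignore_linkdown_sketch(const struct netdev_sketch *dev)
{
	const struct idev_sketch *idev;

	assert(dev != NULL);
	/* Read the pointer into a local and test it before dereferencing. */
	idev = dev->ip6_ptr;
	if (!idev)
		return true; /* assumption made for this sketch */
	return idev->ignore_routes_with_linkdown != 0;
}
```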
      
      Fixes: dcd1f572 ("net/ipv6: Remove fib6_idev")
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220728013307.656257-1-william.xuanziyang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: ping6: Fix memleak in ipv6_renew_options(). · a84b8b53
      Kuniyuki Iwashima authored
      commit e2732600 upstream.
      
      When we close ping6 sockets, some resources are left unfreed because
      pingv6_prot is missing sk->sk_prot->destroy().  As reported by
      syzbot [0], just three syscalls leak 96 bytes and easily cause OOM.
      
          struct ipv6_sr_hdr *hdr;
          char data[24] = {0};
          int fd;
      
          hdr = (struct ipv6_sr_hdr *)data;
          hdr->hdrlen = 2;
          hdr->type = IPV6_SRCRT_TYPE_4;
      
          fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP);
          setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24);
          close(fd);
      
      To fix memory leaks, let's add a destroy function.
      
      Note the socket() syscall checks if the GID is within the range of
      net.ipv4.ping_group_range.  The default value is [1, 0] so that no
      GID meets the condition (1 <= GID <= 0).  Thus, the local DoS does
      not succeed until we change the default value.  However, at least
      Ubuntu/Fedora/RHEL loosen it.
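      
      Why the default range admits nobody can be seen with a plain
      inclusive range test. gid_in_ping_range() is a hypothetical helper
      that mirrors the condition quoted above, not the kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Inclusive range test, matching the (low <= GID <= high) condition. */
static bool gid_in_ping_range(unsigned int low, unsigned int high,
			      unsigned int gid)
{
	return low <= gid && gid <= high;
}
```

      With low = 1 and high = 0 the two comparisons can never both hold,
      so every GID is rejected; with 0 and 2147483647 almost every GID
      passes.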
      
          $ cat /usr/lib/sysctl.d/50-default.conf
          ...
          -net.ipv4.ping_group_range = 0 2147483647
      
      Also, there could be another path reported with these options, and
      some of them require CAP_NET_RAW.
      
        setsockopt
            IPV6_ADDRFORM (inet6_sk(sk)->pktoptions)
            IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu)
            IPV6_HOPOPTS (inet6_sk(sk)->opt)
            IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt)
            IPV6_RTHDR (inet6_sk(sk)->opt)
            IPV6_DSTOPTS (inet6_sk(sk)->opt)
            IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt)
      
        getsockopt
            IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list)
      
      For the record, here is a splat that differs from syzbot's.
      
        unreferenced object 0xffff888006270c60 (size 96):
          comm "repro2", pid 231, jiffies 4294696626 (age 13.118s)
          hex dump (first 32 bytes):
            01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00  ....D...........
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          backtrace:
            [<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554)
            [<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715)
            [<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024)
            [<000000007096a025>] __sys_setsockopt (net/socket.c:2254)
            [<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262)
            [<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
            [<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      
      [0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176
      
      Fixes: 6d0bfe22 ("net: ipv6: Add IPv6 support to the ping socket.")
      Reported-by: <syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com>
      Reported-by: Ayushman Dutta <ayudutta@amazon.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit. · c37c7f35
      Kuniyuki Iwashima authored
      commit db3815a2 upstream.
      
      While reading sysctl_tcp_challenge_ack_limit, it can be changed
      concurrently.  Thus, we need to add READ_ONCE() to its reader.
      
      Fixes: 282f23c6 ("tcp: implement RFC 5961 3.2")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: Fix a data-race around sysctl_tcp_limit_output_bytes. · 9ffb4fdf
      Kuniyuki Iwashima authored
      commit 9fb90193 upstream.
      
      While reading sysctl_tcp_limit_output_bytes, it can be changed
      concurrently.  Thus, we need to add READ_ONCE() to its reader.
      
      Fixes: 46d3ceab ("tcp: TCP Small Queues")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: Fix data-races around sysctl_tcp_moderate_rcvbuf. · 3e933125
      Kuniyuki Iwashima authored
      commit 78047648 upstream.
      
      While reading sysctl_tcp_moderate_rcvbuf, it can be changed
      concurrently.  Thus, we need to add READ_ONCE() to its readers.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "tcp: change pingpong threshold to 3" · 77ac046a
      Wei Wang authored
      commit 4d8f24ee upstream.
      
      This reverts commit 4a41f453.
      
      This to-be-reverted commit was meant to apply a stricter rule for the
      stack to enter pingpong mode. However, the condition used to check for
      interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is
      jiffy based and might be too coarse, which delays the stack entering
      pingpong mode.
      We revert this patch so that we no longer use the above condition to
      determine interactive session, and also reduce pingpong threshold to 1.
      
      Fixes: 4a41f453 ("tcp: change pingpong threshold to 3")
      Reported-by: LemmyHuang <hlm3280@163.com>
      Suggested-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20220721204404.388396-1-weiwan@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: ufs: host: Hold reference returned by of_parse_phandle() · 54a73d65
      Liang He authored
      commit a3435afb upstream.
      
      In ufshcd_populate_vreg(), we should hold the reference returned by
      of_parse_phandle() and then use it to call of_node_put() for refcount
      balance.
      
      Link: https://lore.kernel.org/r/20220719071529.1081166-1-windhl@126.com
      Fixes: aa497613 ("ufs: Add regulator enable support")
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Liang He <windhl@126.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>