  1. Aug 31, 2022
    • writeback: avoid use-after-free after removing device · 9a6c710f
      Khazhismel Kumykov authored
      commit f87904c0 upstream.
      
      When a disk is removed, bdi_unregister gets called to stop further
      writeback and wait for associated delayed work to complete.  However,
      wb_inode_writeback_end() may schedule bandwidth estimation dwork after
      this has completed, which can result in the timer attempting to access the
      just freed bdi_writeback.
      
      Fix this by checking if the bdi_writeback is alive, similar to when
      scheduling writeback work.
      
      Since this requires wb->work_lock, and wb_inode_writeback_end() may get
      called from interrupt, switch wb->work_lock to an irqsafe lock.
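
The "check if alive before scheduling" idea can be sketched in user space. This is a minimal single-threaded model, not the kernel code: `struct wb_model`, its `registered` flag, and the `model_*` helpers are all hypothetical stand-ins, and the lock is only simulated to show which steps happen while it is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a bdi_writeback; not the kernel structure. */
struct wb_model {
    bool registered;  /* cleared once the device has been unregistered */
    bool lock_held;   /* simulates wb->work_lock being held */
    int scheduled;    /* how many times bandwidth work was queued */
};

static void model_lock(struct wb_model *wb)
{
    assert(!wb->lock_held);
    wb->lock_held = true;
}

static void model_unlock(struct wb_model *wb)
{
    assert(wb->lock_held);
    wb->lock_held = false;
}

/* Unregister: after this returns, no new work may be queued. */
void model_unregister(struct wb_model *wb)
{
    model_lock(wb);
    wb->registered = false;
    model_unlock(wb);
}

/* Queue bandwidth-estimation work only if the object is still alive. */
bool model_schedule_bw_work(struct wb_model *wb)
{
    bool queued = false;

    model_lock(wb);
    if (wb->registered) {
        wb->scheduled++;
        queued = true;
    }
    model_unlock(wb);
    return queued;
}
```

Because the flag is tested and the work is queued under the same lock that unregistration takes, a late caller either queues before unregistration finishes or sees the cleared flag and does nothing.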
      
      Link: https://lkml.kernel.org/r/20220801155034.3772543-1-khazhy@google.com
      Fixes: 45a2966f ("writeback: fix bandwidth estimate for spiky workload")
      Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Michael Stapelberg <stapelberg+linux@google.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • loop: Check for overflow while configuring loop · 9be7fa7e
      Siddh Raman Pant authored
      commit c490a0b5 upstream.
      
      The userspace can configure a loop using an ioctl call, wherein
      a configuration of type loop_config is passed (see lo_ioctl()'s
      case on line 1550 of drivers/block/loop.c). This proceeds to call
      loop_configure() which in turn calls loop_set_status_from_info()
      (see line 1050 of loop.c), passing &config->info which is of type
      loop_info64*. This function then sets the appropriate values, like
      the offset.
      
      loop_device has lo_offset of type loff_t (see line 52 of loop.c),
      which is typedef-chained to long long, whereas loop_info64 has
      lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h).
      
      The function directly copies offset from info to the device as
      follows (See line 980 of loop.c):
      	lo->lo_offset = info->lo_offset;
      
      This results in an overflow, which triggers a warning in iomap_iter()
      due to a call to iomap_iter_done() which has:
      	WARN_ON_ONCE(iter->iomap.offset > iter->pos);
      
      Thus, check for negative value during loop_set_status_from_info().
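
The signed/unsigned mismatch described above can be illustrated with a small user-space sketch. The names `model_loff_t` and `model_set_offset` are hypothetical, and the error code is an assumption rather than something stated in the message. The sketch rejects the value before converting it, which has the same effect as checking for a negative result afterwards but avoids relying on implementation-defined conversion.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for loff_t, which ends up as a signed 64-bit type. */
typedef int64_t model_loff_t;

/*
 * Copy an unsigned 64-bit offset (as carried in loop_info64) into a signed
 * offset. Values above INT64_MAX would turn negative after conversion, so
 * they are refused and *dst is left untouched.
 */
int model_set_offset(model_loff_t *dst, uint64_t requested)
{
    assert(dst != NULL);

    if (requested > (uint64_t)INT64_MAX)
        return -EOVERFLOW;  /* assumed error code for this sketch */

    *dst = (model_loff_t)requested;
    return 0;
}
```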
      
      Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29e
      
      Reported-and-tested-by: <syzbot+a8e049cd3abd342936b6@syzkaller.appspotmail.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Siddh Raman Pant <code@siddh.me>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220823160810.181275-1-code@siddh.me
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/PAT: Have pat_enabled() properly reflect state when running on Xen · a210408b
      Jan Beulich authored
      commit 72cbc8f0 upstream.
      
      After commit ID in the Fixes: tag, pat_enabled() returns false (because
      of PAT initialization being suppressed in the absence of MTRRs being
      announced to be available).
      
      This has become a problem: the i915 driver now fails to initialize when
      running PV on Xen (i915_gem_object_pin_map() is where I located the
      induced failure), and its error handling is flaky enough to (at least
      sometimes) result in a hung system.
      
      Yet even beyond that problem the keying of the use of WC mappings to
      pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
      graphics frame buffer accesses would have been quite a bit less optimal
      than possible.
      
      Arrange for the function to return true in such environments, without
      undermining the rest of PAT MSR management logic considering PAT to be
      disabled: specifically, no writes to the PAT MSR should occur.
      
      For the new boolean to live in .init.data, init_cache_modes() also needs
      moving to .init.text (where it could/should have lived already before).
      
        [ bp: This is the "small fix" variant for stable. It'll get replaced
          with a proper PAT and MTRR detection split upstream but that is too
          involved for a stable backport.
          - additional touchups to commit msg. Use cpu_feature_enabled(). ]
      
      Fixes: bdd8b6c9 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: <stable@vger.kernel.org>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Lucas De Marchi <lucas.demarchi@intel.com>
      Link: https://lore.kernel.org/r/9385fa60-fa5d-f559-a137-6608408f88b0@suse.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/nospec: Unwreck the RSB stuffing · d9975eea
      Peter Zijlstra authored
      commit 4e3aa923 upstream.
      
      Commit 2b129932 ("x86/speculation: Add RSB VM Exit protections")
      made a right mess of the RSB stuffing, rewrite the whole thing to not
      suck.
      
      Thanks to Andrew for the enlightening comment about Post-Barrier RSB
      things so we can make this code less magical.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/YvuNdDWoUZSBjYcm@worktop.programming.kicks-ass.net
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/bugs: Add "unknown" reporting for MMIO Stale Data · 9d0a2105
      Pawan Gupta authored
      commit 7df54884 upstream.
      
      Older Intel CPUs that are not in the affected processor list for MMIO
      Stale Data vulnerabilities currently report "Not affected" in sysfs,
      which may not be correct. Vulnerability status for these older CPUs is
      unknown.
      
      Add known-not-affected CPUs to the whitelist. Report "unknown"
      mitigation status for CPUs that are not in blacklist, whitelist and also
      don't enumerate MSR ARCH_CAPABILITIES bits that reflect hardware
      immunity to MMIO Stale Data vulnerabilities.
      
      Mitigation is not deployed when the status is unknown.
      
        [ bp: Massage, fixup. ]
      
      Fixes: 8d50cdf8 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data")
      Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Suggested-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.1657846269.git.pawan.kumar.gupta@linux.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/sev: Don't use cc_platform_has() for early SEV-SNP calls · 0666703c
      Tom Lendacky authored
      commit cdaa0a40 upstream.
      
      When running identity-mapped and depending on the kernel configuration,
      it is possible that the compiler uses jump tables when generating code
      for cc_platform_has().
      
      This causes a boot failure because the jump table uses un-mapped kernel
      virtual addresses, not identity-mapped addresses. This has been seen
      with CONFIG_RETPOLINE=n.
      
      Similar to sme_encrypt_kernel(), use an open-coded direct check for the
      status of SNP rather than trying to eliminate the jump table. This
      preserves any code optimization in cc_platform_has() that can be useful
      post boot. It also limits the changes to SEV-specific files so that
      future compiler features won't necessarily require possible build changes
      just because they are not compatible with running identity-mapped.
      
        [ bp: Massage commit message. ]
      
      Fixes: 5e5ccff6 ("x86/sev: Add helper for validating pages in early enc attribute changes")
      Reported-by: Sean Christopherson <seanjc@google.com>
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org> # 5.19.x
      Link: https://lore.kernel.org/all/YqfabnTRxFSM+LoX@google.com/
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry · a1029075
      Chen Zhongjin authored
      commit fc2e426b upstream.
      
      When the ORC unwinder meets an ftrace trampoline, it uses the address
      of ftrace_{regs_}call to find the ORC entry, which places the next
      frame at sp+176.
      
      If an IRQ hits at sub $0xa8,%rsp, the next frame should be at sp+8
      instead of sp+176. The unwinder then skips the correct frame and throws
      warnings such as "wrong direction" or "can't access registers",
      depending on the content of the incorrect frame address.
      
      Adding the offset *ip - ops->trampoline* to the base address
      ftrace_{regs_}caller gives the correct address for finding the ORC entry.
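
The address arithmetic described above amounts to rebasing an instruction pointer from the trampoline copy onto the original code. A minimal sketch follows, with a hypothetical function name and plain integers in place of kernel addresses:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Translate an instruction pointer inside a dynamically allocated
 * trampoline into the matching address inside the original
 * ftrace_{regs_}caller code, whose ORC entries describe the stack layout.
 * All three parameters are illustrative integers, not real kernel symbols.
 */
uintptr_t model_orc_lookup_addr(uintptr_t ip,
                                uintptr_t trampoline_start,
                                uintptr_t caller_base)
{
    /* ip must lie at or after the start of the trampoline */
    assert(ip >= trampoline_start);

    return caller_base + (ip - trampoline_start);
}
```

Looking up the rebased address means an interrupt that lands early in the trampoline is matched with the ORC entry for that exact instruction rather than one fixed entry for the whole trampoline.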
      
      Also change "caller" to "tramp_addr" to make variable name conform to
      its content.
      
      [ mingo: Clarified the changelog a bit. ]
      
      Fixes: 6be7fa3c ("ftrace, orc, x86: Handle ftrace dynamically allocated trampolines")
      Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20220819084334.244016-1-chenzhongjin@huawei.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/entry: Fix entry_INT80_compat for Xen PV guests · d1a6d0a9
      Juergen Gross authored
      commit 5b9f0c4d upstream.
      
      Commit
      
        c89191ce ("x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS")
      
      missed one use case of SWAPGS in entry_INT80_compat(). Removing of
      the SWAPGS macro led to asm just using "swapgs", as it is accepting
      instructions in capital letters, too.
      
      This in turn leads to splats in Xen PV guests like:
      
        [   36.145223] general protection fault, maybe for address 0x2d: 0000 [#1] PREEMPT SMP NOPTI
        [   36.145794] CPU: 2 PID: 1847 Comm: ld-linux.so.2 Not tainted 5.19.1-1-default #1 \
      	  openSUSE Tumbleweed f3b44bfb672cdb9f235aff53b57724eba8b9411b
        [   36.146608] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 11/14/2013
        [   36.148126] RIP: e030:entry_INT80_compat+0x3/0xa3
      
      Fix that by open coding this single instance of the SWAPGS macro.
      
      Fixes: c89191ce ("x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS")
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      Cc: <stable@vger.kernel.org> # 5.19
      Link: https://lore.kernel.org/r/20220816071137.4893-1-jgross@suse.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/x86/lbr: Enable the branch type for the Arch LBR by default · 66f2f9f2
      Kan Liang authored
      commit 32ba156d upstream.
      
      On the platform with Arch LBR, the HW raw branch type encoding may leak
      to the perf tool when the SAVE_TYPE option is not set.
      
      In the intel_pmu_store_lbr(), the HW raw branch type is stored in
      lbr_entries[].type. If the SAVE_TYPE option is set, the
      lbr_entries[].type will be converted into the generic PERF_BR_* type
      in the intel_pmu_lbr_filter() and exposed to the user tools.
      But if the SAVE_TYPE option is NOT set by the user, the current perf
      kernel doesn't clear the field. The HW raw branch type leaks.
      
      There are two solutions to fix the issue for the Arch LBR.
      One is to clear the field if the SAVE_TYPE option is NOT set.
      The other solution is to unconditionally convert the branch type and
      expose the generic type to the user tools.
      
      The latter is implemented here, because
      - The branch type is valuable information. I don't see a case where
        you would not benefit from the branch type. (Stephane Eranian)
      - Not having the branch type DOES NOT save any space in the
        branch record (Stephane Eranian)
      - The Arch LBR HW can retrieve the common branch types from the
        LBR_INFO. It doesn't require the high overhead SW disassemble.
      
      Fixes: 47125db2 ("perf/x86/intel/lbr: Support Architectural LBR")
      Reported-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20220816125612.2042397-1-kan.liang@linux.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf/x86/intel: Fix pebs event constraints for ADL · e31430b2
      Kan Liang authored
      commit cde643ff upstream.
      
      According to the latest event list, the LOAD_LATENCY PEBS event only
      works on the GP counter 0 and 1 for ADL and RPL.
      
      Update the pebs event constraints table.
      
      Fixes: f83d2f91 ("perf/x86/intel: Add Alder Lake Hybrid support")
      Reported-by: Ammy Yi <ammy.yi@intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20220818184429.2355857-1-kan.liang@linux.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/boot: Don't propagate uninitialized boot_params->cc_blob_address · ffbf5efd
      Michael Roth authored
      commit 4b1c7424 upstream.
      
      In some cases, bootloaders will leave boot_params->cc_blob_address
      uninitialized rather than zeroing it out. This field is only meant to be
      set by the boot/compressed kernel in order to pass information to the
      uncompressed kernel when SEV-SNP support is enabled.
      
      Therefore, there are no cases where the bootloader-provided values
      should be treated as anything other than garbage. Otherwise, the
      uncompressed kernel may attempt to access this bogus address, leading to
      a crash during early boot.
      
      Normally, sanitize_boot_params() would be used to clear out such fields
      but that happens too late: sev_enable() may have already initialized
      it to a valid value that should not be zeroed out. Instead, have
      sev_enable() zero it out unconditionally beforehand.
      
      Also ensure this happens for !CONFIG_AMD_MEM_ENCRYPT as well by also
      including this handling in the sev_enable() stub function.
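
The ordering argument above (clear first, then let detection set a valid value) can be sketched as follows. `struct model_boot_params`, the 32-bit field width, and `model_sev_enable()` are assumptions made for illustration; only the field name comes from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct boot_params. */
struct model_boot_params {
    uint32_t cc_blob_address;  /* field width is an assumption */
};

/*
 * Discard whatever the bootloader left in the field, then record a blob
 * address only if SEV-SNP setup actually found one. Clearing later would
 * wipe a valid value, so the clear has to come first.
 */
void model_sev_enable(struct model_boot_params *bp,
                      bool snp_detected, uint32_t blob_addr)
{
    assert(bp != NULL);

    bp->cc_blob_address = 0;
    if (snp_detected)
        bp->cc_blob_address = blob_addr;
}
```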
      
        [ bp: Massage commit message and comments. ]
      
      Fixes: b190a043 ("x86/sev: Add SEV-SNP feature detection/setup")
      Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
      Reported-by: <watnuss@gmx.de>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216387
      Link: https://lore.kernel.org/r/20220823160734.89036-1-michael.roth@amd.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: update generation of hole file extent item when merging holes · 1fc82cdd
      Filipe Manana authored
      commit e6e3dec6 upstream.
      
      When punching a hole into a file range that is adjacent with a hole and we
      are not using the no-holes feature, we expand the range of the adjacent
      file extent item that represents a hole, to save metadata space.
      
      However we don't update the generation of hole file extent item, which
      means a full fsync will not log that file extent item if the fsync happens
      in a later transaction (since commit 7f30c072 ("btrfs: stop copying
      old file extents when doing a full fsync")).
      
      For example, if we do this:
      
          $ mkfs.btrfs -f -O ^no-holes /dev/sdb
          $ mount /dev/sdb /mnt
          $ xfs_io -f -c "pwrite -S 0xab 2M 2M" /mnt/foobar
          $ sync
      
      We end up with 2 file extent items in our file:
      
      1) One that represents the hole for the file range [0, 2M), with a
         generation of 7;
      
      2) Another one that represents an extent covering the range [2M, 4M).
      
      After that if we do the following:
      
          $ xfs_io -c "fpunch 2M 2M" /mnt/foobar
      
      We end up with a single file extent item in the file, which represents a
      hole for the range [0, 4M) and with a generation of 7 - because we end
      up dropping the data extent for range [2M, 4M) and then update the file
      extent item that represented the hole at [0, 2M), by increasing its
      length from 2M to 4M.
      
      Then doing a full fsync and power failing:
      
          $ xfs_io -c "fsync" /mnt/foobar
          <power failure>
      
      will result in the full fsync not logging the file extent item that
      represents the hole for the range [0, 4M), because its generation is 7,
      which is lower than the generation of the current transaction (8).
      As a consequence, after mounting again the filesystem (after log replay),
      the region [2M, 4M) does not have a hole, it still points to the
      previous data extent.
      
      So fix this by always updating the generation of existing file extent
      items representing holes when we merge/expand them. This solves the
      problem and it's the same approach as when we merge prealloc extents that
      got written (at btrfs_mark_extent_written()). Setting the generation to
      the current transaction's generation is also what we do when merging
      the new hole extent map with the previous one or the next one.
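
The effect of the stale generation can be modelled in a few lines. This is a sketch under assumptions: `struct model_hole_item` and both functions are hypothetical, and the rule "log only items whose generation is not lower than the current transaction's" is inferred from the description above rather than taken from the btrfs source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MIB (1024ULL * 1024ULL)

/* Hypothetical stand-in for a file extent item that represents a hole. */
struct model_hole_item {
    uint64_t start;
    uint64_t len;
    uint64_t generation;
};

/* Assumed rule: a full fsync skips items older than the current transaction. */
bool model_full_fsync_logs(const struct model_hole_item *item,
                           uint64_t trans_generation)
{
    assert(item != NULL);
    return item->generation >= trans_generation;
}

/* Grow an existing hole; optionally stamp it with the current generation. */
void model_expand_hole(struct model_hole_item *item, uint64_t extra_len,
                       uint64_t trans_generation, bool update_generation)
{
    assert(item != NULL);
    item->len += extra_len;
    if (update_generation)
        item->generation = trans_generation;
}
```

With the values from the example (a hole of 2M at generation 7, expanded by 2M in transaction 8), the item is skipped when `update_generation` is false and logged when it is true.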
      
      A test case for fstests, covering both cases of hole file extent item
      merging (to the left and to the right), will be sent soon.
      
      Fixes: 7f30c072 ("btrfs: stop copying old file extents when doing a full fsync")
      CC: stable@vger.kernel.org # 5.18+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix possible memory leak in btrfs_get_dev_args_from_path() · 4b124ad8
      Zixuan Fu authored
      commit 9ea0106a upstream.
      
      In btrfs_get_dev_args_from_path(), btrfs_get_bdev_and_sb() can fail if
      the path is invalid. In this case, btrfs_get_dev_args_from_path()
      returns directly without freeing the previously allocated args->uuid
      and args->fsid, which causes a memory leak.
      
      To fix these possible leaks, when btrfs_get_bdev_and_sb() fails,
      btrfs_put_dev_args_from_path() is called to clean up the memory.
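
The cleanup-on-failure pattern described above looks roughly like this in plain C. Everything here is a hypothetical user-space model: `struct model_args`, the 16-byte buffer size, and the `lookup_result` parameter (which stands in for the result of the failing lookup) are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the lookup arguments with two owned buffers. */
struct model_args {
    unsigned char *uuid;
    unsigned char *fsid;
};

/* Release both buffers; safe to call on partially filled arguments. */
void model_put_args(struct model_args *args)
{
    assert(args != NULL);
    free(args->uuid);
    free(args->fsid);
    args->uuid = NULL;
    args->fsid = NULL;
}

/*
 * Allocate the buffers, then perform the lookup. Every error path after
 * the allocations releases them before returning.
 */
int model_get_args(struct model_args *args, int lookup_result)
{
    assert(args != NULL);

    args->uuid = calloc(1, 16);
    args->fsid = calloc(1, 16);
    if (!args->uuid || !args->fsid) {
        model_put_args(args);
        return -ENOMEM;
    }

    if (lookup_result != 0) {
        model_put_args(args);  /* the step the leaking version skipped */
        return lookup_result;
    }
    return 0;
}
```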
      
      Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
      Fixes: faa775c4 ("btrfs: add a btrfs_get_dev_args_from_path helper")
      CC: stable@vger.kernel.org # 5.16
      Reviewed-by: Boris Burkov <boris@bur.io>
      Signed-off-by: Zixuan Fu <r33s3n6@gmail.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: check if root is readonly while setting security xattr · 0f72e355
      Goldwyn Rodrigues authored
      commit b5111127 upstream.
      
      For a filesystem which has btrfs read-only property set to true, all
      write operations including xattr should be denied. However, security
      xattr can still be changed even if btrfs ro property is true.
      
      This happens because xattr_permission() does not have any restrictions
      on security.*, system.*  and in some cases trusted.* from VFS and
      the decision is left to the underlying filesystem. See comments in
      xattr_permission() for more details.
      
      This patch checks if the root is read-only before performing the set
      xattr operation.
      
      Testcase:
      
        DEV=/dev/vdb
        MNT=/mnt
      
        mkfs.btrfs -f $DEV
        mount $DEV $MNT
        echo "file one" > $MNT/f1
      
        setfattr -n "security.one" -v 2 $MNT/f1
        btrfs property set $MNT ro true
      
        setfattr -n "security.one" -v 1 $MNT/f1
      
        umount $MNT
      
      CC: stable@vger.kernel.org # 4.9+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix space cache corruption and potential double allocations · a2e54eb6
      Omar Sandoval authored
      commit ced8ecf0 upstream.
      
      When testing space_cache v2 on a large set of machines, we encountered a
      few symptoms:
      
      1. "unable to add free space :-17" (EEXIST) errors.
      2. Missing free space info items, sometimes caught with a "missing free
         space info for X" error.
      3. Double-accounted space: ranges that were allocated in the extent tree
         and also marked as free in the free space tree, ranges that were
         marked as allocated twice in the extent tree, or ranges that were
         marked as free twice in the free space tree. If the latter made it
         onto disk, the next reboot would hit the BUG_ON() in
         add_new_free_space().
      4. On some hosts with no on-disk corruption or error messages, the
         in-memory space cache (dumped with drgn) disagreed with the free
         space tree.
      
      All of these symptoms have the same underlying cause: a race between
      caching the free space for a block group and returning free space to the
      in-memory space cache for pinned extents causes us to double-add a free
      range to the space cache. This race exists when free space is cached
      from the free space tree (space_cache=v2) or the extent tree
      (nospace_cache, or space_cache=v1 if the cache needs to be regenerated).
      struct btrfs_block_group::last_byte_to_unpin and struct
      btrfs_block_group::progress are supposed to protect against this race,
      but commit d0c2f4fa ("btrfs: make concurrent fsyncs wait less when
      waiting for a transaction commit") subtly broke this by allowing
      multiple transactions to be unpinning extents at the same time.
      
      Specifically, the race is as follows:
      
      1. An extent is deleted from an uncached block group in transaction A.
      2. btrfs_commit_transaction() is called for transaction A.
      3. btrfs_run_delayed_refs() -> __btrfs_free_extent() runs the delayed
         ref for the deleted extent.
      4. __btrfs_free_extent() -> do_free_extent_accounting() ->
         add_to_free_space_tree() adds the deleted extent back to the free
         space tree.
      5. do_free_extent_accounting() -> btrfs_update_block_group() ->
         btrfs_cache_block_group() queues up the block group to get cached.
         block_group->progress is set to block_group->start.
      6. btrfs_commit_transaction() for transaction A calls
         switch_commit_roots(). It sets block_group->last_byte_to_unpin to
         block_group->progress, which is block_group->start because the block
         group hasn't been cached yet.
      7. The caching thread gets to our block group. Since the commit roots
         were already switched, load_free_space_tree() sees the deleted extent
         as free and adds it to the space cache. It finishes caching and sets
         block_group->progress to U64_MAX.
      8. btrfs_commit_transaction() advances transaction A to
         TRANS_STATE_SUPER_COMMITTED.
      9. fsync calls btrfs_commit_transaction() for transaction B. Since
         transaction A is already in TRANS_STATE_SUPER_COMMITTED and the
         commit is for fsync, it advances.
      10. btrfs_commit_transaction() for transaction B calls
          switch_commit_roots(). This time, the block group has already been
          cached, so it sets block_group->last_byte_to_unpin to U64_MAX.
      11. btrfs_commit_transaction() for transaction A calls
          btrfs_finish_extent_commit(), which calls unpin_extent_range() for
          the deleted extent. It sees last_byte_to_unpin set to U64_MAX (by
          transaction B!), so it adds the deleted extent to the space cache
          again!
      
      This explains all of our symptoms above:
      
      * If the sequence of events is exactly as described above, when the free
        space is re-added in step 11, it will fail with EEXIST.
      * If another thread reallocates the deleted extent in between steps 7
        and 11, then step 11 will silently re-add that space to the space
        cache as free even though it is actually allocated. Then, if that
        space is allocated *again*, the free space tree will be corrupted
        (namely, the wrong item will be deleted).
      * If we don't catch this free space tree corruption, it will continue
        to get worse as extents are deleted and reallocated.
      
      The v1 space_cache is synchronously loaded when an extent is deleted
      (btrfs_update_block_group() with alloc=0 calls btrfs_cache_block_group()
      with load_cache_only=1), so it is not normally affected by this bug.
      However, as noted above, if we fail to load the space cache, we will
      fall back to caching from the extent tree and may hit this bug.
      
      The easiest fix for this race is to also make caching from the free
      space tree or extent tree synchronous. Josef tested this and found no
      performance regressions.
      
      A few extra changes fall out of this change. Namely, this fix does the
      following, with step 2 being the crucial fix:
      
      1. Factor btrfs_caching_ctl_wait_done() out of
         btrfs_wait_block_group_cache_done() to allow waiting on a caching_ctl
         that we already hold a reference to.
      2. Change the call in btrfs_cache_block_group() of
         btrfs_wait_space_cache_v1_finished() to
         btrfs_caching_ctl_wait_done(), which makes us wait regardless of the
         space_cache option.
      3. Delete the now unused btrfs_wait_space_cache_v1_finished() and
         space_cache_v1_done().
      4. Change btrfs_cache_block_group()'s `int load_cache_only` parameter to
         `bool wait` to more accurately describe its new meaning.
      5. Change a few callers which had a separate call to
         btrfs_wait_block_group_cache_done() to use wait = true instead.
      6. Make btrfs_wait_block_group_cache_done() static now that it's not
         used outside of block-group.c anymore.
      
      Fixes: d0c2f4fa ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit")
      CC: stable@vger.kernel.org # 5.12+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: add info when mount fails due to stale replace target · b4656b25
      Anand Jain authored
      commit f2c3bec2 upstream.
      
      If the replace target device reappears after the suspended replace is
      cancelled, it blocks the mount operation as it can't find the matching
      replace-item in the metadata. As shown below,
      
         BTRFS error (device sda5): replace devid present without an active replace item
      
      To overcome this situation, the user can run the command
      
         btrfs device scan --forget <replace target device>
      
      and try the mount command again. And also, to avoid repeating the issue,
      superblock on the devid=0 must be wiped.
      
         wipefs -a device-path-to-devid=0.
      
      This patch adds some info when this situation occurs.
      
      Reported-by: Samuel Greiner <samuel@balkonien.org>
      Link: https://lore.kernel.org/linux-btrfs/b4f62b10-b295-26ea-71f9-9a5c9299d42c@balkonien.org/T/
      CC: stable@vger.kernel.org # 5.0+
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: replace: drop assert for suspended replace · 955d400e
      Anand Jain authored
      commit 59a39919 upstream.
      
      If the filesystem mounts with the replace operation in a suspended state
      and we try to cancel the suspended replace operation, we hit the assert.
      The assert came from commit fe97e2e1 ("btrfs: dev-replace: replace's
      scrub must not be running in suspended state") and was not actually
      required, so just remove it.
      
       $ mount /dev/sda5 /btrfs
      
          BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
          BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'
      
       $ mount -o degraded /dev/sda5 /btrfs <-- success.
      
       $ btrfs replace cancel /btrfs
      
          kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131
          kernel: ------------[ cut here ]------------
          kernel: kernel BUG at fs/btrfs/ctree.h:3750!
      
      After the patch:
      
       $ btrfs replace cancel /btrfs
      
          BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled
      
      Fixes: fe97e2e1 ("btrfs: dev-replace: replace's scrub must not be running in suspended state")
      CC: stable@vger.kernel.org # 5.0+
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      955d400e
    • Filipe Manana's avatar
      btrfs: fix silent failure when deleting root reference · e08fcb12
      Filipe Manana authored
      commit 47bf225a upstream.
      
      At btrfs_del_root_ref(), if btrfs_search_slot() returns an error, we end
      up returning from the function with a value of 0 (success). This happens
      because the function returns the value stored in the variable 'err',
      which is 0, while the error value we got from btrfs_search_slot() is
      stored in the 'ret' variable.
      
      So fix it by setting 'err' with the error value.
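
      The pattern described above can be sketched in plain C. This is only an
      illustration, not the btrfs code: search_slot_stub() and both
      del_root_ref_*() helpers are hypothetical names, and the stub simply
      returns whatever result the caller asks it to simulate.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for btrfs_search_slot(): returns a negative errno
 * on failure, 0 when the item is found and a positive value when it is not.
 */
static int search_slot_stub(int simulated_result)
{
	return simulated_result;
}

/* Before the fix: a search error stays in 'ret' and 'err' (still 0) is returned. */
int del_root_ref_buggy(int simulated_result)
{
	int err = 0;
	int ret;

	ret = search_slot_stub(simulated_result);
	if (ret < 0)
		goto out;	/* error is silently dropped here */
	if (ret > 0)
		err = -ENOENT;
out:
	return err;
}

/* After the fix: the search error is copied into 'err' before bailing out. */
int del_root_ref_fixed(int simulated_result)
{
	int err = 0;
	int ret;

	ret = search_slot_stub(simulated_result);
	if (ret < 0) {
		err = ret;
		goto out;
	}
	if (ret > 0)
		err = -ENOENT;
out:
	return err;
}
```

      With a simulated -EIO from the search, the first variant reports
      success while the second one propagates the error to the caller.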
      
      Fixes: 8289ed9f ("btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error handling")
      CC: stable@vger.kernel.org # 5.16+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e08fcb12
    • Aleksander Jan Bajkowski's avatar
      net: lantiq_xrx200: restore buffer if memory allocation failed · 3ef2786e
      Aleksander Jan Bajkowski authored
      [ Upstream commit c9c3b177 ]
      
      In a situation where memory allocation fails, an invalid buffer address
      is stored. When this descriptor is used again, the system panics in the
      build_skb() function when accessing memory.
      
      Fixes: 7ea6cd16 ("lantiq: net: fix duplicated skb in rx descriptor ring")
      Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3ef2786e
    • Aleksander Jan Bajkowski's avatar
      net: lantiq_xrx200: fix lock under memory pressure · 0d9981b0
      Aleksander Jan Bajkowski authored
      [ Upstream commit c4b6e934 ]
      
      When the xrx200_hw_receive() function returns -ENOMEM, the NAPI poll
      function immediately returns an error.
      This is incorrect for two reasons:
      * the function terminates without enabling interrupts or scheduling NAPI,
      * the error code (-ENOMEM) is returned instead of the number of received
      packets.
      
      After the first memory allocation failure occurs, packet reception is
      locked due to disabled interrupts from DMA.
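
      A rough userspace model of a poll routine that avoids both problems
      is sketched below. It is not the driver code: struct rx_chan,
      hw_receive() and poll_rx() are hypothetical names, and setting
      irq_enabled only stands in for completing NAPI and re-enabling the
      DMA interrupt.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a receive channel; not the driver's real structures. */
struct rx_chan {
	int pending;		/* packets waiting in the ring */
	int fail_at;		/* index of the receive call that fails, or -1 */
	int calls;		/* receive calls made so far */
	bool irq_enabled;	/* models the DMA interrupt state */
};

/* Stand-in for xrx200_hw_receive(): 0 on success, -ENOMEM on allocation failure. */
static int hw_receive(struct rx_chan *ch)
{
	if (ch->calls++ == ch->fail_at)
		return -ENOMEM;
	ch->pending--;
	return 0;
}

/*
 * Poll routine: always returns the number of packets received and re-arms
 * interrupts whenever it finishes under budget, even after a failure.
 */
int poll_rx(struct rx_chan *ch, int budget)
{
	int rx = 0;

	while (rx < budget && ch->pending > 0) {
		if (hw_receive(ch) != 0)
			break;	/* stop polling but fall through to the re-arm */
		rx++;
	}

	if (rx < budget)
		ch->irq_enabled = true;

	return rx;
}
```

      The key point is that the error path leaves the loop with break
      rather than returning, so the interrupt re-arm and the packet count
      are shared by the success and failure cases.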
      
      Fixes: fe1a5642 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
      Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0d9981b0
    • Aleksander Jan Bajkowski's avatar
      net: lantiq_xrx200: confirm skb is allocated before using · 73f47586
      Aleksander Jan Bajkowski authored
      [ Upstream commit c8b04370 ]
      
      xrx200_hw_receive() assumes build_skb() always works and goes straight
      to skb_reserve(). However, build_skb() can fail under memory pressure.
      
      Add a check in case build_skb() failed to allocate and return NULL.
      
      Fixes: e0155935 ("net: lantiq_xrx200: convert to build_skb")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      73f47586
    • Heiner Kallweit's avatar
      net: stmmac: work around sporadic tx issue on link-up · 27a5ab8f
      Heiner Kallweit authored
      [ Upstream commit a3a57bf0 ]
      
      This is a follow-up to the discussion in [0]. It seems to me that
      at least the IP version used on Amlogic SoCs sometimes has a problem
      if register MAC_CTRL_REG is written whilst the chip is still processing
      a previous write. But that's just a guess.
      Adding a delay between two writes to this register helps, but we can
      also simply omit the offending second write. This patch uses the second
      approach and is based on a suggestion from Qi Duan.
      A benefit of this approach is that we save a few register writes, even
      on unaffected chip versions.
      
      [0] https://www.spinics.net/lists/netdev/msg831526.html
      
      Fixes: bfab27a1 ("stmmac: add the experimental PCI support")
      Suggested-by: Qi Duan <qi.duan@amlogic.com>
      Suggested-by: Jerome Brunet <jbrunet@baylibre.com>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/e99857ce-bd90-5093-ca8c-8cd480b5a0a2@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      27a5ab8f
    • R Mohamed Shah's avatar
      ionic: VF initial random MAC address if no assigned mac · c830d712
      R Mohamed Shah authored
      [ Upstream commit 19058be7 ]
      
      Assign a random mac address to the VF interface station
      address if it boots with a zero mac address in order to match
      similar behavior seen in other VF drivers.  Handle the errors
      where the older firmware does not allow the VF to set its own
      station address.
      
      Newer firmware will allow the VF to set the station mac address
      if it hasn't already been set administratively through the PF.
      Setting it will also be allowed if the VF has trust.
      
      Fixes: fbb39807 ("ionic: support sr-iov operations")
      Signed-off-by: R Mohamed Shah <mohamed@pensando.io>
      Signed-off-by: Shannon Nelson <snelson@pensando.io>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c830d712
    • Shannon Nelson's avatar
      ionic: fix up issues with handling EAGAIN on FW cmds · 79e77fb1
      Shannon Nelson authored
      [ Upstream commit 0fc4dd45 ]
      
      In looping on FW update tests we occasionally see the
      FW_ACTIVATE_STATUS command fail while it is in its EAGAIN loop
      waiting for the FW activate step to finish inside the FW.  The
      firmware is complaining that the done bit is set when a new
      dev_cmd is going to be processed.
      
      Doing a clean on the cmd registers and doorbell before exiting
      the wait-for-done and cleaning the done bit before the sleep
      prevents this from occurring.
      
      Fixes: fbfb8031 ("ionic: Add hardware init and device commands")
      Signed-off-by: Shannon Nelson <snelson@pensando.io>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      79e77fb1
    • Shannon Nelson's avatar
      ionic: clear broken state on generation change · 94d71d99
      Shannon Nelson authored
      [ Upstream commit 9cb9dadb ]
      
      There is a case found in heavy testing where a link flap happens just
      before a firmware Recovery event and the driver gets stuck in the
      BROKEN state.  This comes from the driver getting interrupted by a FW
      generation change when coming back up from the link flap, and the call
      to ionic_start_queues() in ionic_link_status_check() fails.  This can be
      addressed by having the fw_up code clear the BROKEN bit if seen, rather
      than waiting for a user to manually force the interface down and then
      back up.
      
      Fixes: 9e8eaf84 ("ionic: stop watchdog when in broken state")
      Signed-off-by: Shannon Nelson <snelson@pensando.io>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      94d71d99
    • David Howells's avatar
      rxrpc: Fix locking in rxrpc's sendmsg · 091dc91e
      David Howells authored
      [ Upstream commit b0f571ec ]
      
      Fix three bugs in rxrpc's sendmsg implementation:
      
       (1) rxrpc_new_client_call() should release the socket lock when returning
           an error from rxrpc_get_call_slot().
      
       (2) rxrpc_wait_for_tx_window_intr() will return without the call mutex
           held in the event that we're interrupted by a signal whilst waiting
           for tx space on the socket or relocking the call mutex afterwards.
      
           Fix this by: (a) moving the unlock/lock of the call mutex up to
           rxrpc_send_data() such that the lock is not held around all of
           rxrpc_wait_for_tx_window*() and (b) indicating to higher callers
           whether we're returning with the lock dropped.  Note that this means
           recvmsg() will not block on this call whilst we're waiting.
      
       (3) After dropping and regaining the call mutex, rxrpc_send_data() needs
           to go and recheck the state of the tx_pending buffer and the
           tx_total_len check in case we raced with another sendmsg() on the same
           call.
      
      Thinking on this some more, it might make sense to have different locks for
      sendmsg() and recvmsg().  There's probably no need to make recvmsg() wait
      for sendmsg().  It does mean that recvmsg() can return MSG_EOR indicating
      that a call is dead before a sendmsg() to that call returns - but that can
      currently happen anyway.
      
      Without fix (2), something like the following can be induced:
      
      	WARNING: bad unlock balance detected!
      	5.16.0-rc6-syzkaller #0 Not tainted
      	-------------------------------------
      	syz-executor011/3597 is trying to release lock (&call->user_mutex) at:
      	[<ffffffff885163a3>] rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
      	but there are no more locks to release!
      
      	other info that might help us debug this:
      	no locks held by syz-executor011/3597.
      	...
      	Call Trace:
      	 <TASK>
      	 __dump_stack lib/dump_stack.c:88 [inline]
      	 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
      	 print_unlock_imbalance_bug include/trace/events/lock.h:58 [inline]
      	 __lock_release kernel/locking/lockdep.c:5306 [inline]
      	 lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5657
      	 __mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:900
      	 rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
      	 rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:561
      	 sock_sendmsg_nosec net/socket.c:704 [inline]
      	 sock_sendmsg+0xcf/0x120 net/socket.c:724
      	 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
      	 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
      	 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
      	 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      	 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      	 entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [Thanks to Hawkins Jiawei and Khalid Masum for their attempts to fix this]
      
      Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
      Reported-by: <syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
      Tested-by: <syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com>
      cc: Hawkins Jiawei <yin31149@gmail.com>
      cc: Khalid Masum <khalid.masum.92@gmail.com>
      cc: Dan Carpenter <dan.carpenter@oracle.com>
      cc: linux-afs@lists.infradead.org
      Link: https://lore.kernel.org/r/166135894583.600315.7170979436768124075.stgit@warthog.procyon.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      091dc91e
    • Lorenzo Bianconi's avatar
      net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2 · b886aebd
      Lorenzo Bianconi authored
      [ Upstream commit 0cf731f9 ]
      
      Properly report hw rx hash for mt7986 chipset according to the new dma
      descriptor layout.
      
      Fixes: 197c9e9b ("net: ethernet: mtk_eth_soc: introduce support for mt7986 chipset")
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Link: https://lore.kernel.org/r/091394ea4e705fbb35f828011d98d0ba33808f69.1661257293.git.lorenzo@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b886aebd
    • Lorenzo Bianconi's avatar
      net: ethernet: mtk_eth_soc: enable rx cksum offload for MTK_NETSYS_V2 · 2f237570
      Lorenzo Bianconi authored
      [ Upstream commit da6e113f ]
      
      Enable rx checksum offload for mt7986 chipset.
      
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Link: https://lore.kernel.org/r/c8699805c18f7fd38315fcb8da2787676d83a32c.1654544585.git.lorenzo@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2f237570
    • Sylwester Dziedziuch's avatar
      i40e: Fix incorrect address type for IPv6 flow rules · 82fd1402
      Sylwester Dziedziuch authored
      [ Upstream commit bcf3a156 ]
      
      It was not possible to create a 1-tuple flow director
      rule for the IPv6 flow type. This was caused by incorrectly
      checking the source IP address when validating the user-provided
      destination IP address.
      
      Fix this by changing ip6src to correct ip6dst address
      in destination IP address validation for IPv6 flow type.
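
      A heavily simplified illustration of this kind of mix-up follows. The
      struct and helper names are hypothetical and the driver's real
      validation logic is not reproduced; the sketch only shows why checking
      ip6src in place of ip6dst rejects a rule that sets the destination
      address alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified IPv6 flow spec; only the two address fields. */
struct ip6_flow_spec {
	uint32_t ip6src[4];
	uint32_t ip6dst[4];
};

/* Treat an address as "specified" when any of its four words is non-zero. */
static bool addr_is_set(const uint32_t addr[4])
{
	return addr[0] || addr[1] || addr[2] || addr[3];
}

/* Buggy check: looks at the source address while validating the destination. */
bool dst_specified_buggy(const struct ip6_flow_spec *spec)
{
	return addr_is_set(spec->ip6src);
}

/* Fixed check: validates the destination using the destination field. */
bool dst_specified_fixed(const struct ip6_flow_spec *spec)
{
	return addr_is_set(spec->ip6dst);
}
```

      For a rule that fills in only ip6dst, the buggy check reports that no
      destination was given, while the fixed check accepts it.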
      
      Fixes: efca91e8 ("i40e: Add flow director support for IPv6")
      Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      82fd1402
    • Jacob Keller's avatar
      ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter · c2b99b2a
      Jacob Keller authored
      [ Upstream commit 25d7a5f5 ]
      
      The ixgbe_ptp_start_cyclecounter is intended to be called whenever the
      cyclecounter parameters need to be changed.
      
      Since commit a9763f3c ("ixgbe: Update PTP to support X550EM_x
      devices"), this function has cleared the SYSTIME registers and reset the
      TSAUXC DISABLE_SYSTIME bit.
      
      While these need to be cleared during ixgbe_ptp_reset, it is wrong to clear
      them during ixgbe_ptp_start_cyclecounter. This function may be called
      during both reset and link status change. When link changes, the SYSTIME
      counter is still operating normally, but the cyclecounter should be updated
      to account for the possibly changed parameters.
      
      Clearing SYSTIME when link changes causes the timecounter to jump because
      the cycle counter now reads zero.
      
      Extract the SYSTIME initialization out to a new function and call this
      during ixgbe_ptp_reset. This prevents the timecounter adjustment and avoids
      an unnecessary reset of the current time.
      
      This also restores the original SYSTIME clearing that occurred during
      ixgbe_ptp_reset before the commit above.
      
      Reported-by: Steve Payne <spayne@aurora.tech>
      Reported-by: Ilya Evenbach <ievenbach@aurora.tech>
      Fixes: a9763f3c ("ixgbe: Update PTP to support X550EM_x devices")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c2b99b2a
    • Kuniyuki Iwashima's avatar
      net: Fix a data-race around sysctl_somaxconn. · 18a8b826
      Kuniyuki Iwashima authored
      [ Upstream commit 3c9ba81d ]
      
      While reading sysctl_somaxconn, it can be changed concurrently.
      Thus, we need to add READ_ONCE() to its reader.
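
      The idea can be approximated in userspace as follows. The macro is a
      simplified stand-in for the kernel's READ_ONCE() and relies on the GNU
      __typeof__ extension; clamp_backlog() is a hypothetical helper, and
      the global variable merely models a tunable that another thread may
      rewrite at any time.

```c
/* Simplified userspace approximation of READ_ONCE(); assumes GNU __typeof__. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Stand-in for the sysctl value; a concurrent writer may change it. */
int sysctl_somaxconn = 4096;

/*
 * Clamp a listen backlog against one snapshot of the tunable, so the
 * comparison and the assignment below both see the same value.
 */
int clamp_backlog(int backlog)
{
	int somaxconn = READ_ONCE(sysctl_somaxconn);

	if ((unsigned int)backlog > (unsigned int)somaxconn)
		backlog = somaxconn;
	return backlog;
}
```

      Reading the variable once into a local keeps the compiler from
      reloading it between the check and the use, which is the hazard the
      annotation guards against.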
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      18a8b826
    • Kuniyuki Iwashima's avatar
      net: Fix a data-race around netdev_unregister_timeout_secs. · 8a536935
      Kuniyuki Iwashima authored
      [ Upstream commit 05e49cfc ]
      
      While reading netdev_unregister_timeout_secs, it can be changed
      concurrently.  Thus, we need to add READ_ONCE() to its reader.
      
      Fixes: 5aa3afe1 ("net: make unregister netdev warning timeout configurable")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Acked-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8a536935
    • Kuniyuki Iwashima's avatar
      net: Fix a data-race around gro_normal_batch. · 21c6c135
      Kuniyuki Iwashima authored
      [ Upstream commit 8db24af3 ]
      
      While reading gro_normal_batch, it can be changed concurrently.
      Thus, we need to add READ_ONCE() to its reader.
      
      Fixes: 323ebb61 ("net: use listified RX for handling GRO_NORMAL skbs")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Acked-by: Edward Cree <ecree.xilinx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      21c6c135
    • Kuniyuki Iwashima's avatar
      net: Fix data-races around sysctl_devconf_inherit_init_net. · bdb33552
      Kuniyuki Iwashima authored
      [ Upstream commit a5612ca1 ]
      
      While reading sysctl_devconf_inherit_init_net, it can be changed
      concurrently.  Thus, we need to add READ_ONCE() to its readers.
      
      Fixes: 856c395c ("net: introduce a knob to control whether to inherit devconf config")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      bdb33552
    • Kuniyuki Iwashima's avatar
      net: Fix data-races around sysctl_fb_tunnels_only_for_init_net. · e94dd3e9
      Kuniyuki Iwashima authored
      [ Upstream commit af67508e ]
      
      While reading sysctl_fb_tunnels_only_for_init_net, it can be changed
      concurrently.  Thus, we need to add READ_ONCE() to its readers.
      
      Fixes: 79134e6c ("net: do not create fallback tunnels for non-default namespaces")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e94dd3e9
    • Kuniyuki Iwashima's avatar
      net: Fix a data-race around netdev_budget_usecs. · d923063b
      Kuniyuki Iwashima authored
      [ Upstream commit fa45d484 ]
      
      While reading netdev_budget_usecs, it can be changed concurrently.
      Thus, we need to add READ_ONCE() to its reader.
      
      Fixes: 7acf8a1e ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d923063b
    • Kuniyuki Iwashima's avatar
      net: Fix data-races around sysctl_max_skb_frags. · c34e06f0
      Kuniyuki Iwashima authored
      [ Upstream commit 657b991a ]
      
      While reading sysctl_max_skb_frags, it can be changed concurrently.
      Thus, we need to add READ_ONCE() to its readers.
      
      Fixes: 5f74f82e ("net:Add sysctl_max_skb_frags")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c34e06f0
    • Kuniyuki Iwashima's avatar
      net: Fix a data-race around netdev_budget. · 293ec6ac
      Kuniyuki Iwashima authored
      [ Upstream commit 2e0c4237 ]
      
      While reading netdev_budget, it can be changed concurrently.
      Thus, we need to add READ_ONCE() to its reader.
      
      Fixes: 51b0bded ("[NET]: Separate two usages of netdev_max_backlog.")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      293ec6ac
    • Kuniyuki Iwashima's avatar
      net: Fix a data-race around sysctl_net_busy_read. · 6a520caf
      Kuniyuki Iwashima authored
      [ Upstream commit e59ef36f ]
      
      While reading sysctl_net_busy_read, it can be changed concurrently.
      Thus, we need to add READ_ONCE() to its reader.
      
      Fixes: 2d48d67f ("net: poll/select low latency socket support")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6a520caf
    • Kuniyuki Iwashima's avatar
      net: Fix a data-race around sysctl_net_busy_poll. · 05d92723
      Kuniyuki Iwashima authored
      [ Upstream commit c42b7cdd ]
      
      While reading sysctl_net_busy_poll, it can be changed concurrently.
      Thus, we need to add READ_ONCE() to its reader.
      
      Fixes: 06021292 ("net: add low latency socket poll")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      05d92723