  1. Mar 12, 2022
    • parisc/unaligned: Rewrite 32-bit inline assembly of emulate_ldd() · 427c1073
      Helge Deller authored
      
      
      Convert to use real temp variables instead of clobbering processor
      registers.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      427c1073
    • parisc/unaligned: Rewrite inline assembly of emulate_ldw() · e8aa7b17
      Helge Deller authored
      
      
      Convert to use real temp variables instead of clobbering processor
      registers.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      e8aa7b17
    • parisc/unaligned: Rewrite inline assembly of emulate_ldh() · f85b2af1
      Helge Deller authored
      
      
      Convert to use real temp variables instead of clobbering processor
      registers.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      f85b2af1
    • parisc/unaligned: Use EFAULT fixup handler in unaligned handlers · d1434e03
      Helge Deller authored
      
      
      Convert the inline assembly code to use the automatic EFAULT exception
      handler. With that the fixup code can be dropped.
      
      The other change is to allow double-word only when a 64-bit kernel is
      used instead of depending on CONFIG_PA20.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      d1434e03
    • parisc: Reduce code size by optimizing get_current() function calls · 8278cc16
      Helge Deller authored
      
      
      The get_current() code uses the mfctl() macro to get the pointer to the
      current task struct from %cr30. The mfctl() macro is marked volatile,
      which is basically correct because mfctl() is also used to read e.g.
      the current internal timer or interrupt flags.
      
      But specifically the task struct pointer (%cr30) doesn't change over
      time when the kernel executes code for a task.
      
      So, by dropping the volatile when retrieving %cr30 the compiler is now
      able to get this value only once and optimize the generated code a lot.
      
      A bloat-o-meter comparison shows that this patch saves ~5kB kernel code
      on a 32-bit kernel and ~6kB kernel code on a 64-bit kernel.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      8278cc16
    • parisc: Use constants to encode the space registers like SR_KERNEL · 360bd6c6
      Helge Deller authored
      
      
      Use the provided space register constants instead of hardcoded values.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      360bd6c6
    • parisc: Use SR_USER and SR_KERNEL in get_user() and put_user() · 5613a930
      Helge Deller authored
      
      
      Instead of hardcoding the space registers as strings, use the SR_USER
      and SR_KERNEL constants to form the space register in the access
      functions.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      5613a930
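The idea of forming the space register from a constant instead of a hardcoded string can be sketched in plain C with preprocessor stringification. This is only an illustration: the numeric values (0 for kernel space, 3 for user space), the `STR`/`STR_` macros and the `LDW_VIA` helper are assumptions and are not taken from the commit. The instruction text is only built as a string, never executed.

```c
/* Assumed values for illustration; the kernel headers are authoritative. */
#define SR_KERNEL 0
#define SR_USER   3

/* Two-level stringification so SR_USER expands to 3 before '#' applies. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Hypothetical helper: text of a word load through the given space register. */
#define LDW_VIA(sr) "ldw 0(%sr" STR(sr) ",%r26),%r28"

/* Instruction text for a load from user space. */
static const char *user_load(void)
{
	return LDW_VIA(SR_USER);
}

/* Instruction text for a load from kernel space. */
static const char *kernel_load(void)
{
	return LDW_VIA(SR_KERNEL);
}
```

With this shape, changing which space register is used means changing one constant rather than searching for string literals.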
    • parisc: Add defines for various space register · 46b4016f
      Helge Deller authored
      
      
      Provide defines for space registers (SR_KERNEL, SR_USER, ...) which
      should be used instead of hardcoding the values.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      46b4016f
    • parisc: Always use the self-extracting kernel feature · b9f50eea
      Helge Deller authored
      
      
      This patch drops the CONFIG_PARISC_SELF_EXTRACT option.
      
      The palo boot loader is able to decompress a kernel which was compressed
      with gzip. That possibility was useful when the Linux kernel
      self-extracting feature wasn't implemented yet.
      
      Besides the fact that the self-extracting feature offers much better
      compression rates, self-extracting kernels have been supported since
      kernel v4.14, so now it's really time to get rid of that old option and
      always use the self-extractor.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      b9f50eea
    • video/fbdev/stifb: Implement the stifb_fillrect() function · 9c379c65
      Helge Deller authored
      
      
      The stifb driver (for Artist/HCRX graphics on PA-RISC) was missing
      the fillrect function.
      Tested on a 715/64 PA-RISC machine and in qemu.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      9c379c65
    • parisc: Add vDSO support · df24e178
      Helge Deller authored
      Add minimal vDSO support, which provides the signal trampoline helpers,
      but none of the userspace syscall helpers like time wrappers.
      
      The big benefit of this vDSO implementation is that we no longer need
      an executable stack. PA-RISC is one of the last architectures where an
      executable stack was needed in order to implement the signal
      trampolines by putting assembly instructions on the stack, which then
      get executed. Instead the kernel will provide the relevant code in the
      vDSO page and only put the pointers to the signal information on the
      stack.
      
      By dropping the need for executable stacks we avoid running into issues
      with applications which want non executable stacks for security reasons.
      Additionally, alternative stacks on memory areas without exec
      permissions are supported too.
      
      This code is based on an initial implementation by Randolph Chung from 2006:
      https://lore.kernel.org/linux-parisc/4544A34A.6080700@tausq.org/
      
      
      
      I did the porting and lifted the code to the current code base. Dave
      fixed the unwind code so that gdb and glibc are able to backtrace
      through the code. An additional patch to gdb will be pushed upstream by
      Dave.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Dave Anglin <dave.anglin@bell.net>
      Cc: Randolph Chung <randolph@tausq.org>
      Signed-off-by: Helge Deller <deller@gmx.de>
      df24e178
    • parisc: Simplify fast path for non-access data TLB faults · 14615ecc
      John David Anglin authored
      
      
      With the latest cache fix for non-access faults and the support for
      non-access faults (code 17) in handle_interruption, we can remove
      the fast path emulation for fdc, fic, pdc, lpa, probe and probei
      instructions.
      
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Signed-off-by: Helge Deller <deller@gmx.de>
      14615ecc
    • parisc: Fix handling off probe non-access faults · e00b0a2a
      John David Anglin authored
      
      
      Currently, the parisc kernel does not fully support non-access TLB
      fault handling for probe instructions. In the fast path, we set the
      target register to zero if it is not a shadowed register. The slow
      path is not implemented, so we call do_page_fault. The architecture
      indicates that non-access faults should not cause a page fault from
      disk.
      
      This change adds code to provide non-access fault support for
      probe instructions. It also modifies the handling of faults on
      userspace so that if the address lies in a valid VMA and the access
      type matches that for the VMA, the probe target register is set to
      one. Otherwise, the target register is set to zero.
      
      This was done to make probe instructions more useful for userspace.
      Probe instructions are not very useful if they set the target register
      to zero whenever a page is not present in memory. Nominally, the
      purpose of the probe instruction is to determine whether read or write
      access to a given address is allowed.
      
      This fixes a problem in function pointer comparison noticed in the
      glibc testsuite (stdio-common/tst-vfprintf-user-type). The same
      problem is likely in glibc (_dl_lookup_address).
      
      V2 adds flush and lpa instruction support to handle_nadtlb_fault.
      
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Signed-off-by: Helge Deller <deller@gmx.de>
      e00b0a2a
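The userspace rule described in this commit (target register set to one when the address lies in a valid VMA and the access type matches, zero otherwise) can be sketched as a small decision function. This is a simplified model, not the kernel code: `struct vma_model`, `enum probe_access` and `probe_result()` are hypothetical names, and real VMAs carry far more state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a VMA; not the kernel's structure. */
struct vma_model {
	unsigned long start;	/* first address covered (inclusive) */
	unsigned long end;	/* first address past the mapping (exclusive) */
	bool readable;
	bool writable;
};

enum probe_access { PROBE_READ, PROBE_WRITE };

/*
 * Value the probe target register would receive in this model:
 * 1 if addr is inside a mapping that permits the requested access,
 * 0 if no mapping covers addr or the access type does not match.
 */
static int probe_result(const struct vma_model *vmas, size_t n,
			unsigned long addr, enum probe_access want)
{
	for (size_t i = 0; i < n; i++) {
		const struct vma_model *v = &vmas[i];

		if (addr < v->start || addr >= v->end)
			continue;
		if (want == PROBE_READ)
			return v->readable ? 1 : 0;
		return v->writable ? 1 : 0;
	}
	return 0;
}
```

Note that the result depends only on the mapping's permissions, not on whether the page is currently present in memory, which is the point the commit message makes.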
    • parisc: Fix non-access data TLB cache flush faults · f839e5f1
      John David Anglin authored
      
      
      When a page is not present, we get non-access data TLB faults from
      the fdc and fic instructions in flush_user_dcache_range_asm and
      flush_user_icache_range_asm. When these occur, the cache line is
      not invalidated and potentially we get memory corruption. The
      problem was hidden by the nullification of the flush instructions.
      
      These faults also affect performance. With pa8800/pa8900 processors,
      there will be 32 faults per 4 KB page since the cache line is 128
      bytes.  There will be more faults with earlier processors.
      
      The problem is fixed by using flush_cache_pages(). It does the flush
      using a tmp alias mapping.
      
      The flush_cache_pages() call in flush_cache_range() flushed too
      large a range.
      
      V2: Remove unnecessary preempt_disable() and preempt_enable() calls.
      
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Signed-off-by: Helge Deller <deller@gmx.de>
      f839e5f1
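The fault count quoted in this commit follows from simple arithmetic, assuming one flush instruction (and so one potential fault) per cache line of a non-present page. The helper name below is hypothetical.

```c
/*
 * Number of cache lines covering one page, which in this model is also
 * the number of non-access faults taken while flushing a page that is
 * not present. Assumes line_size is non-zero and divides page_size.
 */
static unsigned long flush_faults_per_page(unsigned long page_size,
					   unsigned long line_size)
{
	return page_size / line_size;
}
```

For a 4 KB page and a 128-byte line this gives 4096 / 128 = 32, matching the commit message. A smaller line size gives proportionally more faults, which is why earlier processors are affected more.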
  2. Mar 07, 2022
    • Linux 5.17-rc7 · ffb217a1
      Linus Torvalds authored
      ffb217a1
    • Merge tag 'for-5.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 3ee65c0f
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "A few more fixes for various problems that have user visible effects
        or seem to be urgent:
      
         - fix corruption when combining DIO and non-blocking io_uring over
           multiple extents (seen on MariaDB)
      
         - fix relocation crash due to premature return from commit
      
         - fix quota deadlock between rescan and qgroup removal
      
         - fix item data bounds checks in tree-checker (found on a fuzzed
           image)
      
         - fix fsync of prealloc extents after EOF
      
         - add missing run of delayed items after unlink during log replay
      
         - don't start relocation until snapshot drop is finished
      
         - fix reversed condition for subpage writers locking
      
         - fix warning on page error"
      
      * tag 'for-5.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fallback to blocking mode when doing async dio over multiple extents
        btrfs: add missing run of delayed items after unlink during log replay
        btrfs: qgroup: fix deadlock between rescan worker and remove qgroup
        btrfs: fix relocation crash due to premature return from btrfs_commit_transaction()
        btrfs: do not start relocation until in progress drops are done
        btrfs: tree-checker: use u64 for item data end to avoid overflow
        btrfs: do not WARN_ON() if we have PageError set
        btrfs: fix lost prealloc extents beyond eof after full fsync
        btrfs: subpage: fix a wrong check on subpage->writers
      3ee65c0f
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · f81664f7
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "x86 guest:
      
         - Tweaks to the paravirtualization code, to avoid using them when
           they're pointless or harmful
      
        x86 host:
      
         - Fix for SRCU lockdep splat
      
         - Brown paper bag fix for the propagation of errno"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: pull kvm->srcu read-side to kvm_arch_vcpu_ioctl_run
        KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots()
        KVM: x86: Yield to IPI target vCPU only if it is busy
        x86/kvmclock: Fix Hyper-V Isolated VM's boot issue when vCPUs > 64
        x86/kvm: Don't waste memory if kvmclock is disabled
        x86/kvm: Don't use PV TLB/yield when mwait is advertised
      f81664f7
    • Merge tag 'powerpc-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 9bdeaca1
      Linus Torvalds authored
      Pull powerpc fix from Michael Ellerman:
       "Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set.
      
        Thanks to Murilo Opsfelder Araujo, and Erhard F"
      
      * tag 'powerpc-5.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/64s: Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set
      9bdeaca1
    • Merge tag 'trace-v5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · f40a33f5
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix sorting on old "cpu" value in histograms
      
       - Fix return value of __setup() boot parameter handlers
      
      * tag 'trace-v5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Fix return value of __setup handlers
        tracing/histogram: Fix sorting on old "cpu" value
      f40a33f5
  3. Mar 06, 2022
  4. Mar 05, 2022
    • powerpc/64s: Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set · 58dbe9b3
      Murilo Opsfelder Araujo authored
      The following build failure occurs when CONFIG_PPC_64S_HASH_MMU is not
      set:
      
          arch/powerpc/kernel/setup_64.c: In function ‘setup_per_cpu_areas’:
          arch/powerpc/kernel/setup_64.c:811:21: error: ‘mmu_linear_psize’ undeclared (first use in this function); did you mean ‘mmu_virtual_psize’?
            811 |                 if (mmu_linear_psize == MMU_PAGE_4K)
                |                     ^~~~~~~~~~~~~~~~
                |                     mmu_virtual_psize
          arch/powerpc/kernel/setup_64.c:811:21: note: each undeclared identifier is reported only once for each function it appears in
      
      Move the declaration of mmu_linear_psize outside of
      CONFIG_PPC_64S_HASH_MMU ifdef.
      
      After the above is fixed, it fails later with the following error:
      
          ld: arch/powerpc/kexec/file_load_64.o: in function `.arch_kexec_kernel_image_probe':
          file_load_64.c:(.text+0x1c1c): undefined reference to `.add_htab_mem_range'
      
      Fix that, too, by conditioning add_htab_mem_range() symbol to
      CONFIG_PPC_64S_HASH_MMU.
      
      Fixes: 387e220a ("powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU")
      Reported-by: Erhard F. <erhard_f@mailbox.org>
      Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215567
      Link: https://lore.kernel.org/r/20220301204743.45133-1-muriloo@linux.ibm.com
      58dbe9b3
    • Merge tag 'block-5.17-2022-03-04' of git://git.kernel.dk/linux-block · ac84e82f
      Linus Torvalds authored
      Pull block fix from Jens Axboe:
       "Just a small UAF fix for blktrace"
      
      * tag 'block-5.17-2022-03-04' of git://git.kernel.dk/linux-block:
        blktrace: fix use after free for struct blk_trace
      ac84e82f
    • Merge tag 'riscv-for-linus-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 07ebd38a
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - Fixes for a handful of KASAN-related crashes.
      
       - A fix to avoid a crash during boot for SPARSEMEM &&
         !SPARSEMEM_VMEMMAP configurations.
      
       - A fix to stop reporting some incorrect errors under DEBUG_VIRTUAL.
      
       - A fix for the K210's device tree to properly populate the interrupt
         map, so hart1 will get interrupts again.
      
      * tag 'riscv-for-linus-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: dts: k210: fix broken IRQs on hart1
        riscv: Fix kasan pud population
        riscv: Move high_memory initialization to setup_bootmem
        riscv: Fix config KASAN && DEBUG_VIRTUAL
        riscv: Fix DEBUG_VIRTUAL false warnings
        riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP
        riscv: Fix is_linear_mapping with recent move of KASAN region
      07ebd38a
    • Merge tag 'iommu-fixes-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 3f509f59
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
      
       - Fix a double list_add() in Intel VT-d code
      
       - Add missing put_device() in Tegra SMMU driver
      
       - Two AMD IOMMU fixes:
           - Memory leak in IO page-table freeing code
           - Add missing recovery from event-log overflow
      
      * tag 'iommu-fixes-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/tegra-smmu: Fix missing put_device() call in tegra_smmu_find
        iommu/vt-d: Fix double list_add when enabling VMD in scalable mode
        iommu/amd: Fix I/O page table memory leak
        iommu/amd: Recover from event log overflow
      3f509f59
    • Merge tag 'thermal-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · a4ffdb61
      Linus Torvalds authored
      Pull thermal control fix from Rafael Wysocki:
       "Fix NULL pointer dereference in the thermal netlink interface (Nicolas
        Cavallari)"
      
      * tag 'thermal-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal: core: Fix TZ_GET_TRIP NULL pointer dereference
      a4ffdb61
    • Merge tag 'sound-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 8d670948
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Hopefully the last PR for 5.17, including just a few small changes:
        an additional fix for ASoC ops boundary check and other minor
        device-specific fixes"
      
      * tag 'sound-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: intel_hdmi: Fix reference to PCM buffer address
        ASoC: cs4265: Fix the duplicated control name
        ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min
      8d670948
    • Merge tag 'drm-fixes-2022-03-04' of git://anongit.freedesktop.org/drm/drm · c4fc118a
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Things are quieting down as expected, just a small set of fixes, i915,
        exynos, amdgpu, vrr, bridge and hdlcd. Nothing scary at all.
      
        i915:
         - Fix GuC SLPC unset command
         - Fix misidentification of some Apple MacBook Pro laptops as Jasper Lake
      
        amdgpu:
         - Suspend regression fix
      
        exynos:
         - irq handling fixes
         - Fix two regressions to TE-gpio handling
      
        arm/hdlcd:
         - Select DRM_GEM_CMA_HELPER for HDLCD
      
        bridge:
         - ti-sn65dsi86: Properly undo autosuspend
      
        vrr:
         - Fix potential NULL-pointer deref"
      
      * tag 'drm-fixes-2022-03-04' of git://anongit.freedesktop.org/drm/drm:
        drm/amdgpu: fix suspend/resume hang regression
        drm/vrr: Set VRR capable prop only if it is attached to connector
        drm/arm: arm hdlcd select DRM_GEM_CMA_HELPER
        drm/bridge: ti-sn65dsi86: Properly undo autosuspend
        drm/i915: s/JSP2/ICP2/ PCH
        drm/i915/guc/slpc: Correct the param count for unset param
        drm/exynos: Search for TE-gpio in DSI panel's node
        drm/exynos: Don't fail if no TE-gpio is defined for DSI driver
        drm/exynos: gsc: Use platform_get_irq() to get the interrupt
        drm/exynos/fimc: Use platform_get_irq() to get the interrupt
        drm/exynos/exynos_drm_fimd: Use platform_get_irq_byname() to get the interrupt
        drm/exynos: mixer: Use platform_get_irq() to get the interrupt
        drm/exynos/exynos7_drm_decon: Use platform_get_irq_byname() to get the interrupt
      c4fc118a
    • Merge tag 'pinctrl-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 0b7344a6
      Linus Torvalds authored
      Pull pin control fixes from Linus Walleij:
       "These two fixes should fix the issues seen on the OrangePi, first we
        needed the correct offset when calling pinctrl_gpio_direction(), and
        fixing that made a lockdep issue explode in our face. Both now fixed"
      
      * tag 'pinctrl-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: sunxi: Use unique lockdep classes for IRQs
        pinctrl-sunxi: sunxi_pinctrl_gpio_direction_in/output: use correct offset
      0b7344a6
    • tracing: Fix return value of __setup handlers · 1d02b444
      Randy Dunlap authored
      __setup() handlers should generally return 1 to indicate that the
      boot options have been handled.
      
      Using invalid option values causes the entire kernel boot option
      string to be reported as Unknown and added to init's environment
      strings, polluting it.
      
        Unknown kernel command line parameters "BOOT_IMAGE=/boot/bzImage-517rc6
          kprobe_event=p,syscall_any,$arg1 trace_options=quiet
          trace_clock=jiffies", will be passed to user space.
      
       Run /sbin/init as init process
         with arguments:
           /sbin/init
         with environment:
           HOME=/
           TERM=linux
           BOOT_IMAGE=/boot/bzImage-517rc6
           kprobe_event=p,syscall_any,$arg1
           trace_options=quiet
           trace_clock=jiffies
      
      Return 1 from the __setup() handlers so that init's environment is not
      polluted with kernel boot options.
      
      Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
      Link: https://lkml.kernel.org/r/20220303031744.32356-1-rdunlap@infradead.org
      
      Cc: stable@vger.kernel.org
      Fixes: 7bcfaf54 ("tracing: Add trace_options kernel command line parameter")
      Fixes: e1e232ca ("tracing: Add trace_clock=<clock> kernel parameter")
      Fixes: 970988e1 ("tracing/kprobe: Add kprobe_event= boot parameter")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      1d02b444
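The effect of a handler's return value can be sketched with a small userspace model, one option at a time. It assumes the convention stated in the commit message (1 means the option was handled) and that an unhandled option is passed on to init's environment. All names here (`setup_fn`, `struct setup_entry`, `dispatch_option()`, the two sample handlers) are hypothetical and do not mirror the kernel's actual boot-parameter code.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENV 8

/* Hypothetical handler type: returns 1 if the option was handled, else 0. */
typedef int (*setup_fn)(const char *value);

struct setup_entry {
	const char *prefix;	/* option name including '=', e.g. "trace_clock=" */
	setup_fn fn;
};

/* Handler that consumes the option even if the value is not usable. */
static int good_handler(const char *value)
{
	(void)value;
	return 1;
}

/* Handler that returns 0, so the option looks unknown to the caller. */
static int leaky_handler(const char *value)
{
	(void)value;
	return 0;
}

/*
 * Offer one "name=value" option to the table. If a matching handler
 * returns 1 the option is consumed; otherwise it is appended to env[],
 * modelling what ends up in init's environment. Returns the new count.
 */
static size_t dispatch_option(const struct setup_entry *tbl, size_t n,
			      const char *opt, const char *env[],
			      size_t env_count)
{
	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(tbl[i].prefix);

		if (strncmp(opt, tbl[i].prefix, len) == 0 &&
		    tbl[i].fn(opt + len))
			return env_count;	/* handled, not passed on */
	}
	if (env_count < MAX_ENV)
		env[env_count++] = opt;		/* unknown, passed to init */
	return env_count;
}
```

In this model, registering `leaky_handler` for `trace_clock=` leaves `trace_clock=jiffies` in the environment array, while `good_handler` keeps it out.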
    • mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls · 0708a0af
      Daniel Borkmann authored
      syzkaller was recently triggering an oversized kvmalloc() warning via
      xdp_umem_create().
      
      The triggered warning was added back in 7661809d ("mm: don't allow
      oversized kvmalloc() calls"). The rationale for the warning for huge
      kvmalloc sizes was as a reaction to a security bug where the size was
      more than UINT_MAX but not everything was prepared to handle unsigned
      long sizes.
      
      Anyway, the AF_XDP related call trace from this syzkaller report was:
      
        kvmalloc include/linux/mm.h:806 [inline]
        kvmalloc_array include/linux/mm.h:824 [inline]
        kvcalloc include/linux/mm.h:829 [inline]
        xdp_umem_pin_pages net/xdp/xdp_umem.c:102 [inline]
        xdp_umem_reg net/xdp/xdp_umem.c:219 [inline]
        xdp_umem_create+0x6a5/0xf00 net/xdp/xdp_umem.c:252
        xsk_setsockopt+0x604/0x790 net/xdp/xsk.c:1068
        __sys_setsockopt+0x1fd/0x4e0 net/socket.c:2176
        __do_sys_setsockopt net/socket.c:2187 [inline]
        __se_sys_setsockopt net/socket.c:2184 [inline]
        __x64_sys_setsockopt+0xb5/0x150 net/socket.c:2184
        do_syscall_x64 arch/x86/entry/common.c:50 [inline]
        do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Björn mentioned that requests for >2GB allocation can still be valid:
      
        The structure that is being allocated is the page-pinning accounting.
        AF_XDP has an internal limit of U32_MAX pages, which is *a lot*, but
        still fewer than what memcg allows (PAGE_COUNTER_MAX is a LONG_MAX/
        PAGE_SIZE on 64 bit systems). [...]
      
        I could just change from U32_MAX to INT_MAX, but as I stated earlier
        that has a hacky feeling to it. [...] From my perspective, the code
        isn't broken, with the memcg limits in consideration. [...]
      
      Linus says:
      
        [...] Pretty much every time this has come up, the kernel warning has
        shown that yes, the code was broken and there really wasn't a reason
        for doing allocations that big.
      
        Of course, some people would be perfectly fine with the allocation
        failing, they just don't want the warning. I didn't want __GFP_NOWARN
        to shut it up originally because I wanted people to see all those
        cases, but these days I think we can just say "yeah, people can shut
        it up explicitly by saying 'go ahead and fail this allocation, don't
        warn about it'".
      
        So enough time has passed that by now I'd certainly be ok with [it].
      
      Thus allow call-sites to silence such userspace triggered splats if the
      allocation requests have __GFP_NOWARN. For xdp_umem_pin_pages()'s call
      to kvcalloc() this is already the case, so nothing else needed there.
      
      Fixes: 7661809d ("mm: don't allow oversized kvmalloc() calls")
      Reported-by: <syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: <syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com>
      Cc: Björn Töpel <bjorn@kernel.org>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      Link: https://lore.kernel.org/bpf/CAJ+HfNhyfsT5cS_U9EC213ducHs9k9zNxX9+abqC0kTrPbQ0gg@mail.gmail.com
      Link: https://lore.kernel.org/bpf/20211201202905.b9892171e3f5b9a60f9da251@linux-foundation.org
      
      
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0708a0af
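The behaviour described in this commit can be modelled in userspace: an oversized request always fails, and the warning is skipped only when the caller passed the no-warn flag. This is a sketch under stated assumptions, not the kernel implementation: `model_kvmalloc()`, `MODEL_GFP_NOWARN` and `model_warnings` are hypothetical names, the flag value is arbitrary, `malloc()` stands in for the real allocator, and the counter increments on every warning rather than warning once.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag bit for this model; not the kernel's __GFP_NOWARN value. */
#define MODEL_GFP_NOWARN 0x1u

/* Counts the warnings this model would have emitted. */
static unsigned int model_warnings;

/*
 * Reject requests larger than INT_MAX. The request fails either way;
 * the flag only decides whether a warning is recorded.
 */
static void *model_kvmalloc(size_t size, unsigned int flags)
{
	if (size > (size_t)INT_MAX) {
		if (!(flags & MODEL_GFP_NOWARN))
			model_warnings++;
		return NULL;
	}
	return malloc(size);
}
```

A caller that can tolerate failure, such as the `kvcalloc()` call in `xdp_umem_pin_pages()` mentioned above, passes the flag and simply handles the NULL return.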