  1. Nov 16, 2021
    • bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs · 5e0bc308
      Dmitrii Banshchikov authored
      
      
      Use of bpf_ktime_get_coarse_ns() and bpf_timer_* helpers in tracing
      progs may result in locking issues.
      
      bpf_ktime_get_coarse_ns() uses the ktime_get_coarse_ns() time accessor,
      which is not safe to call from every context:
      ======================================================
      WARNING: possible circular locking dependency detected
      5.15.0-syzkaller #0 Not tainted
      ------------------------------------------------------
      syz-executor.4/14877 is trying to acquire lock:
      ffffffff8cb30008 (tk_core.seq.seqcount){----}-{0:0}, at: ktime_get_coarse_ts64+0x25/0x110 kernel/time/timekeeping.c:2255
      
      but task is already holding lock:
      ffffffff90dbf200 (&obj_hash[i].lock){-.-.}-{2:2}, at: debug_object_deactivate+0x61/0x400 lib/debugobjects.c:735
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&obj_hash[i].lock){-.-.}-{2:2}:
             lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5625
             __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
             _raw_spin_lock_irqsave+0xd1/0x120 kernel/locking/spinlock.c:162
             __debug_object_init+0xd9/0x1860 lib/debugobjects.c:569
             debug_hrtimer_init kernel/time/hrtimer.c:414 [inline]
             debug_init kernel/time/hrtimer.c:468 [inline]
             hrtimer_init+0x20/0x40 kernel/time/hrtimer.c:1592
             ntp_init_cmos_sync kernel/time/ntp.c:676 [inline]
             ntp_init+0xa1/0xad kernel/time/ntp.c:1095
             timekeeping_init+0x512/0x6bf kernel/time/timekeeping.c:1639
             start_kernel+0x267/0x56e init/main.c:1030
             secondary_startup_64_no_verify+0xb1/0xbb
      
      -> #0 (tk_core.seq.seqcount){----}-{0:0}:
             check_prev_add kernel/locking/lockdep.c:3051 [inline]
             check_prevs_add kernel/locking/lockdep.c:3174 [inline]
             validate_chain+0x1dfb/0x8240 kernel/locking/lockdep.c:3789
             __lock_acquire+0x1382/0x2b00 kernel/locking/lockdep.c:5015
             lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5625
             seqcount_lockdep_reader_access+0xfe/0x230 include/linux/seqlock.h:103
             ktime_get_coarse_ts64+0x25/0x110 kernel/time/timekeeping.c:2255
             ktime_get_coarse include/linux/timekeeping.h:120 [inline]
             ktime_get_coarse_ns include/linux/timekeeping.h:126 [inline]
             ____bpf_ktime_get_coarse_ns kernel/bpf/helpers.c:173 [inline]
             bpf_ktime_get_coarse_ns+0x7e/0x130 kernel/bpf/helpers.c:171
             bpf_prog_a99735ebafdda2f1+0x10/0xb50
             bpf_dispatcher_nop_func include/linux/bpf.h:721 [inline]
             __bpf_prog_run include/linux/filter.h:626 [inline]
             bpf_prog_run include/linux/filter.h:633 [inline]
             BPF_PROG_RUN_ARRAY include/linux/bpf.h:1294 [inline]
             trace_call_bpf+0x2cf/0x5d0 kernel/trace/bpf_trace.c:127
             perf_trace_run_bpf_submit+0x7b/0x1d0 kernel/events/core.c:9708
             perf_trace_lock+0x37c/0x440 include/trace/events/lock.h:39
             trace_lock_release+0x128/0x150 include/trace/events/lock.h:58
             lock_release+0x82/0x810 kernel/locking/lockdep.c:5636
             __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:149 [inline]
             _raw_spin_unlock_irqrestore+0x75/0x130 kernel/locking/spinlock.c:194
             debug_hrtimer_deactivate kernel/time/hrtimer.c:425 [inline]
             debug_deactivate kernel/time/hrtimer.c:481 [inline]
             __run_hrtimer kernel/time/hrtimer.c:1653 [inline]
             __hrtimer_run_queues+0x2f9/0xa60 kernel/time/hrtimer.c:1749
             hrtimer_interrupt+0x3b3/0x1040 kernel/time/hrtimer.c:1811
             local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
             __sysvec_apic_timer_interrupt+0xf9/0x270 arch/x86/kernel/apic/apic.c:1103
             sysvec_apic_timer_interrupt+0x8c/0xb0 arch/x86/kernel/apic/apic.c:1097
             asm_sysvec_apic_timer_interrupt+0x12/0x20
             __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline]
             _raw_spin_unlock_irqrestore+0xd4/0x130 kernel/locking/spinlock.c:194
             try_to_wake_up+0x702/0xd20 kernel/sched/core.c:4118
             wake_up_process kernel/sched/core.c:4200 [inline]
             wake_up_q+0x9a/0xf0 kernel/sched/core.c:953
             futex_wake+0x50f/0x5b0 kernel/futex/waitwake.c:184
             do_futex+0x367/0x560 kernel/futex/syscalls.c:127
             __do_sys_futex kernel/futex/syscalls.c:199 [inline]
             __se_sys_futex+0x401/0x4b0 kernel/futex/syscalls.c:180
             do_syscall_x64 arch/x86/entry/common.c:50 [inline]
             do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
             entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      There is a possible deadlock with bpf_timer_* set of helpers:
      hrtimer_start()
        lock_base();
        trace_hrtimer...()
          perf_event()
            bpf_run()
              bpf_timer_start()
                hrtimer_start()
                  lock_base()         <- DEADLOCK
      
      Forbid use of bpf_ktime_get_coarse_ns() and bpf_timer_* helpers in
      BPF_PROG_TYPE_KPROBE, BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT
      and BPF_PROG_TYPE_RAW_TRACEPOINT prog types.
      
      Fixes: d0551261 ("bpf: Add bpf_ktime_get_coarse_ns helper")
      Fixes: b00628b1 ("bpf: Introduce bpf timers.")
      Reported-by: <syzbot+43fd005b5a1b4d10781e@syzkaller.appspotmail.com>
      Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211113142227.566439-2-me@ubique.spb.ru
      5e0bc308
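The restriction described in the commit above can be pictured as a lookup that refuses certain helpers for certain program types. The following is a minimal userspace sketch of that idea only. It is not the kernel's verifier code: the enums, `helper_allowed()` and the other names are hypothetical stand-ins, and `PROG_OTHER` assumes that program types not named in the commit message are unaffected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the program types named in the commit message. */
enum prog_type {
	PROG_KPROBE,
	PROG_TRACEPOINT,
	PROG_PERF_EVENT,
	PROG_RAW_TRACEPOINT,
	PROG_OTHER,		/* assumption: any type the message does not name */
};

/* Hypothetical helper IDs; the real kernel uses BPF_FUNC_* values. */
enum helper_id {
	HELPER_KTIME_GET_NS,
	HELPER_KTIME_GET_COARSE_NS,
	HELPER_TIMER_INIT,
	HELPER_TIMER_SET_CALLBACK,
	HELPER_TIMER_START,
	HELPER_TIMER_CANCEL,
};

/* True for the four tracing program types listed in the commit message. */
bool is_tracing_prog(enum prog_type type)
{
	switch (type) {
	case PROG_KPROBE:
	case PROG_TRACEPOINT:
	case PROG_PERF_EVENT:
	case PROG_RAW_TRACEPOINT:
		return true;
	default:
		return false;
	}
}

/* True for the helpers the commit forbids in tracing programs. */
bool helper_is_restricted(enum helper_id id)
{
	switch (id) {
	case HELPER_KTIME_GET_COARSE_NS:
	case HELPER_TIMER_INIT:
	case HELPER_TIMER_SET_CALLBACK:
	case HELPER_TIMER_START:
	case HELPER_TIMER_CANCEL:
		return true;
	default:
		return false;
	}
}

/* A helper is rejected only when both conditions hold. */
bool helper_allowed(enum prog_type type, enum helper_id id)
{
	return !(is_tracing_prog(type) && helper_is_restricted(id));
}
```

In this model a kprobe program asking for `HELPER_TIMER_START` is refused, while the same program may still use `HELPER_KTIME_GET_NS`.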
  2. Nov 13, 2021
  3. Nov 12, 2021
  4. Nov 11, 2021
    • selftests/net: udpgso_bench_rx: fix port argument · d336509c
      Willem de Bruijn authored
      
      
      The below commit added optional support for passing a bind address.
      It configures the sockaddr bind arguments before parsing options and
      reconfigures on options -b and -4.
      
      This broke support for passing port (-p) on its own.
      
      Configure sockaddr after parsing all arguments.
      
      Fixes: 3327a9c4 ("selftests: add functionals test for UDP GRO")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d336509c
    • net: wwan: iosm: fix compilation warning · 29cd3867
      M Chetan Kumar authored
      
      
      curr_phase is unused. Removed the dead code.
      
      Fixes: 8d9be063 ("net: wwan: iosm: transport layer support for fw flashing/cd")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
      Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      29cd3867
    • cxgb4: fix eeprom len when diagnostics not implemented · 4ca110bf
      Rahul Lakkireddy authored
      
      
      Ensure diagnostics monitoring support is implemented for the SFF 8472
      compliant port module and set the correct length for ethtool port
      module eeprom read.
      
      Fixes: f56ec676 ("cxgb4: Add support for ethtool i2c dump")
      Signed-off-by: Manoj Malviya <manojmalviya@chelsio.com>
      Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ca110bf
    • net: fix premature exit from NAPI state polling in napi_disable() · 0315a075
      Alexander Lobakin authored
      Commit 719c5719 ("net: make napi_disable() symmetric with
      enable") accidentally introduced a bug sometimes leading to a kernel
      BUG when bringing an iface up/down under heavy traffic load.
      
      Prior to this commit, napi_disable() was polling n->state until
      none of (NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC) was set and then
      always flipped them. Now there's a possibility to get away with the
      NAPIF_STATE_SCHED unset as 'continue' drops us to the cmpxchg()
      call with an uninitialized variable, rather than straight to
      another round of the state check.
      
      Error path looks like:
      
      napi_disable():
      unsigned long val, new; /* new is uninitialized */
      
      do {
      	val = READ_ONCE(n->state); /* NAPIF_STATE_NPSVC and/or
      				      NAPIF_STATE_SCHED is set */
      	if (val & (NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC)) { /* true */
      		usleep_range(20, 200);
      		continue; /* go straight to the condition check */
      	}
      	new = val | <...>
      } while (cmpxchg(&n->state, val, new) != val); /* state == val, cmpxchg()
      						  writes garbage */
      
      napi_enable():
      do {
      	val = READ_ONCE(n->state);
      	BUG_ON(!test_bit(NAPI_STATE_SCHED, &val)); /* 50/50 boom */
      <...>
      
      while the typical BUG splat is like:
      
      [  172.652461] ------------[ cut here ]------------
      [  172.652462] kernel BUG at net/core/dev.c:6937!
      [  172.656914] invalid opcode: 0000 [#1] PREEMPT SMP PTI
      [  172.661966] CPU: 36 PID: 2829 Comm: xdp_redirect_cp Tainted: G          I       5.15.0 #42
      [  172.670222] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
      [  172.680646] RIP: 0010:napi_enable+0x5a/0xd0
      [  172.684832] Code: 07 49 81 cc 00 01 00 00 4c 89 e2 48 89 d8 80 e6 fb f0 48 0f b1 55 10 48 39 c3 74 10 48 8b 5d 10 f6 c7 04 75 3d f6 c3 01 75 b4 <0f> 0b 5b 5d 41 5c c3 65 ff 05 b8 e5 61 53 48 c7 c6 c0 f3 34 ad 48
      [  172.703578] RSP: 0018:ffffa3c9497477a8 EFLAGS: 00010246
      [  172.708803] RAX: ffffa3c96615a014 RBX: 0000000000000000 RCX: ffff8a4b575301a0
      < snip >
      [  172.782403] Call Trace:
      [  172.784857]  <TASK>
      [  172.786963]  ice_up_complete+0x6f/0x210 [ice]
      [  172.791349]  ice_xdp+0x136/0x320 [ice]
      [  172.795108]  ? ice_change_mtu+0x180/0x180 [ice]
      [  172.799648]  dev_xdp_install+0x61/0xe0
      [  172.803401]  dev_xdp_attach+0x1e0/0x550
      [  172.807240]  dev_change_xdp_fd+0x1e6/0x220
      [  172.811338]  do_setlink+0xee8/0x1010
      [  172.814917]  rtnl_setlink+0xe5/0x170
      [  172.818499]  ? bpf_lsm_binder_set_context_mgr+0x10/0x10
      [  172.823732]  ? security_capable+0x36/0x50
      < snip >
      
      Fix this by replacing 'do { } while (cmpxchg())' with an "infinite"
      for-loop with an explicit break.
      
      From v1 [0]:
       - just use a for-loop to simplify both the fix and the existing
         code (Eric).
      
      [0] https://lore.kernel.org/netdev/20211110191126.1214-1-alexandr.lobakin@intel.com
      
      
      
      Fixes: 719c5719 ("net: make napi_disable() symmetric with enable")
      Suggested-by: Eric Dumazet <edumazet@google.com> # for-loop
      Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20211110195605.1304-1-alexandr.lobakin@intel.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0315a075
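The loop shape chosen by the fix above (an "infinite" for-loop with an explicit break) can be modelled in userspace with C11 atomics. This is a sketch of the control flow only, not the kernel patch: the flag values, `model_napi_disable()` and the `wait_fn` callback are assumptions made so the example is self-contained, and the model sets only the two flags where the kernel code also adjusts other bits (elided as `<...>` in the commit message).

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative flag bits; the values are assumptions, not the kernel's. */
#define STATE_SCHED	(1UL << 0)
#define STATE_NPSVC	(1UL << 1)

/* Called while the state is busy; the kernel code sleeps at this point. */
typedef void (*wait_fn)(atomic_ulong *state);

/* Stand-in for a poller finishing its work and releasing both flags. */
void poller_finishes(atomic_ulong *state)
{
	atomic_fetch_and(state, ~(STATE_SCHED | STATE_NPSVC));
}

void model_napi_disable(atomic_ulong *state, wait_fn wait_for_poller)
{
	unsigned long val, new;

	for (;;) {
		val = atomic_load(state);
		if (val & (STATE_SCHED | STATE_NPSVC)) {
			wait_for_poller(state);
			/* Restarts at the load above; the compare-exchange
			 * below is never reached with 'new' unset.
			 */
			continue;
		}

		new = val | STATE_SCHED | STATE_NPSVC;

		/* Leave only once our update was actually stored. */
		if (atomic_compare_exchange_strong(state, &val, new))
			break;
	}
}
```

With a do/while loop, `continue` jumps to the loop condition, which is where the compare-exchange lived. In the for-loop form, `continue` can only return to the top of the body, so `new` is always computed before it is used.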
    • Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · debe436e
      Linus Torvalds authored
      Pull ext4 updates from Ted Ts'o:
       "Only bug fixes and cleanups for ext4 this merge window.
      
        Of note are fixes for the combination of the inline_data and
        fast_commit features, and more accurately calculating when to schedule
        additional lazy inode table init, especially when CONFIG_HZ is 100HZ"
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: fix error code saved on super block during file system abort
        ext4: inline data inode fast commit replay fixes
        ext4: commit inline data during fast commit
        ext4: scope ret locally in ext4_try_to_trim_range()
        ext4: remove an unused variable warning with CONFIG_QUOTA=n
        ext4: fix boolreturn.cocci warnings in fs/ext4/name.c
        ext4: prevent getting empty inode buffer
        ext4: move ext4_fill_raw_inode() related functions
        ext4: factor out ext4_fill_raw_inode()
        ext4: prevent partial update of the extent blocks
        ext4: check for inconsistent extents between index and leaf block
        ext4: check for out-of-order index extents in ext4_valid_extent_entries()
        ext4: convert from atomic_t to refcount_t on ext4_io_end->count
        ext4: refresh the ext4_ext_path struct after dropping i_data_sem.
        ext4: ensure enough credits in ext4_ext_shift_path_extents
        ext4: correct the left/middle/right debug message for binsearch
        ext4: fix lazy initialization next schedule time computation in more granular unit
        Revert "ext4: enforce buffer head state assertion in ext4_da_map_blocks"
      debe436e
    • Merge tag 'for-5.16-deadlock-fix-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 6070dcc8
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "Fix for a deadlock when direct/buffered IO is done on a mmaped file
        and a fault happens (details in the patch). There's a fstest
        generic/647 that triggers the problem and makes testing hard"
      
      * tag 'for-5.16-deadlock-fix-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix deadlock due to page faults during direct IO reads and writes
      6070dcc8
    • Merge tag 'nfsd-5.16' of git://linux-nfs.org/~bfields/linux · 38764c73
      Linus Torvalds authored
      Pull nfsd updates from Bruce Fields:
       "A slow cycle for nfsd: mainly cleanup, including Neil's patch dropping
        support for a filehandle format deprecated 20 years ago, and further
        xdr-related cleanup from Chuck"
      
      * tag 'nfsd-5.16' of git://linux-nfs.org/~bfields/linux: (26 commits)
        nfsd4: remove obselete comment
        nfsd: document server-to-server-copy parameters
        NFSD:fix boolreturn.cocci warning
        nfsd: update create verifier comment
        SUNRPC: Change return value type of .pc_encode
        SUNRPC: Replace the "__be32 *p" parameter to .pc_encode
        NFSD: Save location of NFSv4 COMPOUND status
        SUNRPC: Change return value type of .pc_decode
        SUNRPC: Replace the "__be32 *p" parameter to .pc_decode
        SUNRPC: De-duplicate .pc_release() call sites
        SUNRPC: Simplify the SVC dispatch code path
        SUNRPC: Capture value of xdr_buf::page_base
        SUNRPC: Add trace event when alloc_pages_bulk() makes no progress
        svcrdma: Split svcrmda_wc_{read,write} tracepoints
        svcrdma: Split the svcrdma_wc_send() tracepoint
        svcrdma: Split the svcrdma_wc_receive() tracepoint
        NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment()
        SUNRPC: xdr_stream_subsegment() must handle non-zero page_bases
        NFSD: Initialize pointer ni with NULL and not plain integer 0
        NFSD: simplify struct nfsfh
        ...
      38764c73
    • Merge tag 'nfs-for-5.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 2ec20f48
      Linus Torvalds authored
      Pull NFS client updates from Trond Myklebust:
       "Highlights include:
      
        Features:
         - NFSv4.1 can always retrieve and cache the ACCESS mode on OPEN
         - Optimisations for READDIR and the 'ls -l' style workload
         - Further replacements of dprintk() with tracepoints and other
           tracing improvements
         - Ensure we re-probe NFSv4 server capabilities when the user does a
           "mount -o remount"
      
        Bugfixes:
         - Fix an Oops in pnfs_mark_request_commit()
         - Fix up deadlocks in the commit code
         - Fix regressions in NFSv2/v3 attribute revalidation due to the
           change_attr_type optimisations
         - Fix some dentry verifier races
         - Fix some missing dentry verifier settings
         - Fix a performance regression in nfs_set_open_stateid_locked()
         - SUNRPC was sending multiple SYN calls when re-establishing a TCP
           connection.
         - Fix multiple NFSv4 issues due to missing sanity checking of server
           return values
         - Fix a potential Oops when FREE_STATEID races with an unmount
      
        Cleanups:
         - Clean up the labelled NFS code
         - Remove unused header <linux/pnfs_osd_xdr.h>"
      
      * tag 'nfs-for-5.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (84 commits)
        NFSv4: Sanity check the parameters in nfs41_update_target_slotid()
        NFS: Remove the nfs4_label argument from decode_getattr_*() functions
        NFS: Remove the nfs4_label argument from nfs_setsecurity
        NFS: Remove the nfs4_label argument from nfs_fhget()
        NFS: Remove the nfs4_label argument from nfs_add_or_obtain()
        NFS: Remove the nfs4_label argument from nfs_instantiate()
        NFS: Remove the nfs4_label from the nfs_setattrres
        NFS: Remove the nfs4_label from the nfs4_getattr_res
        NFS: Remove the f_label from the nfs4_opendata and nfs_openres
        NFS: Remove the nfs4_label from the nfs4_lookupp_res struct
        NFS: Remove the label from the nfs4_lookup_res struct
        NFS: Remove the nfs4_label from the nfs4_link_res struct
        NFS: Remove the nfs4_label from the nfs4_create_res struct
        NFS: Remove the nfs4_label from the nfs_entry struct
        NFS: Create a new nfs_alloc_fattr_with_label() function
        NFS: Always initialise fattr->label in nfs_fattr_alloc()
        NFSv4.2: alloc_file_pseudo() takes an open flag, not an f_mode
        NFS: Don't allocate nfs_fattr on the stack in __nfs42_ssc_open()
        NFSv4: Remove unnecessary 'minor version' check
        NFSv4: Fix potential Oops in decode_op_map()
        ...
      2ec20f48
    • Merge branch 'exit-cleanups-for-v5.16' of... · 5147da90
      Linus Torvalds authored
      Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
      
      Pull exit cleanups from Eric Biederman:
       "While looking at some issues related to the exit path in the kernel I
        found several instances where the code is not using the existing
        abstractions properly.
      
        This set of changes introduces force_fatal_sig, a way of sending a
        signal and not allowing it to be caught, and corrects the misuse of
        the existing abstractions that I found.
      
        A lot of the misuse of the existing abstractions are silly things such
        as doing something after calling a no return function, rolling BUG by
        hand, doing more work than necessary to terminate a kernel thread, or
        calling do_exit(SIGKILL) instead of calling force_sig(SIGKILL).
      
        In the review a deficiency in force_fatal_sig and force_sig_seccomp
        was found where ptrace or sigaction could prevent the delivery of
        the signal. I have added a change that introduces SA_IMMUTABLE,
        which makes it impossible to interrupt the delivery of those
        signals and allows backporting to fix force_sig_seccomp.
      
        And Arnd found an issue where a function passed to kthread_run had the
        wrong prototype, and after my cleanup was failing to build."
      
      * 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits)
        soc: ti: fix wkup_m3_rproc_boot_thread return type
        signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed
        signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
        exit/r8188eu: Replace the macro thread_exit with a simple return 0
        exit/rtl8712: Replace the macro thread_exit with a simple return 0
        exit/rtl8723bs: Replace the macro thread_exit with a simple return 0
        signal/x86: In emulate_vsyscall force a signal instead of calling do_exit
        signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig
        signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails
        exit/syscall_user_dispatch: Send ordinary signals on failure
        signal: Implement force_fatal_sig
        exit/kthread: Have kernel threads return instead of calling do_exit
        signal/s390: Use force_sigsegv in default_trap_handler
        signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
        signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
        signal/sparc: In setup_tsb_params convert open coded BUG into BUG
        signal/powerpc: On swapcontext failure force SIGSEGV
        signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
        signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
        signal/sparc32: Remove unreachable do_exit in do_sparc_fault
        ...
      5147da90
    • Merge tag 'kernel.sys.v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · a41b7445
      Linus Torvalds authored
      Pull prctl updates from Christian Brauner:
       "This contains the missing prctl uapi pieces for PR_SCHED_CORE.
      
        In order to activate core scheduling the caller is expected to specify
        the scope of the new core scheduling domain.
      
        For example, passing 2 in the 4th argument of
      
           prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, <pid>,  2, 0);
      
        would indicate that the new core scheduling domain encompasses all
        tasks in the process group of <pid>. Specifying 0 would only create a
        core scheduling domain for the thread identified by <pid> and 1 would
        encompass the whole thread-group of <pid>.
      
        Note, the values 0, 1, and 2 correspond to PIDTYPE_PID, PIDTYPE_TGID,
        and PIDTYPE_PGID. A first version tried to expose those values
        directly to which I objected because:
      
         - PIDTYPE_* is an enum that is kernel internal which we should not
           expose to userspace directly.
      
         - PIDTYPE_* indicates what a given struct pid is used for; it doesn't
           express a scope.
      
        But what the 4th argument of PR_SCHED_CORE prctl() expresses is the
        scope of the operation, i.e. the scope of the core scheduling domain
        at creation time. So Eugene's patch now simply introduces three new
        defines PR_SCHED_CORE_SCOPE_THREAD, PR_SCHED_CORE_SCOPE_THREAD_GROUP,
        and PR_SCHED_CORE_SCOPE_PROCESS_GROUP. They simply express what
        happens.
      
        This has been on the mailing list for quite a while with all relevant
        scheduler folks Cced. I announced multiple times that I'd pick this up
        if I don't see or hear anyone else doing it. None of this touches
        proper scheduler code but only concerns uapi so I think this is fine.
      
        With core scheduling being quite common now for vm managers (e.g.
        moving individual vcpu threads into their own core scheduling domain)
        and container managers (e.g. moving the init process into its own core
        scheduling domain and letting all created children inherit it) having
        to rely on raw numbers passed as the 4th argument in prctl() is a bit
        annoying and everyone is starting to come up with their own defines"
      
      * tag 'kernel.sys.v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        uapi/linux/prctl: provide macro definitions for the PR_SCHED_CORE type argument
      a41b7445
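As a rough illustration of the scope defines discussed in the pull message above, the sketch below wraps the prctl() call in a small function. The wrapper name `sched_core_create()` is hypothetical. The fallback `#define`s cover toolchains whose headers predate these macros; the `PR_SCHED_CORE_SCOPE_*` values follow the 0/1/2 mapping stated in the message. Whether the call succeeds depends on the running kernel being built with core scheduling support, so the result should be treated as advisory.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>
#include <sys/types.h>

/* Fallbacks for older userspace headers. */
#ifndef PR_SCHED_CORE
#define PR_SCHED_CORE				62
#endif
#ifndef PR_SCHED_CORE_CREATE
#define PR_SCHED_CORE_CREATE			1
#endif
#ifndef PR_SCHED_CORE_SCOPE_THREAD
#define PR_SCHED_CORE_SCOPE_THREAD		0
#endif
#ifndef PR_SCHED_CORE_SCOPE_THREAD_GROUP
#define PR_SCHED_CORE_SCOPE_THREAD_GROUP	1
#endif
#ifndef PR_SCHED_CORE_SCOPE_PROCESS_GROUP
#define PR_SCHED_CORE_SCOPE_PROCESS_GROUP	2
#endif

/*
 * Hypothetical wrapper: create a core scheduling domain for @pid with the
 * given scope. A @pid of 0 refers to the calling task.
 * Returns 0 on success or a negative errno value on failure.
 */
int sched_core_create(pid_t pid, int scope)
{
	if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE,
		  (unsigned long)pid, (unsigned long)scope, 0UL) == -1)
		return -errno;
	return 0;
}
```

A caller would then write `sched_core_create(pid, PR_SCHED_CORE_SCOPE_THREAD_GROUP)` instead of passing a bare `1` as the 4th argument.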
    • Merge tag 'pidfd.v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · 6752de1a
      Linus Torvalds authored
      Pull pidfd updates from Christian Brauner:
       "Various places in the kernel have picked up pidfds.
      
        The two most recent additions have probably been the ability to use
        pidfds in bpf maps and the usage of pidfds in mm-based syscalls such
        as process_mrelease() and process_madvise().
      
        The same pattern to turn a pidfd into a struct task exists in two
        places. One of those places used PIDTYPE_TGID while the other one used
        PIDTYPE_PID even though it is clearly documented in all pidfd-helpers
        that pidfds __currently__ only refer to thread-group leaders (subject
        to change in the future if need be).
      
        This isn't a bug per se but has the potential to be one if we allow
        pidfds to refer to individual threads. If that happens we want to
        audit all codepaths that make use of them to ensure they can deal with
        pidfds referring to individual threads.
      
        This adds a simple helper to turn a pidfd into a struct task making it
        easy to grep for such places. Plus, it gets rid of code-duplication"
      
      * tag 'pidfd.v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        mm: use pidfd_get_task()
        pid: add pidfd_get_task() helper
      6752de1a
    • Merge tag 'thermal-5.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 88100752
      Linus Torvalds authored
      Pull more thermal control updates from Rafael Wysocki:
       "These fix two issues in the thermal core and one in the int340x
        thermal driver.
      
        Specifics:
      
         - Replace pr_warn() with pr_warn_once() in user_space_bind() to
           reduce kernel log noise (Rafael Wysocki).
      
         - Extend the RFIM mailbox interface in the int340x thermal driver to
           return 64 bit values to allow all values returned by the hardware
           to be handled correctly (Srinivas Pandruvada).
      
         - Fix possible NULL pointer dereferences in the of_thermal_ family of
           functions (Subbaraman Narayanamurthy)"
      
      * tag 'thermal-5.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal: Replace pr_warn() with pr_warn_once() in user_space_bind()
        thermal: Fix NULL pointer dereferences in of_thermal_ functions
        thermal/drivers/int340x: processor_thermal: Suppot 64 bit RFIM responses
      88100752
    • Merge tag 'pm-5.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · d422555f
      Linus Torvalds authored
      Pull more power management updates from Rafael Wysocki:
       "These fix three intel_pstate driver regressions, fix locking in the
        core code suspending and resuming devices during system PM
        transitions, fix the handling of cpuidle drivers based on runtime PM
        during system-wide suspend, fix two issues in the operating
        performance points (OPP) framework and add resource-managed helpers to it.
      
        Specifics:
      
         - Fix two intel_pstate driver regressions related to the HWP
           interrupt handling added recently (Srinivas Pandruvada).
      
         - Fix intel_pstate driver regression introduced during the 5.11 cycle
           and causing HWP desired performance to be mishandled in some cases
           when switching driver modes and during system suspend and shutdown
           (Rafael Wysocki).
      
         - Fix system-wide device suspend and resume locking to avoid
           deadlocks when device objects are deleted during a system-wide PM
           transition (Rafael Wysocki).
      
         - Modify system-wide suspend of devices to prevent cpuidle drivers
           based on runtime PM from misbehaving during the "no IRQ" phase of
           it (Ulf Hansson).
      
         - Fix return value of _opp_add_static_v2() helper (YueHaibing).
      
         - Fix the required-opps phandle array count check (Pavankumar Kondeti).
      
         - Add resource managed OPP helpers, update dev_pm_opp_attach_genpd(),
           update their devfreq users, and make minor DT binding change
           (Dmitry Osipenko)"
      
      * tag 'pm-5.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM: sleep: Avoid calling put_device() under dpm_list_mtx
        cpufreq: intel_pstate: Clear HWP Status during HWP Interrupt enable
        cpufreq: intel_pstate: Fix unchecked MSR 0x773 access
        cpufreq: intel_pstate: Clear HWP desired on suspend/shutdown and offline
        PM: sleep: Fix runtime PM based cpuidle support
        dt-bindings: opp: Allow multi-worded OPP entry name
        opp: Fix return in _opp_add_static_v2()
        PM / devfreq: tegra30: Check whether clk_round_rate() returns zero rate
        PM / devfreq: tegra30: Use resource-managed helpers
        PM / devfreq: Add devm_devfreq_add_governor()
        opp: Add more resource-managed variants of dev_pm_opp_of_add_table()
        opp: Change type of dev_pm_opp_attach_genpd(names) argument
        opp: Fix required-opps phandle array count check
      d422555f
    • Merge tag 'acpi-5.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 285fc3db
      Linus Torvalds authored
      Pull more ACPI updates from Rafael Wysocki:
       "These add support for a new ACPI device configuration object called
        _DSC, fix some issues including one recent regression, add two new
        items to quirk lists and clean up assorted pieces of code.
      
        Specifics:
      
         - Add support for new ACPI device configuration object called _DSC
           ("Deepest State for Configuration") to allow certain devices to be
           probed without changing their power states, document it and make
           two drivers use it (Sakari Ailus, Rajmohan Mani).
      
         - Fix device wakeup power reference counting broken recently by
           mistake (Rafael Wysocki).
      
         - Drop unused symbol and macros depending on it from acgcc.h (Rafael
           Wysocki).
      
         - Add HP ZHAN 66 Pro to the "no EC wakeup" quirk list (Binbin Zhou).
      
         - Add Xiaomi Mi Pad 2 to the backlight quirk list and drop an unused
           piece of data from all of the list entries (Hans de Goede).
      
         - Fix register read accesses handling in the Intel PMIC operation
           region driver (Hans de Goede).
      
         - Clean up static variables initialization in the EC driver
           (wangzhitong)"
      
      * tag 'acpi-5.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        Documentation: ACPI: Fix non-D0 probe _DSC object example
        ACPI: Drop ACPI_USE_BUILTIN_STDARG ifdef from acgcc.h
        ACPI: PM: Fix device wakeup power reference counting error
        ACPI: video: use platform backlight driver on Xiaomi Mi Pad 2
        ACPI: video: Drop dmi_system_id.ident settings from video_detect_dmi_table[]
        ACPI: PMIC: Fix intel_pmic_regs_handler() read accesses
        ACPI: EC: Remove initialization of static variables to false
        ACPI: EC: Use ec_no_wakeup on HP ZHAN 66 Pro
        at24: Support probing while in non-zero ACPI D state
        media: i2c: imx319: Support device probe in non-zero ACPI D state
        ACPI: Add a convenience function to tell a device is in D0 state
        Documentation: ACPI: Document _DSC object usage for enum power state
        i2c: Allow an ACPI driver to manage the device's power state during probe
        ACPI: scan: Obtain device's desired enumeration power state
      285fc3db