  2. Oct 14, 2021
    • sched: Fill unconditional hole induced by sched_entity · 804bccba
      Kees Cook authored

      With struct sched_entity placed before the other scheduler entities, its
      64-byte alignment no longer induces a struct hole. This saves 64 bytes in
      the defconfig task_struct:
      
      Before:
      	...
              unsigned int               rt_priority;          /*   120     4 */
      
              /* XXX 4 bytes hole, try to pack */
      
              /* --- cacheline 2 boundary (128 bytes) --- */
              const struct sched_class  * sched_class;         /*   128     8 */
      
              /* XXX 56 bytes hole, try to pack */
      
              /* --- cacheline 3 boundary (192 bytes) --- */
              struct sched_entity        se __attribute__((__aligned__(64))); /*   192   448 */
              /* --- cacheline 10 boundary (640 bytes) --- */
              struct sched_rt_entity     rt;                   /*   640    48 */
              struct sched_dl_entity     dl __attribute__((__aligned__(8))); /*   688   224 */
              /* --- cacheline 14 boundary (896 bytes) was 16 bytes ago --- */
      
      After:
      	...
              unsigned int               rt_priority;          /*   120     4 */
      
              /* XXX 4 bytes hole, try to pack */
      
              /* --- cacheline 2 boundary (128 bytes) --- */
              struct sched_entity        se __attribute__((__aligned__(64))); /*   128   448 */
              /* --- cacheline 9 boundary (576 bytes) --- */
              struct sched_rt_entity     rt;                   /*   576    48 */
              struct sched_dl_entity     dl __attribute__((__aligned__(8))); /*   624   224 */
              /* --- cacheline 13 boundary (832 bytes) was 16 bytes ago --- */
      
      Summary diff:
      -	/* size: 7040, cachelines: 110, members: 188 */
      +	/* size: 6976, cachelines: 109, members: 188 */
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20210924025450.4138503-1-keescook@chromium.org
    • kernel/sched: Fix sched_fork() access an invalid sched_task_group · 4ef0c5c6
      Zhang Qiao authored

      There is a small race between copy_process() and sched_fork() where
      child->sched_task_group points to already-freed memory.
      
      	parent doing fork()      | someone moving the parent
      				 | to another cgroup
        -------------------------------+-------------------------------
        copy_process()
            + dup_task_struct()<1>
      				  parent move to another cgroup,
      				  and free the old cgroup. <2>
            + sched_fork()
      	+ __set_task_cpu()<3>
      	+ task_fork_fair()
      	  + sched_slice()<4>
      
      In the worst case, this bug can lead to a use-after-free and cause the
      panic shown below:
      
  (1) the parent copies its sched_task_group to the child at <1>;

  (2) someone moves the parent to another cgroup and frees the old
      cgroup at <2>;

  (3) the sched_task_group and cfs_rq that belong to the old cgroup
      are accessed at <3> and <4>, which causes a panic:
      
        [] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        [] PGD 8000001fa0a86067 P4D 8000001fa0a86067 PUD 2029955067 PMD 0
        [] Oops: 0000 [#1] SMP PTI
        [] CPU: 7 PID: 648398 Comm: ebizzy Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0.x86_64+ #1
        [] RIP: 0010:sched_slice+0x84/0xc0
      
        [] Call Trace:
        []  task_fork_fair+0x81/0x120
        []  sched_fork+0x132/0x240
        []  copy_process.part.5+0x675/0x20e0
        []  ? __handle_mm_fault+0x63f/0x690
        []  _do_fork+0xcd/0x3b0
        []  do_syscall_64+0x5d/0x1d0
        []  entry_SYSCALL_64_after_hwframe+0x65/0xca
        [] RIP: 0033:0x7f04418cd7e1
      
      Between cgroup_can_fork() and cgroup_post_fork(), the cgroup
      membership and thus sched_task_group can't change. So update the
      child's sched_task_group in sched_post_fork(), and move task_fork()
      and __set_task_cpu() (which access the sched_task_group) from
      sched_fork() to sched_post_fork().
      
      Fixes: 8323f26c ("sched: Fix race in task_group")
      Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Link: https://lkml.kernel.org/r/20210915064030.2231-1-zhangqiao22@huawei.com
    • sched/topology: Remove unused numa_distance in cpu_attach_domain() · f9ec6fea
      Yicong Yang authored

      numa_distance in cpu_attach_domain() was introduced in
      commit b5b21734 ("sched/topology: Warn when NUMA diameter > 2")
      to warn the user when the NUMA diameter is greater than 2, since
      the scheduler topology structures were misrepresented in that
      case. That was fixed by Barry in commit 585b6d27
      ("sched/topology: fix the issue groups don't span domain->span
      for NUMA diameter > 2"), so numa_distance is now unused.
      Remove it.
      
      Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Barry Song <baohua@kernel.org>
      Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
      Link: https://lore.kernel.org/r/20210915063158.80639-1-yangyicong@hisilicon.com
    • sched/numa: Fix a few comments · 7d380f24
      Bharata B Rao authored

      Fix a few comments to make them easier to understand.
      
      Signed-off-by: Bharata B Rao <bharata@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Link: https://lkml.kernel.org/r/20211004105706.3669-4-bharata@amd.com
    • sched/numa: Remove the redundant member numa_group::fault_cpus · 5b763a14
      Bharata B Rao authored

      numa_group::fault_cpus is actually a pointer to the region
      in numa_group::faults[] where NUMA_CPU stats are located.
      
      Remove this redundant member and use numa_group::faults[NUMA_CPU]
      directly, as is done for the similar per-process NUMA fault stats.
      
      There is no functionality change due to this commit.
      
      Signed-off-by: Bharata B Rao <bharata@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Link: https://lkml.kernel.org/r/20211004105706.3669-3-bharata@amd.com
    • sched/numa: Replace hard-coded number by a define in numa_task_group() · 7a2341fc
      Bharata B Rao authored

      While allocating group fault stats, task_numa_group() uses the
      hard-coded number 4. Replace it with NR_NUMA_HINT_FAULT_STATS.
      
      No functionality change in this commit.
      
      Signed-off-by: Bharata B Rao <bharata@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Link: https://lkml.kernel.org/r/20211004105706.3669-2-bharata@amd.com
    • sched,livepatch: Use wake_up_if_idle() · 5de62ea8
      Peter Zijlstra authored

      Make sure to prod idle CPUs so they call klp_update_patch_state().
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Miroslav Benes <mbenes@suse.cz>
      Acked-by: Vasily Gorbik <gor@linux.ibm.com>
      Tested-by: Petr Mladek <pmladek@suse.com>
      Tested-by: Vasily Gorbik <gor@linux.ibm.com> # on s390
      Link: https://lkml.kernel.org/r/20210929151723.162004989@infradead.org
  4. Oct 06, 2021
    • sched: Fix DEBUG && !SCHEDSTATS warn · 769fdf83
      Peter Zijlstra authored

      When !SCHEDSTATS, schedstat_enabled() is an unconditional 0 and the
      whole block doesn't exist; however, GCC figures the scoped variable
      'stats' is unused and complains about it.

      Demote the warning from -Wunused-variable to -Wunused-but-set-variable
      by writing the declaration and assignment as two statements. This
      fixes the build because the new warning is only enabled at W=1.
      
      Given that whole if(0) {} thing, I don't feel motivated to change
      things overly much and quite strongly feel this is the compiler being
      daft.
      
      Fixes: cb3e971c435d ("sched: Make struct sched_statistics independent of fair sched class")
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>