  1. Dec 15, 2020
  2. Dec 11, 2020
    • sched/fair: Trivial correction of the newidle_balance() comment · 5b78f2dc
      Barry Song authored
      
      
      idle_balance() has been renamed to newidle_balance(). To differentiate
      it from nohz_idle_balance(), refining the comment should help readers
      of the code.
      
      Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lkml.kernel.org/r/20201202220641.22752-1-song.bao.hua@hisilicon.com
    • sched/fair: Clear SMT siblings after determining the core is not idle · 13d5a5e9
      Mel Gorman authored
      
      
      The clearing of SMT siblings from the SIS mask before checking for an idle
      core is a small but unnecessary cost. Defer the clearing of the siblings
      until the scan moves to the next potential target. The cost of this was
      not measured as it is borderline noise but it should be self-evident.
      
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Link: https://lkml.kernel.org/r/20201130144020.GS3371@techsingularity.net
    • sched: Fix kernel-doc markup · 59a74b15
      Mauro Carvalho Chehab authored
      
      
      Kernel-doc requires that a kernel-doc markup be placed immediately
      before the function prototype, as otherwise it will rename it.
      So, move the sys_sched_yield() markup to the right place.
      
      Also fix the cpu_util() markup: Kernel-doc markups
      should use this format:
              identifier - description
      
      Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Link: https://lkml.kernel.org/r/50cd6f460aeb872ebe518a8e9cfffda2df8bdb0a.1606823973.git.mchehab+huawei@kernel.org
    • x86: Print ratio freq_max/freq_base used in frequency invariance calculations · 3149cd55
      Giovanni Gherdovich authored
      
      
      The value freq_max/freq_base is a fundamental component of frequency
      invariance calculations. It may come from a variety of sources such as MSRs
      or ACPI data, so tracking it down when troubleshooting a system could be
      non-trivial. It is worth saving it in the kernel logs.
      
       # dmesg | grep 'Estimated ratio of average max'
       [   14.024036] smpboot: Estimated ratio of average max frequency by base frequency (times 1024): 1289
      
      Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lkml.kernel.org/r/20201112182614.10700-4-ggherdovich@suse.cz
    • x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC · 976df7e5
      Giovanni Gherdovich authored
      
      
      Frequency invariant accounting calculations need the ratio
      freq_curr/freq_max, but freq_max is unknown as it depends on dynamic power
      allocation between cores: AMD EPYC CPUs implement "Core Performance Boost".
      Three candidates are considered to estimate this value:
      
      - maximum non-boost frequency
      - maximum boost frequency
      - the mid point between the above two
      
      Experimental data on an AMD EPYC Zen2 machine slightly favors the third
      option, which is applied with this patch.
      
      The analysis uses the ondemand cpufreq governor as baseline, and compares
      it with schedutil in a number of configurations. Using the freq_max value
      described above offers a moderate advantage in performance and efficiency:
      
      sugov-max (freq_max=max_boost) performs the worst on tbench: lower
      throughput and efficiency than the other invariant-schedutil options
      (see "Data Overview" below). Consider that tbench is generally a
      problematic case, as no schedutil version is currently better than ondemand.
      
      sugov-P0 (freq_max=max_P) is the worst on dbench, while the other sugov's
      can surpass ondemand with less filesystem latency and slightly increased
      efficiency.
      
      1. DATA OVERVIEW
      2. DETAILED PERFORMANCE TABLES
      3. POWER CONSUMPTION TABLE
      
      1. DATA OVERVIEW
      ================
      
      sugov-noinv : non-invariant schedutil governor
      sugov-max   : invariant schedutil, freq_max=max_boost
      sugov-mid   : invariant schedutil, freq_max=midpoint
      sugov-P0    : invariant schedutil, freq_max=max_P
      perfgov     : performance governor
      
      driver      : acpi_cpufreq
      machine     : AMD EPYC 7742 (Zen2, aka "Rome"), dual socket,
                    128 cores / 256 threads, SATA SSD storage, 250G of memory,
                    XFS filesystem
      
      Benchmarks are described in the next section.
      Tilde (~) means the value is the same as baseline.
      
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                  ondemand  perfgov  sugov-noinv  sugov-max  sugov-mid  sugov-P0  better if
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                              PERFORMANCE RATIOS
      tbench        1.00       1.44       0.90       0.87       0.93       0.93      higher
      dbench        1.00       0.91       0.95       0.94       0.94       1.06      lower
      kernbench     1.00       0.93       ~          ~          ~          0.97      lower
      gitsource     1.00       0.66       0.97       0.96       ~          0.95      lower
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                          PERFORMANCE-PER-WATT RATIOS
      tbench        1.00       1.16       0.84       0.84       0.88       0.85      higher
      dbench        1.00       1.03       1.02       1.02       1.02       0.93      higher
      kernbench     1.00       1.05       ~          ~          ~          ~         higher
      gitsource     1.00       1.46       1.04       1.04       ~          1.05      higher
      
      2. DETAILED PERFORMANCE TABLES
      ==============================
      
      Benchmark          : tbench4 (i.e. dbench4 over the network, actually loopback)
      Varying parameter  : number of clients
      Unit               : MB/sec (higher is better)
      
                        5.9.0-ondemand (BASELINE)                   5.9.0-perfgov               5.9.0-sugov-noinv
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      Hmean  1        427.19  +- 0.16% (        )     778.35  +- 0.10% (  82.20%)     346.92  +- 0.14% ( -18.79%)
      Hmean  2        853.82  +- 0.09% (        )    1536.23  +- 0.03% (  79.93%)     694.36  +- 0.05% ( -18.68%)
      Hmean  4       1657.54  +- 0.12% (        )    2938.18  +- 0.12% (  77.26%)    1362.81  +- 0.11% ( -17.78%)
      Hmean  8       3301.87  +- 0.06% (        )    5679.10  +- 0.04% (  72.00%)    2693.35  +- 0.04% ( -18.43%)
      Hmean  16      6139.65  +- 0.05% (        )    9498.81  +- 0.04% (  54.71%)    4889.97  +- 0.17% ( -20.35%)
      Hmean  32     11170.28  +- 0.09% (        )   17393.25  +- 0.08% (  55.71%)    9104.55  +- 0.09% ( -18.49%)
      Hmean  64     19322.97  +- 0.17% (        )   31573.91  +- 0.08% (  63.40%)   18552.52  +- 0.40% (  -3.99%)
      Hmean  128    30383.71  +- 0.11% (        )   37416.91  +- 0.15% (  23.15%)   25938.70  +- 0.41% ( -14.63%)
      Hmean  256    31143.96  +- 0.41% (        )   30908.76  +- 0.88% (  -0.76%)   29754.32  +- 0.24% (  -4.46%)
      Hmean  512    30858.49  +- 0.26% (        )   38524.60  +- 1.19% (  24.84%)   42080.39  +- 0.56% (  36.37%)
      Hmean  1024   39187.37  +- 0.19% (        )   36213.86  +- 0.26% (  -7.59%)   39555.98  +- 0.12% (   0.94%)
      
                                  5.9.0-sugov-max                 5.9.0-sugov-mid                  5.9.0-sugov-P0
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      Hmean  1        352.59  +- 1.03% ( -17.46%)     352.08  +- 0.75% ( -17.58%)     352.31  +- 1.48% ( -17.53%)
      Hmean  2        697.32  +- 0.08% ( -18.33%)     700.16  +- 0.20% ( -18.00%)     696.79  +- 0.06% ( -18.39%)
      Hmean  4       1369.88  +- 0.04% ( -17.35%)    1369.72  +- 0.07% ( -17.36%)    1365.91  +- 0.05% ( -17.59%)
      Hmean  8       2696.79  +- 0.04% ( -18.33%)    2711.06  +- 0.04% ( -17.89%)    2715.10  +- 0.61% ( -17.77%)
      Hmean  16      4725.03  +- 0.03% ( -23.04%)    4875.65  +- 0.02% ( -20.59%)    4953.05  +- 0.28% ( -19.33%)
      Hmean  32      9231.65  +- 0.10% ( -17.36%)    8704.89  +- 0.27% ( -22.07%)   10562.02  +- 0.36% (  -5.45%)
      Hmean  64     15364.27  +- 0.19% ( -20.49%)   17786.64  +- 0.15% (  -7.95%)   19665.40  +- 0.22% (   1.77%)
      Hmean  128    42100.58  +- 0.13% (  38.56%)   34946.28  +- 0.13% (  15.02%)   38635.79  +- 0.06% (  27.16%)
      Hmean  256    30660.23  +- 1.08% (  -1.55%)   32307.67  +- 0.54% (   3.74%)   31153.27  +- 0.12% (   0.03%)
      Hmean  512    24604.32  +- 0.14% ( -20.27%)   40408.50  +- 1.10% (  30.95%)   38800.29  +- 1.23% (  25.74%)
      Hmean  1024   35535.47  +- 0.28% (  -9.32%)   41070.38  +- 2.56% (   4.81%)   31308.29  +- 2.52% ( -20.11%)
      
      Benchmark          : dbench (filesystem stressor)
      Varying parameter  : number of clients
      Unit               : seconds (lower is better)
      
      NOTE-1: This dbench version measures the average latency of a set of filesystem
              operations, as we found the traditional dbench metric (throughput) to be
              misleading.
      NOTE-2: Due to high variability, we partition the original dataset and apply
              statistical bootstrapping (a resampling method). Accuracy is reported in the
              form of 95% confidence intervals.
      
                        5.9.0-ondemand (BASELINE)                   5.9.0-perfgov               5.9.0-sugov-noinv
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      SubAmean  1         98.79  +- 0.92 (        )      83.36  +- 0.82 (  15.62%)      84.82  +- 0.92 (  14.14%)
      SubAmean  2        116.00  +- 0.89 (        )     102.12  +- 0.77 (  11.96%)     109.63  +- 0.89 (   5.49%)
      SubAmean  4        149.90  +- 1.03 (        )     132.12  +- 0.91 (  11.86%)     143.90  +- 1.15 (   4.00%)
      SubAmean  8        182.41  +- 1.13 (        )     159.86  +- 0.93 (  12.36%)     165.82  +- 1.03 (   9.10%)
      SubAmean  16       237.83  +- 1.23 (        )     219.46  +- 1.14 (   7.72%)     229.28  +- 1.19 (   3.59%)
      SubAmean  32       334.34  +- 1.49 (        )     309.94  +- 1.42 (   7.30%)     321.19  +- 1.36 (   3.93%)
      SubAmean  64       576.61  +- 2.16 (        )     540.75  +- 2.00 (   6.22%)     551.27  +- 1.99 (   4.39%)
      SubAmean  128     1350.07  +- 4.14 (        )    1205.47  +- 3.20 (  10.71%)    1280.26  +- 3.75 (   5.17%)
      SubAmean  256     3444.42  +- 7.97 (        )    3698.00 +- 27.43 (  -7.36%)    3494.14  +- 7.81 (  -1.44%)
      SubAmean  2048   39457.89 +- 29.01 (        )   34105.33 +- 41.85 (  13.57%)   39688.52 +- 36.26 (  -0.58%)
      
                                  5.9.0-sugov-max                 5.9.0-sugov-mid                  5.9.0-sugov-P0
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      SubAmean  1         85.68  +- 1.04 (  13.27%)      84.16  +- 0.84 (  14.81%)      83.99  +- 0.90 (  14.99%)
      SubAmean  2        108.42  +- 0.95 (   6.54%)     109.91  +- 1.39 (   5.24%)     112.06  +- 0.91 (   3.39%)
      SubAmean  4        136.90  +- 1.04 (   8.67%)     137.59  +- 0.93 (   8.21%)     136.55  +- 0.95 (   8.91%)
      SubAmean  8        163.15  +- 0.96 (  10.56%)     166.07  +- 1.02 (   8.96%)     165.81  +- 0.99 (   9.10%)
      SubAmean  16       224.86  +- 1.12 (   5.45%)     223.83  +- 1.06 (   5.89%)     230.66  +- 1.19 (   3.01%)
      SubAmean  32       320.51  +- 1.38 (   4.13%)     322.85  +- 1.49 (   3.44%)     321.96  +- 1.46 (   3.70%)
      SubAmean  64       553.25  +- 1.93 (   4.05%)     554.19  +- 2.08 (   3.89%)     562.26  +- 2.22 (   2.49%)
      SubAmean  128     1264.35  +- 3.72 (   6.35%)    1256.99  +- 3.46 (   6.89%)    2018.97 +- 18.79 ( -49.55%)
      SubAmean  256     3466.25  +- 8.25 (  -0.63%)    3450.58  +- 8.44 (  -0.18%)    5032.12 +- 38.74 ( -46.09%)
      SubAmean  2048   39133.10 +- 45.71 (   0.82%)   39905.95 +- 34.33 (  -1.14%)   53811.86 +-193.04 ( -36.38%)
      
      Benchmark          : kernbench (kernel compilation)
      Varying parameter  : number of jobs
      Unit               : seconds (lower is better)
      
                        5.9.0-ondemand (BASELINE)                   5.9.0-perfgov               5.9.0-sugov-noinv
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      Amean  2        471.71 +- 26.61% (        )     409.88 +- 16.99% (  13.11%)     430.63  +- 0.18% (   8.71%)
      Amean  4        211.87  +- 0.58% (        )     194.03  +- 0.74% (   8.42%)     215.33  +- 0.64% (  -1.63%)
      Amean  8        109.79  +- 1.27% (        )     101.43  +- 1.53% (   7.61%)     111.05  +- 1.95% (  -1.15%)
      Amean  16        59.50  +- 1.28% (        )      55.61  +- 1.35% (   6.55%)      59.65  +- 1.78% (  -0.24%)
      Amean  32        34.94  +- 1.22% (        )      32.36  +- 1.95% (   7.41%)      35.44  +- 0.63% (  -1.43%)
      Amean  64        22.58  +- 0.38% (        )      20.97  +- 1.28% (   7.11%)      22.41  +- 1.73% (   0.74%)
      Amean  128       17.72  +- 0.44% (        )      16.68  +- 0.32% (   5.88%)      17.65  +- 0.96% (   0.37%)
      Amean  256       16.44  +- 0.53% (        )      15.76  +- 0.32% (   4.18%)      16.76  +- 0.60% (  -1.93%)
      Amean  512       16.54  +- 0.21% (        )      15.62  +- 0.41% (   5.53%)      16.84  +- 0.85% (  -1.83%)
      
                                  5.9.0-sugov-max                 5.9.0-sugov-mid                  5.9.0-sugov-P0
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      Amean  2        421.30  +- 0.24% (  10.69%)     419.26  +- 0.15% (  11.12%)     414.38  +- 0.33% (  12.15%)
      Amean  4        217.81  +- 5.53% (  -2.80%)     211.63  +- 0.99% (   0.12%)     208.43  +- 0.47% (   1.63%)
      Amean  8        108.80  +- 0.43% (   0.90%)     108.48  +- 1.44% (   1.19%)     108.59  +- 3.08% (   1.09%)
      Amean  16        58.84  +- 0.74% (   1.12%)      58.37  +- 0.94% (   1.91%)      57.78  +- 0.78% (   2.90%)
      Amean  32        34.04  +- 2.00% (   2.59%)      34.28  +- 1.18% (   1.91%)      33.98  +- 2.21% (   2.75%)
      Amean  64        22.22  +- 1.69% (   1.60%)      22.27  +- 1.60% (   1.38%)      22.25  +- 1.41% (   1.47%)
      Amean  128       17.55  +- 0.24% (   0.97%)      17.53  +- 0.94% (   1.04%)      17.49  +- 0.43% (   1.30%)
      Amean  256       16.51  +- 0.46% (  -0.40%)      16.48  +- 0.48% (  -0.19%)      16.44  +- 1.21% (   0.00%)
      Amean  512       16.50  +- 0.35% (   0.19%)      16.35  +- 0.42% (   1.14%)      16.37  +- 0.33% (   0.99%)
      
      Benchmark          : gitsource (time to run the git unit test suite)
      Varying parameter  : none
      Unit               : seconds (lower is better)
      
                        5.9.0-ondemand (BASELINE)                   5.9.0-perfgov               5.9.0-sugov-noinv
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      Amean          1035.76  +- 0.30% (        )     688.21  +- 0.04% (  33.56%)    1003.85  +- 0.14% (   3.08%)
      
                                  5.9.0-sugov-max                 5.9.0-sugov-mid                  5.9.0-sugov-P0
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      Amean           995.82  +- 0.08% (   3.86%)    1011.98  +- 0.03% (   2.30%)     986.87  +- 0.19% (   4.72%)
      
      3. POWER CONSUMPTION TABLE
      ==========================
      
      Average power consumption (watts).
      
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                  ondemand  perfgov  sugov-noinv  sugov-max  sugov-mid  sugov-P0
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      tbench4     227.25     281.83     244.17     236.76     241.50     247.99
      dbench4     151.97     161.87     157.08     158.10     158.06     153.73
      kernbench   162.78     167.22     162.90     164.19     164.65     164.72
      gitsource   133.65     139.00     133.04     134.43     134.18     134.32
      
      Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lkml.kernel.org/r/20201112182614.10700-3-ggherdovich@suse.cz
    • x86, sched: Calculate frequency invariance for AMD systems · 41ea6672
      Nathan Fontenot authored
      
      
      This is the first pass in creating the ability to calculate the
      frequency invariance on AMD systems. This approach uses the CPPC
      highest performance and nominal performance values that range from
      0 - 255 instead of a high and base frequency. This is because we do
      not have the ability on AMD to get a highest frequency value.
      
      On AMD systems the highest performance and nominal performance
      values do correspond to the highest and base frequencies for the system
      so using them should produce an appropriate ratio but some tweaking
      is likely necessary.
      
      Due to CPPC being initialized later in boot than when the frequency
      invariant calculation is currently made, I had to create a callback
      from the CPPC init code to do the calculation after we have CPPC
      data.
      
      Special thanks to "kernel test robot <lkp@intel.com>" for reporting that
      compilation of drivers/acpi/cppc_acpi.c is conditional on
      CONFIG_ACPI_CPPC_LIB, not just CONFIG_ACPI.
      
      [ ggherdovich@suse.cz: made safe under CPU hotplug, edited changelog. ]
      
      Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
      Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lkml.kernel.org/r/20201112182614.10700-2-ggherdovich@suse.cz
  3. Nov 27, 2020
  4. Nov 26, 2020
    • Merge tag 'media/v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · fa02fcd9
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
      
       - a rand Kconfig fixup for mtk-vcodec
      
       - a fix for h264 handling in the cedrus codec driver
      
       - some warning fixes for marvell-ccic when CONFIG_PM is not enabled
      
       - two fixes for the venus codec driver: one related to codec profile and
         the other related to a bad error path which causes an OOPS on module
         re-bind
      
      * tag 'media/v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        media: venus: pm_helpers: Fix kernel module reload
        media: venus: venc: Fix setting of profile and level
        media: cedrus: h264: Fix check for presence of scaling matrix
        media: media/platform/marvell-ccic: fix warnings when CONFIG_PM is not enabled
        media: mtk-vcodec: fix build breakage when one of VPU or SCP is enabled
        media: mtk-vcodec: move firmware implementations into their own files
  5. Nov 25, 2020
    • Merge tag '5.10-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 127c501a
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Four smb3 fixes for stable: one fixes a memleak, the other three
        address a problem found with decryption offload that can cause a use
        after free"
      
      * tag '5.10-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: Handle error case during offload read path
        smb3: Avoid Mid pending list corruption
        smb3: Call cifs reconnect from demultiplex thread
        cifs: fix a memleak with modefromsid
    • mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback) · 073861ed
      Hugh Dickins authored
      Twice now, when exercising ext4 looped on shmem huge pages, I have crashed
      on the PF_ONLY_HEAD check inside PageWaiters(): ext4_finish_bio() calling
      end_page_writeback() calling wake_up_page() on tail of a shmem huge page,
      no longer an ext4 page at all.
      
      The problem is that PageWriteback is not accompanied by a page reference
      (as the NOTE at the end of test_clear_page_writeback() acknowledges): as
      soon as TestClearPageWriteback has been done, that page could be removed
      from page cache, freed, and reused for something else by the time that
      wake_up_page() is reached.
      
      https://lore.kernel.org/linux-mm/20200827122019.GC14765@casper.infradead.org/
      Matthew Wilcox suggested avoiding or weakening the PageWaiters() tail
      check; but I'm paranoid about even looking at an unreferenced struct page,
      lest its memory might itself have already been reused or hotremoved (and
      wake_up_page_bit() may modify that memory with its ClearPageWaiters()).
      
      Then on crashing a second time, realized there's a stronger reason against
      that approach.  If my testing just occasionally crashes on that check,
      when the page is reused for part of a compound page, wouldn't it be much
      more common for the page to get reused as an order-0 page before reaching
      wake_up_page()?  And on rare occasions, might that reused page already be
      marked PageWriteback by its new user, and already be waited upon?  What
      would that look like?
      
      It would look like BUG_ON(PageWriteback) after wait_on_page_writeback()
      in write_cache_pages() (though I have never seen that crash myself).
      
      Matthew Wilcox explaining this to himself:
       "page is allocated, added to page cache, dirtied, writeback starts,
      
        --- thread A ---
        filesystem calls end_page_writeback()
              test_clear_page_writeback()
        --- context switch to thread B ---
        truncate_inode_pages_range() finds the page, it doesn't have writeback set,
        we delete it from the page cache.  Page gets reallocated, dirtied, writeback
        starts again.  Then we call write_cache_pages(), see
        PageWriteback() set, call wait_on_page_writeback()
        --- context switch back to thread A ---
        wake_up_page(page, PG_writeback);
        ... thread B is woken, but because the wakeup was for the old use of
        the page, PageWriteback is still set.
      
        Devious"
      
      And prior to 2a9127fc ("mm: rewrite wait_on_page_bit_common() logic")
      this would have been much less likely: before that, wake_page_function()'s
      non-exclusive case would stop walking and not wake if it found Writeback
      already set again; whereas now the non-exclusive case proceeds to wake.
      
      I have not thought of a fix that does not add a little overhead: the
      simplest fix is for end_page_writeback() to get_page() before calling
      test_clear_page_writeback(), then put_page() after wake_up_page().
      
      Was there a chance of missed wakeups before, since a page freed before
      reaching wake_up_page() would have PageWaiters cleared?  I think not,
      because each waiter does hold a reference on the page.  This bug comes
      when the old use of the page, the one we do TestClearPageWriteback on,
      had *no* waiters, so no additional page reference beyond the page cache
      (and whoever racily freed it).  The reuse of the page has a waiter
      holding a reference, and its own PageWriteback set; but the belated
      wake_up_page() has woken the reuse to hit that BUG_ON(PageWriteback).
      
      Reported-by: <syzbot+3622cea378100f45d59f@syzkaller.appspotmail.com>
      Reported-by: Qian Cai <cai@lca.pw>
      Fixes: 2a9127fc ("mm: rewrite wait_on_page_bit_common() logic")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org # v5.8+
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 's390-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 80145ac2
      Linus Torvalds authored
      Pull s390 fix from Heiko Carstens:
       "Disable interrupts when restoring fpu and vector registers, otherwise
        KVM guests might see corrupted register contents"
      
      * tag 's390-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390: fix fpu restore in entry.S
    • Merge tag 'arc-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · b1489422
      Linus Torvalds authored
      Pull ARC fixes from Vineet Gupta:
       "A couple more stack unwinder related fixes:
      
         - More stack unwinding updates
      
         - Misc minor fixes"
      
      * tag 'arc-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
        ARC: stack unwinding: reorganize how initial register state setup
        ARC: stack unwinding: don't assume non-current task is sleeping
        ARC: mm: fix spelling mistakes
        ARC: bitops: Remove unecessary operation and value
  6. Nov 24, 2020
  7. Nov 23, 2020
    • Merge branch 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm · 05b8955f
      Rafael J. Wysocki authored
      Pull SCMI cpufreq driver fix for 5.10-rc6 from Viresh Kumar:
      
      "This fixes a build issue with the SCMI cpufreq driver in the
       !CONFIG_COMMON_CLK case."
      
      * 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
        cpufreq: scmi: Fix build for !CONFIG_COMMON_CLK
    • s390: fix fpu restore in entry.S · 1179f170
      Sven Schnelle authored
      We need to disable interrupts in load_fpu_regs(). Otherwise an
      interrupt might come in after the registers are loaded, but before
      CIF_FPU is cleared in load_fpu_regs(). When the interrupt returns,
      CIF_FPU will be cleared and the registers will never be restored.
      
      The entry.S code usually saves the interrupt state in __SF_EMPTY on the
      stack when disabling/restoring interrupts. sie64a however saves the pointer
      to the sie control block in __SF_SIE_CONTROL, which references the same
      location.  This is non-obvious to the reader. To avoid thrashing the sie
      control block pointer in load_fpu_regs(), move the __SIE_* offsets eight
      bytes after __SF_EMPTY on the stack.
      
      Cc: <stable@vger.kernel.org> # 5.8
      Fixes: 0b0ed657 ("s390: remove critical section cleanup from entry.S")
      Reported-by: Pierre Morel <pmorel@linux.ibm.com>
      Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
    • cpufreq: scmi: Fix build for !CONFIG_COMMON_CLK · f943849f
      Sudeep Holla authored
      Commit 8410e7f3 ("cpufreq: scmi: Fix OPP addition failure with a
      dummy clock provider") registers a dummy clock provider using
      devm_of_clk_add_hw_provider. These *_hw_provider functions are defined
      only when CONFIG_COMMON_CLK=y. One possible fix is to add the Kconfig
      dependency, but since we plan to move away from the clock dependency
      for scmi cpufreq, it is preferable to avoid that.
      
      Let us just conditionally compile out the offending call to
      devm_of_clk_add_hw_provider. It also uses the variable 'dev' outside
      of the #ifdef block to avoid a build warning.
      
      Fixes: 8410e7f3 ("cpufreq: scmi: Fix OPP addition failure with a dummy clock provider")
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
    • Linux 5.10-rc5 · 418baf2c
      Linus Torvalds authored
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · d5530d82
      Linus Torvalds authored
      Pull HID fixes from Jiri Kosina:
      
       - Various functionality / regression fixes for Logitech devices from
         Hans de Goede
      
       - Fix for (recently added) GPIO support in mcp2221 driver from Lars
         Povlsen
      
       - Power management handling fix/quirk in i2c-hid driver for certain
         BIOSes that have a strange approach to power-cycling, from Hans de Goede
      
       - a few device ID additions and device-specific quirks
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        HID: logitech-dj: Fix Dinovo Mini when paired with a MX5x00 receiver
        HID: logitech-dj: Fix an error in mse_bluetooth_descriptor
        HID: Add Logitech Dinovo Edge battery quirk
        HID: logitech-hidpp: Add HIDPP_CONSUMER_VENDOR_KEYS quirk for the Dinovo Edge
        HID: logitech-dj: Handle quad/bluetooth keyboards with a builtin trackpad
        HID: add HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE for Gamevice devices
        HID: mcp2221: Fix GPIO output handling
        HID: hid-sensor-hub: Fix issue with devices with no report ID
        HID: i2c-hid: Put ACPI enumerated devices in D3 on shutdown
        HID: add support for Sega Saturn
        HID: cypress: Support Varmilo Keyboards' media hotkeys
        HID: ite: Replace ABS_MISC 120/121 events with touchpad on/off keypresses
        HID: logitech-hidpp: Add PID for MX Anywhere 2
        HID: uclogic: Add ID for Trust Flex Design Tablet
    • Merge tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f4b936f5
      Linus Torvalds authored
      Pull scheduler fixes from Thomas Gleixner:
       "A couple of scheduler fixes:
      
         - Make the conditional update of the overutilized state work
           correctly by caching the relevant flags state before overwriting
           them and checking them afterwards.
      
         - Fix a data race in the wakeup path which caused loadavg on ARM64
           platforms to become a random number generator.
      
         - Fix the ordering of the iowaiter accounting operations so it can't
           be decremented before it is incremented.
      
         - Fix a bug in the deadline scheduler vs. priority inheritance when a
           non-deadline task A has inherited the parameters of a deadline task
           B and then blocks on a non-deadline task C.
      
           The second inheritance step used the static deadline parameters of
           task A, which are usually 0, instead of further propagating task
           B's parameters. The zero initialized parameters trigger a bug in
           the deadline scheduler"
      
      * tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/deadline: Fix priority inheritance with multiple scheduling classes
        sched: Fix rq->nr_iowait ordering
        sched: Fix data-race in wakeup
        sched/fair: Fix overutilized update in enqueue_task_fair()
      f4b936f5
    • Linus Torvalds's avatar
      Merge tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 48da3305
      Linus Torvalds authored
      Pull perf fix from Thomas Gleixner:
       "A single fix for the x86 perf sysfs interfaces which used kobject
        attributes instead of device attributes and therefore upset clang's
        control flow integrity checker"
      
      * tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86: fix sysfs type mismatches
      48da3305
    • Linus Torvalds's avatar
      Merge tag 'locking-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 855cf1ee
      Linus Torvalds authored
      Pull locking fix from Thomas Gleixner:
       "A single fix for lockdep which makes the recursion protection cover
        graph lock/unlock"
      
      * tag 'locking-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        lockdep: Put graph lock/unlock under lock_recursion protection
      855cf1ee
    • Linus Torvalds's avatar
      Merge tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 68d3fa23
      Linus Torvalds authored
      Pull EFI fixes from Borislav Petkov:
       "Forwarded EFI fixes from Ard Biesheuvel:
      
         - fix memory leak in efivarfs driver
      
         - fix HYP mode issue in 32-bit ARM version of the EFI stub when built
           in Thumb2 mode
      
         - avoid leaking EFI pgd pages on allocation failure"
      
      * tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/x86: Free efi_pgd with free_pages()
        efivarfs: fix memory leak in efivarfs_create()
        efi/arm: set HSCTLR Thumb2 bit correctly for HVC calls from HYP
      68d3fa23
    • Linus Torvalds's avatar
      Merge tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7d53be55
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
      
       - An IOMMU VT-d build fix when CONFIG_PCI_ATS=n along with a revert of
         same because the proper one is going through the IOMMU tree (Thomas
         Gleixner)
      
       - An Intel microcode loader fix to save the correct microcode patch to
         apply during resume (Chen Yu)
      
       - A fix to not access user memory of other processes when dumping
         opcode bytes (Thomas Gleixner)
      
      * tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        Revert "iommu/vt-d: Take CONFIG_PCI_ATS into account"
        x86/dumpstack: Do not try to access user space code of other tasks
        x86/microcode/intel: Check patch signature before saving microcode for early loading
        iommu/vt-d: Take CONFIG_PCI_ATS into account
      7d53be55
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 4a51c60a
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "8 patches.
      
        Subsystems affected by this patch series: mm (madvise, pagemap,
        readahead, memcg, userfaultfd), kbuild, and vfs"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: fix madvise WILLNEED performance problem
        libfs: fix error cast of negative value in simple_attr_write()
        mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
        mm: memcg/slab: fix root memcg vmstats
        mm: fix readahead_page_batch for retry entries
        mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
        compiler-clang: remove version check for BPF Tracing
        mm/madvise: fix memory leak from process_madvise
      4a51c60a
    • Linus Torvalds's avatar
      Merge tag 'staging-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · d27637ec
      Linus Torvalds authored
      Pull staging and IIO fixes from Greg KH:
       "Here are some small Staging and IIO driver fixes for 5.10-rc5. They
        include:
      
         - IIO fixes for reported regressions and problems
      
         - new device ids for IIO drivers
      
         - new device id for rtl8723bs driver
      
         - staging ralink driver Kconfig dependency fix
      
         - staging mt7621-pci bus resource fix
      
        All of these have been in linux-next all week with no reported issues"
      
      * tag 'staging-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        iio: accel: kxcjk1013: Add support for KIOX010A ACPI DSM for setting tablet-mode
        iio: accel: kxcjk1013: Replace is_smo8500_device with an acpi_type enum
        docs: ABI: testing: iio: stm32: remove re-introduced unsupported ABI
        iio: light: fix kconfig dependency bug for VCNL4035
        iio/adc: ingenic: Fix AUX/VBAT readings when touchscreen is used
        iio/adc: ingenic: Fix battery VREF for JZ4770 SoC
        staging: rtl8723bs: Add 024c:0627 to the list of SDIO device-ids
        staging: ralink-gdma: fix kconfig dependency bug for DMA_RALINK
        staging: mt7621-pci: avoid to request pci bus resources
        iio: imu: st_lsm6dsx: set 10ms as min shub slave timeout
        counter/ti-eqep: Fix regmap max_register
        iio: adc: stm32-adc: fix a regression when using dma and irq
        iio: adc: mediatek: fix unset field
        iio: cros_ec: Use default frequencies when EC returns invalid information
      d27637ec
    • Linus Torvalds's avatar
      Merge tag 'tty-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · de758035
      Linus Torvalds authored
      Pull tty fixes from Greg KH:
       "Here are some small tty/serial fixes for 5.10-rc5 that resolve some
        reported issues:
      
         - speakup crash when telling the kernel to use a device that isn't
           really there
      
         - imx serial driver fixes for reported problems
      
         - ar933x_uart driver fix for probe error handling path
      
        All have been in linux-next for a while with no reported issues"
      
      * tag 'tty-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        serial: ar933x_uart: disable clk on error handling path in probe
        tty: serial: imx: keep console clocks always on
        speakup: Do not let the line discipline be used several times
        tty: serial: imx: fix potential deadlock
      de758035
    • Linus Torvalds's avatar
      Merge tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · a7f07fc1
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "A final set of miscellaneous bug fixes for ext4"
      
      * tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: fix bogus warning in ext4_update_dx_flag()
        jbd2: fix kernel-doc markups
        ext4: drop fast_commit from /proc/mounts
      a7f07fc1
    • David Howells's avatar
      afs: Fix speculative status fetch going out of order wrt to modifications · a9e5c87c
      David Howells authored
      When doing a lookup in a directory, the afs filesystem uses a bulk
      status fetch to speculatively retrieve the statuses of up to 48 other
      vnodes found in the same directory and it will then either update extant
      inodes or create new ones - effectively doing 'lookup ahead'.
      
      To avoid the possibility of deadlocking itself, however, the filesystem
      doesn't lock all of those inodes; rather just the directory inode is
      locked (by the VFS).
      
      When the operation completes, afs_inode_init_from_status() or
      afs_apply_status() is called, depending on whether the inode already
      exists, to commit the new status.
      
      A case exists, however, where the speculative status fetch operation may
      straddle a modification operation on one of those vnodes.  What can then
      happen is that the speculative bulk status RPC retrieves the old status
      and, whilst that is in progress, the modification takes place and returns
      an updated status.  The modification status is committed first, and then
      we attempt to commit the stale speculative status.
      
      This results in something like the following being seen in dmesg:
      
      	kAFS: vnode modified {100058:861} 8->9 YFS.InlineBulkStatus
      
      showing that for vnode 861 on volume 100058, we saw YFS.InlineBulkStatus
      say that the vnode had data version 8 when we'd already recorded version
      9 due to a local modification.  This was causing the cache to be
      invalidated for that vnode when it shouldn't have been.  If it happens
      on a data file, this might lead to local changes being lost.
      
      Fix this by ignoring speculative status updates if the data version
      doesn't match the expected value.
      
      Note that it is possible to get a DV regression if a volume gets
      restored from a backup - but we should get a callback break in such a
      case that should trigger a recheck anyway.  It might be worth checking
      the volume creation time in the volsync info and, if a change is
      observed in that (as would happen on a restore), invalidate all caches
      associated with the volume.
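
      The rule described above (ignore a speculative result when the data
      version no longer matches what was expected) can be sketched in plain
      C. This is a minimal illustration, not the kernel's afs code: the
      struct names, the dv_before field and apply_status() are hypothetical
      stand-ins, and non-speculative results are simply applied
      unconditionally here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the kernel's afs structures. */
struct vnode_cache {
	uint64_t data_version;	/* DV currently recorded for the local vnode */
};

struct fetched_status {
	uint64_t data_version;	/* DV reported by the server in this reply */
	uint64_t dv_before;	/* local DV sampled when the fetch was issued */
	bool speculative;	/* true for lookup-ahead bulk status replies */
};

/*
 * Commit a fetched status to the local vnode.
 * Returns true if the status was applied, false if it was ignored.
 */
static bool apply_status(struct vnode_cache *v, const struct fetched_status *s)
{
	/*
	 * A speculative reply is only trusted if nothing changed the vnode
	 * while the fetch was in flight, i.e. the local DV is still the one
	 * sampled before the fetch started.
	 */
	if (s->speculative && v->data_version != s->dv_before)
		return false;

	v->data_version = s->data_version;
	return true;
}
```

      In the scenario from the dmesg line above, the fetch is issued while
      the local DV is 8, a local modification then records DV 9, and the
      late speculative reply (DV 8, dv_before 8) is dropped instead of
      invalidating the cache.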
      
      Fixes: 5cf9dd55 ("afs: Prospectively look up extra files when doing a single lookup")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a9e5c87c
    • Matthew Wilcox (Oracle)'s avatar
      mm: fix madvise WILLNEED performance problem · 66383800
      Matthew Wilcox (Oracle) authored
      The calculation of the end page index was incorrect, leading to a
      regression of 70% when running stress-ng.
      
      With this fix, we instead see a performance improvement of 3%.
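
      The commit message does not spell out what the miscalculation was. As
      a generic illustration only, the sketch below shows one way to derive
      the inclusive index of the last file page covered by a virtual address
      range, and why dividing the end address directly by the page size
      would give a far larger index. The struct, the helper names and the
      4 KiB page size are assumptions made for this example, not the
      kernel's definitions.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the parts of a mapping that matter here. */
struct mapping_sketch {
	uint64_t vm_start;	/* first virtual address of the mapping */
	uint64_t vm_pgoff;	/* file page index mapped at vm_start */
};

/* File page index that backs virtual address addr within the mapping. */
static uint64_t page_index_of(const struct mapping_sketch *m, uint64_t addr)
{
	return ((addr - m->vm_start) >> SKETCH_PAGE_SHIFT) + m->vm_pgoff;
}

/* Inclusive index of the last file page covered by the range ending at end
 * (end is exclusive, so the last byte in the range is end - 1). */
static uint64_t last_page_index(const struct mapping_sketch *m, uint64_t end)
{
	return page_index_of(m, end - 1);
}
```

      For a mapping at 0x7f0000000000 with vm_pgoff 10, a three-page range
      ends at file page 12, whereas shifting the raw end address by the
      page shift would yield an index in the billions and make any loop
      over the range scan far more of the file than intended.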
      
      Fixes: e6e88712 ("mm: optimise madvise WILLNEED")
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: "Chen, Rong A" <rong.a.chen@intel.com>
      Link: https://lkml.kernel.org/r/20201109134851.29692-1-willy@infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      66383800