  1. Jul 09, 2016
    • sched/cpuacct: Introduce cpuacct.usage_all to show all CPU stats together · 277a13e4
      Zhao Lei authored
      
      
      In the current code, cpuacct data is spread across several files,
      and each file has its own limitations.
      
      For example:
      
       - We can get CPU usage in user and kernel mode via cpuacct.stat,
         but we can't get detailed data about each CPU.
      
       - We can get each CPU's kernel mode usage in cpuacct.usage_percpu_sys,
         but we can't get user mode usage data at the same time.
      
      This patch introduces cpuacct.usage_all, which shows all detailed CPU
      accounting data together:
      
       # cat cpuacct.usage_all
       cpu user system
       0 3809760299 5807968992
       1 3250329855 454612211
       ..
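      As an illustration of consuming the new file, here is a minimal C
      sketch that parses one row of cpuacct.usage_all as shown above. The
      helper name parse_usage_all_row() is hypothetical, and the sketch
      assumes the whitespace-separated "<cpu> <user> <system>" layout of the
      sample output. The commit does not state the unit; cpuacct.usage
      reports nanoseconds, and these columns are presumably the same.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse one data row of cpuacct.usage_all: "<cpu> <user> <system>".
 * Returns 1 on success and 0 for the "cpu user system" header or any
 * other line that does not match. Hypothetical helper, not a kernel API.
 */
static int parse_usage_all_row(const char *line, unsigned int *cpu,
                               unsigned long long *user,
                               unsigned long long *sys)
{
    return sscanf(line, "%u %llu %llu", cpu, user, sys) == 3;
}
```

      A caller would read the file line by line (for example with fgets())
      and skip rows for which the helper returns 0, such as the header.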
      
      Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/7744460969edd7caaf0e903592ee52353ed9bdd6.1466415271.git.zhaolei@cn.fujitsu.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/cpuacct: Use loop to consolidate code in cpuacct_stats_show() · 8e546bfa
      Zhao Lei authored
      
      
      In cpuacct_stats_show() we currently have copies of similar code
      for each cpustat (system/user) variant.
      
      Use a loop instead to consolidate the code. This will also work better
      if we extend the CPUACCT_STAT_NSTATS type.
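      The consolidation can be sketched in user-space C. The enum and name
      table reuse identifiers mentioned in these commits, but their exact
      layout and the helper format_stats() are assumptions; the kernel
      version writes through seq_file and may convert units, which is
      omitted here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Identifiers follow the commit text; the exact layout is assumed. */
enum cpuacct_stat_index {
    CPUACCT_STAT_USER,
    CPUACCT_STAT_SYSTEM,
    CPUACCT_STAT_NSTATS,
};

static const char *const cpuacct_stat_desc[CPUACCT_STAT_NSTATS] = {
    [CPUACCT_STAT_USER] = "user",
    [CPUACCT_STAT_SYSTEM] = "system",
};

/*
 * One loop replaces the per-stat copies of the formatting code; adding an
 * index before CPUACCT_STAT_NSTATS only needs a new name entry.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int format_stats(const long long val[CPUACCT_STAT_NSTATS],
                        char *buf, size_t len)
{
    size_t off = 0;

    for (int stat = 0; stat < CPUACCT_STAT_NSTATS; stat++) {
        int n = snprintf(buf + off, len - off, "%s %lld\n",
                         cpuacct_stat_desc[stat], val[stat]);
        if (n < 0 || (size_t)n >= len - off)
            return -1;
        off += (size_t)n;
    }
    return (int)off;
}
```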
      
      Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/b0597d4224655e9f333f1a6224ed9654c7d7d36a.1466415271.git.zhaolei@cn.fujitsu.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/cpuacct: Merge cpuacct_usage_index and cpuacct_stat_index enums · 9acacc2a
      Zhao Lei authored
      
      
      These two types serve similar functions, so there is no need to keep them separate.
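      One consequence of a single merged enum is that the terminating
      CPUACCT_STAT_NSTATS value can double as an "all stats" selector. The
      sketch below shows that idea with a hypothetical struct cpuusage and
      cpuusage_read() helper; it is modeled on the description, not copied
      from the patch.

```c
#include <assert.h>

enum cpuacct_stat_index {
    CPUACCT_STAT_USER,
    CPUACCT_STAT_SYSTEM,
    CPUACCT_STAT_NSTATS,
};

/* Hypothetical per-CPU usage record indexed by the single merged enum. */
struct cpuusage {
    unsigned long long usages[CPUACCT_STAT_NSTATS];
};

/*
 * Read one stat, or the sum of all stats when index == CPUACCT_STAT_NSTATS,
 * so no separate "usage index" type is needed.
 */
static unsigned long long cpuusage_read(const struct cpuusage *cu,
                                        enum cpuacct_stat_index index)
{
    unsigned long long data = 0;

    assert(index <= CPUACCT_STAT_NSTATS);
    if (index == CPUACCT_STAT_NSTATS) {
        for (int i = 0; i < CPUACCT_STAT_NSTATS; i++)
            data += cu->usages[i];
    } else {
        data = cu->usages[index];
    }
    return data;
}
```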
      
      Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/436748885270d64363c7dc67167507d486c2057a.1466415271.git.zhaolei@cn.fujitsu.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. Jun 27, 2016
    • sched/fair: Rework throttle_count sync · 55e16d30
      Peter Zijlstra authored
      
      
      Since we already take rq->lock when creating a cgroup, use it to also
      sync the throttle_count and avoid the extra state and enqueue path
      branch.
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bsegall@google.com
      Cc: linux-kernel@vger.kernel.org
      [ Fixed build warning. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/core: Fix sched_getaffinity() return value kerneldoc comment · 599b4840
      Zev Weiss authored
      
      
      The previous version was probably written with reference to the man page for
      glibc's wrapper, but the wrapper's behavior differs from that of the
      syscall itself in this case.
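      The difference can be observed from user space. Assuming a Linux
      system, this sketch calls the raw system call through syscall(2); on
      success it gets back the number of bytes the kernel copied into the
      mask, whereas glibc's sched_getaffinity() wrapper returns 0. The
      helper name raw_getaffinity_bytes() is illustrative.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Invoke the raw sched_getaffinity system call for the calling thread,
 * bypassing the glibc wrapper. On success the kernel returns the number of
 * bytes it copied into mask (at most len). len must be a multiple of
 * sizeof(unsigned long) and cover all possible CPUs, else it fails (EINVAL).
 */
static long raw_getaffinity_bytes(unsigned long *mask, size_t len)
{
    return syscall(SYS_sched_getaffinity, 0, len, mask);
}
```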
      
      Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1466975603-25408-1-git-send-email-zev@bewilderbeest.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Reorder cgroup creation code · 8663e24d
      Peter Zijlstra authored
      
      
      A future patch needs rq->lock held _after_ we link the task_group into
      the hierarchy. In order to avoid taking every rq->lock twice, reorder
      things a little and create online_fair_sched_group() to be called
      after we link the task_group.
      
      All this code is still run from css_alloc(), so css_online() isn't in
      fact used for this.
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bsegall@google.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Apply more PELT fixes · 3d30544f
      Peter Zijlstra authored
      
      
      One additional 'rule' for using update_cfs_rq_load_avg() is that one
      should call update_tg_load_avg() if it returns true.
      
      Add a bunch of comments to hopefully clarify some of the rules:
      
       o  You need to update cfs_rq _before_ any entity attach/detach.
          This is important because, while it isn't strictly needed for
          mathematical consistency, it is required for the physical
          interpretation of the model: you attach/detach _now_.
      
       o  When you modify the cfs_rq avg, you have to then call
          update_tg_load_avg() in order to propagate changes upwards.
      
       o  (Fair) entities are always attached, switched_{to,from}_fair()
          deal with !fair. This directly follows from the definition of the
          cfs_rq averages, namely that they are a direct sum of all
          (runnable or blocked) entities on that rq.
      
      It is the second rule that this patch enforces, but it adds comments
      pertaining to all of them.
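      The ordering rules above can be illustrated with a toy model.
      Everything in it (struct toy_cfs_rq, toy_attach() and the other toy_*
      names) is hypothetical and omits the actual PELT decay; it only
      demonstrates the call order: bring the cfs_rq up to date first, then
      attach, then propagate with an update_tg_load_avg()-style call, and
      after a periodic update propagate only when the update reports a
      change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins; these are NOT the kernel's structures or functions. */
struct toy_cfs_rq {
    uint64_t last_update_time;
    long load_avg;
    long tg_load_avg_contrib;   /* value last published to the group */
};

struct toy_entity {
    uint64_t last_update_time;
    long load_avg;
};

/* Stand-in for update_cfs_rq_load_avg(): move the clock to 'now' and report
 * whether time advanced (i.e. the average may have decayed; decay omitted). */
static bool toy_update_cfs_rq_load_avg(uint64_t now, struct toy_cfs_rq *cfs_rq)
{
    bool changed = cfs_rq->last_update_time != now;

    cfs_rq->last_update_time = now;
    return changed;
}

/* Stand-in for update_tg_load_avg(): publish the cfs_rq's contribution. */
static void toy_update_tg_load_avg(struct toy_cfs_rq *cfs_rq)
{
    cfs_rq->tg_load_avg_contrib = cfs_rq->load_avg;
}

/* Rule 1: update cfs_rq before attaching; rule 2: propagate after modifying. */
static void toy_attach(struct toy_cfs_rq *cfs_rq, struct toy_entity *se,
                       uint64_t now)
{
    toy_update_cfs_rq_load_avg(now, cfs_rq);
    se->last_update_time = cfs_rq->last_update_time;  /* attach _now_ */
    cfs_rq->load_avg += se->load_avg;
    toy_update_tg_load_avg(cfs_rq);
}

/* Periodic path: propagate only when the update returned true. */
static void toy_tick(struct toy_cfs_rq *cfs_rq, uint64_t now)
{
    if (toy_update_cfs_rq_load_avg(now, cfs_rq))
        toy_update_tg_load_avg(cfs_rq);
}
```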
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Fix PELT integrity for new tasks · 7dc603c9
      Peter Zijlstra authored
      
      
      Vincent and Yuyang found another few scenarios in which entity
      tracking goes wobbly.
      
      The scenarios are basically due to the fact that new tasks are not
      immediately attached and thereby differ from the normal situation: a
      task is always attached to a cfs_rq load average (such that it
      includes its blocked contribution) and is explicitly
      detached/attached on migration to another cfs_rq.
      
      Scenario 1: switch to fair class
      
        p->sched_class = fair_class;
        if (queued)
          enqueue_task(p);
            ...
              enqueue_entity()
                enqueue_entity_load_avg()
                  migrated = !sa->last_update_time (true)
                  if (migrated)
                    attach_entity_load_avg()
        check_class_changed()
          switched_from() (!fair)
          switched_to()   (fair)
            switched_to_fair()
              attach_entity_load_avg()
      
      If @p is a new task that hasn't been fair before, it will have
      !last_update_time and, per the above, end up in
      attach_entity_load_avg() _twice_.
      
      Scenario 2: change between cgroups
      
        sched_move_group(p)
          if (queued)
            dequeue_task()
          task_move_group_fair()
            detach_task_cfs_rq()
              detach_entity_load_avg()
            set_task_rq()
            attach_task_cfs_rq()
              attach_entity_load_avg()
          if (queued)
            enqueue_task();
              ...
                enqueue_entity()
                  enqueue_entity_load_avg()
                    migrated = !sa->last_update_time (true)
                    if (migrated)
                      attach_entity_load_avg()
      
      As with scenario 1, if @p is a new task, it will have
      !last_update_time and we'll end up in attach_entity_load_avg()
      _twice_.
      
      Furthermore, notice how we do a detach_entity_load_avg() on something
      that wasn't attached to begin with.
      
      As stated above, the problem is that the new task isn't yet attached
      to the load tracking and thereby violates the invariant assumption.
      
      This patch remedies this by ensuring a new task is indeed properly
      attached to the load tracking on creation, through
      post_init_entity_util_avg().
      
      Of course, this isn't entirely as straightforward as one might think,
      since the task is hashed before we call wake_up_new_task() and thus
      can be poked at. We avoid this by adding TASK_NEW and teaching
      cpu_cgroup_can_attach() to refuse such tasks.
      
      Reported-by: Yuyang Du <yuyang.du@intel.com>
      Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/cgroup: Fix cpu_cgroup_fork() handling · ea86cb4b
      Vincent Guittot authored
      
      
      A new fair task is detached and attached from/to task_group with:
      
        cgroup_post_fork()
          ss->fork(child) := cpu_cgroup_fork()
            sched_move_task()
              task_move_group_fair()
      
      This is wrong, because at this point in fork() the task isn't fully
      initialized and it cannot 'move' to another group, because it's not
      attached to any group as yet.
      
      In fact, cpu_cgroup_fork() needs only a small part of sched_move_task(), so we
      can just call this small part directly instead of sched_move_task(). And
      the task doesn't really migrate because it is not yet attached so we
      need the following sequence:
      
        do_fork()
          sched_fork()
            __set_task_cpu()
      
          cgroup_post_fork()
            set_task_rq() # set task group and runqueue
      
          wake_up_new_task()
            select_task_rq() can select a new cpu
            __set_task_cpu
            post_init_entity_util_avg
              attach_task_cfs_rq()
            activate_task
              enqueue_task
      
      This patch makes that happen.
      
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      [ Added TASK_SET_GROUP to set depth properly. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Fix PELT integrity for new groups · 01011473
      Peter Zijlstra authored
      
      
      Vincent reported that when a new task is moved into a new cgroup it
      gets attached twice to the load tracking:
      
        sched_move_task()
          task_move_group_fair()
            detach_task_cfs_rq()
            set_task_rq()
            attach_task_cfs_rq()
              attach_entity_load_avg()
                se->avg.last_load_update = cfs_rq->avg.last_load_update // == 0
      
        enqueue_entity()
          enqueue_entity_load_avg()
            update_cfs_rq_load_avg()
              now = clock()
              __update_load_avg(&cfs_rq->avg)
                cfs_rq->avg.last_load_update = now
                // ages load/util for: now - 0, load/util -> 0
            if (migrated)
              attach_entity_load_avg()
                se->avg.last_load_update = cfs_rq->avg.last_load_update; // now != 0
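      The "ages load/util for: now - 0" step in the trace can be made
      concrete with a coarse model of PELT decay. The sketch assumes the
      usual PELT parameters (periods of 1024 us, roughly 1 ms, and a decay
      factor y with y^32 = 1/2), approximates periods as nanoseconds >> 20,
      and applies only whole half-lives, ignoring the fractional y^n factor;
      toy_decay_load() and toy_periods() are hypothetical names. With a
      timestamp of 0, the elapsed time is the whole uptime, so any load
      decays to zero.

```c
#include <assert.h>
#include <stdint.h>

#define LOAD_AVG_PERIOD 32   /* half-life in periods: y^32 == 1/2 */

/*
 * Coarse PELT decay: val * (1/2)^(periods / 32), applying whole half-lives
 * only. Beyond 63 half-lives every 64-bit value has decayed to 0, which
 * also keeps the shift count below 64.
 */
static uint64_t toy_decay_load(uint64_t val, uint64_t periods)
{
    if (periods > LOAD_AVG_PERIOD * 63)
        return 0;
    return val >> (periods / LOAD_AVG_PERIOD);
}

/* Approximate number of ~1 ms (2^20 ns) periods between two timestamps. */
static uint64_t toy_periods(uint64_t last_update_time_ns, uint64_t now_ns)
{
    return (now_ns - last_update_time_ns) >> 20;
}
```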
      
      The problem is that we don't update cfs_rq load_avg before all
      entity attach/detach operations. Only enqueue_task() and migrate_task()
      do this.
      
      By fixing this, the above will not happen, because the
      sched_move_task() attach will have updated cfs_rq's last_load_update
      time before attach, and in turn the attach will have set the entity's
      last_load_update stamp.
      
      Note that there is a further problem with sched_move_task() calling
      detach on a task that hasn't yet been attached; this will be taken
      care of in a subsequent patch.
      
      Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
      Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Fix and optimize the fork() path · e210bffd
      Peter Zijlstra authored
      
      
      The task_fork_fair() callback already calls __set_task_cpu() and takes
      rq->lock.
      
      If we move the sched_class::task_fork callback in sched_fork() under
      the existing p->pi_lock, right after its set_task_cpu() call, we can
      avoid doing two such calls and omit the IRQ disabling on the rq->lock.
      
      Change to __set_task_cpu() to skip the migration bits; this is a new
      task, not a migration. Similarly, make wake_up_new_task() use
      __set_task_cpu() for the same reason: the task hasn't actually
      migrated, as it has never run.
      
      This cures the problem of calling migrate_task_rq_fair(), which does
      remove_entity_from_load_avg() on tasks that have never been added to
      the load avg to begin with.
      
      This bug would result in transiently messed-up load_avg values, which
      averaged out after a few dozen milliseconds. This is probably why the
      bug went unnoticed for such a long time.
      
      Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Fix calc_cfs_shares() fixed point arithmetics width confusion · ea1dc6fc
      Peter Zijlstra authored
      Commit:
      
        fde7d22e ("sched/fair: Fix overly small weight for interactive group entities")
      
      did something non-obvious, and also did it in a buggy but latent way.
      
      The problem was exposed for real by a later commit in the v4.7 merge window:
      
        2159197d ("sched/core: Enable increased load resolution on 64-bit kernels")
      
      ... after which tg->load_avg and cfs_rq->load.weight had different
      units (10-bit and 20-bit fixed point, respectively).
      
      Add a comment to explain the use of cfs_rq->load.weight over the
      'natural' cfs_rq->avg.load_avg and add scale_load_down() to correct
      for the difference in unit.
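      A small worked example shows why mixing the two units skews the
      result. The sketch assumes the 10-bit difference between the units
      stated above, a group whose load is split evenly over two CPUs, and a
      simplified share formula in the spirit of calc_cfs_shares();
      toy_share_permille() and toy_scale_load_down() are hypothetical names.
      Scaling the 20-bit weight down first yields the expected 50% share,
      while the unscaled weight swamps the sum and yields almost 100%.

```c
#include <assert.h>

/* Assumed: averages in 10-bit units, weights in 20-bit units. */
#define FIXEDPOINT_SHIFT 10

static long long toy_scale_load_down(long long w)
{
    return w >> FIXEDPOINT_SHIFT;
}

/*
 * Per-mille share of a group's weight for one CPU:
 *   load / (tg_load_avg - this_cpu_contrib + load)
 * tg_load_avg and contrib are 10-bit values; rq_weight is a 20-bit weight.
 * 'fixed' selects whether rq_weight is scaled down before mixing.
 */
static long long toy_share_permille(long long tg_load_avg, long long contrib,
                                    long long rq_weight, int fixed)
{
    long long load = fixed ? toy_scale_load_down(rq_weight) : rq_weight;
    long long tg_weight = tg_load_avg - contrib + load;

    return tg_weight ? 1000 * load / tg_weight : 1000;
}
```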
      
      Since this is (now, as per a previous commit) the only user of
      calc_tg_weight(), collapse it.
      
      The expected effect of this bug is randomly inconsistent SMP balancing
      of cgroup workloads.
      
      Reported-by: Jirka Hladky <jhladky@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 2159197d ("sched/core: Enable increased load resolution on 64-bit kernels")
      Fixes: fde7d22e ("sched/fair: Fix overly small weight for interactive group entities")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Fix effective_load() to consistently use smoothed load · 7dd49125
      Peter Zijlstra authored
      Starting with the following commit:
      
        fde7d22e ("sched/fair: Fix overly small weight for interactive group entities")
      
      calc_tg_weight() doesn't compute the right value as expected by effective_load().
      
      The difference is in the 'correction' term. In order to ensure \Sum
      rw_j >= rw_i we cannot use tg->load_avg directly, since that might be
      lagging a correction on the current cfs_rq->avg.load_avg value.
      Therefore we use tg->load_avg - cfs_rq->tg_load_avg_contrib +
      cfs_rq->avg.load_avg.
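      The correction term is plain arithmetic. The sketch below uses a
      hypothetical helper and made-up numbers, treating this CPU's current
      cfs_rq->avg.load_avg as rw_i: tg->load_avg (600) still holds an old
      contribution of 500 from this CPU, whose current average is 900. Used
      directly, tg->load_avg would be smaller than rw_i; the corrected value
      (1000) keeps \Sum rw_j >= rw_i.

```c
#include <assert.h>

/*
 * Group weight as seen from one CPU, replacing this CPU's possibly stale
 * published contribution with its current average:
 *   tg->load_avg - cfs_rq->tg_load_avg_contrib + cfs_rq->avg.load_avg
 * Parameter names are illustrative.
 */
static long corrected_tg_weight(long tg_load_avg, long tg_load_avg_contrib,
                                long cfs_rq_load_avg)
{
    return tg_load_avg - tg_load_avg_contrib + cfs_rq_load_avg;
}

/* The invariant effective_load() relies on: total weight >= this CPU's. */
static int sum_covers_rw_i(long tg_weight, long rw_i)
{
    return tg_weight >= rw_i;
}
```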
      
      Now, per the referenced commit, calc_tg_weight() doesn't use
      cfs_rq->avg.load_avg, as is later used in @w, but uses
      cfs_rq->load.weight instead.
      
      So stop using calc_tg_weight() and do it explicitly.
      
      The effect of this bug is that wake_affine() makes randomly
      poor choices in cgroup-intensive workloads.
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # v4.3+
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: fde7d22e ("sched/fair: Fix overly small weight for interactive group entities")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Linux 4.7-rc5 · 4c2e07c6
      Linus Torvalds authored
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 2ac9b973
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Two straightforward fixes.
      
        One is a concurrency issue only affecting SAS connected SATA drives,
        but which could hang the storage subsystem if it triggers (because the
        outstanding command count on error never goes back to zero) and the
        other is a NO_TAG fallout from the switch to hostwide tags which
        causes the system to crash on module insertion (we've checked
        carefully and only the 53c700 family of drivers is vulnerable to this
        issue)"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        53c700: fix BUG on untagged commands
        scsi: fix race between simultaneous decrements of ->host_failed
  3. Jun 25, 2016
    • Merge branch 'for-linus-4.7-part2' of... · da2f6aba
      Linus Torvalds authored
      Merge branch 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
      
      Pull btrfs fixes part 2 from Chris Mason:
       "This has one patch from Omar to bring iterate_shared back to btrfs.
      
        We have a tree of work we queue up for directory items and it doesn't
        lend itself well to shared access.  While we're cleaning it up, Omar
        has changed things to use an exclusive lock when there are delayed
        items"
      
      * 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes
    • Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · b971712a
      Linus Torvalds authored
      Pull btrfs fixes from Chris Mason:
       "I have a two part pull this time because one of the patches Dave
        Sterba collected needed to be against v4.7-rc2 or higher (we used
        rc4).  I try to make my for-linus-xx branch testable on top of the
        last major so we can hand fixes to people on the list more easily, so
        I've split this pull in two.
      
        This first part has some fixes and two performance improvements that
        we've been testing for some time.
      
        Josef's two performance fixes are most notable.  The transid tracking
        patch makes a big improvement on pretty much every workload"
      
      * 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: Force stripesize to the value of sectorsize
        btrfs: fix disk_i_size update bug when fallocate() fails
        Btrfs: fix error handling in map_private_extent_buffer
        Btrfs: fix error return code in btrfs_init_test_fs()
        Btrfs: don't do nocow check unless we have to
        btrfs: fix deadlock in delayed_ref_async_start
        Btrfs: track transid for delayed ref flushing
    • Merge tag 'sound-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · ca83a55c
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Again pretty calm weeks: we've had only a few trivial / stable
        HD-audio fixes in addition to a possible race fix for snd-dummy driver
        spotted by syzkaller"
      
      * tag 'sound-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: dummy: Fix a use-after-free at closing
        ALSA: hda / realtek - add two more Thinkpad IDs (5050,5053) for tpt460 fixup
        ALSA: hda - Fix the headset mic jack detection on Dell machine
        ALSA: hda/tegra: iomem fixups for sparse warnings
        ALSA: hdac_regmap - fix the register access for runtime PM
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9a949a98
      Linus Torvalds authored
      Pull x86 kprobe fix from Thomas Gleixner:
       "A single fix clearing the TF bit when a fault is single stepped"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        kprobes/x86: Clear TF bit in fault on single-stepping
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 57801c1b
      Linus Torvalds authored
      Pull scheduler fixes from Thomas Gleixner:
       "A couple of scheduler fixes:
      
         - force watchdog reset while processing sysrq-w
      
         - fix a deadlock when enabling trace events in the scheduler
      
         - fixes to the throttled next buddy logic
      
         - fixes for the average accounting (missing serialization and
           underflow handling)
      
         - allow kernel threads for fallback to online but not active cpus"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/core: Allow kthreads to fall back to online && !active cpus
        sched/fair: Do not announce throttled next buddy in dequeue_task_fair()
        sched/fair: Initialize throttle_count for new task-groups lazily
        sched/fair: Fix cfs_rq avg tracking underflow
        kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while processing sysrq-w
        sched/debug: Fix deadlock when enabling sched events
        sched/fair: Fix post_init_entity_util_avg() serialization
    • Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes · 02dbfc99
      Omar Sandoval authored
      Commit fe742fd4 ("Revert "btrfs: switch to ->iterate_shared()"")
      backed out the conversion to ->iterate_shared() for Btrfs because the
      delayed inode handling in btrfs_real_readdir() is racy. However, we can
      still do readdir in parallel if there are no delayed nodes.
      
      This is a temporary fix, until we come up with a more complete
      solution, which upgrades the shared inode lock to an exclusive lock
      only when we have delayed items. While we're here, rename the
      btrfs_{get,put}_delayed_items functions to make it very clear that
      they're just for readdir.
      
      Tested with xfstests and by doing a parallel kernel build:
      
      	while make tinyconfig && make -j4 && git clean -dqfx; do
      		:
      	done
      
      along with a bunch of parallel finds in another shell:
      
      	while true; do
      		for ((i=0; i<4; i++)); do
      			find . >/dev/null &
      		done
      		wait
      	done
      
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e3b22bc3
      Linus Torvalds authored
      Pull locking fix from Thomas Gleixner:
       "A single fix to address a race in the static key logic"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/static_key: Fix concurrent static_key_slow_inc()
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2de23071
      Linus Torvalds authored
      Pull irq fix from Thomas Gleixner:
       "A single fix for the fallout from the conversion of MIPS GIC to irq
        domains"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/mips-gic: Fix IRQs in gic_dev_domain
    • Merge tag 'powerpc-4.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 2f6e9747
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "mm/radix (Aneesh Kumar K.V):
         - Update to tlb functions ric argument
         - Flush page walk cache when freeing page table
         - Update Radix tree size as per ISA 3.0
      
        mm/hash (Aneesh Kumar K.V):
         - Use the correct PPP mask when updating HPTE
         - Don't add memory coherence if cache inhibited is set
      
        eeh (Gavin Shan):
         - Fix invalid cached PE primary bus
      
        bpf/jit (Naveen N. Rao):
         - Disable classic BPF JIT on ppc64le
      
        .. and fix faults caused by radix patching of SLB miss handler
        (Michael Ellerman)"
      
      * tag 'powerpc-4.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/bpf/jit: Disable classic BPF JIT on ppc64le
        powerpc: Fix faults caused by radix patching of SLB miss handler
        powerpc/eeh: Fix invalid cached PE primary bus
        powerpc/mm/radix: Update Radix tree size as per ISA 3.0
        powerpc/mm/hash: Don't add memory coherence if cache inhibited is set
        powerpc/mm/hash: Use the correct PPP mask when updating HPTE
        powerpc/mm/radix: Flush page walk cache when freeing page table
        powerpc/mm/radix: Update to tlb functions ric argument
    • Fix build break in fork.c when THREAD_SIZE < PAGE_SIZE · 9521d399
      Michael Ellerman authored
      Commit b235beea ("Clarify naming of thread info/stack allocators")
      breaks the build on some powerpc configs, where THREAD_SIZE < PAGE_SIZE:
      
        kernel/fork.c:235:2: error: implicit declaration of function 'free_thread_stack'
        kernel/fork.c:355:8: error: assignment from incompatible pointer type
          stack = alloc_thread_stack_node(tsk, node);
          ^
      
      Fix it by renaming free_stack() to free_thread_stack(), and updating the
      return type of alloc_thread_stack_node().
      
      Fixes: b235beea ("Clarify naming of thread info/stack allocators")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'akpm' (patches from Andrew) · 086e3eb6
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "Two weeks worth of fixes here"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
        init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64
        autofs: don't get stuck in a loop if vfs_write() returns an error
        mm/page_owner: avoid null pointer dereference
        tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences"
        fs/nilfs2: fix potential underflow in call to crc32_le
        oom, suspend: fix oom_reaper vs. oom_killer_disable race
        ocfs2: disable BUG assertions in reading blocks
        mm, compaction: abort free scanner if split fails
        mm: prevent KASAN false positives in kmemleak
        mm/hugetlb: clear compound_mapcount when freeing gigantic pages
        mm/swap.c: flush lru pvecs on compound page arrival
        memcg: css_alloc should return an ERR_PTR value on error
        memcg: mem_cgroup_migrate() may be called with irq disabled
        hugetlb: fix nr_pmds accounting with shared page tables
        Revert "mm: disable fault around on emulated access bit architecture"
        Revert "mm: make faultaround produce old ptes"
        mailmap: add Boris Brezillon's email
        mailmap: add Antoine Tenart's email
        mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
        mm: mempool: kasan: don't poot mempool objects in quarantine
        ...
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · aebe9bb8
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "This is the second batch of queued up rdma patches for this rc cycle.
      
        There isn't anything really major in here.  It's passed 0day,
        linux-next, and local testing across a wide variety of hardware.
        There are still a few known issues to be tracked down, but this should
        amount to the vast majority of the rdma RC fixes.
      
        Round two of 4.7 rc fixes:
      
         - A couple minor fixes to the rdma core
         - Multiple minor fixes to hfi1
         - Multiple minor fixes to mlx4/mlx5
         - A few minor fixes to i40iw"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (31 commits)
        IB/srpt: Reduce QP buffer size
        i40iw: Enable level-1 PBL for fast memory registration
        i40iw: Return correct max_fast_reg_page_list_len
        i40iw: Correct status check on i40iw_get_pble
        i40iw: Correct CQ arming
        IB/rdmavt: Correct qp_priv_alloc() return value test
        IB/hfi1: Don't zero out qp->s_ack_queue in rvt_reset_qp
        IB/hfi1: Fix deadlock with txreq allocation slow path
        IB/mlx4: Prevent cross page boundary allocation
        IB/mlx4: Fix memory leak if QP creation failed
        IB/mlx4: Verify port number in flow steering create flow
        IB/mlx4: Fix error flow when sending mads under SRIOV
        IB/mlx4: Fix the SQ size of an RC QP
        IB/mlx5: Fix wrong naming of port_rcv_data counter
        IB/mlx5: Fix post send fence logic
        IB/uverbs: Initialize ib_qp_init_attr with zeros
        IB/core: Fix false search of the IB_SA_WELL_KNOWN_GUID
        IB/core: Fix RoCE v1 multicast join logic issue
        IB/core: Fix no default GIDs when netdevice reregisters
        IB/hfi1: Send a pkey change event on driver pkey update
        ...
      aebe9bb8
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 3fb5e59c
      Linus Torvalds authored
      Pull HID fix from Jiri Kosina:
       "hiddev ioctl() validation fix from Scott Bauer"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
        HID: hiddev: validate num_values for HIDIOCGUSAGES, HIDIOCSUSAGES commands
      3fb5e59c
    • Linus Torvalds's avatar
      Merge tag 'hwmon-for-linus-v4.7-rc5' of... · 260eaba4
      Linus Torvalds authored
      Merge tag 'hwmon-for-linus-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fix from Guenter Roeck:
       "Improve fan type detection for dell-smm to prevent kernel hang"
      
      * tag 'hwmon-for-linus-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (dell-smm) Cache fan_type() calls and change fan detection
      260eaba4
    • Linus Torvalds's avatar
      Merge tag 'acpi-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · ed13fbbf
      Linus Torvalds authored
      Pull ACPI fix from Rafael Wysocki:
       "Stable-candidate fix for a deadlock in ACPICA introduced during the
        4.5 development cycle by a commit attempting to improve the handling
        of AML code that doesn't belong to any namespace objects in a given
        definition block (Lv Zheng)"
      
      * tag 'acpi-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPICA: Namespace: Fix deadlock triggered by MLC support in dynamic table loading
      ed13fbbf
    • Linus Torvalds's avatar
      Merge tag 'pm-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 3522b35c
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "Fix for a latent cpufreq driver bug uncovered by a recent ACPICA
        change and several fixes for the devfreq framework, including one fix
        for an issue introduced recently.
      
        Specifics:
      
         - Fix a latent initialization issue in the pcc-cpufreq driver
           (incorrect initial value of a structure field) that has been
           uncovered by a recent ACPICA commit (Mike Galbraith).
      
         - Add a missing notification in an update_devfreq() error code path
           forgotten by a recent devfreq commit (Chanwoo Choi).
      
         - Fix devfreq device frequency initialization (Lukasz Luba).
      
         - Fix an incorrect IS_ERR() check in the devfreq framework discovered
           by the Smatch checker (Dan Carpenter).
      
         - Drop two excessive put_device() calls from the devfreq framework
           (MyungJoo Ham, Cai Zhiyong).
      
         - Fix a possible memory leak in the devfreq framework and drop an
           unnecessary kfree() invocation from it (MyungJoo Ham)"
      
      * tag 'pm-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM / devfreq: Send the DEVFREQ_POSTCHANGE notification when target() is failed
        cpufreq: pcc-cpufreq: Fix doorbell.access_width
        PM / devfreq: fix initialization of current frequency in last status
        PM / devfreq: exynos-nocp: Remove incorrect IS_ERR() check
        PM / devfreq: remove double put_device
        PM / devfreq: fix double call put_device
        PM / devfreq: fix duplicated kfree on devfreq pointer
        PM / devfreq: devm_kzalloc to have dev pointer more precisely
      3522b35c
    • Linus Torvalds's avatar
      Merge tag 'for-linus-4.7b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 032fd3e5
      Linus Torvalds authored
      Pull xen bug fixes from David Vrabel:
      
       - fix x86 PV dom0 crash during early boot on some hardware
      
       - fix two pciback bugs affecting certain devices
      
       - fix potential overflow when clearing page tables in x86 PV
      
      * tag 'for-linus-4.7b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen-pciback: return proper values during BAR sizing
        x86/xen: avoid m2p lookup when setting early page table entries
        xen/pciback: Fix conf_space read/write overlap check.
        x86/xen: fix upper bound of pmd loop in xen_cleanhighmap()
        xen/balloon: Fix declared-but-not-defined warning
      032fd3e5
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · d05be0d7
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "Here are a few more arm64 fixes, but things do finally appear to be
        slowing down.  The main fix is avoiding hibernation in a previously
        unanticipated situation where we have CPUs parked in the kernel, but
        it's all good stuff.
      
         - Fix icache/dcache sync for anonymous pages under migration
         - Correct the ASID limit check
         - Fix parallel builds of Image and Image.gz
         - Refuse to hibernate when we have CPUs that we can't offline"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: hibernate: Don't hibernate on systems with stuck CPUs
        arm64: smp: Add function to determine if cpus are stuck in the kernel
        arm64: mm: remove page_mapping check in __sync_icache_dcache
        arm64: fix boot image dependencies to not generate invalid images
        arm64: update ASID limit
      d05be0d7
    • Rasmus Villemoes's avatar
      init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64 · 0fd5ed8d
      Rasmus Villemoes authored
      When I replaced kasprintf("%pf") with a direct call to
      sprint_symbol_no_offset, I must have broken the initcall blacklisting
      feature on the arches where dereference_function_descriptor() is
      non-trivial.
      
      Fixes: c8cdd2be ("init/main.c: simplify initcall_blacklisted()")
      Link: http://lkml.kernel.org/r/1466027283-4065-1-git-send-email-linux@rasmusvillemoes.dk
      Signed-off-by: default avatarRasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0fd5ed8d
    • Andrey Vagin's avatar
      autofs: don't get stuck in a loop if vfs_write() returns an error · 5a9294e5
      Andrey Vagin authored
      __vfs_write() returns a negative value in an error case, so the write
      loop must stop instead of retrying when that happens.
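      The failure mode is easiest to see in a write-everything loop. The C
      sketch that follows is a userspace analogue built on POSIX write(2), not
      the autofs code itself; write_all() is a hypothetical helper. It assumes
      a negative return means an error and stops, retrying only on EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userspace analogue of a "write everything" loop.
 * write(2) returns the number of bytes written, or -1 on error.
 * A loop that only advanced on progress and otherwise retried would
 * spin forever on an error; here a negative return ends the loop.
 */
static ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    assert(buf != NULL || len == 0);
    while (left > 0) {
        ssize_t wr = write(fd, p, left);

        if (wr < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing: retry */
            return -1;      /* real error: stop, do not loop */
        }
        if (wr == 0)
            return -1;      /* no progress: fail rather than spin */
        p += wr;
        left -= (size_t)wr;
    }
    return (ssize_t)len;
}
```

      On an invalid descriptor the helper returns -1 with errno set to EBADF
      after a single write() call instead of looping.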
      
      Link: http://lkml.kernel.org/r/20160616083108.6278.65815.stgit@pluto.themaw.net
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Signed-off-by: default avatarIan Kent <raven@themaw.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5a9294e5
    • Sudip Mukherjee's avatar
      mm/page_owner: avoid null pointer dereference · 8285027f
      Sudip Mukherjee authored
      We dereferenced page_ext before checking it for NULL.  Check it first
      and then use it.
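      A minimal sketch of the check-before-use ordering, assuming a lookup
      that can return NULL the way lookup_page_ext() can. struct
      page_ext_demo, lookup_demo() and demo_set_owner_flag() are hypothetical
      names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; not the kernel's struct page_ext. */
struct page_ext_demo {
    unsigned long flags;
};

/* Models a lookup that returns NULL when no extension data exists. */
static struct page_ext_demo *lookup_demo(struct page_ext_demo *table,
                                         size_t n, size_t idx)
{
    return idx < n ? &table[idx] : NULL;
}

/* Test the pointer first, and only then dereference it. */
static int demo_set_owner_flag(struct page_ext_demo *table, size_t n,
                               size_t idx)
{
    struct page_ext_demo *ext;

    assert(table != NULL);
    ext = lookup_demo(table, n, idx);
    if (!ext)               /* check first ... */
        return -1;
    ext->flags |= 1UL;      /* ... then use */
    return 0;
}
```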
      
      Fixes: f86e4271 ("mm: check the return value of lookup_page_ext for all call sites")
      Link: http://lkml.kernel.org/r/1465249059-7883-1-git-send-email-sudipm.mukherjee@gmail.com
      Signed-off-by: default avatarSudip Mukherjee <sudip.mukherjee@codethink.co.uk>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8285027f
    • Colin Ian King's avatar
      tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences" · 7c5b7239
      Colin Ian King authored
      Trivial fix for a spelling mistake.
      
      Link: http://lkml.kernel.org/r/1466672144-831-1-git-send-email-colin.king@canonical.com
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7c5b7239
    • Torsten Hilbrich's avatar
      fs/nilfs2: fix potential underflow in call to crc32_le · 63d2f95d
      Torsten Hilbrich authored
      The value `bytes' comes from the filesystem which is about to be
      mounted.  We cannot trust that the value is always in the range we
      expect it to be.

      Check its value before using it to calculate the length for the crc32_le
      call.  Its value must be greater than or equal to sumoff + 4.

      This fixes a kernel bug when accidentally mounting an image file which
      had the nilfs2 magic value 0x3434 at the right offset 0x406 by chance.
      The bytes 0x01 0x00 were stored at 0x408 and were interpreted as an
      s_bytes value of 1.  This caused an underflow when subtracting
      sumoff + 4 (20) in the call to crc32_le.
      
        BUG: unable to handle kernel paging request at ffff88021e600000
        IP:  crc32_le+0x36/0x100
        ...
        Call Trace:
          nilfs_valid_sb.part.5+0x52/0x60 [nilfs2]
          nilfs_load_super_block+0x142/0x300 [nilfs2]
          init_nilfs+0x60/0x390 [nilfs2]
          nilfs_mount+0x302/0x520 [nilfs2]
          mount_fs+0x38/0x160
          vfs_kern_mount+0x67/0x110
          do_mount+0x269/0xe00
          SyS_mount+0x9f/0x100
          entry_SYSCALL_64_fastpath+0x16/0x71
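      The bounds check described above can be sketched in userspace C. This is
      not the nilfs2 source: crc32_le_bitwise() and sb_checksum() are
      hypothetical helpers modelled on the three-step crc32_le() call chain,
      and treating the 4-byte checksum field at sumoff as zero is an
      assumption. The point is that bytes < sumoff + 4 is rejected before the
      unsigned subtraction bytes - sumoff - 4 can wrap (with bytes = 1 and
      sumoff = 16 it would become SIZE_MAX - 18).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, reflected polynomial 0xEDB88320, no pre/post
 * inversion (the same convention as the kernel's crc32_le()). */
static uint32_t crc32_le_bitwise(uint32_t crc, const unsigned char *p,
                                 size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

/*
 * Hypothetical helper: checksum the first 'bytes' bytes of an untrusted
 * superblock image, treating the checksum field at 'sumoff' as zero.
 * Returns 0 on success, -1 if 'bytes' is too small to be valid.
 */
static int sb_checksum(const unsigned char *sb, size_t bytes, size_t sumoff,
                       uint32_t seed, uint32_t *out)
{
    static const unsigned char zero_sum[4] = { 0 };
    uint32_t crc;

    assert(sb != NULL && out != NULL);
    if (bytes < sumoff + 4)
        return -1;  /* reject before bytes - sumoff - 4 can underflow */

    crc = crc32_le_bitwise(seed, sb, sumoff);
    crc = crc32_le_bitwise(crc, zero_sum, 4);
    crc = crc32_le_bitwise(crc, sb + sumoff + 4, bytes - sumoff - 4);
    *out = crc;
    return 0;
}
```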
      
      Link: http://lkml.kernel.org/r/1466778587-5184-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
      Signed-off-by: default avatarTorsten Hilbrich <torsten.hilbrich@secunet.com>
      Tested-by: default avatarTorsten Hilbrich <torsten.hilbrich@secunet.com>
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      63d2f95d
    • Michal Hocko's avatar
      oom, suspend: fix oom_reaper vs. oom_killer_disable race · 74070542
      Michal Hocko authored
      Tetsuo has reported the following potential oom_killer_disable vs.
      oom_reaper race:
      
       (1) freeze_processes() starts freezing user space threads.
       (2) Somebody (maybe a kernel thread) calls out_of_memory().
       (3) The OOM killer calls mark_oom_victim() on a user space thread
           P1 which is already in __refrigerator().
       (4) oom_killer_disable() sets oom_killer_disabled = true.
       (5) P1 leaves __refrigerator() and enters do_exit().
       (6) The OOM reaper calls exit_oom_victim(P1) before P1 can call
           exit_oom_victim(P1).
       (7) oom_killer_disable() returns while P1 has not yet finished.
       (8) P1 performs I/O and interferes with the freezer.
      
      This situation is unfortunate.  We cannot move oom_killer_disable after
      all the freezable kernel threads are frozen because the oom victim might
      depend on some of those kthreads to make forward progress to exit, so
      we could deadlock.  It is also far from trivial to teach the oom_reaper
      not to call exit_oom_victim(), because then we would lose the forward
      progress guarantee of the OOM killer and oom_killer_disable:
      exit_mm->mmput might block and never call exit_oom_victim.
      
      It seems the easiest way forward is to work around this race by calling
      try_to_freeze_tasks again after oom_killer_disable.  This makes sure
      that all the tasks are frozen, or it bails out.
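      The effect of the workaround can be illustrated with a toy,
      single-threaded model. Everything in it (struct task_model,
      try_to_freeze_model(), freeze_processes_model()) is hypothetical and
      ignores real concurrency; it only shows why a second freezing pass after
      the OOM killer is disabled either catches a task that thawed in the
      window or reports failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task; not a kernel structure. */
struct task_model {
    bool frozen;
    bool freezable;   /* false models a task that cannot be frozen */
};

/* One freezing pass: 0 if every task ends up frozen, -1 otherwise. */
static int try_to_freeze_model(struct task_model *t, size_t n)
{
    int err = 0;

    for (size_t i = 0; i < n; i++) {
        if (t[i].freezable)
            t[i].frozen = true;
        if (!t[i].frozen)
            err = -1;
    }
    return err;
}

/*
 * Freeze, disable the OOM killer, then freeze again.  If 'escapee' < n,
 * that task is thawed right after the OOM killer is disabled, standing
 * in for P1 leaving __refrigerator() in steps (5)-(6).
 */
static int freeze_processes_model(struct task_model *t, size_t n,
                                  bool *oom_disabled, size_t escapee)
{
    int err;

    assert(oom_disabled != NULL);
    err = try_to_freeze_model(t, n);
    if (err)
        return err;

    *oom_disabled = true;              /* step (4) */
    if (escapee < n)
        t[escapee].frozen = false;     /* steps (5)-(6) */

    /* The workaround: re-check, catching the escapee or failing. */
    return try_to_freeze_model(t, n);
}
```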
      
      Fixes: 449d777d ("mm, oom_reaper: clear TIF_MEMDIE for all tasks queued for oom_reaper")
      Link: http://lkml.kernel.org/r/1466597634-16199-1-git-send-email-mhocko@kernel.org
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      74070542