  1. Jun 19, 2015
    • sched,lockdep: Employ lock pinning · cbce1a68
      Peter Zijlstra authored
      
      
      Employ the new lockdep lock pinning annotation to ensure no
      'accidental' lock-breaks happen with rq->lock.
      
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124744.003233193@infradead.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      cbce1a68
    • lockdep: Implement lock pinning · a24fc60d
      Peter Zijlstra authored
      
      
       Add a lockdep annotation that WARNs if you 'accidentally' unlock a
       lock.

       This is especially helpful for code with callbacks, where the upper
       layer assumes a lock remains taken, but a lower layer thinks it can
       maybe drop and reacquire the lock.
      
      By unwittingly breaking up the lock, races can be introduced.
      
       Lock pinning is a lockdep annotation that helps with this: when you
       lockdep_pin_lock() a held lock, any unlock without a matching
       lockdep_unpin_lock() will produce a WARN. Think of this as a relative
       of lockdep_assert_held(), except you don't only assert it is held
       now, but ensure it stays held until you release your assertion.
      
      RFC: a possible alternative API would be something like:
      
        int cookie = lockdep_pin_lock(&foo);
        ...
        lockdep_unpin_lock(&foo, cookie);
      
      Where we pick a random number for the pin_count; this makes it
      impossible to sneak a lock break in without also passing the right
      cookie along.
      
      I've not done this because it ends up generating code for !LOCKDEP,
      esp. if you need to pass the cookie around for some reason.
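
       A minimal usage sketch of the form actually added (without the
       cookie); illustrative only, assuming a lockdep-enabled lock such as
       rq->lock rather than the actual scheduler hunks:

       	raw_spin_lock(&rq->lock);
       	lockdep_pin_lock(&rq->lock);	/* an unlock of rq->lock now would WARN */

       	/* ... code that must not break the lock ... */

       	lockdep_unpin_lock(&rq->lock);	/* assertion released */
       	raw_spin_unlock(&rq->lock);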
      
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.906731065@infradead.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a24fc60d
    • lockdep: Simplify lock_release() · e0f56fd7
      Peter Zijlstra authored
      
      
       lock_release() takes this nested argument that's mostly pointless
       these days; remove the implementation but leave the argument as a
       rudiment for now.
      
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.840411606@infradead.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      e0f56fd7
    • sched: Streamline the task migration locking a little · 5e16bbc2
      Peter Zijlstra authored
      
      
       The whole migrate_task{,s}() locking seems a little shaky, there's a
       lot of dropping and reacquiring happening. Pull the locking up into the
       callers as far as possible to streamline the lot.
      
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.755256708@infradead.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5e16bbc2
    • sched: Move code around · 5cc389bc
      Peter Zijlstra authored
      
      
      In preparation to reworking set_cpus_allowed_ptr() move some code
      around. This also removes some superfluous #ifdefs and adds comments
      to some #endifs.
      
         text    data     bss     dec     hex filename
      12211532        1738144 1081344 15031020         e55aec defconfig-build/vmlinux.pre
      12211532        1738144 1081344 15031020         e55aec defconfig-build/vmlinux.post
      
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.662086684@infradead.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5cc389bc
    • sched,dl: Fix sched class hopping CBS hole · a649f237
      Peter Zijlstra authored
      We still have a few pending issues with the deadline code, one of which
      is that switching between scheduling classes can 'leak' CBS state.
      
      Close the hole by retaining the current CBS state when leaving
      SCHED_DEADLINE and unconditionally programming the deadline timer.
      The timer will then reset the CBS state if the task is still
      !SCHED_DEADLINE by the time it hits.
      
       If the task left SCHED_DEADLINE it will not call task_dead_dl() and
       we'll not cancel the hrtimer, leaving us with a pending timer in free
       space. Avoid this by giving the timer a task reference; this avoids
       littering the task exit path for this rather uncommon case.
      
      In order to do this, I had to move dl_task_offline_migration() below
      the replenishment, such that the task_rq()->lock fully covers that.
       While doing this, I noticed that it (was) buggy in assuming a task is
       enqueued and/or that we need to enqueue the task now. Fixing this means
       select_task_rq_dl() might encounter an offline rq -- look into that.
      
      As a result this kills cancel_dl_timer() which included a rq->lock
      break.
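
       The 'timer gets a task reference' part boils down to a pattern along
       these lines (a rough sketch with hypothetical placement, not the
       literal diff):

       	get_task_struct(p);			/* the armed timer now holds a ref */
       	hrtimer_start(&dl_se->dl_timer, expires, HRTIMER_MODE_ABS);

       	/* ...and at the end of the timer callback, once p is done with: */
       	put_task_struct(p);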
      
       Fixes: 40767b0d ("sched/deadline: Fix deadline parameter modification handling")
      Cc: Wanpeng Li <wanpeng.li@linux.intel.com>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Juri Lelli <juri.lelli@arm.com>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.574192138@infradead.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a649f237
    • sched, dl: Convert switched_{from, to}_dl() / prio_changed_dl() to balance callbacks · 9916e214
      Peter Zijlstra authored
      
      
      Remove the direct {push,pull} balancing operations from
      switched_{from,to}_dl() / prio_changed_dl() and use the balance
      callback queue.
      
       Again, err on the side of too many reschedules; too few is a hard bug
       while too many is just annoying.
      
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124742.968262663@infradead.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      9916e214
    • sched,dl: Remove return value from pull_dl_task() · 0ea60c20
      Peter Zijlstra authored
      
      
      In order to be able to use pull_dl_task() from a callback, we need to
      do away with the return value.
      
       Since the return value indicates whether we should reschedule, do this
       inside the function. Since not all callers currently do this, this can
       increase the number of reschedules due to deadline balancing.

       Too many reschedules are not a correctness issue; too few are.
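
       The shape of the change is roughly (a sketch, not the actual diff):

       	/* before: the caller decides */
       	if (pull_dl_task(rq))
       		resched_curr(rq);

       	/* after: pull_dl_task() returns void and, when it pulled something
       	 * that should preempt, calls resched_curr() on its own runqueue. */
       	pull_dl_task(rq);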
      
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124742.859398977@infradead.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      0ea60c20
    • sched, rt: Convert switched_{from, to}_rt() / prio_changed_rt() to balance callbacks · fd7a4bed
      Peter Zijlstra authored
      
      
      Remove the direct {push,pull} balancing operations from
      switched_{from,to}_rt() / prio_changed_rt() and use the balance
      callback queue.
      
       Again, err on the side of too many reschedules; too few is a hard bug
       while too many is just annoying.
      
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124742.766832367@infradead.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      fd7a4bed
    • sched,rt: Remove return value from pull_rt_task() · 8046d680
      Peter Zijlstra authored
      
      
      In order to be able to use pull_rt_task() from a callback, we need to
      do away with the return value.
      
       Since the return value indicates whether we should reschedule, do this
       inside the function. Since not all callers currently do this, this can
       increase the number of reschedules due to rt balancing.

       Too many reschedules are not a correctness issue; too few are.
      
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124742.679002000@infradead.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      8046d680
    • sched: Allow balance callbacks for check_class_changed() · 4c9a4bc8
      Peter Zijlstra authored
      
      
      In order to remove dropping rq->lock from the
      switched_{to,from}()/prio_changed() sched_class methods, run the
      balance callbacks after it.
      
       We need to remove dropping rq->lock because it's buggy: suppose we use
       sched_setattr()/sched_setscheduler() to change a running task from
       FIFO to OTHER.
      
      By the time we get to switched_from_rt() the task is already enqueued
      on the cfs runqueues. If switched_from_rt() does pull_rt_task() and
      drops rq->lock, load-balancing can come in and move our task @p to
      another rq.
      
      The subsequent switched_to_fair() still assumes @p is on @rq and bad
      things will happen.
      
      By using balance callbacks we delay the load-balancing operations
      {rt,dl}x{push,pull} until we've done all the important work and the
      task is fully set up.
      
      Furthermore, the balance callbacks do not know about @p, therefore
      they cannot get confused like this.
      
       Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Link: http://lkml.kernel.org/r/20150611124742.615343911@infradead.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      4c9a4bc8
    • sched: Use replace normalize_task() with __sched_setscheduler() · dbc7f069
      Peter Zijlstra authored
      
      
      Reduce duplicate logic; normalize_task() is a simplified version of
      __sched_setscheduler(). Parametrize the difference and collapse.
      
       This reduces the number of check_class_changed() call sites.
      
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124742.532642391@infradead.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      dbc7f069
    • sched: Replace post_schedule with a balance callback list · e3fca9e7
      Peter Zijlstra authored
      
      
       Generalize the post_schedule() stuff into a balance callback list.
       This allows us to more easily use it outside of schedule() and across
       sched_classes.
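
       A sketch of what such a list-based helper can look like (field and
       helper names are an assumption based on the description; callbacks are
       queued under rq->lock and run once it is safe to drop it):

       	/* struct callback_head (linux/types.h) provides ->next and ->func */
       	static inline void
       	queue_balance_callback(struct rq *rq, struct callback_head *head,
       			       void (*func)(struct rq *rq))
       	{
       		lockdep_assert_held(&rq->lock);

       		if (unlikely(head->next))	/* already queued */
       			return;

       		head->func = (void (*)(struct callback_head *))func;
       		head->next = rq->balance_callback;
       		rq->balance_callback = head;
       	}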
      
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124742.424032725@infradead.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      e3fca9e7
    • Merge branch 'timers/core' into sched/hrtimers · 624bbdfa
      Thomas Gleixner authored
      Merge sched/core and timers/core so we can apply the sched balancing
      patch queue, which depends on both.
      624bbdfa
    • hrtimer: Allow hrtimer::function() to free the timer · 887d9dc9
      Peter Zijlstra authored
      
      
      Currently an hrtimer callback function cannot free its own timer
      because __run_hrtimer() still needs to clear HRTIMER_STATE_CALLBACK
      after it. Freeing the timer would result in a clear use-after-free.
      
       Solve this by using a scheme similar to regular timers; track the
       currently running timer in hrtimer_clock_base::running.
      
       Suggested-by: Thomas Gleixner <tglx@linutronix.de>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: wanpeng.li@linux.intel.com
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.471563047@infradead.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      887d9dc9
    • seqcount: Introduce raw_write_seqcount_barrier() · c4bfa3f5
      Peter Zijlstra authored
      
      
      Introduce raw_write_seqcount_barrier(), a new construct that can be
      used to provide write barrier semantics in seqcount read loops instead
      of the usual consistency guarantee.
      
       raw_write_seqcount_barrier() is equivalent to:
      
      	raw_write_seqcount_begin();
      	raw_write_seqcount_end();
      
       But avoids issuing two back-to-back smp_wmb() instructions.
      
      This construct works because the read side will 'stall' when observing
      odd values. This means that -- referring to the example in the comment
      below -- even though there is no (matching) read barrier between the
      loads of X and Y, we cannot observe !x && !y, because:
      
       - if we observe Y == false we must observe the first sequence
         increment, which makes us loop, until
      
        - we observe !(seq & 1) -- the second sequence increment -- at which
          time we must also observe Y == true.
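
       For reference, the barrier collapses to two increments around a single
       smp_wmb(), and the X/Y example referred to above looks roughly like
       this (a sketch along the lines of the seqlock.h comment, not quoted
       verbatim):

       	static inline void raw_write_seqcount_barrier(seqcount_t *s)
       	{
       		s->sequence++;
       		smp_wmb();
       		s->sequence++;
       	}

       	/* X == true, Y == false initially */

       	void write(void)
       	{
       		Y = true;
       		raw_write_seqcount_barrier(&seq);
       		X = false;
       	}

       	void read(void)
       	{
       		bool x, y;
       		unsigned int start;

       		do {
       			start = read_seqcount_begin(&seq);
       			x = X; y = Y;
       		} while (read_seqcount_retry(&seq, start));

       		/* cannot see the late store (X == false) yet miss the early one */
       		BUG_ON(!x && !y);
       	}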
      
       Suggested-by: Oleg Nesterov <oleg@redhat.com>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: umgwanakikbuti@gmail.com
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: oleg@redhat.com
      Cc: wanpeng.li@linux.intel.com
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20150617122924.GP3644@twins.programming.kicks-ass.net
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      c4bfa3f5
    • seqcount: Rename write_seqcount_barrier() · a7c6f571
      Peter Zijlstra authored
      
      
      I'll shortly be introducing another seqcount primitive that's useful
      to provide ordering semantics and would like to use the
      write_seqcount_barrier() name for that.
      
       Seeing how there's only one user of the current primitive, let's rename
       it to invalidate, as that appears to be what it's doing.
      
      While there, employ lockdep_assert_held() instead of
      assert_spin_locked() to not generate debug code for regular kernels.
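
       Assuming the body is simply carried over from write_seqcount_barrier(),
       the renamed primitive would look like:

       	static inline void write_seqcount_invalidate(seqcount_t *s)
       	{
       		smp_wmb();
       		s->sequence += 2;	/* stay even, but force readers to retry */
       	}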
      
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: wanpeng.li@linux.intel.com
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.279926217@infradead.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a7c6f571
    • hrtimer: Fix hrtimer_is_queued() hole · 8edfb036
      Peter Zijlstra authored
      
      
       A queued hrtimer that gets restarted (hrtimer_start*() while
       hrtimer_is_queued()) will briefly appear as unqueued/inactive, even
       though the timer has always been active; we just moved it.
      
      Close this hole by preserving timer->state in
      hrtimer_start_range_ns()'s remove_hrtimer() call.
      
       Reported-by: Oleg Nesterov <oleg@redhat.com>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.175989138@infradead.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      8edfb036
    • hrtimer: Remove HRTIMER_STATE_MIGRATE · c04dca02
      Oleg Nesterov authored
      
      
      I do not understand HRTIMER_STATE_MIGRATE. Unless I am totally
      confused it looks buggy and simply unneeded.
      
      migrate_hrtimer_list() sets it to keep hrtimer_active() == T, but this
      is not enough: this can fool, say, hrtimer_is_queued() in
      dequeue_signal().
      
      Can't migrate_hrtimer_list() simply use HRTIMER_STATE_ENQUEUED?
      This fixes the race and we can kill STATE_MIGRATE.
      
       Signed-off-by: Oleg Nesterov <oleg@redhat.com>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: ktkhai@parallels.com
      Cc: rostedt@goodmis.org
      Cc: juri.lelli@gmail.com
      Cc: pang.xunlei@linaro.org
      Cc: wanpeng.li@linux.intel.com
      Cc: umgwanakikbuti@gmail.com
      Link: http://lkml.kernel.org/r/20150611124743.072387650@infradead.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      c04dca02
  2. Jun 18, 2015
    • selftest: Timers: Avoid signal deadlock in leap-a-day · 51a16c1e
      John Stultz authored
       In 0c4a5fc9 ("Add leap-second timer edge testing to leap-a-day.c"),
       we added a timer to the test which checks to make sure timers near
       the leapsecond edge behave correctly.
      
       However, the output generated from the timer uses ctime_r, which
       isn't async-signal safe, and should that signal land while the
       main test is using ctime_r to print its output, it's possible for
       the test to deadlock on glibc internal locks.
      
      Thus this patch reworks the output to avoid using ctime_r in
      the signal handler.
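
       The general rule the fix follows can be shown with a tiny standalone
       program (not the selftest's actual code): do the ctime_r() formatting
       ahead of time and only call async-signal-safe functions such as
       write() from the handler:

       	#include <signal.h>
       	#include <string.h>
       	#include <time.h>
       	#include <unistd.h>

       	static char msg[64];		/* formatted outside the handler */
       	static size_t msg_len;

       	static void sigalrm_handler(int signo)
       	{
       		/* write(2) is async-signal-safe; ctime_r()/printf() are not */
       		if (write(STDOUT_FILENO, msg, msg_len) < 0)
       			_exit(1);
       	}

       	int main(void)
       	{
       		time_t now = time(NULL);

       		ctime_r(&now, msg);	/* fine here: not in signal context */
       		msg_len = strlen(msg);
       		signal(SIGALRM, sigalrm_handler);
       		alarm(1);
       		pause();
       		return 0;
       	}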
      
       Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/1434565003-3386-1-git-send-email-john.stultz@linaro.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      51a16c1e
    • timekeeping: Copy the shadow-timekeeper over the real timekeeper last · 906c5557
      John Stultz authored
       The fix in d1518326 ("time: Move clock_was_set_seq update before
       updating shadow-timekeeper") was unfortunately incomplete.
      
      The main gist of that change was to do the shadow-copy update
      last, so that any state changes were properly duplicated, and
      we wouldn't accidentally have stale data in the shadow.
      
       Unfortunately, in the main update_wall_time() logic, we use the
       shadow-timekeeper to calculate the next update values, then, while
       holding the lock, copy the shadow-timekeeper over, then call
       timekeeping_update() to do some additional bookkeeping (skipping the
       shadow mirror). The bug with this is that the additional bookkeeping
       isn't all read-only; some of it changes timekeeper state. Thus we
       might then overwrite this state change on the next update.
      
      To avoid this problem, do the timekeeping_update() on the
      shadow-timekeeper prior to copying the full state over to
      the real-timekeeper.
      
      This avoids problems with both the clock_was_set_seq and
      next_leap_ktime being overwritten and possibly the
      fast-timekeepers as well.
      
      Many thanks to Prarit for his rigorous testing, which discovered
      this problem, along with Prarit and Daniel's work validating this
      fix.
      
       Reported-by: Prarit Bhargava <prarit@redhat.com>
       Tested-by: Prarit Bhargava <prarit@redhat.com>
       Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com>
       Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/1434560753-7441-1-git-send-email-john.stultz@linaro.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      906c5557
    • clockevents: Check state instead of mode in suspend/resume path · a9d20988
      Viresh Kumar authored
      CLOCK_EVT_MODE_* macros are present for backward compatibility (as most
      of the drivers are still using old ->set_mode() interface).
      
       These macros shouldn't be used anymore in code that is common to both
       driver interfaces, i.e. ->set_mode() and ->set_state_*().
      
       Drivers implementing the ->set_state_*() interface, which have their
       clkevt->mode set to 0 (clkevt device structures are normally globally
       defined), will not participate in suspend/resume, as they will always be
       marked as UNUSED.
      
      Fix this by checking state of the clockevent device instead of mode,
      which is updated for both the interfaces.
      
      Fixes: ac34ad27
      
       ("clockevents: Do not suspend/resume if unused")
       Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: linaro-kernel@lists.linaro.org
      Cc: alexandre.belloni@free-electrons.com
      Cc: sylvain.rochet@finsecur.com
      Link: http://lkml.kernel.org/r/a1964eef6e8a47d02b1ff9083c6c91f73f0ff643.1434537215.git.viresh.kumar@linaro.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a9d20988
  3. Jun 12, 2015
    • selftests: timers: Add leap-second timer edge testing to leap-a-day.c · 0c4a5fc9
      John Stultz authored
      
      
       Prarit reported an issue with timers around the leapsecond, where a
       timer set for midnight UTC (00:00:00) might fire a second early, right
       before the leapsecond (23:59:60 - though it appears as a repeated
       23:59:59) is applied.
      
      So I've updated the leap-a-day.c test to integrate a similar test,
      where we set a timer and check if it triggers at the right time, and
      if the ntp state transition is managed properly.
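
       Arming an absolute CLOCK_REALTIME timer for a given UTC second, as the
       test does conceptually, looks like this (a sketch using the POSIX timer
       API, not the selftest source; link with -lrt on older glibc):

       	#include <signal.h>
       	#include <time.h>

       	/* Arm a one-shot absolute CLOCK_REALTIME timer for 'when' (UTC). */
       	static void arm_abs_timer(time_t when)
       	{
       		timer_t tm_id;
       		struct sigevent se = {
       			.sigev_notify = SIGEV_SIGNAL,
       			.sigev_signo  = SIGALRM,
       		};
       		struct itimerspec its = {
       			.it_value.tv_sec = when,	/* e.g. midnight UTC */
       		};

       		timer_create(CLOCK_REALTIME, &se, &tm_id);
       		timer_settime(tm_id, TIMER_ABSTIME, &its, NULL);
       		/* SIGALRM should arrive at 'when', not a second early */
       	}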
      
       Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
       Reported-by: Prarit Bhargava <prarit@redhat.com>
       Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/1434063297-28657-6-git-send-email-john.stultz@linaro.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      0c4a5fc9
    • ntp: Do leapsecond adjustment in adjtimex read path · 96efdcf2
      John Stultz authored
      
      
      Since the leapsecond is applied at tick-time, this means there is a
      small window of time at the start of a leap-second where we cross into
      the next second before applying the leap.
      
       This patch modifies adjtimex() so that the leap-second is applied at
       the second edge, providing more correct leapsecond behavior.
      
      This does make it so that adjtimex()'s returned time values can be
      inconsistent with time values read from gettimeofday() or
      clock_gettime(CLOCK_REALTIME,...)  for a brief period of one tick at
      the leapsecond.  However, those other interfaces do not provide the
      TIME_OOP time_state return that adjtimex() provides, which allows the
      leapsecond to be properly represented. They instead only see a time
      discontinuity, and cannot tell the first 23:59:59 from the repeated
      23:59:59 leap second.
      
      This seems like a reasonable tradeoff given clock_gettime() /
      gettimeofday() cannot properly represent a leapsecond, and users
      likely care more about performance, while folks who are using
      adjtimex() more likely care about leap-second correctness.
      
       Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/1434063297-28657-5-git-send-email-john.stultz@linaro.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      96efdcf2
    • time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge · 833f32d7
      John Stultz authored
      
      
       Currently, leapsecond adjustments are done at tick time. As a result,
       the leapsecond was applied at the first timer tick *after* the
       leapsecond (~1-10ms late depending on HZ), rather than exactly on the
       second edge.
      
       This was in part historical from back when we were always tick-based,
       but correcting this has since been avoided because it adds extra
       conditional checks in the gettime fastpath, which has performance
       overhead.
      
      However, it was recently pointed out that ABS_TIME CLOCK_REALTIME
      timers set for right after the leapsecond could fire a second early,
      since some timers may be expired before we trigger the timekeeping
      timer, which then applies the leapsecond.
      
       This isn't quite as bad as it sounds, since behaviorally it is similar
       to what is possible with ntpd-made leapsecond adjustments done without
       using the kernel discipline, where, due to latencies, timers may fire
       just prior to the settimeofday() call. (Also, one should note that all
       applications using CLOCK_REALTIME timers should always be careful,
       since they are prone to quirks from settimeofday() disturbances.)
      
      However, the purpose of having the kernel do the leap adjustment is to
      avoid such latencies, so I think this is worth fixing.
      
       So in order to properly keep those timers from firing a second early,
       this patch modifies the ntp and timekeeping logic so that we keep
       enough state for the update_base_offsets_now accessor, which
       provides the hrtimer core with the current time, to check and apply the
       leapsecond adjustment on the second edge. This prevents the hrtimer
       core from expiring timers too early.
      
      This patch does not modify any other time read path, so no additional
      overhead is incurred. However, this also means that the leap-second
      continues to be applied at tick time for all other read-paths.
      
      Apologies to Richard Cochran, who pushed for similar changes years
      ago, which I resisted due to the concerns about the performance
      overhead.
      
      While I suspect this isn't extremely critical, folks who care about
      strict leap-second correctness will likely want to watch
      this. Potentially a -stable candidate eventually.
      
       Originally-suggested-by: Richard Cochran <richardcochran@gmail.com>
       Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
       Reported-by: Prarit Bhargava <prarit@redhat.com>
       Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/1434063297-28657-4-git-send-email-john.stultz@linaro.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      833f32d7
    • ntp: Introduce and use SECS_PER_DAY macro instead of 86400 · 90bf361c
      John Stultz authored
      
      
      Currently the leapsecond logic uses what looks like magic values.
      
      Improve this by defining SECS_PER_DAY and using that macro
      to make the logic more clear.
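
       The macro itself is trivial; the gain is readability at the call sites
       (sketch):

       	#define SECS_PER_DAY	86400

       	/* e.g. instead of testing (secs % 86400) == 0: */
       	if ((secs % SECS_PER_DAY) == 0)
       		/* at a day boundary, a leap second may be pending */;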
      
       Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiri Bohac <jbohac@suse.cz>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/1434063297-28657-3-git-send-email-john.stultz@linaro.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      90bf361c
    • time: Move clock_was_set_seq update before updating shadow-timekeeper · d1518326
      John Stultz authored
       It was reported that 868a3e91 ("hrtimer: Make offset update smarter")
       was causing timer problems after suspend/resume.
      
       The problem with that change is that the modification to
       clock_was_set_seq in timekeeping_update() is done prior to
       mirroring the time state to the shadow-timekeeper. Thus the
       next time we do update_wall_time() the updated sequence is
       overwritten by what's in the shadow copy.
      
      This patch moves the shadow-timekeeper mirroring to the end
      of the function, after all updates have been made, so all data
      is kept in sync.
      
      (This patch also affects the update_fast_timekeeper calls which
      were also problematically done prior to the mirroring).
      
       Reported-and-tested-by: Jeremiah Mahler <jmmahler@gmail.com>
       Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/1434063297-28657-2-git-send-email-john.stultz@linaro.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      d1518326
  4. Jun 10, 2015
    • clocksource: Use current logging style · 45bbfe64
      Joe Perches authored
      
      
      clocksource messages aren't prefixed in dmesg so it's a bit unclear
      what subsystem emits the messages.
      
      Use pr_fmt and pr_<level> to auto-prefix the messages appropriately.
      
      Miscellanea:
      
      o Remove "Warning" from KERN_WARNING level messages
      o Align "timekeeping watchdog: " messages
      o Coalesce formats
      o Align multiline arguments
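
       The usual shape of such a conversion (a generic sketch of the kernel
       logging idiom, not the exact diff):

       	/* at the top of the file, before the #includes */
       	#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

       	/* then plain pr_<level>() calls get the prefix automatically, e.g. */
       	pr_warn("timekeeping watchdog: ...\n");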
      
       Signed-off-by: Joe Perches <joe@perches.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Link: http://lkml.kernel.org/r/1432579795.2846.75.camel@perches.com
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      45bbfe64
    • time: Allow gcc to fold usecs_to_jiffies(constant) · c569a23d
      Nicholas Mc Guire authored
      
      
       To allow constant folding, usecs_to_jiffies() conditionally calls
       the HZ-dependent _usecs_to_jiffies() helpers or, when gcc cannot
       do the constant folding, __usecs_to_jiffies(), which is the renamed
       original usecs_to_jiffies() function.
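
       The resulting shape mirrors the msecs_to_jiffies() variant (a sketch;
       the exact overflow limits are elided):

       	static __always_inline unsigned long usecs_to_jiffies(const unsigned int u)
       	{
       		if (__builtin_constant_p(u)) {
       			if (u > jiffies_to_usecs(MAX_JIFFY_OFFSET))
       				return MAX_JIFFY_OFFSET;
       			return _usecs_to_jiffies(u);	/* HZ-dependent, foldable */
       		} else {
       			return __usecs_to_jiffies(u);	/* old out-of-line path */
       		}
       	}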
      
       Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
      Cc: Masahiro Yamada <yamada.m@jp.panasonic.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Michal Marek <mmarek@suse.cz>
      Link: http://lkml.kernel.org/r/1432832996-12129-2-git-send-email-hofrat@osadl.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      c569a23d
    • time: Refactor usecs_to_jiffies · ae60d6a0
      Nicholas Mc Guire authored
       Refactor the usecs_to_jiffies() conditional code part in time.c and
       jiffies.h, putting it into conditional functions rather than #ifdefs
       to improve readability. This is analogous to the msecs_to_jiffies()
       cleanup in commit ca42aaf0 ("time: Refactor msecs_to_jiffies").
      
       Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
      Cc: Masahiro Yamada <yamada.m@jp.panasonic.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Michal Marek <mmarek@suse.cz>
      Link: http://lkml.kernel.org/r/1432832996-12129-1-git-send-email-hofrat@osadl.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      ae60d6a0
  5. Jun 08, 2015
    • hrtimers: Make sure hrtimer_resolution is unsigned int · d711b8b3
      Borislav Petkov authored
      
      
      ... in the !CONFIG_HIGH_RES_TIMERS case too. And thus fix warnings like
      this one:
      
      net/sched/sch_api.c: In function ‘psched_show’:
      net/sched/sch_api.c:1891:6: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 6 has type ‘long int’ [-Wformat=]
            (u32)NSEC_PER_SEC / hrtimer_resolution);
      
       Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/1433583000-32090-1-git-send-email-bp@alien8.de
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      d711b8b3
  6. Jun 07, 2015
    • sched/numa: Only consider less busy nodes as numa balancing destinations · 6f9aad0b
      Rik van Riel authored
      Changeset a43455a1 ("sched/numa: Ensure task_numa_migrate() checks
      the preferred node") fixes an issue where workloads would never
      converge on a fully loaded (or overloaded) system.
      
      However, it introduces a regression on less than fully loaded systems,
      where workloads converge on a few NUMA nodes, instead of properly
      staying spread out across the whole system. This leads to a reduction
      in available memory bandwidth, and usable CPU cache, with predictable
      performance problems.
      
      The root cause appears to be an interaction between the load balancer
      and NUMA balancing, where the short term load represented by the load
      balancer differs from the long term load the NUMA balancing code would
      like to base its decisions on.
      
      Simply reverting a43455a1 would re-introduce the non-convergence
      of workloads on fully loaded systems, so that is not a good option. As
      an aside, the check done before a43455a1 only applied to a task's
      preferred node, not to other candidate nodes in the system, so the
      converge-on-too-few-nodes problem still happens, just to a lesser
      degree.
      
      Instead, try to compensate for the impedance mismatch between the load
      balancer and NUMA balancing by only ever considering a lesser loaded
      node as a destination for NUMA balancing, regardless of whether the
      task is trying to move to the preferred node, or to another node.
      
      This patch also addresses the issue that a system with a single
      runnable thread would never migrate that thread to near its memory,
      introduced by 095bebf6 ("sched/numa: Do not move past the balance
      point if unbalanced").
      
      A test where the main thread creates a large memory area, and spawns a
      worker thread to iterate over the memory (placed on another node by
      select_task_rq_fair), after which the main thread goes to sleep and
      waits for the worker thread to loop over all the memory now sees the
      worker thread migrated to where the memory is, instead of having all
      the memory migrated over like before.
      
      Jirka has run a number of performance tests on several systems: single
      instance SpecJBB 2005 performance is 7-15% higher on a 4 node system,
      with higher gains on systems with more cores per socket.
       Multi-instance SpecJBB 2005 (one per node), linpack, and stream see
       little or no changes with the revert of 095bebf6 and this patch.
      
       Reported-by: Artem Bityutski <dedekind1@gmail.com>
       Reported-by: Jirka Hladky <jhladky@redhat.com>
       Tested-by: Jirka Hladky <jhladky@redhat.com>
       Tested-by: Artem Bityutskiy <dedekind1@gmail.com>
       Signed-off-by: Rik van Riel <riel@redhat.com>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
       Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150528095249.3083ade0@annuminas.surriel.com
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6f9aad0b
    • Revert 095bebf6 ("sched/numa: Do not move past the balance point if unbalanced") · e4991b24
      Rik van Riel authored
      Commit 095bebf6 ("sched/numa: Do not move past the balance point
      if unbalanced") broke convergence of workloads with just one runnable
      thread, by making it impossible for the one runnable thread on the
      system to move from one NUMA node to another.
      
      Instead, the thread would remain where it was, and pull all the memory
      across to its location, which is much slower than just migrating the
      thread to where the memory is.
      
       The next patch has a better fix for the issue that 095bebf6 tried
       to address.
      
       Reported-by: Jirka Hladky <jhladky@redhat.com>
       Signed-off-by: Rik van Riel <riel@redhat.com>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dedekind1@gmail.com
      Cc: mgorman@suse.de
      Link: http://lkml.kernel.org/r/1432753468-7785-2-git-send-email-riel@redhat.com
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e4991b24
    • sched/fair: Prevent throttling in early pick_next_task_fair() · 54d27365
      Ben Segall authored
      
      
      The optimized task selection logic optimistically selects a new task
      to run without first doing a full put_prev_task(). This is so that we
      can avoid a put/set on the common ancestors of the old and new task.
      
       Similarly, we should only call check_cfs_rq_runtime() to throttle
       eligible groups if they're part of the common ancestry; otherwise it
       is possible to end up with no eligible task in the simple task
       selection.
      
      Imagine:
      		/root
      	/prev		/next
      	/A		/B
      
      If our optimistic selection ends up throttling /next, we goto simple
      and our put_prev_task() ends up throttling /prev, after which we're
      going to bug out in set_next_entity() because there aren't any tasks
      left.
      
      Avoid this scenario by only throttling common ancestors.
      
       Reported-by: Mohammed Naser <mnaser@vexxhost.com>
       Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
       Signed-off-by: Ben Segall <bsegall@google.com>
      [ munged Changelog ]
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: pjt@google.com
      Fixes: 678d5718
      
       ("sched/fair: Optimize cgroup pick_next_task_fair()")
      Link: http://lkml.kernel.org/r/xm26wq1oswoq.fsf@sword-of-the-dawn.mtv.corp.google.com
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      54d27365
    • preempt: Reorganize the notrace definitions a bit · 9a92e3dc
      Frederic Weisbecker authored
      
      
       preempt.h has two separate "#ifdef CONFIG_PREEMPT" sections: one to
       define preempt_enable() and another to define preempt_enable_notrace().

       Let's gather both.
      
       Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1433432349-1021-4-git-send-email-fweisbec@gmail.com
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9a92e3dc
    • preempt: Use preempt_schedule_context() as the official tracing preemption point · 4eaca0a8
      Frederic Weisbecker authored
       preempt_schedule_context() is a tracing-safe preemption point but it's
       only used when CONFIG_CONTEXT_TRACKING=y. Other configs have tracing
       recursion issues since commit:

         b30f0e3f ("sched/preempt: Optimize preemption operations on __schedule() callers")

       introduced function-based preempt_count_*() ops.

       Let's make it available on all configs and give it a more appropriate
       name for its new position.
      
       Reported-by: Fengguang Wu <fengguang.wu@intel.com>
       Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1433432349-1021-3-git-send-email-fweisbec@gmail.com
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4eaca0a8
    • sched: Make preempt_schedule_context() function-tracing safe · be690035
      Frederic Weisbecker authored
      Since function tracing disables preemption, it needs a safe preemption
      point to use when preemption is re-enabled without worrying about tracing
       recursion. I.e., to avoid tracing recursion, that preemption point can't
      be traced (use of notrace qualifier) and it can't call any traceable
      function before that preemption point disables preemption itself, which
      disarms the recursion.
      
       preempt_schedule() was fine until commit:

         b30f0e3f ("sched/preempt: Optimize preemption operations on __schedule() callers")

       because PREEMPT_ACTIVE (which has the property of disabling preemption
       and thus disarming tracing preemption recursion) was set before calling
       any further function.
      
       But that commit introduced the use of preempt_count_add/sub() functions
       to set PREEMPT_ACTIVE, and because these functions are called before
       preemption gets a chance to be disabled, we have a tracing recursion.
      
       preempt_schedule_context() is one of the possible preemption functions
       used by tracing. Its special purpose is to avoid tracing recursion
       against context tracking. Let's enhance this function to become more
       generally tracing-safe by disabling preemption with raw accessors, such
       that no function is called before preemption gets disabled, disarming
       the tracing recursion.

       This function is going to become the specific tracing-safe preemption
       point in a further commit.
      
       Reported-by: Fengguang Wu <fengguang.wu@intel.com>
       Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1433432349-1021-2-git-send-email-fweisbec@gmail.com
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      be690035
  7. Jun 02, 2015