  1. Oct 28, 2014
    • sched: Debug nested sleeps · 8eb23b9f
      Peter Zijlstra authored
      
      
      Validate that we call might_sleep() with TASK_RUNNING, which catches
      places where we nest blocking primitives, e.g. mutex usage in a wait
      loop.
      
      Since all blocking is arranged through task_struct::state, nesting
      this will cause the inner primitive to set TASK_RUNNING and the outer
      will thus not block.
      
      Another observed problem is calling a blocking function from
      schedule()->sched_submit_work()->blk_schedule_flush_plug() which will
      then destroy the task state for the actual __schedule() call that
      comes after it.
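
      A minimal sketch of the anti-pattern this check catches (a
      hypothetical wait loop; mutex_lock() stands in for any blocking
      primitive):

      	DEFINE_WAIT(wait);

      	for (;;) {
      		prepare_to_wait(&wq, &wait, TASK_UNINTERRUPTIBLE);
      		if (condition)
      			break;
      		/*
      		 * Nested sleep: mutex_lock() may block and set the task
      		 * back to TASK_RUNNING, so the schedule() below no longer
      		 * sleeps and the loop can spin.
      		 */
      		mutex_lock(&lock);
      		mutex_unlock(&lock);
      		schedule();
      	}
      	finish_wait(&wq, &wait);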
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: tglx@linutronix.de
      Cc: ilya.dryomov@inktank.com
      Cc: umgwanakikbuti@gmail.com
      Cc: oleg@redhat.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140924082242.591637616@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched, net: Clean up sk_wait_event() vs. might_sleep() · 26cabd31
      Peter Zijlstra authored
      
      
      WARNING: CPU: 1 PID: 1744 at kernel/sched/core.c:7104 __might_sleep+0x58/0x90()
      do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff81070e10>] prepare_to_wait+0x50/0xa0
      
       [<ffffffff8105bc38>] __might_sleep+0x58/0x90
       [<ffffffff8148c671>] lock_sock_nested+0x31/0xb0
       [<ffffffff81498aaa>] sk_stream_wait_memory+0x18a/0x2d0
      
      This is a false positive, because sk_wait_event() will already have
      TASK_RUNNING at that point if it went through schedule_timeout().
      
      So annotate it with sched_annotate_sleep(), which compiles away on
      !DEBUG builds.
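
      A hedged sketch of the annotation (sk and the condition are
      illustrative; the point is that schedule_timeout() has already left
      the task in TASK_RUNNING when the blocking call is made):

      	prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
      	if (!condition)
      		timeo = schedule_timeout(timeo);
      	/* back in TASK_RUNNING here; silence the nested-sleep check */
      	sched_annotate_sleep();
      	lock_sock(sk);	/* blocking, but known to be safe here */
      	finish_wait(sk_sleep(sk), &wait);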
      
      Reported-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140924082242.524407432@infradead.org
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: netdev@vger.kernel.org
      Cc: tglx@linutronix.de
      Cc: ilya.dryomov@inktank.com
      Cc: umgwanakikbuti@gmail.com
      Cc: oleg@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched, modules: Fix nested sleep in add_unformed_module() · 3c9b2c3d
      Peter Zijlstra authored
      
      
      This is a genuine bug in add_unformed_module(): we cannot use
      blocking primitives inside a wait loop.

      So rewrite the wait_event_interruptible() usage to use the fresh
      wait_woken() infrastructure.
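
      A hedged sketch of the wait_woken() pattern (the condition is
      illustrative):

      	DEFINE_WAIT_FUNC(wait, woken_wake_function);

      	add_wait_queue(&wq, &wait);
      	while (!condition) {
      		/*
      		 * Blocking primitives may be used here: the task stays
      		 * TASK_RUNNING outside of wait_woken() itself, and missed
      		 * wakeups are recorded in wait.flags instead.
      		 */
      		wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
      	}
      	remove_wait_queue(&wq, &wait);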
      
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: tglx@linutronix.de
      Cc: ilya.dryomov@inktank.com
      Cc: umgwanakikbuti@gmail.com
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: oleg@redhat.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Link: http://lkml.kernel.org/r/20140924082242.458562904@infradead.org
      [ So this is probably complex to backport and the race wasn't reported AFAIK,
        so not marked for -stable. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched, smp: Correctly deal with nested sleeps · 7d4d2696
      Peter Zijlstra authored
      
      
      smp_hotplug_thread::{setup,unpark} functions can sleep too, so be
      consistent and do the same for all callbacks.
      
       __might_sleep+0x74/0x80
       kmem_cache_alloc_trace+0x4e/0x1c0
       perf_event_alloc+0x55/0x450
       perf_event_create_kernel_counter+0x2f/0x100
       watchdog_nmi_enable+0x8d/0x160
       watchdog_enable+0x45/0x90
       smpboot_thread_fn+0xec/0x2b0
       kthread+0xe4/0x100
       ret_from_fork+0x7c/0xb0
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: tglx@linutronix.de
      Cc: ilya.dryomov@inktank.com
      Cc: umgwanakikbuti@gmail.com
      Cc: oleg@redhat.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140924082242.392279328@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched, tty: Deal with nested sleeps · 97d9e28d
      Peter Zijlstra authored
      
      
      n_tty_{read,write} are wait loops with sleeps in them. Wait loops
      rely on task_struct::state, and sleeps do too, since that's the only
      means of actually sleeping. Therefore the nested sleeps destroy the
      wait-loop state.

      Fix this by using the new woken_wake_function and wait_woken()
      infrastructure, which registers wakeups in the wait entry and
      thereby allows shrinking the task_struct::state changes to the
      actual sleep part.
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: tglx@linutronix.de
      Cc: ilya.dryomov@inktank.com
      Cc: umgwanakikbuti@gmail.com
      Cc: oleg@redhat.com
      Link: http://lkml.kernel.org/r/20140924082242.323011233@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched, inotify: Deal with nested sleeps · e23738a7
      Peter Zijlstra authored
      
      
      inotify_read is a wait loop with sleeps in it. Wait loops rely on
      task_struct::state, and sleeps do too, since that's the only means
      of actually sleeping. Therefore the nested sleeps destroy the
      wait-loop state, and the wait loop breaks the sleep functions that
      assume TASK_RUNNING (mutex_lock).

      Fix this by using the new woken_wake_function and wait_woken()
      infrastructure, which registers wakeups in the wait entry and
      thereby allows shrinking the task_struct::state changes to the
      actual sleep part.
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: tglx@linutronix.de
      Cc: ilya.dryomov@inktank.com
      Cc: umgwanakikbuti@gmail.com
      Cc: Robert Love <rlove@rlove.org>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Link: http://lkml.kernel.org/r/20140924082242.254858080@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched, exit: Deal with nested sleeps · 1029a2b5
      Peter Zijlstra authored
      
      
      do_wait() is a big wait loop, but we set TASK_RUNNING too late; we
      end up calling potentially sleeping functions before we reset it.

      Not strictly a bug, since we're guaranteed to exit the loop and not
      call schedule(); put in annotations to quiet might_sleep().
      
       WARNING: CPU: 0 PID: 1 at ../kernel/sched/core.c:7123 __might_sleep+0x7e/0x90()
       do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8109a788>] do_wait+0x88/0x270
      
       Call Trace:
        [<ffffffff81694991>] dump_stack+0x4e/0x7a
        [<ffffffff8109877c>] warn_slowpath_common+0x8c/0xc0
        [<ffffffff8109886c>] warn_slowpath_fmt+0x4c/0x50
        [<ffffffff810bca6e>] __might_sleep+0x7e/0x90
        [<ffffffff811a1c15>] might_fault+0x55/0xb0
        [<ffffffff8109a3fb>] wait_consider_task+0x90b/0xc10
        [<ffffffff8109a804>] do_wait+0x104/0x270
        [<ffffffff8109b837>] SyS_wait4+0x77/0x100
        [<ffffffff8169d692>] system_call_fastpath+0x16/0x1b
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: tglx@linutronix.de
      Cc: umgwanakikbuti@gmail.com
      Cc: ilya.dryomov@inktank.com
      Cc: Alex Elder <alex.elder@linaro.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Axel Lin <axel.lin@ingics.com>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Guillaume Morin <guillaume@morinfr.org>
      Cc: Ionut Alexa <ionut.m.alexa@gmail.com>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Michal Schmidt <mschmidt@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20140924082242.186408915@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Add might_sleep() checks · e22b886a
      Peter Zijlstra authored
      
      
      Add more might_sleep() checks, in case someone puts a
      wait_event()-like thing in a wait loop.

      We can't put might_sleep() in ___wait_event() itself, because there
      are locked primitives which call ___wait_event() with locks held.
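
      A sketch modeled on the resulting wait_event() macro (the check sits
      in the outer macro, not in ___wait_event()):

      	#define wait_event(wq, condition)				\
      	do {								\
      		might_sleep();						\
      		if (condition)						\
      			break;						\
      		__wait_event(wq, condition);				\
      	} while (0)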
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: tglx@linutronix.de
      Cc: ilya.dryomov@inktank.com
      Cc: umgwanakikbuti@gmail.com
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140924082242.119255706@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Provide infrastructure to deal with nested blocking · 61ada528
      Peter Zijlstra authored
      
      
      There are a few places that call blocking primitives from wait
      loops; provide infrastructure to support this without the typical
      task_struct::state collision.
      
      We record the wakeup in wait_queue_t::flags which leaves
      task_struct::state free to be used by others.
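
      A hedged sketch of the mechanism (modeled on woken_wake_function()
      and its use of wait_queue_t::flags):

      	static int woken_wake_function(wait_queue_t *wait, unsigned mode,
      				       int sync, void *key)
      	{
      		/* record the wakeup in the wait entry, not in task state */
      		smp_wmb();
      		wait->flags |= WQ_FLAG_WOKEN;
      		return default_wake_function(wait, mode, sync, key);
      	}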
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: tglx@linutronix.de
      Cc: ilya.dryomov@inktank.com
      Cc: umgwanakikbuti@gmail.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140924082242.051202318@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/mutex: Don't assume TASK_RUNNING · 6f942a1f
      Peter Zijlstra authored
      
      
      We're going to make might_sleep() test for TASK_RUNNING, because
      blocking without TASK_RUNNING will destroy the task state by setting
      it to TASK_RUNNING.
      
      There are a few occasions where it's 'valid' to call blocking
      primitives (and mutex_lock in particular) and not have TASK_RUNNING;
      typically such cases are right before we set TASK_RUNNING anyhow.
      
      Robustify the code by not assuming this; this has the beneficial side
      effect of allowing optional code emission for fixing the above
      might_sleep() false positives.
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: tglx@linutronix.de
      Cc: ilya.dryomov@inktank.com
      Cc: umgwanakikbuti@gmail.com
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140924082241.988560063@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/deadline: Don't balance during wakeup if wakee is pinned · f4e9d94a
      Wanpeng Li authored
      
      
      Use nr_cpus_allowed to bail out of select_task_rq() when only one
      CPU can be used; this saves some cycles for pinned tasks.
      
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1413253360-5318-2-git-send-email-wanpeng.li@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/deadline: Don't check SD_BALANCE_FORK · 1d7e974c
      Wanpeng Li authored
      
      
      There is no need to balance during fork, since SCHED_DEADLINE
      tasks can't fork. This patch avoids the SD_BALANCE_FORK check.
      
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1413253360-5318-1-git-send-email-wanpeng.li@linux.intel.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/deadline: Ensure that updates to exclusive cpusets don't break AC · f82f8042
      Juri Lelli authored
      
      
      How we deal with updates to exclusive cpusets is currently broken.
      As an example, suppose we have an exclusive cpuset composed of
      two CPUs: A[cpu0,cpu1]. We can assign SCHED_DEADLINE tasks to it
      up to the allowed bandwidth. If we now want to modify cpusetA's
      cpumask, we have to check that removing a CPU's amount of
      bandwidth doesn't break AC guarantees. This check is missing
      from the current code.
      
      This patch fixes the problem above, denying an update if the
      new cpumask won't have enough bandwidth for SCHED_DEADLINE tasks
      that are currently active.
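
      A hedged sketch of the admission-control condition being preserved,
      where m is the number of CPUs left in the cpuset after the update:

      	\sum_{i \in \mathrm{cpuset}} \frac{runtime_i}{period_i} \le m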
      
      Signed-off-by: Juri Lelli <juri.lelli@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: cgroups@vger.kernel.org
      Link: http://lkml.kernel.org/r/5433E6AF.5080105@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/deadline: Fix bandwidth check/update when migrating tasks between exclusive cpusets · 7f51412a
      Juri Lelli authored
      
      
      Exclusive cpusets are the only way users can restrict SCHED_DEADLINE
      tasks' affinity (performing what is commonly called clustered
      scheduling). Unfortunately, this is currently broken for two reasons:

       - No check is performed when the user tries to attach a task to
         an exclusive cpuset (recall that exclusive cpusets have an
         associated maximum allowed bandwidth).
      
       - Bandwidths of source and destination cpusets are not correctly
         updated after a task is migrated between them.
      
      This patch fixes both things at once, as they are opposite faces
      of the same coin.
      
      The check is performed in cpuset_can_attach(), as there aren't any
      points of failure after that function. The update is split in two
      halves: we first reserve bandwidth in the destination cpuset, after
      we pass the check in cpuset_can_attach(), and we then release
      bandwidth from the source cpuset when the task's affinity is
      actually changed. Even if there can be time windows when
      sched_setattr() may erroneously fail in the source cpuset, we are
      fine with it, as we can't perform an atomic update of both cpusets
      at once.
      
      Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Reported-by: Vincent Legout <vincent@legout.info>
      Signed-off-by: Juri Lelli <juri.lelli@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dario Faggioli <raistlin@linux.it>
      Cc: Michael Trimarchi <michael@amarulasolutions.com>
      Cc: Fabio Checconi <fchecconi@gmail.com>
      Cc: luca.abeni@unitn.it
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: cgroups@vger.kernel.org
      Link: http://lkml.kernel.org/r/1411118561-26323-3-git-send-email-juri.lelli@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/deadline: Do not try to push tasks if pinned task switches to dl · d9aade7a
      Wanpeng Li authored
      As Kirill mentioned (https://lkml.org/lkml/2013/1/29/118):
      
       | If rq has already had 2 or more pushable tasks and we try to add a
       | pinned task then call of push_rt_task will just waste a time.
      
      A just-switched pinned task is not able to be pushed. And if the rq
      already had several dl tasks, they have already been considered as
      candidates to be pushed (or pulled). This patch implements the same
      behavior as the rt class, which was introduced by commit 10447917
      ("sched/rt: Do not try to push tasks if pinned task switches to RT").
      
      Suggested-by: Kirill V Tkhai <tkhai@yandex.ru>
      Acked-by: Juri Lelli <juri.lelli@arm.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1413938203-224610-1-git-send-email-wanpeng.li@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched: Kill task_preempt_count() · e2336f6e
      Oleg Nesterov authored
      
      
      task_preempt_count() is pointless if the preemption counter is
      per-cpu (currently this is x86 only). It is only valid if the task
      is not running, and even in this case the only info it can provide
      is the state of the PREEMPT_ACTIVE bit.

      Change its single caller to check p->on_rq instead; this should be
      the same if p->state != TASK_RUNNING. Then kill this helper.
      
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Kirill Tkhai <tkhai@yandex.ru>
      Cc: Alexander Graf <agraf@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-arch@vger.kernel.org
      Link: http://lkml.kernel.org/r/20141008183348.GC17495@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched: Make finish_task_switch() return 'struct rq *' · dfa50b60
      Oleg Nesterov authored
      
      
      Both callers of finish_task_switch() need to recalculate this_rq()
      and pass it as an argument, plus __schedule() does this again after
      context_switch().
      
      It would be simpler to call this_rq() once in finish_task_switch()
      and return this rq to the callers.
      
      Note: probably "int cpu" in __schedule() should die; it is not used
      and both rcu_note_context_switch() and wq_worker_sleeping() do not
      really need this argument.
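
      A hedged sketch of the resulting shape:

      	static struct rq *finish_task_switch(struct task_struct *prev)
      	{
      		struct rq *rq = this_rq();

      		/* ... the existing finish_task_switch() body ... */

      		return rq;
      	}

      Both callers then use the returned rq instead of recomputing
      this_rq().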
      
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Kirill Tkhai <tkhai@yandex.ru>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20141009193232.GB5408@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched: Fix schedule_tail() to disable preemption · 1a43a14a
      Oleg Nesterov authored
      
      
      finish_task_switch() enables preemption, so post_schedule(rq) can be
      called on the wrong (and even dead) CPU. Afaics, nothing really bad
      can happen, but in this case we can wrongly clear rq->post_schedule
      on that CPU. And this simply looks wrong in any case.
      
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Kirill Tkhai <tkhai@yandex.ru>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20141008193644.GA32055@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched: Fix the PREEMPT_ACTIVE check in __trace_sched_switch_state() · 8f9fbf09
      Oleg Nesterov authored
      
      
      task_preempt_count() has nothing to do with the actual preempt
      counter; thread_info->saved_preempt_count is only valid right after
      switch_to().

      __trace_sched_switch_state() can use preempt_count(); prev is still
      the current task when trace_sched_switch() is called.
      
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      [ Added BUG_ON(). ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20141007195108.GB28002@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/numa: Check all nodes when placing a pseudo-interleaved group · 9de05d48
      Rik van Riel authored
      
      
      In pseudo-interleaved numa_groups, all tasks try to relocate to
      the group's preferred_nid.  When a group is spread across multiple
      NUMA nodes, this can lead to tasks swapping their location with
      other tasks inside the same group, instead of swapping location with
      tasks from other NUMA groups. This can keep NUMA groups from converging.
      
      Examining all nodes, when dealing with a task in a pseudo-interleaved
      NUMA group, avoids this problem. Note that only CPUs in nodes that
      improve the task or group score are examined, so the loop isn't too
      bad.
      
      Tested-by: Vinod Chegu <chegu_vinod@hp.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "Vinod Chegu" <chegu_vinod@hp.com>
      Cc: mgorman@suse.de
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20141009172747.0d97c38c@annuminas.surriel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/numa: Find the preferred nid with complex NUMA topology · 54009416
      Rik van Riel authored
      
      
      On systems with complex NUMA topologies, the node scoring is adjusted
      to allow workloads to converge on nodes that are near each other.
      
      The way a task group's preferred nid is determined needs to be adjusted,
      in order for the preferred_nid to be consistent with group_weight scoring.
      This ensures that we actually try to converge workloads on adjacent nodes.
      
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Tested-by: Chegu Vinod <chegu_vinod@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: mgorman@suse.de
      Cc: chegu_vinod@hp.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1413530994-9732-6-git-send-email-riel@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/numa: Calculate node scores in complex NUMA topologies · 6c6b1193
      Rik van Riel authored
      
      
      In order to do task placement on systems with complex NUMA topologies,
      it is necessary to count the faults on nodes nearby the node that is
      being examined for a potential move.
      
      In case of a system with a backplane interconnect, we are dealing with
      groups of NUMA nodes; each of the nodes within a group is the same number
      of hops away from nodes in other groups in the system. Optimal placement
      on this topology is achieved by counting all nearby nodes equally. When
      comparing nodes A and B at distance N, nearby nodes are those at distances
      smaller than N from nodes A or B.
      
      Placement strategy on a system with a glueless mesh NUMA topology needs
      to be different, because there are no natural groups of nodes determined
      by the hardware. Instead, when dealing with two nodes A and B at distance
      N, N >= 2, there will be intermediate nodes at distance < N from both nodes
      A and B. Good placement can be achieved by right-shifting the faults
      on nearby nodes by the number of hops from the node being scored. In
      this context, a nearby node is any node less than the maximum
      distance in the system away from the node; nodes at the maximum
      distance are skipped for efficiency reasons, not for any real policy
      reason.
      
      Placement policy on directly connected NUMA systems is not affected.
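
      A hedged sketch of the glueless-mesh scoring described above
      (faults_on() and hops() are hypothetical helpers for illustration;
      node_distance() is the real NUMA distance API):

      	unsigned long score = 0;
      	int n;

      	for_each_online_node(n) {
      		int dist = node_distance(nid, n);

      		/* nodes at the maximum distance contribute nothing */
      		if (dist >= max_dist)
      			continue;

      		/* discount nearby faults by the number of hops away */
      		score += faults_on(p, n) >> hops(nid, n);
      	}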
      
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Tested-by: Chegu Vinod <chegu_vinod@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: mgorman@suse.de
      Cc: chegu_vinod@hp.com
      Link: http://lkml.kernel.org/r/1413530994-9732-5-git-send-email-riel@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/numa: Prepare for complex topology placement · 7bd95320
      Rik van Riel authored
      
      
      Preparatory patch for adding NUMA placement on systems with
      complex NUMA topology. Also fix a potential divide by zero
      in group_weight().
      
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Tested-by: Chegu Vinod <chegu_vinod@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: mgorman@suse.de
      Cc: chegu_vinod@hp.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1413530994-9732-4-git-send-email-riel@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/numa: Classify the NUMA topology of a system · e3fe70b1
      Rik van Riel authored
      
      
      Smaller NUMA systems tend to have all NUMA nodes directly connected
      to each other. This includes the degenerate case of a system with just
      one node, i.e. a non-NUMA system.
      
      Larger systems can have two kinds of NUMA topology, which affects how
      tasks and memory should be placed on the system.
      
      On glueless mesh systems, nodes that are not directly connected to
      each other will bounce traffic through intermediary nodes. Task groups
      can be run closer to each other by moving tasks from a node to an
      intermediary node between it and the task's preferred node.
      
      On NUMA systems with backplane controllers, the intermediary hops
      are incapable of running programs. This creates "islands" of nodes
      that are at an equal distance to anywhere else in the system.
      
      Each kind of topology requires a slightly different placement
      algorithm; this patch provides the mechanism to detect the kind
      of NUMA topology of a system.
      
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Tested-by: Chegu Vinod <chegu_vinod@hp.com>
      [ Changed to use kernel/sched/sched.h ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: mgorman@suse.de
      Cc: chegu_vinod@hp.com
      Link: http://lkml.kernel.org/r/1413530994-9732-3-git-send-email-riel@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/numa: Export info needed for NUMA balancing on complex topologies · 9942f79b
      Rik van Riel authored
      
      
      Export some information that is necessary to do placement of
      tasks on systems with multi-level NUMA topologies.
      
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: mgorman@suse.de
      Cc: chegu_vinod@hp.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1413530994-9732-2-git-send-email-riel@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/dl: Fix preemption checks · f3a7e1a9
      Kirill Tkhai authored
      
      
      1) The switched_to_dl() check is wrong. We reschedule only
         if rq->curr is a deadline task, and we do not reschedule
         if it's a lower-priority task. But we must always
         preempt a task of other classes.
      
      2) dl_task_timer():
         Policy does not change in case of priority inheritance.
         rt_mutex_setprio() changes prio, while policy remains old.
      
      So we lose some balancing logic in dl_task_timer() and
      switched_to_dl() when we check policy instead of priority. A
      boosted task may be rq->curr.

      (I didn't change switched_from_dl() because no check is necessary
      there at all.)

      I've looked at this place (switched_to_dl()) several times and even
      fixed this function, but only found this just now. I suppose some
      performance tests may work better after this.
      
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1413909356.19914.128.camel@tkhai
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched: Update comments for CLONE_NEWNS · fcd964dd
      Chen Hanxiao authored
      
      
      Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-api@vger.kernel.org
      Link: http://lkml.kernel.org/r/1412674147-8941-1-git-send-email-chenhanxiao@cn.fujitsu.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched: stop the unbound recursion in preempt_schedule_context() · 009f60e2
      Oleg Nesterov authored
      
      
      preempt_schedule_context() does preempt_enable_notrace() at the end
      and this can call the same function again; exception_exit() is heavy
      and it is quite possible that need-resched is true again.
      
      1. Change this code to dec preempt_count() and check need_resched()
         by hand.
      
      2. As Linus suggested, we can use the PREEMPT_ACTIVE bit and avoid
         the enable/disable dance around __schedule(). But in this case
         we need to move it into sched/core.c (see the sketch after this
         list).

      3. Cosmetic, but x86 forgets to declare this function. This doesn't
         really matter because it is only called by asm helpers; still, it
         makes sense to add the declaration into asm/preempt.h to match
         preempt_schedule().
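
      A hedged sketch of the resulting loop (modeled on
      preempt_schedule_context() after this change):

      	do {
      		__preempt_count_add(PREEMPT_ACTIVE);
      		prev_ctx = exception_enter();
      		__schedule();
      		exception_exit(prev_ctx);
      		__preempt_count_sub(PREEMPT_ACTIVE);
      		barrier();
      	} while (need_resched());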
      
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Graf <agraf@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/20141005202322.GB27962@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Fix division by zero sysctl_numa_balancing_scan_size · 64192658
      Kirill Tkhai authored
      
      
      File /proc/sys/kernel/numa_balancing_scan_size_mb allows writing of zero.
      
      This bash command reproduces the problem:
      
      $ while :; do echo 0 > /proc/sys/kernel/numa_balancing_scan_size_mb; \
      	   echo 256 > /proc/sys/kernel/numa_balancing_scan_size_mb; done
      
      	divide error: 0000 [#1] SMP
      	Modules linked in:
      	CPU: 0 PID: 24112 Comm: bash Not tainted 3.17.0+ #8
      	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      	task: ffff88013c852600 ti: ffff880037a68000 task.ti: ffff880037a68000
      	RIP: 0010:[<ffffffff81074191>]  [<ffffffff81074191>] task_scan_min+0x21/0x50
      	RSP: 0000:ffff880037a6bce0  EFLAGS: 00010246
      	RAX: 0000000000000a00 RBX: 00000000000003e8 RCX: 0000000000000000
      	RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88013c852600
      	RBP: ffff880037a6bcf0 R08: 0000000000000001 R09: 0000000000015c90
      	R10: ffff880239bf6c00 R11: 0000000000000016 R12: 0000000000003fff
      	R13: ffff88013c852600 R14: ffffea0008d1b000 R15: 0000000000000003
      	FS:  00007f12bb048700(0000) GS:ffff88007da00000(0000) knlGS:0000000000000000
      	CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      	CR2: 0000000001505678 CR3: 0000000234770000 CR4: 00000000000006f0
      	Stack:
      	 ffff88013c852600 0000000000003fff ffff880037a6bd18 ffffffff810741d1
      	 ffff88013c852600 0000000000003fff 000000000002bfff ffff880037a6bda8
      	 ffffffff81077ef7 ffffea0008a56d40 0000000000000001 0000000000000001
      	Call Trace:
      	 [<ffffffff810741d1>] task_scan_max+0x11/0x40
      	 [<ffffffff81077ef7>] task_numa_fault+0x1f7/0xae0
      	 [<ffffffff8115a896>] ? migrate_misplaced_page+0x276/0x300
      	 [<ffffffff81134a4d>] handle_mm_fault+0x62d/0xba0
      	 [<ffffffff8103e2f1>] __do_page_fault+0x191/0x510
      	 [<ffffffff81030122>] ? native_smp_send_reschedule+0x42/0x60
      	 [<ffffffff8106dc00>] ? check_preempt_curr+0x80/0xa0
      	 [<ffffffff8107092c>] ? wake_up_new_task+0x11c/0x1a0
      	 [<ffffffff8104887d>] ? do_fork+0x14d/0x340
      	 [<ffffffff811799bb>] ? get_unused_fd_flags+0x2b/0x30
      	 [<ffffffff811799df>] ? __fd_install+0x1f/0x60
      	 [<ffffffff8103e67c>] do_page_fault+0xc/0x10
      	 [<ffffffff8150d322>] page_fault+0x22/0x30
      	RIP  [<ffffffff81074191>] task_scan_min+0x21/0x50
      	RSP <ffff880037a6bce0>
      	---[ end trace 9a826d16936c04de ]---
      
      Also fix race in task_scan_min (it depends on compiler behaviour).
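
      A hedged sketch of the fix in the sysctl table (one is the usual
      static int set to 1; the point is to refuse zero):

      	{
      		.procname	= "numa_balancing_scan_size_mb",
      		.data		= &sysctl_numa_balancing_scan_size,
      		.maxlen		= sizeof(unsigned int),
      		.mode		= 0644,
      		.proc_handler	= proc_dointvec_minmax,
      		.extra1		= &one,		/* disallow writing zero */
      	},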
      
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dario Faggioli <raistlin@linux.it>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Link: http://lkml.kernel.org/r/1413455977.24793.78.camel@tkhai
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Care divide error in update_task_scan_period() · 2847c90e
      Yasuaki Ishimatsu authored
      
      
      While offlining a node by hot-removing memory, the following divide
      error occurs:
      
        divide error: 0000 [#1] SMP
        [...]
        Call Trace:
         [...] handle_mm_fault
         [...] ? try_to_wake_up
         [...] ? wake_up_state
         [...] __do_page_fault
         [...] ? do_futex
         [...] ? put_prev_entity
         [...] ? __switch_to
         [...] do_page_fault
         [...] page_fault
        [...]
        RIP  [<ffffffff810a7081>] task_numa_fault
         RSP <ffff88084eb2bcb0>
      
      The issue occurs as follows:
        1. When page fault occurs and page is allocated from node 1,
           task_struct->numa_faults_buffer_memory[] of node 1 is
           incremented and p->numa_faults_locality[] is also incremented
           as follows:
      
           o numa_faults_buffer_memory[]       o numa_faults_locality[]
                    NR_NUMA_HINT_FAULT_TYPES
                   |      0     |     1     |
           ----------------------------------  ----------------------
            node 0 |      0     |     0     |   remote |      0     |
       node 1 |      0     |     1     |   local  |      1     |
           ----------------------------------  ----------------------
      
        2. node 1 is offlined by hot removing memory.
      
        3. When a page fault occurs, fault_types[] is calculated using
           p->numa_faults_buffer_memory[] of all online nodes in
           task_numa_placement(). But node 1 was offlined in step 2, so
           fault_types[] is calculated using only
           p->numa_faults_buffer_memory[] of node 0, and both entries of
           fault_types[] end up 0.
      
        4. The values (0) of fault_types[] are passed to update_task_scan_period().
      
        5. numa_faults_locality[1] is set to 1. So the following division is
           calculated.
      
              static void update_task_scan_period(struct task_struct *p,
                                      unsigned long shared, unsigned long private)
              {
              ...
                      ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared));
              }
      
        6. But both private and shared are set to 0, so a divide error
           occurs here.

      The divide error is a rare case, because the trigger is node
      offlining. This patch always increments the denominator to avoid
      the divide error.
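
      A hedged sketch of the fix (keep the denominator non-zero even when
      both counters are zero after a node goes offline):

      	ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared + 1));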
      
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/54475703.8000505@jp.fujitsu.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/numa: Fix unsafe get_task_struct() in task_numa_assign() · 1effd9f1
      Kirill Tkhai authored
      
      
      Unlocked access to dst_rq->curr in task_numa_compare() is racy.
      If the curr task is exiting, this may result in a use-after-free:
      
      task_numa_compare()                    do_exit()
          ...                                        current->flags |= PF_EXITING;
          ...                                    release_task()
          ...                                        ~~delayed_put_task_struct()~~
          ...                                    schedule()
          rcu_read_lock()                        ...
          cur = ACCESS_ONCE(dst_rq->curr)        ...
              ...                                rq->curr = next;
              ...                                    context_switch()
              ...                                        finish_task_switch()
              ...                                            put_task_struct()
              ...                                                __put_task_struct()
              ...                                                    free_task_struct()
              task_numa_assign()                                     ...
                  get_task_struct()                                  ...
      
      As noted by Oleg:
      
        <<The lockless get_task_struct(tsk) is only safe if tsk == current
          and didn't pass exit_notify(), or if this tsk was found on a rcu
          protected list (say, for_each_process() or find_task_by_vpid()).
          IOW, it is only safe if release_task() was not called before we
          take rcu_read_lock(), in this case we can rely on the fact that
          delayed_put_pid() can not drop the (potentially) last reference
          until rcu_read_unlock().
      
          And as Kirill pointed out task_numa_compare()->task_numa_assign()
          path does get_task_struct(dst_rq->curr) and this is not safe. The
          task_struct itself can't go away, but rcu_read_lock() can't save
          us from the final put_task_struct() in finish_task_switch(); this
          reference goes away without rcu gp>>
      
      The patch adds a simple check of the PF_EXITING flag. If it's not
      set, this guarantees that call_rcu() of the delayed_put_task_struct()
      callback hasn't happened yet, so we can safely do get_task_struct()
      in task_numa_assign().

      The locked dst_rq->lock protects against concurrency with the last
      schedule(); reusing or unmapping of cur's memory may happen without
      it.
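
      A hedged sketch of the check in task_numa_compare(), with
      rcu_read_lock() held:

      	cur = ACCESS_ONCE(dst_rq->curr);
      	/* release_task() may already have run for an exiting task */
      	if (cur && (cur->flags & PF_EXITING))
      		cur = NULL;
      	if (cur)
      		get_task_struct(cur);	/* safe: no RCU grace period has passed */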
      
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1413962231.19914.130.camel@tkhai
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/deadline: Fix races between rt_mutex_setprio() and dl_task_timer() · aee38ea9
      Juri Lelli authored
      
      
      dl_task_timer() is racy against several paths. Daniel noticed that
      the replenishment timer may experience a race condition against an
      enqueue_dl_entity() called from rt_mutex_setprio(). In his own
      words:

       rt_mutex_setprio() resets p->dl.dl_throttled. So the pattern is:
       start_dl_timer() throttled = 1, rt_mutex_setprio() throttled = 0,
       sched_switch() -> enqueue_task(), dl_task_timer -> enqueue_task()
       throttled is 0
      
      => BUG_ON(on_dl_rq(dl_se)) fires as the scheduling entity is already
      enqueued on the -deadline runqueue.
      
      As we do for the other races, we just bail out in the replenishment
      timer code.
      
      Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Signed-off-by: Juri Lelli <juri.lelli@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: vincent@legout.info
      Cc: Dario Faggioli <raistlin@linux.it>
      Cc: Michael Trimarchi <michael@amarulasolutions.com>
      Cc: Fabio Checconi <fchecconi@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1414142198-18552-5-git-send-email-juri.lelli@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/deadline: Don't replenish from a !SCHED_DEADLINE entity · 64be6f1f
      Juri Lelli authored
      
      
      In the deboost path, right after the dl_boosted flag has been
      reset, we can currently end up replenishing using -deadline
      parameters of a !SCHED_DEADLINE entity. This of course causes
      a bug, as those parameters are empty.
      
      In the case depicted above it is safe to simply bail out, as
      the deboosted task is going to be back to its original scheduling
      class anyway.
      
      Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Signed-off-by: Juri Lelli <juri.lelli@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: vincent@legout.info
      Cc: Dario Faggioli <raistlin@linux.it>
      Cc: Michael Trimarchi <michael@amarulasolutions.com>
      Cc: Fabio Checconi <fchecconi@gmail.com>
      Link: http://lkml.kernel.org/r/1414142198-18552-4-git-send-email-juri.lelli@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched: Fix race between task_group and sched_task_group · eeb61e53
      Kirill Tkhai authored
      
      
      The race may happen when somebody is changing the task_group of a
      forking task. The child's cgroup is the same as the parent's after
      dup_task_struct() (it is just memory copied), and cfs_rq and rt_rq
      are the same as the parent's too.

      But if the parent changes its task_group before cgroup_post_fork()
      is called, we do not reflect this on the child. The child's cfs_rq
      and rt_rq remain the old ones, while the child's task_group changes
      in cgroup_post_fork().

      To fix this we introduce a fork() method, which calls
      sched_move_task() directly. This function changes sched_task_group
      appropriately (its logic also has no problem with freshly created
      tasks, so we don't need anything special; we can just use it).

      Possibly, this resolves Burke Libbey's problem:
      https://lkml.org/lkml/2014/10/24/456
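
      A hedged sketch of the new callback (modeled on the fix; the exact
      cgroup_subsys wiring is elided):

      	static void cpu_cgroup_fork(struct task_struct *task)
      	{
      		sched_move_task(task);
      	}

      	/* ...registered as the cpu cgroup's .fork callback. */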
      
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1414405105.19914.169.camel@tkhai
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. Oct 27, 2014
    • Linux 3.18-rc2 · cac7f242
      Linus Torvalds authored
    • Merge tag 'armsoc-for-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 88e23761
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "Another week, another small batch of fixes.
      
        Most of these make zynq, socfpga and sunxi platforms work a bit
        better:
      
         - due to new requirements for regulators, DWMMC on socfpga broke past
           v3.17
         - SMP spinup fix for socfpga
         - a few DT fixes for zynq
         - another option (FIXED_REGULATOR) for sunxi is needed that used to
           be selected by other options but no longer is.
         - a couple of small DT fixes for at91
         - ...and a couple for i.MX"
      
      * tag 'armsoc-for-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        ARM: dts: imx28-evk: Let i2c0 run at 100kHz
        ARM: i.MX6: Fix "emi" clock name typo
        ARM: multi_v7_defconfig: enable CONFIG_MMC_DW_ROCKCHIP
        ARM: sunxi_defconfig: enable CONFIG_REGULATOR_FIXED_VOLTAGE
        ARM: dts: socfpga: Add a 3.3V fixed regulator node
        ARM: dts: socfpga: Fix SD card detect
        ARM: dts: socfpga: rename gpio nodes
        ARM: at91/dt: sam9263: fix PLLB frequencies
        power: reset: at91-reset: fix power down register
        MAINTAINERS: add atmel ssc driver maintainer entry
        arm: socfpga: fix fetching cpu1start_addr for SMP
        ARM: zynq: DT: trivial: Fix mc node
        ARM: zynq: DT: Add cadence watchdog node
        ARM: zynq: DT: Add missing reference for memory-controller
        ARM: zynq: DT: Add missing reference for ADC
        ARM: zynq: DT: Add missing address for L2 pl310
        ARM: zynq: DT: Remove 222 MHz OPP
        ARM: zynq: DT: Fix GEM register area size
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d1e14f1d
      Linus Torvalds authored
      Pull vfs updates from Al Viro:
       "overlayfs merge + leak fix for d_splice_alias() failure exits"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        overlayfs: embed middle into overlay_readdir_data
        overlayfs: embed root into overlay_readdir_data
        overlayfs: make ovl_cache_entry->name an array instead of pointer
        overlayfs: don't hold ->i_mutex over opening the real directory
        fix inode leaks on d_splice_alias() failure exits
        fs: limit filesystem stacking depth
        overlay: overlay filesystem documentation
        overlayfs: implement show_options
        overlayfs: add statfs support
        overlay filesystem
        shmem: support RENAME_WHITEOUT
        ext4: support RENAME_WHITEOUT
        vfs: add RENAME_WHITEOUT
        vfs: add whiteout support
        vfs: export check_sticky()
        vfs: introduce clone_private_mount()
        vfs: export __inode_permission() to modules
        vfs: export do_splice_direct() to modules
        vfs: add i_op->dentry_open()
  3. Oct 26, 2014
    • Merge tag 'imx-fixes-3.18' of... · efc176a8
      Olof Johansson authored
      
      Merge tag 'imx-fixes-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes
      
      Merge "ARM: imx: fixes for 3.18" from Shawn Guo:
      
      The i.MX fixes for 3.18:
       - Revert one patch which increases I2C bus frequency on imx28-evk
       - Fix a typo on imx6q EIM clock name
      
      * tag 'imx-fixes-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
        ARM: dts: imx28-evk: Let i2c0 run at 100kHz
        ARM: i.MX6: Fix "emi" clock name typo
      
      Signed-off-by: Olof Johansson <olof@lixom.net>
  4. Oct 25, 2014