  1. Oct 09, 2013
    • sched/numa: Update NUMA hinting faults once per scan · 745d6147
      Mel Gorman authored
      
      
      NUMA hinting fault counts and placement decisions are both recorded in the
      same array which distorts the samples in an unpredictable fashion. The values
      linearly accumulate during the scan and then decay creating a sawtooth-like
      pattern in the per-node counts. It also means that placement decisions are
      time sensitive. At best, it is very difficult to state that
      the buffer holds a decaying average of past faulting behaviour. At worst,
      it can confuse the load balancer if it sees one node with an artificially high
      count due to very recent faulting activity, which may create a bouncing effect.
      
      This patch adds a second array. numa_faults stores the historical data
      which is used for placement decisions. numa_faults_buffer holds the
      fault activity during the current scan window. When the scan completes,
      numa_faults decays and the values from numa_faults_buffer are copied
      across.
      
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-22-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      745d6147
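The two-array scheme described in 745d6147 can be sketched in user-space C as follows. This is only an illustration, not the kernel code: the struct name, the fixed node count and the halving decay are assumptions (the message says only that numa_faults "decays"), while the roles of the two arrays follow the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 4 /* hypothetical node count for this sketch */

struct numa_stats {
	unsigned long faults[NR_NODES];        /* decayed history, used for placement */
	unsigned long faults_buffer[NR_NODES]; /* faults seen in the current scan window */
};

/* Record hinting faults against a node while the scan window is open. */
static void record_fault(struct numa_stats *s, int nid, unsigned long pages)
{
	assert(nid >= 0 && nid < NR_NODES);
	s->faults_buffer[nid] += pages;
}

/*
 * At scan completion: decay the history (assumed here to be a halving),
 * fold in the window's counts, then reset the window buffer.
 */
static void complete_scan(struct numa_stats *s)
{
	for (size_t nid = 0; nid < NR_NODES; nid++) {
		s->faults[nid] >>= 1;
		s->faults[nid] += s->faults_buffer[nid];
		s->faults_buffer[nid] = 0;
	}
}
```

Because placement reads only faults[], it never sees a half-accumulated window, which is what removes the sawtooth pattern described in the message.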
    • sched/numa: Select a preferred node with the most numa hinting faults · 688b7585
      Mel Gorman authored
      
      
      This patch selects a preferred node for a task to run on based on the
      NUMA hinting faults. This information is later used to migrate tasks
      towards the node during balancing.
      
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-21-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      688b7585
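Selecting the preferred node in 688b7585 amounts to taking the node with the largest fault count. A minimal sketch, assuming a plain per-node array; the function name and the -1 "no preference" return value are hypothetical and not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the node with the most recorded hinting faults,
 * or -1 when no faults have been recorded at all. Ties go to the
 * lowest-numbered node because the comparison is strict.
 */
static int pick_preferred_node(const unsigned long *faults, size_t nr_nodes)
{
	unsigned long max_faults = 0;
	int preferred = -1;

	assert(faults != NULL);
	for (size_t nid = 0; nid < nr_nodes; nid++) {
		if (faults[nid] > max_faults) {
			max_faults = faults[nid];
			preferred = (int)nid;
		}
	}
	return preferred;
}
```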
    • sched/numa: Track NUMA hinting faults on per-node basis · f809ca9a
      Mel Gorman authored
      
      
      This patch tracks which nodes NUMA hinting faults were incurred on.
      This information is later used to schedule a task on the node storing
      the pages most frequently faulted by the task.
      
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-20-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f809ca9a
    • sched/numa: Slow scan rate if no NUMA hinting faults are being recorded · f307cd1a
      Mel Gorman authored
      
      
      NUMA PTE scanning slows if a NUMA hinting fault was trapped and no page
      was migrated. For long-lived but idle processes there may be no faults
      but the scan rate will be high and just waste CPU. This patch will slow
      the scan rate for processes that are not trapping faults.
      
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-19-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f307cd1a
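The back-off described in f307cd1a might look roughly like the sketch below. The function name, the doubling factor and the millisecond units are assumptions made for illustration; the commit message states only that the scan rate is slowed for processes that are not trapping faults.

```c
#include <assert.h>

/*
 * Compute the next scan period after a full pass over the address space.
 * If the pass recorded no hinting activity, lengthen the period (assumed
 * here to double) up to the configured maximum; otherwise leave it as is.
 */
static unsigned int next_scan_period(unsigned int period_ms,
				     unsigned int period_max_ms,
				     unsigned long faults_in_scan)
{
	assert(period_ms > 0 && period_ms <= period_max_ms);

	if (faults_in_scan > 0)
		return period_ms;
	/* Clamp before doubling so the multiplication cannot overshoot the max. */
	if (period_ms > period_max_ms / 2)
		return period_max_ms;
	return period_ms * 2;
}
```

With this shape, a long-lived idle process quickly settles at the maximum period instead of burning CPU on scans that never trap a fault.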
    • sched/numa: Set the scan rate proportional to the memory usage of the task being scanned · 598f0ec0
      Mel Gorman authored
      
      
      The NUMA PTE scan rate is controlled with a combination of the
      numa_balancing_scan_period_min, numa_balancing_scan_period_max and
      numa_balancing_scan_size. This scan rate is independent of the size
      of the task and as an aside it is further complicated by the fact that
      numa_balancing_scan_size controls how many pages are marked pte_numa and
      not how much virtual memory is scanned.
      
      In combination, it is almost impossible to meaningfully tune the min and
      max scan periods and reasoning about performance is complex when the time
      to complete a full scan is partially a function of the task's memory
      size. This patch alters the semantic of the min and max tunables to be
      about tuning the length of time it takes to complete a scan of a task's occupied
      virtual address space. Conceptually this is a lot easier to understand. There
      is a "sanity" check to ensure the scan rate is never extremely fast based on
      the amount of virtual memory that should be scanned in a second. The default
      of 2.5G seems arbitrary but it is to have the maximum scan rate after the
      patch roughly match the maximum scan rate before the patch was applied.
      
      On a similar note, numa_scan_period is in milliseconds and not
      jiffies. Properly placed pages slow the scanning rate but adding 10 jiffies
      to numa_scan_period means that the rate at which scanning slows depends on HZ,
      which is confusing. Get rid of the jiffies_to_msecs conversion and treat it as ms.
      
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-18-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      598f0ec0
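The arithmetic behind 598f0ec0 can be illustrated with a small user-space sketch. It assumes sizes in megabytes and periods in milliseconds, and the function names are hypothetical; only the idea (divide the period by the number of scan windows the task needs, with a floor derived from the 2.5G-per-second cap) comes from the commit message.

```c
#include <assert.h>

#define MAX_SCAN_WINDOW_MB 2560u /* the 2.5G per second cap from the message */

/* Number of scan windows needed to cover the task's memory, rounded up. */
static unsigned int nr_scan_windows(unsigned long task_mb, unsigned int scan_size_mb)
{
	assert(scan_size_mb > 0);
	if (task_mb == 0)
		return 1;
	return (unsigned int)((task_mb + scan_size_mb - 1) / scan_size_mb);
}

/*
 * Delay between scan windows, in ms, chosen so that one full pass over
 * the task takes roughly period_ms, but never scanning faster than
 * MAX_SCAN_WINDOW_MB per second.
 */
static unsigned int scan_delay_ms(unsigned long task_mb,
				  unsigned int scan_size_mb,
				  unsigned int period_ms)
{
	unsigned int windows_per_sec = 1;
	unsigned int floor_ms, delay_ms;

	if (scan_size_mb < MAX_SCAN_WINDOW_MB)
		windows_per_sec = MAX_SCAN_WINDOW_MB / scan_size_mb;
	floor_ms = 1000 / windows_per_sec;
	delay_ms = period_ms / nr_scan_windows(task_mb, scan_size_mb);
	return delay_ms > floor_ms ? delay_ms : floor_ms;
}
```

For example, with a 256 MB scan size and a 1000 ms period, a 1 GB task needs four windows and is scanned every 250 ms, while a 64 GB task hits the floor of 100 ms per window.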
    • sched/numa: Initialise numa_next_scan properly · 7e8d16b6
      Mel Gorman authored
      
      
      Scan delay logic and resets are currently initialised to start scanning
      immediately instead of delaying properly. Initialise them properly at
      fork time and catch when a new mm has been allocated.
      
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-17-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7e8d16b6
    • Revert "mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node" · b726b7df
      Mel Gorman authored
      PTE scanning and NUMA hinting fault handling are expensive so commit
      5bca2303 ("mm: sched: numa: Delay PTE scanning until a task is scheduled
      on a new node") deferred the PTE scan until a task had been scheduled on
      another node. The problem is that in the purely shared memory case
      this may never happen and no NUMA hinting fault information will be
      captured. We are not ruling out the possibility that something better
      can be done here but for now, that commit needs to be reverted and we depend
      entirely on the scan_delay to avoid punishing short-lived processes.
      
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-16-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b726b7df
    • sched/numa: Continue PTE scanning even if migrate rate limited · 9e645ab6
      Peter Zijlstra authored
      
      
      Avoiding marking PTEs pte_numa because a particular NUMA node is migrate rate
      limited seems like a bad idea. Even if this node can't migrate anymore, other
      nodes might, and we want up-to-date information to make balancing decisions.
      We already rate limit the actual migrations; this should leave enough
      bandwidth to allow the non-migrating scanning. I think it's important we
      keep up-to-date information if we're going to do placement based on it.
      
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1381141781-10992-15-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9e645ab6
    • sched/numa: Mitigate chance that same task always updates PTEs · 19a78d11
      Peter Zijlstra authored
      
      
      With a trace_printk("working\n"); right after the cmpxchg in
      task_numa_work() we can see that, in a 4-thread process, it's always the
      same task winning the race and doing the protection change.

      This is a problem since the task doing the protection change has a
      penalty for taking faults -- it is busy when marking the PTEs. If it's
      always the same task the ->numa_faults[] get severely skewed.
      
      Avoid this by delaying the task doing the protection change such that
      it is unlikely to win the privilege again.
      
      Before:
      
      root@interlagos:~# grep "thread 0/.*working" /debug/tracing/trace | tail -15
            thread 0/0-3232  [022] ....   212.787402: task_numa_work: working
            thread 0/0-3232  [022] ....   212.888473: task_numa_work: working
            thread 0/0-3232  [022] ....   212.989538: task_numa_work: working
            thread 0/0-3232  [022] ....   213.090602: task_numa_work: working
            thread 0/0-3232  [022] ....   213.191667: task_numa_work: working
            thread 0/0-3232  [022] ....   213.292734: task_numa_work: working
            thread 0/0-3232  [022] ....   213.393804: task_numa_work: working
            thread 0/0-3232  [022] ....   213.494869: task_numa_work: working
            thread 0/0-3232  [022] ....   213.596937: task_numa_work: working
            thread 0/0-3232  [022] ....   213.699000: task_numa_work: working
            thread 0/0-3232  [022] ....   213.801067: task_numa_work: working
            thread 0/0-3232  [022] ....   213.903155: task_numa_work: working
            thread 0/0-3232  [022] ....   214.005201: task_numa_work: working
            thread 0/0-3232  [022] ....   214.107266: task_numa_work: working
            thread 0/0-3232  [022] ....   214.209342: task_numa_work: working
      
      After:
      
      root@interlagos:~# grep "thread 0/.*working" /debug/tracing/trace | tail -15
            thread 0/0-3253  [005] ....   136.865051: task_numa_work: working
            thread 0/2-3255  [026] ....   136.965134: task_numa_work: working
            thread 0/3-3256  [024] ....   137.065217: task_numa_work: working
            thread 0/3-3256  [024] ....   137.165302: task_numa_work: working
            thread 0/3-3256  [024] ....   137.265382: task_numa_work: working
            thread 0/0-3253  [004] ....   137.366465: task_numa_work: working
            thread 0/2-3255  [026] ....   137.466549: task_numa_work: working
            thread 0/0-3253  [004] ....   137.566629: task_numa_work: working
            thread 0/0-3253  [004] ....   137.666711: task_numa_work: working
            thread 0/1-3254  [028] ....   137.766799: task_numa_work: working
            thread 0/0-3253  [004] ....   137.866876: task_numa_work: working
            thread 0/2-3255  [026] ....   137.966960: task_numa_work: working
            thread 0/1-3254  [028] ....   138.067041: task_numa_work: working
            thread 0/2-3255  [026] ....   138.167123: task_numa_work: working
            thread 0/3-3256  [024] ....   138.267207: task_numa_work: working
      
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1381141781-10992-14-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      19a78d11
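The idea in 19a78d11 (one thread wins a compare-and-exchange on a shared deadline, then handicaps itself for the next round) can be sketched with C11 atomics. This is not the kernel implementation: the struct layouts, the try_scan() helper and the penalty parameter are hypothetical, and timestamp wrap-around is ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct shared_mm {
	_Atomic unsigned long next_scan; /* shared by all threads of the process */
};

struct task {
	unsigned long node_stamp; /* per thread: earliest time it will try to scan */
};

/*
 * Try to claim the next scan. Returns true only for the single thread
 * that advances next_scan. The winner then pushes its own stamp past
 * the next deadline, so a sibling is more likely to win the next round.
 */
static bool try_scan(struct shared_mm *mm, struct task *t, unsigned long now,
		     unsigned long period, unsigned long penalty)
{
	unsigned long due = atomic_load(&mm->next_scan);

	assert(period > 0);
	if (now < t->node_stamp || now < due)
		return false;
	/* Only one thread can move next_scan forward from 'due'. */
	if (!atomic_compare_exchange_strong(&mm->next_scan, &due, now + period))
		return false;
	t->node_stamp = now + period + penalty;
	return true;
}
```

In a single-threaded walk-through, the first caller at time 100 wins; at the next deadline it is still inside its penalty window, so a sibling gets the scan instead, which is the rotation the "After" trace suggests.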
    • mm: numa: Do not migrate or account for hinting faults on the zero page · a1a46184
      Mel Gorman authored
      
      
      The zero page is not replicated between nodes and is often shared between
      processes. The data is read-only and likely to be cached in local CPUs
      if heavily accessed meaning that the remote memory access cost is less
      of a concern. This patch prevents trapping faults on the zero page. For
      tasks using the zero page this will reduce the number of PTE updates,
      TLB flushes and hinting faults.
      
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      [ Correct use of is_huge_zero_page]
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-13-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a1a46184
    • mm: Only flush TLBs if a transhuge PMD is modified for NUMA pte scanning · f123d74a
      Mel Gorman authored
      
      
      NUMA PTE scanning is expensive both in terms of the scanning itself and
      the TLB flush if there are any updates. The TLB flush is avoided if no
      PTEs are updated but there is a bug where transhuge PMDs are considered
      to be updated even if they were already pmd_numa. This patch addresses
      the problem and TLB flushes should be reduced.
      
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-12-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f123d74a
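The accounting fix in f123d74a boils down to counting an entry as updated only when its state actually changes, and flushing only when that count is non-zero. A minimal sketch with a boolean array standing in for the page-table entries; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Mark entries for NUMA hinting and return how many actually changed.
 * Entries that were already marked are skipped and not counted.
 */
static size_t mark_numa(bool *is_numa, size_t n)
{
	size_t updated = 0;

	assert(is_numa != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (is_numa[i])
			continue; /* already marked: not an update */
		is_numa[i] = true;
		updated++;
	}
	return updated;
}

/* A TLB flush is only worthwhile if at least one entry was modified. */
static bool needs_tlb_flush(size_t updated)
{
	return updated > 0;
}
```

A second pass over the same range returns zero updates, so no flush is requested, which is the saving the message describes.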
    • mm: Do not flush TLB during protection change if !pte_present && !migration_entry · e920e14c
      Mel Gorman authored
      
      
      NUMA PTE scanning is expensive both in terms of the scanning itself and
      the TLB flush if there are any updates. Currently, non-present PTEs are
      accounted for as an update and incur a TLB flush, although a flush is only
      necessary for anonymous migration entries. This patch addresses the
      problem and should reduce TLB flushes.
      
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-11-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e920e14c
    • mm: Account for a THP NUMA hinting update as one PTE update · afcae265
      Mel Gorman authored
      
      
      A THP PMD update is accounted for as 512 pages updated in vmstat.  This is
      a large difference when estimating the cost of automatic NUMA balancing and
      can be misleading when comparing results that had collapsed versus split
      THP. This patch addresses the accounting issue.
      
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-10-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      afcae265
    • mm: Close races between THP migration and PMD numa clearing · a54a407f
      Mel Gorman authored
      
      
      THP migration uses the page lock to guard against parallel allocations
      but there are cases like this still open:

        Task A                          Task B
        ---------------------           ---------------------
        do_huge_pmd_numa_page           do_huge_pmd_numa_page
        lock_page
        mpol_misplaced == -1
        unlock_page
        goto clear_pmdnuma
                                        lock_page
                                        mpol_misplaced == 2
                                        migrate_misplaced_transhuge
        pmd = pmd_mknonnuma
        set_pmd_at
      
      During hours of testing, one crashed with weird errors and while I have
      no direct evidence, I suspect something like the race above happened.
      This patch extends the page lock to being held until the pmd_numa is
      cleared to prevent migration starting in parallel while the pmd_numa is
      being cleared. It also flushes the old pmd entry and orders pagetable
      insertion before rmap insertion.
      
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-9-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a54a407f
    • mm: numa: Sanitize task_numa_fault() callsites · 8191acbd
      Mel Gorman authored
      
      
      There are three callers of task_numa_fault():
      
       - do_huge_pmd_numa_page():
           Accounts against the current node, not the node where the
           page resides, unless we migrated, in which case it accounts
           against the node we migrated to.
      
       - do_numa_page():
           Accounts against the current node, not the node where the
           page resides, unless we migrated, in which case it accounts
           against the node we migrated to.
      
       - do_pmd_numa_page():
           Accounts not at all when the page isn't migrated, otherwise
           accounts against the node we migrated towards.
      
      This seems wrong to me; all three sites should have the same
      semantics. Furthermore, we should account against where the page
      really is; we already know where the task is.
      
      So modify all three sites to always account; we did after all receive
      the fault; and always account to where the page is after migration,
      regardless of success.
      
      They all still differ on when they clear the PTE/PMD; ideally that
      would get sorted too.
      
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-8-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8191acbd
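The unified rule that 8191acbd asks of all three call sites (always account, and account against the node the page is on after the migration attempt) can be written down as one helper. The helper name, the fixed node count and the parameter names are hypothetical and used only to illustrate the rule:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 4 /* hypothetical node count for this sketch */

/*
 * Always account the fault. If migration succeeded the page now lives
 * on target_nid; otherwise it is still on page_nid. target_nid is only
 * consulted when 'migrated' is true.
 */
static void account_fault(unsigned long counts[NR_NODES], int page_nid,
			  int target_nid, bool migrated)
{
	int nid = migrated ? target_nid : page_nid;

	assert(nid >= 0 && nid < NR_NODES);
	counts[nid]++;
}
```

The task's own node never appears in the rule, matching the message's point that "we already know where the task is".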
    • mm: Prevent parallel splits during THP migration · b8916634
      Mel Gorman authored
      
      
      THP migrations are serialised by the page lock but on its own that does
      not prevent THP splits. If the page is split during THP migration then
      the pmd_same checks will prevent page table corruption but the unlock page
      and other fix-ups potentially will cause corruption. This patch takes the
      anon_vma lock to prevent parallel splits during migration.
      
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-7-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b8916634
    • mm: Wait for THP migrations to complete during NUMA hinting faults · ff9042b1
      Mel Gorman authored
      
      
      The locking for migrating THP is unusual. While normal page migration
      prevents parallel accesses using a migration PTE, THP migration relies on
      a combination of the page_table_lock, the page lock and the existence of
      the NUMA hinting PTE to guarantee safety but there is a bug in the scheme.
      
      If a THP page is currently being migrated and another thread traps a
      fault on the same page it checks if the page is misplaced. If it is not,
      then pmd_numa is cleared. The problem is that it checks if the page is
      misplaced without holding the page lock meaning that the racing thread
      can be migrating the THP when the second thread clears the NUMA bit
      and faults a stale page.
      
      This patch checks if the page is potentially being migrated and, if so,
      stalls using lock_page before checking whether the page is misplaced
      or not.
      
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-6-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ff9042b1
    • mm: numa: Do not account for a hinting fault if we raced · 0c3a775e
      Mel Gorman authored
      
      
      If another task handled a hinting fault in parallel then do not double
      account for it.
      
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-5-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0c3a775e
    • sched/numa: Fix comments · c69307d5
      Peter Zijlstra authored
      
      
      Fix an 80-column violation and a PTE vs PMD reference.
      
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1381141781-10992-4-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c69307d5
    • mm: numa: Document automatic NUMA balancing sysctls · 10fc05d0
      Mel Gorman authored
      
      
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1381141781-10992-3-git-send-email-mgorman@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      10fc05d0
    • Merge tag 'v3.12-rc4' into sched/core · 37bf0637
      Ingo Molnar authored
      
      
      Merge Linux v3.12-rc4 to fix a conflict and also to refresh the tree
      before applying more scheduler patches.
      
      Conflicts:
      	arch/avr32/include/asm/Kbuild
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      37bf0637
  2. Oct 07, 2013
    • Linux 3.12-rc4 · d0e639c9
      Linus Torvalds authored
      d0e639c9
    • net: Update the sysctl permissions handler to test effective uid/gid · 2433c8f0
      Eric W. Biederman authored
      
      
      Modify the code to use current_euid() and in_egroup_p(), as is done
      in fs/proc/proc_sysctl.c:test_perm().
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Reported-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2433c8f0
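The kind of check 2433c8f0 refers to selects one rwx triplet of the mode according to the caller's effective identity. The sketch below is a user-space illustration, not the kernel function: the two booleans stand in for the results of comparing current_euid() with the owner and of calling in_egroup_p(), and MAY_READ and MAY_WRITE are defined locally with the usual Unix bit values.

```c
#include <assert.h>
#include <stdbool.h>

#define MAY_READ  4u /* local definitions for this sketch */
#define MAY_WRITE 2u

/*
 * Pick the permission triplet that applies to the caller (owner, group
 * or other, based on effective ids) and check the requested access.
 */
static bool perm_allowed(unsigned int mode, bool euid_is_owner,
			 bool in_egroup, unsigned int op)
{
	assert((op & ~7u) == 0);

	if (euid_is_owner)
		mode >>= 6;
	else if (in_egroup)
		mode >>= 3;
	return (mode & op & 7u) == op;
}
```

Using the effective rather than the real uid and gid matters for setuid programs, where the two differ.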
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · 13caa8ed
      Linus Torvalds authored
      Pull SCSI target fixes from Nicholas Bellinger:
       "Here are the outstanding target fixes queued up for v3.12-rc4 code.
      
        The highlights include:
      
         - Make vhost/scsi tag percpu_ida_alloc() use GFP_ATOMIC
         - Allow sess_cmd_map allocation failure fallback to use vzalloc
         - Fix COMPARE_AND_WRITE se_cmd->data_length bug with FILEIO backends
         - Fixes for COMPARE_AND_WRITE callback recursive failure OOPs + non
           zero scsi_status bug
         - Make iscsi-target do acknowledgement tag release from RX context
         - Setup iscsi-target with extra (cmdsn_depth / 2) percpu_ida tags
      
        Also included is an iscsi-target patch CC'ed for v3.10+ that avoids
        legacy wait_for_task=true release during fast-path StatSN
        acknowledgement, and two other SRP target related patches that address
        long-standing issues that are CC'ed for v3.3+.
      
        Extra thanks to Thomas Glanzmann for his testing feedback with
        COMPARE_AND_WRITE + EXTENDED_COPY VAAI logic"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
        iscsi-target; Allow an extra tag_num / 2 number of percpu_ida tags
        iscsi-target: Perform release of acknowledged tags from RX context
        iscsi-target: Only perform wait_for_tasks when performing shutdown
        target: Fail on non zero scsi_status in compare_and_write_callback
        target: Fix recursive COMPARE_AND_WRITE callback failure
        target: Reset data_length for COMPARE_AND_WRITE to NoLB * block_size
        ib_srpt: always set response for task management
        target: Fall back to vzalloc upon ->sess_cmd_map kzalloc failure
        vhost/scsi: Use GFP_ATOMIC with percpu_ida_alloc for obtaining tag
        ib_srpt: Destroy cm_id before destroying QP.
        target: Fix xop->dbl assignment in target_xcopy_parse_segdesc_02
      13caa8ed
    • Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma · 831ae3c1
      Linus Torvalds authored
      Pull slave-dmaengine fixes from Vinod Koul:
       "Here are the slave dmaengine fixes.  We have the fix for deadlock issue
        on imx-dma by Michael and Josh's edma config fix along with author
        change"
      
      * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: imx-dma: fix callback path in tasklet
        dmaengine: imx-dma: fix lockdep issue between irqhandler and tasklet
        dmaengine: imx-dma: fix slow path issue in prep_dma_cyclic
        dma/Kconfig: Make TI_EDMA select TI_PRIV_EDMA
        edma: Update author email address
      831ae3c1
  3. Oct 06, 2013
    • sched/rt: Remove redundant nr_cpus_allowed test · 6bfa687c
      Shawn Bohrer authored
      In 76854c7e ("sched: Use rt.nr_cpus_allowed to recover
      select_task_rq() cycles") an optimization was added to
      select_task_rq_rt() that immediately returns when
      p->nr_cpus_allowed == 1 at the beginning of the function.

      This makes the later p->nr_cpus_allowed > 1 check redundant,
      which can now be removed.
      
      Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mike Galbraith <mgalbraith@suse.de>
      Cc: tomk@rgmadvisors.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1380914693-24634-1-git-send-email-shawn.bohrer@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6bfa687c
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · e62063d6
      Linus Torvalds authored
      Pull btrfs fixes from Chris Mason:
       "This is a small collection of fixes, including a regression fix from
        Liu Bo that solves rare crashes with compression on.
      
        I've merged my for-linus up to 3.12-rc3 because the top commit is only
        meant for 3.12.  The rest of the fixes are also available in my master
        branch on top of my last 3.11 based pull"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        btrfs: Fix crash due to not allocating integrity data for a bioset
        Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing
        Btrfs: eliminate races in worker stopping code
        Btrfs: fix crash of compressed writes
        Btrfs: fix transid verify errors when recovering log tree
      e62063d6
    • Linus Torvalds's avatar
      Merge tag 'gpio-v3.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 85f6d2db
      Linus Torvalds authored
      Pull GPIO fixes from Linus Walleij:
       "Two patches for the OMAP driver, dealing with setting up IRQs properly
        on the device tree boot path"
      
      * tag 'gpio-v3.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpio/omap: auto-setup a GPIO when used as an IRQ
        gpio/omap: maintain GPIO and IRQ usage separately
      85f6d2db
    • Linus Torvalds's avatar
      Merge tag 'usb-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 4ed54764
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are nine fixes for various USB driver problems.  The majority are
        gadget/musb fixes, but there are some new device ids in here as well"
      
      * tag 'usb-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: chipidea: add Intel Clovertrail pci id
        usb: gadget: s3c-hsotg: fix can_write limit for non-periodic endpoints
        usb: gadget: f_fs: fix error handling
        usb: musb: dsps: do not bind to "musb-hdrc"
        USB: serial: option: Ignore card reader interface on Huawei E1750
        usb: musb: gadget: fix otg active status flag
        usb: phy: gpio-vbus: fix deferred probe from __init
        usb: gadget: pxa25x_udc: fix deferred probe from __init
        usb: musb: fix otg default state
      4ed54764
    • Linus Torvalds's avatar
      Merge tag 'tty-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · e3757a1f
      Linus Torvalds authored
      Pull tty fixes from Greg KH:
       "Here are two tty driver fixes for 3.12-rc4.
      
        One fixes the reported regression in the n_tty code that a number of
        people found recently, and the other one fixes an issue with xen
        consoles that broke in 3.10"
      
      * tag 'tty-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        xen/hvc: allow xenboot console to be used again
        tty: Fix pty master read() after slave closes
      e3757a1f
    • Linus Torvalds's avatar
      Merge tag 'staging-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 20fa7867
      Linus Torvalds authored
      Pull staging fixes from Greg KH:
       "Here are 4 tiny staging and iio driver fixes for 3.12-rc4.  Nothing
        major, just some small fixes for reported issues"
      
      * tag 'staging-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: comedi: ni_65xx: (bug fix) confine insn_bits to one subdevice
        iio:magnetometer: Bugfix magnetometer default output registers
        iio: Remove debugfs entries in iio_device_unregister()
        iio: amplifiers: ad8366: Remove regulator_put
      20fa7867
  4. Oct 05, 2013
    • Darrick J. Wong's avatar
      btrfs: Fix crash due to not allocating integrity data for a bioset · b208c2f7
      Darrick J. Wong authored
      
      
      When btrfs creates a bioset, we must also allocate the integrity data pool.
      Otherwise btrfs will crash when it tries to submit a bio to a checksumming
      disk:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
       IP: [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
       PGD 2305e4067 PUD 23063d067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP
       Modules linked in: btrfs scsi_debug xfs ext4 jbd2 ext3 jbd mbcache
      sch_fq_codel eeprom lpc_ich mfd_core nfsd exportfs auth_rpcgss af_packet
      raid6_pq xor zlib_deflate libcrc32c [last unloaded: scsi_debug]
       CPU: 1 PID: 4486 Comm: mount Not tainted 3.12.0-rc1-mcsum #2
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       task: ffff8802451c9720 ti: ffff880230698000 task.ti: ffff880230698000
       RIP: 0010:[<ffffffff8111e28a>]  [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
       RSP: 0018:ffff880230699688  EFLAGS: 00010286
       RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000005f8445
       RDX: 0000000000000001 RSI: 0000000000000010 RDI: 0000000000000000
       RBP: ffff8802306996f8 R08: 0000000000011200 R09: 0000000000000008
       R10: 0000000000000020 R11: ffff88009d6e8000 R12: 0000000000011210
       R13: 0000000000000030 R14: ffff8802306996b8 R15: ffff8802451c9720
       FS:  00007f25b8a16800(0000) GS:ffff88024fc80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000000000000018 CR3: 0000000230576000 CR4: 00000000000007e0
       Stack:
        ffff8802451c9720 0000000000000002 ffffffff81a97100 0000000000281250
        ffffffff81a96480 ffff88024fc99150 ffff880228d18200 0000000000000000
        0000000000000000 0000000000000040 ffff880230e8c2e8 ffff8802459dc900
       Call Trace:
        [<ffffffff811b2208>] bio_integrity_alloc+0x48/0x1b0
        [<ffffffff811b26fc>] bio_integrity_prep+0xac/0x360
        [<ffffffff8111e298>] ? mempool_alloc+0x58/0x150
        [<ffffffffa03e8041>] ? alloc_extent_state+0x31/0x110 [btrfs]
        [<ffffffff81241579>] blk_queue_bio+0x1c9/0x460
        [<ffffffff8123e58a>] generic_make_request+0xca/0x100
        [<ffffffff8123e639>] submit_bio+0x79/0x160
        [<ffffffffa03f865e>] btrfs_map_bio+0x48e/0x5b0 [btrfs]
        [<ffffffffa03c821a>] btree_submit_bio_hook+0xda/0x110 [btrfs]
        [<ffffffffa03e7eba>] submit_one_bio+0x6a/0xa0 [btrfs]
        [<ffffffffa03ef450>] read_extent_buffer_pages+0x250/0x310 [btrfs]
        [<ffffffff8125eef6>] ? __radix_tree_preload+0x66/0xf0
        [<ffffffff8125f1c5>] ? radix_tree_insert+0x95/0x260
        [<ffffffffa03c66f6>] btree_read_extent_buffer_pages.constprop.128+0xb6/0x120 [btrfs]
        [<ffffffffa03c8c1a>] read_tree_block+0x3a/0x60 [btrfs]
        [<ffffffffa03caefd>] open_ctree+0x139d/0x2030 [btrfs]
        [<ffffffffa03a282a>] btrfs_mount+0x53a/0x7d0 [btrfs]
        [<ffffffff8113ab0b>] ? pcpu_alloc+0x8eb/0x9f0
        [<ffffffff81167305>] ? __kmalloc_track_caller+0x35/0x1e0
        [<ffffffff81176ba0>] mount_fs+0x20/0xd0
        [<ffffffff81191096>] vfs_kern_mount+0x76/0x120
        [<ffffffff81193320>] do_mount+0x200/0xa40
        [<ffffffff81135cdb>] ? strndup_user+0x5b/0x80
        [<ffffffff81193bf0>] SyS_mount+0x90/0xe0
        [<ffffffff8156d31d>] system_call_fastpath+0x1a/0x1f
       Code: 4c 8d 75 a8 4c 89 6d e8 45 89 e0 4c 8d 6f 30 48 89 5d d8 41 83 e0 af 48
      89 fb 49 83 c6 18 4c 89 7d f8 65 4c 8b 3c 25 c0 b8 00 00 <48> 8b 73 18 44 89 c7
      44 89 45 98 ff 53 20 48 85 c0 48 89 c2 74
       RIP  [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
        RSP <ffff880230699688>
       CR2: 0000000000000018
       ---[ end trace 7a96042017ed21e2 ]---
      
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      b208c2f7
    • Chris Mason's avatar
      Merge branch 'for-linus' into for-linus-3.12 · 1329dfc8
      Chris Mason authored
      1329dfc8
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6 · a5c984cc
      Linus Torvalds authored
      Pull CIFS fixes from Steve French:
       "Small set of cifs fixes.  Most important is Jeff's fix that works
        around disconnection problems which can be caused by simultaneous use
        of user space tools (starting a long running smbclient backup then
        doing a cifs kernel mount) or multiple cifs mounts through a NAT, and
        Jim's fix to deal with reexport of cifs share.
      
        I expect to send two more cifs fixes next week (being tested now) -
        fixes to address an SMB2 unmount hang when server dies and a fix for
        cifs symlink handling of Windows "NFS" symlinks"
      
      * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
        [CIFS] update cifs.ko version
        [CIFS] Remove ext2 flags that have been moved to fs.h
        [CIFS] Provide sane values for nlink
        cifs: stop trying to use virtual circuits
        CIFS: FS-Cache: Uncache unread pages in cifs_readpages() before freeing them
      a5c984cc
    • Linus Torvalds's avatar
      Merge tag 'pci-v3.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 95167aad
      Linus Torvalds authored
      Pull PCI fix from Bjorn Helgaas:
       "We merged what was intended to be an MMCONFIG cleanup, but in fact,
        for systems without _CBA (which is almost everything), it broke
        extended config space for domain 0 and it broke all config space for
        other domains.
      
        This reverts the change"
      
      * tag 'pci-v3.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        Revert "x86/PCI: MMCONFIG: Check earlier for MMCONFIG region at address zero"
      95167aad
    • Bjorn Helgaas's avatar
      Revert "x86/PCI: MMCONFIG: Check earlier for MMCONFIG region at address zero" · 67d470e0
      Bjorn Helgaas authored
      This reverts commit 07f9b61c.

      07f9b61c was intended to be a cleanup that didn't change anything, but in
      fact, for systems without _CBA (which is almost everything), it broke
      extended config space for domain 0 and all config space for other domains.
      
      Reference: http://lkml.kernel.org/r/20131004011806.GE20450@dangermouse.emea.sgi.com
      Reported-by: Hedi Berriche <hedi@sgi.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      67d470e0
    • Linus Torvalds's avatar
      Merge tag 'pm+acpi-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 7dee8dff
      Linus Torvalds authored
      Pull ACPI and power management fixes from Rafael Wysocki:
      
       - The resume part of user space driven hibernation (s2disk) is now
         broken after the change that moved the creation of memory bitmaps to
         after the freezing of tasks, because I forgot that the resume utility
         loaded the image before freezing tasks and needed the bitmaps for
         that.  The fix adds special handling for that case.
      
       - One of the recent commits changed the export of acpi_bus_get_device() to
         EXPORT_SYMBOL_GPL(), which was technically correct but broke existing
         binary modules using that function including one in particularly
         widespread use.  Change it back to EXPORT_SYMBOL().
      
       - The intel_pstate driver sometimes fails to disable turbo if its
         no_turbo sysfs attribute is set.  Fix from Srinivas Pandruvada.
      
       - One of the recent cpufreq fixes forgot to update a check in cpufreq-cpu0
         which still (incorrectly) treats non-NULL as non-error.  Fix from
         Philipp Zabel.
      
       - The SPEAr cpufreq driver uses a wrong variable type in one place
         preventing it from catching errors returned by one of the functions
         called by it.  Fix from Sachin Kamat.
      
      * tag 'pm+acpi-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: Use EXPORT_SYMBOL() for acpi_bus_get_device()
        intel_pstate: fix no_turbo
        cpufreq: cpufreq-cpu0: NULL is a valid regulator, part 2
        cpufreq: SPEAr: Fix incorrect variable type
        PM / hibernate: Fix user space driven resume regression
      7dee8dff
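
The cpufreq-cpu0 item in the pull request above concerns a check that treats any non-NULL pointer as a success. A minimal user-space sketch of why that is wrong when errors are encoded in the pointer value follows. The helpers `err_ptr()` and `is_err()` are simplified models written for this example (they imitate the kernel's ERR_PTR()/IS_ERR() idiom but are not the kernel's code), and `usable_buggy()` and `usable_fixed()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095  /* error codes occupy the top 4095 pointer values */

/* Model: encode a negative errno value as a pointer. */
void *err_ptr(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* Model: true only if the pointer lies in the encoded-error range. */
int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Buggy check: "non-NULL means non-error". An encoded error is
 * non-NULL, so it is wrongly accepted, and NULL is wrongly rejected. */
int usable_buggy(const void *reg)
{
    return reg != NULL;
}

/* Fixed check: anything outside the error range, including NULL,
 * is treated as a valid handle. */
int usable_fixed(const void *reg)
{
    return !is_err(reg);
}
```

With these definitions, `usable_buggy(err_ptr(-ENODEV))` accepts an error value while `usable_fixed()` rejects it, and only the fixed check accepts NULL, matching the commit title "NULL is a valid regulator".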
    • Linus Torvalds's avatar
      Merge tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfs · 3dbecf0a
      Linus Torvalds authored
      Pull xfs bugfixes from Ben Myers:
       "There are lockdep annotations for project quotas, a fix for dirent
        dtype support on v4 filesystems, a fix for a memory leak in recovery,
        and a fix for the build error that resulted from it.  D'oh"
      
      * tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfs:
        xfs: Use kmem_free() instead of free()
        xfs: fix memory leak in xlog_recover_add_to_trans
        xfs: dirent dtype presence is dependent on directory magic numbers
        xfs: lockdep needs to know about 3 dquot-deep nesting
      3dbecf0a
    • Linus Torvalds's avatar
      selinux: remove 'flags' parameter from avc_audit() · ab354062
      Linus Torvalds authored
      
      
      Now avc_audit() has no more users with that parameter. Remove it.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ab354062
    • Linus Torvalds's avatar
      selinux: avc_has_perm_flags has no more users · cb4fbe57
      Linus Torvalds authored
      
      
      .. so get rid of it.  The only indirect users were all the
      avc_has_perm() callers which just expanded to have a zero flags
      argument.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cb4fbe57