  1. Oct 27, 2010
    • writeback: add nr_dirtied and nr_written to /proc/vmstat · ea941f0e
      Michael Rubin authored
      
      
      To help developers and applications gain visibility into writeback
      behaviour, add two entries to vm_stat_items and /proc/vmstat.  This will
      allow us to track the "written" and "dirtied" counts.
      
         # grep nr_dirtied /proc/vmstat
         nr_dirtied 3747
         # grep nr_written /proc/vmstat
         nr_written 3618
      
      Signed-off-by: Michael Rubin <mrubin@google.com>
      Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ea941f0e
    • mm: add account_page_writeback() · f629d1c9
      Michael Rubin authored
      
      
      To help developers and applications gain visibility into writeback
      behaviour this patch adds two counters to /proc/vmstat.
      
        # grep nr_dirtied /proc/vmstat
        nr_dirtied 3747
        # grep nr_written /proc/vmstat
        nr_written 3618
      
      These entries allow user apps to understand writeback behaviour over time
      and learn how it is impacting their performance.  Currently there is no
      way to inspect dirty and writeback speed over time.  This is not possible
      with nr_dirty/nr_writeback.

      These entries are necessary to give visibility into writeback behaviour.
      We have /proc/diskstats which lets us understand the io in the block
      layer.  We have blktrace for more in depth understanding.  We have
      e2fsprogs and debugfs to give insight into the file system's behaviour,
      but we don't offer our users the ability to understand what writeback is
      doing.  There is no way to know how active it is over the whole system, if
      it's falling behind or to quantify its efforts.  With these values
      exported users can easily see how much data applications are sending
      through writeback and also at what rates writeback is processing this
      data.  Comparing the rates of change between the two allows developers to
      see when writeback is not able to keep up with incoming traffic and the
      rate of dirty memory being sent to the IO back end.  This allows folks to
      understand their io workloads and track kernel issues.  Non kernel
      engineers at Google often use these counters to solve puzzling performance
      problems.
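
A minimal user-space sketch of the rate comparison described above, assuming two snapshots of text in the "name value" format shown for /proc/vmstat. The function names vmstat_lookup() and writeback_backlog_delta() are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter in text formatted like /proc/vmstat
 * ("name value" per line). Returns 0 on success, -1 if the
 * counter is absent or malformed.
 */
int vmstat_lookup(const char *text, const char *name,
                  unsigned long long *value)
{
    size_t len;

    assert(text && name && value);
    len = strlen(name);

    while (*text) {
        /* Match "name " exactly at the start of a line. */
        if (strncmp(text, name, len) == 0 && text[len] == ' ')
            return sscanf(text + len, "%llu", value) == 1 ? 0 : -1;

        /* Skip to the start of the next line. */
        text = strchr(text, '\n');
        if (!text)
            break;
        text++;
    }
    return -1;
}

/*
 * Pages dirtied minus pages written between two snapshots.
 * A result that keeps growing from interval to interval suggests
 * writeback is not keeping up. Returns 0 if a counter is missing
 * (a simplification made for this sketch).
 */
long long writeback_backlog_delta(const char *before, const char *after)
{
    unsigned long long d0, d1, w0, w1;

    if (vmstat_lookup(before, "nr_dirtied", &d0) ||
        vmstat_lookup(after, "nr_dirtied", &d1) ||
        vmstat_lookup(before, "nr_written", &w0) ||
        vmstat_lookup(after, "nr_written", &w1))
        return 0;

    return (long long)(d1 - d0) - (long long)(w1 - w0);
}
```

A caller would read /proc/vmstat into a buffer twice, some seconds apart, and pass both buffers to writeback_backlog_delta().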
      
      Patch #4 adds a pernode vmstat file with nr_dirtied and nr_written
      
      Patch #5 adds writeback thresholds to /proc/vmstat
      
      Currently these values are in debugfs. But they should be promoted to
      /proc since they are useful for developers who are writing databases
      and file servers and are not debugging the kernel.
      
      The output is as below:
      
       # grep threshold /proc/vmstat
       nr_pages_dirty_threshold 409111
       nr_pages_dirty_background_threshold 818223
      
      This patch:
      
      This allows code outside of the mm core to safely manipulate page
      writeback state and not worry about the other accounting.  Not using these
      routines means that some code will lose track of the accounting and we get
      bugs.
      
      Modify nilfs2 to use the new interface.
      
      Signed-off-by: Michael Rubin <mrubin@google.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Jiro SEKIBA <jir@unicus.jp>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f629d1c9
    • mm/mempolicy.c: check return code of check_range · 0def08e3
      Vasiliy Kulikov authored
      
      
      Function check_range may return ERR_PTR(...). Check for it.
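
The error-pointer convention behind this fix can be sketched in user space as follows. ERR_PTR(), PTR_ERR() and IS_ERR() are reimplemented here for illustration only; lookup_region() and use_region() are hypothetical stand-ins and do not reflect the real check_range() signature.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space imitations of the kernel helpers: a small negative errno
 * is encoded in the top MAX_ERRNO values of the pointer range. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct region { int id; };

/* Hypothetical lookup that returns either a valid pointer or an
 * encoded error, never NULL. */
struct region *lookup_region(struct region *table, size_t n, int id)
{
    assert(table);
    for (size_t i = 0; i < n; i++)
        if (table[i].id == id)
            return &table[i];
    return ERR_PTR(-EFAULT);
}

/* The caller must test with IS_ERR() before dereferencing; a plain
 * NULL check would miss the encoded error. */
int use_region(struct region *table, size_t n, int id)
{
    struct region *r = lookup_region(table, n, id);

    if (IS_ERR(r))
        return (int)PTR_ERR(r);
    return r->id;
}
```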
      
      Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Christoph Lameter <cl@linux.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0def08e3
    • vmscan: prevent background aging of anon page in no swap system · 74e3f3c3
      Minchan Kim authored
      Ying Han reported that background aging of anon pages on a system
      without swap causes unnecessary TLB flushes.

      When I sent a patch (69c85481), I wanted this behaviour, but Rik
      objected, and aging of anon pages was allowed in order to give them a
      chance to be promoted from the inactive to the active LRU.

      That has two problems.

      1) non-swap system

      It never makes sense to age anon pages.

      2) swap configured but not yet swapped on

      It doesn't make sense to age anon pages until swapon time, although that
      is arguable.  If we have aged anon pages before swapon, the VM has
      already moved anon pages from active to inactive.  Then, when the admin
      runs swapon, the VM won't reclaim hot pages, so hot pages are protected
      from swapout.

      But let's think about it.  When does swapon happen?  It depends on the
      admin; we can't predict it.  Nonetheless, we have been aging anon pages
      to protect hot pages from swapout.  That means we pay run-time overhead
      while below the high watermark but save hot page swap-[in/out] overhead
      when the VM decides to swap out.  Is that true?  Let's think in more
      detail.  We don't promote anon pages on a non-swap system.  So even
      though the VM ages anon pages, the pages stay on the inactive LRU for a
      long time, which means many of the pages there would have their access
      bit set again.  So hot/cold separation by access bit would be pointless.

      This patch prevents unnecessary demotion of anon pages on systems where
      swap is not yet on or not configured.  On systems without swap
      configured, inactive_anon_is_low can even be compiled out.

      A side effect is that hot anon pages could be swapped out when the admin
      does swapon.  But I think it would reach a steady state sooner or later,
      so it's not a big problem.

      We could lose something but gain more (fewer TLB flushes and fewer
      unnecessary function calls to demote anon pages).
      
      Signed-off-by: Ying Han <yinghan@google.com>
      Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      74e3f3c3
    • memory hotplug: unify is_removable and offline detection code · 49ac8255
      KAMEZAWA Hiroyuki authored
      
      
      Currently, the sysfs interface of memory hotplug shows whether a section
      is removable or not.  But it checks only the migratetype of pages and
      doesn't check the details of each cluster of pages.

      Memory hotplug's set_migratetype_isolate() has the same kind of check,
      too.

      This patch adds the function __count_unmovable_pages() and makes the
      above 2 checks use the same logic.  As a result, is_removable and the
      hotremove code use the same logic.  No changes in the hotremove logic
      itself.

      TODO: need to find a way to check RECLAIMABLE. But, thinking about it a
            bit, calling shrink_slab() against a range before starting memory
            hotremove sounds better. If so, this patch's logic doesn't need to
            be changed.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reported-by: Michal Hocko <mhocko@suse.cz>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      49ac8255
    • memory hotplug: fix notifier's return value check · 4b20477f
      KAMEZAWA Hiroyuki authored
      
      
      Even if the notifier cannot find any pages, it doesn't mean no pages are
      available.  And, if there are no notifiers registered, this condition
      will always be true and memory hotplug will report -EBUSY.

      This is a bug but not critical.

      In most cases, a pageblock which will be offlined is MIGRATE_MOVABLE.
      This "notifier" is called only when the pageblock is _not_
      MIGRATE_MOVABLE.  But if it is not MIGRATE_MOVABLE, memory hotplug
      commonly fails anyway.  So no one has noticed this bug.
      
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b20477f
    • mm: compaction: fix COMPACTPAGEFAILED counting · cf608ac1
      Minchan Kim authored
      Presently update_nr_listpages() has no effect.  That's because the lists
      passed to it are always empty just after calling migrate_pages().  Since
      aaa994b3, migrate_pages() cleans up the list of pages which have failed
      to migrate before returning:
      
       [PATCH] page migration: handle freeing of pages in migrate_pages()
      
       Do not leave pages on the lists passed to migrate_pages().  Seems that we will
       not need any postprocessing of pages.  This will simplify the handling of
       pages by the callers of migrate_pages().
      
      At that time, we thought we didn't need any postprocessing of pages.  But
      the situation has changed.  Compaction needs to know the number of pages
      that failed to migrate for the COMPACTPAGEFAILED stat.

      This patch makes a new rule that the caller of migrate_pages() must call
      putback_lru_pages().  The caller now cleans up the lists itself, so it
      has a chance to postprocess the pages.  [suggested by Christoph Lameter]
      
      Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Reviewed-by: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cf608ac1
    • mm: only build per-node scan_unevictable functions when NUMA is enabled · e4455abb
      Thadeu Lima de Souza Cascardo authored
      
      
      Non-NUMA systems never create these files anyway, since they are only
      created by the driver subsystem when NUMA is configured.
      
      [akpm@linux-foundation.org: cleanup]
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e4455abb
    • include/linux/pageblock-flags.h: fix set_pageblock_flags() macro definiton · f19e77a3
      zeal authored
      
      
      The presently-unused macro was missing one parameter.
      
      Signed-off-by: zeal <zealcook@gmail.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f19e77a3
    • writeback: remove nonblocking/encountered_congestion references · 1b430bee
      Wu Fengguang authored
      This removes more dead code that was somehow missed by commit 0d99519e
      (writeback: remove unused nonblocking and congestion checks).  There is
      no behavior change except for the removal of two entries from one of the
      ext4 tracing interfaces.

      The nonblocking checks in ->writepages are no longer used because the
      flusher now prefers to block on get_request_wait() rather than skip
      inodes on IO congestion.  The latter would lead to more seeky IO.

      The nonblocking checks in ->writepage are no longer used because they
      are redundant with the WB_SYNC_NONE check.

      We no longer set ->nonblocking in VM page out and page migration, because
      a) it's effectively redundant with WB_SYNC_NONE in current code
      b) its old semantic of "Don't get stuck on request queues" is mis-behavior:
         that would skip some dirty inodes on congestion and page out others, which
         is unfair in terms of LRU age.
      
      Inspired by Christoph Hellwig. Thanks!
      
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Sage Weil <sage@newdream.net>
      Cc: Steve French <sfrench@samba.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1b430bee
    • oom: fix locking for oom_adj and oom_score_adj · d19d5476
      David Rientjes authored
      
      
      The locking order in oom_adjust_write() and oom_score_adj_write() for
      task->alloc_lock and task->sighand->siglock is reversed, and lockdep
      notices that irqs could encounter an ABBA scenario.
      
      This fixes the locking order so that we always take task_lock(task) prior
      to lock_task_sighand(task).
      
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d19d5476
    • oom: rewrite error handling for oom_adj and oom_score_adj tunables · 723548bf
      David Rientjes authored
      
      
      It's better to use proper error handling in oom_adjust_write() and
      oom_score_adj_write() instead of duplicating the locking order on various
      exit paths.
      
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      723548bf
    • oom: kill all threads sharing oom killed task's mm · 1e99bad0
      David Rientjes authored
      It's necessary to kill all threads that share an oom killed task's mm if
      the goal is to lead to future memory freeing.
      
      This patch reintroduces the code removed in 8c5cd6f3 (oom: oom_kill
      doesn't kill vfork parent (or child)) since it is obsoleted.
      
      It's now guaranteed that any task passed to oom_kill_task() does not share
      an mm with any thread that is unkillable.  Thus, we're safe to issue a
      SIGKILL to any thread sharing the same mm.
      
      This is especially necessary to solve an mm->mmap_sem livelock issue
      where an oom killed thread must acquire the lock in the exit path while
      another thread is holding it in the page allocator while trying to
      allocate memory itself (and will preempt the oom killer since a task was
      already killed).  Since tasks with pending fatal signals are now granted
      access to memory reserves, the thread holding the lock may quickly
      allocate and release the lock so that the oom killed task may exit.

      This mainly is for threads that are cloned with CLONE_VM but not
      CLONE_THREAD, so they are in a different thread group.  Non-NPTL threads
      exist in the wild and this change is necessary to prevent the livelock in
      such cases.  We care more about preventing the livelock than incurring the
      additional tasklist scan in the oom killer when a task has been killed.
      Systems that are sufficiently large to not want the tasklist scan in the
      oom killer in the first place already have the option of enabling
      /proc/sys/vm/oom_kill_allocating_task, which was designed specifically for
      that purpose.
      
      This code had existed in the oom killer for over eight years dating back
      to the 2.4 kernel.
      
      [akpm@linux-foundation.org: add nice comment]
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1e99bad0
    • oom: avoid killing a task if a thread sharing its mm cannot be killed · e18641e1
      David Rientjes authored
      
      
      The oom killer's goal is to kill a memory-hogging task so that it may
      exit, free its memory, and allow the current context to allocate the
      memory that triggered it in the first place.  Thus, killing a task is
      pointless if other threads sharing its mm cannot be killed because of its
      /proc/pid/oom_adj or /proc/pid/oom_score_adj value.
      
      This patch checks whether any other thread sharing p->mm has an
      oom_score_adj of OOM_SCORE_ADJ_MIN.  If so, the thread cannot be killed
      and oom_badness(p) returns 0, meaning it's unkillable.
      
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e18641e1
    • oom: add per-mm oom disable count · 3d5992d2
      Ying Han authored
      
      
      It's pointless to kill a task if another thread sharing its mm cannot be
      killed to allow future memory freeing.  A subsequent patch will prevent
      kills in such cases, but first it's necessary to have a way to flag a task
      that shares memory with an OOM_DISABLE task that doesn't incur an
      additional tasklist scan, which would make select_bad_process() an O(n^2)
      function.
      
      This patch adds an atomic counter to struct mm_struct that follows how
      many threads attached to it have an oom_score_adj of OOM_SCORE_ADJ_MIN.
      They cannot be killed by the kernel, so their memory cannot be freed in
      oom conditions.
      
      This only requires task_lock() on the task that we're operating on, it
      does not require mm->mmap_sem since task_lock() pins the mm and the
      operation is atomic.
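
The counting scheme can be sketched in user space with C11 atomics. This is an illustration under assumptions, not the kernel code: struct mm_sketch, struct task_sketch and both functions are hypothetical, and the task_lock() protection the message mentions is omitted.

```c
#include <assert.h>
#include <stdatomic.h>

#define OOM_SCORE_ADJ_MIN (-1000)

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct mm_sketch {
    atomic_int oom_disable_count; /* tasks on this mm that are unkillable */
};

struct task_sketch {
    struct mm_sketch *mm;
    int oom_score_adj;
};

/* Update a task's oom_score_adj and keep the per-mm count in step.
 * Only transitions into or out of OOM_SCORE_ADJ_MIN touch the counter. */
void set_oom_score_adj(struct task_sketch *t, int new_adj)
{
    assert(t && t->mm);

    if (new_adj == OOM_SCORE_ADJ_MIN && t->oom_score_adj != OOM_SCORE_ADJ_MIN)
        atomic_fetch_add(&t->mm->oom_disable_count, 1);
    else if (new_adj != OOM_SCORE_ADJ_MIN && t->oom_score_adj == OOM_SCORE_ADJ_MIN)
        atomic_fetch_sub(&t->mm->oom_disable_count, 1);

    t->oom_score_adj = new_adj;
}

/* O(1) check that replaces a scan over every task sharing the mm. */
int mm_is_unkillable(struct mm_sketch *mm)
{
    assert(mm);
    return atomic_load(&mm->oom_disable_count) > 0;
}
```

Reading one counter per candidate keeps the selection pass linear in the number of tasks, which is the point the message makes about avoiding an O(n^2) select_bad_process().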
      
      [rientjes@google.com: changelog and sys_unshare() code]
      [rientjes@google.com: protect oom_disable_count with task_lock in fork]
      [rientjes@google.com: use old_mm for oom_disable_count in exec]
      Signed-off-by: Ying Han <yinghan@google.com>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3d5992d2
    • Documentation/filesystems/proc.txt: improve smaps field documentation · 0f4d208f
      Matt Mackall authored
      
      
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Cc: Nikanth Karthikesan <knikanth@suse.de>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0f4d208f
    • vmcore: it is not experimental any more · a4f7326d
      WANG Cong authored
      
      
      We have used vmcore in our production kernel for a long time, and it is
      pretty stable now.  So I don't think we need to mark it as experimental
      any more.
      
      Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a4f7326d
    • um: fix IRQ flag handling naming · dbec9213
      Richard Weinberger authored
      Commit df9ee292 ("Fix IRQ flag handling naming") changed the IRQ flag
      handling naming scheme and broke UML:
      
      In file included from arch/um/include/asm/fixmap.h:5,
                       from arch/um/include/shared/um_uaccess.h:10,
                       from arch/um/include/asm/uaccess.h:41,
                       from arch/um/include/asm/thread_info.h:13,
                       from include/linux/thread_info.h:56,
                       from include/linux/preempt.h:9,
                       from include/linux/spinlock.h:50,
                       from include/linux/seqlock.h:29,
                       from include/linux/time.h:8,
                       from include/linux/stat.h:60,
                       from include/linux/module.h:10,
                       from init/main.c:13:
      arch/um/include/asm/system.h:11:1: warning: "local_save_flags" redefined
      
      This patch brings the new scheme to UML and makes it work again.
      
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Acked-by: David Howells <dhowells@redhat.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dbec9213
    • percpu: fix list_head init bug in __percpu_counter_init() · 8474b591
      Masanori ITOH authored
      
      
      WARNING: at lib/list_debug.c:26 __list_add+0x3f/0x81()
      Hardware name: Express5800/B120a [N8400-085]
      list_add corruption. next->prev should be prev (ffffffff81a7ea00), but was dead000000200200. (next=ffff88080b872d58).
      Modules linked in: aoe ipt_MASQUERADE iptable_nat nf_nat autofs4 sunrpc bridge 8021q garp stp llc ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_round_robin dm_multipath kvm_intel kvm uinput lpfc scsi_transport_fc igb ioatdma scsi_tgt i2c_i801 i2c_core dca iTCO_wdt iTCO_vendor_support pcspkr shpchp megaraid_sas [last unloaded: aoe]
      Pid: 54, comm: events/3 Tainted: G        W  2.6.34-vanilla1 #1
      Call Trace:
      [<ffffffff8104bd77>] warn_slowpath_common+0x7c/0x94
      [<ffffffff8104bde6>] warn_slowpath_fmt+0x41/0x43
      [<ffffffff8120fd2e>] __list_add+0x3f/0x81
      [<ffffffff81212a12>] __percpu_counter_init+0x59/0x6b
      [<ffffffff810d8499>] bdi_init+0x118/0x17e
      [<ffffffff811f2c50>] blk_alloc_queue_node+0x79/0x143
      [<ffffffff811f2d2b>] blk_alloc_queue+0x11/0x13
      [<ffffffffa02a931d>] aoeblk_gdalloc+0x8e/0x1c9 [aoe]
      [<ffffffffa02aa655>] aoecmd_sleepwork+0x25/0xa8 [aoe]
      [<ffffffff8106186c>] worker_thread+0x1a9/0x237
      [<ffffffffa02aa630>] ? aoecmd_sleepwork+0x0/0xa8 [aoe]
      [<ffffffff81065827>] ? autoremove_wake_function+0x0/0x39
      [<ffffffff810616c3>] ? worker_thread+0x0/0x237
      [<ffffffff810653ad>] kthread+0x7f/0x87
      [<ffffffff8100aa24>] kernel_thread_helper+0x4/0x10
      [<ffffffff8106532e>] ? kthread+0x0/0x87
      [<ffffffff8100aa20>] ? kernel_thread_helper+0x0/0x10
      
      This happens because there is no initialization code for a list_head
      contained in the struct backing_dev_info under CONFIG_HOTPLUG_CPU, and
      the bug shows up when block device drivers calling blk_alloc_queue() are
      used.  In my case, I hit it while using aoe.
      
      Signed-off-by: Masanori Itoh <itoumsn@nttdata.co.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8474b591
    • kfifo: disable __kfifo_must_check_helper() · 52c51712
      Andrew Morton authored
      
      
      This helper is wrong: it coerces signed values into unsigned ones, so code
      such as
      
      	if (kfifo_alloc(...) < 0) {
      		error
      	}
      
      will fail to detect the error.
      
      So let's disable __kfifo_must_check_helper() for 2.6.36.
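
The failure mode can be reproduced in user space with a small sketch. The names fake_alloc() and must_check_helper() are hypothetical; the helper is modelled, as an assumption, as a function that takes and returns an unsigned value, which is the coercion the message describes.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical allocator: returns 0 on success or a negative errno. */
int fake_alloc(int fail)
{
    return fail ? -ENOMEM : 0;
}

/* Modelled after the description above: the signed result is coerced
 * to unsigned on the way through the helper. */
static inline unsigned int must_check_helper(unsigned int val)
{
    return val;
}

/* Broken: an unsigned value is never < 0, so this always returns 0.
 * Compilers may warn that the comparison is always false. */
int sees_error_via_helper(int fail)
{
    return must_check_helper(fake_alloc(fail)) < 0;
}

/* Correct: the signed return value is compared directly. */
int sees_error_direct(int fail)
{
    assert(fail == 0 || fail == 1);
    return fake_alloc(fail) < 0;
}
```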
      
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Stefani Seibold <stefani@seibold.net>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      52c51712
    • hostfs: fix UML crash: remove f_spare from hostfs · 1b627d57
      Richard Weinberger authored
      365b1818 ("add f_flags to struct statfs(64)") resized f_spare within
      struct statfs which caused a UML crash.  There is no need to copy f_spare.
      
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Reported-by: Toralf Förster <toralf.foerster@gmx.de>
      Tested-by: Toralf Förster <toralf.foerster@gmx.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1b627d57
    • ipmi: proper spinlock initialization · de5e2ddf
      Eric Dumazet authored
      
      
      Unloading the ipmi module can trigger the following error (if
      CONFIG_DEBUG_SPINLOCK=y):

      [ 9633.779590] BUG: spinlock bad magic on CPU#1, rmmod/7170
      [ 9633.779606]  lock: f41f5414, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
      [ 9633.779626] Pid: 7170, comm: rmmod Not tainted 2.6.36-rc7-11474-gb71eb1e-dirty #328
      [ 9633.779644] Call Trace:
      [ 9633.779657]  [<c13921cc>] ? printk+0x18/0x1c
      [ 9633.779672]  [<c11a1f33>] spin_bug+0xa3/0xf0
      [ 9633.779685]  [<c11a1ffd>] do_raw_spin_lock+0x7d/0x160
      [ 9633.779702]  [<c1131537>] ? release_sysfs_dirent+0x47/0xb0
      [ 9633.779718]  [<c1131b78>] ? sysfs_addrm_finish+0xa8/0xd0
      [ 9633.779734]  [<c1394bac>] _raw_spin_lock_irqsave+0xc/0x20
      [ 9633.779752]  [<f99d93da>] cleanup_one_si+0x6a/0x200 [ipmi_si]
      [ 9633.779768]  [<c11305b2>] ? sysfs_hash_and_remove+0x72/0x80
      [ 9633.779786]  [<f99dcf26>] ipmi_pnp_remove+0xd/0xf [ipmi_si]
      [ 9633.779802]  [<c11f622b>] pnp_device_remove+0x1b/0x40
      
      Fix this by initializing spinlocks in a smi_info_alloc() helper function,
      right after memory allocation and clearing.
      
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David Miller <davem@davemloft.net>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Acked-by: Corey Minyard <cminyard@mvista.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      de5e2ddf
    • drivers/misc/ad525x_dpot.c: fix typo in spi write16 and write24 transfer counts · 1f9fa521
      Michael Hennerich authored
      
      
      This is a bug fix.  Some SPI-connected devices using 16/24-bit accesses,
      which previously failed, now work.
      
      This typo slipped in after testing, during some restructuring.
      
      Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Chris Verges <chrisv@cyberswitching.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1f9fa521
    • um: remove PAGE_SIZE alignment in linker script causing kernel segfault. · 6915e04f
      Richard Weinberger authored
      The linker script cleanup that I did in commit 5d150a97 ("um: Clean up
      linker script using standard macros.") (2.6.32) accidentally introduced an
      ALIGN(PAGE_SIZE) when converting to use INIT_TEXT_SECTION; Richard
      Weinberger reported that this causes the kernel to segfault with
      CONFIG_STATIC_LINK=y.
      
      I'm not certain why this extra alignment is a problem, but it seems likely
      it is because previously
      
      __init_begin = _stext = _text = _sinittext
      
      and with the extra ALIGN(PAGE_SIZE), _sinittext becomes different from the
      rest.  So there is likely a bug here where something is assuming that
      _sinittext is the same as one of those other symbols.  But reverting the
      accidental change fixes the regression, so it seems worth committing that
      now.
      
      Signed-off-by: Tim Abbott <tabbott@ksplice.com>
      Reported-by: Richard Weinberger <richard@nod.at>
      Cc: Jeff Dike <jdike@addtoit.com>
      Tested-by: Antoine Martin <antoine@nagafix.co.uk>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6915e04f
    • Robin Holt's avatar
      sgi-xp: incoming XPC channel messages can come in after the channel's... · 09358972
      Robin Holt authored
      
      sgi-xp: incoming XPC channel messages can come in after the channel's partition structures have been torn down
      
      Under some workloads, some channel messages have been observed being
      delayed on the sending side past the point where the receiving side has
      been able to tear down its partition structures.
      
      This condition is already detected in xpc_handle_activate_IRQ_uv(), but
      that information is not given to xpc_handle_activate_mq_msg_uv().  As a
      result, xpc_handle_activate_mq_msg_uv() assumes the structures still exist
      and references them, causing a NULL-pointer deref.
      
      Signed-off-by: Robin Holt <holt@sgi.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09358972
    • Richard Weinberger's avatar
      um: fix global timer issue when using CONFIG_NO_HZ · 482db6df
      Richard Weinberger authored
      This fixes an issue which was introduced by fe2cc53e ("uml: track and make
      up lost ticks").

      timeval_to_ns() returns long long and not int.  Due to that, UML's timer
      did not work properly and caused timer freezes.
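
The effect of keeping a 64-bit nanosecond count in a 32-bit variable can be sketched in plain C. This is an illustration only, not the UML code: `tv_to_ns()` is a hypothetical stand-in for the kernel's timeval_to_ns(), and `low32()` models an assumed 32-bit destination by keeping only the low 32 bits of the value.

```c
#include <stdint.h>

/* Hypothetical stand-in for timeval_to_ns(): converts seconds and
 * microseconds to nanoseconds and returns the result as a 64-bit value. */
long long tv_to_ns(long long sec, long long usec)
{
    return sec * 1000000000LL + usec * 1000LL;
}

/* Models storing the result in a 32-bit variable (an assumption made for
 * this sketch): only the low 32 bits of the nanosecond count survive. */
uint32_t low32(long long ns)
{
    return (uint32_t)ns;
}
```

Under these assumptions, five seconds is 5,000,000,000 ns, which does not fit in 32 bits; `low32(tv_to_ns(5, 0))` yields 705,032,704, so any timer arithmetic done on the narrowed value would be badly off. One second (1,000,000,000 ns) still fits, which is why short intervals can appear to work.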
      
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Acked-by: Pekka Enberg <penberg@kernel.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      482db6df
    • Mel Gorman's avatar
      mm, page-allocator: do not check the state of a non-existant buddy during free · b7f50cfa
      Mel Gorman authored
      There is a bug in commit 6dda9d55 ("page allocator: reduce fragmentation
      in buddy allocator by adding buddies that are merging to the tail of the
      free lists") that means a buddy at order MAX_ORDER is checked for merging.
      A page of this order never exists, so at times an effectively random
      piece of memory is being checked.
      
      Alan Curry has reported that this is causing memory corruption in
      userspace data on a PPC32 platform (http://lkml.org/lkml/2010/10/9/32).
      It is not clear why this is happening.  It could be a cache coherency
      problem where pages mapped in both user and kernel space are getting
      different cache lines due to the bad read from kernel space
      (http://lkml.org/lkml/2010/10/13/179).  It could also be that there are
      some special registers being io-remapped at the end of the memmap array
      and that a read has special meaning on them.  Compiler bugs have been
      ruled out because the assembly before and after the patch looks relatively
      harmless.
      
      This patch fixes the problem by ensuring we are not reading a possibly
      invalid location of memory.  It's not clear why the read causes corruption
      but one way or the other it is a buggy read.
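
The kind of guard described above can be sketched in plain C. This is a simplified model, not the patch itself: `SKETCH_MAX_ORDER`, `buddy_index()` and `merged_order_exists()` are hypothetical names, and the value 11 is only an assumed, commonly used setting. In a buddy allocator the buddy of a block is found by flipping the bit that corresponds to the block's order, and a merge is only meaningful if the resulting order is one the allocator actually tracks.

```c
/* Assumed for this sketch: valid orders are 0 .. SKETCH_MAX_ORDER - 1. */
#define SKETCH_MAX_ORDER 11

/* Index of the buddy of the block starting at page_idx with the given
 * order: flip the bit corresponding to that order. */
unsigned long buddy_index(unsigned long page_idx, unsigned int order)
{
    return page_idx ^ (1UL << order);
}

/* A block of 'order' can only merge with its buddy if the merged block,
 * of order + 1, is still a valid order.  Skipping the buddy lookup when
 * this is false avoids reading state for a block that cannot exist. */
int merged_order_exists(unsigned int order)
{
    return order + 1 < SKETCH_MAX_ORDER;
}
```

With these definitions, the order-3 block at index 8 has its buddy at index 0, and a block of the largest order (10 here) is never examined for merging because `merged_order_exists(10)` is false.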
      
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Corrado Zoccolo <czoccolo@gmail.com>
      Reported-by: Alan Curry <pacman@kosh.dhis.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b7f50cfa
    • Andrew Morton's avatar
      types.h: move misplaced comment · a75d3776
      Andrew Morton authored
      
      
      This comment landed in the wrong place.
      
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Jan Engelhardt <jengelh@medozas.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a75d3776
    • KAMEZAWA Hiroyuki's avatar
      mm: fix return value of scan_lru_pages in memory unplug · f8f72ad5
      KAMEZAWA Hiroyuki authored
      
      
      scan_lru_pages returns a pfn, so its return type should be "unsigned long",
      not "int".

      Note: I guess this has worked until now because the memory hotplug
            testers' machines do not have very much memory
            (physical address < 32bit << PAGE_SHIFT).
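
The arithmetic behind the note can be sketched as follows. This is an illustration under stated assumptions, not kernel code: `SKETCH_PAGE_SHIFT` is assumed to be 12 (4 KiB pages), `int` is assumed to be 32 bits wide, and `phys_to_pfn()` and `pfn_fits_in_int()` are hypothetical helpers.

```c
#include <limits.h>
#include <stdint.h>

/* Assumed page size for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Page frame number of a physical address. */
uint64_t phys_to_pfn(uint64_t phys)
{
    return phys >> SKETCH_PAGE_SHIFT;
}

/* Nonzero if the pfn survives being returned through an "int". */
int pfn_fits_in_int(uint64_t pfn)
{
    return pfn <= (uint64_t)INT_MAX;
}
```

Under these assumptions a machine with memory at 4 GiB has pfn 2^20 there, which fits easily, while an address of 8 TiB maps to pfn 2^31 and no longer fits in a signed 32-bit `int`. That is consistent with the bug staying hidden on machines with modest amounts of memory.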
      
      Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f8f72ad5
    • Linus Torvalds's avatar
      Merge git://git.infradead.org/battery-2.6 · 45352bbf
      Linus Torvalds authored
      * git://git.infradead.org/battery-2.6:
        power_supply: Makefile cleanup
        bq27x00_battery: Add missing kfree(di->bus) in bq27x00_battery_remove()
        power_supply: Introduce maximum current property
        power_supply: Add types for USB chargers
        ds2782_battery: Fix units
        power_supply: Add driver for TWL4030/TPS65950 BCI charger
        bq20z75: Add support for more power supply properties
        wm831x_power: Add missing kfree(wm831x_power) in wm831x_power_remove()
        jz4740-battery: Add missing kfree(jz_battery) in jz_battery_remove()
        ds2760_battery: Add missing kfree(di) in ds2760_battery_remove()
        olpc_battery: Fix endian neutral breakage for s16 values
        ds2760_battery: Fix W1 and W1_SLAVE_DS2760 dependency
        pcf50633-charger: Add missing sysfs_remove_group()
        power_supply: Add driver for TI BQ20Z75 gas gauge IC
        wm831x_power: Remove duplicate chg mask
        omap: rx51: Add support for USB chargers
        power_supply: Add isp1704 charger detection driver
      45352bbf
    • Linus Torvalds's avatar
      Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core · da62aa69
      Linus Torvalds authored
      * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core: (34 commits)
        i7core_edac: return -ENODEV when devices were already probed
        i7core_edac: properly terminate pci_dev_table
        i7core_edac: Avoid PCI refcount to reach zero on successive load/reload
        i7core_edac: Fix refcount error at PCI devices
        i7core_edac: it is safe to i7core_unregister_mci() when mci=NULL
        i7core_edac: Fix an oops at i7core probe
        i7core_edac: Remove unused member channels in i7core_pvt
        i7core_edac: Remove unused arg csrow from get_dimm_config
        i7core_edac: Reduce args of i7core_register_mci
        i7core_edac: Introduce i7core_unregister_mci
        i7core_edac: Use saved pointers
        i7core_edac: Check probe counter in i7core_remove
        i7core_edac: Call pci_dev_put() when alloc_i7core_dev()  failed
        i7core_edac: Fix error path of i7core_register_mci
        i7core_edac: Fix order of lines in i7core_register_mci
        i7core_edac: Always do get/put for all devices
        i7core_edac: Introduce i7core_pci_ctl_create/release
        i7core_edac: Introduce free_i7core_dev
        i7core_edac: Introduce alloc_i7core_dev
        i7core_edac: Reduce args of i7core_get_onedevice
        ...
      da62aa69
    • Linus Torvalds's avatar
      Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 · f1ebdd60
      Linus Torvalds authored
      * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (22 commits)
        Add _addr_lsb field to ia64 siginfo
        Fix migration.c compilation on s390
        HWPOISON: Remove retry loop for try_to_unmap
        HWPOISON: Turn addr_valid from bitfield into char
        HWPOISON: Disable DEBUG by default
        HWPOISON: Convert pr_debugs to pr_info
        HWPOISON: Improve comments in memory-failure.c
        x86: HWPOISON: Report correct address granuality for huge hwpoison faults
        Encode huge page size for VM_FAULT_HWPOISON errors
        Fix build error with !CONFIG_MIGRATION
        hugepage: move is_hugepage_on_freelist inside ifdef to avoid warning
        Clean up __page_set_anon_rmap
        HWPOISON, hugetlb: fix unpoison for hugepage
        HWPOISON, hugetlb: soft offlining for hugepage
        HWPOSION, hugetlb: recover from free hugepage error when !MF_COUNT_INCREASED
        hugetlb: move refcounting in hugepage allocation inside hugetlb_lock
        HWPOISON, huge...
      f1ebdd60
    • Linus Torvalds's avatar
      Merge branch 'for_linus' of git://github.com/at91linux/linux-2.6-at91 · f99d0553
      Linus Torvalds authored
      * 'for_linus' of git://github.com/at91linux/linux-2.6-at91:
        AT91: rtc: enable built-in RTC in Kconfig for at91sam9g45 family
        at91/atmel-mci: inclusion of sd/mmc driver in at91sam9g45 chip and board
        AT91: pm: make sure that r0 is 0 when dealing with cache operations
        AT91: pm: use plain cpu_do_idle() for "wait for interrupt"
        AT91: reset: extend alternate reset procedure to several chips
        AT91: reset routine cleanup, remove not needed icache flush
        AT91: trivial: align comment of at91sam9g20_reset with one more tab
        AT91: Fix AT91SAM9G20 reset as per the errata in the data sheet
        AT91: add board support for Pcontrol_G20
      f99d0553
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://gitorious.org/linux-omap-dss2/linux · 2c518959
      Linus Torvalds authored
      * 'for-linus' of git://gitorious.org/linux-omap-dss2/linux:
        OMAP: DSS2: don't power off a panel twice
        OMAP: DSS2: OMAPFB: Allow usage of def_vrfb only for omap2,3
        OMAP: DSS2: OMAPFB: make VRFB depends on OMAP2,3
        OMAP: DSS2: OMAPFB: Allow FB_OMAP2 to build without VRFB
        arm/omap: simplify conditional
        OMAP: DSS2: DSI: Remove extra iounmap in error path
        OMAP: DSS2: Use dss_features framework on DSS2 code
        OMAP: DSS2: Introduce dss_features files
        video/omap: remove mux.h include
        ARM: omap/fb: move get_fbmem_region() to .init.text
        ARM: omap/fb: move omapfb_reserve_sram to .init.text
        ARM: omap/fb: move omap_init_fb to .init.text
        OMAP: DSS2: OMAPFB: swap front and back porches for both hsync and vsync
        OMAP: DSS2: make filter coefficient tables human readable
        OMAP: DSS2: Add SPI dependency to Kconfig of ACX565AKM panel
      2c518959
    • Linus Torvalds's avatar
      Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq · 4f687603
      Linus Torvalds authored
      * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
        [CPUFREQ]: x86, cpufreq: Mark longrun_get_policy with __cpuinit.
        [CPUFREQ] add sampling_down_factor tunable to improve ondemand performance
        [CPUFREQ] arch/x86/kernel/cpu/cpufreq: Fix unsigned return type
        [CPUFREQ] drivers/cpufreq: Adjust confusing if indentation
      4f687603
    • Linus Torvalds's avatar
      Merge branch 'for-2.6.37' of git://linux-nfs.org/~bfields/linux · 4390110f
      Linus Torvalds authored
      * 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits)
        svcrpc: svc_tcp_sendto XPT_DEAD check is redundant
        svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue
        svcrpc: assume svc_delete_xprt() called only once
        svcrpc: never clear XPT_BUSY on dead xprt
        nfsd4: fix connection allocation in sequence()
        nfsd4: only require krb5 principal for NFSv4.0 callbacks
        nfsd4: move minorversion to client
        nfsd4: delay session removal till free_client
        nfsd4: separate callback change and callback probe
        nfsd4: callback program number is per-session
        nfsd4: track backchannel connections
        nfsd4: confirm only on succesful create_session
        nfsd4: make backchannel sequence number per-session
        nfsd4: use client pointer to backchannel session
        nfsd4: move callback setup into session init code
        nfsd4: don't cache seq_misordered replies
        SUNRPC: Properly initialize sock_xprt.srcaddr in all cases
        SUNRPC: Use conventional switch stat...
      4390110f
    • Linus Torvalds's avatar
      Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 · a4dd8dce
      Linus Torvalds authored
      * 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
        net/sunrpc: Use static const char arrays
        nfs4: fix channel attribute sanity-checks
        NFSv4.1: Use more sensible names for 'initialize_mountpoint'
        NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure
        NFSv4.1: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure
        NFS: client needs to maintain list of inodes with active layouts
        NFS: create and destroy inode's layout cache
        NFSv4.1: pnfs: filelayout: introduce minimal file layout driver
        NFSv4.1: pnfs: full mount/umount infrastructure
        NFS: set layout driver
        NFS: ask for layouttypes during v4 fsinfo call
        NFS: change stateid to be a union
        NFSv4.1: pnfsd, pnfs: protocol level pnfs constants
        SUNRPC: define xdr_decode_opaque_fixed
        NFSD: remove duplicate NFS4_STATEID_SIZE
      a4dd8dce
  2. Oct 26, 2010