Mar 22, 2012
    • Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · 5375871d
      Linus Torvalds authored
      Pull powerpc merge from Benjamin Herrenschmidt:
        "Here's the powerpc batch for this merge window.  It is going to be a
         bit nastier than usual, as it touches things outside of arch/powerpc,
         mostly due to the big iSeriesectomy :-) We finally got rid of the
         bugger (legacy iSeries support), which was a PITA to maintain and
         which nobody really used anymore.
      
        Here are some of the highlights:
      
          - Legacy iSeries is gone.  Thanks Stephen!  There are still some
            bits and pieces remaining if you do a grep -ir series
            arch/powerpc, but they are harmless and will hopefully be
            removed in the next few weeks.
      
         - The 'fadump' functionality (Firmware Assisted Dump) replaces the
           previous (equivalent) "pHyp assisted dump"...  it's a rewrite of a
           mechanism to get the hypervisor to do crash dumps on pSeries, the
           new implementation hopefully being much more reliable.  Thanks
           Mahesh Salgaonkar.
      
          - The "EEH" code (pSeries PCI error handling & recovery) got a big
            spring cleaning, motivated by the need to be able to implement a
            new backend for it on top of a new, different type of firmware.
      
           The work isn't complete yet, but a good chunk of the cleanups is
           there.  Note that this adds a field to struct device_node which is
           not very nice and which Grant objects to.  I will have a patch soon
           that moves that to a powerpc private data structure (hopefully
           before rc1) and we'll improve things further later on (hopefully
           getting rid of the need for that pointer completely).  Thanks Gavin
           Shan.
      
         - I dug into our exception & interrupt handling code to improve the
           way we do lazy interrupt handling (and make it work properly with
           "edge" triggered interrupt sources), and while at it found & fixed
           a wagon of issues in those areas, including adding support for page
           fault retry & fatal signals on page faults.
      
         - Your usual random batch of small fixes & updates, including a bunch
           of new embedded boards, both Freescale and APM based ones, etc..."
      
      I fixed up some conflicts with the generalized irq-domain changes from
      Grant Likely, hopefully correctly.
      
      * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (141 commits)
        powerpc/ps3: Do not adjust the wrapper load address
        powerpc: Remove the rest of the legacy iSeries include files
        powerpc: Remove the remaining CONFIG_PPC_ISERIES pieces
        init: Remove CONFIG_PPC_ISERIES
        powerpc: Remove FW_FEATURE ISERIES from arch code
        tty/hvc_vio: FW_FEATURE_ISERIES is no longer selectable
        powerpc/spufs: Fix double unlocks
        powerpc/5200: convert mpc5200 to use of_platform_populate()
        powerpc/mpc5200: add options to mpc5200_defconfig
        powerpc/mpc52xx: add a4m072 board support
        powerpc/mpc5200: update mpc5200_defconfig to fit for charon board
        Documentation/powerpc/mpc52xx.txt: Checkpatch cleanup
        powerpc/44x: Add additional device support for APM821xx SoC and Bluestone board
        powerpc/44x: Add support PCI-E for APM821xx SoC and Bluestone board
        MAINTAINERS: Update PowerPC 4xx tree
        powerpc/44x: The bug fixed support for APM821xx SoC and Bluestone board
        powerpc: document the FSL MPIC message register binding
        powerpc: add support for MPIC message register API
        powerpc/fsl: Added aliased MSIIR register address to MSI node in dts
        powerpc/85xx: mpc8548cds - add 36-bit dts
        ...
      5375871d
    • Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · b57cb723
      Linus Torvalds authored
      Pull m68knommu arch updates from Greg Ungerer:
        "Includes a cleanup of the non-MMU linker script (it now almost
         exclusively uses the well-defined linker script support macros and
         definitions).  Some more merging of MMU and non-MMU common files
         (specifically the arch process.c, ptrace and time.c).  And a big
         cleanup of the massively duplicated ColdFire device definition code.
      
        Overall we remove about 2000 lines of code, and end up with a single
        set of platform device definitions for the serial ports, ethernet
        ports and QSPI ports common in most ColdFire SoCs.
      
         I expect you will get a merge conflict on arch/m68k/kernel/process.c,
         in cpu_idle().  It should be relatively straightforward to fix up."
      
       And the cpu_idle() conflict resolution was indeed trivial (the merged
       nommu/mmu versions of process.c conflicted trivially with the
       conversion to use the schedule_preempt_disabled() helper function).
      
      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: (57 commits)
        m68knommu: factor more common ColdFire cpu reset code
        m68knommu: make 528x CPU reset register addressing consistent
        m68knommu: make 527x CPU reset register addressing consistent
        m68knommu: make 523x CPU reset register addressing consistent
        m68knommu: factor some common ColdFire cpu reset code
        m68knommu: move old ColdFire timers init from CPU init to timers code
        m68knommu: clean up init code in ColdFire 532x startup
        m68knommu: clean up init code in ColdFire 528x startup
        m68knommu: clean up init code in ColdFire 523x startup
        m68knommu: merge common ColdFire QSPI platform setup code
        m68knommu: make 532x QSPI platform addressing consistent
        m68knommu: make 528x QSPI platform addressing consistent
        m68knommu: make 527x QSPI platform addressing consistent
        m68knommu: make 5249 QSPI platform addressing consistent
        m68knommu: make 523x QSPI platform addressing consistent
        m68knommu: make 520x QSPI platform addressing consistent
        m68knommu: merge common ColdFire FEC platform setup code
        m68knommu: make 532x FEC platform addressing consistent
        m68knommu: make 528x FEC platform addressing consistent
        m68knommu: make 527x FEC platform addressing consistent
        ...
      b57cb723
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw · ad12ab25
      Linus Torvalds authored
      Pull gfs2 changes from Steven Whitehouse.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
        GFS2: Change truncate page allocation to be GFP_NOFS
        GFS2: call gfs2_write_alloc_required for each chunk
        GFS2: Clean up log flush header writing
        GFS2: Remove a __GFP_NOFAIL allocation
        GFS2: Flush pending glock work when evicting an inode
        GFS2: make sure rgrps are up to date in func gfs2_blk2rgrpd
        GFS2: Eliminate sd_rindex_mutex
        GFS2: Unlock rindex mutex on glock error
        GFS2: Make bd_cmp() static
        GFS2: Sort the ordered write list
        GFS2: FITRIM ioctl support
        GFS2: Move two functions from log.c to lops.c
        GFS2: glock statistics gathering
      ad12ab25
    • memcg: avoid THP split in task migration · 12724850
      Naoya Horiguchi authored
      
      
       Currently we can't do task migration between memory cgroups without
       splitting THPs, which means that processes using THP heavily pay a
       large overhead during task migration.  This patch introduces code for
       moving the charge of a THP and makes THP more valuable.
      
       Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
       Acked-by: Hillf Danton <dhillf@gmail.com>
       Cc: Andrea Arcangeli <aarcange@redhat.com>
       Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
       Cc: David Rientjes <rientjes@google.com>
       Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE · d8c37c48
      Naoya Horiguchi authored
      
      
      These macros will be used in a later patch, where all usages are expected
      to be optimized away without #ifdef CONFIG_TRANSPARENT_HUGEPAGE.  But to
      detect unexpected usages, we convert the existing BUG() to BUILD_BUG().
      
      [akpm@linux-foundation.org: fix build in mm/pgtable-generic.c]
       Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
       Acked-by: Hillf Danton <dhillf@gmail.com>
       Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
       Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
       Acked-by: David Rientjes <rientjes@google.com>
       Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: clean up existing move charge code · 8d32ff84
      Naoya Horiguchi authored
      
      
       - Replace the lengthy function name is_target_pte_for_mc() with a
         shorter one in order to avoid ugly line breaks.

       - Explicitly use MC_TARGET_* instead of bare integers.
      
       Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
       Cc: Andrea Arcangeli <aarcange@redhat.com>
       Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
       Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
       Cc: David Rientjes <rientjes@google.com>
       Acked-by: Hillf Danton <dhillf@gmail.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memcontrol.c: remove redundant BUG_ON() in mem_cgroup_usage_unregister_event() · 45f3e385
      Anton Vorontsov authored
      
      
      In the following code:
      
      	if (type == _MEM)
      		thresholds = &memcg->thresholds;
      	else if (type == _MEMSWAP)
      		thresholds = &memcg->memsw_thresholds;
      	else
      		BUG();
      
      	BUG_ON(!thresholds);
      
      The BUG_ON() seems redundant.
      
       Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memcontrol.c: s/stealed/stolen/ · 13fd1dd9
      Andrew Morton authored
      
      
      A grammatical fix.
      
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix performance of mem_cgroup_begin_update_page_stat() · 4331f7d3
      KAMEZAWA Hiroyuki authored
      
      
       mem_cgroup_begin_update_page_stat() should be very fast because it is
       called very frequently.  Right now it needs to look up the page_cgroup
       and its memcg, which is slow.
      
       This patch adds a global variable to check whether "any memcg is
       moving or not".  With this, the caller doesn't need to visit
       page_cgroup and memcg.
      
       Here is a test result.  A test program maps a 1G file MAP_SHARED,
       takes page faults so that each page's page_mapcount(page) > 1, frees
       the range with madvise(), and faults the pages in again.  The program
       causes 26214400 page faults on the file and shows the cost of
       mem_cgroup_begin_update_page_stat().
      
       Before this patch (mem_cgroup_begin_update_page_stat() shows up in
       the profile):
      
          [kamezawa@bluextal test]$ time ./mmap 1G
      
          real    0m21.765s
          user    0m5.999s
          sys     0m15.434s
      
          27.46%     mmap  mmap               [.] reader
          21.15%     mmap  [kernel.kallsyms]  [k] page_fault
           9.17%     mmap  [kernel.kallsyms]  [k] filemap_fault
           2.96%     mmap  [kernel.kallsyms]  [k] __do_fault
           2.83%     mmap  [kernel.kallsyms]  [k] __mem_cgroup_begin_update_page_stat
      
      After this patch
      
          [root@bluextal test]# time ./mmap 1G
      
          real    0m21.373s
          user    0m6.113s
          sys     0m15.016s
      
       In the usual path, the calls to __mem_cgroup_begin_update_page_stat()
       go away.
      
       Note: we may be able to remove this optimization in the future if
             we can get a pointer to the memcg directly from struct page.
      
      [akpm@linux-foundation.org: don't return a void]
       Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
       Acked-by: Greg Thelen <gthelen@google.com>
       Acked-by: Johannes Weiner <hannes@cmpxchg.org>
       Cc: Michal Hocko <mhocko@suse.cz>
       Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
       Cc: Ying Han <yinghan@google.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: remove PCG_FILE_MAPPED · 2ff76f11
      KAMEZAWA Hiroyuki authored
      
      
       With the new locking scheme for updating memcg's page stats, we no
       longer need the PCG_FILE_MAPPED flag, which duplicated the
       information in page_mapped().
      
      [hughd@google.com: cosmetic fix]
      [hughd@google.com: add comment to MEM_CGROUP_CHARGE_TYPE_MAPPED case in __mem_cgroup_uncharge_common()]
       Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
       Acked-by: Greg Thelen <gthelen@google.com>
       Acked-by: Johannes Weiner <hannes@cmpxchg.org>
       Cc: Michal Hocko <mhocko@suse.cz>
       Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
       Cc: Ying Han <yinghan@google.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: use new logic for page stat accounting · 89c06bd5
      KAMEZAWA Hiroyuki authored
      
      
       Now, page-stat-per-memcg is recorded in a per-page_cgroup flag by
       duplicating the page's status into the flag.  The reason is that memcg
       has a feature to move a page from one group to another, so we have a
       race between "move" and "page stat accounting".

       Under the current logic, assume CPU-A and CPU-B: CPU-A does "move"
       and CPU-B does "page stat accounting".
      
      When CPU-A goes 1st,
      
                  CPU-A                           CPU-B
                                          update "struct page" info.
          move_lock_mem_cgroup(memcg)
          see pc->flags
          copy page stat to new group
          overwrite pc->mem_cgroup.
          move_unlock_mem_cgroup(memcg)
                                          move_lock_mem_cgroup(mem)
                                          set pc->flags
                                          update page stat accounting
                                          move_unlock_mem_cgroup(mem)
      
       Stat accounting is guarded by move_lock_mem_cgroup(), and the "move"
       logic (CPU-A) doesn't see the changes to the "struct page" information.
      
      But it's costly to have the same information both in 'struct page' and
      'struct page_cgroup'.  And, there is a potential problem.
      
       For example, assume we have PG_dirty accounting in memcg.
       PG_* is a flag for struct page.
       PCG_* is a flag for struct page_cgroup.
       (This is just an example.  The same problem can be found in any
        kind of page stat accounting.)
      
      	  CPU-A                               CPU-B
            TestSet PG_dirty
            (delay)                        TestClear PG_dirty
                                           if (TestClear(PCG_dirty))
                                                memcg->nr_dirty--
            if (TestSet(PCG_dirty))
                memcg->nr_dirty++
      
       Here, memcg->nr_dirty ends up at +1, which is wrong.  This race was
       reported by Greg Thelen <gthelen@google.com>.  Right now only
       FILE_MAPPED is supported and, fortunately, it is serialized by the
       page table lock, so this is not a real bug, for _now_.
      
       If this potential problem is caused by having duplicated information
       in struct page and struct page_cgroup, we may be able to fix it by
       using the original 'struct page' information.  But then we'd have a
       problem with "move account":
      
      Assume we use only PG_dirty.
      
               CPU-A                   CPU-B
          TestSet PG_dirty
          (delay)                    move_lock_mem_cgroup()
                                     if (PageDirty(page))
                                            new_memcg->nr_dirty++
                                     pc->mem_cgroup = new_memcg;
                                     move_unlock_mem_cgroup()
          move_lock_mem_cgroup()
          memcg = pc->mem_cgroup
          new_memcg->nr_dirty++
      
       The accounting information may be double-counted.  This was the
       original reason to have the PCG_xxx flags, but it seems PCG_xxx has
       another problem.

       I think we need a bigger lock, as in:
      
           move_lock_mem_cgroup(page)
           TestSetPageDirty(page)
           update page stats (without any checks)
           move_unlock_mem_cgroup(page)
      
       This fixes both problems, and we don't have to duplicate the page
       flag into page_cgroup.  Please note: move_lock_mem_cgroup() is held
       only when there is a possibility of an "account move" in the system.
       So, in most paths, status updates will proceed without atomic locks.
      
       This patch introduces mem_cgroup_begin_update_page_stat() and
       mem_cgroup_end_update_page_stat(), both of which should be called
       when modifying 'struct page' information that memcg tracks, as in:
      
           mem_cgroup_begin_update_page_stat()
           modify page information
           mem_cgroup_update_page_stat()
           => never check any 'struct page' info, just update counters.
           mem_cgroup_end_update_page_stat().
      
       This patch is slow because we need to call begin_update_page_stat()/
       end_update_page_stat() regardless of whether the accounted state will
       change or not.  A following patch adds an easy optimization and
       reduces the cost.
      
      [akpm@linux-foundation.org: s/lock/locked/]
      [hughd@google.com: fix deadlock by avoiding stat lock when anon]
       Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
       Cc: Greg Thelen <gthelen@google.com>
       Acked-by: Johannes Weiner <hannes@cmpxchg.org>
       Cc: Michal Hocko <mhocko@suse.cz>
       Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
       Cc: Ying Han <yinghan@google.com>
       Signed-off-by: Hugh Dickins <hughd@google.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: remove PCG_MOVE_LOCK flag from page_cgroup · 312734c0
      KAMEZAWA Hiroyuki authored
      
      
       PCG_MOVE_LOCK is used as a bit spinlock to avoid races between
       overwriting pc->mem_cgroup and per-memcg page statistics accounting.
       The lock helps to avoid the race, but the race is very rare because
       moving tasks between cgroups is not a common operation.  So spending
       one bit per page on it seems too costly.
      
       This patch changes this lock into a per-memcg spinlock and removes
       PCG_MOVE_LOCK.
      
       If a finer-grained lock is required, we will be able to add some
       hashing, but I'd like to start with this.
      
       Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
       Acked-by: Greg Thelen <gthelen@google.com>
       Cc: Johannes Weiner <hannes@cmpxchg.org>
       Cc: Michal Hocko <mhocko@suse.cz>
       Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
       Cc: Ying Han <yinghan@google.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: simplify move_account() check · 619d094b
      KAMEZAWA Hiroyuki authored
      
      
       In memcg, to avoid taking a lock with irqs off when accessing
       page_cgroup, a flag + rcu_read_lock() scheme is used.  It works as
       follows:
      
           CPU-A                     CPU-B
                                   rcu_read_lock()
          set flag
                                   if(flag is set)
                                         take heavy lock
                                   do job.
          synchronize_rcu()        rcu_read_unlock()
          take heavy lock.
      
       In a recent discussion, it was argued that using a per-cpu value for
       this flag just complicates the code, because 'set flag' is very rare.

       This patch changes the 'flag' implementation from percpu to atomic_t,
       which is much simpler.
      
       Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
       Acked-by: Greg Thelen <gthelen@google.com>
       Cc: Johannes Weiner <hannes@cmpxchg.org>
       Cc: Michal Hocko <mhocko@suse.cz>
       Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
       Cc: Ying Han <yinghan@google.com>
       Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
       Acked-by: Johannes Weiner <hannes@cmpxchg.org>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat) · 9e335790
      KAMEZAWA Hiroyuki authored
      
      
       As described in the log, I guess the EXPORT was in preparation for
       dirty accounting.  But we don't need to export this _now_, so remove
       it for now.
      
       Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
       Reviewed-by: Greg Thelen <gthelen@google.com>
       Cc: Johannes Weiner <hannes@cmpxchg.org>
       Cc: Michal Hocko <mhocko@suse.cz>
       Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
       Cc: Ying Han <yinghan@google.com>
       Acked-by: Johannes Weiner <hannes@cmpxchg.org>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: kill dead prev_priority stubs · a710920c
      Konstantin Khlebnikov authored
      
      
       This code was removed in 25edde03 ("vmscan: kill prev_priority
       completely").
      
       Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
       Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: remove PCG_CACHE page_cgroup flag · b2402857
      KAMEZAWA Hiroyuki authored
      
      
      We record 'the page is cache' with the PCG_CACHE bit in page_cgroup.
      Here, "CACHE" means anonymous user pages (and SwapCache).  This doesn't
      include shmem.
      
       Considering the callers, at charge/uncharge time the caller already
       knows what the page is, so we don't need to record it with one bit
       per page.

       This patch removes the PCG_CACHE bit and makes the callers of
       mem_cgroup_charge_statistics() specify what the page is.
      
       About page migration: the mapping of the used page is not touched
       during migration (see page_remove_rmap), so we can rely on it and
       push the correct charge type down to __mem_cgroup_uncharge_common()
       from end_migration() for the unused page.  The force flag was
       misleading: it was abused to skip the needless page_mapped() /
       PageCgroupMigration() check, as we know the unused page is no longer
       mapped and the migration flag was cleared just a few lines up.  But
       doing the checks is no biggie and it's not worth adding another flag
       just to skip them.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      [hughd@google.com: fix PageAnon uncharging]
       Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
       Acked-by: Michal Hocko <mhocko@suse.cz>
       Acked-by: Johannes Weiner <hannes@cmpxchg.org>
       Cc: Hugh Dickins <hughd@google.com>
       Cc: Ying Han <yinghan@google.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: let css_get_next() rely upon rcu_read_lock() · ca464d69
      Hugh Dickins authored
      
      
       Remove the lock and unlock around css_get_next()'s call to
       idr_get_next().  The memcg iterators (the only users of
       css_get_next()) already do rcu_read_lock(), and its comment demands
       that; but add a WARN_ON_ONCE to make sure of it.
      
       Signed-off-by: Hugh Dickins <hughd@google.com>
       Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
       Acked-by: Li Zefan <lizf@cn.fujitsu.com>
       Cc: Eric Dumazet <eric.dumazet@gmail.com>
       Acked-by: Tejun Heo <tj@kernel.org>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroup: revert ss_id_lock to spinlock · 42aee6c4
      Hugh Dickins authored
      
      
       Commit c1e2ee2d ("memcg: replace ss->id_lock with a rwlock") has now
       been seen to cause the unfair behavior we should have expected from
       converting a spinlock to an rwlock: a softlockup in cgroup_mkdir(),
       whose get_new_cssid() is waiting for the wlock while 19 tasks use the
       rlock in css_get_next() to get on with their memcg workload (in an
       artificial test, admittedly).  Yet lib/idr.c was made suitable for
       RCU way back: revert that commit, restoring ss->id_lock to a spinlock.
      
       Signed-off-by: Hugh Dickins <hughd@google.com>
       Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
       Acked-by: Li Zefan <lizf@cn.fujitsu.com>
       Cc: Eric Dumazet <eric.dumazet@gmail.com>
       Acked-by: Tejun Heo <tj@kernel.org>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • idr: make idr_get_next() good for rcu_read_lock() · 9f7de827
      Hugh Dickins authored
      
      
       Make one small adjustment to idr_get_next(): take the height from the
       top layer (stable under RCU) instead of from the root (unprotected by
       RCU), as idr_find() does, so that it can be used with RCU locking.
       The comment on RCU locking is copied from idr_find().
      
       Signed-off-by: Hugh Dickins <hughd@google.com>
       Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
       Acked-by: Li Zefan <lizf@cn.fujitsu.com>
       Cc: Eric Dumazet <eric.dumazet@gmail.com>
       Acked-by: Tejun Heo <tj@kernel.org>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>