Commit b6da0076 authored by Linus Torvalds

Merge branch 'akpm' (patchbomb from Andrew)

Merge first patchbomb from Andrew Morton:
 - a few minor cifs fixes
 - dma-debug updates
 - ocfs2
 - slab
 - about half of MM
 - procfs
 - kernel/exit.c
 - panic.c tweaks
 - printk updates
 - lib/ updates
 - checkpatch updates
 - fs/binfmt updates
 - the drivers/rtc tree
 - nilfs
 - kmod fixes
 - more kernel/exit.c
 - various other misc tweaks and fixes

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
  exit: pidns: fix/update the comments in zap_pid_ns_processes()
  exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
  exit: exit_notify: re-use "dead" list to autoreap current
  exit: reparent: call forget_original_parent() under tasklist_lock
  exit: reparent: avoid find_new_reaper() if no children
  exit: reparent: introduce find_alive_thread()
  exit: reparent: introduce find_child_reaper()
  exit: reparent: document the ->has_child_subreaper checks
  exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper()
  exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting
  exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting
  exit: proc: don't try to flush /proc/tgid/task/tgid
  exit: release_task: fix the comment about group leader accounting
  exit: wait: drop tasklist_lock before psig->c* accounting
  exit: wait: don't use zombie->real_parent
  exit: wait: cleanup the ptrace_reparented() checks
  usermodehelper: kill the kmod_thread_locker logic
  usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper()
  fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp
  nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races
  ...
parents cbfe0de3 a53b8315
+1 −1
@@ -29,7 +29,7 @@ Brief summary of control files

 hugetlb.<hugepagesize>.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
 hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb  usage recorded
- hugetlb.<hugepagesize>.usage_in_bytes     # show current res_counter usage for "hugepagesize" hugetlb
+ hugetlb.<hugepagesize>.usage_in_bytes     # show current usage for "hugepagesize" hugetlb
 hugetlb.<hugepagesize>.failcnt		   # show the number of allocation failures due to HugeTLB limit

For a system supporting two hugepage sizes (16M and 16G) the control
+15 −11
Memory Resource Controller

NOTE: This document is hopelessly outdated and needs a complete
      rewrite. It still contains useful information, so we are keeping it
      here, but make sure to check the current code if you need a deeper
      understanding.

NOTE: The Memory Resource Controller has generically been referred to as the
      memory controller in this document. Do not confuse memory controller
      used here with the memory controller that is used in hardware.
@@ -52,9 +57,9 @@ Brief summary of control files.
 tasks				 # attach a task(thread) and show list of threads
 cgroup.procs			 # show list of processes
 cgroup.event_control		 # an interface for event_fd()
- memory.usage_in_bytes		 # show current res_counter usage for memory
+ memory.usage_in_bytes		 # show current usage for memory
				 (See 5.5 for details)
- memory.memsw.usage_in_bytes	 # show current res_counter usage for memory+Swap
+ memory.memsw.usage_in_bytes	 # show current usage for memory+Swap
				 (See 5.5 for details)
 memory.limit_in_bytes		 # set/show limit of memory usage
 memory.memsw.limit_in_bytes	 # set/show limit of memory+Swap usage
@@ -116,16 +121,16 @@ The memory controller is the first controller developed.

2.1. Design

-The core of the design is a counter called the res_counter. The res_counter
-tracks the current memory usage and limit of the group of processes associated
-with the controller. Each cgroup has a memory controller specific data
-structure (mem_cgroup) associated with it.
+The core of the design is a counter called the page_counter. The
+page_counter tracks the current memory usage and limit of the group of
+processes associated with the controller. Each cgroup has a memory controller
+specific data structure (mem_cgroup) associated with it.

2.2. Accounting

		+--------------------+
		|  mem_cgroup        |
-		|  (res_counter)     |
+		|  (page_counter)    |
		+--------------------+
		 /            ^      \
		/             |       \
@@ -352,9 +357,8 @@ set:
0. Configuration

a. Enable CONFIG_CGROUPS
-b. Enable CONFIG_RESOURCE_COUNTERS
-c. Enable CONFIG_MEMCG
-d. Enable CONFIG_MEMCG_SWAP (to use swap extension)
+b. Enable CONFIG_MEMCG
+c. Enable CONFIG_MEMCG_SWAP (to use swap extension)
d. Enable CONFIG_MEMCG_KMEM (to use kmem extension)

1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
+0 −197

		The Resource Counter

The resource counter, declared in include/linux/res_counter.h,
is meant to facilitate resource management by controllers
by providing common infrastructure for accounting.

This infrastructure includes the res_counter structure and the routines
that work with it.



1. Crucial parts of the res_counter structure

 a. unsigned long long usage

 	The usage value shows the amount of a resource that is consumed
	by a group at a given time. The units of measurement should be
	determined by the controller that uses this counter. E.g. it can
	be bytes, items or any other unit the controller operates on.

 b. unsigned long long max_usage

 	The maximal value of the usage over time.

 	This value is useful when gathering statistical information about
	the particular group, as it shows the actual resource requirements
	for a particular group, not just some usage snapshot.

 c. unsigned long long limit

 	The maximal amount of the resource that the group is allowed to
	consume. If the group requests more resources, so that the usage
	value would exceed the limit, the resource allocation is rejected
	(see the next section).

 d. unsigned long long failcnt

 	The failcnt stands for "failures counter". This is the number of
	resource allocation attempts that failed.

 e. spinlock_t lock

 	Protects changes of the above values.



2. Basic accounting routines

 a. void res_counter_init(struct res_counter *rc,
				struct res_counter *rc_parent)

 	Initializes the resource counter. As usual, should be the first
	routine called for a new counter.

	The struct res_counter *rc_parent can be used to define a hierarchical
	child -> parent relationship directly in the res_counter structure;
	NULL can be used to define no relationship.

 c. int res_counter_charge(struct res_counter *rc, unsigned long val,
				struct res_counter **limit_fail_at)

	When a resource is about to be allocated it has to be accounted
	with the appropriate resource counter (controller should determine
	which one to use on its own). This operation is called "charging".

	It does not matter much which operation, resource allocation or
	charging, is performed first, but
	  * if the allocation is performed first, this may create a
	    temporary resource over-usage until the resource counter is
	    charged;
	  * if the charging is performed first, then the charge should be
	    undone on the error path (if one is taken).

	If the charging fails and a hierarchical dependency exists, the
	limit_fail_at parameter is set to the particular res_counter element
	where the charging failed.
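
The charging behaviour described above can be sketched in userspace C.
This is a simplified, single-threaded model and not the kernel's
res_counter code: struct counter, counter_init() and counter_charge()
are hypothetical names, the spinlock is left out, and max_usage is not
tracked.

```c
#include <stddef.h>

/* Hypothetical model of a hierarchical counter (not struct res_counter). */
struct counter {
	unsigned long long usage;
	unsigned long long limit;
	unsigned long long failcnt;
	struct counter *parent;		/* NULL for a root counter */
};

static void counter_init(struct counter *c, struct counter *parent,
			 unsigned long long limit)
{
	c->usage = 0;
	c->limit = limit;
	c->failcnt = 0;
	c->parent = parent;
}

/*
 * Charge val to c and to every ancestor of c. If some counter would
 * exceed its limit, bump its failcnt, undo the charges already made
 * below it, report it through *fail_at and return -1.
 */
static int counter_charge(struct counter *c, unsigned long long val,
			  struct counter **fail_at)
{
	struct counter *it, *undo;

	for (it = c; it != NULL; it = it->parent) {
		if (it->usage + val > it->limit) {
			it->failcnt++;
			if (fail_at != NULL)
				*fail_at = it;
			for (undo = c; undo != it; undo = undo->parent)
				undo->usage -= val;
			return -1;
		}
		it->usage += val;
	}
	return 0;
}
```

In this model a child with a generous limit can still fail to charge
because its parent is full; fail_at then points at the parent rather
than at the child.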

 d. u64 res_counter_uncharge(struct res_counter *rc, unsigned long val)

	When a resource is released (freed) it should be de-accounted
	from the resource counter it was accounted to.  This is called
	"uncharging". The return value of this function indicates the amount
	of charges still present in the counter.

	The _locked routines imply that the res_counter->lock is taken.

 e. u64 res_counter_uncharge_until
		(struct res_counter *rc, struct res_counter *top,
		 unsigned long val)

	Almost the same as res_counter_uncharge(), but propagation of the
	uncharge stops when rc == top. This is useful when killing a
	res_counter in a child cgroup.
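
The "stop at top" propagation can be sketched in userspace C as
follows. This is a model under assumptions, not the kernel routine:
struct counter and counter_uncharge_until() are hypothetical names,
there is no locking, and an uncharge larger than the current usage is
simply clamped to zero.

```c
#include <stddef.h>

/* Hypothetical model: only the fields needed for uncharging. */
struct counter {
	unsigned long long usage;
	struct counter *parent;		/* NULL for a root counter */
};

/*
 * Uncharge val from c and from its ancestors, stopping before top.
 * With top == NULL the walk continues up to and including the root.
 * Returns the usage left in c itself.
 */
static unsigned long long counter_uncharge_until(struct counter *c,
						 struct counter *top,
						 unsigned long long val)
{
	struct counter *it;

	for (it = c; it != NULL && it != top; it = it->parent) {
		if (val > it->usage)
			it->usage = 0;	/* clamp instead of underflowing */
		else
			it->usage -= val;
	}
	return c->usage;
}
```

Passing the root as top leaves the root's usage untouched while the
counters below it are drained.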

 2.1 Other accounting routines

    There are more routines that may help you with common needs, like
    checking whether the limit is reached or resetting the max_usage
    value. They are all declared in include/linux/res_counter.h.



3. Analyzing the resource counter registrations

 a. If the failcnt value constantly grows, this means that the counter's
    limit is too tight. Either the group is misbehaving and consumes too
    many resources, or the configuration is not suitable for the group
    and the limit should be increased.

 b. The max_usage value can be used to quickly tune the group. One may
    set the limits to maximal values and either load the container with
    a common pattern or leave it running for a while. After this the
    max_usage value shows the amount of memory the container would
    require during its common activity.

    Setting the limit a bit above this value gives a pretty good
    configuration that works in most of the cases.

 c. If the max_usage is much less than the limit, but the failcnt value
    is growing, then the group tries to allocate a big chunk of resource
    at once.

 d. If the max_usage is much less than the limit, but the failcnt value
    is 0, then this group has been given a higher limit than it
    requires. It is better to lower the limit a bit, leaving more
    resources for other groups.
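
The rules of thumb in this section can be written down as a small
classifier. This is only an illustration: enum hint and analyze() are
hypothetical names, "failcnt grows" is approximated by comparing two
samples, and "much less than the limit" is assumed to mean less than
half of it.

```c
/* Hypothetical tuning hints derived from the rules above. */
enum hint {
	HINT_OK,
	HINT_LIMIT_TOO_TIGHT,		/* rule a */
	HINT_LARGE_ALLOCATIONS,		/* rule c */
	HINT_LIMIT_TOO_HIGH		/* rule d */
};

static enum hint analyze(unsigned long long max_usage,
			 unsigned long long limit,
			 unsigned long long failcnt_prev,
			 unsigned long long failcnt_now)
{
	int growing = failcnt_now > failcnt_prev;
	int much_less = max_usage < limit / 2;	/* assumed threshold */

	if (much_less && growing)
		return HINT_LARGE_ALLOCATIONS;
	if (growing)
		return HINT_LIMIT_TOO_TIGHT;
	if (much_less && failcnt_now == 0)
		return HINT_LIMIT_TOO_HIGH;
	return HINT_OK;
}
```

Rule c is checked before rule a because it is the more specific case:
both see a growing failcnt, but only rule c also sees a low max_usage.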



4. Communication with the control groups subsystem (cgroups)

All the resource controllers that are using cgroups and resource counters
should provide files (in the cgroup filesystem) to work with the resource
counter fields. They are recommended to adhere to the following rules:

 a. File names

 	Field name	File name
	---------------------------------------------------
	usage		usage_in_<unit_of_measurement>
	max_usage	max_usage_in_<unit_of_measurement>
	limit		limit_in_<unit_of_measurement>
	failcnt		failcnt
	lock		no file :)

 b. Reading from file should show the corresponding field value in the
    appropriate format.

 c. Writing to file

 	Field		Expected behavior
	----------------------------------
	usage		prohibited
	max_usage	reset to usage
	limit		set the limit
	failcnt		reset to zero
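
The write rules in the table above can be sketched as a single
dispatch function. This is a userspace model, not a cgroup file
handler: struct counter, enum field and counter_write() are
hypothetical names, and -1 stands in for whatever error a real
controller would return for a prohibited write.

```c
/* Hypothetical model of the counter fields exposed as files. */
struct counter {
	unsigned long long usage;
	unsigned long long max_usage;
	unsigned long long limit;
	unsigned long long failcnt;
};

enum field { F_USAGE, F_MAX_USAGE, F_LIMIT, F_FAILCNT };

/* Apply a write to one field; returns 0 on success, -1 if prohibited. */
static int counter_write(struct counter *c, enum field f,
			 unsigned long long val)
{
	switch (f) {
	case F_USAGE:
		return -1;			/* usage is read-only */
	case F_MAX_USAGE:
		c->max_usage = c->usage;	/* reset to current usage */
		return 0;
	case F_LIMIT:
		c->limit = val;			/* set the limit */
		return 0;
	case F_FAILCNT:
		c->failcnt = 0;			/* reset to zero */
		return 0;
	}
	return -1;
}
```

Only the limit write uses the written value; for max_usage and failcnt
any write acts as a reset.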



5. Usage example

 a. Declare a task group (take a look at cgroups subsystem for this) and
    fold a res_counter into it

	struct my_group {
		struct res_counter res;

		<other fields>
	};

 b. Put hooks in resource allocation/release paths

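 	int alloc_something(...)
	{
		/* set to the counter that hit its limit if charging fails */
		struct res_counter *fail_res;

		if (res_counter_charge(res_counter_ptr, amount, &fail_res) < 0)
			return -ENOMEM;

		<allocate the resource and return to the caller>
	}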

	void release_something(...)
	{
		res_counter_uncharge(res_counter_ptr, amount);

		<release the resource>
	}

    In order to keep the usage value self-consistent, both the
    "res_counter_ptr" and the "amount" in release_something() should be
    the same as they were in alloc_something() when the resource being
    released was allocated.

 c. Provide a way to read and set the res_counter values (the cgroups
    subsystem can help with this).

 d. Compile and run :)
+8 −1
@@ -5,11 +5,17 @@ Required properties:
	- "ti,da830-rtc"  - for RTC IP used similar to that on DA8xx SoC family.
	- "ti,am3352-rtc" - for RTC IP used similar to that on AM335x SoC family.
			    This RTC IP has special WAKE-EN Register to enable
-			    Wakeup generation for event Alarm.
+			    Wakeup generation for event Alarm. It can also be
+			    used to control an external PMIC via the
+			    pmic_power_en pin.
- reg: Address range of rtc register set
- interrupts: rtc timer, alarm interrupts in order
- interrupt-parent: phandle for the interrupt controller

Optional properties:
- system-power-controller: whether the rtc is controlling the system power
  through pmic_power_en

Example:

rtc@1c23000 {
@@ -18,4 +24,5 @@ rtc@1c23000 {
	interrupts = <19
		      19>;
	interrupt-parent = <&intc>;
	system-power-controller;
};
+1 −0
@@ -115,6 +115,7 @@ nxp NXP Semiconductors
onnn	ON Semiconductor Corp.
opencores	OpenCores.org
panasonic	Panasonic Corporation
pericom	Pericom Technology Inc.
phytec	PHYTEC Messtechnik GmbH
picochip	Picochip Ltd
plathome	Plat'Home Co., Ltd.