Commit 603699bb authored by Mauro Carvalho Chehab, committed by Jonathan Corbet

static-keys.txt: standardize document format



Each text file under Documentation follows a different
format. Some don't even have titles!

Change its representation to follow the adopted standard,
using ReST markups for it to be parseable by Sphinx:
- Mark titles;
- Add a warning mark;
- Mark literals and literal blocks;
- Adjust indentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
parent c6d289d0
+108 −99
===========
Static Keys
Static Keys
			-----------
===========

.. warning::


   DEPRECATED API:
   DEPRECATED API:


   The use of 'struct static_key' directly, is now DEPRECATED. In addition
   The use of 'struct static_key' directly, is now DEPRECATED. In addition
static_key_{true,false}() is also DEPRECATED. IE DO NOT use the following:
   static_key_{true,false}() is also DEPRECATED. IE DO NOT use the following::


	struct static_key false = STATIC_KEY_INIT_FALSE;
	struct static_key false = STATIC_KEY_INIT_FALSE;
	struct static_key true = STATIC_KEY_INIT_TRUE;
	struct static_key true = STATIC_KEY_INIT_TRUE;
	static_key_true()
	static_key_true()
	static_key_false()
	static_key_false()


The updated API replacements are:
   The updated API replacements are::


	DEFINE_STATIC_KEY_TRUE(key);
	DEFINE_STATIC_KEY_TRUE(key);
	DEFINE_STATIC_KEY_FALSE(key);
	DEFINE_STATIC_KEY_FALSE(key);
@@ -20,11 +23,12 @@ DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count);
	static_branch_likely()
	static_branch_likely()
	static_branch_unlikely()
	static_branch_unlikely()


0) Abstract
Abstract
========


Static keys allows the inclusion of seldom used features in
Static keys allows the inclusion of seldom used features in
performance-sensitive fast-path kernel code, via a GCC feature and a code
performance-sensitive fast-path kernel code, via a GCC feature and a code
patching technique. A quick example:
patching technique. A quick example::


	DEFINE_STATIC_KEY_FALSE(key);
	DEFINE_STATIC_KEY_FALSE(key);


@@ -45,7 +49,8 @@ The static_branch_unlikely() branch will be generated into the code with as litt
impact to the likely code path as possible.
impact to the likely code path as possible.




1) Motivation
Motivation
==========




Currently, tracepoints are implemented using a conditional branch. The
Currently, tracepoints are implemented using a conditional branch. The
@@ -60,7 +65,8 @@ possible. Although tracepoints are the original motivation for this work, other
kernel code paths should be able to make use of the static keys facility.
kernel code paths should be able to make use of the static keys facility.




2) Solution
Solution
========




gcc (v4.5) adds a new 'asm goto' statement that allows branching to a label:
gcc (v4.5) adds a new 'asm goto' statement that allows branching to a label:
@@ -71,7 +77,7 @@ Using the 'asm goto', we can create branches that are either taken or not taken
by default, without the need to check memory. Then, at run-time, we can patch
by default, without the need to check memory. Then, at run-time, we can patch
the branch site to change the branch direction.
the branch site to change the branch direction.
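
As a rough userspace illustration of the compile-time half of this idea only, the sketch below uses 'asm goto' with an empty template, so the asm statement itself emits no instructions and control always falls through to the default path without loading anything from memory. The function name toy_static_branch() and the label l_yes are hypothetical. The kernel's real arch_static_branch() additionally records each branch site so that it can be patched at run-time, which this sketch does not attempt. It assumes a compiler that supports 'asm goto' (gcc 4.5 or later).

```c
#include <stdbool.h>

/*
 * Hypothetical userspace sketch, not kernel code.
 *
 * The 'asm goto' statement tells the compiler that control may leave
 * the asm and continue at l_yes, so both paths must be kept. Because
 * the template is empty, nothing is executed here and the function
 * always takes the fall-through path.
 */
static inline bool toy_static_branch(void)
{
	__asm__ goto ("" : : : : l_yes);
	return false;	/* default, fall-through path */
l_yes:
	return true;	/* reached only if the site were rewritten to jump */
}
```

A caller would write 'if (toy_static_branch()) ...' in the same way that static_branch_unlikely(&key) is used. In this sketch the condition is always false, because no patching mechanism exists to redirect the branch site.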


For example, if we have a simple branch that is disabled by default:
For example, if we have a simple branch that is disabled by default::


	if (static_branch_unlikely(&key))
	if (static_branch_unlikely(&key))
		printk("I am the true branch\n");
		printk("I am the true branch\n");
@@ -87,14 +93,15 @@ optimization.
This lowlevel patching mechanism is called 'jump label patching', and it gives
This lowlevel patching mechanism is called 'jump label patching', and it gives
the basis for the static keys facility.
the basis for the static keys facility.


3) Static key label API, usage and examples:
Static key label API, usage and examples
========================================




In order to make use of this optimization you must first define a key:
In order to make use of this optimization you must first define a key::


	DEFINE_STATIC_KEY_TRUE(key);
	DEFINE_STATIC_KEY_TRUE(key);


or:
or::


	DEFINE_STATIC_KEY_FALSE(key);
	DEFINE_STATIC_KEY_FALSE(key);


@@ -102,14 +109,14 @@ or:
The key must be global, that is, it can't be allocated on the stack or dynamically
The key must be global, that is, it can't be allocated on the stack or dynamically
allocated at run-time.
allocated at run-time.


The key is then used in code as:
The key is then used in code as::


        if (static_branch_unlikely(&key))
        if (static_branch_unlikely(&key))
                do unlikely code
                do unlikely code
        else
        else
                do likely code
                do likely code


Or:
Or::


        if (static_branch_likely(&key))
        if (static_branch_likely(&key))
                do likely code
                do likely code
@@ -120,15 +127,15 @@ Keys defined via DEFINE_STATIC_KEY_TRUE(), or DEFINE_STATIC_KEY_FALSE, may
be used in either static_branch_likely() or static_branch_unlikely()
be used in either static_branch_likely() or static_branch_unlikely()
statements.
statements.


Branch(es) can be set true via:
Branch(es) can be set true via::


	static_branch_enable(&key);
	static_branch_enable(&key);


or false via:
or false via::


	static_branch_disable(&key);
	static_branch_disable(&key);


The branch(es) can then be switched via reference counts:
The branch(es) can then be switched via reference counts::


	static_branch_inc(&key);
	static_branch_inc(&key);
	...
	...
@@ -142,11 +149,11 @@ static_branch_inc(), will change the branch back to true. Likewise, if the
key is initialized false, a 'static_branch_inc()', will change the branch to
key is initialized false, a 'static_branch_inc()', will change the branch to
true. And then a 'static_branch_dec()', will again make the branch false.
true. And then a 'static_branch_dec()', will again make the branch false.
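
The reference-count behaviour described above can be modelled in plain userspace C. This is a minimal sketch under stated assumptions, not the kernel implementation: struct toy_key and the toy_*() helpers are hypothetical names, the count is a plain int with no locking or atomics, and no code patching takes place. It only shows how the truth value of the branch follows the count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a key; NOT the kernel's struct static_key. */
struct toy_key {
	int enabled;	/* reference count; the branch is true while > 0 */
};

#define TOY_KEY_INIT_TRUE	{ .enabled = 1 }
#define TOY_KEY_INIT_FALSE	{ .enabled = 0 }

/* Models the value seen by static_branch_likely()/static_branch_unlikely(). */
static inline bool toy_branch(const struct toy_key *key)
{
	return key->enabled > 0;
}

/* Models static_branch_inc(): take a reference. */
static inline void toy_inc(struct toy_key *key)
{
	key->enabled++;
}

/* Models static_branch_dec(): drop a reference; the count must be positive. */
static inline void toy_dec(struct toy_key *key)
{
	assert(key->enabled > 0);
	key->enabled--;
}
```

With a key initialized via TOY_KEY_INIT_FALSE, toy_inc() makes toy_branch() return true and a matching toy_dec() makes it false again. With TOY_KEY_INIT_TRUE, toy_dec() makes it false and a following toy_inc() makes it true again.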


Where an array of keys is required, it can be defined as:
Where an array of keys is required, it can be defined as::


	DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count);
	DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count);


or:
or::


	DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count);
	DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count);


@@ -159,31 +166,33 @@ simply fall back to a traditional, load, test, and jump sequence. Also, the
struct jump_entry table must be at least 4-byte aligned because the
struct jump_entry table must be at least 4-byte aligned because the
static_key->entry field makes use of the two least significant bits.
static_key->entry field makes use of the two least significant bits.


* select HAVE_ARCH_JUMP_LABEL, see: arch/x86/Kconfig
* ``select HAVE_ARCH_JUMP_LABEL``,
    see: arch/x86/Kconfig


* #define JUMP_LABEL_NOP_SIZE, see: arch/x86/include/asm/jump_label.h
* ``#define JUMP_LABEL_NOP_SIZE``,
    see: arch/x86/include/asm/jump_label.h


* __always_inline bool arch_static_branch(struct static_key *key, bool branch), see:
* ``__always_inline bool arch_static_branch(struct static_key *key, bool branch)``,
					arch/x86/include/asm/jump_label.h
    see: arch/x86/include/asm/jump_label.h


* __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch),
* ``__always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)``,
    see: arch/x86/include/asm/jump_label.h
    see: arch/x86/include/asm/jump_label.h


* void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type),
* ``void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type)``,
    see: arch/x86/kernel/jump_label.c
    see: arch/x86/kernel/jump_label.c


* __init_or_module void arch_jump_label_transform_static(struct jump_entry *entry, enum jump_label_type type),
* ``__init_or_module void arch_jump_label_transform_static(struct jump_entry *entry, enum jump_label_type type)``,
    see: arch/x86/kernel/jump_label.c
    see: arch/x86/kernel/jump_label.c



* ``struct jump_entry``,
* struct jump_entry, see: arch/x86/include/asm/jump_label.h
    see: arch/x86/include/asm/jump_label.h




5) Static keys / jump label analysis, results (x86_64):
5) Static keys / jump label analysis, results (x86_64):




As an example, let's add the following branch to 'getppid()', such that the
As an example, let's add the following branch to 'getppid()', such that the
system call now looks like:
system call now looks like::


  SYSCALL_DEFINE0(getppid)
  SYSCALL_DEFINE0(getppid)
  {
  {
@@ -199,7 +208,7 @@ SYSCALL_DEFINE0(getppid)
        return pid;
        return pid;
  }
  }


The resulting instructions with jump labels generated by GCC is:
The resulting instructions with jump labels generated by GCC is::


  ffffffff81044290 <sys_getppid>:
  ffffffff81044290 <sys_getppid>:
  ffffffff81044290:       55                      push   %rbp
  ffffffff81044290:       55                      push   %rbp
@@ -219,7 +228,7 @@ ffffffff810442c7: 31 c0 xor %eax,%eax
  ffffffff810442c9:       e8 71 13 6d 00          callq  ffffffff8171563f <printk>
  ffffffff810442c9:       e8 71 13 6d 00          callq  ffffffff8171563f <printk>
  ffffffff810442ce:       eb c9                   jmp    ffffffff81044299 <sys_getppid+0x9>
  ffffffff810442ce:       eb c9                   jmp    ffffffff81044299 <sys_getppid+0x9>


Without the jump label optimization it looks like:
Without the jump label optimization it looks like::


  ffffffff810441f0 <sys_getppid>:
  ffffffff810441f0 <sys_getppid>:
  ffffffff810441f0:       8b 05 8a 52 d8 00       mov    0xd8528a(%rip),%eax        # ffffffff81dc9480 <key>
  ffffffff810441f0:       8b 05 8a 52 d8 00       mov    0xd8528a(%rip),%eax        # ffffffff81dc9480 <key>
@@ -246,7 +255,7 @@ ffffffff8104423c: 00 00 00 00
Thus, the disable jump label case adds a 'mov', 'test' and 'jne' instruction
Thus, the disable jump label case adds a 'mov', 'test' and 'jne' instruction
vs. the jump label case just has a 'no-op' or 'jmp 0'. (The jmp 0, is patched
vs. the jump label case just has a 'no-op' or 'jmp 0'. (The jmp 0, is patched
to a 5 byte atomic no-op instruction at boot-time.) Thus, the disabled jump
to a 5 byte atomic no-op instruction at boot-time.) Thus, the disabled jump
label case adds:
label case adds::


  6 (mov) + 2 (test) + 2 (jne) = 10 - 5 (5 byte jump 0) = 5 addition bytes.
  6 (mov) + 2 (test) + 2 (jne) = 10 - 5 (5 byte jump 0) = 5 addition bytes.


@@ -262,7 +271,7 @@ Since there are a number of static key API uses in the scheduler paths,
'pipe-test' (also known as 'perf bench sched pipe') can be used to show the
'pipe-test' (also known as 'perf bench sched pipe') can be used to show the
performance improvement. Testing done on 3.3.0-rc2:
performance improvement. Testing done on 3.3.0-rc2:


jump label disabled:
jump label disabled::


 Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs):
 Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs):


@@ -279,7 +288,7 @@ jump label disabled:


       1.601607384 seconds time elapsed                                          ( +-  0.07% )
       1.601607384 seconds time elapsed                                          ( +-  0.07% )


jump label enabled:
jump label enabled::


 Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs):
 Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs):