Skip to content
  1. May 17, 2015
    • Ingo Molnar's avatar
      x86: Pack loops tightly as well · 52648e83
      Ingo Molnar authored
      Packing loops tightly (-falign-loops=1) is beneficial to code size:
      
           text        data    bss     dec              filename
       12566391        1617840 1089536 15273767         vmlinux.align.16-byte
       12224951        1617840 1089536 14932327         vmlinux.align.1-byte
       11976567        1617840 1089536 14683943         vmlinux.align.1-byte.funcs-1-byte
       11903735        1617840 1089536 14611111         vmlinux.align.1-byte.funcs-1-byte.loops-1-byte
      
      Which reduces the size of the kernel by another 0.6%, so the
      the total combined size reduction of the alignment-packing
      patches is ~5.5%.
      
      The x86 decoder bandwidth and caching arguments laid out in:
      
        be6cb027
      
       ("x86: Align jump targets to 1-byte boundaries")
      
      apply to loop alignment as well.
      
      Furtermore, modern CPU uarchs have a loop cache/buffer that
      is a L0 cache before even any uop cache, covering a few
      dozen most recently executed instructions.
      
      This loop cache generally does not have the 16-byte alignment
      restrictions of the uop cache.
      
      Now loop alignment can still be beneficial if:
      
       - a loop is cache-hot and its surroundings are not.
      
       - if the loop is so cache hot that the instruction
         flow becomes x86 decoder bandwidth limited
      
      But loop alignment is harmful if:
      
       - a loop is cache-cold
      
       - a loop's surroundings are cache-hot as well
      
       - two cache-hot loops are close to each other
      
       - if the loop fits into the loop cache
      
       - if the code flow is not decoder bandwidth limited
      
      and I'd argue that the latter five scenarios are much
      more common in the kernel, as our hottest loops are
      typically:
      
       - pointer chasing: this should fit into the loop cache
         in most cases and is typically data cache and address
         generation limited
      
       - generic memory ops (memset, memcpy, etc.): these generally
         fit into the loop cache as well, and are likewise data
         cache limited.
      
      So this patch packs loop addresses tightly as well.
      
      Acked-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Link: http://lkml.kernel.org/r/20150410123017.GB19918@gmail.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      52648e83
  2. May 15, 2015
    • Ingo Molnar's avatar
      x86: Align jump targets to 1-byte boundaries · be6cb027
      Ingo Molnar authored
      
      
      The following NOP in a hot function caught my attention:
      
        >   5a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
      
      That's a dead NOP that bloats the function a bit, added for the
      default 16-byte alignment that GCC applies for jump targets.
      
      I realize that x86 CPU manufacturers recommend 16-byte jump
      target alignments (it's in the Intel optimization manual),
      to help their relatively narrow decoder prefetch alignment
      and uop cache constraints, but the cost of that is very
      significant:
      
              text           data       bss         dec      filename
          12566391        1617840   1089536    15273767      vmlinux.align.16-byte
          12224951        1617840   1089536    14932327      vmlinux.align.1-byte
      
      By using 1-byte jump target alignment (i.e. no alignment at all)
      we get an almost 3% reduction in kernel size (!) - and a
      probably similar reduction in I$ footprint.
      
      Now, the usual justification for jump target alignment is the
      following:
      
       - modern decoders tend to have 16-byte (effective) decoder
         prefetch windows. (AMD documents it higher but measurements
         suggest the effective prefetch window on curretn uarchs is
         still around 16 bytes)
      
       - on Intel there's also the uop-cache with cachelines that have
         16-byte granularity and limited associativity.
      
       - older x86 uarchs had a penalty for decoder fetches that crossed
         16-byte boundaries. These limits are mostly gone from recent
         uarchs.
      
      So if a forward jump target is aligned to cacheline boundary then
      prefetches will start from a new prefetch-cacheline and there's
      higher chance for decoding in fewer steps and packing tightly.
      
      But I think that argument is flawed for typical optimized kernel
      code flows: forward jumps often go to 'cold' (uncommon) pieces
      of code, and  aligning cold code to cache lines does not bring a
      lot of advantages  (they are uncommon), while it causes
      collateral damage:
      
       - their alignment 'spreads out' the cache footprint, it shifts
         followup hot code further out
      
       - plus it slows down even 'cold' code that immediately follows 'hot'
         code (like in the above case), which could have benefited from the
         partial cacheline that comes off the end of hot code.
      
      But even in the cache-hot case the 16 byte alignment brings
      disadvantages:
      
       - it spreads out the cache footprint, possibly making the code
         fall out of the L1 I$.
      
       - On Intel CPUs, recent microarchitectures have plenty of
         uop cache (typically doubling every 3 years) - while the
         size of the L1 cache grows much less aggressively. So
         workloads are rarely uop cache limited.
      
      The only situation where alignment might matter are tight
      loops that could fit into a single 16 byte chunk - but those
      are pretty rare in the kernel: if they exist they tend
      to be pointer chasing or generic memory ops, which both tend
      to be cache miss (or cache allocation) intensive and are not
      decoder bandwidth limited.
      
      So the balance of arguments strongly favors packing kernel
      instructions tightly versus maximizing for decoder bandwidth:
      this patch changes the jump target alignment from 16 bytes
      to 1 byte (tightly packed, unaligned).
      
      Acked-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Link: http://lkml.kernel.org/r/20150410120846.GA17101@gmail.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      be6cb027
  3. May 14, 2015
  4. May 11, 2015
    • Borislav Petkov's avatar
      x86/alternatives: Switch AMD F15h and later to the P6 NOPs · f21262b8
      Borislav Petkov authored
      
      
      Software optimization guides for both F15h and F16h cite those
      NOPs as the optimal ones. A microbenchmark confirms that
      actually even older families are better with the single-insn
      NOPs so switch to them for the alternatives.
      
      Cycles count below includes the loop overhead of the measurement
      but that overhead is the same with all runs.
      
      	F10h, revE:
      	-----------
      	Running NOP tests, 1000 NOPs x 1000000 repetitions
      
      	K8:
      			      90     288.212282 cycles
      			   66 90     288.220840 cycles
      			66 66 90     288.219447 cycles
      		     66 66 66 90     288.223204 cycles
      		  66 66 90 66 90     571.393424 cycles
      	       66 66 90 66 66 90     571.374919 cycles
      	    66 66 66 90 66 66 90     572.249281 cycles
      	 66 66 66 90 66 66 66 90     571.388651 cycles
      
      	P6:
      			      90     288.214193 cycles
      			   66 90     288.225550 cycles
      			0f 1f 00     288.224441 cycles
      		     0f 1f 40 00     288.225030 cycles
      		  0f 1f 44 00 00     288.233558 cycles
      	       66 0f 1f 44 00 00     324.792342 cycles
      	    0f 1f 80 00 00 00 00     325.657462 cycles
      	 0f 1f 84 00 00 00 00 00     430.246643 cycles
      
      	F14h:
      	----
      	Running NOP tests, 1000 NOPs x 1000000 repetitions
      
      	K8:
      			      90     510.404890 cycles
      			   66 90     510.432117 cycles
      			66 66 90     510.561858 cycles
      		     66 66 66 90     510.541865 cycles
      		  66 66 90 66 90    1014.192782 cycles
      	       66 66 90 66 66 90    1014.226546 cycles
      	    66 66 66 90 66 66 90    1014.334299 cycles
      	 66 66 66 90 66 66 66 90    1014.381205 cycles
      
      	P6:
      			      90     510.436710 cycles
      			   66 90     510.448229 cycles
      			0f 1f 00     510.545100 cycles
      		     0f 1f 40 00     510.502792 cycles
      		  0f 1f 44 00 00     510.589517 cycles
      	       66 0f 1f 44 00 00     510.611462 cycles
      	    0f 1f 80 00 00 00 00     511.166794 cycles
      	 0f 1f 84 00 00 00 00 00     511.651641 cycles
      
      	F15h:
      	-----
      	Running NOP tests, 1000 NOPs x 1000000 repetitions
      
      	K8:
      			      90     243.128396 cycles
      			   66 90     243.129883 cycles
      			66 66 90     243.131631 cycles
      		     66 66 66 90     242.499324 cycles
      		  66 66 90 66 90     481.829083 cycles
      	       66 66 90 66 66 90     481.884413 cycles
      	    66 66 66 90 66 66 90     481.851446 cycles
      	 66 66 66 90 66 66 66 90     481.409220 cycles
      
      	P6:
      			      90     243.127026 cycles
      			   66 90     243.130711 cycles
      			0f 1f 00     243.122747 cycles
      		     0f 1f 40 00     242.497617 cycles
      		  0f 1f 44 00 00     245.354461 cycles
      	       66 0f 1f 44 00 00     361.930417 cycles
      	    0f 1f 80 00 00 00 00     362.844944 cycles
      	 0f 1f 84 00 00 00 00 00     480.514948 cycles
      
      	F16h:
      	-----
      	Running NOP tests, 1000 NOPs x 1000000 repetitions
      
      	K8:
      			      90     507.793298 cycles
      			   66 90     507.789636 cycles
      			66 66 90     507.826490 cycles
      		     66 66 66 90     507.859075 cycles
      		  66 66 90 66 90    1008.663129 cycles
      	       66 66 90 66 66 90    1008.696259 cycles
      	    66 66 66 90 66 66 90    1008.692517 cycles
      	 66 66 66 90 66 66 66 90    1008.755399 cycles
      
      	P6:
      			      90     507.795232 cycles
      			   66 90     507.794761 cycles
      			0f 1f 00     507.834901 cycles
      		     0f 1f 40 00     507.822629 cycles
      		  0f 1f 44 00 00     507.838493 cycles
      	       66 0f 1f 44 00 00     507.908597 cycles
      	    0f 1f 80 00 00 00 00     507.946417 cycles
      	 0f 1f 84 00 00 00 00 00     507.954960 cycles
      
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1431332153-18566-2-git-send-email-bp@alien8.de
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f21262b8
  5. May 10, 2015
  6. May 08, 2015
    • Denys Vlasenko's avatar
      x86/entry: Define 'cpu_current_top_of_stack' for 64-bit code · 3a23208e
      Denys Vlasenko authored
      
      
      32-bit code has PER_CPU_VAR(cpu_current_top_of_stack).
      64-bit code uses somewhat more obscure: PER_CPU_VAR(cpu_tss + TSS_sp0).
      
      Define the 'cpu_current_top_of_stack' macro on CONFIG_X86_64
      as well so that the PER_CPU_VAR(cpu_current_top_of_stack)
      expression can be used in both 32-bit and 64-bit code.
      
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1429889495-27850-3-git-send-email-dvlasenk@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3a23208e
    • Denys Vlasenko's avatar
      x86/entry: Remove unused 'kernel_stack' per-cpu variable · fed7c3f0
      Denys Vlasenko authored
      
      
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1429889495-27850-2-git-send-email-dvlasenk@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      fed7c3f0
    • Denys Vlasenko's avatar
      x86/entry: Stop using PER_CPU_VAR(kernel_stack) · 63332a84
      Denys Vlasenko authored
      
      
      PER_CPU_VAR(kernel_stack) is redundant:
      
        - On the 64-bit build, we can use PER_CPU_VAR(cpu_tss + TSS_sp0).
        - On the 32-bit build, we can use PER_CPU_VAR(cpu_current_top_of_stack).
      
      PER_CPU_VAR(kernel_stack) will be deleted by a separate change.
      
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1429889495-27850-1-git-send-email-dvlasenk@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      63332a84
    • Denys Vlasenko's avatar
      x86: Force inlining of atomic ops · 2a4e90b1
      Denys Vlasenko authored
      
      
      With both gcc 4.7.2 and 4.9.2, sometimes gcc mysteriously
      doesn't inline very small functions we expect to be inlined:
      
      $ nm --size-sort vmlinux | grep -iF ' t ' | uniq -c | grep -v '^
      *1 ' | sort -rn     473 000000000000000b t spin_unlock_irqrestore
          449 000000000000005f t rcu_read_unlock
          355 0000000000000009 t atomic_inc                <== THIS
          353 000000000000006e t rcu_read_lock
          350 0000000000000075 t rcu_read_lock_sched_held
          291 000000000000000b t spin_unlock
          266 0000000000000019 t arch_local_irq_restore
          215 000000000000000b t spin_lock
          180 0000000000000011 t kzalloc
          165 0000000000000012 t list_add_tail
          161 0000000000000019 t arch_local_save_flags
          153 0000000000000016 t test_and_set_bit
          134 000000000000000b t spin_unlock_irq
          134 0000000000000009 t atomic_dec                <== THIS
          130 000000000000000b t spin_unlock_bh
          122 0000000000000010 t brelse
          120 0000000000000016 t test_and_clear_bit
          120 000000000000000b t spin_lock_irq
          119 000000000000001e t get_dma_ops
          117 0000000000000053 t cpumask_next
          116 0000000000000036 t kref_get
          114 000000000000001a t schedule_work
          106 000000000000000b t spin_lock_bh
          103 0000000000000019 t arch_local_irq_disable
      ...
      
      Note sizes of marked functions. They are merely 9 bytes long!
      Selecting function with 'atomic' in their names:
      
          355 0000000000000009 t atomic_inc
          134 0000000000000009 t atomic_dec
           98 0000000000000014 t atomic_dec_and_test
           31 000000000000000e t atomic_add_return
           27 000000000000000a t atomic64_inc
           26 000000000000002f t kmap_atomic
           24 0000000000000009 t atomic_add
           12 0000000000000009 t atomic_sub
           10 0000000000000021 t __atomic_add_unless
           10 000000000000000a t atomic64_add
            5 000000000000001f t __atomic_add_unless.constprop.7
            5 000000000000000a t atomic64_dec
            4 000000000000001f t __atomic_add_unless.constprop.18
            4 000000000000001f t __atomic_add_unless.constprop.12
            4 000000000000001f t __atomic_add_unless.constprop.10
            3 000000000000001f t __atomic_add_unless.constprop.13
            3 0000000000000011 t atomic64_add_return
            2 000000000000001f t __atomic_add_unless.constprop.9
            2 000000000000001f t __atomic_add_unless.constprop.8
            2 000000000000001f t __atomic_add_unless.constprop.6
            2 000000000000001f t __atomic_add_unless.constprop.5
            2 000000000000001f t __atomic_add_unless.constprop.3
            2 000000000000001f t __atomic_add_unless.constprop.22
            2 000000000000001f t __atomic_add_unless.constprop.14
            2 000000000000001f t __atomic_add_unless.constprop.11
            2 000000000000001e t atomic_dec_if_positive
            2 0000000000000014 t atomic_inc_and_test
            2 0000000000000011 t atomic_add_return.constprop.4
            2 0000000000000011 t atomic_add_return.constprop.17
            2 0000000000000011 t atomic_add_return.constprop.16
            2 000000000000000d t atomic_inc.constprop.4
            2 000000000000000c t atomic_cmpxchg
      
      This patch fixes this for x86 atomic ops via
      s/inline/__always_inline/. This decreases allyesconfig kernel by
      about 25k:
      
          text     data      bss       dec     hex filename
      82399481 22255416 20627456 125282353 777a831 vmlinux.before
      82375570 22255544 20627456 125258570 7774b4a vmlinux
      
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1431080762-17797-1-git-send-email-dvlasenk@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2a4e90b1
    • Denys Vlasenko's avatar
      x86/asm/entry/64: Clean up usage of TEST insns · 03335e95
      Denys Vlasenko authored
      
      
      By the nature of TEST operation, it is often possible
      to test a narrower part of the operand:
      
          "testl $3, mem"  -> "testb $3, mem"
      
      This results in shorter insns, because TEST insn has no
      sign-entending byte-immediate forms unlike other ALU ops.
      
         text	   data	    bss	    dec	    hex	filename
        11674	      0	      0	  11674	   2d9a	entry_64.o.before
        11658	      0	      0	  11658	   2d8a	entry_64.o
      
      Changes in object code:
      
      -	f7 84 24 88 00 00 00 03 00 00 00 	testl  $0x3,0x88(%rsp)
      +	f6 84 24 88 00 00 00 03	         	testb  $0x3,0x88(%rsp)
      -	f7 44 24 68 03 00 00 00          	testl  $0x3,0x68(%rsp)
      +	f6 44 24 68 03                  	testb  $0x3,0x68(%rsp)
      -	f7 84 24 90 00 00 00 03 00 00 00	testl  $0x3,0x90(%rsp)
      +	f6 84 24 90 00 00 00 03         	testb  $0x3,0x90(%rsp)
      
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1430140912-7960-2-git-send-email-dvlasenk@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      03335e95
    • Denys Vlasenko's avatar
      x86/asm/entry/64: Tidy up JZ insns after TESTs · dde74f2e
      Denys Vlasenko authored
      
      
      After TESTs, use logically correct JZ/JNZ mnemonics instead of
      JE/JNE. This doesn't change code.
      
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1430140912-7960-1-git-send-email-dvlasenk@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      dde74f2e
  7. May 06, 2015
  8. May 05, 2015
  9. May 01, 2015
    • Jiang Liu's avatar
      x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus · 2c62e849
      Jiang Liu authored
      An IO port or MMIO resource assigned to a PCI host bridge may be
      consumed by the host bridge itself or available to its child
      bus/devices. The ACPI specification defines a bit (Producer/Consumer)
      to tell whether the resource is consumed by the host bridge itself,
      but firmware hasn't used that bit consistently, so we can't rely on it.
      
      Before commit 593669c2 ("x86/PCI/ACPI: Use common ACPI resource
      interfaces to simplify implementation"), arch/x86/pci/acpi.c ignored
      all IO port resources defined by acpi_resource_io and
      acpi_resource_fixed_io to filter out IO ports consumed by the host
      bridge itself.
      
      Commit 593669c2 ("x86/PCI/ACPI: Use common ACPI resource interfaces
      to simplify implementation") started accepting all IO port and MMIO
      resources, which caused a regression that IO port resources consumed
      by the host bridge itself became available to its child devices.
      
      Then commit 63f1789e ("x86/PCI/ACPI: Ignore resources consumed by
      host bridge itself") ignored resources consumed by the host bridge
      itself by checking the IORESOURCE_WINDOW flag, which accidently removed
      MMIO resources defined by acpi_resource_memory24, acpi_resource_memory32
      and acpi_resource_fixed_memory32.
      
      On x86 and IA64 platforms, all IO port and MMIO resources are assumed
      to be available to child bus/devices except one special case:
          IO port [0xCF8-0xCFF] is consumed by the host bridge itself
          to access PCI configuration space.
      
      So explicitly filter out PCI CFG IO ports[0xCF8-0xCFF]. This solution
      will also ease the way to consolidate ACPI PCI host bridge common code
      from x86, ia64 and ARM64.
      
      Related ACPI table are archived at:
      https://bugzilla.kernel.org/show_bug.cgi?id=94221
      
      Related discussions at:
      http://patchwork.ozlabs.org/patch/461633/
      https://lkml.org/lkml/2015/3/29/304
      
      Fixes: 63f1789e
      
       (Ignore resources consumed by host bridge itself)
      Reported-by: default avatarBernhard Thaler <bernhard.thaler@wvnet.at>
      Signed-off-by: default avatarJiang Liu <jiang.liu@linux.intel.com>
      Cc: 4.0+ <stable@vger.kernel.org> # 4.0+
      Reviewed-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      2c62e849
  10. Apr 30, 2015
    • Boris Ostrovsky's avatar
      xen: Suspend ticks on all CPUs during suspend · 2b953a5e
      Boris Ostrovsky authored
      Commit 77e32c89
      
       ("clockevents: Manage device's state separately for
      the core") decouples clockevent device's modes from states. With this
      change when a Xen guest tries to resume, it won't be calling its
      set_mode op which needs to be done on each VCPU in order to make the
      hypervisor aware that we are in oneshot mode.
      
      This happens because clockevents_tick_resume() (which is an intermediate
      step of resuming ticks on a processor) doesn't call clockevents_set_state()
      anymore and because during suspend clockevent devices on all VCPUs (except
      for the one doing the suspend) are left in ONESHOT state. As result, during
      resume the clockevents state machine will assume that device is already
      where it should be and doesn't need to be updated.
      
      To avoid this problem we should suspend ticks on all VCPUs during
      suspend.
      
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      2b953a5e
  11. Apr 27, 2015
    • Paolo Bonzini's avatar
      x86: pvclock: Really remove the sched notifier for cross-cpu migrations · 73459e2a
      Paolo Bonzini authored
      This reverts commits 0a4e6be9
      and 80f7fdb1
      
      .
      
      The task migration notifier was originally introduced in order to support
      the pvclock vsyscall with non-synchronized TSC, but KVM only supports it
      with synchronized TSC.  Hence, on KVM the race condition is only needed
      due to a bad implementation on the host side, and even then it's so rare
      that it's mostly theoretical.
      
      As far as KVM is concerned it's possible to fix the host, avoiding the
      additional complexity in the vDSO and the (re)introduction of the task
      migration notifier.
      
      Xen, on the other hand, hasn't yet implemented vsyscall support at
      all, so we do not care about its plans for non-synchronized TSC.
      
      Reported-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Suggested-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      73459e2a
    • Radim Krčmář's avatar
      kvm: x86: fix kvmclock update protocol · 5dca0d91
      Radim Krčmář authored
      
      
      The kvmclock spec says that the host will increment a version field to
      an odd number, then update stuff, then increment it to an even number.
      The host is buggy and doesn't do this, and the result is observable
      when one vcpu reads another vcpu's kvmclock data.
      
      There's no good way for a guest kernel to keep its vdso from reading
      a different vcpu's kvmclock data, but we don't need to care about
      changing VCPUs as long as we read a consistent data from kvmclock.
      (VCPU can change outside of this loop too, so it doesn't matter if we
      return a value not fit for this VCPU.)
      
      Based on a patch by Radim Krčmář.
      
      Reviewed-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Acked-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      5dca0d91
    • Andy Lutomirski's avatar
      x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue · 61f01dd9
      Andy Lutomirski authored
      AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with
      SS == 0 results in an invalid usermode state in which SS is apparently
      equal to __USER_DS but causes #SS if used.
      
      Work around the issue by setting SS to __KERNEL_DS __switch_to, thus
      ensuring that SYSRET never happens with SS set to NULL.
      
      This was exposed by a recent vDSO cleanup.
      
      Fixes: e7d6eefa
      
       x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      61f01dd9
  12. Apr 24, 2015
    • Linus Torvalds's avatar
      x86: fix special __probe_kernel_write() tail zeroing case · d869844b
      Linus Torvalds authored
      Commit cae2a173
      
       ("x86: clean up/fix 'copy_in_user()' tail zeroing")
      fixed the failure case tail zeroing of one special case of the x86-64
      generic user-copy routine, namely when used for the user-to-user case
      ("copy_in_user()").
      
      But in the process it broke an even more unusual case: using the user
      copy routine for kernel-to-kernel copying.
      
      Now, normally kernel-kernel copies are obviously done using memcpy(),
      but we have a couple of special cases when we use the user-copy
      functions.  One is when we pass a kernel buffer to a regular user-buffer
      routine, using set_fs(KERNEL_DS).  That's a "normal" case, and continued
      to work fine, because it never takes any faults (with the possible
      exception of a silent and successful vmalloc fault).
      
      But Jan Beulich pointed out another, very unusual, special case: when we
      use the user-copy routines not because it's a path that expects a user
      pointer, but for a couple of ftrace/kgdb cases that want to do a kernel
      copy, but do so using "unsafe" buffers, and use the user-copy routine to
      gracefully handle faults.  IOW, for probe_kernel_write().
      
      And that broke for the case of a faulting kernel destination, because we
      saw the kernel destination and wanted to try to clear the tail of the
      buffer.  Which doesn't work, since that's what faults.
      
      This only triggers for things like kgdb and ftrace users (eg trying
      setting a breakpoint on read-only memory), but it's definitely a bug.
      The fix is to not compare against the kernel address start (TASK_SIZE),
      but instead use the same limits "access_ok()" uses.
      
      Reported-and-tested-by: default avatarJan Beulich <jbeulich@suse.com>
      Cc: stable@vger.kernel.org # 4.0
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d869844b
    • Ard Biesheuvel's avatar
      crypto: x86/sha512_ssse3 - fixup for asm function prototype change · 00425bb1
      Ard Biesheuvel authored
      Patch e68410eb ("crypto: x86/sha512_ssse3 - move SHA-384/512
      SSSE3 implementation to base layer") changed the prototypes of the
      core asm SHA-512 implementations so that they are compatible with
      the prototype used by the base layer.
      
      However, in one instance, the register that was used for passing the
      input buffer was reused as a scratch register later on in the code,
      and since the input buffer param changed places with the digest param
      -which needs to be written back before the function returns- this
      resulted in the scratch register to be dereferenced in a memory write
      operation, causing a GPF.
      
      Fix this by changing the scratch register to use the same register as
      the input buffer param again.
      
      Fixes: e68410eb
      
       ("crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer")
      Reported-By: default avatarBobby Powers <bobbypowers@gmail.com>
      Tested-By: default avatarBobby Powers <bobbypowers@gmail.com>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      00425bb1
  13. Apr 22, 2015
    • Denys Vlasenko's avatar
      x86/asm/entry/32: Update -ENOSYS handling to match the 64-bit logic · 3f5159a9
      Denys Vlasenko authored
      
      
      Recently Andy changed the 64-bit syscall logic so that
      pt_regs->ax is initially set to -ENOSYS, and on syscall exit,
      it is updated with the actual return value. This simplified
      the logic there.
      
      This patch does the same for 32-bit syscall entry points.
      
      The check for %rax being too big is moved to be just before
      the call instruction which dispatches execution through the
      syscall table.
      
      There is no way to accidentally skip this check now by jumping
      to a label after it. This allows us to remove redundant checks
      after ptrace et al.
      
      If %rax is too big, we just skip over the (call, write %rax to
      pt_regs->ax) instruction pair. pt_regs->ax remains set to -ENOSYS,
      and it gets returned to userspace.
      
      Similar to 64-bit code, this eliminates the "ia32_badsys" code path.
      
      Run-tested.
      
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1429632194-13445-2-git-send-email-dvlasenk@redhat.com
      
      
      [ Changelog massage. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3f5159a9
    • Denys Vlasenko's avatar
      x86/asm/entry/64: Merge 32-bit execve stubs with x32 ones, as they are identical · ac7f5dfb
      Denys Vlasenko authored
      
      
      Run-tested.
      
      Suggested-by: default avatarBrian Gerst <brgerst@gmail.com>
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1429632194-13445-1-git-send-email-dvlasenk@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ac7f5dfb
    • Denys Vlasenko's avatar
      x86/asm/entry/64: Implement better check for canonical addresses · 17be0aec
      Denys Vlasenko authored
      
      
      This change makes the check exact (no more false positives
      on "negative" addresses).
      
      Andy explains:
      
       "Canonical addresses either start with 17 zeros or 17 ones.
      
        In the old code, we checked that the top (64-47) = 17 bits were all
        zero.  We did this by shifting right by 47 bits and making sure that
        nothing was left.
      
        In the new code, we're shifting left by (64 - 48) = 16 bits and then
        signed shifting right by the same amount, this propagating the 17th
        highest bit to all positions to its left.  If we get the same value we
        started with, then we're good to go."
      
      While it isn't really important to be fully correct here -
      almost all addresses we'll ever see will be userspace ones,
      but OTOH it looks to be cheap enough: the new code uses
      two more ALU ops but preserves %rcx, allowing to not
      reload it from pt_regs->cx again.
      
      On disassembly level, the changes are:
      
        cmp %rcx,0x80(%rsp) -> mov 0x80(%rsp),%r11; cmp %rcx,%r11
        shr $0x2f,%rcx      -> shl $0x10,%rcx; sar $0x10,%rcx; cmp %rcx,%r11
        mov 0x58(%rsp),%rcx -> (eliminated)
      
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1429633649-20169-1-git-send-email-dvlasenk@redhat.com
      
      
      [ Changelog massage. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      17be0aec
    • Sonny Rao's avatar
      perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver · 0140e614
      Sonny Rao authored
      
      
      This keeps all the related PCI IDs together in the driver where
      they are used.
      
      Signed-off-by: default avatarSonny Rao <sonnyrao@chromium.org>
      Acked-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1429644791-25724-1-git-send-email-sonnyrao@chromium.org
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0140e614
    • Sonny Rao's avatar
      perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile... · 80bcffb3
      Sonny Rao authored
      
      perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs
      
      This uncore is the same as the Haswell desktop part but uses a
      different PCI ID.
      
      Signed-off-by: default avatarSonny Rao <sonnyrao@chromium.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1429569247-16697-1-git-send-email-sonnyrao@chromium.org
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      80bcffb3
    • Jiri Olsa's avatar
      perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu · 3b6e0421
      Jiri Olsa authored
      
      
      The core_pmu does not define cpu_* callbacks, which handles
      allocation of 'struct cpu_hw_events::shared_regs' data,
      initialization of debug store and PMU_FL_EXCL_CNTRS counters.
      
      While this probably won't happen on bare metal, virtual CPU can
      define x86_pmu.extra_regs together with PMU version 1 and thus
      be using core_pmu -> using shared_regs data without it being
      allocated. That could could leave to following panic:
      
      	BUG: unable to handle kernel NULL pointer dereference at (null)
      	IP: [<ffffffff8152cd4f>] _spin_lock_irqsave+0x1f/0x40
      
      	SNIP
      
      	 [<ffffffff81024bd9>] __intel_shared_reg_get_constraints+0x69/0x1e0
      	 [<ffffffff81024deb>] intel_get_event_constraints+0x9b/0x180
      	 [<ffffffff8101e815>] x86_schedule_events+0x75/0x1d0
      	 [<ffffffff810586dc>] ? check_preempt_curr+0x7c/0x90
      	 [<ffffffff810649fe>] ? try_to_wake_up+0x24e/0x3e0
      	 [<ffffffff81064ba2>] ? default_wake_function+0x12/0x20
      	 [<ffffffff8109eb16>] ? autoremove_wake_function+0x16/0x40
      	 [<ffffffff810577e9>] ? __wake_up_common+0x59/0x90
      	 [<ffffffff811a9517>] ? __d_lookup+0xa7/0x150
      	 [<ffffffff8119db5f>] ? do_lookup+0x9f/0x230
      	 [<ffffffff811a993a>] ? dput+0x9a/0x150
      	 [<ffffffff8119c8f5>] ? path_to_nameidata+0x25/0x60
      	 [<ffffffff8119e90a>] ? __link_path_walk+0x7da/0x1000
      	 [<ffffffff8101d8f9>] ? x86_pmu_add+0xb9/0x170
      	 [<ffffffff8101d7a7>] x86_pmu_commit_txn+0x67/0xc0
      	 [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110
      	 [<ffffffff8119c731>] ? path_put+0x31/0x40
      	 [<ffffffff8107c297>] ? current_fs_time+0x27/0x30
      	 [<ffffffff8117d170>] ? mem_cgroup_get_reclaim_stat_from_page+0x20/0x70
      	 [<ffffffff8111b7aa>] group_sched_in+0x13a/0x170
      	 [<ffffffff81014a29>] ? sched_clock+0x9/0x10
      	 [<ffffffff8111bac8>] ctx_sched_in+0x2e8/0x330
      	 [<ffffffff8111bb7b>] perf_event_sched_in+0x6b/0xb0
      	 [<ffffffff8111bc36>] perf_event_context_sched_in+0x76/0xc0
      	 [<ffffffff8111eb3b>] perf_event_comm+0x1bb/0x2e0
      	 [<ffffffff81195ee9>] set_task_comm+0x69/0x80
      	 [<ffffffff81195fe1>] setup_new_exec+0xe1/0x2e0
      	 [<ffffffff811ea68e>] load_elf_binary+0x3ce/0x1ab0
      
      Adding cpu_(prepare|starting|dying) for core_pmu to have
      shared_regs data allocated for core_pmu. AFAICS there's no harm
      to initialize debug store and PMU_FL_EXCL_CNTRS either for
      core_pmu.
      
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20150421152623.GC13169@krava.redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3b6e0421
    • Hagen Paul Pfeifer's avatar
      x86/asm: Always inline atomics · 3462bd2a
      Hagen Paul Pfeifer authored
      
      
      During some code analysis I realized that atomic_add(), atomic_sub()
      and friends are not necessarily inlined AND that each function
      is defined multiple times:
      
      	atomic_inc:          544 duplicates
      	atomic_dec:          215 duplicates
      	atomic_dec_and_test: 107 duplicates
      	atomic64_inc:         38 duplicates
      	[...]
      
      Each definition is exact equally, e.g.:
      
      	ffffffff813171b8 <atomic_add>:
      	55         push   %rbp
      	48 89 e5   mov    %rsp,%rbp
      	f0 01 3e   lock add %edi,(%rsi)
      	5d         pop    %rbp
      	c3         retq
      
      In turn each definition has one or more callsites (sure):
      
      	ffffffff81317c78: e8 3b f5 ff ff  callq  ffffffff813171b8 <atomic_add> [...]
      	ffffffff8131a062: e8 51 d1 ff ff  callq  ffffffff813171b8 <atomic_add> [...]
      	ffffffff8131a190: e8 23 d0 ff ff  callq  ffffffff813171b8 <atomic_add> [...]
      
      The other way around would be to remove the static linkage - but
      I prefer an enforced inlining here.
      
      	Before:
      	  text     data	  bss      dec       hex     filename
      	  81467393 19874720 20168704 121510817 73e1ba1 vmlinux.orig
      
      	After:
      	  text     data     bss      dec       hex     filename
      	  81461323 19874720 20168704 121504747 73e03eb vmlinux.inlined
      
      Yes, the inlining here makes the kernel even smaller! ;)
      
      Linus further observed:
      
      	"I have this memory of having seen that before - the size
      	 heuristics for gcc getting confused by inlining.
      	 [...]
      
      	 It might be a good idea to mark things that are basically just
      	 wrappers around a single (or a couple of) asm instruction to be
      	 always_inline."
      
      Signed-off-by: default avatarHagen Paul Pfeifer <hagen@jauu.net>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1429565231-4609-1-git-send-email-hagen@jauu.net
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3462bd2a
    • Andy Lutomirski's avatar
      x86, paravirt, xen: Remove the 64-bit ->irq_enable_sysexit() pvop · aac82d31
      Andy Lutomirski authored
      
      
      We don't use irq_enable_sysexit on 64-bit kernels any more.
      Remove all the paravirt and Xen machinery to support it on
      64-bit kernels.
      
      Tested-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/8a03355698fe5b94194e9e7360f19f91c1b2cf1f.1428100853.git.luto@kernel.org
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      aac82d31
    • Ben Serebrin's avatar
      KVM: VMX: Preserve host CR4.MCE value while in guest mode. · 085e68ee
      Ben Serebrin authored
      
      
      The host's decision to enable machine check exceptions should remain
      in force during non-root mode.  KVM was writing 0 to cr4 on VCPU reset
      and passed a slightly-modified 0 to the vmcs.guest_cr4 value.
      
      Tested: Built.
      On earlier version, tested by injecting machine check
      while a guest is spinning.
      
      Before the change, if guest CR4.MCE==0, then the machine check is
      escalated to Catastrophic Error (CATERR) and the machine dies.
      If guest CR4.MCE==1, then the machine check causes VMEXIT and is
      handled normally by host Linux. After the change, injecting a machine
      check causes normal Linux machine check handling.
      
      Signed-off-by: default avatarBen Serebrin <serebrin@google.com>
      Reviewed-by: default avatarVenkatesh Srinivas <venkateshs@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      085e68ee
  14. Apr 20, 2015