Skip to content
  1. Oct 13, 2015
  2. Oct 04, 2015
  3. Oct 03, 2015
    • Matt Bennett's avatar
      MIPS: Octeon: Fix kernel panic on startup from memory corruption · 66803dd9
      Matt Bennett authored
      
      
      During development it was found that a number of builds would panic
      during the kernel init process, more specifically in 'delayed_fput()'.
      The panic showed the kernel trying to access a memory address of
      '0xb7fdc00' while traversing the 'delayed_fput_list' structure.
      Comparing this memory address to the value of the pointer used on
      builds that did not panic confirmed that the pointer on crashing
      builds must have been corrupted at some stage earlier in the init
      process.
      
      By traversing the list earlier and earlier in the code it was found
      that 'plat_mem_setup()' was responsible for corrupting the list.
      Specifically the line:
      
          memory = cvmx_bootmem_phy_alloc(mem_alloc_size,
      			__pa_symbol(&__init_end), -1,
      			0x100000,
      			CVMX_BOOTMEM_FLAG_NO_LOCKING);
      
      Which would eventually call:
      
          cvmx_bootmem_phy_set_size(new_ent_addr,
      		cvmx_bootmem_phy_get_size
      		(ent_addr) -
      		(desired_min_addr -
      			ent_addr));
      
      Where 'new_ent_addr'=0x4800000 (the address of 'delayed_fput_list')
      and the second argument (size)=0xb7fdc00 (the address causing the
      kernel panic). The job of this part of 'plat_mem_setup()' is to
      allocate chunks of memory for the kernel to use. At the start of
      each chunk of memory the size of the chunk is written, hence the
      value 0xb7fdc00 is written onto memory at 0x4800000, therefore the
      kernel panics when it goes back to access 'delayed_fput_list' later
      on in the initialisation process.
      
      On builds that were not crashing it was found that the compiler had
      placed 'delayed_fput_list' at 0x4800008, meaning it wasn't corrupted
      (but something else in memory was overwritten).
      
      As can be seen in the first function call above the code begins to
      allocate chunks of memory beginning from the symbol '__init_end'.
      The MIPS linker script (vmlinux.lds.S) however defines the .bss
      section to begin after '__init_end'. Therefore memory within the
      .bss section is allocated to the kernel to use (System.map shows
      'delayed_fput_list' and other kernel structures to be in .bss).
      
      To stop the kernel panic (and the .bss section being corrupted)
      memory should begin being allocated from the symbol '_end'.
      
      Signed-off-by: default avatarMatt Bennett <matt.bennett@alliedtelesis.co.nz>
      Acked-by: default avatarDavid Daney <david.daney@cavium.com>
      Cc: linux-mips@linux-mips.org
      Cc: aleksey.makarov@auriga.com
      Patchwork: https://patchwork.linux-mips.org/patch/11251/
      
      
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      66803dd9
    • Paul Burton's avatar
      MIPS: Fix R2300 FP context switch handling · 085c2f25
      Paul Burton authored
      Commit 1a3d5957 ("MIPS: Tidy up FPU context switching") removed FP
      context saving from the asm-written resume function in favour of reusing
      existing code to perform the same task. However it only removed the FP
      context saving code from the r4k_switch.S implementation of resume.
      Remove it from the r2300_switch.S implementation too in order to prevent
      attempting to save the FP context twice, which would likely lead to an
      exception from the second save because the FPU had already been disabled
      by the first save.
      
      This patch has only been build tested, using rbtx49xx_defconfig.
      
      Fixes: 1a3d5957
      
       ("MIPS: Tidy up FPU context switching")
      Signed-off-by: default avatarPaul Burton <paul.burton@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: Maciej W. Rozycki <macro@linux-mips.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: Manuel Lauss <manuel.lauss@gmail.com>
      Patchwork: https://patchwork.linux-mips.org/patch/11167/
      
      
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      085c2f25
    • Paul Burton's avatar
      MIPS: Fix octeon FP context switch handling · 0fa24340
      Paul Burton authored
      Commit 1a3d5957 ("MIPS: Tidy up FPU context switching") removed FP
      context saving from the asm-written resume function in favour of reusing
      existing code to perform the same task. However it only removed the FP
      context saving code from the r4k_switch.S implementation of resume.
      Octeon uses its own implementation in octeon_switch.S, so remove FP
      context saving there too in order to prevent attempting to save context
      twice. That formerly led to an exception from the second save as follows
      because the FPU had already been disabled by the first save:
      
          do_cpu invoked from kernel context![#1]:
          CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.3.0-rc2-dirty #2
          task: 800000041f84a008 ti: 800000041f864000 task.ti: 800000041f864000
          $ 0   : 0000000000000000 0000000010008ce1 0000000000100000 ffffffffbfffffff
          $ 4   : 800000041f84a008 800000041f84ac08 800000041f84c000 0000000000000004
          $ 8   : 0000000000000001 0000000000000000 0000000000000000 0000000000000001
          $12   : 0000000010008ce3 0000000000119c60 0000000000000036 800000041f864000
          $16   : 800000041f84ac08 800000000792ce80 800000041f84a008 ffffffff81758b00
          $20   : 0000000000000000 ffffffff8175ae50 0000000000000000 ffffffff8176c740
          $24   : 0000000000000006 ffffffff81170300
          $28   : 800000041f864000 800000041f867d90 0000000000000000 ffffffff815f3fa0
          Hi    : 0000000000fa8257
          Lo    : ffffffffe15cfc00
          epc   : ffffffff8112821c resume+0x9c/0x200
          ra    : ffffffff815f3fa0 __schedule+0x3f0/0x7d8
          Status: 10008ce2        KX SX UX KERNEL EXL
          Cause : 1080002c (ExcCode 0b)
          PrId  : 000d0601 (Cavium Octeon+)
          Modules linked in:
          Process kthreadd (pid: 2, threadinfo=800000041f864000, task=800000041f84a008, tls=0000000000000000)
          Stack : ffffffff81604218 ffffffff815f7e08 800000041f84a008 ffffffff811681b0
                    800000041f84a008 ffffffff817e9878 0000000000000000 ffffffff81770000
                    ffffffff81768340 ffffffff81161398 0000000000000001 0000000000000000
                    0000000000000000 ffffffff815f4424 0000000000000000 ffffffff81161d68
                    ffffffff81161be8 0000000000000000 0000000000000000 0000000000000000
                    0000000000000000 0000000000000000 0000000000000000 ffffffff8111e16c
                    0000000000000000 0000000000000000 0000000000000000 0000000000000000
                    0000000000000000 0000000000000000 0000000000000000 0000000000000000
                    0000000000000000 0000000000000000 0000000000000000 0000000000000000
                    0000000000000000 0000000000000000 0000000000000000 0000000000000000
                    ...
          Call Trace:
          [<ffffffff8112821c>] resume+0x9c/0x200
          [<ffffffff815f3fa0>] __schedule+0x3f0/0x7d8
          [<ffffffff815f4424>] schedule+0x34/0x98
          [<ffffffff81161d68>] kthreadd+0x180/0x198
          [<ffffffff8111e16c>] ret_from_kernel_thread+0x14/0x1c
      
      Tested using cavium_octeon_defconfig on an EdgeRouter Lite.
      
      Fixes: 1a3d5957
      
       ("MIPS: Tidy up FPU context switching")
      Reported-by: default avatarAaro Koskinen <aaro.koskinen@nokia.com>
      Signed-off-by: default avatarPaul Burton <paul.burton@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: Aleksey Makarov <aleksey.makarov@auriga.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: Chandrakala Chavva <cchavva@caviumnetworks.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Leonid Rosenboim <lrosenboim@caviumnetworks.com>
      Patchwork: https://patchwork.linux-mips.org/patch/11166/
      
      
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      0fa24340
  4. Oct 02, 2015
    • Li Bin's avatar
      arm64: ftrace: fix function_graph tracer panic · ee556d00
      Li Bin authored
      
      
      When function graph tracer is enabled, the following operation
      will trigger panic:
      
      mount -t debugfs nodev /sys/kernel
      echo next_tgid > /sys/kernel/tracing/set_ftrace_filter
      echo function_graph > /sys/kernel/tracing/current_tracer
      ls /proc/
      
      ------------[ cut here ]------------
      [  198.501417] Unable to handle kernel paging request at virtual address cb88537fdc8ba316
      [  198.506126] pgd = ffffffc008f79000
      [  198.509363] [cb88537fdc8ba316] *pgd=00000000488c6003, *pud=00000000488c6003, *pmd=0000000000000000
      [  198.517726] Internal error: Oops: 94000005 [#1] SMP
      [  198.518798] Modules linked in:
      [  198.520582] CPU: 1 PID: 1388 Comm: ls Tainted: G
      [  198.521800] Hardware name: linux,dummy-virt (DT)
      [  198.522852] task: ffffffc0fa9e8000 ti: ffffffc0f9ab0000 task.ti: ffffffc0f9ab0000
      [  198.524306] PC is at next_tgid+0x30/0x100
      [  198.525205] LR is at return_to_handler+0x0/0x20
      [  198.526090] pc : [<ffffffc0002a1070>] lr : [<ffffffc0000907c0>] pstate: 60000145
      [  198.527392] sp : ffffffc0f9ab3d40
      [  198.528084] x29: ffffffc0f9ab3d40 x28: ffffffc0f9ab0000
      [  198.529406] x27: ffffffc000d6a000 x26: ffffffc000b786e8
      [  198.530659] x25: ffffffc0002a1900 x24: ffffffc0faf16c00
      [  198.531942] x23: ffffffc0f9ab3ea0 x22: 0000000000000002
      [  198.533202] x21: ffffffc000d85050 x20: 0000000000000002
      [  198.534446] x19: 0000000000000002 x18: 0000000000000000
      [  198.535719] x17: 000000000049fa08 x16: ffffffc000242efc
      [  198.537030] x15: 0000007fa472b54c x14: ffffffffff000000
      [  198.538347] x13: ffffffc0fada84a0 x12: 0000000000000001
      [  198.539634] x11: ffffffc0f9ab3d70 x10: ffffffc0f9ab3d70
      [  198.540915] x9 : ffffffc0000907c0 x8 : ffffffc0f9ab3d40
      [  198.542215] x7 : 0000002e330f08f0 x6 : 0000000000000015
      [  198.543508] x5 : 0000000000000f08 x4 : ffffffc0f9835ec0
      [  198.544792] x3 : cb88537fdc8ba316 x2 : cb88537fdc8ba306
      [  198.546108] x1 : 0000000000000002 x0 : ffffffc000d85050
      [  198.547432]
      [  198.547920] Process ls (pid: 1388, stack limit = 0xffffffc0f9ab0020)
      [  198.549170] Stack: (0xffffffc0f9ab3d40 to 0xffffffc0f9ab4000)
      [  198.582568] Call trace:
      [  198.583313] [<ffffffc0002a1070>] next_tgid+0x30/0x100
      [  198.584359] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
      [  198.585503] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
      [  198.586574] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
      [  198.587660] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
      [  198.588896] Code: aa0003f5 2a0103f4 b4000102 91004043 (885f7c60)
      [  198.591092] ---[ end trace 6a346f8f20949ac8 ]---
      
      This is because when using function graph tracer, if the traced
      function return value is in multi regs ([x0-x7]), return_to_handler
      may corrupt them. So in return_to_handler, the parameter regs should
      be protected properly.
      
      Cc: <stable@vger.kernel.org> # 3.18+
      Signed-off-by: default avatarLi Bin <huawei.libin@huawei.com>
      Acked-by: default avatarAKASHI Takahiro <takahiro.akashi@linaro.org>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      ee556d00
    • Ralf Baechle's avatar
      MIPS: BPF: Fix load delay slots. · 0c5d1878
      Ralf Baechle authored
      
      
      The entire bpf_jit_asm.S is written in noreorder mode because "we know
      better" according to a comment.  This also prevented the assembler from
      throwing in the required NOPs for MIPS I processors which have no
      load-use interlock, thus the load's consumer might end up using the
      old value of the register from prior to the load.
      
      Fixed by putting the assembler in reorder mode for just the affected
      load instructions.  This is not enough for gas to actually try to be
      clever by looking at the next instruction and inserting a nop only
      when needed but as the comment said "we know better", so getting gas
      to unconditionally emit a NOP is just right in this case and prevents
      adding further ifdefery.
      
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      0c5d1878
    • Ben Hutchings's avatar
      x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds · f4b4aae1
      Ben Hutchings authored
      
      
      On x32, gcc predefines __x86_64__ but long is only 32-bit.  Use
      __ILP32__ to distinguish x32.
      
      Fixes this compiler error in perf:
      
      	tools/include/asm-generic/bitops/__ffs.h: In function '__ffs':
      	tools/include/asm-generic/bitops/__ffs.h:19:8: error: right shift count >= width of type [-Werror=shift-count-overflow]
      	  word >>= 32;
      	       ^
      
      This isn't sufficient to build perf for x32, though.
      
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1443660043.2730.15.camel@decadent.org.uk
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f4b4aae1
    • Stephen Smalley's avatar
      x86/mm: Set NX on gap between __ex_table and rodata · ab76f7b4
      Stephen Smalley authored
      
      
      Unused space between the end of __ex_table and the start of
      rodata can be left W+x in the kernel page tables.  Extend the
      setting of the NX bit to cover this gap by starting from
      text_end rather than rodata_start.
      
        Before:
        ---[ High Kernel Mapping ]---
        0xffffffff80000000-0xffffffff81000000          16M                               pmd
        0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
        0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
        0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB x  pte
        0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
        0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
        0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
        0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
        0xffffffff82200000-0xffffffffa0000000         478M                               pmd
      
        After:
        ---[ High Kernel Mapping ]---
        0xffffffff80000000-0xffffffff81000000          16M                               pmd
        0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
        0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
        0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB NX pte
        0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
        0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
        0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
        0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
        0xffffffff82200000-0xffffffffa0000000         478M                               pmd
      
      Signed-off-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: <stable@vger.kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.gov
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ab76f7b4
    • Lee, Chun-Yi's avatar
      x86/kexec: Fix kexec crash in syscall kexec_file_load() · e3c41e37
      Lee, Chun-Yi authored
      
      
      The original bug is a page fault crash that sometimes happens
      on big machines when preparing ELF headers:
      
          BUG: unable to handle kernel paging request at ffffc90613fc9000
          IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260
      
      The bug is caused by us under-counting the number of memory ranges
      and subsequently not allocating enough ELF header space for them.
      The bug is typically masked on smaller systems, because the ELF header
      allocation is rounded up to the next page.
      
      This patch modifies the code in fill_up_crash_elf_data() by using
      walk_system_ram_res() instead of walk_system_ram_range() to correctly
      count the max number of crash memory ranges. That's because the
      walk_system_ram_range() filters out small memory regions that
      reside in the same page, but walk_system_ram_res() does not.
      
      Here's how I found the bug:
      
      After tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback(),
      the code uses walk_system_ram_res() to fill-in crash memory regions information
      to the program header, so it counts those small memory regions that
      reside in a page area.
      
      But, when the kernel was using walk_system_ram_range() in
      fill_up_crash_elf_data() to count the number of crash memory regions,
      it filters out small regions.
      
      I printed those small memory regions, for example:
      
        kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0
      
      Based on the code in walk_system_ram_range(), this memory region
      will be filtered out:
      
        pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
        end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
        end_pfn - pfn = 0x77593 - 0x77593 = 0  <=== if (end_pfn > pfn) is FALSE
      
      So, the max_nr_ranges that's counted by the kernel doesn't include
      small memory regions - causing us to under-allocate the required space.
      That causes the page fault crash that happens in a later code path
      when preparing ELF headers.
      
      This bug is not easy to reproduce on small machines that have few
      CPUs, because the allocated page aligned ELF buffer has more free
      space to cover those small memory regions' PT_LOAD headers.
      
      Signed-off-by: default avatarLee, Chun-Yi <jlee@suse.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: kexec@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e3c41e37
    • Andrey Ryabinin's avatar
      arch/x86/include/asm/efi.h: fix build failure · a523841e
      Andrey Ryabinin authored
      With KMEMCHECK=y, KASAN=n:
      
        arch/x86/platform/efi/efi.c:673:3: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration]
        arch/x86/platform/efi/efi_64.c:139:2: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration]
        arch/x86/include/asm/desc.h:121:2: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration]
      
      Don't #undef memcpy if KASAN=n.
      
      Fixes: 769a8089
      
       ("x86, efi, kasan: #undef memset/memcpy/memmove per arch")
      Signed-off-by: default avatarAndrey Ryabinin <ryabinin.a.a@gmail.com>
      Reported-by: default avatarIngo Molnar <mingo@kernel.org>
      Reported-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a523841e
    • Steve Capper's avatar
      arm64: Fix THP protection change logic · 1a541b4e
      Steve Capper authored
      6910fa16 ("arm64: enable PTE type bit in the mask for pte_modify") fixes
      a problem whereby a large block of PROT_NONE mapped memory is
      incorrectly mapped as block descriptors when mprotect is called.
      
      Unfortunately, a subtle bug was introduced by this fix to the THP logic.
      
      If one mmaps a large block of memory, then faults it such that it is
      collapsed into THPs; resulting calls to mprotect on this area of memory
      will lead to incorrect table descriptors being written instead of block
      descriptors. This is because pmd_modify calls pte_modify which is now
      allowed to modify the type of the page table entry.
      
      This patch reverts commit 6910fa16, and
      fixes the problem it was trying to address by adjusting PAGE_NONE to
      represent a table entry. Thus no change in pte type is required when
      moving from PROT_NONE to a different protection.
      
      Fixes: 6910fa16
      
       ("arm64: enable PTE type bit in the mask for pte_modify")
      Cc: <stable@vger.kernel.org> # 4.0+
      Cc: Feng Kan <fkan@apm.com>
      Reported-by: default avatarGanapatrao Kulkarni <Ganapatrao.Kulkarni@caviumnetworks.com>
      Tested-by: default avatarGanapatrao Kulkarni <gkulkarni@caviumnetworks.com>
      Reviewed-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarSteve Capper <steve.capper@linaro.org>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      1a541b4e
  5. Oct 01, 2015
    • Ralf Baechle's avatar
      MIPS: BPF: Do all exports of symbols with FEXPORT(). · 1e16a8f1
      Ralf Baechle authored
      
      
      FEXPORT also marks the symbol as code using .type symbol, @function.
      Without objdump -d will output only a hexdump for code following the
      affected symbols.
      
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      1e16a8f1
    • Dirk Müller's avatar
      Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS · d2922422
      Dirk Müller authored
      
      
      The cpu feature flags are not ever going to change, so warning
      everytime can cause a lot of kernel log spam
      (in our case more than 10GB/hour).
      
      The warning seems to only occur when nested virtualization is
      enabled, so it's probably triggered by a KVM bug.  This is a
      sensible and safe change anyway, and the KVM bug fix might not
      be suitable for stable releases anyway.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDirk Mueller <dmueller@suse.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      d2922422
    • Paolo Bonzini's avatar
      Revert "KVM: SVM: use NPT page attributes" · fc07e76a
      Paolo Bonzini authored
      This reverts commit 3c2e7f7d
      
      .
      Initializing the mapping from MTRR to PAT values was reported to
      fail nondeterministically, and it also caused extremely slow boot
      (due to caching getting disabled---bug 103321) with assigned devices.
      
      Reported-by: default avatarMarkus Trippelsdorf <markus@trippelsdorf.de>
      Reported-by: default avatarSebastian Schuette <dracon@ewetel.net>
      Cc: stable@vger.kernel.org # 4.2+
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fc07e76a
    • Paolo Bonzini's avatar
      Revert "KVM: svm: handle KVM_X86_QUIRK_CD_NW_CLEARED in svm_get_mt_mask" · bcf166a9
      Paolo Bonzini authored
      This reverts commit 54928303
      
      .
      It builds on the commit that is being reverted next.
      
      Cc: stable@vger.kernel.org # 4.2+
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      bcf166a9
    • Paolo Bonzini's avatar
      Revert "KVM: SVM: Sync g_pat with guest-written PAT value" · 625422f6
      Paolo Bonzini authored
      This reverts commit e098223b
      
      ,
      which has a dependency on other commits being reverted.
      
      Cc: stable@vger.kernel.org # 4.2+
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      625422f6
    • Paolo Bonzini's avatar
      Revert "KVM: x86: apply guest MTRR virtualization on host reserved pages" · 606decd6
      Paolo Bonzini authored
      This reverts commit fd717f11
      
      .
      It was reported to cause Machine Check Exceptions (bug 104091).
      
      Reported-by: default avatar <harn-solo@gmx.de>
      Cc: stable@vger.kernel.org # 4.2+
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      606decd6
    • Ard Biesheuvel's avatar
      arm64/efi: Fix boot crash by not padding between EFI_MEMORY_RUNTIME regions · 0ce3cc00
      Ard Biesheuvel authored
      
      
      The new Properties Table feature introduced in UEFIv2.5 may
      split memory regions that cover PE/COFF memory images into
      separate code and data regions. Since these regions only differ
      in the type (runtime code vs runtime data) and the permission
      bits, but not in the memory type attributes (UC/WC/WT/WB), the
      spec does not require them to be aligned to 64 KB.
      
      Since the relative offset of PE/COFF .text and .data segments
      cannot be changed on the fly, this means that we can no longer
      pad out those regions to be mappable using 64 KB pages.
      Unfortunately, there is no annotation in the UEFI memory map
      that identifies data regions that were split off from a code
      region, so we must apply this logic to all adjacent runtime
      regions whose attributes only differ in the permission bits.
      
      So instead of rounding each memory region to 64 KB alignment at
      both ends, only round down regions that are not directly
      preceded by another runtime region with the same type
      attributes. Since the UEFI spec does not mandate that the memory
      map be sorted, this means we also need to sort it first.
      
      Note that this change will result in all EFI_MEMORY_RUNTIME
      regions whose start addresses are not aligned to the OS page
      size to be mapped with executable permissions (i.e., on kernels
      compiled with 64 KB pages). However, since these mappings are
      only active during the time that UEFI Runtime Services are being
      invoked, the window for abuse is rather small.
      
      Tested-by: default avatarMark Salter <msalter@redhat.com>
      Tested-by: Mark Rutland <mark.rutland@arm.com> [UEFI 2.4 only]
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      Reviewed-by: default avatarMark Salter <msalter@redhat.com>
      Reviewed-by: default avatarMark Rutland <mark.rutland@arm.com>
      Cc: <stable@vger.kernel.org> # v4.0+
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1443218539-7610-3-git-send-email-matt@codeblueprint.co.uk
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0ce3cc00
    • Matt Fleming's avatar
      x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down · a5caa209
      Matt Fleming authored
      Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced
      that signals that the firmware PE/COFF loader supports splitting
      code and data sections of PE/COFF images into separate EFI
      memory map entries. This allows the kernel to map those regions
      with strict memory protections, e.g. EFI_MEMORY_RO for code,
      EFI_MEMORY_XP for data, etc.
      
      Unfortunately, an unwritten requirement of this new feature is
      that the regions need to be mapped with the same offsets
      relative to each other as observed in the EFI memory map. If
      this is not done crashes like this may occur,
      
        BUG: unable to handle kernel paging request at fffffffefe6086dd
        IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd
        Call Trace:
         [<ffffffff8104c90e>] efi_call+0x7e/0x100
         [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90
         [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70
         [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392
         [<ffffffff81f37e1b>] start_kernel+0x38a/0x417
         [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
         [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
      
      Here 0xfffffffefe6086dd refers to an address the firmware
      expects to be mapped but which the OS never claimed was mapped.
      The issue is that included in these regions are relative
      addresses to other regions which were emitted by the firmware
      toolchain before the "splitting" of sections occurred at
      runtime.
      
      Needless to say, we don't satisfy this unwritten requirement on
      x86_64 and instead map the EFI memory map entries in reverse
      order. The above crash is almost certainly triggerable with any
      kernel newer than v3.13 because that's when we rewrote the EFI
      runtime region mapping code, in commit d2f7cbe7
      
       ("x86/efi:
      Runtime services virtual mapping"). For kernel versions before
      v3.13 things may work by pure luck depending on the
      fragmentation of the kernel virtual address space at the time we
      map the EFI regions.
      
      Instead of mapping the EFI memory map entries in reverse order,
      where entry N has a higher virtual address than entry N+1, map
      them in the same order as they appear in the EFI memory map to
      preserve this relative offset between regions.
      
      This patch has been kept as small as possible with the intention
      that it should be applied aggressively to stable and
      distribution kernels. It is very much a bugfix rather than
      support for a new feature, since when EFI_PROPERTIES_TABLE is
      enabled we must map things as outlined above to even boot - we
      have no way of asking the firmware not to split the code/data
      regions.
      
      In fact, this patch doesn't even make use of the more strict
      memory protections available in UEFI v2.5. That will come later.
      
      Suggested-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Reported-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chun-Yi <jlee@suse.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: James Bottomley <JBottomley@Odin.com>
      Cc: Lee, Chun-Yi <jlee@suse.com>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1443218539-7610-2-git-send-email-matt@codeblueprint.co.uk
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a5caa209