  1. Apr 12, 2018
    • x86/pti: Leave kernel text global for !PCID · 8c06c774
      Dave Hansen authored
      Global pages are bad for hardening because they potentially let an
      exploit read the kernel image via a Meltdown-style attack, which
      makes it easier to find gadgets.
      
      But, global pages are good for performance because they reduce TLB
      misses when making user/kernel transitions, especially when PCIDs
      are not available, such as on older hardware, or where a hypervisor
      has disabled them for some reason.
      
      This patch implements a basic, sane policy: If you have PCIDs, you
      only map a minimal amount of kernel text global.  If you do not have
      PCIDs, you map all kernel text global.
      
      This policy effectively makes PCIDs something that not only adds
      performance but a little bit of hardening as well.
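      
      The policy above can be sketched in a few lines of user-space C.
      This is only an illustration: the enum and function names are
      hypothetical and are not the identifiers the patch uses.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the kernel's identifiers. */
enum global_text_policy {
	GLOBAL_TEXT_MINIMAL,	/* only a minimal amount of kernel text is Global */
	GLOBAL_TEXT_ALL,	/* all kernel text is Global */
};

/*
 * With PCIDs, user/kernel transitions are already cheap, so keep the
 * Global footprint small for hardening.  Without PCIDs, favor
 * performance and leave all kernel text Global.
 */
static inline enum global_text_policy
choose_global_text_policy(bool cpu_has_pcid)
{
	return cpu_has_pcid ? GLOBAL_TEXT_MINIMAL : GLOBAL_TEXT_ALL;
}
```

      A caller would pass in whatever PCID detection it has, for example
      choose_global_text_policy(false) on a guest where the hypervisor
      hides PCIDs.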
      
      I ran a simple "lseek" microbenchmark[1] to test the benefit on
      a modern Atom microserver.  Most of the benefit comes from applying
      the series before this patch ("entry only"), but there is still a
      significant benefit from this patch.
      
        No Global Lines (baseline  ): 6077741 lseeks/sec
        88 Global Lines (entry only): 7528609 lseeks/sec (+23.9%)
        94 Global Lines (this patch): 8433111 lseeks/sec (+38.8%)
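      
      The percentages are relative to the baseline row.  A minimal
      sketch of that arithmetic, with a hypothetical helper name:

```c
/* Percentage change from a baseline rate to a measured rate. */
static inline double percent_gain(double baseline, double measured)
{
	return (measured - baseline) / baseline * 100.0;
}
```

      percent_gain(6077741, 7528609) comes out at roughly 23.9 and
      percent_gain(6077741, 8433111) at roughly 38.8, matching the
      figures above.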
      
      [1.] https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c
      
      
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205518.E3D989EB@viggo.jf.intel.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image · 39114b7a
      Dave Hansen authored
      Summary:
      
      In current kernels, with PTI enabled, no pages are marked Global. This
      potentially increases TLB misses.  But, the mechanism by which the Global
      bit is set and cleared is rather haphazard.  This patch makes the process
      more explicit.  In the end, it leaves us with Global entries in the page
      tables for the areas truly shared by userspace and kernel and increases
      TLB hit rates.
      
      The place this patch really shines is on systems without PCIDs.  In this
      case, we are using an lseek microbenchmark[1] to see how a reasonably
      non-trivial syscall behaves.  Higher is better:
      
        No Global pages (baseline): 6077741 lseeks/sec
        88 Global Pages (this set): 7528609 lseeks/sec (+23.9%)
      
      On a modern Skylake desktop with PCIDs, the benefits are tangible, but not
      huge for a kernel compile (lower is better):
      
        No Global pages (baseline): 186.951 seconds time elapsed  ( +-  0.35% )
        28 Global pages (this set): 185.756 seconds time elapsed  ( +-  0.09% )
                                     -1.195 seconds (-0.64%)
      
      I also re-checked everything using the lseek1 test[1]:
      
        No Global pages (baseline): 15783951 lseeks/sec
        28 Global pages (this set): 16054688 lseeks/sec
                                     +270737 lseeks/sec (+1.71%)
      
      The effect is more visible, but still modest.
      
      Details:
      
      The kernel page tables are inherited from head_64.S which rudely marks
      them as _PAGE_GLOBAL.  For PTI, we have been relying on the grace of
      $DEITY and some insane behavior in pageattr.c to clear _PAGE_GLOBAL.
      This patch tries to do better.
      
      First, stop filtering out "unsupported" bits from being cleared in the
      pageattr code.  It's fine to filter out *setting* these bits but it
      is insane to keep us from clearing them.
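      
      A rough user-space sketch of that asymmetry follows.  The helper
      name and its signature are hypothetical, and the bit values are
      the x86 PTE bit positions; this is not the pageattr code itself.

```c
#include <stdint.h>

typedef uint64_t pteval_t;

/* x86 PTE bits, named as in the kernel. */
#define _PAGE_PRESENT	((pteval_t)1 << 0)
#define _PAGE_RW	((pteval_t)1 << 1)
#define _PAGE_GLOBAL	((pteval_t)1 << 8)

/*
 * Hypothetical helper: compute new PTE flags from the old flags and
 * the set/clear masks.  Only the set mask is filtered against the
 * supported bits; the clear mask is applied as-is.
 */
static inline pteval_t apply_masks(pteval_t old, pteval_t mask_set,
				   pteval_t mask_clr, pteval_t supported)
{
	mask_set &= supported;	/* never set an unsupported bit */
	/* mask_clr is deliberately left unfiltered */
	return (old & ~mask_clr) | mask_set;
}
```

      With _PAGE_GLOBAL absent from the supported mask, a request to
      clear it still takes effect, while a request to set it is dropped.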
      
      Then, *explicitly* go clear _PAGE_GLOBAL from the kernel identity map.
      Do not rely on pageattr to do it magically.
      
      After this patch, we can see that "GLB" shows up in each copy of the
      page tables, that we have the same number of global entries in each
      and that they are the *same* entries.
      
        /sys/kernel/debug/page_tables/current_kernel:11
        /sys/kernel/debug/page_tables/current_user:11
        /sys/kernel/debug/page_tables/kernel:11
      
        9caae8ad6a1fb53aca2407ec037f612d  current_kernel.GLB
        9caae8ad6a1fb53aca2407ec037f612d  current_user.GLB
        9caae8ad6a1fb53aca2407ec037f612d  kernel.GLB
      
      A quick visual audit also shows that all the entries make sense.
      0xfffffe0000000000 is the cpu_entry_area and 0xffffffff81c00000
      is the entry/exit text:
      
        0xfffffe0000000000-0xfffffe0000002000           8K     ro                 GLB NX pte
        0xfffffe0000002000-0xfffffe0000003000           4K     RW                 GLB NX pte
        0xfffffe0000003000-0xfffffe0000006000          12K     ro                 GLB NX pte
        0xfffffe0000006000-0xfffffe0000007000           4K     ro                 GLB x  pte
        0xfffffe0000007000-0xfffffe000000d000          24K     RW                 GLB NX pte
        0xfffffe000002d000-0xfffffe000002e000           4K     ro                 GLB NX pte
        0xfffffe000002e000-0xfffffe000002f000           4K     RW                 GLB NX pte
        0xfffffe000002f000-0xfffffe0000032000          12K     ro                 GLB NX pte
        0xfffffe0000032000-0xfffffe0000033000           4K     ro                 GLB x  pte
        0xfffffe0000033000-0xfffffe0000039000          24K     RW                 GLB NX pte
        0xffffffff81c00000-0xffffffff81e00000           2M     ro         PSE     GLB x  pmd
      
      [1.] https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c
      
      
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205517.C80FBE05@viggo.jf.intel.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/pti: Enable global pages for shared areas · 0f561fce
      Dave Hansen authored
      
      
      The entry/exit text and cpu_entry_area are mapped into userspace and
      the kernel.  But, they are not _PAGE_GLOBAL.  This creates unnecessary
      TLB misses.
      
      Add the _PAGE_GLOBAL flag for these areas.
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205515.2977EE7D@viggo.jf.intel.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init · 639d6aaf
      Dave Hansen authored
      
      
      __ro_after_init data gets stuck in the .rodata section.  That's normally
      fine because the kernel itself manages the R/W properties.
      
      But, if we run __change_page_attr() on an area which is __ro_after_init,
      the .rodata checks will trigger and force the area to be immediately
      read-only, even if it is early-ish in boot.  This caused problems when
      trying to clear the _PAGE_GLOBAL bit for these areas in the PTI code:
      it cleared _PAGE_GLOBAL like I asked, but also took it upon itself
      to clear _PAGE_RW.  The kernel then oopsed the next time it wrote to
      a __ro_after_init data structure.
      
      To fix this, add the kernel_set_to_readonly check, just like we have
      for kernel text, just a few lines below in this function.
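      
      The shape of the fix can be sketched as below.  The function name
      and its boolean parameters are hypothetical stand-ins; in the
      kernel the range test and kernel_set_to_readonly live inside the
      pageattr code rather than being passed in.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pteval_t;

#define _PAGE_RW	((pteval_t)1 << 1)

/*
 * Hypothetical stand-in for the .rodata check: only forbid _PAGE_RW
 * once the kernel has actually been set to read-only.  Before that
 * point, __ro_after_init data must stay writable.
 */
static inline pteval_t forbid_rodata_rw(pteval_t prot, bool addr_in_rodata,
					bool kernel_set_to_readonly)
{
	if (kernel_set_to_readonly && addr_in_rodata)
		prot &= ~_PAGE_RW;
	return prot;
}
```

      Early in boot (kernel_set_to_readonly false) the protections pass
      through untouched, so clearing _PAGE_GLOBAL no longer drags
      _PAGE_RW down with it.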
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205514.8D898241@viggo.jf.intel.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Comment _PAGE_GLOBAL mystery · 430d4005
      Dave Hansen authored
      
      
      I was mystified as to where the _PAGE_GLOBAL in the kernel page tables
      for kernel text came from.  I audited all the places I could find, but
      I missed one: head_64.S.
      
      The page tables that we create in here live for a long time, and they
      also have _PAGE_GLOBAL set, regardless of whether the processor supports it
      or not.  It's harmless, and we got *lucky* that the pageattr code
      accidentally clears it when we wipe it out of __supported_pte_mask and
      then later try to mark kernel text read-only.
      
      Comment some of these properties to make it easier to find and
      understand in the future.
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205513.079BB265@viggo.jf.intel.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Remove extra filtering in pageattr code · 1a54420a
      Dave Hansen authored
      
      
      The pageattr code has a mode where it can set or clear PTE bits in
      existing PTEs, so the page protections of the *new* PTEs come from
      one of two places:
      
        1. The set/clear masks: cpa->mask_clr / cpa->mask_set
        2. The existing PTE
      
      We filter ->mask_set/clr for supported PTE bits at entry to
      __change_page_attr() so we never need to filter them again.
      
      The only other place permissions can come from is an existing PTE
      and those presumably already have good bits.  We do not need to filter
      them again.
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205511.BC072352@viggo.jf.intel.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Do not auto-massage page protections · fb43d6cb
      Dave Hansen authored
      
      
      A PTE is constructed from a physical address and a pgprotval_t.
      __PAGE_KERNEL, for instance, is a pgprot_t and must be converted
      into a pgprotval_t before it can be used to create a PTE.  This is
      done implicitly within functions like pfn_pte() by massage_pgprot().
      
      However, this makes it very challenging to set bits (and keep them
      set) if your bit is being filtered out by massage_pgprot().
      
      This moves the bit filtering out of pfn_pte() and friends.  For
      users of PAGE_KERNEL*, filtering will be done automatically inside
      those macros but for users of __PAGE_KERNEL*, they need to do their
      own filtering now.
      
      Note that we also just move pfn_pte/pmd/pud() over to check_pgprot()
      instead of massage_pgprot().  This way, we still *look* for
      unsupported bits and properly warn about them if we find them.  This
      might happen if an unfiltered __PAGE_KERNEL* value was passed in,
      for instance.
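      
      The difference between the two helpers can be sketched in
      user-space C.  The types are simplified to plain integers, the
      warning is an fprintf() instead of a kernel WARN, and returning
      the filtered value from check_pgprot() is an assumption of this
      sketch.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint64_t pgprotval_t;

#define _PAGE_PRESENT	((pgprotval_t)1 << 0)
#define _PAGE_GLOBAL	((pgprotval_t)1 << 8)

/* Stand-in for __supported_pte_mask; here _PAGE_GLOBAL is unsupported. */
static pgprotval_t supported_pte_mask = _PAGE_PRESENT;

/* Old behavior in pfn_pte() and friends: silently drop unsupported bits. */
static inline pgprotval_t massage_pgprot(pgprotval_t prot)
{
	return prot & supported_pte_mask;
}

/* New behavior: look for unsupported bits and complain about them. */
static inline pgprotval_t check_pgprot(pgprotval_t prot)
{
	pgprotval_t massaged = massage_pgprot(prot);

	if (massaged != prot)
		fprintf(stderr, "unsupported PTE bits: %#llx\n",
			(unsigned long long)(prot & ~massaged));
	return massaged;
}
```

      Passing an unfiltered value such as _PAGE_PRESENT | _PAGE_GLOBAL
      through check_pgprot() prints a warning here, which is the case
      the last paragraph above describes.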
      
      - printk format warning fix from: Arnd Bergmann <arnd@arndb.de>
      - boot crash fix from:            Tom Lendacky <thomas.lendacky@amd.com>
      - crash bisected by:              Mike Galbraith <efault@gmx.de>
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reported-and-fixed-by: Arnd Bergmann <arnd@arndb.de>
      Fixed-by: Tom Lendacky <thomas.lendacky@amd.com>
      Bisected-by: Mike Galbraith <efault@gmx.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205509.77E1D7F6@viggo.jf.intel.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. Apr 10, 2018
    • x86/espfix: Document use of _PAGE_GLOBAL · 6baf4bec
      Dave Hansen authored
      
      
      The "normal" kernel page table creation mechanisms using
      PAGE_KERNEL_* page protections will never set _PAGE_GLOBAL with PTI.
      The few places in the kernel that always want _PAGE_GLOBAL must
      avoid using PAGE_KERNEL_*.
      
      Document that we want it here and its use is not accidental.
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205507.BCF4D4F0@viggo.jf.intel.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Introduce "default" kernel PTE mask · 8a57f484
      Dave Hansen authored
      
      
      The __PAGE_KERNEL_* page permissions are "raw".  They contain bits
      that may or may not be supported on the current processor.  They need
      to be filtered by a mask (currently __supported_pte_mask) to turn them
      into a value that we can actually set in a PTE.
      
      These __PAGE_KERNEL_* values all contain _PAGE_GLOBAL.  But, with PTI,
      we want to be able to support _PAGE_GLOBAL (have the bit set in
      __supported_pte_mask) but not have it appear in any of these masks by
      default.
      
      This patch creates a new mask, __default_kernel_pte_mask, and applies
      it when creating all of the PAGE_KERNEL_* masks.  This makes
      PAGE_KERNEL_* safe to use anywhere (they only contain supported bits).
      It also ensures that PAGE_KERNEL_* contains _PAGE_GLOBAL on PTI=n
      kernels but clears _PAGE_GLOBAL when PTI=y.
      
      We also make __default_kernel_pte_mask a non-GPL exported symbol
      because there are plenty of driver-available interfaces that take
      PAGE_KERNEL_* permissions.
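      
      A simplified sketch of how the two masks relate.  The raw
      __PAGE_KERNEL value here carries only three bits for brevity, and
      both helper functions are hypothetical; the kernel uses global
      variables and macros instead.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pteval_t;

#define _PAGE_PRESENT	((pteval_t)1 << 0)
#define _PAGE_RW	((pteval_t)1 << 1)
#define _PAGE_GLOBAL	((pteval_t)1 << 8)

/* Simplified "raw" value; the real __PAGE_KERNEL carries more bits. */
#define __PAGE_KERNEL	(_PAGE_PRESENT | _PAGE_RW | _PAGE_GLOBAL)

/*
 * Hypothetical helper: the default mask is the supported mask, minus
 * _PAGE_GLOBAL when PTI is enabled.
 */
static inline pteval_t default_kernel_pte_mask(pteval_t supported, bool pti)
{
	return pti ? (supported & ~_PAGE_GLOBAL) : supported;
}

/* PAGE_KERNEL is the raw value filtered through the default mask. */
static inline pteval_t page_kernel(pteval_t supported, bool pti)
{
	return __PAGE_KERNEL & default_kernel_pte_mask(supported, pti);
}
```

      _PAGE_GLOBAL stays in the supported mask in both cases, so code
      that really wants a Global mapping under PTI can still ask for it
      explicitly.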
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205506.030DB6B6@viggo.jf.intel.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Undo double _PAGE_PSE clearing · 606c7193
      Dave Hansen authored
      
      
      When clearing _PAGE_PRESENT on a huge page, we need to be careful
      to also clear _PAGE_PSE; otherwise the entry might still be mistaken
      for a valid large page table entry.
      
      We do that near the spot where we *set* _PAGE_PSE.  That's fine,
      but it's unnecessary.  pgprot_large_2_4k() already did it.
      
      BTW, I also noticed that pgprot_large_2_4k() and
      pgprot_4k_2_large() are not symmetric.  pgprot_large_2_4k() clears
      _PAGE_PSE (because it is aliased to _PAGE_PAT) but
      pgprot_4k_2_large() does not put _PAGE_PSE back.  Bummer.
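      
      The asymmetry can be seen in a small user-space model.  The two
      functions below are modeled on the helpers named above but operate
      on plain integers; treat them as an illustration rather than the
      kernel source.

```c
#include <stdint.h>

typedef uint64_t pgprotval_t;

#define _PAGE_BIT_PAT		7	/* PAT bit in a 4k PTE */
#define _PAGE_BIT_PSE		7	/* same bit marks a large page in a PMD/PUD */
#define _PAGE_BIT_PAT_LARGE	12	/* PAT bit in a large-page entry */

#define _PAGE_PRESENT	((pgprotval_t)1 << 0)
#define _PAGE_PAT	((pgprotval_t)1 << _PAGE_BIT_PAT)
#define _PAGE_PSE	((pgprotval_t)1 << _PAGE_BIT_PSE)
#define _PAGE_PAT_LARGE	((pgprotval_t)1 << _PAGE_BIT_PAT_LARGE)

/* Large -> 4k: move PAT from bit 12 down to bit 7; bit 7 (PSE) is lost. */
static inline pgprotval_t protval_large_2_4k(pgprotval_t val)
{
	return (val & ~(_PAGE_PAT | _PAGE_PAT_LARGE)) |
	       ((val & _PAGE_PAT_LARGE) >>
		(_PAGE_BIT_PAT_LARGE - _PAGE_BIT_PAT));
}

/* 4k -> large: move PAT from bit 7 up to bit 12; PSE is not restored. */
static inline pgprotval_t protval_4k_2_large(pgprotval_t val)
{
	return (val & ~(_PAGE_PAT | _PAGE_PAT_LARGE)) |
	       ((val & _PAGE_PAT) <<
		(_PAGE_BIT_PAT_LARGE - _PAGE_BIT_PAT));
}
```

      Converting _PAGE_PRESENT | _PAGE_PSE to 4k form and back yields
      just _PAGE_PRESENT, so a caller that wants a large mapping has to
      set _PAGE_PSE again itself.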
      
      Also, add some comments and change "promote" to "move".  "Promote"
      seems an odd word to use when we are logically moving a bit to a
      lower bit position.  Also add an extra line return to make it clear
      to which line the comment applies.
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205504.9B0F44A9@viggo.jf.intel.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Factor out pageattr _PAGE_GLOBAL setting · d1440b23
      Dave Hansen authored
      
      
      The pageattr code has a pattern repeated where it sets _PAGE_GLOBAL
      for present PTEs but clears it for non-present PTEs.  The intention
      is to keep _PAGE_GLOBAL from getting confused with _PAGE_PROTNONE
      since _PAGE_GLOBAL is for present PTEs and _PAGE_PROTNONE is for
      non-present PTEs.
      
      But, this pattern makes no sense.  Effectively, it says, if you use
      the pageattr code, always set _PAGE_GLOBAL when _PAGE_PRESENT.
      canon_pgprot() will clear it if unsupported (because it masks the
      value with __supported_pte_mask) but we *always* set it. Even if
      canon_pgprot() did not filter _PAGE_GLOBAL, it would be OK.
      _PAGE_GLOBAL is ignored when CR4.PGE=0 by the hardware.
      
      This unconditional setting of _PAGE_GLOBAL is a problem when we have
      PTI and non-PTI and we want some areas to have _PAGE_GLOBAL and some
      not.
      
      This updated version of the code says:
      1. Clear _PAGE_GLOBAL when !_PAGE_PRESENT
      2. Never set _PAGE_GLOBAL implicitly
      3. Allow _PAGE_GLOBAL to be in cpa->mask_set
      4. Allow _PAGE_GLOBAL to be inherited from previous PTE
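      
      The four rules can be modeled in user-space C as below.  The
      function name and signature are hypothetical; the real code works
      on pgprot_t values inside the pageattr machinery.

```c
#include <stdint.h>

typedef uint64_t pteval_t;

#define _PAGE_PRESENT	((pteval_t)1 << 0)
#define _PAGE_RW	((pteval_t)1 << 1)
#define _PAGE_GLOBAL	((pteval_t)1 << 8)

/* Hypothetical model of the updated _PAGE_GLOBAL handling. */
static inline pteval_t new_pte_flags(pteval_t old, pteval_t mask_set,
				     pteval_t mask_clr)
{
	/* Rules 3 and 4: GLOBAL may come from mask_set or from the old PTE. */
	pteval_t prot = (old & ~mask_clr) | mask_set;

	/* Rule 1: a non-present PTE must not carry GLOBAL (PROTNONE alias). */
	if (!(prot & _PAGE_PRESENT))
		prot &= ~_PAGE_GLOBAL;

	/* Rule 2: nothing here sets GLOBAL on its own. */
	return prot;
}
```

      For example, adding _PAGE_RW to a present, non-Global PTE leaves
      it non-Global, while the same change to a Global PTE keeps the
      inherited bit.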
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205502.86E199DA@viggo.jf.intel.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry/64: Drop idtentry's manual stack switch for user entries · 071ccc96
      Andy Lutomirski authored
      For non-paranoid entries, idtentry knows how to switch from the
      kernel stack to the user stack, as does error_entry.  This results
      in pointless duplication and code bloat.  Make idtentry stop
      thinking about stacks for non-paranoid entries.
      
      This reduces text size by 5377 bytes.
      
      This goes back to the following commit:
      
        7f2590a1 ("x86/entry/64: Use a per-CPU trampoline stack for IDT entries")
      
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/90aab80c1f906e70742eaa4512e3c9b5e62d59d4.1522794757.git.luto@kernel.org
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. Apr 05, 2018
    • x86/uapi: Fix asm/bootparam.h userspace compilation errors · 9820e1c3
      Dmitry V. Levin authored
      
      
      Consistently use types provided by <linux/types.h> to fix the following
      asm/bootparam.h userspace compilation errors:
      
      	/usr/include/asm/bootparam.h:140:2: error: unknown type name 'u16'
      	  u16 version;
      	/usr/include/asm/bootparam.h:141:2: error: unknown type name 'u16'
      	  u16 compatible_version;
      	/usr/include/asm/bootparam.h:142:2: error: unknown type name 'u16'
      	  u16 pm_timer_address;
      	/usr/include/asm/bootparam.h:143:2: error: unknown type name 'u16'
      	  u16 num_cpus;
      	/usr/include/asm/bootparam.h:144:2: error: unknown type name 'u64'
      	  u64 pci_mmconfig_base;
      	/usr/include/asm/bootparam.h:145:2: error: unknown type name 'u32'
      	  u32 tsc_khz;
      	/usr/include/asm/bootparam.h:146:2: error: unknown type name 'u32'
      	  u32 apic_khz;
      	/usr/include/asm/bootparam.h:147:2: error: unknown type name 'u8'
      	  u8 standard_ioapic;
      	/usr/include/asm/bootparam.h:148:2: error: unknown type name 'u8'
      	  u8 cpu_ids[255];
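      
      In userspace the kernel-internal u8/u16/u32/u64 names do not
      exist, while <linux/types.h> provides the double-underscore
      variants.  A sketch of the corrected spelling, using the fields
      from the errors above in a struct with a hypothetical name:

```c
#include <linux/types.h>

/* Hypothetical struct name; the fields mirror the error messages above. */
struct example_setup_data {
	__u16	version;
	__u16	compatible_version;
	__u16	pm_timer_address;
	__u16	num_cpus;
	__u64	pci_mmconfig_base;
	__u32	tsc_khz;
	__u32	apic_khz;
	__u8	standard_ioapic;
	__u8	cpu_ids[255];
};
```

      This compiles as ordinary userspace C on a system with the Linux
      UAPI headers installed.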
      
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: <stable@vger.kernel.org> # v4.16
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 4a362601 ("x86/jailhouse: Add infrastructure for running in non-root cell")
      Link: http://lkml.kernel.org/r/20180405043210.GA13254@altlinux.org
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. Apr 03, 2018
    • Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · bc16d405
      Linus Torvalds authored
      Pull EFI updates from Ingo Molnar:
       "The main EFI changes in this cycle were:
      
         - Fix the apple-properties code (Andy Shevchenko)
      
         - Add WARN() on arm64 if UEFI Runtime Services corrupt the reserved
           x18 register (Ard Biesheuvel)
      
         - Use efi_switch_mm() on x86 instead of manipulating %cr3 directly
           (Sai Praneeth)
      
         - Fix early memremap leak in ESRT code (Ard Biesheuvel)
      
         - Switch to L"xxx" notation for wide string literals (Ard Biesheuvel)
      
         - ... plus misc other cleanups and bugfixes"
      
      * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3
        x86/efi: Replace efi_pgd with efi_mm.pgd
        efi: Use string literals for efi_char16_t variable initializers
        efi/esrt: Fix handling of early ESRT table mapping
        efi: Use efi_mm in x86 as well as ARM
        efi: Make const array 'apple' static
        efi/apple-properties: Use memremap() instead of ioremap()
        efi: Reorder pr_notice() with add_device_randomness() call
        x86/efi: Replace GFP_ATOMIC with GFP_KERNEL in efi_query_variable_store()
        efi/arm64: Check whether x18 is preserved by runtime services calls
        efi/arm*: Stop printing addresses of virtual mappings
        efi/apple-properties: Remove redundant attribute initialization from unmarshal_key_value_pairs()
        efi/arm*: Only register page tables when they exist
    • Merge branch 'x86-dma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2fcd2b30
      Linus Torvalds authored
      Pull x86 dma mapping updates from Ingo Molnar:
       "This tree, by Christoph Hellwig, switches over the x86 architecture to
        the generic dma-direct and swiotlb code, and also unifies more of the
        dma-direct code between architectures. The now unused x86-only
        primitives are removed"
      
      * 'x86-dma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        dma-mapping: Don't clear GFP_ZERO in dma_alloc_attrs
        swiotlb: Make swiotlb_{alloc,free}_buffer depend on CONFIG_DMA_DIRECT_OPS
        dma/swiotlb: Remove swiotlb_{alloc,free}_coherent()
        dma/direct: Handle force decryption for DMA coherent buffers in common code
        dma/direct: Handle the memory encryption bit in common code
        dma/swiotlb: Remove swiotlb_set_mem_attributes()
        set_memory.h: Provide set_memory_{en,de}crypted() stubs
        x86/dma: Remove dma_alloc_coherent_gfp_flags()
        iommu/intel-iommu: Enable CONFIG_DMA_DIRECT_OPS=y and clean up intel_{alloc,free}_coherent()
        iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}()
        x86/dma/amd_gart: Use dma_direct_{alloc,free}()
        x86/dma/amd_gart: Look at dev->coherent_dma_mask instead of GFP_DMA
        x86/dma: Use generic swiotlb_ops
        x86/dma: Use DMA-direct (CONFIG_DMA_DIRECT_OPS=y)
        x86/dma: Remove dma_alloc_coherent_mask()
    • Merge branch 'sched-wait-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ce6eba3d
      Linus Torvalds authored
      Pull wait_var_event updates from Ingo Molnar:
       "This introduces the new wait_var_event() API, which is a more flexible
        waiting primitive than wait_on_atomic_t().
      
        All wait_on_atomic_t() users are migrated over to the new API and
        wait_on_atomic_t() is removed. The migration fixes one bug and should
        result in no functional changes for the other usecases"
      
      * 'sched-wait-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/wait: Improve __var_waitqueue() code generation
        sched/wait: Remove the wait_on_atomic_t() API
        sched/wait, arch/mips: Fix and convert wait_on_atomic_t() usage to the new wait_var_event() API
        sched/wait, fs/ocfs2: Convert wait_on_atomic_t() usage to the new wait_var_event() API
        sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API
        sched/wait, fs/fscache: Convert wait_on_atomic_t() usage to the new wait_var_event() API
        sched/wait, fs/btrfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API
        sched/wait, fs/afs: Convert wait_on_atomic_t() usage to the new wait_var_event() API
        sched/wait, drivers/media: Convert wait_on_atomic_t() usage to the new wait_var_event() API
        sched/wait, drivers/drm: Convert wait_on_atomic_t() usage to the new wait_var_event() API
        sched/wait: Introduce wait_var_event()
      ce6eba3d
    • Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a5532439
      Linus Torvalds authored
      Pull x86 timer updates from Ingo Molnar:
       "Two changes: add the new convert_art_ns_to_tsc() API for upcoming
        Intel Goldmont+ drivers, and remove the obsolete rdtscll() API"
      
      * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/tsc: Get rid of rdtscll()
        x86/tsc: Convert ART in nanoseconds to TSC
      a5532439
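The ART-to-TSC change above converts a nanosecond value into counter ticks. The arithmetic behind that kind of conversion can be sketched as follows. This is a minimal illustration, not the kernel's convert_art_ns_to_tsc() implementation: the helper name ns_to_cycles and the freq_khz parameter are hypothetical, and any fixed offset between the two counters is ignored. The division is split into a whole-millisecond part and a remainder so that the intermediate product is less likely to overflow 64 bits.

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/*
 * Convert a duration in nanoseconds to counter cycles.
 * freq_khz is the counter frequency in kHz, i.e. cycles per millisecond.
 * (Hypothetical helper for illustration; not a kernel function.)
 */
uint64_t ns_to_cycles(uint64_t ns, uint64_t freq_khz)
{
    uint64_t whole_ms = ns / NSEC_PER_MSEC; /* full milliseconds */
    uint64_t rem_ns   = ns % NSEC_PER_MSEC; /* leftover nanoseconds (< 1 ms) */

    /* cycles for the whole milliseconds, plus cycles for the remainder */
    return whole_ms * freq_khz + (rem_ns * freq_khz) / NSEC_PER_MSEC;
}
```

For instance, one second at an assumed 2 GHz counter (freq_khz = 2000000) corresponds to 2000000000 cycles. The result is truncated toward zero, so sub-cycle remainders are dropped.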
    • Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · cea061e4
      Linus Torvalds authored
      Pull x86 platform updates from Ingo Molnar:
       "The main changes in this cycle were:
      
         - Add "Jailhouse" hypervisor support (Jan Kiszka)
      
         - Update DeviceTree support (Ivan Gorinov)
      
         - Improve DMI date handling (Andy Shevchenko)"
      
      * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/PCI: Fix a potential regression when using dmi_get_bios_year()
        firmware/dmi_scan: Uninline dmi_get_bios_year() helper
        x86/devicetree: Use CPU description from Device Tree
        of/Documentation: Specify local APIC ID in "reg"
        MAINTAINERS: Add entry for Jailhouse
        x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
        x86: Consolidate PCI_MMCONFIG configs
        x86: Align x86_64 PCI_MMCONFIG with 32-bit variant
        x86/jailhouse: Enable PCI mmconfig access in inmates
        PCI: Scan all functions when running over Jailhouse
        jailhouse: Provide detection for non-x86 systems
        x86/devicetree: Fix device IRQ settings in DT
        x86/devicetree: Initialize device tree before using it
        pci: Simplify code by using the new dmi_get_bios_year() helper
        ACPI/sleep: Simplify code by using the new dmi_get_bios_year() helper
        x86/pci: Simplify code by using the new dmi_get_bios_year() helper
        dmi: Introduce the dmi_get_bios_year() helper function
        x86/platform/quark: Re-use DEFINE_SHOW_ATTRIBUTE() macro
        x86/platform/atom: Re-use DEFINE_SHOW_ATTRIBUTE() macro
      cea061e4
    • Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d22fff81
      Linus Torvalds authored
      Pull x86 mm updates from Ingo Molnar:
      
       - Extend the memmap= boot parameter syntax to allow the redeclaration
         and dropping of existing ranges, and to support all e820 range types
         (Jan H. Schönherr)
      
       - Improve the W+X boot time security checks to remove false positive
         warnings on Xen (Jan Beulich)
      
       - Support booting as Xen PVH guest (Juergen Gross)
      
       - Improved 5-level paging (LA57) support, in particular it's possible
         now to have a single kernel image for both 4-level and 5-level
         hardware (Kirill A. Shutemov)
      
       - AMD hardware RAM encryption support (SME/SEV) fixes (Tom Lendacky)
      
       - Preparatory commits for hardware-encrypted RAM support on Intel CPUs.
         (Kirill A. Shutemov)
      
       - Improved Intel-MID support (Andy Shevchenko)
      
       - Show EFI page tables in page_tables debug files (Andy Lutomirski)
      
       - ... plus misc fixes and smaller cleanups
      
      * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
        x86/cpu/tme: Fix spelling: "configuation" -> "configuration"
        x86/boot: Fix SEV boot failure from change to __PHYSICAL_MASK_SHIFT
        x86/mm: Update comment in detect_tme() regarding x86_phys_bits
        x86/mm/32: Remove unused node_memmap_size_bytes() & CONFIG_NEED_NODE_MEMMAP_SIZE logic
        x86/mm: Remove pointless checks in vmalloc_fault
        x86/platform/intel-mid: Add special handling for ACPI HW reduced platforms
        ACPI, x86/boot: Introduce the ->reduced_hw_early_init() ACPI callback
        ACPI, x86/boot: Split out acpi_generic_reduce_hw_init() and export
        x86/pconfig: Provide defines and helper to run MKTME_KEY_PROG leaf
        x86/pconfig: Detect PCONFIG targets
        x86/tme: Detect if TME and MKTME is activated by BIOS
        x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
        x86/boot/compressed/64: Use page table in trampoline memory
        x86/boot/compressed/64: Use stack from trampoline memory
        x86/boot/compressed/64: Make sure we have a 32-bit code segment
        x86/mm: Do not use paravirtualized calls in native_set_p4d()
        kdump, vmcoreinfo: Export pgtable_l5_enabled value
        x86/boot/compressed/64: Prepare new top-level page table for trampoline
        x86/boot/compressed/64: Set up trampoline memory
        x86/boot/compressed/64: Save and restore trampoline memory
        ...
      d22fff81
    • Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 986b37c0
      Linus Torvalds authored
      Pull x86 cleanups and msr updates from Ingo Molnar:
       "The main change is a performance/latency improvement to /dev/msr
        access. The rest are misc cleanups"
      
      * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/msr: Make rdmsrl_safe_on_cpu() scheduling safe as well
        x86/cpuid: Allow cpuid_read() to schedule
        x86/msr: Allow rdmsr_safe_on_cpu() to schedule
        x86/rtc: Stop using deprecated functions
        x86/dumpstack: Unify show_regs()
        x86/fault: Do not print IP in show_fault_oops()
        x86/MSR: Move native_* variants to msr.h
      986b37c0
    • Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e68b4bad
      Linus Torvalds authored
      Pull x86 build updates from Ingo Molnar:
       "The biggest change is the forcing of asm-goto support on x86, which
        effectively increases the GCC minimum supported version to gcc-4.5 (on
        x86)"
      
      * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/build: Don't pass in -D__KERNEL__ multiple times
        x86: Remove FAST_FEATURE_TESTS
        x86: Force asm-goto
        x86/build: Drop superfluous ALIGN from the linker script
      e68b4bad
    • Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5e46caf6
      Linus Torvalds authored
      Pull x86 asm fixlets from Ingo Molnar:
       "A clobber list fix and cleanups"
      
      * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/asm: Trim clear_page.S includes
        x86/asm: Clobber flags in clear_page()
      5e46caf6
    • Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2451d1e5
      Linus Torvalds authored
      Pull x86 apic updates from Ingo Molnar:
       "The main x86 APIC/IOAPIC changes in this cycle were:
      
         - Robustify kexec support to more carefully restore IRQ hardware
           state before calling into kexec/kdump kernels. (Baoquan He)
      
         - Clean up the local APIC code a bit (Dou Liyang)
      
         - Remove unused callbacks (David Rientjes)"
      
      * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/apic: Finish removing unused callbacks
        x86/apic: Drop logical_smp_processor_id() inline
        x86/apic: Modernize the pending interrupt code
        x86/apic: Move pending interrupt check code into it's own function
        x86/apic: Set up through-local-APIC mode on the boot CPU if 'noapic' specified
        x86/apic: Rename variables and functions related to x86_io_apic_ops
        x86/apic: Remove the (now) unused disable_IO_APIC() function
        x86/apic: Fix restoring boot IRQ mode in reboot and kexec/kdump
        x86/apic: Split disable_IO_APIC() into two functions to fix CONFIG_KEXEC_JUMP=y
        x86/apic: Split out restore_boot_irq_mode() from disable_IO_APIC()
        x86/apic: Make setup_local_APIC() static
        x86/apic: Simplify init_bsp_APIC() usage
        x86/x2apic: Mark set_x2apic_phys_mode() as __init
      2451d1e5
    • Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 67dbfc14
      Linus Torvalds authored
      Pull SMP hotplug updates from Ingo Molnar:
       "Simplify the CPU hot-plug state machine"
      
      * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        cpu/hotplug: Fix unused function warning
        cpu/hotplug: Merge cpuhp_bp_states and cpuhp_ap_states
      67dbfc14
    • Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 46e0d28b
      Linus Torvalds authored
      Pull scheduler updates from Ingo Molnar:
       "The main scheduler changes in this cycle were:
      
         - NUMA balancing improvements (Mel Gorman)
      
         - Further load tracking improvements (Patrick Bellasi)
      
         - Various NOHZ balancing cleanups and optimizations (Peter Zijlstra)
      
         - Improve blocked load handling, in particular we can now reduce and
           eventually stop periodic load updates on 'very idle' CPUs. (Vincent
           Guittot)
      
         - On isolated CPUs offload the final 1Hz scheduler tick as well, plus
           related cleanups and reorganization. (Frederic Weisbecker)
      
         - Core scheduler code cleanups (Ingo Molnar)"
      
      * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
        sched/core: Update preempt_notifier_key to modern API
        sched/cpufreq: Rate limits for SCHED_DEADLINE
        sched/fair: Update util_est only on util_avg updates
        sched/cpufreq/schedutil: Use util_est for OPP selection
        sched/fair: Use util_est in LB and WU paths
        sched/fair: Add util_est on top of PELT
        sched/core: Remove TASK_ALL
        sched/completions: Use bool in try_wait_for_completion()
        sched/fair: Update blocked load when newly idle
        sched/fair: Move idle_balance()
        sched/nohz: Merge CONFIG_NO_HZ_COMMON blocks
        sched/fair: Move rebalance_domains()
        sched/nohz: Optimize nohz_idle_balance()
        sched/fair: Reduce the periodic update duration
        sched/nohz: Stop NOHZ stats when decayed
        sched/cpufreq: Provide migration hint
        sched/nohz: Clean up nohz enter/exit
        sched/fair: Update blocked load from NEWIDLE
        sched/fair: Add NOHZ stats balancing
        sched/fair: Restructure nohz_balance_kick()
        ...
      46e0d28b
    • Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 86bbbeba
      Linus Torvalds authored
      Pull x86 RAS updates from Ingo Molnar:
       "The main changes in this cycle were:
      
         - AMD MCE support/decoding improvements (Yazen Ghannam)
      
         - general MCE header cleanups and reorganization (Borislav Petkov)"
      
      * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
        x86/MCE: Cleanup and complete struct mce fields definitions
        x86/mce/AMD: Carve out SMCA get_block_address() code
        x86/mce/AMD: Get address from already initialized block
        x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type
        x86/mce/AMD: Pass the bank number to smca_get_bank_type()
        x86/mce/AMD: Collect error info even if valid bits are not set
        x86/mce: Issue the 'mcelog --ascii' message only on !AMD
        x86/mce: Convert 'struct mca_config' bools to a bitfield
        x86/mce: Put private structures and definitions into the internal header
      86bbbeba
    • Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 486adcea
      Linus Torvalds authored
      Pull perf updates from Ingo Molnar:
       "The main kernel side changes were:
      
         - Modernize the kprobe and uprobe creation/destruction tooling ABIs:
      
           The existing text based APIs (kprobe_events and uprobe_events in
           tracefs), are naive, limited ABIs in that they require user-space
           to clean up after themselves, which is both difficult and fragile
           if the tool is buggy or exits unexpectedly. In other words they are
           not really suited for modern, robust tooling.
      
           So introduce a modern, file descriptor based ABI that does not have
           these limitations: introduce the 'perf_kprobe' and 'perf_uprobe'
           PMUs and extend the perf_event_open() syscall to create events with
           a kprobe/uprobe attached to them. These [k,u]probes are associated
           with this file descriptor, so they are not available in tracefs.
      
           (Song Liu)
      
         - Intel Cannon Lake CPU support (Harry Pan)
      
         - Intel PT cleanups (Alexander Shishkin)
      
         - Improve the performance of pinned/flexible event groups by using RB
           trees (Alexey Budankov)
      
         - Add PERF_EVENT_IOC_MODIFY_ATTRIBUTES, which allows the modification
           of hardware breakpoints. This new ABI variant massively speeds up
           existing tooling that uses hardware breakpoints to instrument (and
           debug) memory usage.
      
           (Milind Chabbi, Jiri Olsa)
      
         - Various Intel PEBS handling fixes and improvements, and other Intel
           PMU improvements (Kan Liang)
      
         - Various perf core improvements and optimizations (Peter Zijlstra)
      
         - ... misc cleanups, fixes and updates.
      
        There's over 200 tooling commits, here's an (imperfect) list of
        highlights:
      
         - 'perf annotate' improvements:
      
            * Recognize and handle jumps to other functions as calls, which
              improves the navigation along jumps and back. (Arnaldo Carvalho
              de Melo)
      
            * Add the 'P' hotkey in TUI annotation to dump annotation output
              into a file, to ease e-mail reporting of annotation details.
              (Arnaldo Carvalho de Melo)
      
            * Add an IPC/cycles column to the TUI (Jin Yao)
      
            * Improve s390 assembly annotation (Thomas Richter)
      
            * Refactor the output formatting logic to better separate it into
              interactive and non-interactive features and add the --stdio2
              output variant to demonstrate this. (Arnaldo Carvalho de Melo)
      
         - 'perf script' improvements:
      
            * Add Python 3 support (Jaroslav Škarvada)
      
            * Add --show-round-event (Jiri Olsa)
      
         - 'perf c2c' improvements:
      
            * Add NUMA analysis support (Jiri Olsa)
      
         - 'perf trace' improvements:
      
            * Improve PowerPC support (Ravi Bangoria)
      
         - 'perf inject' improvements:
      
            * Integrate ARM CoreSight traces (Robert Walker)
      
         - 'perf stat' improvements:
      
            * Add the --interval-count option (yuzhoujian)
      
            * Add the --timeout option (yuzhoujian)
      
         - 'perf sched' improvements (Changbin Du)
      
         - Vendor events improvements:
      
            * Add IBM s390 vendor events (Thomas Richter)
      
            * Add and improve arm64 vendor events (John Garry, Ganapatrao
              Kulkarni)
      
            * Update POWER9 vendor events (Sukadev Bhattiprolu)
      
         - Intel PT tooling improvements (Adrian Hunter)
      
         - PMU handling improvements (Agustin Vega-Frias)
      
         - Record machine topology in perf.data (Jiri Olsa)
      
         - Various overwrite related cleanups (Kan Liang)
      
         - Add arm64 dwarf post unwind support (Kim Phillips, Jean Pihet)
      
         - ... and lots of other changes, cleanups and fixes, see the shortlog
           and Git history for details"
      
      * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (262 commits)
        perf/x86/intel: Enable C-state residency events for Cannon Lake
        perf/x86/intel: Add Cannon Lake support for RAPL profiling
        perf/x86/pt, coresight: Clean up address filter structure
        perf vendor events s390: Add JSON files for IBM z14
        perf vendor events s390: Add JSON files for IBM z13
        perf vendor events s390: Add JSON files for IBM zEC12 zBC12
        perf vendor events s390: Add JSON files for IBM z196
        perf vendor events s390: Add JSON files for IBM z10EC z10BC
        perf mmap: Be consistent when checking for an unmaped ring buffer
        perf mmap: Fix accessing unmapped mmap in perf_mmap__read_done()
        perf build: Fix check-headers.sh opts assignment
        perf/x86: Update rdpmc_always_available static key to the modern API
        perf annotate: Use absolute addresses to calculate jump target offsets
        perf annotate: Defer searching for comma in raw line till it is needed
        perf annotate: Support jumping from one function to another
        perf annotate: Add "_local" to jump/offset validation routines
        perf python: Reference Py_None before returning it
        perf annotate: Mark jumps to outher functions with the call arrow
        perf annotate: Pass function descriptor to its instruction parsing routines
        perf annotate: No need to calculate notes->start twice
        ...
      486adcea
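The file-descriptor-based kprobe ABI described in the perf pull above can be sketched from user space roughly as follows. This is a hedged illustration, not code from the series: the helper names are hypothetical, the sysfs path and the use of bit 0 of `config` for return probes reflect the interface as generally documented (check `/sys/bus/event_source/devices/kprobe/format/` on the running kernel), and the probed function name passed by the caller is only an example. It needs kernel headers that define `kprobe_func` in `struct perf_event_attr`, and opening the event usually requires elevated privileges.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Read a dynamic PMU type id from a sysfs "type" file; returns -1 on failure. */
int read_pmu_type(const char *path)
{
    FILE *f = fopen(path, "r");
    int type = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &type) != 1)
        type = -1;
    fclose(f);
    return type;
}

/* Fill attr for a kprobe (or return probe) on the named kernel function. */
void fill_kprobe_attr(struct perf_event_attr *attr, int pmu_type,
                      const char *func, int is_retprobe)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = (uint32_t)pmu_type;
    attr->config = is_retprobe ? 1 : 0;            /* assumed: retprobe is config bit 0 */
    attr->kprobe_func = (uint64_t)(uintptr_t)func; /* pointer to the symbol name */
    attr->probe_offset = 0;                        /* probe at function entry */
}

/* Open the probe for all tasks on one CPU; returns an fd, or -1 on error. */
int open_kprobe_event(const char *func, int is_retprobe, int cpu)
{
    struct perf_event_attr attr;
    int type = read_pmu_type("/sys/bus/event_source/devices/kprobe/type");

    if (type < 0)
        return -1; /* perf_kprobe PMU not available on this kernel */
    fill_kprobe_attr(&attr, type, func, is_retprobe);
    return (int)syscall(SYS_perf_event_open, &attr, -1, cpu, -1,
                        PERF_FLAG_FD_CLOEXEC);
}
```

Because the probe belongs to the returned file descriptor, closing the descriptor (or the process exiting) is what tears the probe down, which is the cleanup property the pull message contrasts with the text-based tracefs files.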
    • Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 701f3b31
      Linus Torvalds authored
      Pull locking updates from Ingo Molnar:
       "The main changes in the locking subsystem in this cycle were:
      
         - Add the Linux Kernel Memory Consistency Model (LKMM) subsystem,
           which is an array of tools in tools/memory-model/ that formally
           describe the Linux memory coherency model (a.k.a.
           Documentation/memory-barriers.txt), and also produce 'litmus tests'
           in the form of kernel code which can be directly executed and tested.
      
           Here's a high level background article about an earlier version of
           this work on LWN.net:
      
              https://lwn.net/Articles/718628/
      
           The design principles:
      
            "There is reason to believe that Documentation/memory-barriers.txt
             could use some help, and a major purpose of this patch is to
             provide that help in the form of a design-time tool that can
             produce all valid executions of a small fragment of concurrent
             Linux-kernel code, which is called a "litmus test". This tool's
             functionality is roughly similar to a full state-space search.
             Please note that this is a design-time tool, not useful for
             regression testing. However, we hope that the underlying
             Linux-kernel memory model will be incorporated into other tools
             capable of analyzing large bodies of code for regression-testing
             purposes."
      
           [...]
      
            "A second tool is klitmus7, which converts litmus tests to
             loadable kernel modules for direct testing. As with herd7, the
             klitmus7 code is freely available from
      
               http://diy.inria.fr/sources/index.html
      
             (and via "git" at https://github.com/herd/herdtools7)"
      
           [...]
      
           Credits go to:
      
            "This patch was the result of a most excellent collaboration
             founded by Jade Alglave and also including Alan Stern, Andrea
             Parri, and Luc Maranget."
      
           ... and to the gents listed in the MAINTAINERS entry:
      
              LINUX KERNEL MEMORY CONSISTENCY MODEL (LKMM)
              M:      Alan Stern <stern@rowland.harvard.edu>
              M:      Andrea Parri <parri.andrea@gmail.com>
              M:      Will Deacon <will.deacon@arm.com>
              M:      Peter Zijlstra <peterz@infradead.org>
              M:      Boqun Feng <boqun.feng@gmail.com>
              M:      Nicholas Piggin <npiggin@gmail.com>
              M:      David Howells <dhowells@redhat.com>
              M:      Jade Alglave <j.alglave@ucl.ac.uk>
              M:      Luc Maranget <luc.maranget@inria.fr>
              M:      "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      
           The LKMM project already found several bugs in Linux locking
           primitives and improved the understanding and the documentation of
           the Linux memory model all around.
      
         - Add KASAN instrumentation to atomic APIs (Dmitry Vyukov)
      
         - Add RWSEM API debugging and reorganize the lock debugging Kconfig
           (Waiman Long)
      
         - ... misc cleanups and other smaller changes"
      
      * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
        locking/Kconfig: Restructure the lock debugging menu
        locking/Kconfig: Add LOCK_DEBUGGING_SUPPORT to make it more readable
        locking/rwsem: Add DEBUG_RWSEMS to look for lock/unlock mismatches
        lockdep: Make the lock debug output more useful
        locking/rtmutex: Handle non enqueued waiters gracefully in remove_waiter()
        locking/atomic, asm-generic, x86: Add comments for atomic instrumentation
        locking/atomic, asm-generic: Add KASAN instrumentation to atomic operations
        locking/atomic/x86: Switch atomic.h to use atomic-instrumented.h
        locking/atomic, asm-generic: Add asm-generic/atomic-instrumented.h
        locking/xchg/alpha: Remove superfluous memory barriers from the _local() variants
        tools/memory-model: Finish the removal of rb-dep, smp_read_barrier_depends(), and lockless_dereference()
        tools/memory-model: Add documentation of new litmus test
        tools/memory-model: Remove mention of docker/gentoo image
        locking/memory-barriers: De-emphasize smp_read_barrier_depends() some more
        locking/lockdep: Show unadorned pointers
        mutex: Drop linkage.h from mutex.h
        tools/memory-model: Remove rb-dep, smp_read_barrier_depends, and lockless_dereference
        tools/memory-model: Convert underscores to hyphens
        tools/memory-model: Add a S lock-based external-view litmus test
        tools/memory-model: Add required herd7 version to README file
        ...
      701f3b31
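For readers who have not seen a litmus test, the message-passing pattern is a common starting point. The fragment below is a sketch in the herd7 litmus input format, written from general knowledge of that format rather than copied from this pull; the test name and the variable names x and y are illustrative. It is input for herd7, not compilable C.

```
C MP+release+acquire

{}

P0(int *x, int *y)
{
	WRITE_ONCE(*x, 1);
	smp_store_release(y, 1);
}

P1(int *x, int *y)
{
	int r0;
	int r1;

	r0 = smp_load_acquire(y);
	r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 1:r1=0)
```

The `exists` clause asks whether P1 can observe the flag y set while still reading the old value of x. With the release store paired with the acquire load, the model is expected to report that this outcome cannot happen. Per the README in tools/memory-model/, such a file is typically checked with something like `herd7 -conf linux-kernel.cfg <file>.litmus`.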
    • Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8747a291
      Linus Torvalds authored
      Pull RCU updates from Ingo Molnar:
       "The main RCU subsystem changes in this cycle were:
      
        - Miscellaneous fixes, perhaps most notably removing obsolete code
          whose only purpose in life was to gather information for the
          now-removed RCU debugfs facility. Other notable changes include
          removing NO_HZ_FULL_ALL in favor of the nohz_full kernel boot
          parameter, minor optimizations for expedited grace periods, some
          added tracing, creating an RCU-specific workqueue using Tejun's new
          WQ_MEM_RECLAIM flag, and several cleanups to code and comments.
      
        - SRCU cleanups and optimizations.
      
        - Torture-test updates, perhaps most notably the adding of ARMv8
          support, but also including numerous cleanups and usability fixes"
      
      * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
        rcu: Create RCU-specific workqueues with rescuers
        torture: Provide more sensible nreader/nwriter defaults for rcuperf
        torture: Grace periods do not piggyback off of themselves
        torture: Adjust rcuperf trace processing to allow for workqueues
        torture: Default jitter off when running rcuperf
        torture: Specify qemu memory size with --memory argument
        rcutorture: Add basic ARM64 support to run scripts
        rcutorture: Update kvm.sh header comment
        rcutorture: Record which grace-period primitives are tested
        rcutorture: Re-enable testing of dynamic expediting
        rcutorture: Avoid fake-writer use of undefined primitives
        rcutorture: Abstract function and module names
        rcutorture: Replace multi-instance kzalloc() with kcalloc()
        rcu: Remove SRCU throttling
        srcu: Remove dead code in srcu_gp_end()
        srcu: Reduce scans of srcu_data in counter wrap check
        srcu: Prevent sdp->srcu_gp_seq_needed_exp counter wrap
        srcu: Abstract function name
        rcu: Make expedited RCU CPU selection avoid unnecessary stores
        rcu: Trace expedited GP delays due to transitioning CPUs
        ...
      8747a291
    • Merge branch 'core-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · cc67ccec
      Linus Torvalds authored
      Pull header file cleanup from Ingo Molnar:
       "Reduce <linux/interrupt.h> dependencies: a single change that drops
        two #includes from this frequently used kernel header"
      
      * 'core-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        headers: Drop two #included headers from <linux/interrupt.h>
      cc67ccec
    • Merge branch 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 61d1757f
      Linus Torvalds authored
      Pull debugobjects updates from Ingo Molnar:
       "Misc improvements:
      
         - add better instrumentation/debugging
      
         - optimize the freeing logic to improve performance"
      
      * 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        debugobjects: Avoid another unused variable warning
        debugobjects: Fix debug_objects_freed accounting
        debugobjects: Use global free list in __debug_check_no_obj_freed()
        debugobjects: Use global free list in free_object()
        debugobjects: Add global free list and the counter
        debugobjects: Export max loops counter
      61d1757f
    • Merge branch 'core-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 54dce3c3
      Linus Torvalds authored
      Pull misc core updates from Ingo Molnar:
       "Two changes:
      
        - add membarriers to Documentation/features/
      
        - fix a minor nit in panic printk formatting"
      
      * 'core-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        panic: Add closing panic marker parenthesis
        Documentation/features, membarriers: Document membarrier-sync-core architecture support
        Documentation/features: Allow comments in arch features files
      54dce3c3
  5. Apr 02, 2018
    • Merge tag 'drm-for-v4.17' of git://people.freedesktop.org/~airlied/linux · 320b164a
      Linus Torvalds authored
      Pull drm updates from Dave Airlie:
       "Cannonlake and Vega12 support are probably the two major things. This
        pull lacks nouveau: Ben had some unforeseen leave and a few other
        blockers, so we'll see how things look or maybe leave it for this merge
        window.
      
        core:
         - Device links to handle sound/gpu pm dependency
         - Color encoding/range properties
         - Plane clipping into plane check helper
         - Backlight helpers
         - DP TP4 + HBR3 helper support
      
        amdgpu:
         - Vega12 support
         - Enable DC by default on all supported GPUs
         - Powerplay restructuring and cleanup
         - DC bandwidth calc updates
         - DC backlight on pre-DCE11
         - TTM backing store dropping support
         - SR-IOV fixes
         - Adding "wattman" like functionality
         - DC crc support
         - Improved DC dual-link handling
      
        amdkfd:
         - GPUVM support for dGPU
         - KFD events for dGPU
         - Enable PCIe atomics for dGPUs
         - HSA process eviction support
         - Live-lock fixes for process eviction
         - VM page table allocation fix for large-bar systems
      
        panel:
         - Raydium RM68200
         - AUO G104SN02 V2
         - KEO TX31D200VM0BAA
         - ARM Versatile panels
      
        i915:
         - Cannonlake support enabled
         - AUX-F port support added
         - Icelake base enabling until internal milestone of forcewake support
         - Query uAPI interface (used for GPU topology information currently)
         - Compressed framebuffer support for sprites
         - kmem cache shrinking when GPU is idle
         - Avoid boosting GPU when waited item is being processed already
         - Avoid retraining LSPCON link unnecessarily
         - Decrease request signaling latency
         - Deprecation of I915_SET_COLORKEY_NONE
         - Kerneldoc and compiler warning cleanup for upcoming CI enforcements
         - Full range ycbcr toggling
         - HDCP support
      
        i915/gvt:
         - Big refactor for shadow ppgtt
         - KBL context save/restore via LRI cmd (Weinan)
         - Properly unmap dma for guest page (Changbin)
      
        vmwgfx:
         - Lots of various improvements
      
        etnaviv:
         - Use the drm gpu scheduler
         - prep work for GC7000L support
      
        vc4:
         - fix alpha blending
         - Expose perf counters to userspace
      
        pl111:
         - Bandwidth checking/limiting
         - Versatile panel support
      
        sun4i:
         - A83T HDMI support
         - A80 support
         - YUV plane support
         - H3/H5 HDMI support
      
        omapdrm:
         - HPD support for DVI connector
         - remove lots of static variables
      
        msm:
         - DSI updates from 10nm / SDM845
         - fix for race condition with a3xx/a4xx fence completion irq
         - some refactoring/prep work for eventual a6xx support (ie. when we
           have a userspace)
         - a5xx debugfs enhancements
         - some mdp5 fixes/cleanups to prepare for eventually merging
           writeback support (ie. when we have a userspace)
      
        tegra:
         - mmap() fixes for fbdev devices
         - Overlay plane for hw cursor fix
         - dma-buf cache maintenance support
      
        mali-dp:
         - YUV->RGB conversion support
      
        rockchip:
         - rk3399/chromebook fixes and improvements
      
        rcar-du:
         - LVDS support move to drm bridge
         - DT bindings for R8A77995
         - Driver/DT support for R8A77970
      
        tilcdc:
         - DRM panel support"
      
      * tag 'drm-for-v4.17' of git://people.freedesktop.org/~airlied/linux: (1646 commits)
        drm/i915: Fix hibernation with ACPI S0 target state
        drm/i915/execlists: Use a locked clear_bit() for synchronisation with interrupt
        drm/i915: Specify which engines to reset following semaphore/event lockups
        drm/i915/dp: Write to SET_POWER dpcd to enable MST hub.
        drm/amdkfd: Use ordered workqueue to restore processes
        drm/amdgpu: Fix acquiring VM on large-BAR systems
        drm/amd/pp: clean header file hwmgr.h
        drm/amd/pp: use mlck_table.count for array loop index limit
        drm: Fix uabi regression by allowing garbage mode->type from userspace
        drm/amdgpu: Add an ATPX quirk for hybrid laptop
        drm/amdgpu: fix spelling mistake: "asssert" -> "assert"
        drm/amd/pp: Add new asic support in pp_psm.c
        drm/amd/pp: Clean up powerplay code on Vega12
        drm/amd/pp: Add smu irq handlers for legacy asics
        drm/amd/pp: Fix set wrong temperature range on smu7
        drm/amdgpu: Don't change preferred domian when fallback GTT v5
        drm/vmwgfx: Bump version patchlevel and date
        drm/vmwgfx: use monotonic event timestamps
        drm/vmwgfx: Unpin the screen object backup buffer when not used
        drm/vmwgfx: Stricter count of legacy surface device resources
        ...
      320b164a
    • Linux 4.16 · 0adb3285
      Linus Torvalds authored
      0adb3285
  6. Apr 01, 2018
  7. Mar 31, 2018