Skip to content
  1. Aug 30, 2014
    • Vivek Goyal's avatar
      kexec: create a new config option CONFIG_KEXEC_FILE for new syscall · 74ca317c
      Vivek Goyal authored
      
      
      Currently new system call kexec_file_load() and all the associated code
      compiles if CONFIG_KEXEC=y.  But new syscall also compiles purgatory
      code which currently uses gcc option -mcmodel=large.  This option seems
      to be available only gcc 4.4 onwards.
      
      Hiding new functionality behind a new config option will not break
      existing users of old gcc.  Those who wish to enable new functionality
      will require new gcc.  Having said that, I am trying to figure out how
      can I move away from using -mcmodel=large but that can take a while.
      
      I think there are other advantages of introducing this new config
      option.  As this option will be enabled only on x86_64, other arches
      don't have to compile generic kexec code which will never be used.  This
      new code selects CRYPTO=y and CRYPTO_SHA256=y.  And all other arches had
      to do this for CONFIG_KEXEC.  Now with introduction of new config
      option, we can remove crypto dependency from other arches.
      
      Now CONFIG_KEXEC_FILE is available only on x86_64.  So whereever I had
      CONFIG_X86_64 defined, I got rid of that.
      
      For CONFIG_KEXEC_FILE, instead of doing select CRYPTO=y, I changed it to
      "depends on CRYPTO=y".  This should be safer as "select" is not
      recursive.
      
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Tested-by: default avatarShaun Ruffell <sruffell@digium.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      74ca317c
    • Hugh Dickins's avatar
      x86,mm: fix pte_special versus pte_numa · b38af472
      Hugh Dickins authored
      
      
      Sasha Levin has shown oopses on ffffea0003480048 and ffffea0003480008 at
      mm/memory.c:1132, running Trinity on different 3.16-rc-next kernels:
      where zap_pte_range() checks page->mapping to see if PageAnon(page).
      
      Those addresses fit struct pages for pfns d2001 and d2000, and in each
      dump a register or a stack slot showed d2001730 or d2000730: pte flags
      0x730 are PCD ACCESSED PROTNONE SPECIAL IOMAP; and Sasha's e820 map has
      a hole between cfffffff and 100000000, which would need special access.
      
      Commit c46a7c81 ("x86: define _PAGE_NUMA by reusing software bits on
      the PMD and PTE levels") has broken vm_normal_page(): a PROTNONE SPECIAL
      pte no longer passes the pte_special() test, so zap_pte_range() goes on
      to try to access a non-existent struct page.
      
      Fix this by refining pte_special() (SPECIAL with PRESENT or PROTNONE) to
      complement pte_numa() (SPECIAL with neither PRESENT nor PROTNONE).  A
      hint that this was a problem was that c46a7c81 added pte_numa() test
      to vm_normal_page(), and moved its is_zero_pfn() test from slow to fast
      path: This was papering over a pte_special() snag when the zero page was
      encountered during zap.  This patch reverts vm_normal_page() to how it
      was before, relying on pte_special().
      
      It still appears that this patch may be incomplete: aren't there other
      places which need to be handling PROTNONE along with PRESENT?  For
      example, pte_mknuma() clears _PAGE_PRESENT and sets _PAGE_NUMA, but on a
      PROT_NONE area, that would make it pte_special().  This is side-stepped
      by the fact that NUMA hinting faults skipped PROT_NONE VMAs and there
      are no grounds where a NUMA hinting fault on a PROT_NONE VMA would be
      interesting.
      
      Fixes: c46a7c81 ("x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels")
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: <stable@vger.kernel.org>	[3.16]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b38af472
  2. Aug 19, 2014
  3. Aug 16, 2014
  4. Aug 13, 2014
  5. Aug 11, 2014
    • David Vrabel's avatar
      x86/xen: use vmap() to map grant table pages in PVH guests · 7d951f3c
      David Vrabel authored
      
      
      Commit b7dd0e35 (x86/xen: safely map and unmap grant frames when
      in atomic context) causes PVH guests to crash in
      arch_gnttab_map_shared() when they attempted to map the pages for the
      grant table.
      
      This use of a PV-specific function during the PVH grant table setup is
      non-obvious and not needed.  The standard vmap() function does the
      right thing.
      
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reported-by: default avatarMukesh Rathor <mukesh.rathor@oracle.com>
      Tested-by: default avatarMukesh Rathor <mukesh.rathor@oracle.com>
      Cc: stable@vger.kernel.org
      7d951f3c
    • David Vrabel's avatar
      x86/xen: resume timer irqs early · 8d5999df
      David Vrabel authored
      
      
      If the timer irqs are resumed during device resume it is possible in
      certain circumstances for the resume to hang early on, before device
      interrupts are resumed.  For an Ubuntu 14.04 PVHVM guest this would
      occur in ~0.5% of resume attempts.
      
      It is not entirely clear what is occuring the point of the hang but I
      think a task necessary for the resume calls schedule_timeout(),
      waiting for a timer interrupt (which never arrives).  This failure may
      require specific tasks to be running on the other VCPUs to trigger
      (processes are not frozen during a suspend/resume if PREEMPT is
      disabled).
      
      Add IRQF_EARLY_RESUME to the timer interrupts so they are resumed in
      syscore_resume().
      
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: stable@vger.kernel.org
      8d5999df
  6. Aug 10, 2014
  7. Aug 09, 2014
    • Vivek Goyal's avatar
      kexec: verify the signature of signed PE bzImage · 8e7d8381
      Vivek Goyal authored
      
      
      This is the final piece of the puzzle of verifying kernel image signature
      during kexec_file_load() syscall.
      
      This patch calls into PE file routines to verify signature of bzImage.  If
      signature are valid, kexec_file_load() succeeds otherwise it fails.
      
      Two new config options have been introduced.  First one is
      CONFIG_KEXEC_VERIFY_SIG.  This option enforces that kernel has to be
      validly signed otherwise kernel load will fail.  If this option is not
      set, no signature verification will be done.  Only exception will be when
      secureboot is enabled.  In that case signature verification should be
      automatically enforced when secureboot is enabled.  But that will happen
      when secureboot patches are merged.
      
      Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG.  This option
      enables signature verification support on bzImage.  If this option is not
      set and previous one is set, kernel image loading will fail because kernel
      does not have support to verify signature of bzImage.
      
      I tested these patches with both "pesign" and "sbsign" signed bzImages.
      
      I used signing_key.priv key and signing_key.x509 cert for signing as
      generated during kernel build process (if module signing is enabled).
      
      Used following method to sign bzImage.
      
      pesign
      ======
      - Convert DER format cert to PEM format cert
      openssl x509 -in signing_key.x509 -inform DER -out signing_key.x509.PEM -outform
      PEM
      
      - Generate a .p12 file from existing cert and private key file
      openssl pkcs12 -export -out kernel-key.p12 -inkey signing_key.priv -in
      signing_key.x509.PEM
      
      - Import .p12 file into pesign db
      pk12util -i /tmp/kernel-key.p12 -d /etc/pki/pesign
      
      - Sign bzImage
      pesign -i /boot/vmlinuz-3.16.0-rc3+ -o /boot/vmlinuz-3.16.0-rc3+.signed.pesign
      -c "Glacier signing key - Magrathea" -s
      
      sbsign
      ======
      sbsign --key signing_key.priv --cert signing_key.x509.PEM --output
      /boot/vmlinuz-3.16.0-rc3+.signed.sbsign /boot/vmlinuz-3.16.0-rc3+
      
      Patch details:
      
      Well all the hard work is done in previous patches.  Now bzImage loader
      has just call into that code and verify whether bzImage signature are
      valid or not.
      
      Also create two config options.  First one is CONFIG_KEXEC_VERIFY_SIG.
      This option enforces that kernel has to be validly signed otherwise kernel
      load will fail.  If this option is not set, no signature verification will
      be done.  Only exception will be when secureboot is enabled.  In that case
      signature verification should be automatically enforced when secureboot is
      enabled.  But that will happen when secureboot patches are merged.
      
      Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG.  This option
      enables signature verification support on bzImage.  If this option is not
      set and previous one is set, kernel image loading will fail because kernel
      does not have support to verify signature of bzImage.
      
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Matt Fleming <matt@console-pimps.org>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8e7d8381
    • Vivek Goyal's avatar
      kexec: support kexec/kdump on EFI systems · 6a2c20e7
      Vivek Goyal authored
      
      
      This patch does two things.  It passes EFI run time mappings to second
      kernel in bootparams efi_info.  Second kernel parse this info and create
      new mappings in second kernel.  That means mappings in first and second
      kernel will be same.  This paves the way to enable EFI in kexec kernel.
      
      This patch also prepares and passes EFI setup data through bootparams.
      This contains bunch of information about various tables and their
      addresses.
      
      These information gathering and passing has been written along the lines
      of what current kexec-tools is doing to make kexec work with UEFI.
      
      [akpm@linux-foundation.org: s/get_efi/efi_get/g, per Matt]
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Matt Fleming <matt@console-pimps.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6a2c20e7
    • Vivek Goyal's avatar
      kexec: support for kexec on panic using new system call · dd5f7260
      Vivek Goyal authored
      
      
      This patch adds support for loading a kexec on panic (kdump) kernel usning
      new system call.
      
      It prepares ELF headers for memory areas to be dumped and for saved cpu
      registers.  Also prepares the memory map for second kernel and limits its
      boot to reserved areas only.
      
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dd5f7260
    • Vivek Goyal's avatar
      kexec-bzImage64: support for loading bzImage using 64bit entry · 27f48d3e
      Vivek Goyal authored
      
      
      This is loader specific code which can load bzImage and set it up for
      64bit entry.  This does not take care of 32bit entry or real mode entry.
      
      32bit mode entry can be implemented if somebody needs it.
      
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      27f48d3e
    • Vivek Goyal's avatar
      kexec: load and relocate purgatory at kernel load time · 12db5562
      Vivek Goyal authored
      
      
      Load purgatory code in RAM and relocate it based on the location.
      Relocation code has been inspired by module relocation code and purgatory
      relocation code in kexec-tools.
      
      Also compute the checksums of loaded kexec segments and store them in
      purgatory.
      
      Arch independent code provides this functionality so that arch dependent
      bootloaders can make use of it.
      
      Helper functions are provided to get/set symbol values in purgatory which
      are used by bootloaders later to set things like stack and entry point of
      second kernel etc.
      
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      12db5562
    • Vivek Goyal's avatar
      purgatory: core purgatory functionality · 8fc5b4d4
      Vivek Goyal authored
      
      
      Create a stand alone relocatable object purgatory which runs between two
      kernels.  This name, concept and some code has been taken from
      kexec-tools.  Idea is that this code runs after a crash and it runs in
      minimal environment.  So keep it separate from rest of the kernel and in
      long term we will have to practically do no maintenance of this code.
      
      This code also has the logic to do verify sha256 hashes of various
      segments which have been loaded into memory.  So first we verify that the
      kernel we are jumping to is fine and has not been corrupted and make
      progress only if checsums are verified.
      
      This code also takes care of copying some memory contents to backup region.
      
      [sfr@canb.auug.org.au: run host built programs from objtree]
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8fc5b4d4
    • Vivek Goyal's avatar
      purgatory/sha256: provide implementation of sha256 in purgaotory context · daeba064
      Vivek Goyal authored
      
      
      Next two patches provide code for purgatory.  This is a code which does
      not link against the kernel and runs stand alone.  This code runs between
      two kernels.  One of the primary purpose of this code is to verify the
      digest of newly loaded kernel and making sure it matches the digest
      computed at kernel load time.
      
      We use sha256 for calculating digest of kexec segmetns.  Purgatory can't
      use stanard crypto API as that API is not available in purgatory context.
      
      Hence, I have copied code from crypto/sha256_generic.c and compiled it
      with purgaotry code so that it could be used.  I could not #include
      sha256_generic.c file here as some of the function signature requiered
      little tweaking.  Original functions work with crypto API but these ones
      don't
      
      So instead of doing #include on sha256_generic.c I just copied relevant
      portions of code into arch/x86/purgatory/sha256.c.  Now we shouldn't have
      to touch this code at all.  Do let me know if there are better ways to
      handle it.
      
      This patch does not enable compiling of this code.  That happens in next
      patch.  I wanted to highlight this change in a separate patch for easy
      review.
      
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      daeba064
    • Vivek Goyal's avatar
      kexec: implementation of new syscall kexec_file_load · cb105258
      Vivek Goyal authored
      
      
      Previous patch provided the interface definition and this patch prvides
      implementation of new syscall.
      
      Previously segment list was prepared in user space.  Now user space just
      passes kernel fd, initrd fd and command line and kernel will create a
      segment list internally.
      
      This patch contains generic part of the code.  Actual segment preparation
      and loading is done by arch and image specific loader.  Which comes in
      next patch.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      cb105258
    • Vivek Goyal's avatar
      kexec: new syscall kexec_file_load() declaration · f0895685
      Vivek Goyal authored
      
      
      This is the new syscall kexec_file_load() declaration/interface.  I have
      reserved the syscall number only for x86_64 so far.  Other architectures
      (including i386) can reserve syscall number when they enable the support
      for this new syscall.
      
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f0895685
    • Vivek Goyal's avatar
      kernel: build bin2c based on config option CONFIG_BUILD_BIN2C · de5b56ba
      Vivek Goyal authored
      
      
      currently bin2c builds only if CONFIG_IKCONFIG=y. But bin2c will now be
      used by kexec too.  So make it compilation dependent on CONFIG_BUILD_BIN2C
      and this config option can be selected by CONFIG_KEXEC and CONFIG_IKCONFIG.
      
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      de5b56ba
    • David Herrmann's avatar
      shm: add memfd_create() syscall · 9183df25
      David Herrmann authored
      
      
      memfd_create() is similar to mmap(MAP_ANON), but returns a file-descriptor
      that you can pass to mmap().  It can support sealing and avoids any
      connection to user-visible mount-points.  Thus, it's not subject to quotas
      on mounted file-systems, but can be used like malloc()'ed memory, but with
      a file-descriptor to it.
      
      memfd_create() returns the raw shmem file, so calls like ftruncate() can
      be used to modify the underlying inode.  Also calls like fstat() will
      return proper information and mark the file as regular file.  If you want
      sealing, you can specify MFD_ALLOW_SEALING.  Otherwise, sealing is not
      supported (like on all other regular files).
      
      Compared to O_TMPFILE, it does not require a tmpfs mount-point and is not
      subject to a filesystem size limit.  It is still properly accounted to
      memcg limits, though, and to the same overcommit or no-overcommit
      accounting as all user memory.
      
      Signed-off-by: default avatarDavid Herrmann <dh.herrmann@gmail.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Ryan Lortie <desrt@desrt.ca>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Daniel Mack <zonque@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9183df25
    • Daniel Walter's avatar
      arch/x86: replace strict_strto calls · 164109e3
      Daniel Walter authored
      
      
      Replace obsolete strict_strto calls with appropriate kstrto calls
      
      Signed-off-by: default avatarDaniel Walter <dwalter@google.com>
      Acked-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      164109e3
    • Andy Lutomirski's avatar
      arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area · a6c19dfe
      Andy Lutomirski authored
      
      
      The core mm code will provide a default gate area based on
      FIXADDR_USER_START and FIXADDR_USER_END if
      !defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR).
      
      This default is only useful for ia64.  arm64, ppc, s390, sh, tile, 64-bit
      UML, and x86_32 have their own code just to disable it.  arm, 32-bit UML,
      and x86_64 have gate areas, but they have their own implementations.
      
      This gets rid of the default and moves the code into ia64.
      
      This should save some code on architectures without a gate area: it's now
      possible to inline the gate_area functions in the default case.
      
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Acked-by: default avatarNathan Lynch <nathan_lynch@mentor.com>
      Acked-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [in principle]
      Acked-by: Richard Weinberger <richard@nod.at> [for um]
      Acked-by: Will Deacon <will.deacon@arm.com> [for arm64]
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Nathan Lynch <Nathan_Lynch@mentor.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a6c19dfe
    • Laura Abbott's avatar
      lib/scatterlist: make ARCH_HAS_SG_CHAIN an actual Kconfig · 308c09f1
      Laura Abbott authored
      
      
      Rather than have architectures #define ARCH_HAS_SG_CHAIN in an
      architecture specific scatterlist.h, make it a proper Kconfig option and
      use that instead.  At same time, remove the header files are are now
      mostly useless and just include asm-generic/scatterlist.h.
      
      [sfr@canb.auug.org.au: powerpc files now need asm/dma.h]
      Signed-off-by: default avatarLaura Abbott <lauraa@codeaurora.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>			[x86]
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	[powerpc]
      Acked-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      308c09f1
  8. Aug 08, 2014
  9. Aug 07, 2014
  10. Aug 06, 2014
    • Richard Weinberger's avatar
      um: Use get_signal() signal_setup_done() · 307627ee
      Richard Weinberger authored
      
      
      Use the more generic functions get_signal() signal_setup_done()
      for signal delivery.
      
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      307627ee
    • Thomas Gleixner's avatar
      x86: MCE: Add raw_lock conversion again · ed5c41d3
      Thomas Gleixner authored
      
      
      Commit ea431643 ("x86/mce: Fix CMCI preemption bugs") breaks RT by
      the completely unrelated conversion of the cmci_discover_lock to a
      regular (non raw) spinlock.  This lock was annotated in commit
      59d958d2 ("locking, x86: mce: Annotate cmci_discover_lock as raw")
      with a proper explanation why.
      
      The argument for converting the lock back to a regular spinlock was:
      
       - it does percpu ops without disabling preemption. Preemption is not
         disabled due to the mistaken use of a raw spinlock.
      
      Which is complete nonsense.  The raw_spinlock is disabling preemption in
      the same way as a regular spinlock.  In mainline spinlock maps to
      raw_spinlock, in RT spinlock becomes a "sleeping" lock.
      
      raw_spinlock has on RT exactly the same semantics as in mainline.  And
      because this lock is taken in non preemptible context it must be raw on
      RT.
      
      Undo the locking brainfart.
      
      Reported-by: default avatarClark Williams <williams@redhat.com>
      Reported-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ed5c41d3
    • Matt Fleming's avatar
      x86/efi: Enforce CONFIG_RELOCATABLE for EFI boot stub · 7b2a583a
      Matt Fleming authored
      
      
      Without CONFIG_RELOCATABLE the early boot code will decompress the
      kernel to LOAD_PHYSICAL_ADDR. While this may have been fine in the BIOS
      days, that isn't going to fly with UEFI since parts of the firmware
      code/data may be located at LOAD_PHYSICAL_ADDR.
      
      Straying outside of the bounds of the regions we've explicitly requested
      from the firmware will cause all sorts of trouble. Bruno reports that
      his machine resets while trying to decompress the kernel image.
      
      We already go to great pains to ensure the kernel is loaded into a
      suitably aligned buffer, it's just that the address isn't necessarily
      LOAD_PHYSICAL_ADDR, because we can't guarantee that address isn't in-use
      by the firmware.
      
      Explicitly enforce CONFIG_RELOCATABLE for the EFI boot stub, so that we
      can load the kernel at any address with the correct alignment.
      
      Reported-by: default avatarBruno Prémont <bonbons@linux-vserver.org>
      Tested-by: default avatarBruno Prémont <bonbons@linux-vserver.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      7b2a583a
    • Theodore Ts'o's avatar
      random: introduce getrandom(2) system call · c6e9d6f3
      Theodore Ts'o authored
      
      
      The getrandom(2) system call was requested by the LibreSSL Portable
      developers.  It is analoguous to the getentropy(2) system call in
      OpenBSD.
      
      The rationale of this system call is to provide resiliance against
      file descriptor exhaustion attacks, where the attacker consumes all
      available file descriptors, forcing the use of the fallback code where
      /dev/[u]random is not available.  Since the fallback code is often not
      well-tested, it is better to eliminate this potential failure mode
      entirely.
      
      The other feature provided by this new system call is the ability to
      request randomness from the /dev/urandom entropy pool, but to block
      until at least 128 bits of entropy has been accumulated in the
      /dev/urandom entropy pool.  Historically, the emphasis in the
      /dev/urandom development has been to ensure that urandom pool is
      initialized as quickly as possible after system boot, and preferably
      before the init scripts start execution.
      
      This is because changing /dev/urandom reads to block represents an
      interface change that could potentially break userspace which is not
      acceptable.  In practice, on most x86 desktop and server systems, in
      general the entropy pool can be initialized before it is needed (and
      in modern kernels, we will printk a warning message if not).  However,
      on an embedded system, this may not be the case.  And so with this new
      interface, we can provide the functionality of blocking until the
      urandom pool has been initialized.  Any userspace program which uses
      this new functionality must take care to assure that if it is used
      during the boot process, that it will not cause the init scripts or
      other portions of the system startup to hang indefinitely.
      
      SYNOPSIS
      	#include <linux/random.h>
      
      	int getrandom(void *buf, size_t buflen, unsigned int flags);
      
      DESCRIPTION
      	The system call getrandom() fills the buffer pointed to by buf
      	with up to buflen random bytes which can be used to seed user
      	space random number generators (i.e., DRBG's) or for other
      	cryptographic uses.  It should not be used for Monte Carlo
      	simulations or other programs/algorithms which are doing
      	probabilistic sampling.
      
      	If the GRND_RANDOM flags bit is set, then draw from the
      	/dev/random pool instead of the /dev/urandom pool.  The
      	/dev/random pool is limited based on the entropy that can be
      	obtained from environmental noise, so if there is insufficient
      	entropy, the requested number of bytes may not be returned.
      	If there is no entropy available at all, getrandom(2) will
      	either block, or return an error with errno set to EAGAIN if
      	the GRND_NONBLOCK bit is set in flags.
      
      	If the GRND_RANDOM bit is not set, then the /dev/urandom pool
      	will be used.  Unlike using read(2) to fetch data from
      	/dev/urandom, if the urandom pool has not been sufficiently
      	initialized, getrandom(2) will block (or return -1 with the
      	errno set to EAGAIN if the GRND_NONBLOCK bit is set in flags).
      
      	The getentropy(2) system call in OpenBSD can be emulated using
      	the following function:
      
                  int getentropy(void *buf, size_t buflen)
                  {
                          int     ret;
      
                          if (buflen > 256)
                                  goto failure;
                          ret = getrandom(buf, buflen, 0);
                          if (ret < 0)
                                  return ret;
                          if (ret == buflen)
                                  return 0;
                  failure:
                          errno = EIO;
                          return -1;
                  }
      
      RETURN VALUE
             On success, the number of bytes that was filled in the buf is
             returned.  This may not be all the bytes requested by the
             caller via buflen if insufficient entropy was present in the
             /dev/random pool, or if the system call was interrupted by a
             signal.
      
             On error, -1 is returned, and errno is set appropriately.
      
      ERRORS
      	EINVAL		An invalid flag was passed to getrandom(2)
      
      	EFAULT		buf is outside the accessible address space.
      
      	EAGAIN		The requested entropy was not available, and
      			getentropy(2) would have blocked if the
      			GRND_NONBLOCK flag was not set.
      
      	EINTR		While blocked waiting for entropy, the call was
      			interrupted by a signal handler; see the description
      			of how interrupted read(2) calls on "slow" devices
      			are handled with and without the SA_RESTART flag
      			in the signal(7) man page.
      
      NOTES
      	For small requests (buflen <= 256) getrandom(2) will not
      	return EINTR when reading from the urandom pool once the
      	entropy pool has been initialized, and it will return all of
      	the bytes that have been requested.  This is the recommended
      	way to use getrandom(2), and is designed for compatibility
      	with OpenBSD's getentropy() system call.
      
      	However, if you are using GRND_RANDOM, then getrandom(2) may
      	block until the entropy accounting determines that sufficient
      	environmental noise has been gathered such that getrandom(2)
      	will be operating as a NRBG instead of a DRBG for those people
      	who are working in the NIST SP 800-90 regime.  Since it may
      	block for a long time, these guarantees do *not* apply.  The
      	user may want to interrupt a hanging process using a signal,
      	so blocking until all of the requested bytes are returned
      	would be unfriendly.
      
      	For this reason, the user of getrandom(2) MUST always check
      	the return value, in case it returns some error, or if fewer
      	bytes than requested was returned.  In the case of
      	!GRND_RANDOM and small request, the latter should never
      	happen, but the careful userspace code (and all crypto code
      	should be careful) should check for this anyway!
      
      	Finally, unless you are doing long-term key generation (and
      	perhaps not even then), you probably shouldn't be using
      	GRND_RANDOM.  The cryptographic algorithms used for
      	/dev/urandom are quite conservative, and so should be
      	sufficient for all purposes.  The disadvantage of GRND_RANDOM
      	is that it can block, and the increased complexity required to
      	deal with partially fulfilled getrandom(2) requests.
      
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarZach Brown <zab@zabbo.net>
      c6e9d6f3
  11. Aug 05, 2014
  12. Aug 03, 2014
    • Dan Carpenter's avatar
      x86/pmc_atom: Silence shift wrapping warnings in pmc_sleep_tmr_show() · 4c51cb00
      Dan Carpenter authored
      
      
      I don't know if we really need 64 bits here but these variables are
      declared as u64 and it can't hurt to cast this so we prevent any shift
      wrapping.
      
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: default avatarAubrey Li <aubrey.li@linux.intel.com>
      Link: http://lkml.kernel.org/r/20140801082715.GE28869@mwanda
      
      
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      4c51cb00
    • Alexei Starovoitov's avatar
      net: filter: split 'struct sk_filter' into socket and bpf parts · 7ae457c1
      Alexei Starovoitov authored
      
      
      clean up names related to socket filtering and bpf in the following way:
      - everything that deals with sockets keeps 'sk_*' prefix
      - everything that is pure BPF is changed to 'bpf_*' prefix
      
      split 'struct sk_filter' into
      struct sk_filter {
      	atomic_t        refcnt;
      	struct rcu_head rcu;
      	struct bpf_prog *prog;
      };
      and
      struct bpf_prog {
              u32                     jited:1,
                                      len:31;
              struct sock_fprog_kern  *orig_prog;
              unsigned int            (*bpf_func)(const struct sk_buff *skb,
                                                  const struct bpf_insn *filter);
              union {
                      struct sock_filter      insns[0];
                      struct bpf_insn         insnsi[0];
                      struct work_struct      work;
              };
      };
      so that 'struct bpf_prog' can be used independent of sockets and cleans up
      'unattached' bpf use cases
      
      split SK_RUN_FILTER macro into:
          SK_RUN_FILTER to be used with 'struct sk_filter *' and
          BPF_PROG_RUN to be used with 'struct bpf_prog *'
      
      __sk_filter_release(struct sk_filter *) gains
      __bpf_prog_release(struct bpf_prog *) helper function
      
      also perform related renames for the functions that work
      with 'struct bpf_prog *', since they're on the same lines:
      
      sk_filter_size -> bpf_prog_size
      sk_filter_select_runtime -> bpf_prog_select_runtime
      sk_filter_free -> bpf_prog_free
      sk_unattached_filter_create -> bpf_prog_create
      sk_unattached_filter_destroy -> bpf_prog_destroy
      sk_store_orig_filter -> bpf_prog_store_orig_filter
      sk_release_orig_filter -> bpf_release_orig_filter
      __sk_migrate_filter -> bpf_migrate_filter
      __sk_prepare_filter -> bpf_prepare_filter
      
      API for attaching classic BPF to a socket stays the same:
      sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *)
      and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program
      which is used by sockets, tun, af_packet
      
      API for 'unattached' BPF programs becomes:
      bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *)
      and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program
      which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7ae457c1