  1. Jan 30, 2013
    • x86: Don't panic if can not alloc buffer for swiotlb · ac2cbab2
      Yinghai Lu authored
      
      
      On the normal boot path of a system with IOMMU support, the swiotlb
      buffer is allocated early; the kernel then tries to initialize the
      IOMMU, and if the Intel or AMD IOMMU can be set up properly, the
      swiotlb buffer is freed.

      The early allocation uses bootmem and can panic when kdump runs with
      its usable memory entirely above 4G, or when memmap= is used to limit
      memory to under 4G; for example, memmap=4095M$1M removes the memory
      under 4G.

      Following Eric's suggestion, add a _nopanic version and a
      no_iotlb_memory flag, so map_single can fail later if swiotlb is
      still needed.

      -v2: don't pass nopanic; use an -ENOMEM return value instead, per Eric.
           Panic early instead of panicking via swiotlb_full, per Eric/Konrad.
      -v3: make swiotlb_init() non-panicking; this affects:
           arm64, ia64, powerpc, tile, unicore32, x86.
      -v4: clean up swiotlb_init() by removing swiotlb_init_with_default_size().
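
      A sketch of the resulting swiotlb_init(), pieced together from the
      description above (simplified; details may differ from the final
      commit, and the nopanic allocator comes from the next patch):

        void __init swiotlb_init(int verbose)
        {
                size_t default_size = 64UL << 20;       /* default to 64MB */
                unsigned char *vstart;
                unsigned long bytes;

                if (!io_tlb_nslabs) {
                        io_tlb_nslabs = (default_size >> IO_TLB_SHIFT);
                        io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
                }

                bytes = io_tlb_nslabs << IO_TLB_SHIFT;

                /* low pages for the IO TLB; NULL on failure instead of panic */
                vstart = alloc_bootmem_low_pages_nopanic(PAGE_ALIGN(bytes));
                if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
                        return;

                pr_warn("Cannot allocate SWIOTLB buffer");
                no_iotlb_memory = true; /* map_single() will fail with -ENOMEM */
        }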
      
      Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-36-git-send-email-yinghai@kernel.org
      Reviewed-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
      Cc: linux-mips@linux-mips.org
      Cc: xen-devel@lists.xensource.com
      Cc: virtualization@lists.linux-foundation.org
      Cc: Shuah Khan <shuahkhan@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • mm: Add alloc_bootmem_low_pages_nopanic() · 38fa4175
      Yinghai Lu authored
      
      
      We don't need to panic in some cases, such as swiotlb preallocation.
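
      A sketch of the new helper, following the existing bootmem naming
      (the actual body may differ in detail):

        /* include/linux/bootmem.h */
        #define alloc_bootmem_low_pages_nopanic(x) \
                __alloc_bootmem_low_nopanic(x, PAGE_SIZE, 0)

        /* mm/bootmem.c: like __alloc_bootmem_low(), but returns NULL on failure */
        void * __init __alloc_bootmem_low_nopanic(unsigned long size,
                                                  unsigned long align,
                                                  unsigned long goal)
        {
                return ___alloc_bootmem_nopanic(size, align, goal,
                                                ARCH_LOW_ADDRESS_LIMIT);
        }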
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-35-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, 64bit, mm: hibernate use generic mapping_init · 8b78c21d
      Yinghai Lu authored
      
      
      We should set up mappings only for usable memory ranges under max_pfn;
      otherwise we cause the same problem that was fixed by

      	x86, mm: Only direct map addresses that are marked as E820_RAM

      Make hibernation map only the ranges in the pfn_mapped array.
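
      A sketch of the resulting loop (struct range fields hold pfns; 'info'
      is the x86_mapping_info from the generic-helper patch in this series):

        int result, i;

        /* map only the usable RAM ranges recorded in pfn_mapped[] */
        for (i = 0; i < nr_pfn_mapped; i++) {
                unsigned long mstart = pfn_mapped[i].start << PAGE_SHIFT;
                unsigned long mend   = pfn_mapped[i].end   << PAGE_SHIFT;

                result = kernel_ident_mapping_init(&info, pgd, mstart, mend);
                if (result)
                        return result;
        }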
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-34-git-send-email-yinghai@kernel.org
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: linux-pm@vger.kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, 64bit, mm: Mark data/bss/brk to nx · 72212675
      Yinghai Lu authored
      
      
      As hpa said, we should not have RW and +x set at the same time.

      For this kernel layout:
      [    0.000000] Kernel Layout:
      [    0.000000]   .text: [0x01000000-0x021434f8]
      [    0.000000] .rodata: [0x02200000-0x02a13fff]
      [    0.000000]   .data: [0x02c00000-0x02dc763f]
      [    0.000000]   .init: [0x02dc9000-0x0312cfff]
      [    0.000000]    .bss: [0x0313b000-0x03dd6fff]
      [    0.000000]    .brk: [0x03dd7000-0x03dfffff]
      
      Before the patch, we have:
      ---[ High Kernel Mapping ]---
      0xffffffff80000000-0xffffffff81000000          16M                           pmd
      0xffffffff81000000-0xffffffff82200000          18M     ro         PSE GLB x  pmd
      0xffffffff82200000-0xffffffff82c00000          10M     ro         PSE GLB NX pmd
      0xffffffff82c00000-0xffffffff82dc9000        1828K     RW             GLB x  pte
      0xffffffff82dc9000-0xffffffff82e00000         220K     RW             GLB NX pte
      0xffffffff82e00000-0xffffffff83000000           2M     RW         PSE GLB NX pmd
      0xffffffff83000000-0xffffffff8313a000        1256K     RW             GLB NX pte
      0xffffffff8313a000-0xffffffff83200000         792K     RW             GLB x  pte
      0xffffffff83200000-0xffffffff83e00000          12M     RW         PSE GLB x  pmd
      0xffffffff83e00000-0xffffffffa0000000         450M                           pmd
      
      After the patch, we get:
      ---[ High Kernel Mapping ]---
      0xffffffff80000000-0xffffffff81000000          16M                           pmd
      0xffffffff81000000-0xffffffff82200000          18M     ro         PSE GLB x  pmd
      0xffffffff82200000-0xffffffff82c00000          10M     ro         PSE GLB NX pmd
      0xffffffff82c00000-0xffffffff82e00000           2M     RW             GLB NX pte
      0xffffffff82e00000-0xffffffff83000000           2M     RW         PSE GLB NX pmd
      0xffffffff83000000-0xffffffff83200000           2M     RW             GLB NX pte
      0xffffffff83200000-0xffffffff83e00000          12M     RW         PSE GLB NX pmd
      0xffffffff83e00000-0xffffffffa0000000         450M                           pmd
      
      So .data, .bss and .brk now get NX.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-33-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86: Merge early kernel reserve for 32bit and 64bit · 6c902b65
      Yinghai Lu authored
      
      
      They are the same, so move them out of head32.c/head64.c and into
      setup.c.

      We are using memblock, which handles overlapping ranges properly, so
      we don't need an early placeholder reservation; we just need to make
      sure the ranges are reserved before memblock is used to find free
      memory.
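
      The reservation itself is a single memblock call; a sketch of the
      merged version in setup.c:

        /* Reserve the kernel image (text through bss) before any memblock
         * allocation can hand that range out. */
        memblock_reserve(__pa_symbol(_text),
                         (unsigned long)__bss_stop - (unsigned long)_text);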
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-32-git-send-email-yinghai@kernel.org
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86: Add Crash kernel low reservation · 0212f915
      Yinghai Lu authored
      
      
      During the kdump kernel's boot, it needs to find low RAM for the
      swiotlb buffer when the system does not support Intel IOMMU/DMAR
      remapping.

      kexec-tools appends memmap=exactmap plus the "Crash kernel" range
      from /proc/iomem, and after boot protocol 2.12 that range can be
      above 4G on 64-bit.

      We need to add another range to /proc/iomem, "Crash kernel low", so
      kexec-tools can find that info and append it to the kdump kernel
      command line as well.

      Try to reserve some memory under 4G if the normal "Crash kernel"
      region is above 4G.

      The user can specify the size with crashkernel_low=XX[KMG].
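
      A simplified sketch of the low reservation (command-line parsing
      omitted; crashk_low_res is the new resource this patch adds):

        static void __init reserve_crashkernel_low(void)
        {
                unsigned long long low_base, low_size = 0;

                /* low_size comes from crashkernel_low=XX[KMG] (not shown) */
                if (!low_size)
                        return;

                /* find a block under 4G for the kdump kernel's swiotlb */
                low_base = memblock_find_in_range(0, 1ULL << 32,
                                                  low_size, PAGE_SIZE);
                if (!low_base) {
                        pr_info("crashkernel low reservation failed - no suitable area\n");
                        return;
                }

                memblock_reserve(low_base, low_size);
                crashk_low_res.start = low_base;
                crashk_low_res.end   = low_base + low_size - 1;
                insert_resource(&iomem_resource, &crashk_low_res);
        }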
      
      -v2: fix a warning found by Fengguang's test robot.
      -v3: move the get_mem_size change out to another patch, to fix a
           compile warning found by Borislav Petkov <bp@alien8.de>.
      -v4: the user must specify crashkernel_low if the system does not
           support Intel or AMD IOMMU.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-31-git-send-email-yinghai@kernel.org
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Rob Landley <rob@landley.net>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, kdump: Remove crashkernel range find limit for 64bit · 7d41a8a4
      Yinghai Lu authored
      
      
      Now that the kexeced kernel/ramdisk can be above 4G, remove the old
      896MB limit for 64-bit.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-30-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • memblock: Add memblock_mem_size() · 595ad9af
      Yinghai Lu authored
      
      
      Use it to get the memory size under limit_pfn, replacing the local
      version in the x86 initrd-reservation code.

      -v2: remove an unneeded cast, pointed out by hpa.
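
      The helper is a straightforward walk of the memory regions; a sketch:

        phys_addr_t __init memblock_mem_size(unsigned long limit_pfn)
        {
                unsigned long pages = 0;
                struct memblock_region *r;
                unsigned long start_pfn, end_pfn;

                for_each_memblock(memory, r) {
                        start_pfn = memblock_region_memory_base_pfn(r);
                        end_pfn = memblock_region_memory_end_pfn(r);
                        /* clamp each region to the limit before counting */
                        start_pfn = min_t(unsigned long, start_pfn, limit_pfn);
                        end_pfn = min_t(unsigned long, end_pfn, limit_pfn);
                        pages += end_pfn - start_pfn;
                }

                return (phys_addr_t)pages << PAGE_SHIFT;
        }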
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-29-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, boot: Not need to check setup_header version for setup_data · d1af6d04
      Yinghai Lu authored
      
      
      Checking the setup_header version is only needed by bootloaders.

      setup_data is in setup_header, and the bootloader copies it from the
      bzImage, so an old bootloader should already leave it as 0.

      Old kexec-tools has, until now, set setup_data to 0 for ELF images,
      so that is OK too.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-28-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, boot: Update comments about entries for 64bit image · 8ee2f2df
      Yinghai Lu authored
      
      
      The 64-bit entry point is now fixed at 0x200 and cannot be changed
      anymore.

      Update the comments to reflect that.

      Also document it in boot.txt.

      -v2: fix some grammar errors.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-27-git-send-email-yinghai@kernel.org
      Cc: Rob Landley <rob@landley.net>
      Cc: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, boot: Support loading bzImage, boot_params and ramdisk above 4G · ee92d815
      Yinghai Lu authored
      
      
      xloadflags bit 1 indicates that we can load the kernel and all data
      structures above 4G; it is set if the kernel is relocatable and
      64-bit.

      The bootloader checks whether xloadflags bit 1 is set to decide
      whether it can load the ramdisk and kernel high, above 4G.

      When it loads the ramdisk above 4G, the bootloader fills
      ext_ramdisk_image/size with the high 32 bits, and the kernel uses
      get_ramdisk_image/size() to combine them into the right ramdisk
      position.
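
      A sketch of the bootloader side (XLF_CAN_BE_LOADED_ABOVE_4G is bit 1
      of xloadflags; the field names are from struct boot_params):

        if (hdr->xloadflags & XLF_CAN_BE_LOADED_ABOVE_4G) {
                /* 64-bit ramdisk address, split across classic and ext fields */
                params->hdr.ramdisk_image = (u32)ramdisk_addr;
                params->ext_ramdisk_image = (u32)(ramdisk_addr >> 32);
                params->hdr.ramdisk_size  = (u32)ramdisk_size;
                params->ext_ramdisk_size  = (u32)(ramdisk_size >> 32);
        }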
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Rob Landley <rob@landley.net>
      Cc: Matt Fleming <matt.fleming@intel.com>
      Cc: Gokul Caushik <caushik1@gmail.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Joe Millenbach <jmillenbach@gmail.com>
      Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, kexec, 64bit: Only set ident mapping for ram. · 0e691cf8
      Yinghai Lu authored
      
      
      We should set up mappings only for usable memory ranges under max_pfn;
      otherwise we cause the same problem that was fixed by

      	x86, mm: Only direct map addresses that are marked as E820_RAM

      This patch exposes the pfn_mapped array and sets the ident mapping
      only for the ranges in that array.

      It relies on the new kernel_ident_mapping_init, which can handle
      existing pgd/pud entries across different calls.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-25-git-send-email-yinghai@kernel.org
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, kexec: Replace ident_mapping_init and init_level4_page · 9ebdc79f
      Yinghai Lu authored
      
      
      ident_mapping_init currently checks whether the pgd/pud is present
      for every 2M; since several 2M ranges live in the same PUD, it keeps
      re-checking the same pud.

      init_level4_page, by contrast, does not check existing pgd/pud
      entries at all.

      We can use the generic mapping_init, with different settings in the
      info structure, to replace both of these locally grown functions.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-24-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, kexec: Set ident mapping for kernel that is above max_pfn · 084d1283
      Yinghai Lu authored
      
      
      When the first kernel is booted with memmap= or mem= to limit
      max_pfn, kexec can load the second kernel above that max_pfn.

      In that case we need to set the ident mapping for the whole image
      instead of just the first 2M.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-23-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, kexec: Remove 1024G limitation for kexec buffer on 64bit · 577af55d
      Yinghai Lu authored
      
      
      The 64-bit kernel now supports more than 1T of RAM, and kexec-tools
      can find a buffer above 1T, so remove that obsolete limitation and
      use MAXMEM instead.

      Tested on a system with more than 1024G of RAM.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-22-git-send-email-yinghai@kernel.org
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, boot: Move lldt/ltr out of 64bit code section · d3c433bf
      Yinghai Lu authored
      Commit 08da5a2c,

      	x86_64: Early segment setup for VT

      sets up the LDT and TR to a valid state in order to speed up boot
      decompression under VT.

      That code is in the 64-bit code section and uses a GDT that is only
      loaded on the 32-bit code path.

      This breaks booting with a 64-bit bootloader that skips the 32-bit
      path, jumps directly to startup_64, and has a different GDT.

      Move those lines into the 32-bit section, after its GDT is loaded.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-21-git-send-email-yinghai@kernel.org
      Cc: Zachary Amsden <zamsden@gmail.com>
      Cc: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, boot: Move verify_cpu.S and no_longmode down · 187a8a73
      Yinghai Lu authored
      
      
      We need to move some code into the 32-bit section in the following
      patch:

         x86, boot: Move lldt/ltr out of 64bit code section

      but that would push startup_64 down from 0x200.

      According to hpa, we cannot change the startup_64 position; it is
      ABI.

      We can instead move verify_cpu and no_longmode down, because
      verify_cpu is reached via a function call and no_longmode does not
      return, so no extra jump-back code is needed.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-20-git-send-email-yinghai@kernel.org
      Cc: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, boot: Pass cmd_line_ptr with unsigned long instead · 3db07e70
      Yinghai Lu authored
      
      
      boot/compressed/misc.c is used for both 64-bit and 32-bit bzImages,
      and cmd_line_ptr can point to a buffer above 4G, so cmd_line_ptr must
      be 64-bit or the high 32 bits get capped off.

      Change its type to unsigned long, which is 64-bit on a 64-bit kernel
      and yields the correct command line buffer address.

      A 32-bit bzImage is still fine, because unsigned long on a 32-bit
      kernel is still 32-bit.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-19-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, boot: Move checking of cmd_line_ptr out of common path · 16a4baa6
      Yinghai Lu authored
      
      
      cmdline.c::__cmdline_find_option...() is shared between the 16-bit
      setup code and the 32/64-bit decompressor code.

      On the 32/64-bit-only path via kexec, we should not check whether the
      pointer is below 1M, as the command line can be placed above 1M, or
      even above 4G.

      Move the accessibility check out of __cmdline_find_option() so the
      decompressor in misc.c can parse the command line correctly.
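
      After the move, the 16-bit setup wrappers keep the below-1M check
      while the decompressor path skips it; a sketch of the setup-side
      wrapper:

        /* arch/x86/boot/cmdline.c (16-bit setup code) */
        int cmdline_find_option(const char *option, char *buffer, int bufsize)
        {
                unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;

                if (cmd_line_ptr >= 0x100000)
                        return -1;      /* inaccessible from real mode */

                return __cmdline_find_option(cmd_line_ptr, option,
                                             buffer, bufsize);
        }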
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-18-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, boot: Add get_cmd_line_ptr() · f1da834c
      Yinghai Lu authored
      
      
      Add an accessor function for the command line address.
      Later we will add support for holding a 64-bit address via ext_cmd_line_ptr.
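
      A sketch of the accessor as added here; it simply wraps the 32-bit
      header field for now, and the ext_cmd_line_ptr high bits get folded
      in by a later patch:

        static unsigned long get_cmd_line_ptr(void)
        {
                unsigned long cmd_line_ptr = boot_params->hdr.cmd_line_ptr;

                return cmd_line_ptr;
        }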
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-17-git-send-email-yinghai@kernel.org
      Cc: Gokul Caushik <caushik1@gmail.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Joe Millenbach <jmillenbach@gmail.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86: Add get_ramdisk_image/size() · a8a51a88
      Yinghai Lu authored
      
      
      There are several places that need to find the ramdisk information
      early, for reserving and relocating it.

      Use accessor functions to make the code more readable and consistent.

      A later patch will add ext_ramdisk_image/size handling to these
      functions to support loading the ramdisk above 4G.
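
      A sketch of the accessors (returning u64 so the ext_* high bits can
      be ORed in later):

        static u64 __init get_ramdisk_image(void)
        {
                u64 ramdisk_image = boot_params.hdr.ramdisk_image;

                return ramdisk_image;
        }

        static u64 __init get_ramdisk_size(void)
        {
                u64 ramdisk_size = boot_params.hdr.ramdisk_size;

                return ramdisk_size;
        }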
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-16-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86: Merge early_reserve_initrd for 32bit and 64bit · 1b8c78be
      Yinghai Lu authored
      
      
      They are the same, so move them out of head32.c/head64.c and into
      setup.c.

      We are using memblock, which handles overlapping ranges properly, so
      we don't need an early placeholder reservation; we just need to make
      sure the ranges are reserved before memblock is used to find free
      memory.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-15-git-send-email-yinghai@kernel.org
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, 64bit: Don't set max_pfn_mapped wrong value early on native path · 10054230
      Yinghai Lu authored
      
      
      max_pfn_mapped is not set correctly until init_memory_mapping(), so
      don't print its initial value on 64-bit.

      Also use KERNEL_IMAGE_SIZE directly for the highmap cleanup.

      -v2: update the comments about max_pfn_mapped, per Stefano Stabellini.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-14-git-send-email-yinghai@kernel.org
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, 64bit: #PF handler set page to cover only 2M per #PF · 6b9c75ac
      Yinghai Lu authored
      
      
      We only map a single 2 MiB page per #PF, even though we should be able
      to do this a full gigabyte at a time with no additional memory cost.
      This is a workaround for a broken AMD reference BIOS (and its
      derivatives in shipping systems) which maps a large chunk of memory as
      WB in the MTRR system but will #MC if the processor wanders off and
      tries to prefetch that memory, which can happen any time the memory is
      mapped in the TLB.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-13-git-send-email-yinghai@kernel.org
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      [ hpa: rewrote the patch description ]
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, 64bit: Use a #PF handler to materialize early mappings on demand · 8170e6be
      H. Peter Anvin authored
      
      
      Linear mode (CR0.PG = 0) is mutually exclusive with 64-bit mode; all
      64-bit code has to use page tables.  This makes it awkward before we
      have first set up properly all-covering page tables to access objects
      that are outside the static kernel range.
      
      So far we have dealt with that simply by mapping a fixed amount of
      low memory, but that fails in at least two upcoming use cases:
      
      1. We will support loading and running the kernel, struct boot_params,
         ramdisk, command line, etc. above the 4 GiB mark.
      2. We need to access the ramdisk early to get the microcode, so it
         can be applied as early as possible.

      We could use early_ioremap to access them too, but it would make the
      code messy and hard to unify with 32-bit.
      
      Hence, set up a #PF handler and use a fixed number of buffers to set up
      page tables on demand.  If the buffers fill up then we simply flush
      them and start over.  These buffers are all in __initdata, so it does
      not increase RAM usage at runtime.
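
      The flush-and-start-over policy is only a few lines; a sketch (names
      as introduced by this patch, body simplified):

        static void __init reset_early_page_tables(void)
        {
                unsigned long i;

                /* drop everything but the kernel mapping in the last slot */
                for (i = 0; i < PTRS_PER_PGD - 1; i++)
                        early_level4_pgt[i].pgd = 0;

                next_early_pgt = 0;     /* recycle the __initdata buffer pool */

                write_cr3(__pa(early_level4_pgt));
        }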
      
      Thus, with the help of the #PF handler, we can set the final kernel
      mapping from blank, and switch to init_level4_pgt later.
      
      During the switchover in head_64.S, before the #PF handler is
      available, we use three pages to handle the kernel crossing the 1G
      and 512G boundaries with a shared page, by playing games with page
      aliasing: the same page is mapped twice in the higher-level tables
      with appropriate wraparound.  The kernel region itself will be
      properly mapped; other mappings may be spurious.

      early_make_pgtable uses the kernel high mapping address to access the
      pages it uses to set up the page table.
      
      -v4: add a phys_base offset to make kexec happy, and add
      	init_mapping_kernel()   - Yinghai
      -v5: fix compiling with Xen, and add back the ident level3 and level2
           for Xen; also move init_level4_pgt back from BSS to DATA again,
           because we have to clear it anyway.  - Yinghai
      -v6: switch to init_level4_pgt in init_mem_mapping. - Yinghai
      -v7: remove the unneeded clear_page for init_level4_page;
           it is already filled with 512,8,0 in head_64.S  - Yinghai
      -v8: we need to keep the handler alive until init_mem_mapping, and
           must not let early_trap_init trash the early #PF handler.
           So split early_trap_pf_init out and move it down. - Yinghai
      -v9: the switchover only covers kernel space instead of 1G, so
           possible mem holes are not touched. - Yinghai
      -v11: change the far jmp back to a far return to initial_code; that
           is needed to fix a failure reported by Konrad on AMD systems.  - Yinghai
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-12-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, realmode: Separate real_mode reserve and setup · 4f7b9226
      Yinghai Lu authored
      
      
      After we switch to using the #PF handler to help set up the page
      table, init_level4_pgt will only have entries set after
      init_mem_mapping().  We need to move the copying of init_level4_pgt
      to trampoline_pgd to after that point.

      So split the reserve and setup steps, and move the setup after
      init_mem_mapping().
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-11-git-send-email-yinghai@kernel.org
      Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
      Acked-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, 64bit, realmode: Use init_level4_pgt to set trampoline_pgd directly · 9735e91e
      Yinghai Lu authored
      
      
      With the #PF-handler way of setting up the early page table,
      level3_ident goes away on the 64-bit native path.

      So just use the entries in init_level4_pgt to set them in
      trampoline_pgd.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-10-git-send-email-yinghai@kernel.org
      Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
      Acked-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, 64bit: Copy struct boot_params early · fa2bbce9
      Yinghai Lu authored
      
      
      We want to support struct boot_params (formerly known as the
      zero-page, or real-mode data) above the 4 GiB mark.  We will have the
      #PF handler set up page tables for not-yet-accessible RAM early, but
      want to limit that to before x86_64_start_reservations, to confine
      the code change to the native path only.

      We will also need the ramdisk info in struct boot_params to access
      the microcode blob in the ramdisk in x86_64_start_kernel, so copying
      struct boot_params early keeps that access simple.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-9-git-send-email-yinghai@kernel.org
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, 64bit, mm: Add generic kernel/ident mapping helper · aece2785
      Yinghai Lu authored
      
      
      This is a simple version of kernel_physical_mapping_init(); it builds
      a page table that will be used later.

      Use mapping_info to control:
              1. the alloc_pg_page method,
              2. whether the PMD is EXEC,
              3. whether the pgd uses the kernel low mapping or the ident
                 mapping.

      It will be used to replace the locally grown versions in kexec,
      hibernation, etc.; see the sketch below.
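
      A sketch of the control structure and entry point, following the
      three knobs listed above:

        struct x86_mapping_info {
                void *(*alloc_pgt_page)(void *); /* allocate a page-table page */
                void *context;                   /* context for alloc_pgt_page */
                unsigned long pmd_flag;          /* page flags for PMD entries */
                bool kernel_mapping;             /* kernel low or ident mapping */
        };

        int kernel_ident_mapping_init(struct x86_mapping_info *info,
                                      pgd_t *pgd_page,
                                      unsigned long addr, unsigned long end);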
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-8-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, realmode: Set real_mode permissions early · 231b3642
      Yinghai Lu authored
      
      
      The trampoline code is executed by APs with the kernel low mapping on
      64-bit.  We need to mark the trampoline code EXEC early, before we
      boot the APs.

      The problem was found after switching to the #PF handler for setting
      up page tables: we no longer set the initial kernel low mapping with
      EXEC in arch/x86/kernel/head_64.S.

      Change to an early_initcall, which makes sure the trampoline has EXEC
      set in time.
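
      A simplified sketch of the early_initcall approach (the real function
      also sets RO/NX on other parts of the real-mode blob; real_mode_header,
      real_mode_blob and set_memory_x are existing kernel symbols):

        static int __init set_real_mode_permissions(void)
        {
                unsigned char *base = (unsigned char *) real_mode_header;
                size_t size = PAGE_ALIGN(real_mode_blob_end - real_mode_blob);

                /* AP trampoline must be executable under the low mapping */
                set_memory_x((unsigned long) base, size >> PAGE_SHIFT);
                return 0;
        }
        early_initcall(set_real_mode_permissions);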
      
      -v2: merge two comments, per Borislav Petkov <bp@alien8.de>.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-7-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, 64bit, mm: Make pgd next calculation consistent with pud/pmd · c2bdee59
      Yinghai Lu authored
      
      
      Calculate next for the pgd the same way we do for the pud and pmd:
      round down and add the size.

      Also, do not do boundary checking with 'next'; just pass 'end' down
      to phys_pud_init() instead, because the loop in phys_pud_init() stops
      at PTRS_PER_PUD and can thus handle a possibly bigger 'end' properly.
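
      The change itself is essentially a one-liner; the pgd loop now
      computes next the same way the pud/pmd loops do:

        next = (start & PGDIR_MASK) + PGDIR_SIZE;  /* round down, add size */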
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-6-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86: Factor out e820_add_kernel_range() · b422a309
      Yinghai Lu authored
      
      
      Separate the reservation of the kernel static memory areas out into
      its own function.

      Also add support for the case when memmap=xxM$yyM is used without
      exactmap: the reserved range must be removed first, before the
      E820_RAM range is added, otherwise the added E820_RAM range would be
      ignored.  A sketch of the resulting function is below.
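
      A sketch of the factored-out function (close to the description
      above; details may differ):

        static void __init e820_add_kernel_range(void)
        {
                u64 start = __pa_symbol(_text);
                u64 size = __pa_symbol(_end) - start;

                /* nothing to fix if the kernel is already all E820_RAM */
                if (e820_all_mapped(start, start + size, E820_RAM))
                        return;

                pr_warn(".text .data .bss are not marked as E820_RAM!\n");
                /* remove any reserved overlap first, then re-add as RAM */
                e820_remove_range(start, size, E820_RAM, 0);
                e820_add_region(start, size, E820_RAM);
        }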
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-5-git-send-email-yinghai@kernel.org
      Cc: Jacob Shin <jacob.shin@amd.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, mm: Fix page table early allocation offset checking · c9b3234a
      Yinghai Lu authored
      
      
      While debugging kernel loading above 4G, I found that one page was
      left unused in the pre-allocated BRK area for early page allocation.
      pgt_buf_top is the first address that cannot be used, so we should
      check whether the new end is above that top; otherwise the last page
      never gets used.

      Fix that check, and also print allocations from the pre-allocated BRK
      area, to catch possible bugs later.

      But getting that page back for the pgt triggers a bug in pgt
      allocation with Xen: we must avoid using a page as a pgt to map a
      range that overlaps that pgt page itself.

      Add a check for such overlap; when it happens, use a memblock
      allocation instead.  That fixes the crash on a Xen PV guest with 2G
      that Stefano found.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-2-git-send-email-yinghai@kernel.org
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Tested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • Merge remote-tracking branch 'origin/x86/boot' into x86/mm2 · de65d816
      H. Peter Anvin authored
      
      
      Coming patches to x86/mm2 require the changes and advanced baseline in
      x86/boot.
      
      Resolved Conflicts:
      	arch/x86/kernel/setup.c
      	mm/nobootmem.c
      
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  2. Jan 29, 2013
    • x86, boot: Sanitize boot_params if not zeroed on creation · 5dcd14ec
      H. Peter Anvin authored
      
      
      Use the new sentinel field to detect bootloaders which fail to follow
      protocol and don't initialize fields in struct boot_params that they
      do not explicitly initialize to zero.
      
      Based on an original patch and research by Yinghai Lu.
      Changed by hpa to be invoked both in the decompression path and in the
      kernel proper; the latter for the case where a bootloader takes over
      decompression.
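
      A simplified sketch of the check (the real helper in
      bootparam_utils.h clears a longer list of fields):

        static void sanitize_boot_params(struct boot_params *boot_params)
        {
                /*
                 * A loader that zeroes struct boot_params also zeroes the
                 * sentinel byte; a non-zero sentinel means the loader copied
                 * a bzImage header without clearing the surrounding fields.
                 */
                if (boot_params->sentinel) {
                        boot_params->ext_ramdisk_image = 0;
                        boot_params->ext_ramdisk_size = 0;
                        boot_params->ext_cmd_line_ptr = 0;
                        /* ...plus the other fields not owned by the loader */
                }
        }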
      
      Originally-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  3. Jan 28, 2013
  4. Jan 26, 2013
    • Linux 3.8-rc5 · 949db153
      Linus Torvalds authored
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · d7df025e
      Linus Torvalds authored
      Pull btrfs fixes from Chris Mason:
       "It turns out that we had two crc bugs when running fsx-linux in a
        loop.  Many thanks to Josef, Miao Xie, and Dave Sterba for nailing it
        all down.  Miao also has a new OOM fix in this v2 pull as well.
      
        Ilya fixed a regression Liu Bo found in the balance ioctls for pausing
        and resuming a running balance across drives.
      
        Josef's orphan truncate patch fixes an obscure corruption we'd see
        during xfstests.
      
        Arne's patches address problems with subvolume quotas.  If the user
        destroys quota groups incorrectly the FS will refuse to mount.
      
        The rest are smaller fixes and plugs for memory leaks."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (30 commits)
        Btrfs: fix repeated delalloc work allocation
        Btrfs: fix wrong max device number for single profile
        Btrfs: fix missed transaction->aborted check
        Btrfs: Add ACCESS_ONCE() to transaction->abort accesses
        Btrfs: put csums on the right ordered extent
        Btrfs: use right range to find checksum for compressed extents
        Btrfs: fix panic when recovering tree log
        Btrfs: do not allow logged extents to be merged or removed
        Btrfs: fix a regression in balance usage filter
        Btrfs: prevent qgroup destroy when there are still relations
        Btrfs: ignore orphan qgroup relations
        Btrfs: reorder locks and sanity checks in btrfs_ioctl_defrag
        Btrfs: fix unlock order in btrfs_ioctl_rm_dev
        Btrfs: fix unlock order in btrfs_ioctl_resize
        Btrfs: fix "mutually exclusive op is running" error code
        Btrfs: bring back balance pause/resume logic
        btrfs: update timestamps on truncate()
        btrfs: fix btrfs_cont_expand() freeing IS_ERR em
        Btrfs: fix a bug when llseek for delalloc bytes behind prealloc extents
        Btrfs: fix off-by-one in lseek
        ...
  5. Jan 25, 2013