- Dec 20, 2013
-
-
Kees Cook authored
Instead of duplicating the CC_STACKPROTECTOR Kconfig and Makefile logic in each architecture, switch to using HAVE_CC_STACKPROTECTOR and keep everything in one place. This retains the x86-specific bug verification scripts. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: James Hogan <james.hogan@imgtec.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Shawn Guo <shawn.guo@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@linux-mips.org Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/1387481759-14535-2-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- Dec 13, 2013
-
-
Gleb Natapov authored
A guest can cause a BUG_ON() leading to a host kernel crash. When the guest writes to the ICR to request an IPI, while in x2apic mode the following things happen, the destination is read from ICR2, which is a register that the guest can control. kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the cluster id. A BUG_ON is triggered, which is a protection against accessing map->logical_map with an out-of-bounds access and manages to avoid that anything really unsafe occurs. The logic in the code is correct from real HW point of view. The problem is that KVM supports only one cluster with ID 0 in clustered mode, but the code that has the bug does not take this into account. Reported-by: Lars Bull <larsbull@google.com> Cc: stable@vger.kernel.org Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Andy Honig authored
In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the potential to corrupt kernel memory if userspace provides an address that is at the end of a page. This patches concerts those functions to use kvm_write_guest_cached and kvm_read_guest_cached. It also checks the vapic_address specified by userspace during ioctl processing and returns an error to userspace if the address is not a valid GPA. This is generally not guest triggerable, because the required write is done by firmware that runs before the guest. Also, it only affects AMD processors and oldish Intel that do not have the FlexPriority feature (unless you disable FlexPriority, of course; then newer processors are also affected). Fixes: b93463aa ('KVM: Accelerated apic support') Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Andy Honig authored
Under guest controllable circumstances apic_get_tmcct will execute a divide by zero and cause a crash. If the guest cpuid support tsc deadline timers and performs the following sequence of requests the host will crash. - Set the mode to periodic - Set the TMICT to 0 - Set the mode bits to 11 (neither periodic, nor one shot, nor tsc deadline) - Set the TMICT to non-zero. Then the lapic_timer.period will be 0, but the TMICT will not be. If the guest then reads from the TMCCT then the host will perform a divide by 0. This patch ensures that if the lapic_timer.period is 0, then the division does not occur. Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- Dec 11, 2013
-
-
Peter Zijlstra authored
Introduce mul_u64_u32_shr() as proposed by Andy a while back; it allows using 64x64->128 muls on 64bit archs and recent GCC which defines __SIZEOF_INT128__ and __int128. (This new method will be used by the scheduler.) Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: fweisbec@gmail.com Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-hxjoeuzmrcaumR0uZwjpe2pv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Peter Zijlstra authored
While hunting a preemption issue with Alexander, Ben noticed that the currently generic PREEMPT_NEED_RESCHED stuff is horribly broken for load-store architectures. We currently rely on the IPI to fold TIF_NEED_RESCHED into PREEMPT_NEED_RESCHED, but when this IPI lands while we already have a load for the preempt-count but before the store, the store will erase the PREEMPT_NEED_RESCHED change. The current preempt-count only works on load-store archs because interrupts are assumed to be completely balanced wrt their preempt_count fiddling; the previous preempt_count load will match the preempt_count state after the interrupt and therefore nothing gets lost. This patch removes the PREEMPT_NEED_RESCHED usage from generic code and pushes it into x86 arch code; the generic code goes back to relying on TIF_NEED_RESCHED. Boot tested on x86_64 and compile tested on ppc64. Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reported-and-Tested-by: Alexander Graf <agraf@suse.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20131128132641.GP10022@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Matthew Garrett authored
UEFI time services are often broken once we're in virtual mode. We were already refusing to use them on 64-bit systems, but it turns out that they're also broken on some 32-bit firmware, including the Dell Venue. Disable them for now, we can revisit once we have the 1:1 mappings code incorporated. Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com> Link: http://lkml.kernel.org/r/1385754283-2464-1-git-send-email-matthew.garrett@nebula.com Cc: <stable@vger.kernel.org> Cc: Matt Fleming <matt.fleming@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
- Dec 10, 2013
-
-
cpw authored
The SGI UV tlb shootdown code panics the system with a NULL pointer deference if 'nobau' is specified on the boot commandline. uv_flush_tlb_other() gets called for every flush, whether the BAU is disabled or not. It should not be keeping the s_enters statistic while the BAU is disabled. The panic occurs because during initialization init_per_cpu_tunables() does not set the bcp->statp pointer if 'nobau' was specified. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: <stable@vger.kernel.org> # 3.12.x Link: http://lkml.kernel.org/r/E1VnzBi-0005yF-MU@eag09.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
H. Peter Anvin authored
In checkin 5551a34e x86-64, build: Always pass in -mno-sse we unconditionally added -mno-sse to the main build, to keep newer compilers from generating SSE instructions from autovectorization. However, this did not extend to the special environments (arch/x86/boot, arch/x86/boot/compressed, and arch/x86/realmode/rm). Add -mno-sse to the compiler command line for these environments, and add -mno-mmx to all the environments as well, as we don't want a compiler to generate MMX code either. This patch also removes a $(cc-option) call for -m32, since we have long since stopped supporting compilers too old for the -m32 option, and in fact hardcode it in other places in the Makefiles. Reported-by: Kevin B. Smith <kevin.b.smith@intel.com> Cc: Sunil K. Pandey <sunil.k.pandey@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: H. J. Lu <hjl.tools@gmail.com> Link: http://lkml.kernel.org/n/tip-j21wzqv790q834n7yc6g80j1@git.kernel.org Cc: <stable@vger.kernel.org> # build fix only
-
- Dec 05, 2013
-
-
Maria Dimakopoulou authored
The EVENT_CONSTRAINT_END() macro defines the end marker as a constraint with a weight of zero. This was all fine until we blacklisted the corrupting memory events on Intel IvyBridge. These events are blacklisted by using a counter bitmask of zero. Thus, they also get a constraint weight of zero. The iteration macro: for_each_constraint tests the weight==0. Therefore, it was stopping at the first blacklisted event, i.e., 0xd0. The corrupting events were therefore considered as unconstrained and were scheduled on any of the generic counters. This patch fixes the end marker to have a weight of -1. With this, the blacklisted events get an empty constraint and cannot be scheduled which is what we want for now. Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: ak@linux.intel.com Cc: jolsa@redhat.com Cc: zheng.z.yan@intel.com Link: http://lkml.kernel.org/r/20131204232437.GA10689@starlight Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Fenghua Yu authored
Since erratum AVR31 in "Intel Atom Processor C2000 Product Family Specification Update" is now published, I added a justification comment for disabling IO APIC before Local APIC, as changed in commit: 522e6646 x86/apic: Disable I/O APIC before shutdown of the local APIC Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1386202069-51515-1-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
H. Peter Anvin authored
In checkin: 0c44c2d0 x86: Use asm goto to implement better modify_and_test() functions the various functions which do modify and test were unified and optimized using "asm goto". However, this change missed the detail that the bitops require an "Ir" constraint rather than an "er" constraint ("I" = integer constant from 0-31, "e" = signed 32-bit integer constant). This would cause code to miscompile if these functions were used on constant bit positions 32-255 and the build to fail if used on constant bit positions above 255. Add the constraints as a parameter to the GEN_BINARY_RMWcc() macro to avoid this problem. Reported-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/529E8719.4070202@zytor.com
-
- Dec 04, 2013
-
-
H. Peter Anvin authored
Always pass in the -mno-sse argument, regardless if -preferred-stack-boundary is supported. We never want to generate SSE instructions in the kernel unless we *really* know what we're doing. According to H. J. Lu, any version of gcc new enough that we support it at all should handle the -mno-sse option, so just add it unconditionally. Reported-by: Kevin B. Smith <kevin.b.smith@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: H. J. Lu <hjl.tools@gmail.com> Link: http://lkml.kernel.org/n/tip-j21wzqv790q834n7yc6g80j1@git.kernel.org Cc: <stable@vger.kernel.org> # build fix only
-
- Nov 29, 2013
-
-
Matt Fleming authored
Dave reported seeing the following incorrect output on his Thinkpad T420 when using earlyprintk=efi, [ 0.000000] efi: EFI v2.00 by Lenovo ACPI=0xdabfe000 ACPI 2.0=0xdabfe014 SMBIOS=0xdaa9e000 The output should be on one line, not split over two. The cause is an off-by-one error when checking that the efi_y coordinate hasn't been incremented out of bounds. Reported-by: Dave Young <dyoung@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
-
- Nov 22, 2013
-
-
Kirill A. Shutemov authored
There are two code paths how page with pmd page table can be freed: pmd_free() and pmd_free_tlb(). I've missed the second one and didn't add page table destructor call there. It leads to leak of page->ptl for pmd page tables, if dynamically allocated page->ptl is in use. The patch adds the missed destructor and modifies documentation accordingly. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Andrey Vagin <avagin@openvz.org> Tested-by: Andrey Vagin <avagin@openvz.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- Nov 21, 2013
-
-
Al Viro authored
Note that pmds[i] is simply uninitialized at that point... Granted, it's very hard to hit (you need split page locks *and* kmalloc(sizeof(spinlock_t), GFP_KERNEL) failing), but the code is obviously bogus. Introduced by commit 09ef4939 ("x86: add missed pgtable_pmd_page_ctor/dtor calls for preallocated pmds") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- Nov 20, 2013
-
-
Sasha Levin authored
We should not be using jump labels before they were initialized. Push back the callback to until after jump label initialization. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
-
- Nov 19, 2013
-
-
Peter Zijlstra authored
Vince's perf-trinity fuzzer found yet another 'interesting' problem. When we sample the irq_work_exit tracepoint with period==1 (or PERF_SAMPLE_PERIOD) and we add an fasync SIGNAL handler we create an infinite event generation loop: ,-> <IPI> | irq_work_exit() -> | trace_irq_work_exit() -> | ... | __perf_event_overflow() -> (due to fasync) | irq_work_queue() -> (irq_work_list must be empty) '--------- arch_irq_work_raise() Similar things can happen due to regular poll() wakeups if we exceed the ring-buffer wakeup watermark, or have an event_limit. To avoid this, dis-allow sampling this particular tracepoint. In order to achieve this, create a special perf_perm function pointer for each event and call this (when set) on trying to create a tracepoint perf event. [ roasted: use expr... to allow for ',' in your expression ] Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Dave Jones <davej@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20131114152304.GC5364@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- Nov 17, 2013
-
-
Ramkumar Ramachandra authored
Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
-
Ramkumar Ramachandra authored
arch/um/defconfig only lists one default configuration, and that applies only to the i386 architecture. Replace it with two minimal configuration files generated using `make savedefconfig`: i386_defconfig and x86_64_defconfig The build scripts now require two updates: 1. um's Kconfig (arch/x86/um/Kconfig) should specify an ARCH_DEFCONFIG section explicitly pointing to these scripts if the required variables are set. Take care to remove the DEFCONFIG_LIST section defined in the included file arch/um/Kconfig.common. 2. um's Makefile (arch/um/Makefile) should set KBUILD_DEFCONFIG properly for the top-level Makefile to pick up. Copy the logic in arch/x86/Makefile to properly pick the defconfig file depending on the actual architecture; except we're working with $SUBARCH here, instead of $ARCH. Now, you can do: $ ARCH=um make defconfig $ ARCH=um make and successfully build User-Mode Linux on an x86_64 box in default configuration. Cc: Richard Weinberger <richard@nod.at> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
-
Richard Weinberger authored
Currently on UML stack traces are not very reliable and both x86 and x86_64 have their on implementations. This patch unifies both and adds support to outline unreliable functions calls. Signed-off-by: Richard Weinberger <richard@nod.at>
-
- Nov 15, 2013
-
-
David Rientjes authored
Commit 9ebddac7 "ACPI, x86: Fix extended error log driver to depend on CONFIG_X86_LOCAL_APIC" fixed a build error when CONFIG_X86_LOCAL_APIC was not selected and !CONFIG_SMP. However, since CONFIG_ACPI_EXTLOG is tristate, there is a second build error: ERROR: "boot_cpu_physical_apicid" [drivers/acpi/acpi_extlog.ko] undefined! The symbol needs to be exported for it to be available. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Chen Gong <gong.chen@linux.intel.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311141504080.30112@chino.kir.corp.google.com [ Changed it to a _GPL() export. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Christoph Hellwig authored
We've switched over every architecture that supports SMP to it, so remove the new useless config variable. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
If split page table lock is in use, we embed the lock into struct page of table's page. We have to disable split lock, if spinlock_t is too big be to be embedded, like when DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC enabled. This patch add support for dynamic allocation of split page table lock if we can't embed it to struct page. page->ptl is unsigned long now and we use it as spinlock_t if sizeof(spinlock_t) <= sizeof(long), otherwise it's pointer to spinlock_t. The spinlock_t allocated in pgtable_page_ctor() for PTE table and in pgtable_pmd_page_ctor() for PMD table. All other helpers converted to support dynamically allocated page->ptl. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
In split page table lock case, we embed spinlock_t into struct page. For obvious reason, we don't want to increase size of struct page if spinlock_t is too big, like with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC or on -rt kernel. So we disable split page table lock, if spinlock_t is too big. This patchset allows to allocate the lock dynamically if spinlock_t is big. In this page->ptl is used to store pointer to spinlock instead of spinlock itself. It costs additional cache line for indirect access, but fix page fault scalability for multi-threaded applications. LOCK_STAT depends on DEBUG_SPINLOCK, so on current kernel enabling LOCK_STAT to analyse scalability issues breaks scalability. ;) The patchset mostly fixes this. Results for ./thp_memscale -c 80 -b 512M on 4-socket machine: baseline, no CONFIG_LOCK_STAT: 9.115460703 seconds time elapsed baseline, CONFIG_LOCK_STAT=y: 53.890567123 seconds time elapsed patched, no CONFIG_LOCK_STAT: 8.852250368 seconds time elapsed patched, CONFIG_LOCK_STAT=y: 11.069770759 seconds time elapsed Patch count is scary, but most of them trivial. Overview: Patches 1-4 Few bug fixes. No dependencies to other patches. Probably should applied as soon as possible. Patch 5 Changes signature of pgtable_page_ctor(). We will use it for dynamic lock allocation, so it can fail. Patches 6-8 Add missing constructor/destructor calls on few archs. It's fixes NR_PAGETABLE accounting and prepare to use split ptl. Patches 9-33 Add pgtable_page_ctor() fail handling to all archs. Patches 34 Finally adds support of dynamically-allocated page->pte. Also contains documentation for split page table lock. This patch (of 34): I've missed that we preallocate few pmds on pgd_alloc() if X86_PAE enabled. Let's add missed constructor/destructor calls. I haven't noticed it during testing since prep_new_page() clears page->mapping and therefore page->ptl. It's effectively equal to spin_lock_init(&page->ptl). Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Chris Zankel <chris@zankel.net> Cc: Christoph Lameter <cl@linux.com> Cc: David Howells <dhowells@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Mikael Starvik <starvik@axis.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
Enable PMD split page table lock for X86_64 and PAE. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Alex Thorlton <athorlton@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
We're going to introduce split page table lock for PMD level. Let's rename existing split ptlock for PTE level to avoid confusion. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Alex Thorlton <athorlton@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Rafael J. Wysocki authored
Modify struct acpi_dev_node to contain a pointer to struct acpi_device associated with the given device object (that is, its ACPI companion device) instead of an ACPI handle corresponding to it. Introduce two new macros for manipulating that pointer in a CONFIG_ACPI-safe way, ACPI_COMPANION() and ACPI_COMPANION_SET(), and rework the ACPI_HANDLE() macro to take the above changes into account. Drop the ACPI_HANDLE_SET() macro entirely and rework its users to use ACPI_COMPANION_SET() instead. For some of them who used to pass the result of acpi_get_child() directly to ACPI_HANDLE_SET() introduce a helper routine acpi_preset_companion() doing an equivalent thing. The main motivation for doing this is that there are things represented by struct acpi_device objects that don't have valid ACPI handles (so called fixed ACPI hardware features, such as power and sleep buttons) and we would like to create platform device objects for them and "glue" them to their ACPI companions in the usual way (which currently is impossible due to the lack of valid ACPI handles). However, there are more reasons why it may be useful. First, struct acpi_device pointers allow of much better type checking than void pointers which are ACPI handles, so it should be more difficult to write buggy code using modified struct acpi_dev_node and the new macros. Second, the change should help to reduce (over time) the number of places in which the result of ACPI_HANDLE() is passed to acpi_bus_get_device() in order to obtain a pointer to the struct acpi_device associated with the given "physical" device, because now that pointer is returned by ACPI_COMPANION() directly. Finally, the change should make it easier to write generic code that will build both for CONFIG_ACPI set and unset without adding explicit compiler directives to it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> # on Haswell Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Aaron Lu <aaron.lu@intel.com> # for ATA and SDIO part
-
- Nov 14, 2013
-
-
Jesse Barnes authored
We've always been able to use either method on VLV, but it appears more recent BIOSes only support the gen6 method, so switch over to that. References: https://bugs.freedesktop.org/show_bug.cgi?id=71370 Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Thomas Gleixner authored
No point in having this bit defined by architecture. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130917183629.090698799@linutronix.de
-
Anthoine Bourgeois authored
If a nested guest does a NM fault but its CR0 doesn't contain the TS flag (because it was already cleared by the guest with L1 aid) then we have to activate FPU ourselves in L0 and then continue to L2. If TS flag is set then we fallback on the previous behavior, forward the fault to L1 if it asked for. Signed-off-by: Anthoine Bourgeois <bourgeois@bertin.fr> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- Nov 13, 2013
-
-
Vineet Gupta authored
Only a couple of arches (sh/x86) use fpu_counter in task_struct so it can be moved out into ARCH specific thread_struct, reducing the size of task_struct for other arches. Compile tested i386_defconfig + gcc 4.7.3 Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Paul Mundt <paul.mundt@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Zhi Yong Wu authored
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Tang Chen authored
The hot-Pluggable field in SRAT specifies which memory is hotpluggable. As we mentioned before, if hotpluggable memory is used by the kernel, it cannot be hot-removed. So memory hotplug users may want to set all hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it. Memory hotplug users may also set a node as movable node, which has ZONE_MOVABLE only, so that the whole node can be hot-removed. But the kernel cannot use memory in ZONE_MOVABLE. By doing this, the kernel cannot use memory in movable nodes. This will cause NUMA performance down. And other users may be unhappy. So we need a way to allow users to enable and disable this functionality. In this patch, we introduce movable_node boot option to allow users to choose to not to consume hotpluggable memory at early boot time and later we can set it as ZONE_MOVABLE. To achieve this, the movable_node boot option will control the memblock allocation direction. That said, after memblock is ready, before SRAT is parsed, we should allocate memory near the kernel image as we explained in the previous patches. So if movable_node boot option is set, the kernel does the following: 1. After memblock is ready, make memblock allocate memory bottom up. 2. After SRAT is parsed, make memblock behave as default, allocate memory top down. Users can specify "movable_node" in kernel commandline to enable this functionality. For those who don't use memory hotplug or who don't want to lose their NUMA performance, just don't specify anything. The kernel will work as before. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Suggested-by: Ingo Molnar <mingo@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Toshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Tang Chen authored
Memory reserved for crashkernel could be large. So we should not allocate this memory bottom up from the end of kernel image. When SRAT is parsed, we will be able to know which memory is hotpluggable, and we can avoid allocating this memory for the kernel. So reorder reserve_crashkernel() after SRAT is parsed. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Toshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Tang Chen authored
The Linux kernel cannot migrate pages used by the kernel. As a result, kernel pages cannot be hot-removed. So we cannot allocate hotpluggable memory for the kernel. In a memory hotplug system, any numa node the kernel resides in should be unhotpluggable. And for a modern server, each node could have at least 16GB memory. So memory around the kernel image is highly likely unhotpluggable. ACPI SRAT (System Resource Affinity Table) contains the memory hotplug info. But before SRAT is parsed, memblock has already started to allocate memory for the kernel. So we need to prevent memblock from doing this. So direct memory mapping page tables setup is the case. init_mem_mapping() is called before SRAT is parsed. To prevent page tables being allocated within hotpluggable memory, we will use bottom-up direction to allocate page tables from the end of kernel image to the higher memory. Note: As for allocating page tables in lower memory, TJ said: : This is an optional behavior which is triggered by a very specific kernel : boot param, which I suspect is gonna need to stick around to support : memory hotplug in the current setup unless we add another layer of address : translation to support memory hotplug. As for page tables may occupy too much lower memory if using 4K mapping (CONFIG_DEBUG_PAGEALLOC and CONFIG_KMEMCHECK both disable using >4k pages), TJ said: : But as I said in the same paragraph, parsing SRAT earlier doesn't solve : the problem in itself either. Ignoring the option if 4k mapping is : required and memory consumption would be prohibitive should work, no? : Something like that would be necessary if we're gonna worry about cases : like this no matter how we implement it, but, frankly, I'm not sure this : is something worth worrying about. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Toshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Tang Chen authored
Create a new function memory_map_top_down to factor out of the top-down direct memory mapping pagetable setup. This is also a preparation for the following patch, which will introduce the bottom-up memory mapping. That said, we will put the two ways of pagetable setup into separate functions, and choose to use which way in init_mem_mapping, which makes the code more clear. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Toshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jianguo Wu authored
Use more appropriate NUMA_NO_NODE instead of -1 in all archs' module_alloc() Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Len Brown authored
Support the next generation Intel Atom processor mirco-architecture, formerly called Silvermont. The server version, formerly called "Avoton", is named the "Intel(R) Atom(TM) Processor C2000 Product Family". The client version, formerly called "Bay Trail", is named the "Intel Atom Processor Z3000 Series", as well as various "Intel Pentium Processor" and "Intel Celeron Processor" brands, depending on form-factor. Silvermont has a set of MSRs not far off from NHM, but the RAPL register set is a sub-set of those previously supported. Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-