  1. Dec 21, 2012
• x86: Default to ARCH=x86 to avoid overriding CONFIG_64BIT · ffee0de4
      David Woodhouse authored
      
      
It is easy to waste a bunch of time when one takes a 32-bit .config
from a test machine and tries to build it on a faster 64-bit system:
its existing setting of CONFIG_64BIT=n gets *changed* to match the
build host. Similarly, it is easy to trash an entire existing build
tree that way.
      
      This is because the default setting for $ARCH when discovered from
      'uname' is one of the legacy pre-x86-merge values (i386 or x86_64),
      which effectively force the setting of CONFIG_64BIT to match. We should
      default to ARCH=x86 instead, finally completing the merge that we
      started so long ago.
      
      This patch preserves the behaviour of the legacy ARCH settings for commands
      such as:
      
         make ARCH=x86_64 randconfig
         make ARCH=i386 randconfig
      
      ... since making the value of CONFIG_64BIT actually random in that situation
      is not desirable.
      
      In time, perhaps we can retire this legacy use of the old ARCH= values.
      We already have a way to override values for *any* config option, using
      $KCONFIG_ALLCONFIG, so it could be argued that we don't necessarily need
      to keep ARCH={i386,x86_64} around as a special case just for overriding
      CONFIG_64BIT.
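
For instance (the file name here is illustrative), the same CONFIG_64BIT
override via $KCONFIG_ALLCONFIG would look like:

   echo CONFIG_64BIT=n > override.config
   make KCONFIG_ALLCONFIG=override.config randconfig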
      
      We'd probably at least want to add a way to override config options from
      the command line ('make CONFIG_FOO=y oldconfig') before we talk about doing
      that though.
      
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Link: http://lkml.kernel.org/r/1356040315.3198.51.camel@shinybook.infradead.org
      
      
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  2. Dec 01, 2012
• x86, fpu: Avoid FPU lazy restore after suspend · 644c1541
      Vincent Palatin authored
      
      
      When a cpu enters S3 state, the FPU state is lost.
After resuming from S3, if we try to lazy-restore the FPU for a process
that was running on the same CPU, this will result in a corrupted FPU
context.

Ensure that "fpu_owner_task" is properly invalidated when (re-)initializing
a CPU, so nobody will try to lazy-restore a state which doesn't exist in
the hardware.
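
As a minimal sketch, assuming the lazy-restore owner is tracked in the
"fpu_owner_task" per-cpu variable named above, the invalidation in the
CPU (re-)init path amounts to:

   /* Forget the lazy-FPU owner: its register state did not survive S3 */
   this_cpu_write(fpu_owner_task, NULL);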
      
Tested with a 64-bit kernel on a 4-core Ivy Bridge CPU with eagerfpu=off,
by doing thousands of suspend/resume cycles with four FPU-using processes
running. Without the patch, a process is killed by a SIGFPE after a few
hundred cycles.
      
      Cc: Duncan Laurie <dlaurie@chromium.org>
      Cc: Olof Johansson <olofj@chromium.org>
      Cc: <stable@kernel.org> v3.4+ # for 3.4 need to replace this_cpu_write by percpu_write
Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
      Link: http://lkml.kernel.org/r/1354306532-1014-1-git-send-email-vpalatin@chromium.org
      
      
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  3. Nov 28, 2012
• x86-32: Unbreak booting on some 486 clones · 6662c34f
      H. Peter Anvin authored
      
      
      There appear to have been some 486 clones, including the "enhanced"
      version of Am486, which have CPUID but not CR4.  These 486 clones had
      only the FPU flag, if any, unlike the Intel 486s with CPUID, which
      also had VME and therefore needed CR4.
      
      Therefore, look at the basic CPUID flags and require at least one bit
      other than bit 0 before we modify CR4.
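
In C terms the check amounts to something like this (an illustrative
restatement; the actual fix lives in the early boot code):

   u32 cap = cpuid_edx(1);        /* basic CPUID feature flags */
   if (cap & ~1U) {               /* any flag besides bit 0 (FPU)? */
           /* CPU also has VME etc., so CR4 exists and may be modified */
   }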
      
      Thanks to Christian Ludloff of sandpile.org for confirming this as a
      problem.
      
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
4. Nov 17, 2012
• KVM: x86: Fix invalid secondary exec controls in vmx_cpuid_update() · 29282fde
      Takashi Iwai authored
Commit ad756a16 (KVM: VMX: Implement PCID/INVPCID for guests with EPT)
introduced an unconditional access to SECONDARY_VM_EXEC_CONTROL, which
triggers kernel warnings like the one below on old CPUs:
      
          vmwrite error: reg 401e value a0568000 (err 12)
          Pid: 13649, comm: qemu-kvm Not tainted 3.7.0-rc4-test2+ #154
          Call Trace:
           [<ffffffffa0558d86>] vmwrite_error+0x27/0x29 [kvm_intel]
           [<ffffffffa054e8cb>] vmcs_writel+0x1b/0x20 [kvm_intel]
           [<ffffffffa054f114>] vmx_cpuid_update+0x74/0x170 [kvm_intel]
           [<ffffffffa03629b6>] kvm_vcpu_ioctl_set_cpuid2+0x76/0x90 [kvm]
           [<ffffffffa0341c67>] kvm_arch_vcpu_ioctl+0xc37/0xed0 [kvm]
           [<ffffffff81143f7c>] ? __vunmap+0x9c/0x110
           [<ffffffffa0551489>] ? vmx_vcpu_load+0x39/0x1a0 [kvm_intel]
           [<ffffffffa0340ee2>] ? kvm_arch_vcpu_load+0x52/0x1a0 [kvm]
           [<ffffffffa032dcd4>] ? vcpu_load+0x74/0xd0 [kvm]
           [<ffffffffa032deb0>] kvm_vcpu_ioctl+0x110/0x5e0 [kvm]
           [<ffffffffa032e93d>] ? kvm_dev_ioctl+0x4d/0x4a0 [kvm]
           [<ffffffff8117dc6f>] do_vfs_ioctl+0x8f/0x530
           [<ffffffff81139d76>] ? remove_vma+0x56/0x60
           [<ffffffff8113b708>] ? do_munmap+0x328/0x400
           [<ffffffff81187c8c>] ? fget_light+0x4c/0x100
           [<ffffffff8117e1a1>] sys_ioctl+0x91/0xb0
           [<ffffffff815a942d>] system_call_fastpath+0x1a/0x1f
      
      This patch adds a check for the availability of secondary exec
      control to avoid these warnings.
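
A minimal sketch of that guard, using the vmx.c helper
cpu_has_secondary_exec_ctrls() (surrounding code elided):

   /* Don't touch SECONDARY_VM_EXEC_CONTROL on CPUs that do not
    * implement the secondary exec controls at all. */
   if (!cpu_has_secondary_exec_ctrls())
           return;
   exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);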
      
      Cc: <stable@vger.kernel.org> [v3.6+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
5. Nov 13, 2012
• KVM: x86: invalid opcode oops on SET_SREGS with OSXSAVE bit set (CVE-2012-4461) · 6d1068b3
      Petr Matousek authored
      
      
On hosts without XSAVE support, an unprivileged local user can trigger
an oops similar to the one below by setting the X86_CR4_OSXSAVE bit in
the guest cr4 register using the KVM_SET_SREGS ioctl and later issuing
the KVM_RUN ioctl.
      
      invalid opcode: 0000 [#2] SMP
      Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables
      ...
      Pid: 24935, comm: zoog_kvm_monito Tainted: G      D      3.2.0-3-686-pae
      EIP: 0060:[<f8b9550c>] EFLAGS: 00210246 CPU: 0
      EIP is at kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm]
      EAX: 00000001 EBX: 000f387e ECX: 00000000 EDX: 00000000
      ESI: 00000000 EDI: 00000000 EBP: ef5a0060 ESP: d7c63e70
       DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      Process zoog_kvm_monito (pid: 24935, ti=d7c62000 task=ed84a0c0
      task.ti=d7c62000)
      Stack:
       00000001 f70a1200 f8b940a9 ef5a0060 00000000 00200202 f8769009 00000000
       ef5a0060 000f387e eda5c020 8722f9c8 00015bae 00000000 ed84a0c0 ed84a0c0
       c12bf02d 0000ae80 ef7f8740 fffffffb f359b740 ef5a0060 f8b85dc1 0000ae80
      Call Trace:
       [<f8b940a9>] ? kvm_arch_vcpu_ioctl_set_sregs+0x2fe/0x308 [kvm]
      ...
       [<c12bfb44>] ? syscall_call+0x7/0xb
      Code: 89 e8 e8 14 ee ff ff ba 00 00 04 00 89 e8 e8 98 48 ff ff 85 c0 74
      1e 83 7d 48 00 75 18 8b 85 08 07 00 00 31 c9 8b 95 0c 07 00 00 <0f> 01
      d1 c7 45 48 01 00 00 00 c7 45 1c 01 00 00 00 0f ae f0 89
      EIP: [<f8b9550c>] kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] SS:ESP
      0068:d7c63e70
      
QEMU first retrieves the supported features via KVM_GET_SUPPORTED_CPUID
and then sets them later. So the guest's X86_FEATURE_XSAVE should be
masked out on hosts without X86_FEATURE_XSAVE, making kvm_set_cr4 with
X86_CR4_OSXSAVE fail. Userspaces that allow specifying guest cpuid with
X86_FEATURE_XSAVE even on hosts that do not support it might be
susceptible to this attack from inside the guest as well.
      
      Allow setting X86_CR4_OSXSAVE bit only if host has XSAVE support.
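
A hedged sketch of that check (the exact placement in the SET_SREGS
path may differ from the actual patch):

   /* Reject OSXSAVE in cr4 when the guest CPUID lacks XSAVE */
   if ((sregs->cr4 & X86_CR4_OSXSAVE) && !guest_cpuid_has_xsave(vcpu))
           return -EINVAL;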
      
Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
6. Nov 04, 2012
• xen/hypercall: fix hypercall fallback code for very old hypervisors · cf47a83f
      Jan Beulich authored
      
      
While copying the argument structures in HYPERVISOR_event_channel_op()
and HYPERVISOR_physdev_op() into the local variable is sufficiently
safe even if the actual structure is smaller than the container one,
copying back any output values the same way isn't: this may
collide with on-stack variables (particularly "rc") which may change
between the first and second memcpy() (i.e. the second memcpy() could
discard that change).
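
Reconstructed from the description above (not the exact removed code),
the hazardous fallback pattern looked roughly like:

   struct evtchn_op op;               /* container with a union 'u' */
   op.cmd = cmd;
   memcpy(&op.u, arg, sizeof(op.u));  /* over-read: tolerable */
   rc = _hypercall1(int, event_channel_op_compat, &op);
   memcpy(arg, &op.u, sizeof(op.u));  /* over-wide copy-back: can clobber
                                         the on-stack "rc" when the real
                                         argument is smaller */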
      
Move the fallback code into out-of-line functions, and handle all of
the operations known to a hypervisor this old individually: some don't
require copying back anything at all, and for the rest use the
individual argument structures' sizes rather than the container's.
      
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
      [v2: Reduce #define/#undef usage in HYPERVISOR_physdev_op_compat().]
      [v3: Fix compile errors when modules use said hypercalls]
      [v4: Add xen_ prefix to the HYPERCALL_..]
      [v5: Alter the name and only EXPORT_SYMBOL_GPL one of them]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7. Nov 01, 2012
• KVM: x86: fix vcpu->mmio_fragments overflow · 87da7e66
      Xiao Guangrong authored
After commit b3356bf0 (KVM: emulator: optimize "rep ins" handling),
the pieces of io data can be collected and written to the guest memory
or MMIO together.

Unfortunately, kvm splits the mmio access into 8-byte pieces and stores
them in vcpu->mmio_fragments. If the guest uses "rep ins" to move large
data, it will cause vcpu->mmio_fragments to overflow.
      
      The bug can be exposed by isapc (-M isapc):
      
      [23154.818733] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
      [ ......]
      [23154.858083] Call Trace:
      [23154.859874]  [<ffffffffa04f0e17>] kvm_get_cr8+0x1d/0x28 [kvm]
      [23154.861677]  [<ffffffffa04fa6d4>] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 [kvm]
      [23154.863604]  [<ffffffffa04f5a1a>] ? kvm_arch_vcpu_load+0x17b/0x180 [kvm]
      
Actually, we can use one mmio_fragment to store a large mmio access and
then split it when we pass the mmio-exit-info to userspace. After that,
we only need two entries to store mmio info for accesses that cross
mmio pages.
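
A rough sketch of that splitting on the exit path (the field handling
is illustrative; see the actual patch for details):

   /* Hand at most 8 bytes of the current fragment to userspace per
    * MMIO exit, then advance the fragment in place. */
   frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment];
   len = min(8u, frag->len);
   run->mmio.phys_addr = frag->gpa;
   run->mmio.len = len;
   memcpy(run->mmio.data, frag->data, len);   /* write case */
   frag->data += len;
   frag->gpa  += len;
   frag->len  -= len;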
      
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
• x86, amd: Disable way access filter on Piledriver CPUs · 2bbf0a14
      Andre Przywara authored
The Way Access Filter in recent AMD CPUs may hurt the performance of
some workloads due to aliasing issues in the L1 cache.
This patch disables it on the affected CPUs.
      
The issue is similar to the one from last year:
http://lkml.indiana.edu/hypermail/linux/kernel/1107.3/00041.html
This new patch does not replace the old one; we just need another
quirk for newer CPUs.

The performance penalty without the patch depends on the
circumstances, but is a bit less than last year's 3%.
      
The affected workloads are those that access code from the same
physical page under different virtual addresses, such as different
processes using the same libraries with ASLR or multiple instances of
PIE binaries. The code needs to be accessed simultaneously from both
cores of the same compute unit.
      
      More details can be found here:
      http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf
      
      
      
The affected CPUs are all parts based on the core known as Piledriver.
That includes the new parts of the AMD A-Series (aka Trinity) and the
newly released CPUs of the FX-Series (aka Vishera).
The model numbering is a bit odd here: FX CPUs have model 2, the
A-Series has model 10h, with possible extensions up to 1Fh; hence the
range of model ids.
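
In code, the gating described above amounts to something like the
following; note that the MSR index and chicken-bit mask here are
assumptions for illustration, not verified values:

   /* Fam15h models 0x02-0x1f cover the Piledriver-based parts */
   if (c->x86 == 0x15 && c->x86_model >= 0x02 && c->x86_model <= 0x1f) {
           u64 val;
           /* MSR 0xc0011021 and mask 0x1E are assumed, not verified */
           if (!rdmsrl_safe(0xc0011021, &val) && !(val & 0x1E)) {
                   val |= 0x1E;    /* disable the way access filter */
                   wrmsrl_safe(0xc0011021, val);
           }
   }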
      
Signed-off-by: Andre Przywara <osp@andrep.de>
      Link: http://lkml.kernel.org/r/1351700450-9277-1-git-send-email-osp@andrep.de
      
      
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
• xen/mmu: Use Xen specific TLB flush instead of the generic one. · 95a7d768
      Konrad Rzeszutek Wilk authored
      
      
As Mukesh explained it, MMUEXT_TLB_FLUSH_ALL allows the
hypervisor to do a TLB flush on all active vCPUs. If instead
we use the generic one (which ends up being xen_flush_tlb),
we end up making the MMUEXT_TLB_FLUSH_LOCAL hypercall. But
before we make that hypercall, the kernel will IPI all of the
vCPUs (even those that were asleep from the hypervisor's
perspective). The end result is that we needlessly wake them
up and do a TLB flush when we can just let the hypervisor
do it correctly.
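
Concretely, the Xen-specific flush boils down to a single mmuext_op
hypercall (a sketch with error handling simplified):

   struct mmuext_op op = { .cmd = MMUEXT_TLB_FLUSH_ALL };
   /* one hypercall; Xen flushes every active vCPU's TLB itself */
   if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
           BUG();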
      
This patch gives around a 50% speed improvement when migrating
idle guests from one host to another.
      
      Oracle-bug: 14630170
      
      CC: stable@vger.kernel.org
Tested-by: Jingjie Jiang <jingjie.jiang@oracle.com>
Suggested-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8. Oct 31, 2012
• x86/mce: Do not change worker's running cpu in cmci_rediscover(). · 85b97637
      Tang Chen authored
      
      
cmci_rediscover() used set_cpus_allowed_ptr() to change the current
process's running cpu, migrating itself to the dest cpu. But worker
threads are not allowed to be migrated. If current is a worker, the
worker will be migrated to another cpu, but the corresponding
worker_pool is still on the original cpu.

In this case, the following BUG_ON in try_to_wake_up_local() will be
triggered:
BUG_ON(rq != this_rq());

This causes a kernel panic. The call trace looks like the following:
      
      [ 6155.451107] ------------[ cut here ]------------
      [ 6155.452019] kernel BUG at kernel/sched/core.c:1654!
      ......
      [ 6155.452019] RIP: 0010:[<ffffffff810add15>]  [<ffffffff810add15>] try_to_wake_up_local+0x115/0x130
      ......
      [ 6155.452019] Call Trace:
      [ 6155.452019]  [<ffffffff8166fc14>] __schedule+0x764/0x880
      [ 6155.452019]  [<ffffffff81670059>] schedule+0x29/0x70
      [ 6155.452019]  [<ffffffff8166de65>] schedule_timeout+0x235/0x2d0
      [ 6155.452019]  [<ffffffff810db57d>] ? mark_held_locks+0x8d/0x140
      [ 6155.452019]  [<ffffffff810dd463>] ? __lock_release+0x133/0x1a0
      [ 6155.452019]  [<ffffffff81671c50>] ? _raw_spin_unlock_irq+0x30/0x50
      [ 6155.452019]  [<ffffffff810db8f5>] ? trace_hardirqs_on_caller+0x105/0x190
      [ 6155.452019]  [<ffffffff8166fefb>] wait_for_common+0x12b/0x180
      [ 6155.452019]  [<ffffffff810b0b30>] ? try_to_wake_up+0x2f0/0x2f0
      [ 6155.452019]  [<ffffffff8167002d>] wait_for_completion+0x1d/0x20
      [ 6155.452019]  [<ffffffff8110008a>] stop_one_cpu+0x8a/0xc0
      [ 6155.452019]  [<ffffffff810abd40>] ? __migrate_task+0x1a0/0x1a0
      [ 6155.452019]  [<ffffffff810a6ab8>] ? complete+0x28/0x60
      [ 6155.452019]  [<ffffffff810b0fd8>] set_cpus_allowed_ptr+0x128/0x130
      [ 6155.452019]  [<ffffffff81036785>] cmci_rediscover+0xf5/0x140
      [ 6155.452019]  [<ffffffff816643c0>] mce_cpu_callback+0x18d/0x19d
      [ 6155.452019]  [<ffffffff81676187>] notifier_call_chain+0x67/0x150
      [ 6155.452019]  [<ffffffff810a03de>] __raw_notifier_call_chain+0xe/0x10
      [ 6155.452019]  [<ffffffff81070470>] __cpu_notify+0x20/0x40
      [ 6155.452019]  [<ffffffff810704a5>] cpu_notify_nofail+0x15/0x30
      [ 6155.452019]  [<ffffffff81655182>] _cpu_down+0x262/0x2e0
      [ 6155.452019]  [<ffffffff81655236>] cpu_down+0x36/0x50
      [ 6155.452019]  [<ffffffff813d3eaa>] acpi_processor_remove+0x50/0x11e
      [ 6155.452019]  [<ffffffff813a6978>] acpi_device_remove+0x90/0xb2
      [ 6155.452019]  [<ffffffff8143cbec>] __device_release_driver+0x7c/0xf0
      [ 6155.452019]  [<ffffffff8143cd6f>] device_release_driver+0x2f/0x50
      [ 6155.452019]  [<ffffffff813a7870>] acpi_bus_remove+0x32/0x6d
      [ 6155.452019]  [<ffffffff813a7932>] acpi_bus_trim+0x87/0xee
      [ 6155.452019]  [<ffffffff813a7a21>] acpi_bus_hot_remove_device+0x88/0x16b
      [ 6155.452019]  [<ffffffff813a33ee>] acpi_os_execute_deferred+0x27/0x34
      [ 6155.452019]  [<ffffffff81090589>] process_one_work+0x219/0x680
      [ 6155.452019]  [<ffffffff81090528>] ? process_one_work+0x1b8/0x680
      [ 6155.452019]  [<ffffffff813a33c7>] ? acpi_os_wait_events_complete+0x23/0x23
      [ 6155.452019]  [<ffffffff810923be>] worker_thread+0x12e/0x320
      [ 6155.452019]  [<ffffffff81092290>] ? manage_workers+0x110/0x110
      [ 6155.452019]  [<ffffffff81098396>] kthread+0xc6/0xd0
      [ 6155.452019]  [<ffffffff8167c4c4>] kernel_thread_helper+0x4/0x10
      [ 6155.452019]  [<ffffffff81671f30>] ? retint_restore_args+0x13/0x13
      [ 6155.452019]  [<ffffffff810982d0>] ? __init_kthread_worker+0x70/0x70
      [ 6155.452019]  [<ffffffff8167c4c0>] ? gs_change+0x13/0x13
      
This patch removes the set_cpus_allowed_ptr() call and puts the cmci
rediscover jobs onto all the other cpus using system_wq. This could
delay the jobs somewhat.
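
A sketch of the replacement scheme (the per-cpu work item name is
illustrative):

   int cpu;
   /* Queue the rediscover callback on every remaining online cpu
    * instead of migrating the current task around. */
   for_each_online_cpu(cpu) {
           if (cpu == dying)
                   continue;
           schedule_work_on(cpu, &per_cpu(cmci_rediscover_work, cpu));
   }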
      
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>