  1. Dec 27, 2011
    • KVM: x86: cleanup port-in/port-out emulated · 6f6fbe98
      Xiao Guangrong authored

      Remove the code duplicated between emulator_pio_in_emulated and
      emulator_pio_out_emulated.

      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86: retry non-page-table writing instructions · 1cb3f3ae
      Xiao Guangrong authored

      If the emulation is caused by a #PF and the instruction is not one that
      writes page tables, the VM exit was caused by write protection on a shadow
      page. In that case we can zap the shadow page and retry the instruction
      directly.

      The idea is from Avi.

      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86: tag the instructions which are used to write page table · d5ae7ce8
      Xiao Guangrong authored

      The idea is from Avi:
      | tag instructions that are typically used to modify the page tables, and
      | drop shadow if any other instruction is used.
      | The list would include, I'd guess, and, or, bts, btc, mov, xchg, cmpxchg,
      | and cmpxchg8b.

      This patch tags those instructions; in a later patch, the shadow page is
      dropped if it is written by any other instruction.
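
The tagging idea can be sketched in user-space C as a lookup table with a per-instruction flag. This is only an illustration: `INSN_PAGE_TABLE`, `insn_table`, and both helper functions are hypothetical names, not the emulator's real opcode tables, and the untagged "add" and "ins" rows are added purely as examples.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag: instruction is commonly used to update page tables. */
#define INSN_PAGE_TABLE (1u << 0)

struct insn_desc {
    const char *mnemonic;
    unsigned int flags;
};

/* Tagged rows follow the list quoted in the commit message; "add" and
 * "ins" are untagged rows added for illustration only. */
static const struct insn_desc insn_table[] = {
    { "and",       INSN_PAGE_TABLE },
    { "or",        INSN_PAGE_TABLE },
    { "bts",       INSN_PAGE_TABLE },
    { "btc",       INSN_PAGE_TABLE },
    { "mov",       INSN_PAGE_TABLE },
    { "xchg",      INSN_PAGE_TABLE },
    { "cmpxchg",   INSN_PAGE_TABLE },
    { "cmpxchg8b", INSN_PAGE_TABLE },
    { "add",       0 },
    { "ins",       0 },
};

/* Returns true when the instruction carries the page-table tag. */
static bool insn_writes_page_table(const char *mnemonic)
{
    size_t i;

    for (i = 0; i < sizeof(insn_table) / sizeof(insn_table[0]); i++)
        if (strcmp(insn_table[i].mnemonic, mnemonic) == 0)
            return (insn_table[i].flags & INSN_PAGE_TABLE) != 0;
    return false; /* unknown instructions are treated as untagged */
}

/* A write by an untagged instruction means the shadow page is dropped. */
static bool should_drop_shadow(const char *mnemonic)
{
    return !insn_writes_page_table(mnemonic);
}
```

With this table, `should_drop_shadow("mov")` is false while `should_drop_shadow("ins")` is true.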
      
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: MMU: avoid pte_list_desc running out in kvm_mmu_pte_write · f759e2b4
      Xiao Guangrong authored

      kvm_mmu_pte_write is unsafe because it must allocate a pte_list_desc when
      an spte is prefetched. Unfortunately, we cannot know how many sptes need
      to be prefetched on this path, so we can run out of free pte_list_desc
      objects in the cache and trigger a BUG_ON(). In addition, some paths do
      not fill the cache at all, such as emulation of an INS instruction that
      does not trigger a page fault.

      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: nVMX: Fix warning-causing idt-vectoring-info behavior · 51cfe38e
      Nadav Har'El authored
      
      
      When L0 wishes to inject an interrupt while L2 is running, it emulates an exit
      to L1 with EXIT_REASON_EXTERNAL_INTERRUPT. This was explained in the original
      nVMX patch 23, titled "Correct handling of interrupt injection".
      
      Unfortunately, it is possible (though rare) that at this point there is valid
      idt_vectoring_info in vmcs02. For example, L1 injected some interrupt to L2,
      and when L2 tried to run this interrupt's handler, it got a page fault - so
      it returns the original interrupt vector in idt_vectoring_info. The problem
      is that if this is the case, we cannot exit to L1 with EXTERNAL_INTERRUPT
      like we wished to, because the VMX spec guarantees that idt_vectoring_info
      and exit_reason_external_interrupt can never happen together. This is not
      just specified in the spec - a KVM L1 actually prints a kernel warning
      "unexpected, valid vectoring info" if we violate this guarantee, and some
      users noticed these warnings in L1's logs.
      
      In order to better emulate a processor, which would never return the external
      interrupt and the idt-vectoring-info together, we need to separate the two
      injection steps: First, complete L1's injection into L2 (i.e., enter L2,
      injecting to it the idt-vectoring-info); Second, after entry into L2 succeeds
      and it exits back to L0, exit to L1 with the EXIT_REASON_EXTERNAL_INTERRUPT.
      Most of this is already in the code - the only change we need is to remain
      in L2 (and not exit to L1) in this case.
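
The decision described above can be sketched as a small predicate. This is a user-space illustration under stated assumptions: `on_pending_l1_interrupt` and the `l0_action` enum are hypothetical names, and the only architectural fact relied on is that bit 31 of the IDT-vectoring information field is its valid bit.

```c
#include <stdint.h>

/* Bit 31 of the IDT-vectoring information field is the "valid" bit. */
#define VECTORING_INFO_VALID (UINT32_C(1) << 31)

enum l0_action {
    EXIT_TO_L1,       /* emulate the EXTERNAL_INTERRUPT exit right away */
    REENTER_L2_FIRST  /* finish L1's injection into L2 before exiting */
};

/* Hypothetical helper: what L0 does when it wants to interrupt L1 while
 * L2 is running, given the idt_vectoring_info saved from the last exit. */
static enum l0_action on_pending_l1_interrupt(uint32_t saved_idt_vectoring_info)
{
    if (saved_idt_vectoring_info & VECTORING_INFO_VALID)
        return REENTER_L2_FIRST;
    return EXIT_TO_L1;
}
```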
      
      Note that the previous patch ensures (by using KVM_REQ_IMMEDIATE_EXIT) that
      although we do enter L2 first, it will exit immediately after processing its
      injection, allowing us to promptly inject to L1.
      
      Note how we test vmcs12->idt_vectoring_info_field; this isn't really the
      vmcs12 value (we haven't exited to L1 yet, so vmcs12 hasn't been updated),
      but rather the place we save, at the end of vmx_vcpu_run, the vmcs02 value
      of this field. This was explained in patch 25 ("Correct handling of idt
      vectoring info") of the original nVMX patch series.
      
      Thanks to Dave Allan and to Federico Simoncelli for reporting this bug,
      to Abel Gordon for helping me figure out the solution, and to Avi Kivity
      for helping to improve it.
      
      Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: nVMX: Add KVM_REQ_IMMEDIATE_EXIT · d6185f20
      Nadav Har'El authored
      
      
      This patch adds a new vcpu->requests bit, KVM_REQ_IMMEDIATE_EXIT.
      This bit requests that the next time we enter the guest, we run it for
      as short a time as possible and then exit again.
      
      We use this new option in nested VMX: When L1 launches L2, but L0 wishes L1
      to continue running so it can inject an event to it, we unfortunately cannot
      just pretend to have run L2 for a little while - We must really launch L2,
      otherwise certain one-off vmcs12 parameters (namely, L1 injection into L2)
      will be lost. So the existing code runs L2 in this case.
      But L2 could potentially run for a long time until it exits, and the
      injection into L1 will be delayed. The new KVM_REQ_IMMEDIATE_EXIT allows us
      to request that L2 will be entered, as necessary, but will exit as soon as
      possible after entry.
      
      Our implementation of this request uses smp_send_reschedule() to send a
      self-IPI, with interrupts disabled. The interrupts remain disabled until the
      guest is entered, and then, after the entry is complete (often including
      processing an injection and jumping to the relevant handler), the physical
      interrupt is noticed and causes an exit.
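
The request-bit pattern can be modeled in a few lines of user-space C. Everything here is a stand-in: `vcpu_demo`, `REQ_IMMEDIATE_EXIT_DEMO`, and the helpers are hypothetical, and incrementing a counter takes the place of the self-IPI that the real code sends with interrupts disabled.

```c
#include <stdbool.h>

#define REQ_IMMEDIATE_EXIT_DEMO 0 /* hypothetical bit number */

struct vcpu_demo {
    unsigned long requests; /* bitmask of pending requests */
    int self_ipis;          /* counts simulated self-IPIs */
};

/* Set a request bit; it is consumed on the next guest entry. */
static void make_request(struct vcpu_demo *v, unsigned int bit)
{
    v->requests |= 1UL << bit;
}

/* Test-and-clear: returns true once per request made. */
static bool check_request(struct vcpu_demo *v, unsigned int bit)
{
    unsigned long mask = 1UL << bit;
    bool was_set = (v->requests & mask) != 0;

    v->requests &= ~mask;
    return was_set;
}

/* Just before entering the guest: if an immediate exit was requested,
 * arrange for an interrupt that forces an exit right after entry. */
static void prepare_guest_entry(struct vcpu_demo *v)
{
    if (check_request(v, REQ_IMMEDIATE_EXIT_DEMO))
        v->self_ipis++; /* stands in for the self-IPI */
}
```

Because the bit is cleared when it is consumed, a single request forces only one early exit.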
      
      On recent Intel processors, we could have achieved the same goal by using
      MTF instead of a self-IPI. Another technique worth considering in the future
      is to use VM_EXIT_ACK_INTR_ON_EXIT and a highest-priority vector IPI - to
      slightly improve performance by avoiding the useless interrupt handler
      which ends up being called when smp_send_reschedule() is used.
      
      Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • drm/i915: Disable RC6 on Sandybridge by default · 371de6e4
      Keith Packard authored
      
      
      RC6 fails again.
      
      > I found my system freeze mostly during starting up X and KDE. Sometimes it
      > works for some minutes, sometimes it freezes immediately. When the freeze
      > happens, everything is dead (even the reset button does not work, I need to
      > power cycle).
      
      > I disabled RC6, and my system runs wonderfully.
      
      > The system is a Z68 Pro board with Sandybridge i5-2500K processor, 8
      > GB of RAM and UEFI firmware.
      
      Reported-by: Kai Krakow <hurikhan77@gmail.com>
      Signed-off-by: Keith Packard <keithp@keithp.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drm/i915: Disable semaphores by default on SNB · ebbd857e
      Keith Packard authored
      
      
      Semaphores still cause problems on some machines:
      
      > From Udo Steinberg:
      >
      > With Linux-3.2-rc6 I'm frequently seeing GPU hangs when large amounts of
      > text scroll in an xterm, such as when extracting a tar archive. Such as this
      > one (note the timestamps):
      >
      >  I can reproduce it fairly easily with something
      >  as simple as:
      >
      >	  while true; do dmesg; done
      
      This patch turns them off on SNB while leaving them on for IVB.
      
      Reported-by: Udo Steinberg <udo@hypervisor.org>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Eugeni Dodonov <eugeni@dodonov.net>
      Signed-off-by: Keith Packard <keithp@keithp.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 7f54492f
      Linus Torvalds authored
      * 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: PPC: e500: include linux/export.h
        KVM: PPC: fix kvmppc_start_thread() for CONFIG_SMP=N
        KVM: PPC: protect use of kvmppc_h_pr
        KVM: PPC: move compute_tlbie_rb to book3s_64 common header
        KVM: Don't automatically expose the TSC deadline timer in cpuid
        KVM: Device assignment permission checks
        KVM: Remove ability to assign a device without iommu support
        KVM: x86: Prevent starting PIT timers in the absence of irqchip support
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 · 6fd8fb7f
      Linus Torvalds authored
      post 3.2-rc7 pull request
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
        MAINTAINERS: firewire git URL update
    • vfs: fix handling of lock allocation failure in lease-break case · 6d4b9e38
      Linus Torvalds authored

      Bruce Fields notes that commit 778fc546 ("locks: fix tracking of
      inprogress lease breaks") introduced a possible error pointer
      dereference on failure to allocate memory.  locks_conflict() will
      dereference the passed-in new lease lock structure that may be an error pointer.

      This means an open (without O_NONBLOCK set) on a file with a lease
      applied (generally only done when Samba or nfsd (with v4) is running)
      could crash if a kmalloc() fails.

      So instead of playing games with IS_ERR() all over the place, just
      check the allocation failure early.  That makes the code more
      straightforward, and avoids this possible bad pointer dereference.
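
A minimal user-space sketch of "check the allocation failure early" follows. The `err_ptr`/`is_err`/`ptr_err` helpers are stand-ins modeled on the kernel's error-pointer convention, and `lease_alloc_demo` and `break_lease_demo` are hypothetical functions, not the code in fs/locks.c.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static void *err_ptr(long err)      { return (void *)(intptr_t)err; }
static long ptr_err(const void *p)  { return (long)(intptr_t)p; }
static int is_err(const void *p)    { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }

struct lease_demo { int type; };

/* Hypothetical allocator: returns an error pointer instead of NULL.
 * `fail` simulates an allocation failure. */
static struct lease_demo *lease_alloc_demo(int type, int fail)
{
    struct lease_demo *l;

    if (fail)
        return err_ptr(-ENOMEM);
    l = malloc(sizeof(*l));
    if (!l)
        return err_ptr(-ENOMEM);
    l->type = type;
    return l;
}

/* The allocation result is checked once, right after the call, so no
 * later code can dereference an error pointer. */
static int break_lease_demo(int type, int fail)
{
    struct lease_demo *new_fl = lease_alloc_demo(type, fail);
    int conflict;

    if (is_err(new_fl))
        return (int)ptr_err(new_fl);

    conflict = (new_fl->type != 0); /* safe: new_fl is a real object */
    free(new_fl);
    return conflict;
}
```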
      
      Based-on-patch-by: J. Bruce Fields <bfields@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. Dec 26, 2011
  3. Dec 25, 2011
  4. Dec 24, 2011
  5. Dec 23, 2011
    • netfilter: xt_connbytes: handle negation correctly · 0354b48f
      Florian Westphal authored
      
      
      "! --connbytes 23:42" should match if the packet/byte count is not in range.
      
      As there is no explicit "invert match" toggle in the match structure,
      userspace swaps the from and to arguments
      (i.e., as if "--connbytes 42:23" were given).
      
      However, "what <= 23 && what >= 42" will always be false.
      
      Change things so we use "||" in case "from" is larger than "to".
      
      This change may look like it breaks backwards compatibility when "to" is 0.
      However, older iptables binaries will refuse "connbytes 42:0",
      and current releases treat it to mean "! --connbytes 0:42",
      so we should be fine.
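
The matching rule can be sketched as a standalone function. `connbytes_in_range` is a hypothetical helper written for illustration, with the comparison operators chosen so that a swapped pair means "outside the range".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a pair with from > to encodes the inverted match
 * that userspace produces by swapping the two arguments. */
static bool connbytes_in_range(uint64_t what, uint64_t from, uint64_t to)
{
    if (from <= to)
        return what >= from && what <= to; /* inside [from, to] */

    /* Swapped arguments: match only values outside [to, from]. */
    return what < to || what > from;
}
```

For "! --connbytes 23:42" userspace passes from = 42 and to = 23, so a count of 30 does not match while counts of 10 or 50 do.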
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Btrfs: call d_instantiate after all ops are setup · 08c422c2
      Al Viro authored

      This closes races where btrfs is calling d_instantiate too soon during
      inode creation.  All of the callers of btrfs_add_nondir are updated to
      instantiate after the inode is fully set up in memory.

      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: fix worker lock misuse in find_worker · 8d532b2a
      Chris Mason authored

      Dan Carpenter noticed that we were doing a double unlock on the worker
      lock, and sometimes picking a worker thread without the lock held.

      This fixes both errors.

      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    • net: relax rcvbuf limits · 0fd7bac6
      Eric Dumazet authored

      skb->truesize might be big even for a small packet.

      It's even bigger after commit 87fb4b7b (net: more accurate skb
      truesize) and big MTU.

      We should allow queueing at least one packet per receiver, even with a
      low RCVBUF setting.
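
One way to get that behavior, shown here as an assumption rather than as the patch's exact change, is to stop counting the incoming packet's truesize in the limit check. Both function names below are hypothetical, and `rcvbuf` is assumed to be greater than zero.

```c
#include <stdbool.h>

/* Strict check: the incoming packet's truesize counts against the limit,
 * so one large-truesize packet can be refused even on an empty queue. */
static bool rcvbuf_full_strict(unsigned int rmem_alloc, unsigned int truesize,
                               unsigned int rcvbuf)
{
    return rmem_alloc + truesize >= rcvbuf;
}

/* Relaxed check: only memory already queued counts, so an empty queue
 * always accepts at least one packet (assuming rcvbuf > 0). */
static bool rcvbuf_full_relaxed(unsigned int rmem_alloc, unsigned int rcvbuf)
{
    return rmem_alloc >= rcvbuf;
}
```

With an empty queue, a 4096-byte truesize, and a 2048-byte limit, the strict check refuses the packet and the relaxed check accepts it.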
      
      Reported-by: Michal Simek <monstr@monstr.eu>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rps: fix insufficient bounds checking in store_rps_dev_flow_table_cnt() · a0a129f8
      Xi Wang authored
      
      
      Setting a large rps_flow_cnt like (1 << 30) on a 32-bit platform will
      cause a kernel oops due to insufficient bounds checking.
      
      	if (count > 1<<30) {
      		/* Enforce a limit to prevent overflow */
      		return -EINVAL;
      	}
      	count = roundup_pow_of_two(count);
      	table = vmalloc(RPS_DEV_FLOW_TABLE_SIZE(count));
      
      Note that the macro RPS_DEV_FLOW_TABLE_SIZE(count) is defined as:
      
      	... + (count * sizeof(struct rps_dev_flow))
      
      where sizeof(struct rps_dev_flow) is 8.  (1 << 30) * 8 will overflow
      32 bits.
      
      This patch replaces the magic number (1 << 30) with a symbolic bound.
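
The overflow and a bound derived from the size formula can be reproduced with fixed-width arithmetic. This is a sketch under stated assumptions: the 8-byte entry size comes from the commit message, while the 16-byte header size and both function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLOW_ENTRY_SIZE   UINT32_C(8)  /* sizeof(struct rps_dev_flow), per the text */
#define TABLE_HEADER_SIZE UINT32_C(16) /* assumed header size, illustration only */

/* Table size as a 32-bit machine would compute it: wraps modulo 2^32. */
static uint32_t table_size_32bit(uint32_t count)
{
    return (uint32_t)(TABLE_HEADER_SIZE + (uint32_t)(count * FLOW_ENTRY_SIZE));
}

/* Bound derived from the formula: the largest count whose table size
 * still fits in 32 bits. */
static bool count_fits_32bit(uint32_t count)
{
    return count <= (UINT32_MAX - TABLE_HEADER_SIZE) / FLOW_ENTRY_SIZE;
}
```

Here `table_size_32bit(1 << 30)` wraps around to just the header size, which is why a count that passes the old `count > 1<<30` check still leads to an undersized allocation; `count_fits_32bit` rejects it.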
      
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Xi Wang <xi.wang@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: introduce DST_NOPEER dst flag · e688a604
      Eric Dumazet authored
      Chris Boot reported crashes occurring in ipv6_select_ident().
      
      [  461.457562] RIP: 0010:[<ffffffff812dde61>]  [<ffffffff812dde61>] ipv6_select_ident+0x31/0xa7

      [  461.578229] Call Trace:
      [  461.580742] <IRQ>
      [  461.582870]  [<ffffffff812efa7f>] ? udp6_ufo_fragment+0x124/0x1a2
      [  461.589054]  [<ffffffff812dbfe0>] ? ipv6_gso_segment+0xc0/0x155
      [  461.595140]  [<ffffffff812700c6>] ? skb_gso_segment+0x208/0x28b
      [  461.601198]  [<ffffffffa03f236b>] ? ipv6_confirm+0x146/0x15e [nf_conntrack_ipv6]
      [  461.608786]  [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77
      [  461.614227]  [<ffffffff81271d64>] ? dev_hard_start_xmit+0x357/0x543
      [  461.620659]  [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111
      [  461.626440]  [<ffffffffa0379745>] ? br_parse_ip_options+0x19a/0x19a [bridge]
      [  461.633581]  [<ffffffff812722ff>] ? dev_queue_xmit+0x3af/0x459
      [  461.639577]  [<ffffffffa03747d2>] ? br_dev_queue_push_xmit+0x72/0x76 [bridge]
      [  461.646887]  [<ffffffffa03791e3>] ? br_nf_post_routing+0x17d/0x18f [bridge]
      [  461.653997]  [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77
      [  461.659473]  [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge]
      [  461.665485]  [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111
      [  461.671234]  [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge]
      [  461.677299]  [<ffffffffa0379215>] ? nf_bridge_update_protocol+0x20/0x20 [bridge]
      [  461.684891]  [<ffffffffa03bb0e5>] ? nf_ct_zone+0xa/0x17 [nf_conntrack]
      [  461.691520]  [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge]
      [  461.697572]  [<ffffffffa0374812>] ? NF_HOOK.constprop.8+0x3c/0x56 [bridge]
      [  461.704616]  [<ffffffffa0379031>] ? nf_bridge_push_encap_header+0x1c/0x26 [bridge]
      [  461.712329]  [<ffffffffa037929f>] ? br_nf_forward_finish+0x8a/0x95 [bridge]
      [  461.719490]  [<ffffffffa037900a>] ? nf_bridge_pull_encap_header+0x1c/0x27 [bridge]
      [  461.727223]  [<ffffffffa0379974>] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge]
      [  461.734292]  [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77
      [  461.739758]  [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge]
      [  461.746203]  [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111
      [  461.751950]  [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge]
      [  461.758378]  [<ffffffffa037533a>] ? NF_HOOK.constprop.4+0x56/0x56 [bridge]
      
      This is caused by the bridge netfilter special dst_entry (fake_rtable), a
      special shared entry, where attaching an inetpeer makes no sense.

      The problem has been present since commit 87c48fa3 (ipv6: make fragment
      identifications less predictable).

      Introduce a DST_NOPEER dst flag and make sure ipv6_select_ident() and
      __ip_select_ident() fall back to the 'no peer attached' handling.
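
The flag check can be sketched in user-space C as follows. The structures, `DST_NOPEER_DEMO`, and `select_ident_demo` are hypothetical stand-ins; the real functions operate on kernel dst and inetpeer objects and generate identifiers differently.

```c
#include <stddef.h>
#include <stdint.h>

#define DST_NOPEER_DEMO 0x1u /* hypothetical flag value */

struct peer_demo { uint32_t ip_id_count; };

struct dst_demo {
    unsigned int flags;
    struct peer_demo *peer; /* per-destination state, may be NULL */
};

/* Shared counter used for the "no peer attached" path. */
static uint32_t fallback_ident;

/* Entries flagged NOPEER never touch peer state, even if a pointer is
 * present, and use the fallback counter instead. */
static uint32_t select_ident_demo(struct dst_demo *dst)
{
    if (!(dst->flags & DST_NOPEER_DEMO) && dst->peer != NULL)
        return ++dst->peer->ip_id_count;
    return ++fallback_ident;
}
```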
      
      Reported-by: Chris Boot <bootc@bootc.net>
      Tested-by: Chris Boot <bootc@bootc.net>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mqprio: Avoid panic if no options are provided · 7838f2ce
      Thomas Graf authored

      Userspace may not provide TCA_OPTIONS; in fact, tc currently does
      not do so if no arguments are specified on the command line.
      Return EINVAL instead of panicking.
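
The guard amounts to a NULL check before the options are read. The sketch below uses hypothetical names (`qopt_demo`, `mqprio_init_demo`) and a single made-up field; it only illustrates the pattern of rejecting a missing attribute.

```c
#include <errno.h>
#include <stddef.h>

struct qopt_demo { int num_tc; }; /* hypothetical options payload */

/* Hypothetical init routine: `opt` is NULL when userspace sent no
 * options attribute at all. */
static int mqprio_init_demo(const struct qopt_demo *opt, int *num_tc)
{
    if (opt == NULL)
        return -EINVAL; /* reject instead of dereferencing NULL */

    *num_tc = opt->num_tc;
    return 0;
}
```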
      
      Signed-off-by: Thomas Graf <tgraf@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: provide a mtu() method for fake_dst_ops · a13861a2
      Eric Dumazet authored

      Commit 618f9bc7 (net: Move mtu handling down to the protocol
      depended handlers) forgot the bridge netfilter case, adding a NULL
      dereference in ip_fragment().

      Reported-by: Chris Boot <bootc@bootc.net>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'for-linus' of git://neil.brown.name/md · ad1fca20
      Linus Torvalds authored
      * 'for-linus' of git://neil.brown.name/md:
        md/bitmap: It is OK to clear bits during recovery.
        md: don't give up looking for spares on first failure-to-add
        md/raid5: ensure correct assessment of drives during degraded reshape.
        md/linear: fix hot-add of devices to linear arrays.
    • md/bitmap: It is OK to clear bits during recovery. · 961902c0
      NeilBrown authored

      Commit d0a4bb49 introduced a regression which is annoying but fairly
      harmless.

      When writing to an array that is undergoing recovery (a spare
      is being integrated into the array), writing to the array will
      set bits in the bitmap, but they will not be cleared when the
      write completes.
      
      For bits covering areas that have not been recovered yet this is not a
      problem as the recovery will clear the bits.  However, bits set in an
      already-recovered region will stay set and never be cleared.
      This doesn't risk data integrity.  The only negatives are:
       - next time there is a crash, more resyncing than necessary will
         be done.
       - the bitmap doesn't look clean, which is confusing.
      
      While an array is recovering we don't want to update the
      'events_cleared' setting in the bitmap but we do still want to clear
      bits that have very recently been set - providing they were written to
      the recovering device.
      
      So split those two needs - which previously both depended on 'success' -
      and always clear the bit if the write went to all devices.

      Signed-off-by: NeilBrown <neilb@suse.de>