  1. Mar 12, 2014
      vmxnet3: fix netpoll race condition · d25f06ea
      Neil Horman authored
      
      
vmxnet3's netpoll driver is incorrectly coded.  It directly calls
vmxnet3_do_poll, which is the driver's internal NAPI poll routine.  As the
netpoll controller method doesn't block real NAPI polls in any way, there is
a potential for race conditions in which the netpoll controller method and
the NAPI poll method run concurrently.  The result is data corruption,
causing panics such as this one recently observed:
      PID: 1371   TASK: ffff88023762caa0  CPU: 1   COMMAND: "rs:main Q:Reg"
       #0 [ffff88023abd5780] machine_kexec at ffffffff81038f3b
       #1 [ffff88023abd57e0] crash_kexec at ffffffff810c5d92
       #2 [ffff88023abd58b0] oops_end at ffffffff8152b570
       #3 [ffff88023abd58e0] die at ffffffff81010e0b
       #4 [ffff88023abd5910] do_trap at ffffffff8152add4
       #5 [ffff88023abd5970] do_invalid_op at ffffffff8100cf95
       #6 [ffff88023abd5a10] invalid_op at ffffffff8100bf9b
          [exception RIP: vmxnet3_rq_rx_complete+1968]
          RIP: ffffffffa00f1e80  RSP: ffff88023abd5ac8  RFLAGS: 00010086
          RAX: 0000000000000000  RBX: ffff88023b5dcee0  RCX: 00000000000000c0
          RDX: 0000000000000000  RSI: 00000000000005f2  RDI: ffff88023b5dcee0
          RBP: ffff88023abd5b48   R8: 0000000000000000   R9: ffff88023a3b6048
          R10: 0000000000000000  R11: 0000000000000002  R12: ffff8802398d4cd8
          R13: ffff88023af35140  R14: ffff88023b60c890  R15: 0000000000000000
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #7 [ffff88023abd5b50] vmxnet3_do_poll at ffffffffa00f204a [vmxnet3]
       #8 [ffff88023abd5b80] vmxnet3_netpoll at ffffffffa00f209c [vmxnet3]
       #9 [ffff88023abd5ba0] netpoll_poll_dev at ffffffff81472bb7
      
The fix is to do as other drivers do, and have the poll controller call the
top-half interrupt handler, which properly schedules a NAPI poll to receive
frames.
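
A minimal sketch of that pattern, with hypothetical driver names standing in
for the vmxnet3-specific handler and adapter types:

  #include <linux/interrupt.h>
  #include <linux/netdevice.h>

  struct example_adapter {
      struct napi_struct napi;
  };

  /* Top-half IRQ handler: it only schedules NAPI; the actual RX ring
   * processing runs later in NAPI context, serialized with any other
   * NAPI poll.
   */
  static irqreturn_t example_intr(int irq, void *dev_id)
  {
      struct example_adapter *adapter = dev_id;

      napi_schedule(&adapter->napi);
      return IRQ_HANDLED;
  }

  /* Netpoll controller: call the top half instead of the NAPI poll
   * routine, so netpoll can never run the RX code concurrently with a
   * real NAPI poll.
   */
  static void example_netpoll(struct net_device *netdev)
  {
      struct example_adapter *adapter = netdev_priv(netdev);

      example_intr(0, adapter);
  }

The real fix wires the driver's own interrupt handler(s) into the
.ndo_poll_controller callback in the same way.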
      
      Tested by myself, successfully.
      
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Shreyas Bhatewara <sbhatewara@vmware.com>
      CC: "VMware, Inc." <pv-drivers@vmware.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: stable@vger.kernel.org
Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
  2. Mar 11, 2014
  3. Mar 10, 2014
      bnx2: Fix shutdown sequence · a8d9bc2e
      Michael Chan authored
The pci shutdown handler added in commit 25bfb1dd ("bnx2: Add pci shutdown
handler") created a shutdown sequence without chip reset if the device was
never brought up.  This can cause the firmware to shut down the PHY
prematurely and cause MMIO read cycles to be unresponsive.  On some
systems, it may generate an NMI in bnx2's pci shutdown handler.
      
The fix is to tell the firmware not to shut down the PHY if there was
no prior chip reset.
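
A hedged sketch of that logic, with hypothetical names (the real driver uses
its own chip-specific firmware messages):

  #include <linux/pci.h>

  struct example_priv {
      bool chip_was_reset;    /* set once the chip has been brought up/reset */
  };

  /* Placeholder for the firmware command that powers down the PHY. */
  static void example_fw_phy_shutdown(struct example_priv *bp)
  {
  }

  static void example_shutdown(struct pci_dev *pdev)
  {
      struct example_priv *bp = pci_get_drvdata(pdev);

      /* If the chip was never reset, skip the PHY shutdown request so
       * the firmware doesn't power the PHY down prematurely and MMIO
       * reads keep responding.
       */
      if (bp && bp->chip_was_reset)
          example_fw_phy_shutdown(bp);

      if (pci_is_enabled(pdev))
          pci_disable_device(pdev);
  }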
      
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
  4. Mar 08, 2014
  5. Mar 07, 2014
  6. Mar 06, 2014
      hyperv: Move state setting for link query · 1b07da51
      Haiyang Zhang authored
      
      
This moves the state setting for the query into rndis_filter_receive_response().
All callbacks, including query-complete and status-callback, are synchronized
by channel->inbound_lock. This prevents a potential race between them.
      
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
      net: macb: DMA-unmap full rx-buffer · 48330e08
      Soren Brinkmann authored
      
      
When allocating RX buffers a fixed size is used, while freeing is based on
the number of bytes actually received, resulting in the following kernel
warning when CONFIG_DMA_API_DEBUG is enabled:
       WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:1051 check_unmap+0x258/0x894()
       macb e000b000.ethernet: DMA-API: device driver frees DMA memory with different size [device address=0x000000002d170040] [map size=1536 bytes] [unmap size=60 bytes]
       Modules linked in:
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc3-xilinx-00220-g49f84081ce4f #65
       [<c001516c>] (unwind_backtrace) from [<c0011df8>] (show_stack+0x10/0x14)
       [<c0011df8>] (show_stack) from [<c03c775c>] (dump_stack+0x7c/0xc8)
       [<c03c775c>] (dump_stack) from [<c00245cc>] (warn_slowpath_common+0x60/0x84)
       [<c00245cc>] (warn_slowpath_common) from [<c0024670>] (warn_slowpath_fmt+0x2c/0x3c)
       [<c0024670>] (warn_slowpath_fmt) from [<c0227d44>] (check_unmap+0x258/0x894)
       [<c0227d44>] (check_unmap) from [<c0228588>] (debug_dma_unmap_page+0x64/0x70)
       [<c0228588>] (debug_dma_unmap_page) from [<c02ab78c>] (gem_rx+0x118/0x170)
       [<c02ab78c>] (gem_rx) from [<c02ac4d4>] (macb_poll+0x24/0x94)
       [<c02ac4d4>] (macb_poll) from [<c031222c>] (net_rx_action+0x6c/0x188)
       [<c031222c>] (net_rx_action) from [<c0028a28>] (__do_softirq+0x108/0x280)
       [<c0028a28>] (__do_softirq) from [<c0028e8c>] (irq_exit+0x84/0xf8)
       [<c0028e8c>] (irq_exit) from [<c000f360>] (handle_IRQ+0x68/0x8c)
       [<c000f360>] (handle_IRQ) from [<c0008528>] (gic_handle_irq+0x3c/0x60)
       [<c0008528>] (gic_handle_irq) from [<c0012904>] (__irq_svc+0x44/0x78)
       Exception stack(0xc056df20 to 0xc056df68)
       df20: 00000001 c0577430 00000000 c0577430 04ce8e0d 00000002 edfce238 00000000
       df40: 04e20f78 00000002 c05981f4 00000000 00000008 c056df68 c0064008 c02d7658
       df60: 20000013 ffffffff
       [<c0012904>] (__irq_svc) from [<c02d7658>] (cpuidle_enter_state+0x54/0xf8)
       [<c02d7658>] (cpuidle_enter_state) from [<c02d77dc>] (cpuidle_idle_call+0xe0/0x138)
       [<c02d77dc>] (cpuidle_idle_call) from [<c000f660>] (arch_cpu_idle+0x8/0x3c)
       [<c000f660>] (arch_cpu_idle) from [<c006bec4>] (cpu_startup_entry+0xbc/0x124)
       [<c006bec4>] (cpu_startup_entry) from [<c053daec>] (start_kernel+0x350/0x3b0)
       ---[ end trace d5fdc38641bd3a11 ]---
       Mapped at:
        [<c0227184>] debug_dma_map_page+0x48/0x11c
        [<c02ab32c>] gem_rx_refill+0x154/0x1f8
        [<c02ac7b4>] macb_open+0x270/0x3e0
        [<c03152e0>] __dev_open+0x7c/0xfc
        [<c031554c>] __dev_change_flags+0x8c/0x140
      
Fix this by passing the same size to the unmap function that was used when
mapping the memory.
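
A small sketch of the fix; the buffer-size constant here is a stand-in for
whatever fixed size the driver used at map time:

  #include <linux/dma-mapping.h>

  #define EXAMPLE_RX_BUFFER_SIZE  1536    /* same fixed size used when mapping */

  /* Unmap with the size that was passed to dma_map_single() at map time,
   * not with the number of bytes the hardware actually received, so the
   * DMA-API debug code sees matching map/unmap sizes.
   */
  static void example_rx_unmap(struct device *dev, dma_addr_t addr)
  {
      dma_unmap_single(dev, addr, EXAMPLE_RX_BUFFER_SIZE,
                       DMA_FROM_DEVICE);
  }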
      
Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
      net: macb: Check DMA mappings for error · 92030908
      Soren Brinkmann authored
      
      
      With CONFIG_DMA_API_DEBUG enabled the following warning is printed:
       WARNING: CPU: 0 PID: 619 at lib/dma-debug.c:1101 check_unmap+0x758/0x894()
       macb e000b000.ethernet: DMA-API: device driver failed to check map error[device address=0x000000002d171c02] [size=322 bytes] [mapped as single]
       Modules linked in:
       CPU: 0 PID: 619 Comm: udhcpc Not tainted 3.14.0-rc3-xilinx-00219-gd158fc7f36a2 #63
       [<c001516c>] (unwind_backtrace) from [<c0011df8>] (show_stack+0x10/0x14)
       [<c0011df8>] (show_stack) from [<c03c7714>] (dump_stack+0x7c/0xc8)
       [<c03c7714>] (dump_stack) from [<c00245cc>] (warn_slowpath_common+0x60/0x84)
       [<c00245cc>] (warn_slowpath_common) from [<c0024670>] (warn_slowpath_fmt+0x2c/0x3c)
       [<c0024670>] (warn_slowpath_fmt) from [<c0228244>] (check_unmap+0x758/0x894)
       [<c0228244>] (check_unmap) from [<c0228588>] (debug_dma_unmap_page+0x64/0x70)
       [<c0228588>] (debug_dma_unmap_page) from [<c02aba64>] (macb_interrupt+0x1f8/0x2dc)
       [<c02aba64>] (macb_interrupt) from [<c006c6e4>] (handle_irq_event_percpu+0x2c/0x178)
       [<c006c6e4>] (handle_irq_event_percpu) from [<c006c86c>] (handle_irq_event+0x3c/0x5c)
       [<c006c86c>] (handle_irq_event) from [<c006f548>] (handle_fasteoi_irq+0xb8/0x100)
       [<c006f548>] (handle_fasteoi_irq) from [<c006c148>] (generic_handle_irq+0x20/0x30)
       [<c006c148>] (generic_handle_irq) from [<c000f35c>] (handle_IRQ+0x64/0x8c)
       [<c000f35c>] (handle_IRQ) from [<c0008528>] (gic_handle_irq+0x3c/0x60)
       [<c0008528>] (gic_handle_irq) from [<c0012904>] (__irq_svc+0x44/0x78)
       Exception stack(0xed197f60 to 0xed197fa8)
       7f60: 00000134 60000013 bd94362e bd94362e be96b37c 00000014 fffffd72 00000122
       7f80: c000ebe4 ed196000 00000000 00000011 c032c0d8 ed197fa8 c0064008 c000ea20
       7fa0: 60000013 ffffffff
       [<c0012904>] (__irq_svc) from [<c000ea20>] (ret_fast_syscall+0x0/0x48)
       ---[ end trace 478f921d0d542d1e ]---
       Mapped at:
        [<c0227184>] debug_dma_map_page+0x48/0x11c
        [<c02aaca0>] macb_start_xmit+0x184/0x2a8
        [<c03143c0>] dev_hard_start_xmit+0x334/0x470
        [<c032c09c>] sch_direct_xmit+0x78/0x2f8
        [<c0314814>] __dev_queue_xmit+0x318/0x708
      
due to missing error checks on the DMA mapping. Add the appropriate checks
to fix this.
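
A minimal sketch of such a check around the standard DMA-API calls (function
name hypothetical):

  #include <linux/dma-mapping.h>
  #include <linux/errno.h>
  #include <linux/skbuff.h>

  /* Map a TX skb and verify the mapping before handing the address to
   * the hardware; on error the caller can drop the packet instead of
   * programming an invalid DMA address into a descriptor.
   */
  static int example_map_tx(struct device *dev, struct sk_buff *skb,
                            dma_addr_t *mapping)
  {
      *mapping = dma_map_single(dev, skb->data, skb->len, DMA_TO_DEVICE);
      if (dma_mapping_error(dev, *mapping))
          return -ENOMEM;

      return 0;
  }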
      
Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
      net: sctp: fix skb leakage in COOKIE ECHO path of chunk->auth_chunk · c485658b
      Daniel Borkmann authored
While working on ec0223ec ("net: sctp: fix sctp_sf_do_5_1D_ce to
verify if we/peer is AUTH capable"), we noticed that there's an skb
memory leak in the error path.

Running the same reproducer as in ec0223ec and unconditionally
jumping to the error label (to simulate an error condition) in the
sctp_sf_do_5_1D_ce() receive path makes the kmemleak detector report
the unfreed chunk->auth_chunk skb clone:
      
      Unreferenced object 0xffff8800b8f3a000 (size 256):
        comm "softirq", pid 0, jiffies 4294769856 (age 110.757s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          89 ab 75 5e d4 01 58 13 00 00 00 00 00 00 00 00  ..u^..X.........
        backtrace:
          [<ffffffff816660be>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff8119f328>] kmem_cache_alloc+0xc8/0x210
          [<ffffffff81566929>] skb_clone+0x49/0xb0
          [<ffffffffa0467459>] sctp_endpoint_bh_rcv+0x1d9/0x230 [sctp]
          [<ffffffffa046fdbc>] sctp_inq_push+0x4c/0x70 [sctp]
          [<ffffffffa047e8de>] sctp_rcv+0x82e/0x9a0 [sctp]
          [<ffffffff815abd38>] ip_local_deliver_finish+0xa8/0x210
          [<ffffffff815a64af>] nf_reinject+0xbf/0x180
          [<ffffffffa04b4762>] nfqnl_recv_verdict+0x1d2/0x2b0 [nfnetlink_queue]
          [<ffffffffa04aa40b>] nfnetlink_rcv_msg+0x14b/0x250 [nfnetlink]
          [<ffffffff815a3269>] netlink_rcv_skb+0xa9/0xc0
          [<ffffffffa04aa7cf>] nfnetlink_rcv+0x23f/0x408 [nfnetlink]
          [<ffffffff815a2bd8>] netlink_unicast+0x168/0x250
          [<ffffffff815a2fa1>] netlink_sendmsg+0x2e1/0x3f0
          [<ffffffff8155cc6b>] sock_sendmsg+0x8b/0xc0
          [<ffffffff8155d449>] ___sys_sendmsg+0x369/0x380
      
What happens is that commit bbd0d598 clones the skb containing
the AUTH chunk in sctp_endpoint_bh_rcv() in the edge case where
an endpoint requires COOKIE-ECHO chunks to be authenticated:
      
        ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
        <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
        ------------------ AUTH; COOKIE-ECHO ---------------->
        <-------------------- COOKIE-ACK ---------------------
      
When we enter sctp_sf_do_5_1D_ce() and before we actually get to
the point where we process (and subsequently free) a non-NULL
chunk->auth_chunk, we could hit the "goto nomem_init" path from
an error condition and thus leave the cloned skb around without
freeing it.
      
The fix is to centrally free such clones in the sctp_chunk_destroy()
handler, which is invoked from sctp_chunk_free() after all refs have
been dropped, and to move both kfree_skb(chunk->auth_chunk) call sites
there, so that chunk->auth_chunk is either NULL (since sctp_chunkify()
allocates new chunks through kmem_cache_zalloc()) or non-NULL with
a valid skb pointer. chunk->skb and chunk->auth_chunk are the
only skbs in the sctp_chunk structure that need to be handled.
      
While at it, we should use consume_skb() for both. It is the same
as dev_kfree_skb() but more appropriately named, as we are not
a device but a protocol. This also effectively turns the kfree_skb()
from both invocations into consume_skb(). The functions are the same,
except that kfree_skb() assumes the frame was dropped after a failure
(e.g. for tools like drop monitor), so consume_skb() is the more
appropriate choice in sctp_chunk_destroy().
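
A simplified sketch of that central free, reduced to the skb handling
(consume_skb() accepts a NULL pointer, so a chunk without an auth_chunk
clone needs no special casing):

  #include <linux/skbuff.h>
  #include <net/sctp/structs.h>

  /* Release both skbs when the chunk itself is destroyed.  Using
   * consume_skb() instead of kfree_skb() avoids flagging these frames
   * as drops for tools like drop monitor.
   */
  static void example_chunk_release_skbs(struct sctp_chunk *chunk)
  {
      consume_skb(chunk->skb);
      consume_skb(chunk->auth_chunk);
  }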
      
Fixes: bbd0d598 ("[SCTP]: Implement the receive and verification of AUTH chunk")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Vlad Yasevich <yasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
      r8152: disable the ECM mode · 10c32717
      hayeswang authored
      
      
There are known issues when switching the drivers between ECM mode and
vendor mode. The interrupt transfer may become abnormal. The hardware
may even stop responding if the configuration is changed without
unloading the current driver first, because all control transfers of
the current driver fail after the command that switches the
configuration.

Although using the ECM driver and the vendor driver independently is
fine, switching from one driver to the other by changing the
configuration can cause problems. Additionally, the vendor mode driver
is now more capable than the ECM driver. Thus, disable the ECM mode
driver, and let r8152 set the configuration to vendor mode and reset
the device automatically.
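
A hedged sketch of that approach, assuming configuration 1 is the
vendor-mode configuration (the USB core helper
usb_driver_set_configuration() schedules the switch from a safe context):

  #include <linux/errno.h>
  #include <linux/usb.h>

  #define EXAMPLE_VENDOR_CONFIG   1       /* assumed vendor-mode config number */

  /* If the device is currently in the ECM configuration, ask the USB core
   * to switch it to the vendor-mode configuration (which resets the
   * device) and abort this probe; the driver is probed again once the
   * device comes back in vendor mode.
   */
  static int example_force_vendor_mode(struct usb_interface *intf)
  {
      struct usb_device *udev = interface_to_usbdev(intf);

      if (udev->actconfig->desc.bConfigurationValue != EXAMPLE_VENDOR_CONFIG) {
          usb_driver_set_configuration(udev, EXAMPLE_VENDOR_CONFIG);
          return -ENODEV;
      }

      return 0;
  }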
      
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
      net/mlx4: Support shutdown() interface · 367d56f7
      Gavin Shan authored
      
      
In a kexec scenario, loading the mlx4 driver in the second kernel
fails because the ownership bit was held by the first kernel and
never released correctly.

The patch adds a shutdown() interface so that the ownership can
be released correctly in the first kernel. It also helps avoid EEH
errors during the boot stage of the second kernel caused by undesired
traffic, which can't be handled by hardware during that stage on the
Power platform.
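
A minimal sketch of wiring up a PCI shutdown() handler; the teardown helper
is a hypothetical stand-in for releasing the ownership bit:

  #include <linux/pci.h>

  /* Hypothetical teardown that hands the device ownership back so a
   * kexec'ed kernel can claim it cleanly.
   */
  static void example_release_ownership(struct pci_dev *pdev)
  {
  }

  static void example_shutdown(struct pci_dev *pdev)
  {
      example_release_ownership(pdev);
  }

  /* probe/remove/id_table omitted for brevity */
  static struct pci_driver example_pci_driver = {
      .name     = "example",
      .shutdown = example_shutdown,
  };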
      
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Tested-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
      bridge: multicast: add sanity check for query source addresses · 6565b9ee
      Linus Lüssing authored
      
      
      MLD queries are supposed to have an IPv6 link-local source address
      according to RFC2710, section 4 and RFC3810, section 5.1.14. This patch
      adds a sanity check to ignore such broken MLD queries.
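
A minimal sketch of such a source-address check, simplified from what the
bridge multicast code does when parsing an MLD query:

  #include <net/ipv6.h>

  /* RFC 2710, section 4 and RFC 3810, section 5.1.14 require MLD query
   * source addresses to be IPv6 link-local; anything else is ignored.
   */
  static bool example_mld_query_source_ok(const struct in6_addr *saddr)
  {
      return ipv6_addr_type(saddr) & IPV6_ADDR_LINKLOCAL;
  }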
      
Without this check, such malformed MLD queries can result in a
denial of service: the queries are ignored by any MLD listener,
which therefore will not respond with an MLD report. However,
without this patch these malformed MLD queries would still enable
the snooping part of the bridge code, potentially shutting down the
corresponding ports towards these hosts for multicast traffic, as
the bridge did not learn about these listeners.
      
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
      net: fix for a race condition in the inet frag code · 24b9bf43
      Nikolay Aleksandrov authored
I stumbled upon this very serious bug while hunting for another one.
It's a very subtle race condition between inet_frag_evictor,
inet_frag_intern and the IPv4/6 frag_queue and expire functions
(basically the users of inet_frag_kill/inet_frag_put).
      
What happens is that after a fragment has been added to the hash chain,
but before it's been added to the lru_list (inet_frag_lru_add) in
inet_frag_intern, it may get deleted (either by an expired timer if
the system load is high or the timeout is sufficiently low, or by the
frag_queue function for various reasons) before it's added to the
lru_list. After it then gets added, it's only a matter of time before
the evictor reaches a piece of memory that has already been freed,
leading to a number of different bugs depending on what's left there.
      
I've been able to trigger this on both IPv4 and IPv6 (which is normal
as the frag code is the same), but it's been much more difficult to
trigger on IPv4 due to protocol differences in how fragments
are treated.
      
      The setup I used to reproduce this is: 2 machines with 4 x 10G bonded
      in a RR bond, so the same flow can be seen on multiple cards at the
      same time. Then I used multiple instances of ping/ping6 to generate
      fragmented packets and flood the machines with them while running
      other processes to load the attacked machine.
      
It is very important to have the _same flow_ coming in on multiple CPUs
concurrently. Usually the attacked machine would die in less than 30
minutes; if configured properly to have many evictor calls and timeouts
it could happen in 10 minutes or so.
      
      An important point to make is that any caller (frag_queue or timer) of
      inet_frag_kill will remove both the timer refcount and the
      original/guarding refcount thus removing everything that's keeping the
      frag from being freed at the next inet_frag_put.  All of this could
      happen before the frag was ever added to the LRU list, then it gets
      added and the evictor uses a freed fragment.
      
An example for IPv6: a fragment is being added and has just been
inserted in the hash, after the hash lock is released but before
inet_frag_lru_add executes (or is able to obtain the lru lock).
Another overlapping fragment for the same flow arrives at a different
CPU and finds it in the hash; since it's overlapping, it drops it,
invoking inet_frag_kill and thus removing all guarding refcounts, and
afterwards frees it by invoking inet_frag_put, which removes the last
refcount added previously by inet_frag_find. Then inet_frag_lru_add
gets executed by inet_frag_intern and we have a freed fragment in the
lru_list.
      
The fix is simple: just move the lru_add under the hash chain locked
region, so that when a removing function is called it has to wait for
the fragment to be added to the lru_list, and only then removes it
(this works because the hash chain removal is done before the lru_list
one, and there's no window between the two list adds during which the
frag can get dropped). With this fix applied I couldn't kill the same
machine in 24 hours with the same setup.
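
A hedged sketch of that ordering change (simplified; the bucket and queue
field names follow the inet_frag code of that era, so treat them as
illustrative):

  #include <linux/list.h>
  #include <linux/spinlock.h>
  #include <net/inet_frag.h>

  /* Insert the queue into the hash chain and the LRU list inside the
   * same locked region, so inet_frag_kill()/inet_frag_put() callers
   * cannot free the queue in a window between the two list adds.
   */
  static void example_frag_intern(struct inet_frag_bucket *hb,
                                  struct inet_frag_queue *q)
  {
      spin_lock(&hb->chain_lock);
      hlist_add_head(&q->list, &hb->chain);
      inet_frag_lru_add(q->net, q);   /* now under the hash chain lock */
      spin_unlock(&hb->chain_lock);
  }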
      
Fixes: 3ef0eb0d ("net: frag, move LRU list maintenance outside of rwlock")
      
      CC: Florian Westphal <fw@strlen.de>
      CC: Jesper Dangaard Brouer <brouer@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
  7. Mar 05, 2014