  1. May 26, 2017
    • bpf: fix incorrect pruning decision when alignment must be tracked · 1ad2f583
      Daniel Borkmann authored
      Currently, when we enforce alignment tracking on direct packet access,
      the verifier lets the following program pass despite doing a packet
      write with unaligned access:
      
        0: (61) r2 = *(u32 *)(r1 +76)
        1: (61) r3 = *(u32 *)(r1 +80)
        2: (61) r7 = *(u32 *)(r1 +8)
        3: (bf) r0 = r2
        4: (07) r0 += 14
        5: (25) if r7 > 0x1 goto pc+4
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
        6: (2d) if r0 > r3 goto pc+1
         R0=pkt(id=0,off=14,r=14) R1=ctx R2=pkt(id=0,off=0,r=14)
         R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
        7: (63) *(u32 *)(r0 -4) = r0
        8: (b7) r0 = 0
        9: (95) exit
      
        from 6 to 8:
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
        8: (b7) r0 = 0
        9: (95) exit
      
        from 5 to 10:
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=2 R10=fp
        10: (07) r0 += 1
        11: (05) goto pc-6
        6: safe                           <----- here, wrongly found safe
        processed 15 insns
      
      However, if we enforce a pruning mismatch by adding state into r8
      which is then being mismatched in states_equal(), we find that for
      the otherwise same program, the verifier detects a misaligned packet
      access when actually walking that path:
      
        0: (61) r2 = *(u32 *)(r1 +76)
        1: (61) r3 = *(u32 *)(r1 +80)
        2: (61) r7 = *(u32 *)(r1 +8)
        3: (b7) r8 = 1
        4: (bf) r0 = r2
        5: (07) r0 += 14
        6: (25) if r7 > 0x1 goto pc+4
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=0,max_value=1
         R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
        7: (2d) if r0 > r3 goto pc+1
         R0=pkt(id=0,off=14,r=14) R1=ctx R2=pkt(id=0,off=0,r=14)
         R3=pkt_end R7=inv,min_value=0,max_value=1
         R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
        8: (63) *(u32 *)(r0 -4) = r0
        9: (b7) r0 = 0
        10: (95) exit
      
        from 7 to 9:
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=0,max_value=1
         R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
        9: (b7) r0 = 0
        10: (95) exit
      
        from 6 to 11:
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=2
         R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
        11: (07) r0 += 1
        12: (b7) r8 = 0
        13: (05) goto pc-7                <----- mismatch due to r8
        7: (2d) if r0 > r3 goto pc+1
         R0=pkt(id=0,off=15,r=15) R1=ctx R2=pkt(id=0,off=0,r=15)
         R3=pkt_end R7=inv,min_value=2
         R8=imm0,min_value=0,max_value=0,min_align=2147483648 R10=fp
        8: (63) *(u32 *)(r0 -4) = r0
        misaligned packet access off 2+15+-4 size 4
      
      The reason why we fail to see it in states_equal() is that the
      third test in compare_ptrs_to_packet() ...
      
        if (old->off <= cur->off &&
            old->off >= old->range && cur->off >= cur->range)
                return true;
      
      ... will let the above pass. The situation we run into is that
      old->off <= cur->off (14 <= 15), meaning that prior walked paths
      went with smaller offset, which was later used in the packet
      access after successful packet range check and found to be safe
      already.
      
      For example: Given is R0=pkt(id=0,off=0,r=0). Adding offset 14
      as in above program to it, results in R0=pkt(id=0,off=14,r=0)
      before the packet range test. Now, testing this against R3=pkt_end
      with 'if r0 > r3 goto out' will transform R0 into R0=pkt(id=0,off=14,r=14)
      for the case when we're within bounds. A write into the packet
      at offset *(u32 *)(r0 -4), that is, 2 + 14 -4, is valid and
      aligned (2 is for NET_IP_ALIGN). After processing this with
      all fall-through paths, we later on check paths from branches.
      When the above skb->mark test is true, then we jump near the
      end of the program, perform r0 += 1, and jump back to the
      'if r0 > r3 goto out' test we've visited earlier already. This
      time, R0 is of type R0=pkt(id=0,off=15,r=0), and we'll prune
      that part because this time we'll have a larger safe packet
      range, and we already found that with off=14 all further insn
      were already safe, so it's safe as well with a larger off.
      However, the problem is that the subsequent write into the packet
      with 2 + 15 -4 is then unaligned, and not caught by the alignment
      tracking. Note that min_align, aux_off, and aux_off_align were
      all 0 in this example.
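
      The arithmetic in this example can be checked with a small standalone
      sketch. It is not verifier code: pkt_access_aligned() and
      third_prune_test() are hypothetical helpers, and NET_IP_ALIGN is assumed
      to be 2 as stated above. The sketch shows that the quoted third test
      accepts old off=14 against current off=15, even though only the off=14
      write is 4-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: NET_IP_ALIGN is 2, as in the commit message. */
#define NET_IP_ALIGN_ASSUMED 2

/* Hypothetical helper: is an access of `size` bytes at packet register
 * offset `reg_off` plus instruction offset `insn_off` naturally aligned? */
static bool pkt_access_aligned(int reg_off, int insn_off, int size)
{
	assert(size > 0);
	return (NET_IP_ALIGN_ASSUMED + reg_off + insn_off) % size == 0;
}

/* The third test quoted above, applied to plain integers instead of
 * verifier register state. */
static bool third_prune_test(int old_off, int old_range,
			     int cur_off, int cur_range)
{
	return old_off <= cur_off &&
	       old_off >= old_range && cur_off >= cur_range;
}
```

      With these helpers, third_prune_test(14, 0, 15, 0) is true, so the path
      is pruned, while pkt_access_aligned(15, -4, 4) is false, which is the
      misaligned write that goes unnoticed.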
      
      Since we cannot tell at this time what kind of packet access was
      performed in the prior walk and what minimal requirements it has
      (we might do so in the future, but that requires more complexity),
      fix it by disabling this pruning case for strict alignment for now,
      and let the verifier check such paths instead. With that applied,
      the test cases pass and reject the program due to misalignment.
      
      Fixes: d1174416 ("bpf: Track alignment of register values in the verifier.")
      Reference: http://patchwork.ozlabs.org/patch/761909/
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • arp: fixed -Wuninitialized compiler warning · 5990baaa
      Ihar Hrachyshka authored
      Commit 7d472a59 ("arp: always override
      existing neigh entries with gratuitous ARP") introduced a compiler
      warning:
      
      net/ipv4/arp.c:880:35: warning: 'addr_type' may be used uninitialized in
      this function [-Wmaybe-uninitialized]
      
      While the code logic seems to be correct and doesn't allow the variable
      to be used uninitialized, and the warning is not consistently
      reproducible, it's still worth fixing so that other people don't waste
      time looking at the warning in case it pops up in their build environment.
      Yes, the compiler is probably at fault, but we need to accommodate it.
      
      Fixes: 7d472a59 ("arp: always override existing neigh entries with gratuitous ARP")
      Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: avoid fastopen API to be used on AF_UNSPEC · ba615f67
      Wei Wang authored
      The fastopen API should be used to perform fastopen operations on the TCP
      socket. It does not make sense to use the fastopen API to perform a disconnect
      by calling it with AF_UNSPEC. The fastopen data path is also prone to
      race conditions and bugs when used with AF_UNSPEC.
      
      One issue reported and analyzed by Vegard Nossum is as follows:
      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      Thread A:                            Thread B:
      ------------------------------------------------------------------------
      sendto()
       - tcp_sendmsg()
           - sk_stream_memory_free() = 0
               - goto wait_for_sndbuf
      	     - sk_stream_wait_memory()
      	        - sk_wait_event() // sleep
                |                          sendto(flags=MSG_FASTOPEN, dest_addr=AF_UNSPEC)
      	  |                           - tcp_sendmsg()
      	  |                              - tcp_sendmsg_fastopen()
      	  |                                 - __inet_stream_connect()
      	  |                                    - tcp_disconnect() //because of AF_UNSPEC
      	  |                                       - tcp_transmit_skb()// send RST
      	  |                                    - return 0; // no reconnect!
      	  |                           - sk_stream_wait_connect()
      	  |                                 - sock_error()
      	  |                                    - xchg(&sk->sk_err, 0)
      	  |                                    - return -ECONNRESET
      	- ... // wake up, see sk->sk_err == 0
          - skb_entail() on TCP_CLOSE socket
      
      If the connection is reopened then we will send a brand new SYN packet
      after thread A has already queued a buffer. At this point I think the
      socket internal state (sequence numbers etc.) becomes messed up.
      
      When the new connection is closed, the FIN-ACK is rejected because the
      sequence number is outside the window. The other side tries to
      retransmit, but __tcp_retransmit_skb() calls tcp_trim_head() on an
      empty skb which corrupts the skb data length and hits a BUG() in
      copy_and_csum_bits().
      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      
      Hence, this patch adds a check for AF_UNSPEC in the fastopen data path
      and returns EOPNOTSUPP to the user in that case.
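
      A minimal userspace sketch of such a check follows. It is not the
      kernel patch itself: fastopen_check_family() is a hypothetical helper,
      and treating a missing or too-short address as "nothing to reject" is
      an assumption made for the illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the check added to the fastopen data path:
 * refuse AF_UNSPEC so the call cannot be used as a disconnect. */
static int fastopen_check_family(const struct sockaddr *uaddr,
				 socklen_t addr_len)
{
	/* Assumption: without a usable address there is nothing to reject. */
	if (uaddr == NULL || addr_len < sizeof(uaddr->sa_family))
		return 0;
	if (uaddr->sa_family == AF_UNSPEC)
		return -EOPNOTSUPP;
	return 0;
}
```

      In this sketch, an address with sa_family set to AF_UNSPEC is refused
      before any disconnect can happen, and an AF_INET address passes.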
      
      Fixes: cf60af03 ("tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
      Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: move somaxconn init from sysctl code · 7c3f1875
      Roman Kapl authored
      The default value for somaxconn is set in sysctl_core_net_init(), but this
      function is not called when the kernel is configured without CONFIG_SYSCTL.
      
      This results in the kernel not being able to accept TCP connections,
      because the backlog has zero size. Usually, the user ends up with:
      "TCP: request_sock_TCP: Possible SYN flooding on port 7. Dropping request.  Check SNMP counters."
      If SYN cookies are not enabled the connection is rejected.
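
      The effect can be modelled in a few lines. This is a sketch, assuming
      that the requested listen() backlog is bounded by the somaxconn limit;
      effective_backlog() is a hypothetical name and the numbers in the note
      below are example values only.

```c
/* Hypothetical model: the backlog requested by the application is capped
 * by the somaxconn limit. */
static unsigned int effective_backlog(unsigned int requested,
				      unsigned int somaxconn)
{
	return requested > somaxconn ? somaxconn : requested;
}
```

      With an uninitialized limit of 0, effective_backlog(128, 0) yields 0,
      so every incoming connection request finds the queue full.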
      
      Before ef547f2a (tcp: remove max_qlen_log), the effects were less
      severe, because the backlog was always at least eight slots long.
      
      Signed-off-by: Roman Kapl <roman.kapl@sysgo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fix potential null pointer dereference · 65d786c2
      Gustavo A. R. Silva authored
      
      
      Add null check to avoid a potential null pointer dereference.
      
      Addresses-Coverity-ID: 1408831
      Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
      Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • geneve: fix fill_info when using collect_metadata · 11387fe4
      Eric Garver authored
      Since 9b4437a5 ("geneve: Unify LWT and netdev handling.") fill_info
      does not return UDP_ZERO_CSUM6_RX when using COLLECT_METADATA. This is
      because it uses ip_tunnel_info_af() with the device level info, which is
      not valid for COLLECT_METADATA.
      
      Fix by checking for the presence of the actual sockets.
      
      Fixes: 9b4437a5 ("geneve: Unify LWT and netdev handling.")
      Signed-off-by: Eric Garver <e@erig.me>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. May 25, 2017
    • Merge branch 'q-in-q-checksums' · e6a88e4b
      David S. Miller authored
      
      
      Daniel Borkmann says:
      
      ====================
      BPF pruning follow-up
      
      Follow-up to fix incorrect pruning when alignment tracking is
      in use and to properly clear regs after call to not leave stale
      data behind. For details, please see individual patches.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: enable TSO/checksum offloads for Q-in-Q vlans · 2836b4f2
      Vlad Yasevich authored
      
      
      Since virtio does not provide its own ndo_features_check handler,
      TSO, and now checksum offload, are disabled for stacked vlans.
      Re-enable the support and let the host take care of it.  This
      restores/improves Guest-to-Guest performance over Q-in-Q vlans.
      
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • be2net: Fix offload features for Q-in-Q packets · cc6e9de6
      Vlad Yasevich authored
      
      
      At least some of the be2net cards do not seem to be capable
      of performing checksum offload computations on Q-in-Q packets.
      In these cases, the received checksum on the remote is invalid
      and TCP SYN packets are dropped.

      This patch adds a call to check disabled acceleration features
      on Q-in-Q tagged traffic.
      
      CC: Sathya Perla <sathya.perla@broadcom.com>
      CC: Ajit Khaparde <ajit.khaparde@broadcom.com>
      CC: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      CC: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vlan: Fix tcp checksum offloads in Q-in-Q vlans · 35d2f80b
      Vlad Yasevich authored
      It appears that TCP checksum offloading has been broken for
      Q-in-Q vlans.  The behavior was exacerbated by the series
      commit afb0bc97 ("Merge branch 'stacked_vlan_tso'")
      that enabled acceleration features on stacked vlans.

      However, even without that series, it is possible to trigger
      this issue.  It just requires a lot more specialized configuration.
      
      The root cause is the interaction between how
      netdev_intersect_features() works, the features actually set on
      the vlan devices and HW having the ability to run checksum with
      longer headers.
      
      The issue starts when netdev_intersect_features() replaces
      NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM,
      if the HW advertises IP|IPV6 specific checksums.  This happens
      for tagged and multi-tagged packets.   However, HW that enables
      IP|IPV6 checksum offloading doesn't guarantee that packets with
      arbitrarily long headers can be checksummed.
      
      This patch disables IP|IPV6 checksums on the packet for multi-tagged
      packets.
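
      The idea can be sketched with a simple bitmask filter. The flag values
      below are hypothetical and do not match the kernel's NETIF_F_*
      definitions, and features_for_tags() is an illustrative helper rather
      than the function touched by the patch.

```c
#include <stdint.h>

/* Illustrative feature bits; the values are hypothetical. */
#define F_IP_CSUM   (1u << 0)
#define F_IPV6_CSUM (1u << 1)
#define F_HW_CSUM   (1u << 2)
#define F_SG        (1u << 3)

/* Drop protocol-specific checksum offload when a packet carries more than
 * one VLAN tag; generic checksumming (F_HW_CSUM) is left untouched. */
static uint32_t features_for_tags(uint32_t features, unsigned int vlan_tags)
{
	if (vlan_tags > 1)
		features &= ~(uint32_t)(F_IP_CSUM | F_IPV6_CSUM);
	return features;
}
```

      A singly tagged packet keeps its feature set, while a doubly tagged one
      loses only the IP and IPv6 specific checksum bits.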
      
      CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      CC: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: marvell: Limit errata to 88m1101 · f2899788
      Andrew Lunn authored
      The 88m1101 has an erratum when configuring autoneg. However, the
      workaround was being applied to many other Marvell PHYs as well. Limit
      its scope to just the 88m1101.
      
      Fixes: 76884679 ("phylib: Add support for Marvell 88e1111S and 88e1145")
      Reported-by: Daniel Walker <danielwa@cisco.com>
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Acked-by: Harini Katakam <harinik@xilinx.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/phy: fix mdio-octeon dependency and build · cd47512e
      Randy Dunlap authored
      
      
      Fix build errors by making this driver depend on OF_MDIO, like
      several other similar drivers do.
      
      drivers/built-in.o: In function `octeon_mdiobus_remove':
      mdio-octeon.c:(.text+0x196ee0): undefined reference to `mdiobus_unregister'
      mdio-octeon.c:(.text+0x196ee8): undefined reference to `mdiobus_free'
      drivers/built-in.o: In function `octeon_mdiobus_probe':
      mdio-octeon.c:(.text+0x196f1d): undefined reference to `devm_mdiobus_alloc_size'
      mdio-octeon.c:(.text+0x196ffe): undefined reference to `of_mdiobus_register'
      mdio-octeon.c:(.text+0x197010): undefined reference to `mdiobus_free'
      
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-fixes-2017-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 3f6b123b
      David S. Miller authored
      
      
      Saeed Mahameed says:
      
      ====================
      mlx5-fixes-2017-05-23
      
      Some TC offloads fixes from Or Gerlitz.
      From Erez, mlx5 IPoIB RX fix to improve GRO.
      From Mohamad, Command interface fix to improve mitigation against FW
      commands timeouts.
      From Tariq, Driver load Tolerance against affinity settings failures.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mac80211-for-davem-2017-05-23' of... · 029c5817
      David S. Miller authored
      
      Merge tag 'mac80211-for-davem-2017-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      Just two fixes this time:
       * fix the scheduled scan "BUG: scheduling while atomic"
       * check mesh address extension flags more strictly
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: bail out from rtnl_fdb_dump() on parse error · 0ff50e83
      Alexander Potapenko authored
      
      
      rtnl_fdb_dump() failed to check the result of nlmsg_parse(), which led
      to the contents of |ifm| being uninitialized because nlh->nlmsg_len was too
      small to accommodate |ifm|. The uninitialized data may affect some
      branches and result in unwanted effects, although kernel data doesn't
      seem to leak to the userspace directly.
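
      The shape of the fix can be sketched as follows. This is a simplified
      model, not the kernel code: parse_ifm() and fdb_dump() are hypothetical
      helpers, and the two 16-byte sizes are assumptions stated for the
      illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Assumed sizes: a 16-byte netlink header followed by a 16-byte
 * struct ifinfomsg. */
#define HDR_LEN 16u
#define IFM_LEN 16u

/* Hypothetical parser: returns 0 if the message is long enough to hold
 * the ifinfomsg payload, -EINVAL otherwise. */
static int parse_ifm(size_t msg_len)
{
	if (msg_len < HDR_LEN + IFM_LEN)
		return -EINVAL;
	return 0;
}

/* Hypothetical dump handler that bails out on a parse error instead of
 * reading a structure that was never filled in. */
static int fdb_dump(size_t msg_len)
{
	int err = parse_ifm(msg_len);

	if (err < 0)
		return err;	/* the check that was missing */
	return 0;		/* parsed fields are safe to use from here on */
}
```

      A 17-byte message, like the one sent by the reproducer in this commit,
      is rejected by fdb_dump() with -EINVAL.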
      
      The bug has been detected with KMSAN and syzkaller.
      
      For the record, here is the KMSAN report:
      
      ==================================================================
      BUG: KMSAN: use of unitialized memory in rtnl_fdb_dump+0x5dc/0x1000
      CPU: 0 PID: 1039 Comm: probe Not tainted 4.11.0-rc5+ #2727
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16
       dump_stack+0x143/0x1b0 lib/dump_stack.c:52
       kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:1007
       __kmsan_warning_32+0x66/0xb0 mm/kmsan/kmsan_instr.c:491
       rtnl_fdb_dump+0x5dc/0x1000 net/core/rtnetlink.c:3230
       netlink_dump+0x84f/0x1190 net/netlink/af_netlink.c:2168
       __netlink_dump_start+0xc97/0xe50 net/netlink/af_netlink.c:2258
       netlink_dump_start ./include/linux/netlink.h:165
       rtnetlink_rcv_msg+0xae9/0xb40 net/core/rtnetlink.c:4094
       netlink_rcv_skb+0x339/0x5a0 net/netlink/af_netlink.c:2339
       rtnetlink_rcv+0x83/0xa0 net/core/rtnetlink.c:4110
       netlink_unicast_kernel net/netlink/af_netlink.c:1272
       netlink_unicast+0x13b7/0x1480 net/netlink/af_netlink.c:1298
       netlink_sendmsg+0x10b8/0x10f0 net/netlink/af_netlink.c:1844
       sock_sendmsg_nosec net/socket.c:633
       sock_sendmsg net/socket.c:643
       ___sys_sendmsg+0xd4b/0x10f0 net/socket.c:1997
       __sys_sendmsg net/socket.c:2031
       SYSC_sendmsg+0x2c6/0x3f0 net/socket.c:2042
       SyS_sendmsg+0x87/0xb0 net/socket.c:2038
       do_syscall_64+0x102/0x150 arch/x86/entry/common.c:285
       entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
      RIP: 0033:0x401300
      RSP: 002b:00007ffc3b0e6d58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000401300
      RDX: 0000000000000000 RSI: 00007ffc3b0e6d80 RDI: 0000000000000003
      RBP: 00007ffc3b0e6e00 R08: 000000000000000b R09: 0000000000000004
      R10: 000000000000000d R11: 0000000000000246 R12: 0000000000000000
      R13: 00000000004065a0 R14: 0000000000406630 R15: 0000000000000000
      origin: 000000008fe00056
       save_stack_trace+0x59/0x60 arch/x86/kernel/stacktrace.c:59
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:352
       kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:247
       kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:260
       slab_alloc_node mm/slub.c:2743
       __kmalloc_node_track_caller+0x1f4/0x390 mm/slub.c:4349
       __kmalloc_reserve net/core/skbuff.c:138
       __alloc_skb+0x2cd/0x740 net/core/skbuff.c:231
       alloc_skb ./include/linux/skbuff.h:933
       netlink_alloc_large_skb net/netlink/af_netlink.c:1144
       netlink_sendmsg+0x934/0x10f0 net/netlink/af_netlink.c:1819
       sock_sendmsg_nosec net/socket.c:633
       sock_sendmsg net/socket.c:643
       ___sys_sendmsg+0xd4b/0x10f0 net/socket.c:1997
       __sys_sendmsg net/socket.c:2031
       SYSC_sendmsg+0x2c6/0x3f0 net/socket.c:2042
       SyS_sendmsg+0x87/0xb0 net/socket.c:2038
       do_syscall_64+0x102/0x150 arch/x86/entry/common.c:285
       return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
      ==================================================================
      
      and the reproducer:
      
      ==================================================================
        #include <sys/socket.h>
        #include <net/if_arp.h>
        #include <linux/netlink.h>
        #include <stdint.h>
        #include <string.h>   /* memset() */

        int main(void)
        {
          int sock = socket(PF_NETLINK, SOCK_DGRAM | SOCK_NONBLOCK, 0);
          struct msghdr msg;
          memset(&msg, 0, sizeof(msg));
          char nlmsg_buf[32];
          memset(nlmsg_buf, 0, sizeof(nlmsg_buf));
          struct nlmsghdr *nlmsg = (struct nlmsghdr *)nlmsg_buf;
          nlmsg->nlmsg_len = 0x11;    // header plus a single payload byte
          nlmsg->nlmsg_type = 0x1e; // RTM_GETNEIGH = RTM_BASE + 0x0e
          // type = 0x0e = 1110b
          // kind = 2
          nlmsg->nlmsg_flags = 0x101; // NLM_F_ROOT | NLM_F_REQUEST
          nlmsg->nlmsg_seq = 0;
          nlmsg->nlmsg_pid = 0;
          nlmsg_buf[16] = (char)7;    // family byte: AF_BRIDGE
          struct iovec iov;
          iov.iov_base = nlmsg_buf;
          iov.iov_len = 17;
          msg.msg_iov = &iov;
          msg.msg_iovlen = 1;
          sendmsg(sock, &msg, 0);
          return 0;
        }
      ==================================================================
      
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fec: add post PHY reset delay DT property · 159a0760
      Quentin Schulz authored
      
      
      Some PHYs require a short wait after the reset GPIO has been
      toggled. This adds support for the DT property `phy-reset-post-delay`
      which gives the delay in milliseconds to wait after reset.

      If the DT property is not given, no delay is observed. Post-reset delays
      greater than 1000ms are invalid.
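
      These rules can be written down as a small validation helper. This is a
      sketch under stated assumptions: post_reset_delay_ms() is a hypothetical
      function, and returning -EINVAL for an out-of-range value is a choice
      made for the illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: resolve the delay to apply after PHY reset.
 * `present` says whether the property was given, `value_ms` is its value. */
static int post_reset_delay_ms(bool present, unsigned int value_ms,
			       unsigned int *out_ms)
{
	if (!present) {
		*out_ms = 0;	/* property absent: no delay */
		return 0;
	}
	if (value_ms > 1000)
		return -EINVAL;	/* delays above 1000 ms are invalid */
	*out_ms = value_ms;
	return 0;
}
```

      A missing property yields a delay of 0, a value of 1000 is accepted
      as is, and 1001 is rejected.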
      
      Signed-off-by: Quentin Schulz <quentin.schulz@free-electrons.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Acked-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sctp-dupcookie-fixes' · 11d3c949
      David S. Miller authored
      
      
      Xin Long says:
      
      ====================
      sctp: a bunch of fixes for processing dupcookie
      
      After introducing the transport hashtable and per-stream info into sctp,
      some regressions appeared in dupcookie processing; this patchset
      fixes them.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: set new_asoc temp when processing dupcookie · 7e062977
      Xin Long authored
      
      
      After sctp changed to use the transport hashtable, a transport is
      added to the global hashtable when adding the peer to an asoc, and
      the asoc can then be found by looking up the transport in the hashtable.

      The problem is that when processing dupcookie in sctp_sf_do_5_2_4_dupcook,
      a new asoc is created. A peer with the same addr and port as
      the one in the old asoc might be added to the new asoc, but fail
      to be added to the hashtable, as they also belong to the same sk.

      As a result, sctp's dupcookie processing cannot really work.

      Since the new asoc will be freed after copying its information to
      the old asoc, it's more like a temp asoc. So this patch fixes
      it by setting it as a temp asoc, which avoids adding any of its
      transports to the hashtable and also avoids allocating an assoc_id.

      An extra thing it has to do is to also alloc stream info for any
      temp asoc, as sctp dupcookie processing needs it to update the old asoc.
      But I don't think it would hurt anything, as a temp asoc is
      always freed after the cookie echo packet has been processed.
      
      Reported-by: Jianwen Ji <jiji@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: fix stream update when processing dupcookie · 3ab21379
      Xin Long authored
      Since commit 3dbcc105 ("sctp: alloc stream info when initializing
      asoc"), stream and stream.out info are always alloced when creating
      an asoc.
      
      So it's not correct to check !asoc->stream before updating stream
      info when processing dupcookie, but would be better to check asoc
      state instead.
      
      Fixes: 3dbcc105 ("sctp: alloc stream info when initializing asoc")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. May 23, 2017
    • mlx5: fix bug reading rss_hash_type from CQE · 12e8b570
      Jesper Dangaard Brouer authored
      
      
      The masks for extracting parts of the Completion Queue Entry (CQE)
      field rss_hash_type were swapped, namely CQE_RSS_HTYPE_IP and
      CQE_RSS_HTYPE_L4.

      The bug resulted in setting skb->l4_hash even though the
      rss_hash_type indicated that the hash was NOT computed over the
      L4 (UDP or TCP) part of the packet.

      Added comments from the datasheet to make it clearer what
      these masks are selecting.
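
      The consequence of swapped masks can be shown with a short sketch. The
      bit positions used here are assumptions for illustration only (bits
      [3:2] for the IP part, bits [7:6] for the L4 part) and should be checked
      against the datasheet; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, for illustration: bits [3:2] describe the IP part of
 * the hash, bits [7:6] the L4 part. */
#define HTYPE_IP_MASK (0x3u << 2)
#define HTYPE_L4_MASK (0x3u << 6)

/* True only when the hash covered the L4 (TCP or UDP) header. */
static bool hash_is_l4(uint8_t rss_hash_type)
{
	return (rss_hash_type & HTYPE_L4_MASK) != 0;
}

/* The same test with the masks swapped, reproducing the bug: an IP-only
 * hash is reported as an L4 hash. */
static bool hash_is_l4_swapped(uint8_t rss_hash_type)
{
	return (rss_hash_type & HTYPE_IP_MASK) != 0;
}
```

      For a hash type with only an IP bit set, hash_is_l4() is false while
      hash_is_l4_swapped() is true, which corresponds to the wrongly set
      skb->l4_hash.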
      
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cdc-ether: divorce initialisation with a filter reset and a generic method · 7f65b1f5
      Oliver Neukum authored
      
      
      Some devices need their multicast filter reset but others are crashed by that.
      So the methods need to be separated.
      
      Signed-off-by: Oliver Neukum <oneukum@suse.com>
      Reported-by: "Ridgway, Keith" <kridgway@harris.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 2f9bfd33
      David S. Miller authored
      
      
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2017-05-23
      
      1) Fix wrong header offset for esp4 udpencap packets.
      
      2) Fix a stack access out of bounds when creating a bundle
         with sub policies. From Sabrina Dubroca.
      
      3) Fix slab-out-of-bounds in pfkey due to an incorrect
         sadb_x_sec_len calculation.
      
      4) We checked the wrong feature flags when taking down
         an interface with IPsec offload enabled.
         Fix from Ilan Tayari.
      
      5) Copy the anti replay sequence numbers when doing a state
         migration, otherwise we get out of sync with the sequence
         numbers. Fix from Antony Antony.
      
      Please pull or let me know if there are problems.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: Tolerate irq_set_affinity_hint() failures · b665d98e
      Tariq Toukan authored
      Add tolerance to failures of irq_set_affinity_hint().
      Its role is to give hints that optimize performance,
      and it should not block the driver load.

      In non-SMP systems, the functionality is not available as
      there is a single core, and all these calls definitely
      fail.  Hence, do not call the function and avoid the
      warning prints.
      
      Fixes: db058a18 ("net/mlx5_core: Set irq affinity hints")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Cc: kernel-team@fb.com
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Avoid using pending command interface slots · 73dd3a48
      Mohamad Haj Yahia authored
      Currently, when a firmware command gets stuck or takes a long time to
      complete, the driver command times out and the command slot is
      freed and can be used for new commands. If the firmware then receives a
      new command on the old busy slot, its behavior is unexpected and this
      could be harmful.
      To fix this, when the driver command times out we return failure,
      but we don't free the command slot, and we wait for the firmware to
      explicitly respond to that command.
      Once all the entries are busy, we stop processing new firmware
      commands.
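
      The slot lifetime described here can be modelled with a bitmap. This is
      a sketch, not the mlx5 command interface: the slot count, the struct
      and all function names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define NUM_SLOTS 8	/* hypothetical number of command slots */

struct cmd_slots {
	uint32_t busy;	/* bit i set: slot i is still owned by firmware */
};

/* Take the lowest free slot, or return -1 when every slot is busy. */
static int slot_alloc(struct cmd_slots *s)
{
	for (int i = 0; i < NUM_SLOTS; i++) {
		if (!(s->busy & (1u << i))) {
			s->busy |= 1u << i;
			return i;
		}
	}
	return -1;
}

/* Driver-side timeout: report failure but keep the slot reserved. */
static int slot_timeout(struct cmd_slots *s, int slot)
{
	(void)s;
	(void)slot;
	return -ETIMEDOUT;
}

/* Firmware completion: only now does the slot become reusable. */
static void slot_complete(struct cmd_slots *s, int slot)
{
	s->busy &= ~(1u << slot);
}
```

      After slot_timeout() the next slot_alloc() skips the timed-out slot; it
      is handed out again only once slot_complete() has run for it.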
      
      Fixes: 9cba4ebc ('net/mlx5: Fix potential deadlock in command mode change')
      Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
      Cc: kernel-team@fb.com
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: IPoIB, handle RX packet correctly · b57fe691
      Erez Shitrit authored
      An IPoIB packet contains a pseudo header area; we need to pull it prior
      to reset_mac_header in order to let GRO work well.
      
      In more detail:
      GRO checks the MAC address of each newly arriving packet by comparing
      its MAC header with that of the previous packet in that session; the
      comparison covers hard_header_len bytes.
      Now, the driver prepares that area in the skb by allocating area from the
      reserved part and resetting the correct mac header to it.
      
      Fixes: 9d6bd752 ("net/mlx5e: IPoIB, RX handler")
      Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      b57fe691
    • Or Gerlitz's avatar
      net/mlx5e: Fix warnings around parsing of TC pedit actions · e3ca4e05
      Or Gerlitz authored
      The sparse tool emits these correct complaints:
      
      drivers/net/ethernet/mellanox/mlx5/core//en_tc.c:1005:25: warning: cast to restricted __be32
      drivers/net/ethernet/mellanox/mlx5/core//en_tc.c:1007:25: warning: cast to restricted __be16
      
      The value is provided from user-space in network order, but there's
      no way for sparse to realize that; avoid the warnings by casting to the
      appropriate type.
      
      Fixes: d79b6df6 ('net/mlx5e: Add parsing of TC pedit actions to HW format')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reported-by: Leon Romanovsky <leonro@mellanox.com>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      e3ca4e05
    • Or Gerlitz's avatar
      net/mlx5e: Properly enforce disallowing of partial field re-write offload · d824bf3f
      Or Gerlitz authored
      Currently we don't support partial header re-writes through TC pedit
      action offloading. However, the code that enforces that wasn't err-ing
      on cases where the first and last bits of the mask are set but there is
      some zero bit between them, such as in the below example, fix that!
      
      tc filter add dev enp1s0 protocol ip parent ffff: prio 10 flower
      	ip_proto udp dst_port 2001 skip_sw
      	action pedit munge ip src set 1.0.0.1 retain 0xff0000ff
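
      To illustrate why checking only the first and last mask bits is not
      enough, here is a minimal user-space sketch. It is not the driver's
      code: both helper names are hypothetical, and how the driver derives
      its mask from the tc "retain" value is not shown here. field_bits is
      assumed to be between 1 and 32.

```c
#include <stdbool.h>
#include <stdint.h>

/* The weaker test: only the lowest and highest bit of the field. */
static bool first_and_last_bits_set(uint32_t mask, unsigned int field_bits)
{
    return (mask & 1u) && (mask & (1u << (field_bits - 1)));
}

/* The stricter test: every bit of the field must be set in the mask. */
static bool mask_covers_whole_field(uint32_t mask, unsigned int field_bits)
{
    uint32_t full = field_bits >= 32 ? UINT32_MAX : ((1u << field_bits) - 1);

    return (mask & full) == full;
}
```

      For the mask 0xff0000ff over a 32-bit field, the weaker test passes
      while the stricter one fails, which is the case the commit message
      describes.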
      
      Fixes: d79b6df6 ('net/mlx5e: Add parsing of TC pedit actions to HW format')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      d824bf3f
    • Or Gerlitz's avatar
      net/mlx5e: Allow TC csum offload if applied together with pedit action · 26c02749
      Or Gerlitz authored
      
      
      When offloading header re-writes, the HW re-calculates the relevant L3/L4
      checksums. Hence, when upper layers (as done by OVS) ask for TC checksum action
      offload together with pedit offload, don't err. This command now works:
      
      tc filter add dev ens1f0 protocol ip parent ffff: prio 20 flower skip_sw
      	ip_proto tcp dst_port 9001
      	action pedit ex munge tcp dport set 0x1234 pipe
              action csum tcp
      
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reported-by: Paul Blakey <paulb@mellanox.com>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      26c02749
    • Or Gerlitz's avatar
      net/sched: act_csum: Add accessors for offloading drivers · 3aa42664
      Or Gerlitz authored
      
      
      Add the accessors for realizing if this is a csum action,
      and for which fields checksum is needed.
      
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      3aa42664
    • Or Gerlitz's avatar
      net/mlx5e: Use the correct delete call on offloaded TC encap entry detach · cdc5a7f3
      Or Gerlitz authored
      We wrongly invoke hlist_del_rcu() directly instead of hash_del_rcu(),
      which makes a slightly different call now and may change later; fix that.
      
      Fixes: a54e20b4 ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reported-by: Paul Blakey <paulb@mellanox.com>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      cdc5a7f3
    • Arend Van Spriel's avatar
      cfg80211: make cfg80211_sched_scan_results() work from atomic context · 1b57b621
      Arend Van Spriel authored
      Drivers should be able to call cfg80211_sched_scan_results() from atomic
      context. However, when the multiple scheduled scan feature was introduced,
      this requirement was not taken into account, resulting in the regression
      shown below.
      
      [  119.021594] BUG: scheduling while atomic: irq/47-iwlwifi/517/0x00000200
      [  119.021604] Modules linked in: [...]
      [  119.021759] CPU: 1 PID: 517 Comm: irq/47-iwlwifi Not tainted 4.12.0-rc2-t440s-20170522+ #1
      [  119.021763] Hardware name: LENOVO 20AQS03H00/20AQS03H00, BIOS GJET91WW (2.41 ) 09/21/2016
      [  119.021766] Call Trace:
      [  119.021778]  ? dump_stack+0x5c/0x84
      [  119.021784]  ? __schedule_bug+0x4c/0x70
      [  119.021792]  ? __schedule+0x496/0x5c0
      [  119.021798]  ? schedule+0x2d/0x80
      [  119.021804]  ? schedule_preempt_disabled+0x5/0x10
      [  119.021810]  ? __mutex_lock.isra.0+0x18e/0x4c0
      [  119.021817]  ? __wake_up+0x2f/0x50
      [  119.021833]  ? cfg80211_sched_scan_results+0x19/0x60 [cfg80211]
      [  119.021844]  ? cfg80211_sched_scan_results+0x19/0x60 [cfg80211]
      [  119.021859]  ? iwl_mvm_rx_lmac_scan_iter_complete_notif+0x17/0x30 [iwlmvm]
      [  119.021869]  ? iwl_pcie_rx_handle+0x2a9/0x7e0 [iwlwifi]
      [  119.021878]  ? iwl_pcie_irq_handler+0x17c/0x730 [iwlwifi]
      [  119.021884]  ? irq_forced_thread_fn+0x60/0x60
      [  119.021887]  ? irq_thread_fn+0x16/0x40
      [  119.021892]  ? irq_thread+0x109/0x180
      [  119.021896]  ? wake_threads_waitq+0x30/0x30
      [  119.021901]  ? kthread+0xf2/0x130
      [  119.021905]  ? irq_thread_dtor+0x90/0x90
      [  119.021910]  ? kthread_create_on_node+0x40/0x40
      [  119.021915]  ? ret_from_fork+0x26/0x40
      
      Fixes: b34939b9 ("cfg80211: add request id to cfg80211_sched_scan_*() api")
      Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      1b57b621
    • Linus Torvalds's avatar
      Merge tag 'pstore-v4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · fadd2ce5
      Linus Torvalds authored
      Pull pstore fix from Kees Cook:
       "Marta noticed another misbehavior in EFI pstore, which this fixes.
      
        Hopefully this is the last of the v4.12 fixes for pstore!"
      
      * tag 'pstore-v4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        efi-pstore: Fix write/erase id tracking
      fadd2ce5
    • Linus Torvalds's avatar
      Merge tag 'acpi-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 74a9e7db
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These revert a 4.11 change that turned out to be problematic and add a
        .gitignore file.
      
        Specifics:
      
         - Revert a 4.11 commit related to the ACPI-based handling of laptop
           lids that made changes incompatible with existing user space stacks
           and broke things there (Lv Zheng).
      
         - Add .gitignore to the ACPI tools directory (Prarit Bhargava)"
      
      * tag 'acpi-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        Revert "ACPI / button: Remove lid_init_state=method mode"
        tools/power/acpi: Add .gitignore file
      74a9e7db
    • Linus Torvalds's avatar
      Merge tag 'pm-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 801099be
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These fix RTC wakeup from suspend-to-idle broken recently, fix CPU
        idleness detection condition in the schedutil cpufreq governor, fix a
        cpufreq driver build failure, fix an error code path in the power
        capping framework, clean up the hibernate core and update the
        intel_pstate documentation.
      
        Specifics:
      
         - Fix RTC wakeup from suspend-to-idle broken by the recent rework of
           ACPI wakeup handling (Rafael Wysocki).
      
         - Update intel_pstate driver documentation to reflect the current
           code and explain how it works in more detail (Rafael Wysocki).
      
         - Fix an issue related to CPU idleness detection on systems with
           shared cpufreq policies in the schedutil governor (Juri Lelli).
      
         - Fix a possible build issue in the dbx500 cpufreq driver (Arnd
           Bergmann).
      
         - Fix a function in the power capping framework core to return an
           error code instead of 0 when there's an error (Dan Carpenter).
      
         - Clean up variable definition in the hibernation core (Pushkar
           Jambhlekar)"
      
      * tag 'pm-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: dbx500: add a Kconfig symbol
        PM / hibernate: Declare variables as static
        PowerCap: Fix an error code in powercap_register_zone()
        RTC: rtc-cmos: Fix wakeup from suspend-to-idle
        PM / wakeup: Fix up wakeup_source_report_event()
        cpufreq: intel_pstate: Document the current behavior and user interface
        cpufreq: schedutil: use now as reference when aggregating shared policy requests
      801099be
    • Jan Kiszka's avatar
      i2c: designware: Fix bogus sda_hold_time due to uninitialized vars · ad258fb9
      Jan Kiszka authored
      
      
      We need to initialize those variables to 0 for platforms that do not
      provide ACPI parameters. Otherwise, we set sda_hold_time to random
      values, breaking e.g. Galileo and IOT2000 boards.
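
      The pattern behind this fix can be sketched in plain C. This is an
      illustration only, not the i2c-designware code: lookup_platform_params()
      and pick_sda_hold() are hypothetical helpers, and the numeric values
      are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical optional lookup: it writes the outputs only when the
 * platform provides parameters, and leaves them untouched otherwise.
 */
static bool lookup_platform_params(bool has_params, uint16_t *hcnt,
                                   uint16_t *lcnt, uint32_t *sda_hold)
{
    if (!has_params)
        return false;
    *hcnt = 72;     /* made-up values for the sketch */
    *lcnt = 160;
    *sda_hold = 30;
    return true;
}

static uint32_t pick_sda_hold(bool has_params)
{
    /* Zero-initialize so "nothing provided" is a defined value. */
    uint16_t hcnt = 0, lcnt = 0;
    uint32_t sda_hold = 0;

    lookup_platform_params(has_params, &hcnt, &lcnt, &sda_hold);
    (void)hcnt;
    (void)lcnt;
    return sda_hold;
}
```

      Without the initializers, the no-parameters path would return whatever
      happened to be on the stack, which matches the "random values" the
      commit message mentions.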
      
      Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
      Fixes: 9d640843 ("i2c: designware: don't infer timings described by ACPI from clock rate")
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ad258fb9
    • Kees Cook's avatar
      efi-pstore: Fix write/erase id tracking · c10e8031
      Kees Cook authored
      
      
      Prior to the pstore interface refactoring, the "id" generated during
      a backend pstore_write() was only retained by the internal pstore
      inode tracking list. Additionally the "part" was ignored, so EFI
      would encode this in the id. This corrects the misunderstandings
      and correctly sets "id" during pstore_write(), and uses "part"
      directly during pstore_erase().
      
      Reported-by: Marta Lofstedt <marta.lofstedt@intel.com>
      Fixes: 76cc9580 ("pstore: Replace arguments for write() API")
      Fixes: a61072aa ("pstore: Replace arguments for erase() API")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
      c10e8031
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 86ca984c
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Mostly netfilter bug fixes in here, but we have some bits elsewhere as
        well.
      
         1) Don't do SNAT replies for non-NATed connections in IPVS, from
            Julian Anastasov.
      
         2) Don't delete conntrack helpers while they are still in use, from
            Liping Zhang.
      
         3) Fix zero padding in xtables's xt_data_to_user(), from Willem de
            Bruijn.
      
         4) Add proper RCU protection to nf_tables_dump_set() because we
            cannot guarantee that we hold the NFNL_SUBSYS_NFTABLES lock. From
            Liping Zhang.
      
         5) Initialize rcv_mss in tcp_disconnect(), from Wei Wang.
      
         6) smsc95xx devices can't handle IPV6 checksums fully, so don't
            advertise support for offloading them. From Nisar Sayed.
      
         7) Fix out-of-bounds access in __ip6_append_data(), from Eric
            Dumazet.
      
         8) Make atl2_probe() propagate the error code properly on failures,
            from Alexey Khoroshilov.
      
         9) arp_target[] in bond_check_params() is used uninitialized. This
            got changed from a global static to a local variable, which is how
            this mistake happened. Fix from Jarod Wilson.
      
        10) Fix fallout from unnecessary NULL check removal in cls_matchall,
            from Jiri Pirko. This is definitely brown paper bag territory..."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
        net: sched: cls_matchall: fix null pointer dereference
        vsock: use new wait API for vsock_stream_sendmsg()
        bonding: fix randomly populated arp target array
        net: Make IP alignment calulations clearer.
        bonding: fix accounting of active ports in 3ad
        net: atheros: atl2: don't return zero on failure path in atl2_probe()
        ipv6: fix out of bound writes in __ip6_append_data()
        bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
        smsc95xx: Support only IPv4 TCP/UDP csum offload
        arp: always override existing neigh entries with gratuitous ARP
        arp: postpone addr_type calculation to as late as possible
        arp: decompose is_garp logic into a separate function
        arp: fixed error in a comment
        tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0
        netfilter: xtables: fix build failure from COMPAT_XT_ALIGN outside CONFIG_COMPAT
        ebtables: arpreply: Add the standard target sanity check
        netfilter: nf_tables: revisit chain/object refcounting from elements
        netfilter: nf_tables: missing sanitization in data from userspace
        netfilter: nf_tables: can't assume lock is acquired when dumping set elems
        netfilter: synproxy: fix conntrackd interaction
        ...
      86ca984c
    • Jiri Pirko's avatar
      net: sched: cls_matchall: fix null pointer dereference · 2d76b2f8
      Jiri Pirko authored
      Since the head is guaranteed by the check above to be null, the call_rcu
      would explode. Remove the previously logically dead code that was made
      logically very much alive and kicking.
      
      Fixes: 985538ee ("net/sched: remove redundant null check on head")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d76b2f8
    • WANG Cong's avatar
      vsock: use new wait API for vsock_stream_sendmsg() · 499fde66
      WANG Cong authored
      As reported by Michal, vsock_stream_sendmsg() could still
      sleep at vsock_stream_has_space() after prepare_to_wait():
      
        vsock_stream_has_space
          vmci_transport_stream_has_space
            vmci_qpair_produce_free_space
              qp_lock
                qp_acquire_queue_mutex
                  mutex_lock
      
      Just switch to the new wait API like we did for commit
      d9dc8b0f ("net: fix sleeping for sk_wait_event()").
      
      Reported-by: Michal Kubecek <mkubecek@suse.cz>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      499fde66
    • Jarod Wilson's avatar
      bonding: fix randomly populated arp target array · 72ccc471
      Jarod Wilson authored
      In commit dc9c4d0f, the arp_target array moved from a static global
      to a local variable. By the nature of static globals, the array used to
      be initialized to all 0. At present, it's full of random data, which
      then gets interpreted as arp_target values, when none have actually been
      specified. Systems end up booting with spew along these lines:
      
      [   32.161783] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
      [   32.168475] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
      [   32.175089] 8021q: adding VLAN 0 to HW filter on device lacp0
      [   32.193091] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
      [   32.204892] lacp0: Setting MII monitoring interval to 100
      [   32.211071] lacp0: Removing ARP target 216.124.228.17
      [   32.216824] lacp0: Removing ARP target 218.160.255.255
      [   32.222646] lacp0: Removing ARP target 185.170.136.184
      [   32.228496] lacp0: invalid ARP target 255.255.255.255 specified for removal
      [   32.236294] lacp0: option arp_ip_target: invalid value (-255.255.255.255)
      [   32.243987] lacp0: Removing ARP target 56.125.228.17
      [   32.249625] lacp0: Removing ARP target 218.160.255.255
      [   32.255432] lacp0: Removing ARP target 15.157.233.184
      [   32.261165] lacp0: invalid ARP target 255.255.255.255 specified for removal
      [   32.268939] lacp0: option arp_ip_target: invalid value (-255.255.255.255)
      [   32.276632] lacp0: Removing ARP target 16.0.0.0
      [   32.281755] lacp0: Removing ARP target 218.160.255.255
      [   32.287567] lacp0: Removing ARP target 72.125.228.17
      [   32.293165] lacp0: Removing ARP target 218.160.255.255
      [   32.298970] lacp0: Removing ARP target 8.125.228.17
      [   32.304458] lacp0: Removing ARP target 218.160.255.255
      
      None of these were actually specified as ARP targets, and the driver does
      seem to clean up the mess okay, but it's rather noisy and confusing, leaks
      values to userspace, and the 255.255.255.255 spew shows up even when debug
      prints are disabled.
      
      The fix: just zero out arp_target at init time.
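
      A minimal user-space sketch of that idea follows. It is not the bonding
      driver's code: MAX_ARP_TARGETS, count_arp_targets() and
      check_params_sketch() are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ARP_TARGETS 16 /* assumed size for the sketch */

/* Counts the entries that would be treated as configured targets. */
static size_t count_arp_targets(const uint32_t *targets, size_t n)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++)
        if (targets[i] != 0)
            used++;
    return used;
}

/*
 * A local array has no implicit initial value, unlike a static global,
 * so it is zeroed explicitly before any user-supplied targets are copied.
 */
static size_t check_params_sketch(const uint32_t *user_targets, size_t n_user)
{
    uint32_t arp_target[MAX_ARP_TARGETS] = { 0 };

    for (size_t i = 0; i < n_user && i < MAX_ARP_TARGETS; i++)
        arp_target[i] = user_targets[i];
    return count_arp_targets(arp_target, MAX_ARP_TARGETS);
}
```

      With no targets supplied, the sketch reports zero configured entries,
      so nothing would be "removed" at boot.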
      
      While we're in here, init arp_all_targets_value in the right place.
      
      Fixes: dc9c4d0f ("bonding: reduce scope of some global variables")
      CC: Mahesh Bandewar <maheshb@google.com>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: netdev@vger.kernel.org
      CC: stable@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Acked-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      72ccc471