  1. Dec 22, 2017
    • skbuff: skb_copy_ubufs must release uarg even without user frags · b90ddd56
      Willem de Bruijn authored
      skb_copy_ubufs creates a private copy of frags[] to release its hold
      on user frags, then calls uarg->callback to notify the owner.
      
      Call uarg->callback even when no frags exist. This edge case can
      happen when zerocopy_sg_from_iter finds enough room in skb_headlen
      to copy all the data.
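The edge case can be illustrated with a small user-space model. This is only a sketch: `ubuf_info_model`, `skb_model` and `copy_ubufs_model` are hypothetical stand-ins, not the kernel structures or the actual patch.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct ubuf_info_model {
	void (*callback)(struct ubuf_info_model *uarg);
	int notified;                   /* how often the owner was told */
};

struct skb_model {
	int nr_user_frags;              /* fragments still pointing at user pages */
	struct ubuf_info_model *uarg;   /* zerocopy owner, or NULL */
};

static void count_notification(struct ubuf_info_model *uarg)
{
	uarg->notified++;
}

/* Drop the hold on user memory and notify the owner exactly once. */
static int copy_ubufs_model(struct skb_model *skb)
{
	if (!skb->uarg)
		return 0;               /* nothing borrowed from user space */

	if (skb->nr_user_frags > 0) {
		/* the real code copies each frag into private pages here */
		skb->nr_user_frags = 0;
	}

	/* Reached with or without frags: the owner must still be notified. */
	skb->uarg->callback(skb->uarg);
	skb->uarg = NULL;
	return 0;
}
```

In this model, a buffer whose data fit entirely in the linear area has `nr_user_frags == 0` but still carries a `uarg`, and the callback runs for it as well.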
      
      Fixes: 3ece7826 ("sock: skb_copy_ubufs support for compound pages")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: orphan frags before zerocopy clone · 268b7906
      Willem de Bruijn authored
      Call skb_zerocopy_clone after skb_orphan_frags, to avoid duplicate
      calls to skb_uarg(skb)->callback for the same data.
      
      skb_zerocopy_clone associates skb_shinfo(skb)->uarg from frag_skb
      with each segment. This is only safe for uargs that do refcounting,
      which is those that pass skb_orphan_frags without dropping their
      shared frags. For others, skb_orphan_frags drops the user frags and
      sets the uarg to NULL, after which skb_zerocopy_clone has no effect.
      
      Qemu hangs were reported due to duplicate vhost_net_zerocopy_callback
      calls for the same data causing the vhost_net_ubuf_ref->refcount to
      drop below zero.
      
      Link: http://lkml.kernel.org/r/<CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com>
      Fixes: 1f8b977a ("sock: enable MSG_ZEROCOPY")
      Reported-by: Andreas Hartmann <andihartmann@01019freenet.de>
      Reported-by: David Hill <dhill@redhat.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: reevalulate autoflowlabel setting after sysctl setting · 513674b5
      Shaohua Li authored
      sysctl.ipv6.auto_flowlabels is 1 by default. On our hosts, we set it to 2.
      If sockopt doesn't set autoflowlabel, outgoing packets from the hosts are
      supposed to not include a flowlabel. This is true for normal packets, but
      not for reset packets.
      
      The reason is that ipv6_pinfo.autoflowlabel is set at sock creation. If we
      later change sysctl.ipv6.auto_flowlabels, ipv6_pinfo.autoflowlabel isn't
      changed, so the sock keeps the old behavior in terms of auto
      flowlabel. Reset packets suffer from this problem because they are
      sent from a special control socket, which is created at boot
      time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control
      socket will always have its ipv6_pinfo.autoflowlabel set, even after the
      user sets sysctl.ipv6.auto_flowlabels to 2, so reset packets will always
      have a flowlabel. A normal sock created before the sysctl change suffers from
      the same issue. We can't even turn off autoflowlabel unless we kill all
      socks on the hosts.
      
      To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
      autoflowlabel setting from user, otherwise we always call
      ip6_default_np_autolabel() which has the new settings of sysctl.
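The decision described above can be sketched in user space as follows. This is a model under stated assumptions, not the patch itself: `sock_model`, `default_autolabel` and `effective_autolabel` are hypothetical names, the enum constants are illustrative labels for the sysctl values 0 to 3, and enforcement of the "forced" mode elsewhere in the stack is not modelled.

```c
#include <stdbool.h>

/* Illustrative labels for the auto_flowlabels sysctl values 0..3. */
enum auto_flowlabel_mode {
	FL_OFF = 0,     /* never add automatic flow labels */
	FL_OPTOUT = 1,  /* on by default, a socket may opt out */
	FL_OPTIN = 2,   /* off by default, a socket may opt in */
	FL_FORCED = 3   /* always on */
};

/* Hypothetical per-socket state. */
struct sock_model {
	bool autoflowlabel;      /* value chosen via IPV6_AUTOFLOWLABEL */
	bool autoflowlabel_set;  /* true once the socket option was used */
};

/* Default derived from the sysctl value that is current right now. */
static bool default_autolabel(enum auto_flowlabel_mode mode)
{
	return mode == FL_OPTOUT || mode == FL_FORCED;
}

/* An explicit user choice wins; otherwise re-read the sysctl on every call. */
static bool effective_autolabel(const struct sock_model *sk,
				enum auto_flowlabel_mode mode)
{
	if (sk->autoflowlabel_set)
		return sk->autoflowlabel;
	return default_autolabel(mode);
}
```

Because the default is evaluated at use time rather than cached at socket creation, a long-lived socket such as the control socket follows a later sysctl change in this model.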
      
      Note, this changes behavior a little bit. Before commit 42240901
      (ipv6: Implement different admin modes for automatic flow labels), the
      autoflowlabel behavior of a sock wasn't sticky, e.g., if the sysctl changed,
      existing connections would change their autoflowlabel behavior. After that
      commit, autoflowlabel behavior is sticky for the whole life of the sock.
      With this patch, the behavior is no longer sticky.
      
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Tom Herbert <tom@quantonium.net>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Fix pop_vlan action for double tagged frames · c48e7473
      Eric Garver authored
      skb_vlan_pop() expects skb->protocol to be a valid TPID for double
      tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop()
      shift the true ethertype into position for us.
      
      Fixes: 5108bbad ("openvswitch: add processing of L3 packets")
      Signed-off-by: Eric Garver <e@erig.me>
      Reviewed-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Honor specified parameters in fibmatch lookup · 58acfd71
      Ido Schimmel authored
      Currently, parameters such as oif and source address are not taken into
      account during fibmatch lookup. Example (IPv4 for reference) before
      patch:
      
      $ ip -4 route show
      192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
      198.51.100.0/24 dev dummy1 proto kernel scope link src 198.51.100.1
      
      $ ip -6 route show
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      2001:db8:2::/64 dev dummy1 proto kernel metric 256 pref medium
      fe80::/64 dev dummy0 proto kernel metric 256 pref medium
      fe80::/64 dev dummy1 proto kernel metric 256 pref medium
      
      $ ip -4 route get fibmatch 192.0.2.2 oif dummy0
      192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
      $ ip -4 route get fibmatch 192.0.2.2 oif dummy1
      RTNETLINK answers: No route to host
      
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      
      After:
      
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
      RTNETLINK answers: Network is unreachable
      
      The problem stems from the fact that the necessary route lookup flags
      are not set based on these parameters.
      
      Instead of duplicating the same logic for fibmatch, we can simply
      resolve the original route from its copy and dump it instead.
      
      Fixes: 18c3a61c ("net: ipv6: RTM_GETROUTE: return matched fib result when requested")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. Dec 21, 2017
  3. Dec 20, 2017
    • bpf: Fix tools and testing build. · 19c832ed
      David Miller authored
      
      
      I'm getting various build failures on sparc64.  The key is
      usually that the userland tools get built 32-bit.
      
      1) clock_gettime() is in librt, so that must be added to the link
         libraries.
      
      2) "sizeof(x)" must be printed with "%Z" printf prefix.
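Both points can be shown in a short, portable sketch. It is an assumption-laden illustration, not the patch: the helper names `format_size` and `monotonic_ns` are hypothetical, and the sketch uses the C99 `z` length modifier (`%zu`), for which glibc accepts `Z` as an older nonstandard synonym.

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime() under strict -std= modes */
#include <stdio.h>
#include <time.h>

/*
 * Format a size_t without assuming it is as wide as unsigned long.
 * On a 32-bit userland size_t is 32 bits wide, so "%lu" would be wrong.
 */
static int format_size(char *buf, size_t len, size_t value)
{
	return snprintf(buf, len, "%zu", value);
}

/*
 * Read the monotonic clock in nanoseconds, or -1 on failure.
 * Older C libraries keep clock_gettime() in librt, so link with -lrt there.
 */
static long long monotonic_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return -1;
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

A typical call would be `format_size(buf, sizeof(buf), sizeof(x))`, which formats correctly whether the tools are built 32-bit or 64-bit.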
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • net/mlx5: Stay in polling mode when command EQ destroy fails · a2fba188
      Moshe Shemesh authored
      During unload, mlx5_stop_eqs moves the command interface from events
      mode to polling mode, but if destroying the command interface EQ fails, we
      move back to events mode.
      That's wrong, since even if we fail to destroy the command interface EQ, we
      do release its irq, so no interrupts will be received.
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Cleanup IRQs in case of unload failure · d6b2785c
      Moshe Shemesh authored
      When mlx5_stop_eqs fails to destroy any of the EQs, it returns with an error.
      In this failure flow the function returns without
      releasing all EQ irqs, and pci_free_irq_vectors then fails.
      Fix by only warning on an EQ destroy failure and continuing to release the
      other EQs and their irqs.
      
      It fixes the following kernel trace:
      kernel: kernel BUG at drivers/pci/msi.c:352!
      ...
      ...
      kernel: Call Trace:
      kernel: pci_disable_msix+0xd3/0x100
      kernel: pci_free_irq_vectors+0xe/0x20
      kernel: mlx5_load_one.isra.17+0x9f5/0xec0 [mlx5_core]
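The "warn and continue" teardown pattern described in this commit can be sketched in user space as follows. The types and functions (`eq_model`, `destroy_eq_model`, `stop_eqs_model`) are hypothetical and only model the control flow, not the driver code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of an event queue that owns one IRQ. */
struct eq_model {
	const char *name;
	bool destroy_fails;  /* simulate a failing destroy command */
	bool irq_held;
};

/* The IRQ is released even when the destroy command itself fails. */
static int destroy_eq_model(struct eq_model *eq)
{
	eq->irq_held = false;
	return eq->destroy_fails ? -1 : 0;
}

/* Tear down every EQ; a failure is reported but never stops the loop. */
static int stop_eqs_model(struct eq_model *eqs, int n)
{
	int failures = 0;

	for (int i = 0; i < n; i++) {
		if (destroy_eq_model(&eqs[i]) != 0) {
			fprintf(stderr, "warning: failed to destroy %s\n",
				eqs[i].name);
			failures++;  /* keep going instead of returning early */
		}
	}
	return failures;
}
```

With an early return on the first error, the EQs after the failing one would keep their IRQs, which is the state in which freeing the vectors later hits the BUG shown in the trace.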
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Fix steering memory leak · 139ed6c6
      Maor Gottlieb authored
      Flow steering priority and namespace are software only objects that
      didn't have the proper destructors and were not freed during steering
      cleanup.
      
      Fix it by adding destructor functions for these objects.
      
      Fixes: bd71b08e ("net/mlx5: Support multiple updates of steering rules in parallel")
      Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Prevent possible races in VXLAN control flow · 0c1cc8b2
      Gal Pressman authored
      When calling add/remove VXLAN port, a lock must be held in order to
      prevent race scenarios when more than one add/remove happens at the
      same time.
      Fix by holding our state_lock (mutex) as done by all other parts of the
      driver.
      Note that the spinlock protecting the radix-tree is still needed in
      order to synchronize radix-tree access from softirq context.
      
      Fixes: b3f63c3d ("net/mlx5e: Add netdev support for VXLAN tunneling")
      Signed-off-by: Gal Pressman <galp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Add refcount to VXLAN structure · 23f4cc2c
      Gal Pressman authored
      A refcount mechanism must be implemented in order to prevent unwanted
      scenarios such as:
      - Open an IPv4 VXLAN interface
      - Open an IPv6 VXLAN interface (different socket)
      - Remove one of the interfaces
      
      With the current implementation, the UDP port will be removed from our VXLAN
      database and the offloads will be turned off for the other interface, which
      is still active.
      The reference count mechanism only allows a UDP port removal once all
      consumers are gone.
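A minimal user-space sketch of such a reference count, assuming a small fixed-size table instead of the driver's radix tree. All names here (`vxlan_db_model`, `add_port`, `del_port`, `lookup_port`) are hypothetical and locking is left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PORTS 8

/* Hypothetical table entry: a slot is in use while refcount > 0. */
struct vxlan_port_model {
	unsigned short udp_port;
	int refcount;
};

struct vxlan_db_model {
	struct vxlan_port_model ports[MAX_PORTS];
};

static struct vxlan_port_model *lookup_port(struct vxlan_db_model *db,
					    unsigned short udp_port)
{
	for (size_t i = 0; i < MAX_PORTS; i++)
		if (db->ports[i].refcount > 0 &&
		    db->ports[i].udp_port == udp_port)
			return &db->ports[i];
	return NULL;
}

/* Take a reference; create the entry only for the first user. */
static bool add_port(struct vxlan_db_model *db, unsigned short udp_port)
{
	struct vxlan_port_model *p = lookup_port(db, udp_port);

	if (p) {
		p->refcount++;
		return true;
	}
	for (size_t i = 0; i < MAX_PORTS; i++) {
		if (db->ports[i].refcount == 0) {
			db->ports[i].udp_port = udp_port;
			db->ports[i].refcount = 1;
			return true;
		}
	}
	return false;  /* table full */
}

/* Drop a reference; returns true only when the last user is gone. */
static bool del_port(struct vxlan_db_model *db, unsigned short udp_port)
{
	struct vxlan_port_model *p = lookup_port(db, udp_port);

	if (!p)
		return false;
	if (--p->refcount > 0)
		return false;  /* another interface still uses this port */
	return true;           /* safe to turn the offloads off now */
}
```

In the scenario from the list above, the IPv4 and IPv6 interfaces each call `add_port` for the same UDP port, and removing one of them leaves the entry in place for the other.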
      
      Fixes: b3f63c3d ("net/mlx5e: Add netdev support for VXLAN tunneling")
      Signed-off-by: Gal Pressman <galp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Fix possible deadlock of VXLAN lock · 63235141
      Gal Pressman authored
      mlx5e_vxlan_lookup_port is called both from mlx5e_add_vxlan_port (user
      context) and mlx5e_features_check (softirq), but the lock acquired does
      not disable bottom halves, which might result in a deadlock. Fix it by
      simply replacing spin_lock() with spin_lock_bh().
      While at it, replace all unnecessary spin_lock_irq() calls with spin_lock_bh().
      
      lockdep's WARNING: inconsistent lock state
      [  654.028136] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [  654.028229] swapper/5/0 [HC0[0]:SC1[9]:HE1:SE0] takes:
      [  654.028321]  (&(&vxlan_db->lock)->rlock){+.?.}, at: [<ffffffffa06e7f0e>] mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core]
      [  654.028528] {SOFTIRQ-ON-W} state was registered at:
      [  654.028607]   _raw_spin_lock+0x3c/0x70
      [  654.028689]   mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core]
      [  654.028794]   mlx5e_vxlan_add_port+0x2e/0x120 [mlx5_core]
      [  654.028878]   process_one_work+0x1e9/0x640
      [  654.028942]   worker_thread+0x4a/0x3f0
      [  654.029002]   kthread+0x141/0x180
      [  654.029056]   ret_from_fork+0x24/0x30
      [  654.029114] irq event stamp: 579088
      [  654.029174] hardirqs last  enabled at (579088): [<ffffffff818f475a>] ip6_finish_output2+0x49a/0x8c0
      [  654.029309] hardirqs last disabled at (579087): [<ffffffff818f470e>] ip6_finish_output2+0x44e/0x8c0
      [  654.029446] softirqs last  enabled at (579030): [<ffffffff810b3b3d>] irq_enter+0x6d/0x80
      [  654.029567] softirqs last disabled at (579031): [<ffffffff810b3c05>] irq_exit+0xb5/0xc0
      [  654.029684] other info that might help us debug this:
      [  654.029781]  Possible unsafe locking scenario:
      
      [  654.029868]        CPU0
      [  654.029908]        ----
      [  654.029947]   lock(&(&vxlan_db->lock)->rlock);
      [  654.030045]   <Interrupt>
      [  654.030090]     lock(&(&vxlan_db->lock)->rlock);
      [  654.030162]  *** DEADLOCK ***
      
      Fixes: b3f63c3d ("net/mlx5e: Add netdev support for VXLAN tunneling")
      Signed-off-by: Gal Pressman <galp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Fix error flow in CREATE_QP command · dbff26e4
      Moni Shoua authored
      In the error flow, when the DESTROY_QP command should be executed, the wrong
      mailbox was filled with data instead of the one that is written to hardware.
      Fix that.
      
      Fixes: 09a7d9ec '{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc'
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Fix misspelling in the error message and comment · 777ec2b2
      Eugenia Emantayev authored
      Fix the misspelling of the word "syndrome".
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Fix defaulting RX ring size when not needed · 696a97cf
      Eugenia Emantayev authored
      Fix a bug where turning the CQE compression mechanism on or off
      resets the RX ring size to its default value when that is not
      needed.
      
      Fixes: 2fc4bfb7 ("net/mlx5e: Dynamic RQ type infrastructure")
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>