  1. Mar 01, 2018
    • Ido Schimmel's avatar
      spectrum: Reference count VLAN entries · b3529af6
      Ido Schimmel authored
      One of the basic constructs in the device is a port-VLAN pair, which can
      be bound to a FID or a RIF in order to direct packets to the bridge or
      the router, respectively.
      
      Since not all the netdevs are configured with a VLAN (e.g., sw1p1 vs.
      sw1p1.10), VID 1 is used to represent these and thus this VID can be
      used by both upper devices of mlxsw ports and by the driver itself.
      
      However, this VID is not reference counted and therefore might be freed
      prematurely, which can result in various WARNINGs. For example:
      
      $ ip link add name br0 type bridge vlan_filtering 1
      $ teamd -t team0 -d -c '{"runner": {"name": "lacp"}}'
      $ ip link set dev team0 master br0
      $ ip link set dev enp1s0np1 master team0
      $ ip address add 192.0.2.1/24 dev enp1s0np1
      
      The enslavement to team0 will fail because team0 already has an upper
      and thus vlan_vids_del_by_dev() will be executed as part of team's error
      path, which will delete VID 1 from enp1s0np1 (added by br0 as PVID). The
      WARNING is generated when the driver realizes it cannot find VID 1 on
      the port in order to bind it to a RIF.
      
      Fix this by adding a reference count to the VLAN entries on the port, in
      a similar fashion to the reference counting used by the corresponding
      'vlan_vid_info' structure in the 8021q driver.
      
      Fixes: c57529e1 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
      Reported-by: Tal Bar <talb@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Tested-by: Tal Bar <talb@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b3529af6
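A minimal user-space sketch of the reference counting described in the commit above. The `port`, `port_vlan`, `port_vlan_get()` and `port_vlan_put()` names are hypothetical and only illustrate the idea; they are not the driver's actual structures or functions.

```c
#include <stdlib.h>

#define MAX_VID 4096

/* Hypothetical per-port VLAN entry; not the driver's actual structure. */
struct port_vlan {
	unsigned short vid;
	unsigned int ref_count;
};

struct port {
	struct port_vlan *vlans[MAX_VID];
};

/* Return the existing entry with one more reference, or create it. */
struct port_vlan *port_vlan_get(struct port *p, unsigned short vid)
{
	struct port_vlan *pv;

	if (vid >= MAX_VID)
		return NULL;
	pv = p->vlans[vid];
	if (pv) {
		pv->ref_count++;
		return pv;
	}
	pv = calloc(1, sizeof(*pv));
	if (!pv)
		return NULL;
	pv->vid = vid;
	pv->ref_count = 1;
	p->vlans[vid] = pv;
	return pv;
}

/* Drop one reference; free the entry only when the last user is gone. */
void port_vlan_put(struct port *p, struct port_vlan *pv)
{
	if (--pv->ref_count != 0)
		return;
	p->vlans[pv->vid] = NULL;
	free(pv);
}
```

With this shape, one user deleting VID 1 (for example an upper device's error path) only drops its own reference, so the entry stays available to the other user until that user also releases it.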
    • Ido Schimmel's avatar
      mlxsw: spectrum: Treat IPv6 unregistered multicast as broadcast · 9d45deb0
      Ido Schimmel authored
      When multicast snooping is enabled, the Linux bridge resorts to flooding
      unregistered multicast packets to all ports only in case it did not
      detect a querier in the network.
      
      The above condition is not reflected to underlying drivers, which is
      especially problematic in IPv6 environments, as multicast snooping is
      enabled by default and since neighbour solicitation packets might be
      treated as unregistered multicast packets in case there is no
      corresponding MDB entry.
      
      Until the Linux bridge reflects its querier state to underlying drivers,
      simply treat unregistered multicast packets as broadcast and allow them
      to reach their destination.
      
      Fixes: 9df552ef ("mlxsw: spectrum: Improve IPv6 unregistered multicast flooding")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9d45deb0
    • Jiri Pirko's avatar
      mlxsw: spectrum: Fix handling of resource_size_param · 77d27096
      Jiri Pirko authored
      Current code uses global variables, adjusts them and passes pointer down
      to devlink. With every other mlxsw_core instance, the previously passed
      pointer values are rewritten. Fix this by de-globalizing the variables and
      also memcpy size_params during devlink resource registration.
      Also, introduce a convenient size_param_init helper.
      
      Fixes: ef3116e5 ("mlxsw: spectrum: Register KVD resources with devlink")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      77d27096
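A small sketch of why copying the size parameters at registration time avoids the problem described in the commit above. All names here (`size_params`, `resource`, `size_params_init()`, `resource_register()`) are hypothetical stand-ins, not the devlink or mlxsw API.

```c
#include <string.h>

/* Hypothetical stand-in for a devlink-style size parameter block. */
struct size_params {
	unsigned long size_min;
	unsigned long size_max;
	unsigned long size_granularity;
};

struct resource {
	struct size_params params; /* owned copy, not a pointer to caller data */
};

/* Convenience initializer, in the spirit of a size_param_init helper. */
void size_params_init(struct size_params *sp, unsigned long min,
		      unsigned long max, unsigned long granularity)
{
	sp->size_min = min;
	sp->size_max = max;
	sp->size_granularity = granularity;
}

/* Registration copies the caller's values, so later changes do not leak in. */
void resource_register(struct resource *res, const struct size_params *sp)
{
	memcpy(&res->params, sp, sizeof(res->params));
}
```

Because each resource keeps its own copy, a second instance that reuses or rewrites the caller's variable cannot alter what the first instance registered.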
    • Jiri Pirko's avatar
      mlxsw: core: Fix flex keys scratchpad offset conflict · 2ddc94c7
      Jiri Pirko authored
      IP_TTL, IP_ECN and IP_DSCP are using the same offset within the
      scratchpad as L4 ports. Fix this by shifting all up.
      
      Fixes: 5f57e090 ("mlxsw: acl: Add ip ttl acl element")
      Fixes: 80d0fe4710c ("mlxsw: acl: Add ip tos acl element")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2ddc94c7
    • David S. Miller's avatar
      Merge branch 'net-smc-fixes' · 7358799c
      David S. Miller authored
      
      
      Ursula Braun says:
      
      ====================
      net/smc: fixes 2018-02-28
      
      here are 3 smc bug fixes for the net-tree. Karsten's first patch is
      the reworked version of last week's
         "[PATCH net-next 2/5] net/smc: fix structure size"
      patch, now solved without using __packed, and now targeted for net
      instead of net-next.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7358799c
    • Davide Caratti's avatar
      net/smc: fix NULL pointer dereference on sock_create_kern() error path · a5dcb73b
      Davide Caratti authored
      When sock_create_kern(..., a) returns an error, 'a' might not be a valid
      pointer, so it shouldn't be dereferenced to read a->sk->sk_sndbuf and
      a->sk->sk_rcvbuf; doing so caused the following crash:
      
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
          (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 4254 Comm: syzkaller919713 Not tainted 4.16.0-rc1+ #18
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:smc_create+0x14e/0x300 net/smc/af_smc.c:1410
      RSP: 0018:ffff8801b06afbc8 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff8801b63457c0 RCX: ffffffff85a3e746
      RDX: 0000000000000004 RSI: 00000000ffffffff RDI: 0000000000000020
      RBP: ffff8801b06afbf0 R08: 00000000000007c0 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: ffff8801b6345c08 R14: 00000000ffffffe9 R15: ffffffff8695ced0
      FS:  0000000001afb880(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000040 CR3: 00000001b0721004 CR4: 00000000001606f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        __sock_create+0x4d4/0x850 net/socket.c:1285
        sock_create net/socket.c:1325 [inline]
        SYSC_socketpair net/socket.c:1409 [inline]
        SyS_socketpair+0x1c0/0x6f0 net/socket.c:1366
        do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x26/0x9b
      RIP: 0033:0x4404b9
      RSP: 002b:00007fff44ab6908 EFLAGS: 00000246 ORIG_RAX: 0000000000000035
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004404b9
      RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000002b
      RBP: 00007fff44ab6910 R08: 0000000000000002 R09: 00007fff44003031
      R10: 0000000020000040 R11: 0000000000000246 R12: ffffffffffffffff
      R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
      Code: 48 c1 ea 03 80 3c 02 00 0f 85 b3 01 00 00 4c 8b a3 48 04 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 20 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 82 01 00 00 4d 8b 7c 24 20 48 b8 00 00 00 00
      RIP: smc_create+0x14e/0x300 net/smc/af_smc.c:1410 RSP: ffff8801b06afbc8
      
      Fixes: cd6851f3 smc: remote memory buffers (RMBs)
      Reported-and-tested-by: <syzbot+aa0227369be2dcc26ebe@syzkaller.appspotmail.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a5dcb73b
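The error-path rule from the commit above can be sketched in plain C as follows. The `inner_sock`, `outer_sock`, `inner_create()` and `outer_create()` names are hypothetical; the sketch only assumes a creator that returns non-zero on failure and gives no guarantee about its output pointer in that case.

```c
#include <stddef.h>

struct inner_sock { int sndbuf; int rcvbuf; };
struct outer_sock { struct inner_sock *inner; int sndbuf; int rcvbuf; };

/* Hypothetical creator: returns 0 and sets *out on success; on failure it
 * returns a negative value and leaves *out untouched (possibly garbage). */
int inner_create(int should_fail, struct inner_sock **out)
{
	static struct inner_sock demo = { .sndbuf = 4096, .rcvbuf = 8192 };

	if (should_fail)
		return -1;
	*out = &demo;
	return 0;
}

/* Check the return code before reading anything through the pointer. */
int outer_create(struct outer_sock *o, int should_fail)
{
	int rc = inner_create(should_fail, &o->inner);

	if (rc) {
		o->inner = NULL; /* never look through the pointer on failure */
		return rc;
	}
	o->sndbuf = o->inner->sndbuf;
	o->rcvbuf = o->inner->rcvbuf;
	return 0;
}
```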
    • Karsten Graul's avatar
      net/smc: use link_id of server in confirm link reply · 2be922f3
      Karsten Graul authored
      
      
      The CONFIRM LINK reply message must contain the link_id sent
      by the server. Also set the link_id explicitly when
      initializing the link.
      
      Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2be922f3
    • Karsten Graul's avatar
      net/smc: use a constant for control message length · cbba07a7
      Karsten Graul authored
      
      
      The sizeof(struct smc_cdc_msg) evaluates to 48 bytes instead of the
      required 44 bytes. We need to use the constant value of
      SMC_WR_TX_SIZE to set and check the control message length.
      
      Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cbba07a7
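A short illustration of how `sizeof` can exceed the number of bytes that belong on the wire, which is the effect the commit above describes. The `demo_msg` layout and the `WIRE_MSG_SIZE` constant are hypothetical and are not the real `struct smc_cdc_msg` or `SMC_WR_TX_SIZE`; the exact padding also depends on the ABI.

```c
#include <stddef.h>
#include <stdint.h>

#define WIRE_MSG_SIZE 44 /* hypothetical on-the-wire length in bytes */

/* Hypothetical message: 36 + 8 = 44 bytes of fields, but the 64-bit member
 * may require 8-byte alignment, so many ABIs insert 4 bytes of padding. */
struct demo_msg {
	uint8_t  header[36];
	uint64_t cursor;
};

/* Length to put on the wire: the protocol constant, never sizeof(). */
size_t demo_msg_wire_len(void)
{
	return WIRE_MSG_SIZE;
}

/* Validate a received length against the same constant. */
int demo_msg_len_ok(size_t received_len)
{
	return received_len == WIRE_MSG_SIZE;
}
```

Using one named constant for both setting and checking the length keeps the two sides consistent regardless of how the compiler lays out the structure.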
    • Jason Wang's avatar
      virtio-net: disable NAPI only when enabled during XDP set · 4e09ff53
      Jason Wang authored
      We try to disable NAPI to prevent a single XDP TX queue being used by
      multiple cpus. But we don't check if the device is up (NAPI is enabled),
      which could result in a stall because of an infinite wait in
      napi_disable(). Fix this by checking the device state through
      netif_running() first.
      
      Fixes: 4941d472 ("virtio-net: do not reset during XDP set")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4e09ff53
    • Joey Pabalinas's avatar
      net/tcp/illinois: replace broken algorithm reference link · ecc83275
      Joey Pabalinas authored
      The link to the pdf containing the algorithm description is now a
      dead link; it seems http://www.ifp.illinois.edu/~srikant/ has been
      moved to https://sites.google.com/a/illinois.edu/srikant/ and none of
      the original papers can be found there...
      
      I have replaced it with the only working copy I was able to find.
      
      n.b. there is also a copy available at:
      
      http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.296.6350&rep=rep1&type=pdf
      
      
      
      However, this seems to only be a *cached* version, so I am unsure
      exactly how reliable that link can be expected to remain over time
      and have decided against using that one.
      
      Signed-off-by: Joey Pabalinas <joeypabalinas@gmail.com>

       1 file changed, 1 insertion(+), 1 deletion(-)

      Signed-off-by: David S. Miller <davem@davemloft.net>
      ecc83275
    • Soheil Hassas Yeganeh's avatar
      tcp: purge write queue upon RST · a27fd7a8
      Soheil Hassas Yeganeh authored
      When the connection is reset, there is no point in
      keeping the packets on the write queue until the connection
      is closed.
      
      RFC 793 (page 70) and RFC 793-bis (page 64) both suggest
      purging the write queue upon RST:
      https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-07
      
      Moreover, this is essential for a correct MSG_ZEROCOPY
      implementation, because userspace cannot call close(fd)
      before receiving zerocopy signals even when the connection
      is reset.
      
      Fixes: f214f915 ("tcp: enable MSG_ZEROCOPY")
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a27fd7a8
    • David S. Miller's avatar
      Merge branch 'tcp-revert-a-F-RTO-extension-due-to-broken-middle-boxes' · 55e84dd7
      David S. Miller authored
      Yuchung Cheng says:
      
      ====================
      tcp: revert a F-RTO extension due to broken middle-boxes
      
      This patch series reverts a (non-standard) TCP F-RTO extension that aimed
      to detect more spurious timeouts. Unfortunately it could result in poor
      performance due to broken middle-boxes that modify TCP packets. E.g.
      https://www.spinics.net/lists/netdev/msg484154.html
      
      
      We believe the best and simplest solution is to just revert the change.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      55e84dd7
    • Yuchung Cheng's avatar
      tcp: revert F-RTO extension to detect more spurious timeouts · fc68e171
      Yuchung Cheng authored
      This reverts commit 89fe18e4.
      
      While the patch could detect more spurious timeouts, it could cause
      poor TCP performance on broken middle-boxes that modify TCP packets
      (e.g. receive window, SACK options). Since the performance gain is
      much smaller than the potential loss, the best solution is
      to fully revert the change.
      
      Fixes: 89fe18e4 ("tcp: extend F-RTO to catch more spurious timeouts")
      Reported-by: Teodor Milkov <tm@del.bg>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fc68e171
    • Yuchung Cheng's avatar
      tcp: revert F-RTO middle-box workaround · d4131f09
      Yuchung Cheng authored
      This reverts commit cc663f4d. While fixing
      some broken middle-boxes that modify receive window fields, it does not
      address middle-boxes that strip off SACK options. The best solution is
      to fully revert this patch and the root F-RTO enhancement.
      
      Fixes: cc663f4d ("tcp: restrict F-RTO to work-around broken middle-boxes")
      Reported-by: Teodor Milkov <tm@del.bg>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d4131f09
    • David S. Miller's avatar
      Merge branch 's390-qeth-fixes' · c8431622
      David S. Miller authored
      
      
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2018-02-27
      
      please apply some more qeth patches for -net and stable.
      
      One patch fixes a performance bug in the TSO path. Then there are several
      more fixes for IP management on L3 devices - including a revert, so that
      the subsequent fix cleanly applies to earlier kernels.
      The final patch takes care of a race in the control IO code that causes
      qeth to miss the cmd response, and subsequently trigger device recovery.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c8431622
    • Julian Wiedmann's avatar
      s390/qeth: fix IPA command submission race · d22ffb5a
      Julian Wiedmann authored
      
      
      If multiple IPA commands are built & sent out concurrently,
      fill_ipacmd_header() may assign a seqno value to a command that's
      different from what send_control_data() later assigns to this command's
      reply.
      This is due to other commands passing through send_control_data(),
      and incrementing card->seqno.ipa along the way.
      
      So one IPA command has no reply that's waiting for its seqno, while some
      other IPA command has multiple reply objects waiting for it.
      Only one of those waiting replies wins, and the other(s) time out and
      trigger a recovery via send_ipa_cmd().
      
      Fix this by making sure that the same seqno value is assigned to
      a command and its reply object.
      Do so immediately before submitting the command & while holding the
      irq_pending "lock", to produce nicely ascending seqnos.
      
      As a side effect, *all* IPA commands now use a reply object that's
      waiting for its actual seqno. Previously, early IPA commands that were
      submitted while the card was still DOWN used the "catch-all" IDX seqno.
      
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d22ffb5a
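A rough sketch of the fix described in the commit above: the sequence number is assigned to the command and to its reply object in one step, at submission time, while the submission "lock" is held. The `card`, `ipa_cmd`, `ipa_reply` types and the `submit_cmd()` function are hypothetical, and a C11 `atomic_flag` stands in for the driver's irq_pending mechanism.

```c
#include <stdatomic.h>

struct card {
	atomic_flag busy;        /* stand-in for the irq_pending "lock" */
	unsigned int next_seqno;
};

struct ipa_cmd   { unsigned int seqno; };
struct ipa_reply { unsigned int seqno; };

void card_init(struct card *c)
{
	atomic_flag_clear(&c->busy);
	c->next_seqno = 0;
}

/* Assign one seqno to both objects right before submitting, under the lock,
 * so no other command can advance the counter in between. */
void submit_cmd(struct card *c, struct ipa_cmd *cmd, struct ipa_reply *reply)
{
	while (atomic_flag_test_and_set(&c->busy))
		; /* spin until we own the submission slot */

	cmd->seqno = c->next_seqno;
	reply->seqno = c->next_seqno;
	c->next_seqno++;
	/* a real driver would hand the command to the device here */

	atomic_flag_clear(&c->busy);
}
```

Since the counter is read once and incremented inside the same critical section, every command ends up with exactly one reply object waiting for its seqno, and the seqnos ascend in submission order.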
    • Julian Wiedmann's avatar
      s390/qeth: fix IP address lookup for L3 devices · c5c48c58
      Julian Wiedmann authored
      Current code ("qeth_l3_ip_from_hash()") matches a queried address object
      against objects in the IP table by IP address, Mask/Prefix Length and
      MAC address ("qeth_l3_ipaddrs_is_equal()"). But what callers actually
      require is either
      a) "is this IP address registered" (i.e. match by IP address only),
         before adding a new address.
      b) or "is this address object registered" (i.e. match all relevant
         attributes), before deleting an address.

      Right now
      1. the ADD path is too strict in its lookup, and e.g. doesn't detect
         conflicts between an existing NORMAL address and a new VIPA address
         (because the NORMAL address will have mask != 0, while VIPA has
         a mask == 0),
      2. the DELETE path is not strict enough, and e.g. allows del_rxip() to
         delete a VIPA address as long as the IP address matches.
      
      Fix all this by adding helpers (_addr_match_ip() and _addr_match_all())
      that do the appropriate checking.
      
      Note that the ADD path for NORMAL addresses is special, as qeth keeps
      track of how many times such an address is in use (and there is no
      immediate way of returning errors to the caller). So when a requested
      NORMAL address _fully_ matches an existing one, it's not considered a
      conflict and we merely increment the refcount.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5c48c58
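The two kinds of lookup described in the commit above can be sketched as two separate predicates. The `demo_addr` structure and the exact set of attributes compared in `addr_match_all()` are assumptions made for illustration; they are not the qeth driver's real definitions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum addr_type { ADDR_NORMAL, ADDR_VIPA, ADDR_RXIP };

/* Hypothetical IPv4-only address object. */
struct demo_addr {
	enum addr_type type;
	uint32_t addr;
	uint32_t mask;
	uint8_t mac[6];
};

/* ADD path: "is this IP address registered?" Only the address matters. */
bool addr_match_ip(const struct demo_addr *a, const struct demo_addr *b)
{
	return a->addr == b->addr;
}

/* DELETE path: "is this address object registered?" All attributes matter. */
bool addr_match_all(const struct demo_addr *a, const struct demo_addr *b)
{
	return a->type == b->type &&
	       a->addr == b->addr &&
	       a->mask == b->mask &&
	       memcmp(a->mac, b->mac, sizeof(a->mac)) == 0;
}
```

With a NORMAL entry for 192.0.2.1 (mask set) and a VIPA request for the same address (mask 0), `addr_match_ip()` reports the conflict on add, while `addr_match_all()` refuses to treat them as the same object on delete.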
    • Julian Wiedmann's avatar
      Revert "s390/qeth: fix using of ref counter for rxip addresses" · 4964c66f
      Julian Wiedmann authored
      This reverts commit cb816192.
      
      The issue this attempted to fix never actually occurs.
      l3_add_rxip() checks (via l3_ip_from_hash()) if the requested address
      was previously added to the card. If so, it returns -EEXIST and doesn't
      call l3_add_ip().
      As a result, the "address exists" path in l3_add_ip() is never taken
      for rxip addresses, and this patch had no effect.
      
      Fixes: cb816192 ("s390/qeth: fix using of ref counter for rxip addresses")
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4964c66f
    • Julian Wiedmann's avatar
      s390/qeth: fix double-free on IP add/remove race · 14d066c3
      Julian Wiedmann authored
      Registering an IPv4 address with the HW takes quite a while, so we
      temporarily drop the ip_htable lock. Any concurrent add/remove of the
      same IP adjusts the IP's use count, and (on remove) is then blocked by
      addr->in_progress.
      After the register call has completed, we check the use count for
      concurrently attempted add/remove calls - and possibly straight-away
      deregister the IP again. This happens via l3_delete_ip(), which
      1) looks up the queried IP in the htable (getting a reference to the
         *same* queried object),
      2) deregisters the IP from the HW, and
      3) frees the IP object.
      
      The caller in l3_add_ip() then does a second free on the same object.
      
      For this case, skip all the extra checks and lookups in l3_delete_ip()
      and just deregister & free the IP object ourselves.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      14d066c3
    • Julian Wiedmann's avatar
      s390/qeth: fix IP removal on offline cards · 98d823ab
      Julian Wiedmann authored
      If the HW is not reachable, then none of the IPs in qeth's internal
      table has been registered with the HW yet. So when deleting such an IP,
      there's no need to stage it for deregistration - just drop it from
      the table.
      
      This fixes the "add-delete-add" scenario on an offline card, where the
      second "add" merely increments the IP's use count. But as the IP is
      still set to DISP_ADDR_DELETE from the previous "delete" step,
      l3_recover_ip() won't register it with the HW when the card goes online.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      98d823ab
    • Julian Wiedmann's avatar
      s390/qeth: fix overestimated count of buffer elements · 12472af8
      Julian Wiedmann authored
      qeth_get_elements_for_range() doesn't know how to handle a 0-length
      range (i.e. start == end), and returns 1 when it should return 0.
      Such ranges occur on TSO skbs, where the L2/L3/L4 headers (and thus all
      of the skb's linear data) are skipped when mapping the skb into regular
      buffer elements.
      
      This overestimation may cause several performance-related issues:
      1. sub-optimal IO buffer selection, where the next buffer gets selected
         even though the skb would actually still fit into the current buffer.
      2. forced linearization, if the element count for a non-linear skb
         exceeds QETH_MAX_BUFFER_ELEMENTS.
      
      Rather than modifying qeth_get_elements_for_range() and adding overhead
      to every caller, fix up those callers that are at risk of passing a
      0-length range.
      
      Fixes: 2863c613 ("qeth: refactor calculation of SBALE count")
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      12472af8
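A sketch of how a page-counting helper can overestimate an empty range, and of the caller-side guard the commit above opts for. The formula in `elements_for_range()` is an assumed, typical "pages touched by [start, end)" computation with a hypothetical 4 KiB page size; it is not copied from the qeth driver.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)

/* Round an address up or down to a page frame number. */
static unsigned long pfn_up(uintptr_t addr)
{
	return (addr + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;
}

static unsigned long pfn_down(uintptr_t addr)
{
	return addr >> DEMO_PAGE_SHIFT;
}

/* Assumed formula: pages touched by [start, end). It is only meaningful for
 * non-empty ranges; an empty range inside a page still counts as 1. */
int elements_for_range(uintptr_t start, uintptr_t end)
{
	return (int)(pfn_up(end) - pfn_down(start));
}

/* Caller-side fix: skip the helper when there is nothing to map. */
int elements_for_range_checked(uintptr_t start, uintptr_t end)
{
	if (start == end)
		return 0;
	return elements_for_range(start, end);
}
```

Guarding only the callers that can produce an empty range keeps the common path free of the extra comparison, which matches the trade-off stated in the commit message.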
  2. Feb 28, 2018
  3. Feb 27, 2018