Skip to content
  1. Feb 11, 2014
    • Liu Junliang's avatar
    • John Ogness's avatar
      tcp: tsq: fix nonagle handling · bf06200e
      John Ogness authored
      
      
      Commit 46d3ceab ("tcp: TCP Small Queues") introduced a possible
      regression for applications using TCP_NODELAY.
      
      If TCP session is throttled because of tsq, we should consult
      tp->nonagle when TX completion is done and allow us to send additional
      segment, especially if this segment is not a full MSS.
      Otherwise this segment is sent after an RTO.
      
      [edumazet] : Cooked the changelog, added another fix about testing
      sk_wmem_alloc twice because TX completion can happen right before
      setting TSQ_THROTTLED bit.
      
      This problem is particularly visible with recent auto corking,
      but might also be triggered with low tcp_limit_output_bytes
      values or NIC drivers delaying TX completion by hundred of usec,
      and very low rtt.
      
      Thomas Glanzmann for example reported an iscsi regression, caused
      by tcp auto corking making this bug quite visible.
      
      Fixes: 46d3ceab ("tcp: TCP Small Queues")
      Signed-off-by: default avatarJohn Ogness <john.ogness@linutronix.de>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarThomas Glanzmann <thomas@glanzmann.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bf06200e
    • David S. Miller's avatar
      Merge branch 'bridge' · 684bd2e1
      David S. Miller authored
      
      
      Toshiaki Makita says:
      
      ====================
      bridge: Fix corner case problems around local fdb entries
      
      There are so many corner cases that are not handled properly around local
      fdb entries.
      - We might fail to delete the old entry and might delete an arbitrary local
        entry when changing mac address of a bridge port.
      - We always fail to delete the old entry when changing mac address of the
        bridge device.
      - We might incorrectly delete a necessary entry when detaching a bridge port.
      - We might incorrectly delete a necessary entry when deleting a vlan.
      and so on.
      
      This is a patch series to fix these issues.
      
      v3:
      - Handle NTF_USE case in patch 1/9, commented by Vlad Yasevich.
      
      - Tested port detach/attach and didn't find any problem with patch 5/9,
        suggested by Stephen Hemminger.
      
      - Add comments about possible inconsistent state in current implementation
        into commit log of patch 5/9, found by the above test.
      
      - Reword unintensive changelog of patch 7/9, commented by Vlad Yasevich.
      
      v2:
      - Change the way to find the old entry in br_fdb_changeaddr() from memorizing
        previous port address to introducing a new flag indicating whether a fdb
        entry is added by user or not, commented by Stephen Hemminger.
      
      - Add a fix for the way to insert a new address in br_fdb_changeaddr().
      
      - Prevent creating an entry such that its dst is NULL in br_add_if() to
        preserve old behavior, commented by Vlad Yasevich.
      
      - Add more comments about slight behavior change, where the bridge device
        come to be able to receive traffic to an address it has during short
        window, to changelogs, commented by Vlad Yasevich.
      
      - Add a fix for possible race in br_fdb_change_mac_address().
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      684bd2e1
    • Toshiaki Makita's avatar
      bridge: Prevent possible race condition in br_fdb_change_mac_address · ac4c8868
      Toshiaki Makita authored
      
      
      br_fdb_change_mac_address() calls fdb_insert()/fdb_delete() without
      br->hash_lock.
      
      These hash list updates are racy with br_fdb_update()/br_fdb_cleanup().
      
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ac4c8868
    • Toshiaki Makita's avatar
      bridge: Properly check if local fdb entry can be deleted when deleting vlan · 424bb9c9
      Toshiaki Makita authored
      
      
      Vlan codes unconditionally delete local fdb entries.
      We should consider the possibility that other ports have the same
      address and vlan.
      
      Example of problematic case:
        ip link set eth0 address 12:34:56:78:90:ab
        ip link set eth1 address aa:bb:cc:dd:ee:ff
        brctl addif br0 eth0
        brctl addif br0 eth1 # br0 will have mac address 12:34:56:78:90:ab
        bridge vlan add dev eth0 vid 10
        bridge vlan add dev eth1 vid 10
        bridge vlan add dev br0 vid 10 self
      We will have fdb entry such that f->dst == eth0, f->vlan_id == 10 and
      f->addr == 12:34:56:78:90:ab at this time.
      Next, delete eth0 vlan 10.
        bridge vlan del dev eth0 vid 10
      In this case, we still need the entry for br0, but it will be deleted.
      
      Note that br0 needs the entry even though its mac address is not set
      manually. To delete the entry with proper condition checking,
      fdb_delete_local() is suitable to use.
      
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      424bb9c9
    • Toshiaki Makita's avatar
      bridge: Properly check if local fdb entry can be deleted in br_fdb_delete_by_port · a778e6d1
      Toshiaki Makita authored
      
      
      br_fdb_delete_by_port() doesn't care about vlan and mac address of the
      bridge device.
      
      As the check is almost the same as mac address changing, slightly modify
      fdb_delete_local() and use it.
      
      Note that we can always set added_by_user to 0 in fdb_delete_local() because
      - br_fdb_delete_by_port() calls fdb_delete_local() for local entries
        regardless of its added_by_user. In this case, we have to check if another
        port has the same address and vlan, and if found, we have to create the
        entry (by changing dst). This is kernel-added entry, not user-added.
      - br_fdb_changeaddr() doesn't call fdb_delete_local() for user-added entry.
      
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a778e6d1
    • Toshiaki Makita's avatar
      bridge: Properly check if local fdb entry can be deleted in br_fdb_change_mac_address · 960b589f
      Toshiaki Makita authored
      
      
      br_fdb_change_mac_address() doesn't check if the local entry has the
      same address as any of bridge ports.
      Although I'm not sure when it is beneficial, current implementation allow
      the bridge device to receive any mac address of its ports.
      To preserve this behavior, we have to check if the mac address of the
      entry being deleted is identical to that of any port.
      
      As this check is almost the same as that in br_fdb_changeaddr(), create
      a common function fdb_delete_local() and call it from
      br_fdb_changeadddr() and br_fdb_change_mac_address().
      
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      960b589f
    • Toshiaki Makita's avatar
      bridge: Fix the way to check if a local fdb entry can be deleted · 2b292fb4
      Toshiaki Makita authored
      
      
      We should take into account the followings when deleting a local fdb
      entry.
      
      - nbp_vlan_find() can be used only when vid != 0 to check if an entry is
        deletable, because a fdb entry with vid 0 can exist at any time while
        nbp_vlan_find() always return false with vid 0.
      
        Example of problematic case:
          ip link set eth0 address 12:34:56:78:90:ab
          ip link set eth1 address 12:34:56:78:90:ab
          brctl addif br0 eth0
          brctl addif br0 eth1
          ip link set eth0 address aa:bb:cc:dd:ee:ff
        Then, the fdb entry 12:34:56:78:90:ab will be deleted even though the
        bridge port eth1 still has that address.
      
      - The port to which the bridge device is attached might needs a local entry
        if its mac address is set manually.
      
        Example of problematic case:
          ip link set eth0 address 12:34:56:78:90:ab
          brctl addif br0 eth0
          ip link set br0 address 12:34:56:78:90:ab
          ip link set eth0 address aa:bb:cc:dd:ee:ff
        Then, the fdb still must have the entry 12:34:56:78:90:ab, but it will be
        deleted.
      
      We can use br->dev->addr_assign_type to check if the address is manually
      set or not, but I propose another approach.
      
      Since we delete and insert local entries whenever changing mac address
      of the bridge device, we can change dst of the entry to NULL regardless of
      addr_assign_type when deleting an entry associated with a certain port,
      and if it is found to be unnecessary later, then delete it.
      That is, if changing mac address of a port, the entry might be changed
      to its dst being NULL first, but is eventually deleted when recalculating
      and changing bridge id.
      
      This approach is especially useful when we want to share the code with
      deleting vlan in which the bridge device might want such an entry regardless
      of addr_assign_type, and makes things easy because we don't have to consider
      if mac address of the bridge device will be changed or not at the time we
      delete a local entry of a port, which means fdb code will not be bothered
      even if the bridge id calculating logic is changed in the future.
      
      Also, this change reduces inconsistent state, where frames whose dst is the
      mac address of the bridge, can't reach the bridge because of premature fdb
      entry deletion. This change reduces the possibility that the bridge device
      replies unreachable mac address to arp requests, which could occur during
      the short window between calling del_nbp() and br_stp_recalculate_bridge_id()
      in br_del_if(). This will effective after br_fdb_delete_by_port() starts to
      use the same code by following patch.
      
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2b292fb4
    • Toshiaki Makita's avatar
      bridge: Change local fdb entries whenever mac address of bridge device changes · a4b816d8
      Toshiaki Makita authored
      
      
      Vlan code may need fdb change when changing mac address of bridge device
      even if it is caused by the mac address changing of a bridge port.
      
      Example configuration:
        ip link set eth0 address 12:34:56:78:90:ab
        ip link set eth1 address aa:bb:cc:dd:ee:ff
        brctl addif br0 eth0
        brctl addif br0 eth1 # br0 will have mac address 12:34:56:78:90:ab
        bridge vlan add dev br0 vid 10 self
        bridge vlan add dev eth0 vid 10
      We will have fdb entry such that f->dst == NULL, f->vlan_id == 10 and
      f->addr == 12:34:56:78:90:ab at this time.
      Next, change the mac address of eth0 to greater value.
        ip link set eth0 address ee:ff:12:34:56:78
      Then, mac address of br0 will be recalculated and set to aa:bb:cc:dd:ee:ff.
      However, an entry aa:bb:cc:dd:ee:ff will not be created and we will be not
      able to communicate using br0 on vlan 10.
      
      Address this issue by deleting and adding local entries whenever
      changing the mac address of the bridge device.
      
      If there already exists an entry that has the same address, for example,
      in case that br_fdb_changeaddr() has already inserted it,
      br_fdb_change_mac_address() will simply fail to insert it and no
      duplicated entry will be made, as it was.
      
      This approach also needs br_add_if() to call br_fdb_insert() before
      br_stp_recalculate_bridge_id() so that we don't create an entry whose
      dst == NULL in this function to preserve previous behavior.
      
      Note that this is a slight change in behavior where the bridge device can
      receive the traffic to the new address before calling
      br_stp_recalculate_bridge_id() in br_add_if().
      However, it is not a problem because we have already the address on the
      new port and such a way to insert new one before recalculating bridge id
      is taken in br_device_event() as well.
      
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a4b816d8
    • Toshiaki Makita's avatar
      bridge: Fix the way to find old local fdb entries in br_fdb_change_mac_address · a3ebb7ef
      Toshiaki Makita authored
      
      
      We have been always failed to delete the old entry at
      br_fdb_change_mac_address() because br_set_mac_address() updates
      dev->dev_addr before calling br_fdb_change_mac_address() and
      br_fdb_change_mac_address() uses dev->dev_addr to find the old entry.
      
      That update of dev_addr is completely unnecessary because the same work
      is done in br_stp_change_bridge_id() which is called right away after
      calling br_fdb_change_mac_address().
      
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a3ebb7ef
    • Toshiaki Makita's avatar
      bridge: Fix the way to insert new local fdb entries in br_fdb_changeaddr · 2836882f
      Toshiaki Makita authored
      
      
      Since commit bc9a25d2 ("bridge: Add vlan support for local fdb entries"),
      br_fdb_changeaddr() has inserted a new local fdb entry only if it can
      find old one. But if we have two ports where they have the same address
      or user has deleted a local entry, there will be no entry for one of the
      ports.
      
      Example of problematic case:
        ip link set eth0 address aa:bb:cc:dd:ee:ff
        ip link set eth1 address aa:bb:cc:dd:ee:ff
        brctl addif br0 eth0
        brctl addif br0 eth1 # eth1 will not have a local entry due to dup.
        ip link set eth1 address 12:34:56:78:90:ab
      Then, the new entry for the address 12:34:56:78:90:ab will not be
      created, and the bridge device will not be able to communicate.
      
      Insert new entries regardless of whether we can find old entries or not.
      
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2836882f
    • Toshiaki Makita's avatar
      bridge: Fix the way to find old local fdb entries in br_fdb_changeaddr · a5642ab4
      Toshiaki Makita authored
      
      
      br_fdb_changeaddr() assumes that there is at most one local entry per port
      per vlan. It used to be true, but since commit 36fd2b63 ("bridge: allow
      creating/deleting fdb entries via netlink"), it has not been so.
      Therefore, the function might fail to search a correct previous address
      to be deleted and delete an arbitrary local entry if user has added local
      entries manually.
      
      Example of problematic case:
        ip link set eth0 address ee:ff:12:34:56:78
        brctl addif br0 eth0
        bridge fdb add 12:34:56:78:90:ab dev eth0 master
        ip link set eth0 address aa:bb:cc:dd:ee:ff
      Then, the address 12:34:56:78:90:ab might be deleted instead of
      ee:ff:12:34:56:78, the original mac address of eth0.
      
      Address this issue by introducing a new flag, added_by_user, to struct
      net_bridge_fdb_entry.
      
      Note that br_fdb_delete_by_port() has to set added_by_user to 0 in cases
      like:
        ip link set eth0 address 12:34:56:78:90:ab
        ip link set eth1 address aa:bb:cc:dd:ee:ff
        brctl addif br0 eth0
        bridge fdb add aa:bb:cc:dd:ee:ff dev eth0 master
        brctl addif br0 eth1
        brctl delif br0 eth0
      In this case, kernel should delete the user-added entry aa:bb:cc:dd:ee:ff,
      but it also should have been added by "brctl addif br0 eth1" originally,
      so we don't delete it and treat it a new kernel-created entry.
      
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a5642ab4
  2. Feb 10, 2014
  3. Feb 08, 2014
  4. Feb 07, 2014
    • Pablo Neira Ayuso's avatar
      netfilter: nft_rbtree: fix data handling of end interval elements · 2fb91ddb
      Pablo Neira Ayuso authored
      
      
      This patch fixes several things which related to the handling of
      end interval elements:
      
      * Chain use underflow with intervals and map: If you add a rule
        using intervals+map that introduces a loop, the error path of the
        rbtree set decrements the chain refcount for each side of the
        interval, leading to a chain use counter underflow.
      
      * Don't copy the data part of the end interval element since, this
        area is uninitialized and this confuses the loop detection code.
      
      * Don't allocate room for the data part of end interval elements
        since this is unused.
      
      So, after this patch the idea is that end interval elements don't
      have a data part.
      
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: default avatarPatrick McHardy <kaber@trash.net>
      2fb91ddb
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: do not allow NFT_SET_ELEM_INTERVAL_END flag and data · bd7fc645
      Pablo Neira Ayuso authored
      
      
      This combination is not allowed since end interval elements cannot
      contain data.
      
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: default avatarPatrick McHardy <kaber@trash.net>
      bd7fc645
    • Eric Dumazet's avatar
      tcp: remove 1ms offset in srtt computation · 4a5ab4e2
      Eric Dumazet authored
      
      
      TCP pacing depends on an accurate srtt estimation.
      
      Current srtt estimation is using jiffie resolution,
      and has an artificial offset of at least 1 ms, which can produce
      slowdowns when FQ/pacing is used, especially in DC world,
      where typical rtt is below 1 ms.
      
      We are planning a switch to usec resolution for linux-3.15,
      but in the meantime, this patch removes the 1 ms offset.
      
      All we need is to have tp->srtt minimal value of 1 to differentiate
      the case of srtt being initialized or not, not 8.
      
      The problematic behavior was observed on a 40Gbit testbed,
      where 32 concurrent netperf were reaching 12Gbps of aggregate
      speed, instead of line speed.
      
      This patch also has the effect of reporting more accurate srtt and send
      rates to iproute2 ss command as in :
      
      $ ss -i dst cca2
      Netid  State      Recv-Q Send-Q          Local Address:Port
      Peer Address:Port
      tcp    ESTAB      0      0                10.244.129.1:56984
      10.244.129.2:12865
      	 cubic wscale:6,6 rto:200 rtt:0.25/0.25 ato:40 mss:1448 cwnd:10 send
      463.4Mbps rcv_rtt:1 rcv_space:29200
      tcp    ESTAB      0      390960           10.244.129.1:60247
      10.244.129.2:50204
      	 cubic wscale:6,6 rto:200 rtt:0.875/0.75 mss:1448 cwnd:73 ssthresh:51
      send 966.4Mbps unacked:73 retrans:0/121 rcv_space:29200
      
      Reported-by: default avatarVytautas Valancius <valas@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4a5ab4e2