Skip to content
  1. Feb 20, 2020
    • David S. Miller's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · fca07a93
      David S. Miller authored
      
      
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2020-02-19
      
      This series contains fixes to the ice driver.
      
      Brett fixes an issue where if a user sets an odd [tx|rx]-usecs value
      through ethtool, the request is denied because the hardware is set to
      have an ITR with 2us granularity.  Also fix an issue where the VF has
      not been completely removed/reset after being unbound from the host
      driver, so resolve this by waiting for the VF remove/reset process to
      happen before checking if the VF is disabled.
      
      Michal fixes an issue, where when the user changes flow control via
      ethtool, the OS is told the link is going down when that may not be the
      case.  Before the fix, the only way to get out of this state was to take
      the interface down and up again.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fca07a93
    • Willem de Bruijn's avatar
      udp: rehash on disconnect · 303d0403
      Willem de Bruijn authored
      
      
      As of the below commit, udp sockets bound to a specific address can
      coexist with one bound to the any addr for the same port.
      
      The commit also phased out the use of socket hashing based only on
      port (hslot), in favor of always hashing on {addr, port} (hslot2).
      
      The change broke the following behavior with disconnect (AF_UNSPEC):
      
          server binds to 0.0.0.0:1337
          server connects to 127.0.0.1:80
          server disconnects
          client connects to 127.0.0.1:1337
          client sends "hello"
          server reads "hello"	// times out, packet did not find sk
      
      On connect the server acquires a specific source addr suitable for
      routing to its destination. On disconnect it reverts to the any addr.
      
      The connect call triggers a rehash to a different hslot2. On
      disconnect, add the same to return to the original hslot2.
      
      Skip this step if the socket is going to be unhashed completely.
      
      Fixes: 4cdeeee9 ("net: udp: prefer listeners bound to an address")
      Reported-by: default avatarPavel Roskin <plroskin@gmail.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      303d0403
    • Rohit Maheshwari's avatar
      net/tls: Fix to avoid gettig invalid tls record · 06f5201c
      Rohit Maheshwari authored
      
      
      Current code doesn't check if tcp sequence number is starting from (/after)
      1st record's start sequnce number. It only checks if seq number is before
      1st record's end sequnce number. This problem will always be a possibility
      in re-transmit case. If a record which belongs to a requested seq number is
      already deleted, tls_get_record will start looking into list and as per the
      check it will look if seq number is before the end seq of 1st record, which
      will always be true and will return 1st record always, it should in fact
      return NULL.
      As part of the fix, start looking each record only if the sequence number
      lies in the list else return NULL.
      There is one more check added, driver look for the start marker record to
      handle tcp packets which are before the tls offload start sequence number,
      hence return 1st record if the record is tls start marker and seq number is
      before the 1st record's starting sequence number.
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: default avatarRohit Maheshwari <rohitm@chelsio.com>
      Reviewed-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      06f5201c
    • Brett Creeley's avatar
      ice: Wait for VF to be reset/ready before configuration · c54d209c
      Brett Creeley authored
      
      
      The configuration/command below is failing when the VF in the xml
      file is already bound to the host iavf driver.
      
      pci_0000_af_0_0.xml:
      
      <interface type='hostdev' managed='yes'>
      <source>
      <address type='pci' domain='0x0000' bus='0xaf' slot='0x0' function='0x0'/>
      </source>
      <mac address='00:de:ad:00:11:01'/>
      </interface>
      
      > virsh attach-device domain_name pci_0000_af_0_0.xml
      error: Failed to attach device from pci_0000_af_0_0.xml
      error: Cannot set interface MAC/vlanid to 00:de:ad:00:11:01/0 for
      	ifname ens1f1 vf 0: Device or resource busy
      
      This is failing because the VF has not been completely removed/reset
      after being unbound (via the virsh command above) from the host iavf
      driver and ice_set_vf_mac() checks if the VF is disabled before waiting
      for the reset to finish.
      
      Fix this by waiting for the VF remove/reset process to happen before
      checking if the VF is disabled. Also, since many functions for VF
      administration on the PF were more or less calling the same 3 functions
      (ice_wait_on_vf_reset(), ice_is_vf_disabled(), and ice_check_vf_init())
      move these into the helper function ice_check_vf_ready_for_cfg(). Then
      call this function in any flow that attempts to configure/query a VF
      from the PF.
      
      Lastly, increase the maximum wait time in ice_wait_on_vf_reset() to
      800ms, and modify/add the #define(s) that determine the wait time.
      This was done for robustness because in rare/stress cases VF removal can
      take a max of ~800ms and previously the wait was a max of ~300ms.
      
      Signed-off-by: default avatarBrett Creeley <brett.creeley@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      c54d209c
    • Michal Swiatkowski's avatar
      ice: Don't tell the OS that link is going down · 8a55c08d
      Michal Swiatkowski authored
      
      
      Remove code that tell the OS that link is going down when user
      change flow control via ethtool. When link is up it isn't certain
      that link goes down after 0x0605 aq command. If link doesn't go
      down, OS thinks that link is down, but physical link is up. To
      reset this state user have to take interface down and up.
      
      If link goes down after 0x0605 command, FW send information
      about that and after that driver tells the OS that the link goes
      down. So this code in ethtool is unnecessary.
      
      Signed-off-by: default avatarMichal Swiatkowski <michal.swiatkowski@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      8a55c08d
    • Brett Creeley's avatar
      ice: Don't reject odd values of usecs set by user · 840f8ad0
      Brett Creeley authored
      
      
      Currently if a user sets an odd [tx|rx]-usecs value through ethtool,
      the request is denied because the hardware is set to have an ITR
      granularity of 2us. This caused poor customer experience. Fix this by
      aligning to a register allowed value, which results in rounding down.
      Also, print a once per ring container type message to be clear about
      our intentions.
      
      Also, change the ITR_TO_REG define to be the bitwise and of the ITR
      setting and the ICE_ITR_MASK. This makes the purpose of ITR_TO_REG more
      obvious.
      
      Signed-off-by: default avatarBrett Creeley <brett.creeley@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      840f8ad0
    • Madhuparna Bhowmik's avatar
      bridge: br_stp: Use built-in RCU list checking · 33c4acbe
      Madhuparna Bhowmik authored
      
      
      list_for_each_entry_rcu() has built-in RCU and lock checking.
      
      Pass cond argument to list_for_each_entry_rcu() to silence
      false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
      by default.
      
      Signed-off-by: default avatarMadhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      33c4acbe
    • Dmitry Osipenko's avatar
      nfc: pn544: Fix occasional HW initialization failure · c3331d2f
      Dmitry Osipenko authored
      
      
      The PN544 driver checks the "enable" polarity during of driver's probe and
      it's doing that by turning ON and OFF NFC with different polarities until
      enabling succeeds. It takes some time for the hardware to power-down, and
      thus, to deassert the IRQ that is raised by turning ON the hardware.
      Since the delay after last power-down of the polarity-checking process is
      missed in the code, the interrupt may trigger immediately after installing
      the IRQ handler (right after the checking is done), which results in IRQ
      handler trying to touch the disabled HW and ends with marking NFC as
      'DEAD' during of the driver's probe:
      
        pn544_hci_i2c 1-002a: NFC: nfc_en polarity : active high
        pn544_hci_i2c 1-002a: NFC: invalid len byte
        shdlc: llc_shdlc_recv_frame: NULL Frame -> link is dead
      
      This patch fixes the occasional NFC initialization failure on Nexus 7
      device.
      
      Signed-off-by: default avatarDmitry Osipenko <digetx@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c3331d2f
    • Amol Grover's avatar
      net: hsr: Pass lockdep expression to RCU lists · a7a9456e
      Amol Grover authored
      
      
      node_db is traversed using list_for_each_entry_rcu
      outside an RCU read-side critical section but under the protection
      of hsr->list_lock.
      
      Hence, add corresponding lockdep expression to silence false-positive
      warnings, and harden RCU lists.
      
      Signed-off-by: default avatarAmol Grover <frextrite@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a7a9456e
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2020-02-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 7822dee5
      David S. Miller authored
      
      
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2020-02-18
      
      This series introduces some fixes to mlx5 driver.
      
      Please pull and let me know if there is any problem.
      
      For -stable v5.3
       ('net/mlx5: Fix sleep while atomic in mlx5_eswitch_get_vepa')
      
      For -stable v5.4
       ('net/mlx5: DR, Fix matching on vport gvmi')
       ('net/mlx5e: Fix crash in recovery flow without devlink reporter')
      
      For -stable v5.5
       ('net/mlx5e: Reset RQ doorbell counter before moving RQ state from RST to RDY')
       ('net/mlx5e: Don't clear the whole vf config when switching modes')
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7822dee5
  2. Feb 19, 2020
  3. Feb 18, 2020
  4. Feb 17, 2020
    • Florian Westphal's avatar
      netfilter: conntrack: allow insertion of clashing entries · 6a757c07
      Florian Westphal authored
      
      
      This patch further relaxes the need to drop an skb due to a clash with
      an existing conntrack entry.
      
      Current clash resolution handles the case where the clash occurs between
      two identical entries (distinct nf_conn objects with same tuples), i.e.:
      
                          Original                        Reply
      existing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.6:5353
      clashing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.6:5353
      
      ... existing handling will discard the unconfirmed clashing entry and
      makes skb->_nfct point to the existing one.  The skb can then be
      processed normally just as if the clash would not have existed in the
      first place.
      
      For other clashes, the skb needs to be dropped.
      This frequently happens with DNS resolvers that send A and AAAA queries
      back-to-back when NAT rules are present that cause packets to get
      different DNAT transformations applied, for example:
      
      -m statistics --mode random ... -j DNAT --dnat-to 10.0.0.6:5353
      -m statistics --mode random ... -j DNAT --dnat-to 10.0.0.7:5353
      
      In this case the A or AAAA query is dropped which incurs a costly
      delay during name resolution.
      
      This patch also allows this collision type:
                             Original                   Reply
      existing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.6:5353
      clashing: 10.2.3.4:42 -> 10.8.8.8:53      10.2.3.4:42 <- 10.0.0.7:5353
      
      In this case, clash is in original direction -- the reply direction
      is still unique.
      
      The change makes it so that when the 2nd colliding packet is received,
      the clashing conntrack is tagged with new IPS_NAT_CLASH_BIT, gets a fixed
      1 second timeout and is inserted in the reply direction only.
      
      The entry is hidden from 'conntrack -L', it will time out quickly
      and it can be early dropped because it will never progress to the
      ASSURED state.
      
      To avoid special-casing the delete code path to special case
      the ORIGINAL hlist_nulls node, a new helper, "hlist_nulls_add_fake", is
      added so hlist_nulls_del() will work.
      
      Example:
      
            CPU A:                               CPU B:
      1.  10.2.3.4:42 -> 10.8.8.8:53 (A)
      2.                                         10.2.3.4:42 -> 10.8.8.8:53 (AAAA)
      3.  Apply DNAT, reply changed to 10.0.0.6
      4.                                         10.2.3.4:42 -> 10.8.8.8:53 (AAAA)
      5.                                         Apply DNAT, reply changed to 10.0.0.7
      6. confirm/commit to conntrack table, no collisions
      7.                                         commit clashing entry
      
      Reply comes in:
      
      10.2.3.4:42 <- 10.0.0.6:5353 (A)
       -> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42
      10.2.3.4:42 <- 10.0.0.7:5353 (AAAA)
       -> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42
          The conntrack entry is deleted from table, as it has the NAT_CLASH
          bit set.
      
      In case of a retransmit from ORIGINAL dir, all further packets will get
      the DNAT transformation to 10.0.0.6.
      
      I tried to come up with other solutions but they all have worse
      problems.
      
      Alternatives considered were:
      1.  Confirm ct entries at allocation time, not in postrouting.
       a. will cause uneccesarry work when the skb that creates the
          conntrack is dropped by ruleset.
       b. in case nat is applied, ct entry would need to be moved in
          the table, which requires another spinlock pair to be taken.
       c. breaks the 'unconfirmed entry is private to cpu' assumption:
          we would need to guard all nfct->ext allocation requests with
          ct->lock spinlock.
      
      2. Make the unconfirmed list a hash table instead of a pcpu list.
         Shares drawback c) of the first alternative.
      
      3. Document this is expected and force users to rearrange their
         ruleset (e.g. by using "-m cluster" instead of "-m statistics").
         nft has the 'jhash' expression which can be used instead of 'numgen'.
      
         Major drawback: doesn't fix what I consider a bug, not very realistic
         and I believe its reasonable to have the existing rulesets to 'just
         work'.
      
      4. Document this is expected and force users to steer problematic
         packets to the same CPU -- this would serialize the "allocate new
         conntrack entry/nat table evaluation/perform nat/confirm entry", so
         no race can occur.  Similar drawback to 3.
      
      Another advantage of this patch compared to 1) and 2) is that there are
      no changes to the hot path; things are handled in the udp tracker and
      the clash resolution path.
      
      Cc: rcu@vger.kernel.org
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      6a757c07
    • Paul Cercueil's avatar
      net: ethernet: dm9000: Handle -EPROBE_DEFER in dm9000_parse_dt() · 9a6a0dea
      Paul Cercueil authored
      
      
      The call to of_get_mac_address() can return -EPROBE_DEFER, for instance
      when the MAC address is read from a NVMEM driver that did not probe yet.
      
      Cc: H. Nikolaus Schaller <hns@goldelico.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Signed-off-by: default avatarPaul Cercueil <paul@crapouillou.net>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9a6a0dea
    • Randy Dunlap's avatar
      skbuff.h: fix all kernel-doc warnings · d2f273f0
      Randy Dunlap authored
      
      
      Fix all kernel-doc warnings in <linux/skbuff.h>.
      Fixes these warnings:
      
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'list' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'dev_scratch' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'ip_defrag_offset' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'skb_mstamp_ns' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member '__cloned_offset' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'head_frag' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member '__pkt_type_offset' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'encapsulation' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'encap_hdr_csum' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'csum_valid' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member '__pkt_vlan_present_offset' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'vlan_present' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'csum_complete_sw' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'csum_level' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'inner_protocol_type' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'remcsum_offload' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'sender_cpu' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'reserved_tailroom' not described in 'sk_buff'
      ../include/linux/skbuff.h:890: warning: Function parameter or member 'inner_ipproto' not described in 'sk_buff'
      
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d2f273f0
    • Randy Dunlap's avatar
      skbuff: remove stale bit mask comments · 8955b435
      Randy Dunlap authored
      
      
      Remove stale comments since this flag is no longer a bit mask
      but is a bit field.
      
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8955b435
    • Randy Dunlap's avatar
      net/sock.h: fix all kernel-doc warnings · 66256e0b
      Randy Dunlap authored
      
      
      Fix all kernel-doc warnings for <net/sock.h>.
      Fixes these warnings:
      
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_addrpair' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_portpair' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_ipv6only' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_net_refcnt' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_v6_daddr' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_v6_rcv_saddr' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_cookie' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_listener' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_tw_dr' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_rcv_wnd' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_tw_rcv_nxt' not described in 'sock_common'
      
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_rx_skb_cache' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_wq_raw' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'tcp_rtx_queue' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_tx_skb_cache' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_route_forced_caps' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_txtime_report_errors' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_validate_xmit_skb' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_bpf_storage' not described in 'sock'
      
      ../include/net/sock.h:2024: warning: No description found for return value of 'sk_wmem_alloc_get'
      ../include/net/sock.h:2035: warning: No description found for return value of 'sk_rmem_alloc_get'
      ../include/net/sock.h:2046: warning: No description found for return value of 'sk_has_allocations'
      ../include/net/sock.h:2082: warning: No description found for return value of 'skwq_has_sleeper'
      ../include/net/sock.h:2244: warning: No description found for return value of 'sk_page_frag'
      ../include/net/sock.h:2444: warning: Function parameter or member 'tcp_rx_skb_cache_key' not described in 'DECLARE_STATIC_KEY_FALSE'
      ../include/net/sock.h:2444: warning: Excess function parameter 'sk' description in 'DECLARE_STATIC_KEY_FALSE'
      ../include/net/sock.h:2444: warning: Excess function parameter 'skb' description in 'DECLARE_STATIC_KEY_FALSE'
      
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      66256e0b