  1. Aug 21, 2020
    • rxrpc: Fix loss of RTT samples due to interposed ACK · 4700c4d8
      David Howells authored
      The Rx protocol has a mechanism to help generate RTT samples that works by
      a client transmitting a REQUESTED-type ACK when it receives a DATA packet
      that has the REQUEST_ACK flag set.
      
      The peer, however, may interpose other ACKs before transmitting the
      REQUESTED-ACK, as can be seen in the following trace excerpt:
      
       rxrpc_tx_data: c=00000044 DATA d0b5ece8:00000001 00000001 q=00000001 fl=07
       rxrpc_rx_ack: c=00000044 00000001 PNG r=00000000 f=00000002 p=00000000 n=0
       rxrpc_rx_ack: c=00000044 00000002 REQ r=00000001 f=00000002 p=00000001 n=0
       ...
      
      DATA packet 1 (q=xx) has REQUEST_ACK set (bit 1 of fl=xx).  The incoming
      ping (labelled PNG) hard-acks the request DATA packet (f=xx exceeds the
      sequence number of the DATA packet), causing it to be discarded from the Tx
      ring.  The ACK that was requested (labelled REQ, r=xx references the serial
      of the DATA packet) comes after the ping, but the sk_buff holding the
      timestamp has gone and the RTT sample is lost.
      
      This is particularly noticeable on RPC calls used to probe the service
      offered by the peer.  A lot of peers end up with an unknown RTT because we
      only ever sent a single RPC.  This confuses the server rotation algorithm.
      
      Fix this by caching the information about the outgoing packet in RTT
      calculations in the rxrpc_call struct rather than looking in the Tx ring.
      
      A four-deep buffer is maintained and both REQUEST_ACK-flagged DATA and
      PING-ACK transmissions are recorded in there.  When the appropriate
      response ACK is received, the buffer is checked for a match and, if found,
      an RTT sample is recorded.
      
      If a received ACK refers to a packet with a later serial number than an
      entry in the cache, that entry is presumed lost and the entry is made
      available to record a new transmission.
      
      ACK types other than REQUESTED-type and PING-type cause any matching
      sample to be cancelled as they don't necessarily represent a useful
      measurement.
      
      If there's no space in the buffer on ping/data transmission, the sample
      base is discarded.
      
      Fixes: 50235c4b ("rxrpc: Obtain RTT data by requesting ACKs on DATA packets")
      Signed-off-by: David Howells <dhowells@redhat.com>
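The caching scheme described in the rxrpc RTT fix above can be illustrated with a small userspace sketch. This is not the kernel code: the struct layout, the function names (`rtt_record_tx`, `rtt_on_ack`, `serial_after`) and the abstract timestamp units are all assumptions made for illustration. Only the four-deep buffer, the "later serial means presumed lost" rule, the cancellation on other ACK types, and the drop-on-full behaviour come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RTT_SLOTS 4 /* the "four-deep buffer" from the commit message */

/* Hypothetical layout; not the actual rxrpc_call fields. */
struct rtt_cache {
	uint32_t serial[RTT_SLOTS];  /* serial of each recorded transmission */
	uint64_t sent_at[RTT_SLOTS]; /* transmit time, arbitrary units */
	bool in_use[RTT_SLOTS];
};

/* True if serial a is later than serial b, allowing for 32-bit wraparound. */
static bool serial_after(uint32_t a, uint32_t b)
{
	return a != b && (uint32_t)(a - b) < 0x80000000u;
}

/* Record a REQUEST_ACK DATA or PING transmission. Returns false when the
 * buffer is full, in which case the sample base is simply dropped. */
static bool rtt_record_tx(struct rtt_cache *c, uint32_t serial, uint64_t now)
{
	assert(c != NULL);
	for (int i = 0; i < RTT_SLOTS; i++) {
		if (!c->in_use[i]) {
			c->serial[i] = serial;
			c->sent_at[i] = now;
			c->in_use[i] = true;
			return true;
		}
	}
	return false;
}

/* Process an ACK that references ref_serial. usable is true for REQUESTED
 * and PING-response ACKs; other types cancel a matching entry instead.
 * Returns true and stores the sample in *rtt when a usable match is found. */
static bool rtt_on_ack(struct rtt_cache *c, uint32_t ref_serial, bool usable,
		       uint64_t now, uint64_t *rtt)
{
	bool sampled = false;

	assert(c != NULL && rtt != NULL);
	for (int i = 0; i < RTT_SLOTS; i++) {
		if (!c->in_use[i])
			continue;
		if (c->serial[i] == ref_serial) {
			if (usable) {
				*rtt = now - c->sent_at[i];
				sampled = true;
			}
			c->in_use[i] = false; /* consumed or cancelled */
		} else if (serial_after(ref_serial, c->serial[i])) {
			c->in_use[i] = false; /* presumed lost */
		}
	}
	return sampled;
}
```

Replaying the trace from the commit message against this sketch, the interposed ping (r=0) no longer destroys the record for serial 1, so the later REQ ACK (r=1) still yields a sample.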
  2. Aug 20, 2020
    • rxrpc: Keep the ACK serial in a var in rxrpc_input_ack() · 68528d93
      David Howells authored
      Keep the ACK serial number in a variable in rxrpc_input_ack() as it's used
      frequently.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
    • net: gemini: Fix missing free_netdev() in error path of gemini_ethernet_port_probe() · cf96d977
      Wang Hai authored
      Replace alloc_etherdev_mq with devm_alloc_etherdev_mqs. In this way,
      when probe fails, netdev can be freed automatically.
      
      Fixes: 4d5ae32f ("net: ethernet: Add a driver for Gemini gigabit ethernet")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: atlantic: Use readx_poll_timeout() for large timeout · 9553b62c
      Sebastian Andrzej Siewior authored
      Commit 8dcf2ad3 ("net: atlantic: add hwmon getter for MAC temperature")
      implemented a read callback with a udelay(10000U). This fails to
      compile on ARM because the delay is >1ms. I doubt that it is needed to
      spin for 10ms even if possible on x86.
      
      From looking at the code, the context appears to be preemptible so using
      usleep() should work and avoid busy spinning.
      
      Use readx_poll_timeout() in the poll loop.
      
      Fixes: 8dcf2ad3 ("net: atlantic: add hwmon getter for MAC temperature")
      Cc: Mark Starovoytov <mstarovoitov@marvell.com>
      Cc: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Acked-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
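The idea behind the atlantic change above, sleeping between reads rather than busy-spinning for the whole delay, can be sketched in userspace C. This is only an analogy to the kernel's readx_poll_timeout() pattern, not its implementation: `poll_until_clear`, `now_us`, the busy-mask condition and the fake register are hypothetical names and choices, and POSIX `clock_gettime()`/`nanosleep()` stand in for kernel sleeping primitives.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() and nanosleep() */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint32_t (*reg_read_fn)(void *ctx);

/* Monotonic time in microseconds. */
static uint64_t now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Read until (value & busy_mask) == 0, sleeping sleep_us between reads.
 * Returns 0 on success or -1 once timeout_us has elapsed. */
static int poll_until_clear(reg_read_fn read_reg, void *ctx, uint32_t busy_mask,
			    unsigned int sleep_us, unsigned int timeout_us,
			    uint32_t *val)
{
	uint64_t deadline;

	assert(read_reg != NULL && val != NULL);
	deadline = now_us() + timeout_us;
	for (;;) {
		struct timespec nap = {
			.tv_sec = sleep_us / 1000000u,
			.tv_nsec = (long)(sleep_us % 1000000u) * 1000L,
		};

		*val = read_reg(ctx);
		if ((*val & busy_mask) == 0)
			return 0;
		if (now_us() >= deadline)
			return -1;
		nanosleep(&nap, NULL); /* yield the CPU instead of spinning */
	}
}

/* Fake register for demonstration: reports busy (bit 0) until the counter
 * pointed to by ctx has been decremented to zero. */
static uint32_t fake_busy_reg(void *ctx)
{
	unsigned int *reads_left = ctx;

	if (*reads_left > 0)
		(*reads_left)--;
	return *reads_left ? 1u : 0u;
}
```

The caller picks the sleep interval and overall timeout, so a 10ms worst case no longer means 10ms of spinning.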
    • ptp: ptp_clockmatrix: use i2c_master_send for i2c write · 957ff427
      Min Li authored
      The old code for i2c write would break on some controllers, which fail
      to handle the Repeated Start Condition. So we will just use i2c_master_send
      to handle the write in one transaction.
      
      Changes since v1:
      - Remove indentation change
      
      Signed-off-by: Min Li <min.li.xe@renesas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
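A write that goes out as a single transaction needs the register address and the payload in one contiguous buffer, so no Repeated Start is issued between them. The sketch below shows only that buffer assembly in plain C. `build_write_frame` and `MAX_WRITE_COUNT` are hypothetical names and limits chosen for illustration, and the actual bus call is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_WRITE_COUNT 14 /* hypothetical payload limit for this sketch */

/* Assemble [regaddr, data...] in out so that it can be handed to the bus
 * in one send. Returns the frame length, or 0 if the payload is too long. */
static size_t build_write_frame(uint8_t regaddr, const uint8_t *data,
				size_t count, uint8_t out[MAX_WRITE_COUNT + 1])
{
	assert(data != NULL && out != NULL);
	if (count > MAX_WRITE_COUNT)
		return 0;
	out[0] = regaddr;             /* address byte goes first */
	memcpy(&out[1], data, count); /* payload follows immediately */
	return count + 1;
}
```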
    • netlink: fix state reallocation in policy export · d1fb5559
      Johannes Berg authored
      Evidently, when I did this previously, we didn't have more than
      10 policies and didn't run into the reallocation path, because
      it's missing a memset() for the unused policies. Fix that.
      
      Fixes: d07dcf9a ("netlink: add infrastructure to expose policies to userspace")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
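The class of bug fixed in the netlink commit above, growing an array without clearing the newly added tail, can be shown with a userspace analogue. The struct and function names here (`policy_slot`, `policy_state`, `state_grow`) are hypothetical, and `realloc()` stands in for the kernel allocator; the point is only the `memset()` over the entries that did not exist before.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-policy record. */
struct policy_slot {
	const void *policy;
	unsigned int maxtype;
};

struct policy_state {
	size_t n_alloc;
	struct policy_slot *slots;
};

/* Grow the slot array to new_n entries. realloc() leaves the added tail
 * uninitialised, so it is zeroed explicitly before the state is updated. */
static int state_grow(struct policy_state *st, size_t new_n)
{
	struct policy_slot *p;

	assert(st != NULL && new_n > 0 && new_n >= st->n_alloc);
	if (new_n > SIZE_MAX / sizeof(*p))
		return -1; /* size computation would overflow */
	p = realloc(st->slots, new_n * sizeof(*p));
	if (p == NULL)
		return -1;
	memset(p + st->n_alloc, 0, (new_n - st->n_alloc) * sizeof(*p));
	st->slots = p;
	st->n_alloc = new_n;
	return 0;
}
```

Without the `memset()`, code that scans for an unused slot by testing for a NULL pointer would read whatever bytes the allocator happened to return.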
    • Merge branch 'Bug-fixes-for-ENA-ethernet-driver' · b4c8998b
      David S. Miller authored
      Shay Agroskin says:
      
      ====================
      Bug fixes for ENA ethernet driver
      
      This series adds the following:
      - Fix undesired call to ena_restore after returning from suspend
      - Fix condition inside a WARN_ON
      - Fix overriding previous value when updating missed_tx statistic
      
      v1->v2:
      - fix bug when calling reset routine after device resources are freed (Jakub)
      
      v2->v3:
      - fix wrong hash in 'Fixes' tag
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: Make missed_tx stat incremental · ccd143e5
      Shay Agroskin authored
      Most statistics in ena driver are incremented, meaning that a stat's
      value is a sum of all increases done to it since driver/queue
      initialization.
      
      This patch makes all statistics this way, effectively making missed_tx
      statistic incremental.
      Also added a comment regarding rx_drops and tx_drops to make it
      clearer how these counters are calculated.
      
      Fixes: 11095fdb ("net: ena: add statistics for missed tx packets")
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: Change WARN_ON expression in ena_del_napi_in_range() · 8b147f6f
      Shay Agroskin authored
      The ena_del_napi_in_range() function unregisters the napi handler for
      rings in a given range.
      This function had the following WARN_ON macro:
      
          WARN_ON(ENA_IS_XDP_INDEX(adapter, i) &&
      	    adapter->ena_napi[i].xdp_ring);
      
      This macro prints the call stack if the expression inside of it is
      true [1], but the expression inside of it describes the desired situation.
      The expression checks whether the ring has an XDP queue and its index
      corresponds to an XDP one.
      
      This patch changes the expression to
          !ENA_IS_XDP_INDEX(adapter, i) && adapter->ena_napi[i].xdp_ring
      which indicates an unwanted situation.
      
      Also, change the structure of the function. The napi handler is
      unregistered for all rings, and so there's no need to check whether the
      index is an XDP index or not. By removing this check the code becomes
      much more readable.
      
      Fixes: 548c4940 ("net: ena: Implement XDP_TX action")
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ena: Prevent reset after device destruction · 63d4a4c1
      Shay Agroskin authored
      The reset work is scheduled by the timer routine whenever it
      detects that a device reset is required (e.g. when a keep_alive signal
      is missing).
      When releasing device resources in ena_destroy_device() the driver
      cancels the scheduling of the timer routine without destroying the reset
      work explicitly.
      
      This creates the following bug:
          The driver is suspended and the ena_suspend() function is called
      	-> This function calls ena_destroy_device() to free the net device
      	   resources
      	    -> The driver waits for the timer routine to finish
      	    its execution and then cancels it, thus preventing it
      	    from being called again.
      
          If, in its final execution, the timer routine schedules a reset,
          the reset routine might be called afterwards, and a redundant call to
          ena_restore_device() would be made.
      
      By changing the reset routine we allow it to read the device's state
      accurately.
      This is achieved by checking whether the ENA_FLAG_TRIGGER_RESET flag is set
      before resetting the device and by making sure that both the destruction
      function and the flag check run under the rtnl lock.
      The ENA_FLAG_TRIGGER_RESET is cleared at the end of the destruction
      routine. Also surround the flag check with 'likely' because
      we expect that the reset routine would be called only when
      ENA_FLAG_TRIGGER_RESET flag is set.
      
      The destruction of the timer and reset services in __ena_shutoff() has to
      stay, even though the timer routine is destroyed in ena_destroy_device().
      This is to avoid a case in which the reset routine is scheduled after
      free_netdev() in __ena_shutoff(), which would create an access to freed
      memory in adapter->flags.
      
      Fixes: 8c5c7abd ("net: ena: add power management ops to the ENA driver")
      Signed-off-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. Aug 19, 2020
  4. Aug 18, 2020
    • Merge tag 'pstore-v5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 06a4ec1d
      Linus Torvalds authored
      Pull mailmap update from Kees Cook:
       "This was originally part of my pstore tree, but when I realized that
        mailmap needed re-alphabetizing, I decided to wait until -rc1 to send
        this, as I saw a lot of mailmap additions pending in -next for the
        merge window.
      
        It's a programmatic reordering and the addition of a pstore
        contributor's preferred email address"
      
      * tag 'pstore-v5.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        mailmap: Add WeiXiong Liao
        mailmap: Restore dictionary sorting
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 4cf75621
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Another batch of fixes:
      
        1) Remove nft_compat counter flush optimization, it generates warnings
           from the refcount infrastructure. From Florian Westphal.
      
        2) Fix BPF to search for build id more robustly, from Jiri Olsa.
      
        3) Handle bogus getopt lengths in ebtables, from Florian Westphal.
      
        4) Infoleak and other fixes to j1939 CAN driver, from Eric Dumazet and
           Oleksij Rempel.
      
        5) Reset iter properly on mptcp sendmsg() error, from Florian
           Westphal.
      
        6) Show a saner speed in bonding broadcast mode, from Jarod Wilson.
      
        7) Various kerneldoc fixes in bonding and elsewhere, from Lee Jones.
      
        8) Fix double unregister in bonding during namespace tear down, from
           Cong Wang.
      
        9) Disable RP filter during icmp_redirect selftest, from David Ahern"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (75 commits)
        otx2_common: Use devm_kcalloc() in otx2_config_npa()
        net: qrtr: fix usage of idr in port assignment to socket
        selftests: disable rp_filter for icmp_redirect.sh
        Revert "net: xdp: pull ethernet header off packet after computing skb->protocol"
        phylink: <linux/phylink.h>: fix function prototype kernel-doc warning
        mptcp: sendmsg: reset iter on error redux
        net: devlink: Remove overzealous WARN_ON with snapshots
        tipc: not enable tipc when ipv6 works as a module
        tipc: fix uninit skb->data in tipc_nl_compat_dumpit()
        net: Fix potential wrong skb->protocol in skb_vlan_untag()
        net: xdp: pull ethernet header off packet after computing skb->protocol
        ipvlan: fix device features
        bonding: fix a potential double-unregister
        can: j1939: add rxtimer for multipacket broadcast session
        can: j1939: abort multipacket broadcast session when timeout occurs
        can: j1939: cancel rxtimer on multipacket broadcast session complete
        can: j1939: fix support for multipacket broadcast message
        net: fddi: skfp: cfm: Remove seemingly unused variable 'ID_sccs'
        net: fddi: skfp: cfm: Remove set but unused variable 'oldstate'
        net: fddi: skfp: smt: Remove seemingly unused variable 'ID_sccs'
        ...
    • otx2_common: Use devm_kcalloc() in otx2_config_npa() · bf2bcd6f
      vulab authored
      A multiplication for the size determination of a memory allocation
      indicated that an array data structure should be processed.
      Thus use the corresponding function "devm_kcalloc".
      
      Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
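The reason to prefer an array allocator over an open-coded multiplication can be sketched in userspace C: the helper takes the element count and element size separately, so it can reject a product that would overflow before allocating, and it returns zeroed memory. `array_zalloc` is a hypothetical stand-in written for this illustration, not the kernel's devm_kcalloc().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a zeroed array of n elements of the given size.
 * Returns NULL if n * size would overflow or if allocation fails. */
static void *array_zalloc(size_t n, size_t size)
{
	void *p;

	if (size != 0 && n > SIZE_MAX / size)
		return NULL; /* n * size does not fit in size_t */
	p = malloc(n * size);
	if (p != NULL)
		memset(p, 0, n * size);
	return p;
}
```

An open-coded `malloc(n * size)` would silently wrap for large `n` and hand back a buffer far smaller than the caller expects.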
    • net: qrtr: fix usage of idr in port assignment to socket · 8dfddfb7
      Necip Fazil Yildiran authored
      Passing large uint32 sockaddr_qrtr.port numbers for port allocation
      triggers a warning within idr_alloc() since the port number is cast
      to int, and thus interpreted as a negative number. This leads to
      the rejection of such valid port numbers in qrtr_port_assign() as
      idr_alloc() fails.
      
      To avoid the problem, switch to idr_alloc_u32() instead.
      
      Fixes: bdabad3e ("net: Add Qualcomm IPC router")
      Reported-by: <syzbot+f31428628ef672716ea8@syzkaller.appspotmail.com>
      Signed-off-by: Necip Fazil Yildiran <necip@google.com>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
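The signedness problem described in the qrtr commit above can be reproduced with two stand-in functions: one whose range is expressed as `int`, and one that takes the 32-bit value unchanged. Both `reserve_port_int` and `reserve_port_u32` are hypothetical helpers for illustration and do not model the idr API itself. The conversion of an out-of-range `uint32_t` to `int` is implementation-defined before C23, so the negative result is an assumption that holds on the usual two's-complement compilers.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for an allocator that takes its start value as int. */
static int reserve_port_int(uint32_t port)
{
	/* Ports above INT_MAX come out negative on the usual targets. */
	int start = (int)port;

	if (start < 0)
		return -EINVAL; /* a valid port is rejected */
	return 0;
}

/* Stand-in for an allocator that keeps the value as an unsigned 32-bit. */
static int reserve_port_u32(uint32_t port, uint32_t *assigned)
{
	*assigned = port; /* the full 32-bit range is representable */
	return 0;
}
```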
    • selftests: disable rp_filter for icmp_redirect.sh · bcf7ddb0
      David Ahern authored
      h1 is initially configured to reach h2 via r1 rather than the
      more direct path through r2. If rp_filter is set and inherited
      for r2, forwarding fails since the source address of h1 is
      reachable from eth0 vs the packet coming to it via r1 and eth1.
      Since rp_filter setting affects the test, explicitly reset it.
      
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mailmap: Add WeiXiong Liao · 5a4fe062
      Kees Cook authored
      WeiXiong Liao noted to me offlist that his preference for email address
      had changed and that he'd like it updated in the mailmap so people
      discussing pstore/blk would be able to reach him.
      
      Cc: WeiXiong Liao <gmpy.liaowx@gmail.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
    • mailmap: Restore dictionary sorting · d6bd5201
      Kees Cook authored
      Several names had been recently appended (instead of inserted). While
      git-shortlog doesn't need this file to be sorted, it helps humans to
      keep it organized this way. Sort the entire file (which includes some
      minor shuffling for dictionary order).
      
      Done with the following commands:
      
      	grep -E '^(#|$)' .mailmap > .mailmap.head
      	grep -Ev '^(#|$)' .mailmap > .mailmap.body
       	sort -f .mailmap.body > .mailmap.body.sort
      	cat .mailmap.head .mailmap.body.sort > .mailmap
      	rm .mailmap.head .mailmap.body.sort
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
    • Revert "net: xdp: pull ethernet header off packet after computing skb->protocol" · 7f9bf6e8
      David S. Miller authored
      This reverts commit f8414a8d.
      
      eth_type_trans() does the necessary pull on the skb.
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • phylink: <linux/phylink.h>: fix function prototype kernel-doc warning · 0b76e642
      Randy Dunlap authored
      Fix a kernel-doc warning for the pcs_config() function prototype:
      
      ../include/linux/phylink.h:406: warning: Excess function parameter 'permit_pause_to_mac' description in 'pcs_config'
      
      Fixes: 7137e18f ("net: phylink: add struct phylink_pcs")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • watch_queue: Limit the number of watches a user can hold · 29e44f45
      David Howells authored
      Impose a limit on the number of watches that a user can hold so that
      they can't use this mechanism to fill up all the available memory.
      
      This is done by putting a counter in user_struct that's incremented when
      a watch is allocated and decreased when it is released.  If the number
      exceeds the RLIMIT_NOFILE limit, the watch is rejected with EAGAIN.
      
      This can be tested by the following means:
      
       (1) Create a watch queue and attach it to fd 5 in the program given - in
           this case, bash:
      
      	keyctl watch_session /tmp/nlog /tmp/gclog 5 bash
      
       (2) In the shell, set the maximum number of files to, say, 99:
      
      	ulimit -n 99
      
       (3) Add 200 keyrings:
      
      	for ((i=0; i<200; i++)); do keyctl newring a$i @s || break; done
      
       (4) Try to watch all of the keyrings:
      
      	for ((i=0; i<200; i++)); do echo $i; keyctl watch_add 5 %:a$i || break; done
      
           This should fail when the number of watches belonging to the user hits
           99.
      
       (5) Remove all the keyrings and all of those watches should go away:
      
      	for ((i=0; i<200; i++)); do keyctl unlink %:a$i; done
      
       (6) Kill off the watch queue by exiting the shell spawned by
           watch_session.
      
      Fixes: c73be61c ("pipe: Add general notification queue support")
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
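The per-user accounting described in the watch_queue commit above amounts to a counter that is charged on allocation, uncharged on release, and compared against a limit. The sketch below is a single-threaded userspace illustration of that idea. The `user_acct` struct, the function names, and the plain (non-atomic) counter are assumptions made here, with `max_watches` standing in for the RLIMIT_NOFILE value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-user accounting record. */
struct user_acct {
	unsigned long nr_watches;  /* watches currently held */
	unsigned long max_watches; /* stand-in for the RLIMIT_NOFILE value */
};

/* Charge one watch to the user. If the count would exceed the limit, the
 * charge is rolled back and the request is rejected with -EAGAIN. */
static int watch_charge(struct user_acct *u)
{
	assert(u != NULL);
	if (++u->nr_watches > u->max_watches) {
		u->nr_watches--;
		return -EAGAIN;
	}
	return 0;
}

/* Release one previously charged watch. */
static void watch_uncharge(struct user_acct *u)
{
	assert(u != NULL && u->nr_watches > 0);
	u->nr_watches--;
}
```

With `max_watches` set to 99, as in step (2) of the test procedure, the first 99 charges succeed and every later one fails until some watches are released.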
  5. Aug 17, 2020