  1. Dec 22, 2022
  2. Dec 21, 2022
  3. Dec 20, 2022
    • Merge tag 'linux-can-fixes-for-6.2-20221219' of... · 4be84df3
      Jakub Kicinski authored
      Merge tag 'linux-can-fixes-for-6.2-20221219' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2022-12-19
      
      The first patch is by Vincent Mailhol and adds the etas_es58x
      devlink documentation to the index.
      
      Haibo Chen's patch for the flexcan driver fixes an unbalanced
      pm_runtime_enable warning.
      
      The last patch is by me, targets the kvaser_usb driver and fixes
      an error occurring with gcc-13.
      
      * tag 'linux-can-fixes-for-6.2-20221219' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
        can: kvaser_usb: hydra: help gcc-13 to figure out cmd_len
        can: flexcan: avoid unbalanced pm_runtime_enable warning
        Documentation: devlink: add missing toc entry for etas_es58x devlink doc
      ====================
      
      Link: https://lore.kernel.org/r/20221219155210.1143439-1-mkl@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4be84df3
    • Merge branch 'stop-corrupting-socket-s-task_frag' · 918fb1aa
      Jakub Kicinski authored
      Benjamin Coddington says:
      
      ====================
      Stop corrupting socket's task_frag
      
      The networking code uses flags in sk_allocation to determine if it can use
      current->task_frag, however in-kernel users of sockets may stop setting
      sk_allocation when they convert to the preferred memalloc_nofs_save/restore,
      as SUNRPC has done in commit a1231fda ("SUNRPC: Set memalloc_nofs_save()
      on all rpciod/xprtiod jobs").
      
      This will cause corruption in current->task_frag when recursing into the
      network layer for those subsystems during page fault or reclaim.  The
      corruption is difficult to diagnose because stack traces may not contain the
      offending subsystem at all.  The corruption is unlikely to show up in
      testing because it requires memory pressure, and so subsystems that
      convert to memalloc_nofs_save/restore are likely to continue to run into
      this issue.
      
      Previous reports and proposed fixes:
      https://lore.kernel.org/netdev/96a18bd00cbc6cb554603cc0d6ef1c551965b078.1663762494.git.gnault@redhat.com/
      https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/
      https://lore.kernel.org/linux-nfs/de6d99321d1dcaa2ad456b92b3680aa77c07a747.1665401788.git.gnault@redhat.com/
      
      Guillaume Nault has done all of the hard work tracking this problem down and
      finding the best fix for this issue.  I'm just taking a turn posting another
      fix.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1671194454.git.bcodding@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      918fb1aa
    • net: simplify sk_page_frag · 08f65892
      Benjamin Coddington authored
      
      
      Now that in-kernel socket users that may recurse during reclaim have been
      converted to sk_use_task_frag = false, we can have sk_page_frag() simply
      check that value.
      
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      08f65892
    • Treewide: Stop corrupting socket's task_frag · 98123866
      Benjamin Coddington authored
      
      
      Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
      GFP_NOIO flag on sk_allocation, which the networking system uses to decide
      when it is safe to use current->task_frag.  The result is unexpected
      corruption in task_frag when SUNRPC is involved in memory reclaim.
      
      The corruption can be seen in crashes, but the root cause is often
      difficult to ascertain as a crashing machine's stack trace will have no
      evidence of being near NFS or SUNRPC code.  I believe this problem to
      be much more pervasive than reports to the community may indicate.
      
      Fix this by having kernel users of sockets that may corrupt task_frag due
      to reclaim set sk_use_task_frag = false.  Preemptively correcting this
      situation for users that still set sk_allocation allows them to convert to
      memalloc_nofs_save/restore without the same unexpected corruptions that are
      sure to follow, unlikely to show up in testing, and difficult to bisect.
      
      CC: Philipp Reisner <philipp.reisner@linbit.com>
      CC: Lars Ellenberg <lars.ellenberg@linbit.com>
      CC: "Christoph Böhmwalder" <christoph.boehmwalder@linbit.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Josef Bacik <josef@toxicpanda.com>
      CC: Keith Busch <kbusch@kernel.org>
      CC: Christoph Hellwig <hch@lst.de>
      CC: Sagi Grimberg <sagi@grimberg.me>
      CC: Lee Duncan <lduncan@suse.com>
      CC: Chris Leech <cleech@redhat.com>
      CC: Mike Christie <michael.christie@oracle.com>
      CC: "James E.J. Bottomley" <jejb@linux.ibm.com>
      CC: "Martin K. Petersen" <martin.petersen@oracle.com>
      CC: Valentina Manea <valentina.manea.m@gmail.com>
      CC: Shuah Khan <shuah@kernel.org>
      CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      CC: David Howells <dhowells@redhat.com>
      CC: Marc Dionne <marc.dionne@auristor.com>
      CC: Steve French <sfrench@samba.org>
      CC: Christine Caulfield <ccaulfie@redhat.com>
      CC: David Teigland <teigland@redhat.com>
      CC: Mark Fasheh <mark@fasheh.com>
      CC: Joel Becker <jlbec@evilplan.org>
      CC: Joseph Qi <joseph.qi@linux.alibaba.com>
      CC: Eric Van Hensbergen <ericvh@gmail.com>
      CC: Latchesar Ionkov <lucho@ionkov.net>
      CC: Dominique Martinet <asmadeus@codewreck.org>
      CC: Ilya Dryomov <idryomov@gmail.com>
      CC: Xiubo Li <xiubli@redhat.com>
      CC: Chuck Lever <chuck.lever@oracle.com>
      CC: Jeff Layton <jlayton@kernel.org>
      CC: Trond Myklebust <trond.myklebust@hammerspace.com>
      CC: Anna Schumaker <anna@kernel.org>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      
      Suggested-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      98123866
    • net: Introduce sk_use_task_frag in struct sock. · fb87bd47
      Guillaume Nault authored
      Sockets that can be used while recursing into memory reclaim, like
      those used by network block devices and file systems, mustn't use
      current->task_frag: if the current process is already using it, then
      the inner memory reclaim call would corrupt the task_frag structure.
      
      To avoid this, sk_page_frag() uses ->sk_allocation to detect sockets
      that mustn't use current->task_frag, assuming that those used during
      memory reclaim had their allocation constraints reflected in
      ->sk_allocation.
      
      This unfortunately doesn't cover all cases: in an attempt to remove all
      usage of GFP_NOFS and GFP_NOIO, sunrpc stopped setting these flags in
      ->sk_allocation, and used memalloc_nofs critical sections instead.
      This breaks the sk_page_frag() heuristic since the allocation
      constraints are now stored in current->flags, which sk_page_frag()
      can't read without risking triggering a cache miss and slowing down
      TCP's fast path.
      
      This patch creates a new field in struct sock, named sk_use_task_frag,
      which sockets with memory reclaim constraints can set to false if they
      can't safely use current->task_frag. In such cases, sk_page_frag() now
      always returns the socket's page_frag (->sk_frag). The first user is
      sunrpc, which needs to avoid using current->task_frag but can keep
      ->sk_allocation set to GFP_KERNEL otherwise.
      
      Eventually, it might be possible to simplify sk_page_frag() by only
      testing ->sk_use_task_frag and avoid relying on the ->sk_allocation
      heuristic entirely (assuming other sockets will set ->sk_use_task_frag
      according to their constraints in the future).
      
      The new ->sk_use_task_frag field is placed in a hole in struct sock and
      belongs to a cache line shared with ->sk_shutdown. Therefore it should
      be hot and shouldn't have negative performance impacts on TCP's fast
      path (sk_shutdown is tested just before the while() loop in
      tcp_sendmsg_locked()).
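
The selection logic described above can be modelled in user space. This is only a sketch under stated assumptions: the `model_*` types, the `MODEL_*` flag values, and the exact flag test in the heuristic are hypothetical stand-ins chosen for illustration, not the kernel's definitions. It contrasts the combined check (new field plus allocation heuristic) with the possible flag-only simplification mentioned in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model-only allocation flags; the values are invented for this sketch. */
enum {
	MODEL_DIRECT_RECLAIM = 1u << 0,
	MODEL_FS             = 1u << 1,
	MODEL_MEMALLOC       = 1u << 2,
};

struct model_frag { size_t offset; };

/* Stand-in for the current task, which owns one shared fragment. */
struct model_task { struct model_frag task_frag; };

/* Stand-in for struct sock, reduced to the fields discussed above. */
struct model_sock {
	unsigned int sk_allocation;
	bool sk_use_task_frag;
	struct model_frag sk_frag;
};

/* New field checked first, then the allocation heuristic (assumed form). */
struct model_frag *model_page_frag(struct model_task *cur, struct model_sock *sk)
{
	unsigned int mask = MODEL_DIRECT_RECLAIM | MODEL_MEMALLOC | MODEL_FS;
	unsigned int want = MODEL_DIRECT_RECLAIM | MODEL_FS;

	if (sk->sk_use_task_frag && (sk->sk_allocation & mask) == want)
		return &cur->task_frag;
	return &sk->sk_frag;
}

/* Possible later simplification: rely on the field alone. */
struct model_frag *model_page_frag_simple(struct model_task *cur, struct model_sock *sk)
{
	return sk->sk_use_task_frag ? &cur->task_frag : &sk->sk_frag;
}

/* A socket that opted out always gets its own fragment. */
void model_page_frag_demo(void)
{
	struct model_task cur = { { 0 } };
	struct model_sock sk = {
		.sk_allocation = MODEL_DIRECT_RECLAIM | MODEL_FS,
		.sk_use_task_frag = false,
	};

	assert(model_page_frag(&cur, &sk) == &sk.sk_frag);
	assert(model_page_frag_simple(&cur, &sk) == &sk.sk_frag);
}
```

In this model, a socket that leaves its allocation flags permissive but clears `sk_use_task_frag` never touches the task's fragment, which is the property the new field is meant to provide.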
      
      Link: https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fb87bd47
    • mctp: Remove device type check at unregister · b389a902
      Matt Johnston authored
      The unregister check could be incorrectly triggered if a netdev
      changes its type after register. That is possible for a tun device
      using TUNSETLINK ioctl, resulting in mctp unregister failing
      and the netdev unregister waiting forever.
      
      This was encountered by https://github.com/openthread/openthread/issues/8523
      
      Neither the check at register nor the one at unregister is required.
      They were added in an attempt to track down mctp_ptr being set
      unexpectedly, which should not happen in normal operation.
      
      Fixes: 7b1871af ("mctp: Warn if pointer is set for a wrong dev type")
      Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
      Link: https://lore.kernel.org/r/20221215054933.2403401-1-matt@codeconstruct.com.au
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b389a902
    • net: dsa: microchip: remove IRQF_TRIGGER_FALLING in request_threaded_irq · 62e027fb
      Arun Ramadoss authored
      KSZ switches use interrupts to detect PHY link up and down events.
      When registering the interrupt handler, the driver passed the
      IRQF_TRIGGER_FALLING flag. The trigger type has to be retrieved from
      the device tree instead of being hard-coded in the driver, so remove
      the flag.
      
      Fixes: ff319a64 ("net: dsa: microchip: move interrupt handling logic from lan937x to ksz_common")
      Reported-by: Christian Eggers <ceggers@arri.de>
      Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Link: https://lore.kernel.org/r/20221213101440.24667-1-arun.ramadoss@microchip.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      62e027fb
  4. Dec 19, 2022
  5. Dec 17, 2022
  6. Dec 16, 2022
    • openvswitch: Fix flow lookup to use unmasked key · 68bb1010
      Eelco Chaudron authored
      The commit mentioned below causes the ovs_flow_tbl_lookup() function
      to be called with the masked key. However, it's supposed to be called
      with the unmasked key. This is because the datapath supports
      installing wider flows, and OVS relies on this behavior. For example,
      if ipv4(src=1.1.1.1/192.0.0.0, dst=1.1.1.2/192.0.0.0) exists, a wider
      flow (smaller mask) of
      ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/128.0.0.0) is allowed to
      be added.
      
      However, if we try to add a wildcard rule, the installation fails:
      
      $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
        ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2
      $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
        ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2
      ovs-vswitchd: updating flow table (File exists)
      
      The reason is that the key used to determine if the flow is already
      present in the system uses the original key ANDed with the mask.
      This results in the IP address not being part of the (miniflow) key,
      i.e., being substituted with an all-zero value. When doing the actual
      lookup, this results in the key wrongfully matching the first flow,
      and therefore the flow does not get installed.
      
      This change reverses the commit below, but rather than having the key
      on the stack, it's allocated.
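
The false match described above can be reproduced with a reduced model. This is a sketch, not the datapath code: `struct model_flow`, `model_lookup()`, and the `IP4()` macro are hypothetical names, and a flow is shrunk to a single 32-bit source address with its mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address from dotted-quad components. */
#define IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* A flow stores its key already ANDed with its mask. */
struct model_flow {
	uint32_t masked_src;
	uint32_t mask;
};

/* A packet or candidate key matches a flow if it agrees under that flow's mask. */
bool model_lookup(const struct model_flow *tbl, size_t n, uint32_t key)
{
	for (size_t i = 0; i < n; i++)
		if ((key & tbl[i].mask) == tbl[i].masked_src)
			return true;
	return false;
}

/* Existing flow: src=1.1.1.1/192.0.0.0. New flow: src=192.1.1.1/0.0.0.0. */
void model_lookup_demo(void)
{
	uint32_t old_mask = IP4(192, 0, 0, 0);
	struct model_flow tbl[] = {
		{ IP4(1, 1, 1, 1) & old_mask, old_mask },
	};
	uint32_t new_key = IP4(192, 1, 1, 1);
	uint32_t new_mask = IP4(0, 0, 0, 0);

	/* Masked key collapses to 0.0.0.0 and wrongly matches the old flow. */
	assert(model_lookup(tbl, 1, new_key & new_mask));
	/* Unmasked key differs in the top two bits, so no duplicate is reported. */
	assert(!model_lookup(tbl, 1, new_key));
}
```

With the masked key, the existence check reports a duplicate and the wildcard flow is rejected; with the unmasked key, the check finds nothing and the installation can proceed.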
      
      Fixes: 190aa3e7 ("openvswitch: Fix Frame-size larger than 1024 bytes warning.")
      Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      68bb1010
    • Merge branch 'devlink-fixes' · 3e31d209
      David S. Miller authored
      
      
      Jakub Kicinski says:
      
      ====================
      devlink: region snapshot locking fix and selftest adjustments
      
      Minor fix for region snapshot locking and adjustments to selftests.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3e31d209
    • selftests: devlink: add a warning for interfaces coming up · d1c4a346
      Jakub Kicinski authored
      
      
      NetworkManager (and other daemons) may bring the interface up
      and cause failures in quiescence checks. Print a helpful warning,
      and take the interface down again.
      
      I seem to forget about this every time I run these tests on a new VM.
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d1c4a346
    • selftests: devlink: fix the fd redirect in dummy_reporter_test · 2fc60e2f
      Jakub Kicinski authored
      In bash, $number followed by > means redirect FD $number; for
      example, the commonly used 2> redirects stderr (fd 2). The test uses
      8192> intending to write the number 8192 to a file, which results in:
      
        ./devlink.sh: line 499: 8192: Bad file descriptor
      
      Oddly the test also papers over this issue by checking
      for failure (expecting an error rather than success)
      so it passes, anyway.
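
The "Bad file descriptor" error can be reproduced at the system-call level. The sketch below assumes that a shell implements `N> file` roughly as "open the file, then duplicate it onto descriptor N", and that the soft open-file limit is below 8192 (the helper lowers it to make the result deterministic). The `model_*` function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Lower the soft open-file limit to at most `soft`; returns 0 on success. */
int model_cap_open_files(rlim_t soft)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;
	if (rl.rlim_cur > soft)
		rl.rlim_cur = soft;
	return setrlimit(RLIMIT_NOFILE, &rl);
}

/* Open `path` and duplicate it onto `target_fd`; returns 0 or an errno value. */
int model_redirect_fd(const char *path, int target_fd)
{
	int fd = open(path, O_WRONLY);
	int err = 0;

	if (fd < 0)
		return errno;
	if (dup2(fd, target_fd) < 0)
		err = errno;		/* target_fd beyond the limit: EBADF */
	else if (target_fd != fd)
		close(target_fd);	/* undo the redirect, this is only a probe */
	close(fd);
	return err;
}

/* Descriptor 8192 is out of range once the limit is 1024. */
void model_redirect_demo(void)
{
	assert(model_cap_open_files(1024) == 0);
	assert(model_redirect_fd("/dev/null", 8192) == EBADF);
}
```

Writing the literal number to a file instead requires making it the command's output (for example via echo) rather than a redirection prefix.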
      
      Fixes: ff18176a ("selftests: Add a test of large binary to devlink health test")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2fc60e2f