  1. Feb 08, 2022
    • ipv4: Reject routes specifying ECN bits in rtm_tos · f55fbb6a
      Guillaume Nault authored
      
      
      Use the new dscp_t type to replace the fc_tos field of fib_config, to
      ensure IPv4 routes aren't influenced by ECN bits when configured with
      non-zero rtm_tos.
      
      Before this patch, IPv4 routes specifying an rtm_tos with some of the
      ECN bits set were accepted. However they wouldn't work (never match) as
      IPv4 normally clears the ECN bits with IPTOS_RT_MASK before doing a FIB
      lookup (although a few buggy code paths don't).
      
      After this patch, IPv4 routes specifying an rtm_tos with any ECN bit
      set are rejected.
      
      Note: IPv6 routes ignore rtm_tos altogether, any rtm_tos is accepted,
      but treated as if it were 0.
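
The rejection rule described above can be illustrated with a small userspace sketch. It is not the kernel code from this commit: the names `SKETCH_ECN_MASK` and `sketch_validate_route_tos()` are hypothetical, and the only assumption taken from the message is that a tos value with either of the two low-order (ECN) bits set must be refused.

```c
#include <errno.h>
#include <stdint.h>

/* The two low-order bits of the IPv4 TOS / DS field carry ECN. */
#define SKETCH_ECN_MASK 0x03u

/*
 * Hypothetical stand-in for the route-configuration check: returns 0 when
 * the requested tos is acceptable and -EINVAL when any ECN bit is set.
 */
int sketch_validate_route_tos(uint8_t rtm_tos)
{
	if (rtm_tos & SKETCH_ECN_MASK)
		return -EINVAL;
	return 0;
}
```

With this check, a tos of 0x10 is accepted while 0x11, 0x12 and 0x13 are refused, instead of being stored as routes that can never match.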
      
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Acked-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv4: Stop taking ECN bits into account in fib4-rules · 563f8e97
      Guillaume Nault authored
      
      
      Use the new dscp_t type to replace the tos field of struct fib4_rule,
      so that fib4-rules consistently ignore ECN bits.
      
      Before this patch, fib4-rules did accept rules with the high order ECN
      bit set (but not the low order one). Also, it relied on its callers
      masking the ECN bits of ->flowi4_tos to prevent those from influencing
      the result. This was brittle and a few call paths still do the lookup
      without masking the ECN bits first.
      
      After this patch fib4-rules only compare the DSCP bits. ECN can't
      influence the result anymore, even if the caller didn't mask these
      bits. Also, fib4-rules now must have both ECN bits cleared or they will
      be rejected.
      
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Acked-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6: Define dscp_t and stop taking ECN bits into account in fib6-rules · a410a0cf
      Guillaume Nault authored
      
      
      Define a dscp_t type and its appropriate helpers that ensure ECN bits
      are not taken into account when handling DSCP.
      
      Use this new type to replace the tclass field of struct fib6_rule, so
      that fib6-rules don't get influenced by ECN bits anymore.
      
      Before this patch, fib6-rules didn't make any distinction between the
      DSCP and ECN bits. Therefore, rules specifying a DSCP (tos or dsfield
      options in iproute2) stopped working as soon as a packet had at least
      one of its ECN bits set (as a workaround, one could create four rules
      for each DSCP value to match, one for each possible ECN value).
      
      After this patch fib6-rules only compare the DSCP bits. ECN doesn't
      influence the result anymore. Also, fib6-rules now must have the ECN
      bits cleared or they will be rejected.
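
To make the idea of a dedicated DSCP type concrete, here is a minimal userspace sketch. The actual kernel definitions are not shown in this log, so every name below (`sketch_dscp_t`, `sketch_dsfield_to_dscp()`, `sketch_validate_dscp()`, `sketch_rule_matches()`) is hypothetical. The sketch only assumes the standard layout of the DS field: six DSCP bits on top, two ECN bits at the bottom.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_DSCP_MASK 0xfcu	/* upper six bits of the DS field */
#define SKETCH_ECN_MASK  0x03u	/* lower two bits of the DS field */

/*
 * Wrapping the value in a struct keeps it from being mixed up with a raw
 * dsfield byte by accident; conversions must go through the helpers.
 */
typedef struct { uint8_t v; } sketch_dscp_t;

/* Convert a raw dsfield byte to a DSCP value, dropping the ECN bits. */
static inline sketch_dscp_t sketch_dsfield_to_dscp(uint8_t dsfield)
{
	return (sketch_dscp_t){ .v = (uint8_t)(dsfield & SKETCH_DSCP_MASK) };
}

/* A user-supplied dsfield is only valid as a DSCP if its ECN bits are 0. */
static inline bool sketch_validate_dscp(uint8_t dsfield)
{
	return (dsfield & SKETCH_ECN_MASK) == 0;
}

/* Compare a rule's DSCP against a packet's dsfield, ignoring ECN. */
static inline bool sketch_rule_matches(sketch_dscp_t rule, uint8_t pkt_dsfield)
{
	return rule.v == sketch_dsfield_to_dscp(pkt_dsfield).v;
}
```

In this model a single rule built from dsfield 0x28 matches packets carrying 0x28, 0x29, 0x2a and 0x2b, which is the case that previously needed four separate rules.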
      
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Acked-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: stmmac: optimize locking around PTP clock reads · 642436a1
      Yannick Vignon authored
      
      
      Reading the PTP clock is a simple operation requiring only 3 register
      reads. Under a PREEMPT_RT kernel, protecting those reads by a spin_lock is
      counter-productive: if the 2nd task preempting the 1st has a higher prio
      but needs to read time as well, it will require 2 context switches, which
      will pretty much always be more costly than just disabling preemption for
      the duration of the reads. Moreover, with the code logic recently added
      to get_systime(), disabling preemption is not even required anymore:
      reads and writes just need to be protected from each other, to prevent a
      clock read while the clock is being updated.
      
      Improve the above situation by replacing the PTP spinlock with a rwlock,
      and using read_lock for PTP clock reads so simultaneous reads do not
      block each other.
      
      Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
      Link: https://lore.kernel.org/r/20220204135545.2770625-1-yannick.vignon@oss.nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: typhoon: include <net/vxlan.h> · d1d5bd64
      Eric Dumazet authored
      We need this to get the vxlan_features_check() definition.
      
      Fixes: d2692eee ("net: typhoon: implement ndo_features_check method")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20220208003502.1799728-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. Feb 07, 2022
  3. Feb 06, 2022
    • ref_tracker: remove filter_irq_stacks() call · c2d1e3df
      Eric Dumazet authored
      After commit e9400660 ("lib/stackdepot: always do filter_irq_stacks()
      in stack_depot_save()") it became unnecessary to filter the stack
      before calling stack_depot_save().
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: initialize init_net earlier · 9c1be193
      Eric Dumazet authored
      While testing a patch that will follow later
      ("net: add netns refcount tracker to struct nsproxy")
      I found that devtmpfs_init() was called before init_net
      was initialized.
      
      This is a bug, because devtmpfs_setup() calls
      ksys_unshare(CLONE_NEWNS).
      
      This has the effect of increasing the init_net refcount,
      which is later overwritten to 1 as part of setup_net(&init_net).
      
      We had too many prior patches [1] trying to work around the root cause.
      
      Really, make sure init_net is in the BSS section, and that net_ns_init()
      is called earlier at boot time.
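
The ordering problem can be modelled with a toy refcount. This is a simplified illustration, not kernel code: `struct sketch_ns`, `sketch_ns_setup()`, `sketch_ns_get()` and `sketch_boot()` are hypothetical names, and the model only assumes what the message states, namely that setup overwrites the refcount with 1.

```c
#include <stdbool.h>

/* Toy namespace object; zero-initialized, as an object in BSS would be. */
struct sketch_ns {
	int refcount;
};

/* Setup unconditionally sets the initial reference, as described above. */
void sketch_ns_setup(struct sketch_ns *ns)
{
	ns->refcount = 1;
}

/* An early user takes an extra reference. */
void sketch_ns_get(struct sketch_ns *ns)
{
	ns->refcount++;
}

/* Returns the refcount after one setup and one early user, in either order. */
int sketch_boot(bool setup_first)
{
	struct sketch_ns ns = { .refcount = 0 };

	if (setup_first) {
		sketch_ns_setup(&ns);
		sketch_ns_get(&ns);
	} else {
		sketch_ns_get(&ns);	/* this reference is lost below */
		sketch_ns_setup(&ns);
	}
	return ns.refcount;
}
```

When setup runs first, the count ends at 2 (the initial reference plus the early user's). When the early user runs first, the count ends at 1 and the user's reference has been silently dropped.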
      
      Note that another patch ("vfs: add netns refcount tracker
      to struct fs_context") will also need net_ns_init() to be called
      before vfs_caches_init().
      
      As a bonus, this patch saves around 4KB in .data section.
      
      [1]
      
      f8c46cb3 ("netns: do not call pernet ops for not yet set up init_net namespace")
      b5082df8 ("net: Initialise init_net.count to 1")
      734b6541 ("net: Statically initialize init_net.dev_base_head")
      
      v2: fixed a build error reported by kernel build bots (CONFIG_NET=n)
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hsr: use hlist_head instead of list_head for mac addresses · 4acc45db
      Juhee Kang authored
      
      
      Currently, HSR manages the MAC addresses of known HSR nodes with a
      list_head. Finding a specific MAC address requires a linear search,
      which takes a long time when many nodes are registered. Using an hlist
      reduces the lookup time. Thus, this patch moves list_head to hlist_head
      for MAC addresses, which allows for further improvement of network
      performance.
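
The difference between a single list and hashed buckets can be sketched as follows. This is a simplified userspace model rather than the patch itself: it uses plain singly linked chains instead of the kernel's hlist, and the bucket count, the hash function and all `sketch_` names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN  6
#define SKETCH_HASH_BITS 8
#define SKETCH_HASH_SIZE (1u << SKETCH_HASH_BITS)

struct sketch_node {
	struct sketch_node *next;	/* chain within one bucket */
	uint8_t mac[SKETCH_ETH_ALEN];
};

struct sketch_node_db {
	struct sketch_node *buckets[SKETCH_HASH_SIZE];
};

/* Toy hash (FNV-1a masked to the bucket count); not the patch's hash. */
unsigned int sketch_mac_hash(const uint8_t *mac)
{
	uint32_t h = 2166136261u;

	for (int i = 0; i < SKETCH_ETH_ALEN; i++) {
		h ^= mac[i];
		h *= 16777619u;
	}
	return h & (SKETCH_HASH_SIZE - 1);
}

/* Insert at the head of the bucket selected by the MAC address. */
void sketch_node_add(struct sketch_node_db *db, struct sketch_node *n)
{
	unsigned int b = sketch_mac_hash(n->mac);

	n->next = db->buckets[b];
	db->buckets[b] = n;
}

/* Only the nodes sharing one bucket are scanned, not every known node. */
struct sketch_node *sketch_node_find(const struct sketch_node_db *db,
				     const uint8_t *mac)
{
	struct sketch_node *n;

	for (n = db->buckets[sketch_mac_hash(mac)]; n; n = n->next)
		if (memcmp(n->mac, mac, SKETCH_ETH_ALEN) == 0)
			return n;
	return NULL;
}
```

With 10,000 nodes spread evenly over 256 buckets, a lookup in this model walks roughly 40 entries on average instead of up to 10,000.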
      
          Condition: registered 10,000 known HSR nodes
          Before:
          # iperf3 -c 192.168.10.1 -i 1 -t 10
          Connecting to host 192.168.10.1, port 5201
          [  5] local 192.168.10.2 port 59442 connected to 192.168.10.1 port 5201
          [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
          [  5]   0.00-1.49   sec  3.75 MBytes  21.1 Mbits/sec    0    158 KBytes
          [  5]   1.49-2.05   sec  1.25 MBytes  18.7 Mbits/sec    0    166 KBytes
          [  5]   2.05-3.06   sec  2.44 MBytes  20.3 Mbits/sec   56   16.9 KBytes
          [  5]   3.06-4.08   sec  1.43 MBytes  11.7 Mbits/sec   11   38.0 KBytes
          [  5]   4.08-5.00   sec   951 KBytes  8.49 Mbits/sec    0   56.3 KBytes
      
          After:
          # iperf3 -c 192.168.10.1 -i 1 -t 10
          Connecting to host 192.168.10.1, port 5201
          [  5] local 192.168.10.2 port 36460 connected to 192.168.10.1 port 5201
          [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
          [  5]   0.00-1.00   sec  7.39 MBytes  62.0 Mbits/sec    3    130 KBytes
          [  5]   1.00-2.00   sec  5.06 MBytes  42.4 Mbits/sec   16    113 KBytes
          [  5]   2.00-3.00   sec  8.58 MBytes  72.0 Mbits/sec   42   94.3 KBytes
          [  5]   3.00-4.00   sec  7.44 MBytes  62.4 Mbits/sec    2    131 KBytes
          [  5]   4.00-5.07   sec  8.13 MBytes  63.5 Mbits/sec   38   92.9 KBytes
      
      Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. Feb 05, 2022