  1. Jan 04, 2022
    • Merge branch 'act_tc-offload-originating-device' · dfb55f99
      David S. Miller authored
      
      
      Paul Blakey says:
      
      ====================
      net/sched: Pass originating device to drivers offloading ct connection
      
      Currently, drivers register to a ct zone that can be shared by multiple
      devices. This can be inefficient for the driver to offload, as it
      needs to handle all the cases where the tuple can come from,
      instead of only where it will most likely arrive from.
      
      For example, consider the following tc rules:
      tc filter add dev dev1 ... flower action ct commit zone 5 \
         action mirred egress redirect dev dev2
      
      tc filter add dev dev2 ... flower action ct zone 5 \
         action goto chain 2
      tc filter add dev dev2 ... flower ct_state +trk+est ... \
         action mirred egress redirect dev dev1
      
      Both dev2 and dev1 register to the zone 5 flow table (created
      by act_ct). A tuple originating on dev1, going to dev2, will
      be offloaded to both devices, and both will need to offload
      both directions, resulting in 4 total rules. The traffic
      will only hit the originating tuple on dev1, and the reply tuple
      on dev2.
      
      By passing the originating device that created the connection
      with the tuple, dev1 can choose to offload only the originating
      tuple, and dev2 only the reply tuple, resulting in a more
      efficient offload.
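As a rough illustration of the rule count described above, the following Python sketch counts offloaded rules with and without the originating device hint. It is a toy model, not driver code: the function names and the tuple representation are made up for this example.

```python
from itertools import product

# The two directions of a tracked connection.
DIRECTIONS = ("orig", "reply")


def rules_without_hint(devices):
    """Every device registered to the zone offloads both directions."""
    return list(product(devices, DIRECTIONS))


def rules_with_hint(orig_dev, reply_dev):
    """Each device offloads only the direction that arrives on it."""
    return [(orig_dev, "orig"), (reply_dev, "reply")]


before = rules_without_hint(["dev1", "dev2"])
after = rules_with_hint("dev1", "dev2")
print(len(before), len(after))  # 4 2
```

With two devices sharing zone 5, the model yields 4 rules without the hint and 2 with it, matching the numbers in the description.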
      
      The first patch adds an act_ct nf conntrack extension to
      temporarily store the originating device from the skb before
      offloading the connection once the connection is established.
      Once sent to offload, it fills the tuple originating device.
      
      The second patch gets this information from tuples
      which pass in openvswitch.
      
      The third patch is the Mellanox driver ct offload implementation, which
      uses this information to give firmware a hint of where the packets of
      an offloaded tuple will arrive from (LOCAL or UPLINK port),
      and thus increases the insertion rate.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dfb55f99
    • net/mlx5: CT: Set flow source hint from provided tuple device · c9c079b4
      Paul Blakey authored
      
      
      Get the originating device from the tuple offload metadata match on
      ingress_ifindex, and set the flow_source hint to either LOCAL for vf/sf
      reps, UPLINK for uplink/wire/tunnel devices/bond, or ANY (as before this
      patch) for all others.
      
      This allows the lower layer (software steering or firmware) to insert the
      tuple rule only in one table (either rx or tx) instead of two (rx and tx).
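The mapping described above can be sketched in Python as follows. This is a minimal model under stated assumptions: the device-kind strings and the `FlowSource`, `flow_source_hint` and `tables_needed` names are hypothetical and are not the mlx5 driver's identifiers.

```python
from enum import Enum


class FlowSource(Enum):
    ANY = "any"
    LOCAL = "local"
    UPLINK = "uplink"


# Hypothetical device-kind labels, chosen only for this example.
LOCAL_KINDS = {"vf_rep", "sf_rep"}
UPLINK_KINDS = {"uplink", "wire", "tunnel", "bond"}


def flow_source_hint(dev_kind):
    """Pick a hint from the kind of device the tuple originated on."""
    if dev_kind in LOCAL_KINDS:
        return FlowSource.LOCAL
    if dev_kind in UPLINK_KINDS:
        return FlowSource.UPLINK
    return FlowSource.ANY  # unknown origin: behave as before the patch


def tables_needed(hint):
    """ANY needs both the rx and tx tables; a specific hint needs one."""
    return 2 if hint is FlowSource.ANY else 1


for kind in ("vf_rep", "bond", "other"):
    hint = flow_source_hint(kind)
    print(kind, hint.name, tables_needed(hint))
```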
      
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c9c079b4
    • net: openvswitch: Fill act ct extension · b702436a
      Paul Blakey authored
      
      
      To give drivers the originating device information for optimized
      connection tracking offload, fill in act ct extension with
      ifindex from skb.
      
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b702436a
    • net/sched: act_ct: Fill offloading tuple iifidx · 9795ded7
      Paul Blakey authored
      
      
      Drivers offloading ct tuples can use the information of which devices
      received the packets that created the offloaded connections, to
      more efficiently offload them only to the relevant device.
      
      Add a new act_ct nf conntrack extension, which is used to store the skb
      devices before offloading the connection, and then fill in the tuple
      iifindex so drivers can get the device via metadata dissector match.
      
      Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9795ded7
    • Merge tag 'batadv-next-pullrequest-20220103' of git://git.open-mesh.org/linux-merge · 9d2c27aa
      Jakub Kicinski authored
      Simon Wunderlich says:
      
      ====================
      This cleanup patchset includes the following patches:
      
       - bump version strings, by Simon Wunderlich
       - allow netlink usage in unprivileged containers, by Linus Lüssing
       - remove unneeded variable, by Minghao Chi
      
      * tag 'batadv-next-pullrequest-20220103' of git://git.open-mesh.org/linux-merge:
        batman-adv: remove unneeded variable in batadv_nc_init
        batman-adv: allow netlink usage in unprivileged containers
        batman-adv: Start new development cycle
      ====================
      
      Link: https://lore.kernel.org/r/20220103171722.1126109-1-sw@simonwunderlich.de
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9d2c27aa
    • net: mdio: Demote probed message to debug print · 7590fc6f
      Florian Fainelli authored
      
      
      On systems with large numbers of MDIO buses/muxes, the message indicating
      that a given MDIO bus has been successfully probed is repeated for as
      many buses as we have, which can eat up substantial boot time for no
      reason. Demote it to a debug print.
      
      Reported-by: Maxime Bizon <mbizon@freebox.fr>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20220103194024.2620-1-f.fainelli@gmail.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7590fc6f
  2. Jan 03, 2022
  3. Jan 02, 2022
    • tehuti: Use dma_set_mask_and_coherent() and simplify code · c95e0780
      Christophe JAILLET authored
      Use dma_set_mask_and_coherent() instead of unrolling it with some
      dma_set_mask()+dma_set_coherent_mask().
      
      Moreover, as stated in [1], dma_set_mask_and_coherent() with a 64-bit mask
      will never fail if dev->dma_mask is non-NULL.
      So, if it fails, the 32-bit case will also fail for the same reason.
      
      That said, 'pci_using_dac' can only be 1 after a successful
      dma_set_mask_and_coherent().
      
      Simplify code and remove some dead code accordingly.
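The dead-code argument above can be checked with a small Python model. This is a sketch under loud assumptions: `Dev`, `dma_set_mask_and_coherent`, `probe_old` and `probe_new` are simplified stand-ins written for this example and do not reproduce the kernel functions or the tehuti probe code.

```python
from itertools import product

EIO = 5


class Dev:
    """Toy device: has_dma_mask models a non-NULL dev->dma_mask."""

    def __init__(self, has_dma_mask, supports_32bit):
        self.has_dma_mask = has_dma_mask
        self.supports_32bit = supports_32bit


def dma_set_mask_and_coherent(dev, bits):
    """Toy model: a 64-bit mask fails only when there is no dma_mask."""
    if not dev.has_dma_mask:
        return -EIO
    if bits < 64 and not dev.supports_32bit:
        return -EIO
    return 0


def probe_old(dev):
    """Old pattern: try 64 bits, then fall back to 32 bits."""
    if dma_set_mask_and_coherent(dev, 64) == 0:
        return 0, 1  # (error, pci_using_dac)
    err = dma_set_mask_and_coherent(dev, 32)
    if err:
        return err, 0
    return 0, 0  # unreachable in this model


def probe_new(dev):
    """Simplified pattern: a single 64-bit attempt."""
    err = dma_set_mask_and_coherent(dev, 64)
    if err:
        return err, 0
    return 0, 1


# Both patterns agree for every modeled device configuration.
for has_mask, ok32 in product((False, True), repeat=2):
    dev = Dev(has_mask, ok32)
    assert probe_old(dev) == probe_new(dev)
```

In this model the 32-bit fallback never succeeds after a 64-bit failure, so pci_using_dac is 1 on every successful probe.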
      
      [1]: https://lkml.org/lkml/2021/6/7/398
      
      
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c95e0780
    • enic: Use dma_set_mask_and_coherent() · c5180ad0
      Christophe JAILLET authored
      
      
      Use dma_set_mask_and_coherent() instead of unrolling it with some
      dma_set_mask()+dma_set_coherent_mask().
      
      This simplifies code and removes some dead code (dma_set_coherent_mask()
      can not fail after a successful dma_set_mask())
      
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5180ad0
    • net: socket.c: style fix · e44ef1d4
      Hamish MacDonald authored
      
      
      Removed spaces and added a tab that was causing an error on checkpatch
      
      Signed-off-by: Hamish MacDonald <elusivenode@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e44ef1d4
    • ipv6: ioam: Support for Queue depth data field · b63c5478
      Justin Iurman authored
      v3:
       - Report 'backlog' (bytes) instead of 'qlen' (number of packets)
      
      v2:
       - Fix sparse warning (use rcu_dereference)
      
      This patch adds support for the queue depth in IOAM trace data fields.
      
      The draft [1] says the following:
      
         The "queue depth" field is a 4-octet unsigned integer field.  This
         field indicates the current length of the egress interface queue of
         the interface from where the packet is forwarded out.  The queue
         depth is expressed as the current amount of memory buffers used by
         the queue (a packet could consume one or more memory buffers,
         depending on its size).
      
      An existing function (i.e., qdisc_qstats_qlen_backlog) is used to
      retrieve the current queue length without reinventing the wheel.
      
      Note: this was tested, and qlen increases when an artificial delay is
      added on the egress with tc.
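To make the field layout concrete, here is a minimal Python sketch that packs a queue backlog into a 4-octet unsigned integer. Network byte order and saturation at the 32-bit maximum are assumptions of this example, and `encode_queue_depth` is a hypothetical helper, not a kernel function.

```python
import struct

U32_MAX = 0xFFFFFFFF


def encode_queue_depth(backlog_bytes):
    """Pack a backlog (in bytes) into a 4-octet unsigned field."""
    # Assumption: values that do not fit in 32 bits are clamped.
    value = min(backlog_bytes, U32_MAX)
    return struct.pack("!I", value)  # "!" selects network byte order


field = encode_queue_depth(3000)
print(field.hex())  # 00000bb8
```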
      
        [1] https://datatracker.ietf.org/doc/html/draft-ietf-ippm-ioam-data#section-5.4.2.7
      
      
      
      Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b63c5478
    • net/smc: remove redundant re-assignment of pointer link · 3a856c14
      Colin Ian King authored
      The pointer link is being re-assigned the same value that it was
      initialized with in the previous declaration statement. The
      re-assignment is redundant and can be removed.
      
      Fixes: 387707fd ("net/smc: convert static link ID to dynamic references")
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3a856c14
    • net/smc: Introduce TCP ULP support · d7cd421d
      Tony Lu authored
      
      
      This implements TCP ULP for SMC, which helps applications replace TCP
      with the SMC protocol in place. We use it to implement transparent
      replacement.
      
      This replaces the original TCP socket with SMC and reuses the TCP socket
      as the clcsock when setsockopt is called with the TCP_ULP option, without
      any overhead.
      
      To replace TCP sockets with SMC, there are two approaches:
      
      - use the setsockopt() syscall with the TCP_ULP option; on error, it
        falls back to TCP.
      
      - use a BPF prog of type BPF_CGROUP_INET_SOCK_CREATE or others to
        replace transparently. BPF hooks some points in socket creation, bind
        and others; users can inject their BPF logic without modifying their
        applications, and choose which connections should be replaced with SMC
        by calling setsockopt() in the BPF prog, based on rules such as TCP
        tuples, PID, cgroup, etc.
      
        BPF doesn't support calling setsockopt with TCP_ULP now; I will send the
        patches after this is accepted.
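The first approach can be sketched from user space in Python. This is a hedged example, not part of the patch: the `TCP_ULP` value 31 is taken from Linux's `<linux/tcp.h>`, the ULP name `"smc"` is assumed to be the name registered by this patch, and `open_smc_or_tcp` is a hypothetical helper. On kernels without SMC ULP support (or on non-Linux systems) the call is expected to fail and the socket stays plain TCP.

```python
import socket

# Linux socket option number from <linux/tcp.h>; defined here because
# not every Python version exposes it by name.
TCP_ULP = 31


def open_smc_or_tcp():
    """Create a TCP socket and try to switch it to SMC via TCP_ULP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Assumption: the SMC ULP is registered under the name "smc".
        sock.setsockopt(socket.IPPROTO_TCP, TCP_ULP, b"smc")
        used_smc = True
    except OSError:
        # ULP not available: keep using the socket as ordinary TCP.
        used_smc = False
    return sock, used_smc


sock, used_smc = open_smc_or_tcp()
print("SMC via ULP" if used_smc else "plain TCP")
sock.close()
```

The option has to be set before the socket is connected or put into the listening state, which is why the helper applies it right after creation.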
      
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d7cd421d
    • Merge branch 'smc-RDMA-net-namespace' · ab6dd952
      David S. Miller authored
      
      
      Tony Lu says:
      
      ====================
      RDMA device net namespace support for SMC
      
      This patch set introduces net namespace support for linkgroups.
      
      Patch 1 is the main approach to implement net ns support.
      
      Patches 2 - 4 are the additional modifications to let us know the netns.
      Also, I will submit changes of smc-tools to github later.
      
      Currently, smc doesn't support net namespace isolation. The ibdevs
      registered to smc are shared by all linkgroups and connections. When
      running applications in different net namespaces, such as in a container
      environment, applications should only use the ibdevs that belong to the
      same net namespace.
      
      This adds a new field, net, to the smc linkgroup struct. During first
      contact, it looks for a linkgroup that has the same net namespace; if
      none is found, it creates one and initializes the net field with the first
      link's ibdev net namespace. When finding the rdma devices, it also checks
      the sk net device's and ibdev's net namespaces. After a net namespace is
      destroyed, the net device and ibdev move to the root net namespace,
      the linkgroups won't be matched, and they wait for lgr free.
      
      If rdma net namespace exclusive mode is not enabled, it behaves as
      before.
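The matching rule described above can be modeled in a few lines of Python. This is a simplified sketch: `LinkGroup`, `LinkGroupTable` and `find_or_create` are hypothetical names, namespaces are represented as plain strings, and the shared (non-exclusive) mode is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LinkGroup:
    lg_id: int
    net: str  # net namespace of the first link's ibdev


@dataclass
class LinkGroupTable:
    groups: List[LinkGroup] = field(default_factory=list)

    def find_or_create(self, sock_net: str, ibdev_net: str) -> Optional[LinkGroup]:
        """Reuse a linkgroup from the same namespace, else create one."""
        # The ibdev must live in the same namespace as the socket.
        if sock_net != ibdev_net:
            return None
        for lgr in self.groups:
            if lgr.net == ibdev_net:
                return lgr
        lgr = LinkGroup(lg_id=len(self.groups) + 1, net=ibdev_net)
        self.groups.append(lgr)
        return lgr


table = LinkGroupTable()
first = table.find_or_create("test1", "test1")
again = table.find_or_create("test1", "test1")
other = table.find_or_create("test2", "test2")
print(first is again, first is other)  # True False
```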
      
      Steps to enable and test net namespaces:
      
      1. enable RDMA device net namespace exclusive support
      	rdma system set netns exclusive # default is shared
      
      2. create new net namespace, move and initialize them
      	ip netns add test1
      	rdma dev set mlx5_1 netns test1
      	ip link set dev eth2 netns test1
      	ip netns exec test1 ip link set eth2 up
      	ip netns exec test1 ip addr add ${HOST_IP}/26 dev eth2
      
      3. setup server and client, connect N <-> M
      	ip netns exec test1 smc_run sockperf server --tcp # server
      	ip netns exec test1 smc_run sockperf pp --tcp -i ${SERVER_IP} # client
      
      4. netns isolated linkgroups (2 * 2 mesh) with their own linkgroups
        - server
      LG-ID    LG-Role  LG-Type  VLAN  #Conns  PNET-ID
      00000100 SERV     SINGLE      0       0
      00000200 SERV     SINGLE      0       0
      00000300 SERV     SINGLE      0       0
      00000400 SERV     SINGLE      0       0
      
        - client
      LG-ID    LG-Role  LG-Type  VLAN  #Conns  PNET-ID
      00000100 CLNT     SINGLE      0       0
      00000200 CLNT     SINGLE      0       0
      00000300 CLNT     SINGLE      0       0
      00000400 CLNT     SINGLE      0       0
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ab6dd952
    • net/smc: Add net namespace for tracepoints · a838f508
      Tony Lu authored
      
      
      This prints the net namespace ID, which helps us distinguish different
      net namespaces when using tracepoints.
      
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a838f508
    • net/smc: Print net namespace in log · de2fea7b
      Tony Lu authored
      
      
      This adds the net namespace ID to the kernel log; net_cookie is unique in
      the whole system. It is useful in container environments.
      
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      de2fea7b
    • net/smc: Add netlink net namespace support · 79d39fc5
      Tony Lu authored
      
      
      This adds the net namespace ID to the diag of a linkgroup, which helps us
      distinguish different namespaces; net_cookie is unique in the whole system.
      
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      79d39fc5
    • net/smc: Introduce net namespace support for linkgroup · 0237a3a6
      Tony Lu authored
      
      
      Currently, rdma devices support exclusive net namespace isolation,
      however linkgroups don't know about or support the ibdev net namespace.
      Applications in containers don't want to share the nics if we
      enable rdma exclusive mode. Every net namespace should have its own
      linkgroups.
      
      This patch introduces a new field, net, for the linkgroup, which stands
      for the ibdev net namespace in the linkgroup. The net in the linkgroup is
      initialized with the net namespace of the link's ibdev. It compares the net
      of the linkgroup with that of the sock or ibdev before choosing it; if none
      matches, it creates a new one in the current net namespace. If rdma net
      namespace exclusive mode is not enabled, it behaves as before.
      
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0237a3a6
  4. Dec 31, 2021
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · e63a0234
      David S. Miller authored
      
      
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf-next 2021-12-30
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      We've added 72 non-merge commits during the last 20 day(s) which contain
      a total of 223 files changed, 3510 insertions(+), 1591 deletions(-).
      
      The main changes are:
      
      1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii.
      
      2) Beautify and de-verbose verifier logs, from Christy.
      
      3) Composable verifier types, from Hao.
      
      4) bpf_strncmp helper, from Hou.
      
      5) bpf.h header dependency cleanup, from Jakub.
      
      6) get_func_[arg|ret|arg_cnt] helpers, from Jiri.
      
      7) Sleepable local storage, from KP.
      
      8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e63a0234
    • Merge tag 'mlx5-updates-2021-12-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · ce2b6eb4
      David S. Miller authored
      
      
      Saeed Mahameed says:
      
      ====================
      mlx5 Software steering, New features and optimizations
      
      This patch series brings various SW steering features, optimizations and
      debug-ability focused improvements.
      
       1) Expose debugfs for dumping the SW steering resources
       2) Removing unused fields
       3) support for matching on new fields
       4) steering optimization for RX/TX-only rules
       5) Make Software steering the default steering mechanism when
          available, applies only to Switchdev mode FDB
      
      From Yevgeny Kliteynik and Muhammad Sammar:
      
       - Patch 1 fixes an error flow in creating matchers
       - Patch 2 fixes the lower case macro prefix "mlx5_" to "MLX5_"
       - Patch 3 removes unused struct member in mlx5dr_matcher
       - Patch 4 renames list field in matcher struct to list_node to reflect the
         fact that this field is for a list node that is stored on another struct's lists
       - Patch 5 adds checking for valid Flex parser ID value
       - Patch 6 adds the missing reserved fields to dr_match_param and aligns it to
         the format that is defined by HW spec
       - Patch 7 adds support for dumping SW steering (SMFS) resources using debugfs
         in CSV format: domain and its tables, matchers and rules
       - Patch 8 adds support for a new destination type - UPLINK
       - Patch 9 adds WARN_ON_ONCE on refcount checks in SW steering object destructors
       - Patches 10, 11, 12 add misc5 flow table match parameters and add support for
         matching on tunnel headers 0 and 1
       - Patch 13 adds support for matching on geneve_tlv_option_0_exist field
       - Patch 14 implements performance optimization for empty or RX/TX-only
         matchers by splitting RX and TX matchers handling: matcher connection in the
         matchers chain is split into two separate lists (RX only and TX only), which
         solves a usecase of many RX or TX only rules that create a long chain of
         RX/TX-only paths w/o the actual rules
       - Patch 15 ignores modify TTL if device doesn't support it instead of
         adding an unsupported action
       - Patch 16 sets SMFS as a default steering mode
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ce2b6eb4
    • Merge branch 'hnsd3-next' · 20a9013e
      David S. Miller authored
      
      
      Guangbin Huang says:
      
      ====================
      net: hns3: refactor cmdq functions in PF/VF
      
      Currently, the hns3 PF and VF modules have two sets of cmdq APIs to provide
      cmdq message interaction functions. Most of these APIs are the same. The
      only differences are the function variables and the names with pf and vf
      suffixes. These two sets of cmdq APIs are redundant and add extra bug-fix
      work.
      
      This series refactors the cmdq APIs in hns3 PF and VF by implementing one
      set of common cmdq APIs for PF and VF to reuse and deleting the old APIs.
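The shape of this refactoring, including the thin wrappers that later patches in the series keep as "seam" APIs, can be illustrated with a short Python sketch. Everything here is hypothetical: the function names and the descriptor dictionary are invented for the example and do not mirror the hns3 code.

```python
def comm_cmd_setup_basic_desc(opcode, is_read):
    """Single shared implementation used by both PF and VF."""
    return {"opcode": opcode, "flag": "read" if is_read else "write"}


def pf_cmd_setup_basic_desc(opcode, is_read):
    """PF-named wrapper kept so existing PF callers need no renaming."""
    return comm_cmd_setup_basic_desc(opcode, is_read)


def vf_cmd_setup_basic_desc(opcode, is_read):
    """VF-named wrapper kept so existing VF callers need no renaming."""
    return comm_cmd_setup_basic_desc(opcode, is_read)


pf_desc = pf_cmd_setup_basic_desc(0x0001, True)
vf_desc = vf_cmd_setup_basic_desc(0x0001, True)
print(pf_desc == vf_desc)  # True
```

A bug fix in the shared function now reaches both callers, which is the maintenance benefit the series describes.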
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      20a9013e
    • net: hns3: delete the hclge_cmd.c and hclgevf_cmd.c · aab8d1c6
      Jie Wang authored
      
      
      Currently, most cmdq APIs are unified in hclge_comm_cmd.c. Newly developed
      cmdq APIs should also be placed in hclge_comm_cmd.c, so there is no need to
      keep hclge_cmd.c and hclgevf_cmd.c.
      
      This patch moves hclge(vf)_cmd_send to hclge(vf)_main.c and deletes
      the source files and makefile scripts.
      
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aab8d1c6
    • net: hns3: refactor VF cmdq init and uninit APIs with new common APIs · cb413bfa
      Jie Wang authored
      
      
      This patch uses the common cmdq init and uninit APIs to replace the old APIs
      in the VF cmdq module's init and uninit code. The old VF init and uninit
      APIs are then deleted.
      
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cb413bfa
    • net: hns3: refactor PF cmdq init and uninit APIs with new common APIs · 8e2288ca
      Jie Wang authored
      
      
      This patch uses the common cmdq init and uninit APIs to replace the old APIs
      in the PF cmdq module's init and uninit code. The old PF init and uninit
      APIs are then deleted.
      
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8e2288ca
    • net: hns3: create common cmdq init and uninit APIs · 0b04224c
      Jie Wang authored
      
      
      The PF and VF cmdq init and uninit APIs are also almost the same except for
      the suffixes of the API names.
      
      This patch creates the common cmdq init and uninit APIs needed by the PF and
      VF cmdq modules. The next patch will use the new unified APIs to replace the
      init and uninit APIs in the PF module.
      
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0b04224c
    • net: hns3: refactor VF cmdq resource APIs with new common APIs · 745f0a19
      Jie Wang authored
      
      
      This patch uses the common cmdq resource allocate/free/query APIs to replace
      the old APIs in the VF cmdq module and deletes the old cmdq resource APIs.
      We still keep the hclgevf_cmd_setup_basic_desc name as a seam API to avoid
      too many meaningless replacements.
      
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      745f0a19
    • net: hns3: refactor PF cmdq resource APIs with new common APIs · d3c69a88
      Jie Wang authored
      
      
      This patch uses the common cmdq resource allocate/free/query APIs to replace
      the old APIs in the PF cmdq module and deletes the old cmdq resource APIs.
      We still keep the hclge_cmd_setup_basic_desc name as a seam API to avoid
      too many meaningless replacements.
      
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3c69a88