Commits · 28044fc1d4953b07acec0da4d2fc4784c57ea6fb · Mirrors / git.yoctoproject.org / linux-yocto

Aug 25, 2022

net: Add a bhash2 table hashed by port and address · 28044fc1

Joanne Koong authored Aug 22, 2022



The current bind hashtable (bhash) is hashed by port only.
In the socket bind path, we have to check for bind conflicts by
traversing the specified port's inet_bind_bucket while holding the
hashbucket's spinlock (see inet_csk_get_port() and
inet_csk_bind_conflict()). In instances where there are tons of
sockets hashed to the same port at different addresses, the bind
conflict check is time-intensive and can cause softirq CPU lockups,
as well as stop new TCP connections, since __inet_inherit_port()
also contends for the spinlock.

This patch adds a second bind table, bhash2, that hashes by
port and sk->sk_rcv_saddr (ipv4) and sk->sk_v6_rcv_saddr (ipv6).
Searching the bhash2 table leads to significantly faster conflict
resolution and less time holding the hashbucket spinlock.
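
To illustrate why the second table helps, here is a minimal user-space sketch in Python. It is not the kernel implementation: the class `BindTables`, its method names, and the conflict rule (an exact address match only, ignoring SO_REUSEADDR, SO_REUSEPORT and wildcard binds) are simplifying assumptions made for the example.

```python
from collections import defaultdict


class BindTables:
    """Toy model of the two bind tables (not kernel code)."""

    def __init__(self):
        self.bhash = defaultdict(list)   # port -> bound addresses
        self.bhash2 = defaultdict(list)  # (port, addr) -> bound addresses

    def bind(self, port, addr):
        self.bhash[port].append(addr)
        self.bhash2[(port, addr)].append(addr)

    def scan_bhash(self, port, addr):
        """Port-only lookup: may walk every socket bound to the port."""
        visited = 0
        for bound in self.bhash.get(port, []):
            visited += 1
            if bound == addr:
                return True, visited
        return False, visited

    def scan_bhash2(self, port, addr):
        """Port+address lookup: walks only sockets with the same address."""
        visited = 0
        for bound in self.bhash2.get((port, addr), []):
            visited += 1
            if bound == addr:
                return True, visited
        return False, visited


tables = BindTables()
for i in range(1000):
    tables.bind(443, f"10.0.{i // 256}.{i % 256}")

# A new address on the same port: no conflict either way, but the
# number of entries walked while the bucket lock is held differs.
print(tables.scan_bhash(443, "192.0.2.1"))   # (False, 1000)
print(tables.scan_bhash2(443, "192.0.2.1"))  # (False, 0)
```

The model keeps both tables on purpose: a bind to a wildcard address would still have to walk the per-port list, so the port-only table cannot be dropped.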

Please note a few things:
* A socket's address can change after it has been bound. This happens
in two cases:

  1) The case where there is a bind() call on INADDR_ANY (ipv4) or
  IPV6_ADDR_ANY (ipv6) and then a connect() call. The kernel will
  assign the socket an address when it handles the connect().

  2) In inet_sk_reselect_saddr(), which is called when rebuilding the
  sk header and a few pre-conditions are met (eg rerouting fails).

In these two cases, we need to update the bhash2 table by removing the
entry for the old address, and add a new entry reflecting the updated
address.

* The bhash2 table must have its own lock, even though concurrent
accesses on the same port are protected by the bhash lock. Bhash2 must
have its own lock to protect against cases where sockets on different
ports hash to different bhash hashbuckets but to the same bhash2
hashbucket.

This brings up a few stipulations:
  1) When acquiring both the bhash and the bhash2 lock, the bhash2 lock
  will always be acquired after the bhash lock and released before the
  bhash lock is released.

  2) There are no nested bhash2 hashbucket locks. A bhash2 lock is always
  acquired+released before another bhash2 lock is acquired+released.

* The bhash table cannot be superseded by the bhash2 table because for
bind requests on INADDR_ANY (ipv4) or IPV6_ADDR_ANY (ipv6), every socket
bound to that port must be checked for a potential conflict. The bhash
table is the only source of port->socket associations.
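
The address-update case and the two locking stipulations can be sketched together in Python. This is only one way to satisfy the rules stated above, under assumed names (`Bucket`, `update_saddr`, and the `trace` list are hypothetical and exist only to make the lock order visible); it is not the kernel's code.

```python
import threading


class Bucket:
    """One hash bucket: a lock plus the sockets chained in it."""

    def __init__(self, name, trace):
        self.name = name
        self.trace = trace            # records lock/unlock order for inspection
        self.lock = threading.Lock()
        self.socks = set()

    def __enter__(self):
        self.lock.acquire()
        self.trace.append(f"lock {self.name}")
        return self

    def __exit__(self, *exc):
        self.trace.append(f"unlock {self.name}")
        self.lock.release()
        return False


def update_saddr(sock, bhash_bucket, old_b2, new_b2):
    """Move sock from the old-address bhash2 bucket to the new-address one."""
    with bhash_bucket:        # bhash lock first, held across the whole update
        with old_b2:          # first bhash2 lock is taken and released...
            old_b2.socks.discard(sock)
        with new_b2:          # ...before the second bhash2 lock is taken
            new_b2.socks.add(sock)


trace = []
bhash = Bucket("bhash", trace)
old_b2 = Bucket("bhash2-old", trace)
new_b2 = Bucket("bhash2-new", trace)
old_b2.socks.add("sk")
update_saddr("sk", bhash, old_b2, new_b2)
print(trace)
```

The trace shows the bhash lock wrapping both bhash2 critical sections, and the two bhash2 locks never held at the same time.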

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

28044fc1

netlink: fix some kernel-doc comments · 0bf73255

Zhengchao Shao authored Aug 24, 2022

Modify the comment of input parameter of nlmsg_ and nla_ function.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20220824013621.365103-1-shaozhengchao@huawei.com

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

0bf73255

net: ethernet: ti: davinci_mdio: fix build for mdio bitbang uses · 35bbe652

Randy Dunlap authored Aug 23, 2022

davinci_mdio.c uses mdio bitbang APIs, so it should select
MDIO_BITBANG to prevent build errors.

arm-linux-gnueabi-ld: drivers/net/ethernet/ti/davinci_mdio.o: in function `davinci_mdio_remove':
drivers/net/ethernet/ti/davinci_mdio.c:649: undefined reference to `free_mdio_bitbang'
arm-linux-gnueabi-ld: drivers/net/ethernet/ti/davinci_mdio.o: in function `davinci_mdio_probe':
drivers/net/ethernet/ti/davinci_mdio.c:545: undefined reference to `alloc_mdio_bitbang'
arm-linux-gnueabi-ld: drivers/net/ethernet/ti/davinci_mdio.o: in function `davinci_mdiobb_read':
drivers/net/ethernet/ti/davinci_mdio.c:236: undefined reference to `mdiobb_read'
arm-linux-gnueabi-ld: drivers/net/ethernet/ti/davinci_mdio.o: in function `davinci_mdiobb_write':
drivers/net/ethernet/ti/davinci_mdio.c:253: undefined reference to `mdiobb_write'

Fixes: d04807b8 ("net: ethernet: ti: davinci_mdio: Add workaround for errata i2329")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Ravi Gunasekaran <r-gunasekaran@ti.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Sudip Mukherjee (Codethink) <sudipm.mukherjee@gmail.com>
Link: https://lore.kernel.org/r/20220824024216.4939-1-rdunlap@infradead.org

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

35bbe652

Documentation: sysctl: align cells in second content column · 1faa3467

Bagas Sanjaya authored Aug 24, 2022

Stephen Rothwell reported htmldocs warning when merging net-next tree:

Documentation/admin-guide/sysctl/net.rst:37: WARNING: Malformed table.
Text in column margin in table line 4.

========= =================== = ========== ==================
Directory Content               Directory  Content
========= =================== = ========== ==================
802       E802 protocol         mptcp     Multipath TCP
appletalk Appletalk protocol    netfilter Network Filter
ax25      AX25                  netrom     NET/ROM
bridge    Bridging              rose      X.25 PLP layer
core      General parameter     tipc      TIPC
ethernet  Ethernet protocol     unix      Unix domain sockets
ipv4      IP version 4          x25       X.25 protocol
ipv6      IP version 6
========= =================== = ========== ==================

The warning above is caused by cells in second "Content" column of
/proc/sys/net subdirectory table which are in column margin.

Align these cells against the column header to fix the warning.

Link: https://lore.kernel.org/linux-next/20220823134905.57ed08d5@canb.auug.org.au/
Fixes: 1202cdd6 ("Remove DECnet support from kernel")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20220824035804.204322-1-bagasdotme@gmail.com


Signed-off-by: Jakub Kicinski <kuba@kernel.org>

1faa3467

Aug 24, 2022

Merge branch 'r8169-next' · 8357d67f

David S. Miller authored Aug 24, 2022



Heiner Kallweit says:

====================
r8169: remove support for a few unused chip versions

There are a number of chip versions that apparently never made it to the
mass market. Detection of these chip versions has been disabled for
a few kernel versions now and nobody complained. Therefore remove
support for these chip versions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

8357d67f

r8169: remove support for chip version 60 · efc37109

Heiner Kallweit authored Aug 23, 2022



Detection of this chip version has been disabled for a few kernel versions now.
Nobody complained, so remove support for this chip version.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

efc37109

r8169: remove support for chip version 50 · 133706a9

Heiner Kallweit authored Aug 23, 2022



Detection of this chip version has been disabled for a few kernel versions now.
Nobody complained, so remove support for this chip version.

v3:
- rebase patch

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

133706a9

r8169: remove support for chip version 49 · 8a1ab0c4

Heiner Kallweit authored Aug 23, 2022



Detection of this chip version has been disabled for a few kernel versions now.
Nobody complained, so remove support for this chip version.

v2:
- fix a typo: RTL_GIGA_MAC_VER_40 -> RTL_GIGA_MAC_VER_50

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a1ab0c4

r8169: remove support for chip versions 45 and 47 · ebe59898

Heiner Kallweit authored Aug 23, 2022



Detection of these chip versions has been disabled for a few kernel versions now.
Nobody complained, so remove support for these chip versions.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ebe59898

r8169: remove support for chip version 41 · 44307b27

Heiner Kallweit authored Aug 23, 2022



Detection of this chip version has been disabled for a few kernel versions now.
Nobody complained, so remove support for this chip version.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

44307b27

Merge tag 'mlx5-updates-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 1cd5ea44

David S. Miller authored Aug 24, 2022

mlx5-updates-2022-08-22

Roi Dayan Says:
===============
Add support for SF tunnel offload

Mlx5 driver only supports VF tunnel offload.
To add support for SF tunnel offload the driver needs to:
1. Add send-to-vport metadata matching rules like done for VFs.
2. Set an indirect table for SF vport, same as VF vport.

info smaller sub functions for better maintainability.

rules from esw init phase to representor load phase.
SFs could be created after esw initialized and thus the send-to-vport
meta rules would not be created for those SFs.
By moving the creation of the rules to representor load phase
we ensure creating the rules also for SFs created later.

===============

Lama Kayal Says:
================
Make flow steering API loosely coupled from mlx5e_priv, in a manner to
introduce more readable and maintainable modules.

Make TC's private, let mlx5e_flow_steering struct be dynamically allocated,
and introduce its API to maintain the code via setters and getters
instead of publicly exposing it.

Introduce flow steering debug macros to provide an elegant finish to the
decoupled flow steering API, where errors related to flow steering shall
be reported via them.

All flow steering related files will drop any coupling to mlx5e_priv,
instead they will get the relevant members as input. Among these,
fs_tt_redirect, fs_tc, and arfs.
================

1cd5ea44

micrel: ksz8851: fixes struct pointer issue · fef5de75

Jerry Ray authored Aug 22, 2022



Issue found during code review. This bug has no impact as long as the
ks8851_net structure is the first element of the ks8851_net_spi structure.
As long as the offset to the ks8851_net struct is zero, the container_of()
macro is subtracting 0 and therefore no damage done. But if the
ks8851_net_spi struct is ever modified such that the ks8851_net struct
within it is no longer the first element of the struct, then the bug would
manifest itself and cause problems.

struct ks8851_net is contained within ks8851_net_spi.
ks is contained within kss.
kss is the priv_data of the netdev structure.
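
The offset argument can be checked with a small Python `ctypes` sketch. The struct fields below are invented stand-ins (only the nesting of ks8851_net inside ks8851_net_spi comes from the description above), and `container_of` here is a hypothetical Python helper mimicking the C macro's pointer arithmetic.

```python
import ctypes


class Ks8851Net(ctypes.Structure):
    """Stand-in for struct ks8851_net; the field is made up for the demo."""
    _fields_ = [("rc_ier", ctypes.c_uint16)]


class Ks8851NetSpi(ctypes.Structure):
    """Layout where the ks8851_net member is the first element."""
    _fields_ = [("ks8851", Ks8851Net), ("spi_cookie", ctypes.c_void_p)]


class Ks8851NetSpiReordered(ctypes.Structure):
    """Hypothetical future layout with another member placed first."""
    _fields_ = [("spi_cookie", ctypes.c_void_p), ("ks8851", Ks8851Net)]


def container_of(member_addr, outer_type, member_name):
    """Mimic the C macro: subtract the member's offset from its address."""
    return member_addr - getattr(outer_type, member_name).offset


def roundtrip_is_safe(outer_type):
    """Misuse the outer struct's address as the member address, then
    apply container_of(); this is harmless only when the offset is zero."""
    outer = outer_type()
    priv = ctypes.addressof(outer)
    return container_of(priv, outer_type, "ks8851") == priv


print(Ks8851NetSpi.ks8851.offset)                  # 0
print(roundtrip_is_safe(Ks8851NetSpi))             # True
print(roundtrip_is_safe(Ks8851NetSpiReordered))    # False
```

With the member first, the mistaken pointer and the correct one coincide; once the member moves, the same code yields an address outside the structure.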

Signed-off-by: Jerry Ray <jerry.ray@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fef5de75

tcp: annotate data-race around tcp_md5sig_pool_populated · aacd467c

Eric Dumazet authored Aug 22, 2022



tcp_md5sig_pool_populated can be read while another thread
changes its value.

The race has no consequence because allocations
are protected with tcp_md5sig_mutex.

This patch adds READ_ONCE() and WRITE_ONCE() to document
the race and silence KCSAN.

Reported-by: Abhishek Shah <abhishek.shah@columbia.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aacd467c

net: marvell: prestera: implement br_port_locked flag offloading · 73ef239c

Oleksandr Mazur authored Aug 22, 2022



Offloading of the br_port_locked flag is supported for both <port> and
<lag> interfaces. No new ABI is being added; rather, the existing
(port_param_set) API call gets extended.

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>

V2:
  add missing recipients (linux-kernel, netdev)
Signed-off-by: David S. Miller <davem@davemloft.net>

73ef239c

Merge branch 'j7200-support' · 0d0f034d

David S. Miller authored Aug 24, 2022

Siddharth Vadapalli says:

====================
J7200: CPSW5G: Add support for QSGMII mode to am65-cpsw driver

Add support for QSGMII mode to am65-cpsw driver.

Change log:

v4-> v5:
1. Move ti,j7200-cpswxg-nuss compatible to the line above the
   ti,j721e-cpsw-nuss compatible.
2. Add allOf and move if-then statements within it to allow future if-then
   statements to be added easily.

v3 -> v4:
1. Update bindings to disallow ports based on compatible, instead of
   adding a new if/then statement for the new compatible.
2. Add Else-If condition for RMII mode in the set of supported interfaces.
   Support for RMII mode is already present in the driver and I had
   missed out adding a condition for RMII mode in the previous patches.

v2 -> v3:
1. In ti,k3-am654-cpsw-nuss.yaml, restrict if/then statement to port
   nodes.

v1 -> v2:
1. Add new compatible for CPSW5G in ti,k3-am654-cpsw-nuss.yaml and extend
   properties for new compatible.
2. Add extra_modes member to struct am65_cpsw_pdata to be used for QSGMII
   mode by new compatible.
3. Add check for phylink supported modes to ensure that only one phy mode
   is advertised as supported.
4. Check if extra_modes supports QSGMII mode in am65_cpsw_nuss_mac_config()
   for register write.
5. Add check for assigning port->sgmii_base only when extra_modes is valid.

v4: https://lore.kernel.org/r/20220816060139.111934-1-s-vadapalli@ti.com/
v3: https://lore.kernel.org/r/20220606110443.30362-1-s-vadapalli@ti.com/
v2: https://lore.kernel.org/r/20220602114558.6204-1-s-vadapalli@ti.com/
v1: https://lore.kernel.org/r/20220531113058.23708-1-s-vadapalli@ti.com/


====================

Signed-off-by: David S. Miller <davem@davemloft.net>

0d0f034d

net: ethernet: ti: am65-cpsw: Move phy_set_mode_ext() to correct location · 763015a7

Siddharth Vadapalli authored Aug 22, 2022

In TI's J7200 SoC CPSW5G ports, each of the 4 ports can be configured
as a QSGMII main or QSGMII-SUB port. This configuration is performed
by phy-gmii-sel driver on invoking the phy_set_mode_ext() function.

It is necessary for the QSGMII main port to be configured before any of
the QSGMII-SUB interfaces are brought up. Currently, the QSGMII-SUB
interfaces come up before the QSGMII main port is configured.

Fix this by moving the call to phy_set_mode_ext() from
am65_cpsw_nuss_ndo_slave_open() to am65_cpsw_nuss_init_slave_ports(),
thereby ensuring that the QSGMII main port is configured before any of
the QSGMII-SUB ports are brought up.

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

763015a7

net: ethernet: ti: am65-cpsw: Add support for J7200 CPSW5G · 37184fc1

Siddharth Vadapalli authored Aug 22, 2022



CPSW5G in J7200 supports additional modes like QSGMII and SGMII.
Add new compatible for J7200 and enable QSGMII mode in am65-cpsw driver.

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

37184fc1

dt-bindings: net: ti: k3-am654-cpsw-nuss: Update bindings for J7200 CPSW5G · d9849516

Siddharth Vadapalli authored Aug 22, 2022



Update bindings for TI K3 J7200 SoC which contains 5 ports (4 external
ports) CPSW5G module and add compatible for it.

Changes made:
    - Add new compatible ti,j7200-cpswxg-nuss for CPSW5G.
    - Extend pattern properties for new compatible.
    - Change maximum number of CPSW ports to 4 for new compatible.

Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d9849516

net: skb: prevent the split of kfree_skb_reason() by gcc · c205cc75

Menglong Dong authored Aug 21, 2022



Sometimes, gcc will optimize a function by splitting it into two or
more functions. In this case, kfree_skb_reason() is split into
kfree_skb_reason and kfree_skb_reason.part.0. However, the
function/tracepoint trace_kfree_skb() in it needs the return address
of kfree_skb_reason().

This split makes the call chain become:
  kfree_skb_reason() -> kfree_skb_reason.part.0 -> trace_kfree_skb()

which makes the return address that is passed to trace_kfree_skb() be
kfree_skb().

Therefore, introduce '__fix_address', which is the combination of
'__noclone' and 'noinline', and apply it to kfree_skb_reason() to
prevent it from being split or inlined.

(Is it better to simply apply '__noclone noinline' to kfree_skb_reason?
I'm thinking maybe other functions have the same problems)

Meanwhile, wrap 'skb_unref()' with 'unlikely()', as the compiler thinks
it is likely to return true and splits kfree_skb_reason().

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c205cc75

Merge branch 'add-interface-mode-select-and-rmii' · fa2bc962

Jakub Kicinski authored Aug 23, 2022

Wei Fang says:

====================
add interface mode select and RMII

From: Wei Fang <wei.fang@nxp.com>

The patches add support for the following features on both TJA1100 and
TJA1101 PHY cards:
- Add MII and RMII mode support.
- Add REF_CLK input/output support for RMII mode.
====================

Link: https://lore.kernel.org/r/20220822015949.1569969-1-wei.fang@nxp.com


Signed-off-by: Jakub Kicinski <kuba@kernel.org>

fa2bc962

net: phy: tja11xx: add interface mode and RMII REF_CLK support · 60ddc78d

Wei Fang authored Aug 22, 2022



Add support for the following features on both TJA1100 and TJA1101 cards:
- Add MII and RMII mode support.
- Add REF_CLK input/output support for RMII mode.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

60ddc78d

dt-bindings: net: tja11xx: add nxp,refclk_in property · 52b2fe45

Wei Fang authored Aug 22, 2022



TJA110x REF_CLK can be configured as the interface reference clock
input or output when RMII mode is enabled. This patch adds the
property to make REF_CLK configurable.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

52b2fe45

Merge branch 'mlxsw-introduce-modular-system-support-by-minimal-driver' · 3de1484b

Jakub Kicinski authored Aug 23, 2022

Petr Machata says:

====================
mlxsw: Introduce modular system support by minimal driver

Vadim Pasternak writes:

This patchset adds line cards support in mlxsw_minimal, which is used
for monitoring purposes on BMC systems. The BMC is connected to the
ASIC over I2C bus, unlike the host CPU that is connected to the ASIC
via PCI bus.

The BMC system needs to be notified whenever line cards become active
or inactive, so that, for example, netdevs will be registered /
unregistered by mlxsw_minimal. However, traps cannot be generated
towards the BMC over the I2C bus. To overcome that, the I2C bus driver
(i.e., mlxsw_i2c) registers a handler for an IRQ that is fired upon
specific system wide changes, like line card activation and
deactivation.

The generated event is handled by mlxsw_core, which checks whether
anything changed in the state of available line cards. If a line card
becomes active or inactive, interested parties such as mlxsw_minimal
are notified via their registered line card event callback.

Patch set overview:

Patch #1 is preparation.

Patches #2-#3 extend mlxsw_core with an infrastructure to handle the
	previously mentioned system events.

Patch #4 extends the I2C bus driver to register a handler for the IRQ
	fired upon specific system wide changes.

Patches #5-#8 gradually add line cards support in mlxsw_minimal by
	dynamically registering / unregistering netdevs for ports found on
	line cards, whenever a line card becomes active / inactive.
====================

Link: https://lore.kernel.org/r/cover.1661093502.git.petrm@nvidia.com


Signed-off-by: Jakub Kicinski <kuba@kernel.org>

3de1484b

mlxsw: minimal: Extend to support line card dynamic operations · 706ddb78

Vadim Pasternak authored Aug 21, 2022



Implement line card operation callbacks got_active() / got_inactive().
The purpose of these callbacks is to create / remove line card ports after
a line card becomes active / inactive.

Implement line ports_remove_selected() callback to support line card
un-provisioning flow through 'devlink'.

Add line card operation registration and de-registration APIs.

Add module offset for line card. Offset for main board is zero.
For a line card in slot #n, the offset is calculated as (#n - 1) multiplied
by the maximum number of modules.
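
The offset rule can be written out as a short Python sketch. The function name `module_offset`, the parameter `max_modules_per_slot`, and the convention that slot index 0 denotes the main board are assumptions made for illustration.

```python
def module_offset(slot_index, max_modules_per_slot):
    """Return the module offset for a slot, per the rule described above.

    slot_index 0 is taken to mean the main board (an assumption of this
    sketch); line cards occupy slots 1..N.
    """
    if slot_index < 0:
        raise ValueError("slot_index must be non-negative")
    if slot_index == 0:
        return 0
    return (slot_index - 1) * max_modules_per_slot


for slot in range(4):
    print(slot, module_offset(slot, 32))
```

Note that under this rule the main board and the line card in slot 1 both map to offset 0.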

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

706ddb78

mlxsw: minimal: Extend module to port mapping with slot index · 01328e23

Vadim Pasternak authored Aug 21, 2022



The interfaces for ports found on a line card are created and removed
dynamically when the line card becomes active or inactive.

Introduce per line card array with module to port mapping.
For each port get 'slot_index' through PMLP register and set port
mapping for the relevant [slot_index][module] entry.

Split module and port allocation into separate routines.

Split per line card port creation and removal into separate routines.
The motivation is to re-use these routines for line card operations.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

01328e23

mlxsw: minimal: Move ports allocation to separate routine · 9421c8b8

Vadim Pasternak authored Aug 21, 2022



Perform ports allocation in a separate routine.
Motivation is to re-use this routine for ports found on line cards.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

9421c8b8

mlxsw: minimal: Extend APIs with slot index for modular system support · c7ea08ba

Vadim Pasternak authored Aug 21, 2022



Add 'slot_index' field to port structure.
Replace zero slot_index argument with 'slot_index' in 'ethtool'
related APIs.
Add 'slot_index' argument to port initialization and
de-initialization related APIs.

Motivation is to prepare minimal driver for modular system support.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c7ea08ba

mlxsw: i2c: Add support for system interrupt handling · 33fa6909

Vadim Pasternak authored Aug 21, 2022



Extend i2c bus driver with interrupt handler to support system specific
hotplug events, related to line card state change.

Provide system IRQ line for interrupt handler. IRQ line Id could be
provided through the platform data if available, or could be set to the
default value.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

33fa6909

mlxsw: core_linecards: Register a system event handler · 508c29bf

Vadim Pasternak authored Aug 21, 2022



Add line card system event handler. Register it with core. It is
triggered by system interrupts raised from chassis programmable logic
devices to CPU. The purpose is to handle line card state changes over
I2C bus.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

508c29bf

mlxsw: core: Add registration APIs for system event handler · 2ab4e709

Vadim Pasternak authored Aug 21, 2022



The purpose of system event handler is to handle system interrupts.
Such interrupts are raised to CPU from system programmable logic
devices, upon specific system wide changes, like line card activation
and deactivation.

The purpose is to create an alternative to the trap mechanism, which
delivers these events to the driver over the PCI bus but is not
available to a driver working over the I2C bus.

Mechanism is system dependent and applicable only for the systems
equipped with programmable devices with custom logic.

Add APIs for event handler registration and un-registration and API
which should be invoked from the registered callbacks when system
interrupt is raised to CPU.
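
As a rough picture of the three APIs mentioned (register, un-register, and the call made when the interrupt is raised), here is a small Python model. The class and method names are hypothetical and do not correspond to the driver's actual function names.

```python
class SystemEventCore:
    """Toy registry of system event handlers (illustrative names only)."""

    def __init__(self):
        self._handlers = []   # list of (callback, private data) pairs

    def register(self, callback, priv):
        self._handlers.append((callback, priv))

    def unregister(self, callback, priv):
        self._handlers.remove((callback, priv))

    def raise_event(self):
        """Called by the bus driver when the system interrupt fires."""
        for callback, priv in list(self._handlers):
            callback(priv)


calls = []
core = SystemEventCore()
core.register(calls.append, "linecards")
core.raise_event()                      # handler runs once
core.unregister(calls.append, "linecards")
core.raise_event()                      # no handler registered any more
print(calls)
```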

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2ab4e709

mlxsw: core_linecards: Separate line card init and fini flow · 4be4779b

Vadim Pasternak authored Aug 21, 2022



Currently, each line card is initialized using the following steps:

1. Initializing its various fields (e.g., slot index).
2. Creating the corresponding devlink object.
3. Enabling events (i.e., traps) for changes in line card status.
4. Querying and processing line card status.

Unlike traps, the IRQ that notifies the CPU about line card status
changes cannot be enabled / disabled on a per line card basis.

If a handler is registered before the line cards are initialized, the
handler risks accessing uninitialized memory.

On the other hand, if the handler is registered after initialization,
we risk missing events. For example, in step 4, the driver might see
that a line card is in ready state and will tell the device to enable
it. When enablement is done, the line card will be activated and the
IRQ will be triggered. Since a handler was not registered, the event
will be missed.

Solve this by splitting the initialization sequence into two steps
(1-2 and 3-4). In a subsequent patch, the handler will be registered
between both steps.
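
The ordering problem can be modelled in a few lines of Python. This is a simplified sketch, not driver code: `run_init`, `LineCard`, and the "early unrelated event" used to expose the uninitialized access are assumptions of the model.

```python
class LineCard:
    def __init__(self, slot):
        self.slot = slot
        self.initialized = False   # set by steps 1-2
        self.active = False        # set when the device enables the card


def run_init(n_cards, register_at):
    """register_at is 'before', 'between' or 'after' the two init halves."""
    cards = [LineCard(i + 1) for i in range(n_cards)]
    log = {"seen": [], "uninitialized_access": 0}
    handler = None

    def on_irq():
        # The IRQ is not per line card, so the handler scans all of them.
        for card in cards:
            if not card.initialized:
                log["uninitialized_access"] += 1
            elif card.active and card.slot not in log["seen"]:
                log["seen"].append(card.slot)

    def fire_irq():
        if handler is not None:
            handler()

    if register_at == "before":
        handler = on_irq
        fire_irq()                 # assumed: some system event arrives early
    for card in cards:             # steps 1-2: fields and devlink object
        card.initialized = True
    if register_at == "between":
        handler = on_irq
    for card in cards:             # steps 3-4: card is enabled, IRQ fires
        card.active = True
        fire_irq()
    if register_at == "after":
        handler = on_irq
    return log


for when in ("before", "between", "after"):
    print(when, run_init(3, when))
```

Only the "between" ordering both avoids touching uninitialized cards and sees every activation, which matches the split proposed above.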

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

4be4779b

docs: netlink: basic introduction to Netlink · 510156a7

Jakub Kicinski authored Aug 19, 2022



Provide a bit of a brain dump of netlink related information
as documentation. Hopefully this will be useful to people
trying to navigate implementing YAML based parsing in languages
we won't be able to help with.

I started writing this doc while trying to figure out what
it'd take to widen the applicability of YAML to good old rtnl,
but the doc grew beyond that as it usually happens.

In all honesty a lot of this information is new to me as I usually
follow the "copy an existing example, drink to forget" process
of writing netlink user space, so reviews will be much appreciated.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20220819200221.422801-2-kuba@kernel.org


Signed-off-by: Jakub Kicinski <kuba@kernel.org>

510156a7

net: improve and fix netlink kdoc · 30b60554

Jakub Kicinski authored Aug 19, 2022

Subsequent patch will render the kdoc from
include/uapi/linux/netlink.h into Documentation.
We need to fix the warnings. While at it move
the comments on struct nlmsghdr to a proper
kdoc comment.

Link: https://lore.kernel.org/r/20220819200221.422801-1-kuba@kernel.org


Signed-off-by: Jakub Kicinski <kuba@kernel.org>

30b60554

net: ftmac100: set max_mtu to allow DSA overhead setting · 6c2c782f

Sergei Antonov authored Aug 21, 2022



In case ftmac100 is used with a DSA switch, Linux wants to set MTU
to 1504 to accommodate for DSA overhead. With the default max_mtu
it leads to the error message:
 ftmac100 92000000.mac eth0: error -22 setting MTU to 1504 to include DSA overhead

ftmac100 supports packet length 1518 (MAX_PKT_SIZE constant), so it is
safe to report it in max_mtu.
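
The failing range check can be reproduced with a small Python sketch. The function `set_mtu` is a hypothetical stand-in for the MTU validation, and the default limits (68 and 1500) are the usual Ethernet defaults assumed here; 1518 and 1504 come from the message above.

```python
EINVAL = 22
ETH_MIN_MTU = 68       # assumed default min_mtu for Ethernet devices
ETH_DATA_LEN = 1500    # assumed default max_mtu for Ethernet devices
MAX_PKT_SIZE = 1518    # packet length ftmac100 supports, per the message
DSA_MTU = 1504         # 1500 plus the DSA overhead, per the error message


def set_mtu(new_mtu, min_mtu, max_mtu):
    """Return 0 on success or -EINVAL when new_mtu is out of range."""
    if new_mtu < min_mtu or new_mtu > max_mtu:
        return -EINVAL
    return 0


print(set_mtu(DSA_MTU, ETH_MIN_MTU, ETH_DATA_LEN))   # -22
print(set_mtu(DSA_MTU, ETH_MIN_MTU, MAX_PKT_SIZE))   # 0
```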

Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220821160844.474277-1-saproj@gmail.com


Signed-off-by: Jakub Kicinski <kuba@kernel.org>

6c2c782f

Aug 23, 2022

Merge branch 'dsa-changes-for-multiple-cpu-ports-part-3' · 52412f55

Paolo Abeni authored Aug 23, 2022

Vladimir Oltean says:

====================
DSA changes for multiple CPU ports (part 3)

Those who have been following part 1:
https://patchwork.kernel.org/project/netdevbpf/cover/20220511095020.562461-1-vladimir.oltean@nxp.com/
and part 2:
https://patchwork.kernel.org/project/netdevbpf/cover/20220521213743.2735445-1-vladimir.oltean@nxp.com/
will know that I am trying to enable the second internal port pair from
the NXP LS1028A Felix switch for DSA-tagged traffic via "ocelot-8021q".
This series represents part 3 of that effort.

Covered here are some preparations in DSA for handling multiple DSA
masters:
- when changing the tagging protocol via sysfs
- when the masters go down
as well as preparation for monitoring the upper devices of a DSA master
(to support DSA masters under a LAG).

There are also 2 small preparations for the ocelot driver, for the case
where multiple tag_8021q CPU ports are used in a LAG. Both those changes
have to do with PGID forwarding domains.

Compared to v1, the patches were trimmed down to just another
preparation stage, and the UAPI changes were pushed further out to part 4.
https://patchwork.kernel.org/project/netdevbpf/cover/20220523104256.3556016-1-olteanv@gmail.com/

Compared to v2, I had to export a symbol I forgot to
(ocelot_port_teardown_dsa_8021q_cpu), to avoid a build breakage when the
felix and seville drivers are built as modules.
====================

Link: https://lore.kernel.org/r/20220819174820.3585002-1-vladimir.oltean@nxp.com


Signed-off-by: Paolo Abeni <pabeni@redhat.com>

52412f55

net: mscc: ocelot: adjust forwarding domain for CPU ports in a LAG · 291ac151

Vladimir Oltean authored Aug 19, 2022



Currently when we have 2 CPU ports configured for DSA tag_8021q mode and
we put them in a LAG, a PGID dump looks like this:

PGID_SRC[0] = ports 4,
PGID_SRC[1] = ports 4,
PGID_SRC[2] = ports 4,
PGID_SRC[3] = ports 4,
PGID_SRC[4] = ports 0, 1, 2, 3, 4, 5,
PGID_SRC[5] = no ports

(ports 0-3 are user ports, ports 4 and 5 are CPU ports)

There are 2 problems with the configuration above:

- user ports should enable forwarding towards both CPU ports, not just 4,
  and the aggregation PGIDs should prune one CPU port or the other from
  the destination port mask, based on a hash computed from packet headers.

- CPU ports should not be allowed to forward towards themselves and also
  not towards other ports in the same LAG as themselves

The first problem requires fixing up the PGID_SRC of user ports, when
ocelot_port_assigned_dsa_8021q_cpu_mask() is called. We need to say that
when a user port is assigned to a tag_8021q CPU port and that port is in
a LAG, it should forward towards all ports in that LAG.

The second problem requires fixing up the PGID_SRC of port 4, to remove
ports 4 and 5 (in a LAG) from the allowed destinations.

After this change, the PGID source masks look as follows:

PGID_SRC[0] = ports 4, 5,
PGID_SRC[1] = ports 4, 5,
PGID_SRC[2] = ports 4, 5,
PGID_SRC[3] = ports 4, 5,
PGID_SRC[4] = ports 0, 1, 2, 3,
PGID_SRC[5] = no ports

Note that PGID_SRC[5] still looks weird (it should say "0, 1, 2, 3" just
like PGID_SRC[4] does), but I've tested forwarding through this CPU port
and it doesn't seem like anything is affected (it appears that PGID_SRC[4]
is being looked up on forwarding from the CPU, since both ports 4 and 5
have logical port ID 4). The reason why it looks weird is because
we've never called ocelot_port_assign_dsa_8021q_cpu() for any user port
towards port 5 (all user ports are assigned to port 4 which is in a LAG
with 5).

Since things aren't broken, I'm willing to leave it like that for now
and just document the oddity.
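
The two rules described above (user ports forward to every port in their CPU port's LAG; a CPU port does not forward to itself or to its LAG partners) can be modelled in Python. This is a sketch of the resulting masks only, with hypothetical names (`pgid_src_masks`, `assigned_cpu`, `lag_members`); it does not reflect how the driver computes them.

```python
def pgid_src_masks(user_ports, cpu_ports, assigned_cpu, lag_members):
    """Return {source port: sorted list of allowed destination ports}.

    assigned_cpu maps each user port to its tag_8021q CPU port.
    lag_members maps a CPU port to every port in its LAG, itself included.
    """
    masks = {}
    for port in user_ports:
        cpu = assigned_cpu[port]
        # Rule 1: forward towards all ports in the CPU port's LAG.
        masks[port] = sorted(lag_members.get(cpu, {cpu}))
    for cpu in cpu_ports:
        users = {p for p in user_ports if assigned_cpu[p] == cpu}
        # Rule 2: never forward to itself or to ports in the same LAG.
        masks[cpu] = sorted(users - set(lag_members.get(cpu, {cpu})))
    return masks


users = [0, 1, 2, 3]
cpus = [4, 5]
masks = pgid_src_masks(
    users, cpus,
    assigned_cpu={p: 4 for p in users},      # all user ports assigned to port 4
    lag_members={4: {4, 5}, 5: {4, 5}},      # ports 4 and 5 share a LAG
)
for port, dests in masks.items():
    print(f"PGID_SRC[{port}] = {dests}")
```

Because no user port is assigned to port 5 in this setup, its mask stays empty, which is the oddity noted in the message.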

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

291ac151

net: mscc: ocelot: set up tag_8021q CPU ports independent of user port affinity · 36a0bf44

Vladimir Oltean authored Aug 19, 2022

This is a partial revert of commit c295f983 ("net: mscc: ocelot:
switch from {,un}set to {,un}assign for tag_8021q CPU ports"), because
as it turns out, this isn't how tag_8021q CPU ports under a LAG are
supposed to work.

Under that scenario, all user ports are "assigned" to the single
tag_8021q CPU port represented by the logical port corresponding to the
bonding interface. So one CPU port in a LAG would have is_dsa_8021q_cpu
set to true (the one whose physical port ID is equal to the logical port
ID), and the other one to false.

In turn, this makes 2 undesirable things happen:

(1) PGID_CPU contains only the first physical CPU port, rather than both
(2) only the first CPU port will be added to the private VLANs used by
    ocelot for VLAN-unaware bridging

To make the driver behave in the same way for both bonded CPU ports, we
need to bring back the old concept of setting up a port as a tag_8021q
CPU port, and this is what deals with VLAN membership and PGID_CPU
updating. But we also need the CPU port "assignment" (the user to CPU
port affinity), and this is what updates the PGID_SRC forwarding rules.

All DSA CPU ports are statically configured for tag_8021q mode when the
tagging protocol is changed to ocelot-8021q. User ports are "assigned"
to one CPU port or the other dynamically (this will be handled by a
future change).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

36a0bf44

net: dsa: use dsa_tree_for_each_cpu_port in dsa_tree_{setup,teardown}_master · 5dc760d1

Vladimir Oltean authored Aug 19, 2022



More logic will be added to dsa_tree_setup_master() and
dsa_tree_teardown_master() in upcoming changes.

Reduce the indentation by one level in these functions by introducing
and using a dedicated iterator for CPU ports of a tree.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

5dc760d1

net: dsa: all DSA masters must be down when changing the tagging protocol · f41ec1fd

Vladimir Oltean authored Aug 19, 2022



The fact that the tagging protocol is set and queried from the
/sys/class/net/<dsa-master>/dsa/tagging file is a bit of a quirk from
the single CPU port days which isn't aging very well now that DSA can
have more than a single CPU port. This is because the tagging protocol
is a switch property, yet in the presence of multiple CPU ports it can
be queried and set from multiple sysfs files, all of which are handled
by the same implementation.

The current logic ensures that the net device whose sysfs file we're
changing the tagging protocol through must be down. That net device is
the DSA master, and this is fine for single DSA master / CPU port setups.

But exactly because the tagging protocol is per switch [ tree, in fact ]
and not per DSA master, this isn't fine any longer with multiple CPU
ports, and we must iterate through the tree and find all DSA masters,
and make sure that all of them are down.
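
The tree-wide check can be sketched in Python as follows. The `CpuPort` type and both function names are hypothetical; the sketch only illustrates "every DSA master in the tree must be down", not the actual DSA code.

```python
from dataclasses import dataclass


@dataclass
class CpuPort:
    master: str        # name of the DSA master net device
    master_up: bool    # whether that net device is currently up


def masters_blocking_change(cpu_ports):
    """Return the names of DSA masters that are still up."""
    return [p.master for p in cpu_ports if p.master_up]


def may_change_tag_protocol(cpu_ports):
    """The protocol is per tree, so every master must be down."""
    return not masters_blocking_change(cpu_ports)


tree = [CpuPort("eth0", master_up=False), CpuPort("eth1", master_up=True)]
print(may_change_tag_protocol(tree))     # False: eth1 is still up
print(masters_blocking_change(tree))     # ['eth1']
```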

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

f41ec1fd

net: dsa: only bring down user ports assigned to a given DSA master · 7136097e

Vladimir Oltean authored Aug 19, 2022

This is an adaptation of commit c0a8a9c2 ("net: dsa: automatically
bring user ports down when master goes down") for multiple DSA masters.
When a DSA master goes down, only the user ports under its control
should go down too, the others can still send/receive traffic.
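
A minimal Python sketch of the selection rule, assuming a hypothetical mapping from user port name to DSA master name (the interface names are made up):

```python
def user_ports_to_close(user_to_master, master_going_down):
    """Return the user ports that depend on the master going down."""
    return sorted(
        port for port, master in user_to_master.items()
        if master == master_going_down
    )


affinity = {"swp0": "eth0", "swp1": "eth0", "swp2": "eth1", "swp3": "eth1"}
print(user_ports_to_close(affinity, "eth1"))   # ['swp2', 'swp3']
```

Ports swp0 and swp1 stay up because their master, eth0, is unaffected.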

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

7136097e