Commits · f152b41ba6cf43a3a5a87c7b45e6a2722a6195dc · Mirrors / github.com / raspberrypi_linux

Jul 29, 2020

mlxsw: core: Add support for temperature thresholds reading for QSFP-DD transceivers · f152b41b

Vadim Pasternak authored Jul 28, 2020



Allow QSFP-DD transceivers temperature thresholds reading for hardware
monitoring and thermal control.

For this type, the thresholds are located in page 02h according to the
"Module and Lane Thresholds" description from Common Management
Interface Specification.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f152b41b

mlxsw: core: Add ethtool support for QSFP-DD transceivers · 6af496ad

Vadim Pasternak authored Jul 28, 2020

The Quad Small Form Factor Pluggable Double Density (QSFP-DD) hardware
specification defines a form factor that supports up to 400 Gbps in
aggregate over an 8x50-Gbps electrical interface. The QSFP-DD supports
both optical and copper interfaces.

Implementation is based on Common Management Interface Specification;
Rev 4.0 May 8, 2019. Table 8-2 "Identifier and Status Summary (Lower
Page)" from this spec defines "Id and Status" fields located at offsets
00h - 02h. Bit 2 at offset 02h ("Flat_mem") specifies QSFP EEPROM memory
mode, which could be "upper memory flat" or "paged". Flat memory mode is
coded "1", and indicates that only page 00h is implemented in EEPROM.
Paged memory is coded "0" and indicates that pages 00h, 01h, 02h, 10h
and 11h are implemented. Pages 10h and 11h are currently not supported
by the driver.

"Flat" memory mode is used for the passive copper transceivers. For this
type only page 00h (256 bytes) is available. "Paged" memory is used for
the optical transceivers. For this type pages 00h (256 bytes), 01h (128
bytes) and 02h (128 bytes) are available. Upper page 01h contains static
advertising field, while upper page 02h contains the module-defined
thresholds and lane-specific monitors.

Extend enumerator 'mlxsw_reg_mcia_eeprom_module_info_id' with additional
field 'MLXSW_REG_MCIA_EEPROM_MODULE_INFO_TYPE_ID'. This field is used to
indicate for QSFP-DD transceiver type which memory mode is to be used.

Expose 256 bytes buffer for QSFP-DD passive copper transceiver and
512 bytes buffer for optical.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

6af496ad

Merge tag 'mlx5-updates-2020-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 0082dd8a

David S. Miller authored Jul 28, 2020



Saeed Mahameed says:

====================
mlx5-updates-2020-07-28

Misc and small update to mlx5 driver:

1) Aya adds PCIe relaxed ordering support for mlx5 netdev queues.
2) Eran Refactors pages data base to be per vf/function to speedup
   unload time.
3) Parav changes eswitch steering initialization to account for
   tota_vports rather than for only active vports and
   Link non uplink representors to PCI device, for uniform naming scheme.

4) Tariq, trivial RX code improvements and missing inidirect calls
   wrappers.

5) Small cleanup patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

0082dd8a

farsync: use generic power management · f21bbd63

Vaibhav Gupta authored Jul 28, 2020



The .suspend() and .resume() callbacks are not defined for this driver.
Still, their power management structure follows the legacy framework. To
bring it under the generic framework, simply remove the binding of
callbacks from "struct pci_driver".

Change code indentation from space to tab in "struct pci_driver".

Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f21bbd63

Jul 28, 2020

net/mlx5: drop unnecessary list_empty · 22f9d2f4

Julia Lawall authored Jul 26, 2020



list_for_each_entry is able to handle an empty list.
The only effect of avoiding the loop is not initializing the
index variable.
Drop list_empty tests in cases where these variables are not
used.

Note that list_for_each_entry is defined in terms of list_first_entry,
which indicates that it should not be used on an empty list.  But in
list_for_each_entry, the element obtained by list_first_entry is not
really accessed, only the address of its list_head field is compared
to the address of the list head, so the list_first_entry is safe.

The semantic patch that makes this change is as follows (with another
variant for the no brace case): (http://coccinelle.lip6.fr/)

<smpl>
@@
expression x,e;
iterator name list_for_each_entry;
statement S;
identifier i;
@@

-if (!(list_empty(x))) {
   list_for_each_entry(i,x,...) S
- }
 ... when != i
? i = e
</smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

22f9d2f4

net/mlx5: Use fallthrough pseudo-keyword · c8b838d1

Gustavo A. R. Silva authored Jul 27, 2020



Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

c8b838d1

net/mlx5: DR, Reduce print level for matcher print · ffdc8ec0

Alex Vesker authored Jul 06, 2020



There is no need to print on each unsuccessful matcher
ip_version combination since it probably will happen when
trying to create all the possible combinations.
On a real failure we have a print in the calling function.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

ffdc8ec0

net/mlx5e: Add support for PCI relaxed ordering · 17347d54

Aya Levin authored Mar 26, 2020



The concept of Relaxed Ordering in the PCI Express environment allows
switches in the path between the Requester and Completer to reorder some
transactions just received before others that were previously enqueued.

In ETH driver, there is no question of write integrity since each memory
segment is written only once per cycle. In addition, the driver doesn't
access the memory shared with the hardware until the corresponding CQE
arrives indicating all PCI transactions are done.

Running TCP single stream over ConnectX-4 LX, ARM CPU on remote-numa has
300% improvement in the bandwidth.

With relaxed ordering turned off: BW:10 [GB/s]
With relaxed ordering turned on: BW:40 [GB/s]

The driver turns relaxed ordering with respect to the firmware
capabilities and the return value from pcie_relaxed_ordering_enabled().

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

17347d54

net/mlx5e: Use indirect call wrappers for RX post WQEs functions · 5d0b8476

Tariq Toukan authored Jul 12, 2020



Use the indirect call wrapper API macros for declaration and scope
of the RX post WQEs functions.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

5d0b8476

net/mlx5e: Move exposure of datapath function to txrx header · b307f7f1

Tariq Toukan authored Apr 30, 2020



Move them from the generic header file "en.h", to the
datapath header file "txrx.h".

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

b307f7f1

net/mlx5e: RX, Re-work initializaiton of RX function pointers · 5adf4c47

Tariq Toukan authored Apr 30, 2020



Instead of exposing the RQ datapath handlers (from en_rx.c) so that
they are set in the control path (in en_main.c), wrap this logic
in a single function in en_rx.c and expose it alone.

Every profile will now have a pointer to the new mlx5e_rx_handlers
structure, instead of directly pointing to the previously-exposed
RQ handlers.

This significantly improves locality and modularity of the driver,
and allows many functions in en_rx.c to become static.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

5adf4c47

net/mlx5e: Link non uplink representors to PCI device · 123f0f53

Parav Pandit authored Jun 12, 2020



Currently PF and VF representors are exposed as virtual device.
They are not linked to its parent PCI device like how uplink
representor is linked.
Due to this, PF and VF representors cannot benefit of the
systemd defined naming scheme. This requires special handling
by the users.

Hence, link the PF and VF representors to their parent PCI device
similar to existing uplink representor netdevice.

Example:
udevadm output before linking to PCI device:
$ udevadm test-builtin net_id  /sys/class/net/eth6
Load module index
Network interface NamePolicy= disabled on kernel command line, ignoring.
Parsed configuration file /usr/lib/systemd/network/99-default.link
Created link configuration context.
Using default interface naming scheme 'v243'.
ID_NET_NAMING_SCHEME=v243
Unload module index
Unloaded link configuration context.

udevadm output after linking to PCI device:
$ udevadm test-builtin net_id /sys/class/net/eth6
Load module index
Network interface NamePolicy= disabled on kernel command line, ignoring.
Parsed configuration file /usr/lib/systemd/network/99-default.link
Created link configuration context.
Using default interface naming scheme 'v243'.
ID_NET_NAMING_SCHEME=v243
ID_NET_NAME_PATH=enp0s8f0npf0vf0
Unload module index
Unloaded link configuration context.

In past there was little concern over seeing 10,000 lines output
showing up at thread [1] is not applicable as ndo ops for VF
handling is not exposed for all the 100 repesentors for mlx5 devices.

Additionally alternative device naming [2] to overcome shorter device
naming is also part of the latest systemd release v245.

[1] https://marc.info/?l=linux-netdev&m=152657949117904&w=2
[2] https://lwn.net/Articles/814068/

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

123f0f53

net/mlx5: E-switch, Use eswitch total_vports · 8d6bd3c3

Parav Pandit authored Jul 20, 2020



Currently steering table and rx group initialization helper
routines works on the total_vports passed as input parameter.

Both eswitch helpers work on the mlx5_eswitch and thereby have access
to esw->total_vports. Hence use it directly instead of passing it
via function input arguments.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

8d6bd3c3

net/mlx5: E-switch, Reuse total_vports and avoid duplicate nvports · 0da3c12d

Parav Pandit authored Jul 20, 2020



Total e-switch vports are already stored in mlx5_eswitch total_vports.
Avoid copy of it in nvports and reuse existing total_vports calculation.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

0da3c12d

net/mlx5: E-switch, Consider maximum vf vports for steering init · 8b95bda4

Parav Pandit authored Jul 20, 2020

When eswitch is enabled, VFs might not be enabled. Hence, consider
maximum number of VFs.
This further closes the gap between handling VF vports between ECPF and
PF.

Fixes: ea2128fd

 ("net/mlx5: E-switch, Reduce dependency on num_vfs during mode set")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

8b95bda4

net/mlx5: Add function ID to reclaim pages debug log · c1a0969e

Avihu Hagag authored Jun 08, 2020



Add function ID to reclaim pages debug log for better user visibility.

Signed-off-by: Avihu Hagag <avihuh@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

c1a0969e

net/mlx5: Hold pages RB tree per VF · d6945242

Eran Ben Elisha authored May 18, 2020



Per page request event, FW request to allocated or release pages for a
single function. Driver maintains FW pages object per function, so there
is no need to hold one global page data-base. Instead, have a page
data-base per function, which will improve performance release flow in all
cases, especially for "release all pages".

As the range of function IDs is large and not sequential, use xarray to
store a per function ID page data-base, where the function ID is the key.

Upon first allocation of a page to a function ID, create the page
data-base per function. This data-base will be released only at pagealloc
mechanism cleanup.

NIC: ConnectX-4 Lx
CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Test case: 32 VFs, measure release pages on one VF as part of FLR
Before: 0.021 Sec
After:  0.014 Sec

The improvement depends on amount of VFs and memory utilization
by them. Time measurements above were taken from idle system.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

d6945242

net/mlx4: Use fallthrough pseudo-keyword · 5e619d73

Gustavo A. R. Silva authored Jul 27, 2020



Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1].

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e619d73

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · a02d26fe

David S. Miller authored Jul 27, 2020



Tony Nguyen says:

====================
1GbE Intel Wired LAN Driver Updates 2020-07-27

This series contains updates to igc driver only.

Sasha cleans up double definitions, unneeded and non applicable
registers, and removes unused fields in structs. Ensures the Receive
Descriptor Minimum Threshold Count is cleared and fixes a static checker
error.

v2: Remove fields from hw_stats in patches that removed their uses.
Reworded patch descriptions for patches 1, 2, and 4.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

a02d26fe

qed: fix assignment of n_rq_elems to incorrect params field · 35406697

Colin Ian King authored Jul 27, 2020

Currently n_rq_elems is being assigned to params.elem_size instead of the
field params.num_elems.  Coverity is detecting this as a double assingment
to params.elem_size and reporting this as an usused value on the first
assignment.  Fix this.

Addresses-Coverity: ("Unused value")
Fixes: b6db3f71

 ("qed: simplify chain allocation with init params struct")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

35406697

Merge branch 'sfc-driver-for-EF100-family-NICs-part-1' · 86f968a0

David S. Miller authored Jul 27, 2020



Edward Cree says:

====================
sfc: driver for EF100 family NICs, part 1

EF100 is a new NIC architecture under development at Xilinx, based
 partly on existing Solarflare technology.  As many of the hardware
 interfaces resemble EF10, support is implemented within the 'sfc'
 driver, which previous patch series "commonised" for this purpose.

In order to maintain bisectability while splitting into patches of a
 reasonable size, I had to do a certain amount of back-and-forth with
 stubs for things that the common code may try to call, mainly because
 we can't do them until we've set up MCDI, but we can't set up MCDI
 without probing the event queues, at which point a lot of the common
 machinery becomes reachable from event handlers.
Consequently, this first series doesn't get as far as actually sending
 and receiving packets.  I have a second series ready to follow it
 which implements the datapath (and a few other things like ethtool).

Changes from v4:
 * Fix build on CONFIG_RETPOLINE=n by using plain prototypes instead
   of INDIRECT_CALLABLE_DECLARE.

Changes from v3:
 * combine both drivers (sfc_ef100 and sfc) into a single module, to
   make non-modular builds work.  Patch #4 now adds a few indirections
   to support this; the ones in the RX and TX path use indirect-call-
   wrappers to minimise the performance impact.

Changes from v2:
 * remove MODULE_VERSION.
 * call efx_destroy_reset_workqueue() from ef100_exit_module().
 * correct uint32_ts to u32s.  While I was at it, I fixed a bunch of
   other style issues in the function-control-window code.
All in patch #4.

Changes from v1:
 * kernel test robot spotted a link error when sfc_ef100 was built
   without mdio.  It turns out the thing we were trying to link to
   was a bogus thing to do on anything but Falcon, so new patch #1
   removes it from this driver.
 * fix undeclared symbols in patch #4 by shuffling around prototypes
   and #includes and adding 'static' where appropriate.
 * fix uninitialised variable 'rc2' in patch #7.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

86f968a0

sfc_ef100: implement ndo_get_phys_port_{id,name} · 1c748843

Edward Cree authored Jul 27, 2020



Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1c748843

sfc_ef100: read device MAC address at probe time · 29ec1b27

Edward Cree authored Jul 27, 2020



Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29ec1b27

sfc_ef100: probe the PHY and configure the MAC · 99a23c11

Edward Cree authored Jul 27, 2020



Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

99a23c11

sfc_ef100: actually perform resets · 4e5675bb

Edward Cree authored Jul 27, 2020



In ef100_reset(), make the MCDI call to do the reset.
Also, do a reset at start-of-day during probe, to put the function in
 a clean state.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e5675bb

sfc_ef100: extend ef100_check_caps to cover datapath_caps3 · d802b0ae

Edward Cree authored Jul 27, 2020



MC_CMD_GET_CAPABILITIES now has a third word of flags; extend the
 efx_has_cap() machinery to cover it.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d802b0ae

sfc_ef100: read datapath caps, implement check_caps · f6573120

Edward Cree authored Jul 27, 2020



Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6573120

sfc_ef100: process events for MCDI completions · 5e4ef673

Edward Cree authored Jul 27, 2020



Currently RX and TX-completion events are unhandled, as neither the RX
 nor the TX path has been implemented yet.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e4ef673

sfc_ef100: implement ndo_open/close and EVQ probing · 965b549f

Edward Cree authored Jul 27, 2020

Channels are probed, but actual event handling is still stubbed out.

Stub implementation of check_caps is needed because ptp.c will call into
it from efx_ptp_use_mac_tx_timestamps() to decide if it wants TXQs.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

965b549f

sfc_ef100: implement MCDI transport · 2200e6d9

Edward Cree authored Jul 27, 2020



Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2200e6d9

sfc_ef100: don't call efx_reset_down()/up() on EF100 · 35a36af8

Edward Cree authored Jul 27, 2020



We handle everything ourselves in ef100_reset(), rather than relying on
 the generic down/up routines.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

35a36af8

sfc_ef100: PHY probe stub · aa86a75f

Edward Cree authored Jul 27, 2020



We can't actually do the MCDI to probe it fully until we have working
 MCDI, which comes later, but we need efx->phy_data to be allocated so
 that when we get MCDI events the link-state change handler doesn't
 NULL-dereference.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa86a75f

sfc_ef100: reset-handling stub · c027f2a7

Edward Cree authored Jul 27, 2020



We don't actually do the efx_mcdi_reset() because we don't have MCDI yet.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c027f2a7

sfc: skeleton EF100 PF driver · 51b35a45

Edward Cree authored Jul 27, 2020



No TX or RX path, no MCDI, not even an ifup/down handler.
Besides stubs, the bulk of the patch deals with reading the Xilinx
 extended PCIe capability, which tells us where to find our BAR.

Though in the same module, EF100 has its own struct pci_driver,
 which is named sfc_ef100.

A small number of additional nic_type methods are added; those in the
 TX (tx_enqueue) and RX (rx_packet) paths are called through indirect
 call wrappers to minimise the performance impact.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

51b35a45

sfc_ef100: register accesses on EF100 · 61060c5d

Edward Cree authored Jul 27, 2020



EF100 adds a few new valid addresses for efx_writed_page(), as well as
 a Function Control Window in the BAR whose location is variable.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

61060c5d

sfc_ef100: add EF100 register definitions · adf72ee3

Edward Cree authored Jul 27, 2020



Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

adf72ee3

sfc: remove efx_ethtool_nway_reset() · 0ccf267e

Edward Cree authored Jul 27, 2020



An MDIO-based n-way restart does not make sense for any of the NICs
 supported by this driver, nor for the coming EF100.
Unlike on Falcon (which was already split off into a separate driver),
 the PHY on all of Siena, EF10 and EF100 is managed by MC firmware.
While Siena can talk to the PHY over MDIO, doing so for anything other
 than debugging purposes (mdio_mii_ioctl) is likely to confuse the
 firmware.
(According to the SFC firmware team, this support was originally added
 to the Siena driver early in the development of that product, before
 it was decided to have firmware manage the PHY.)

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ccf267e

Merge branch 'Add-PRP-driver' · 65ccbbda

David S. Miller authored Jul 27, 2020



Murali Karicheri says:

====================
Add PRP driver

This series is dependent on the following patches sent out to
netdev list. All (1-3) are already merged to net/master as of
sending this, but not on the net-next master branch. So need
to apply them to net-next before applying this series. v3 of
the iproute2 patches can be merged to work with this series
as there are no updates since then.

[1] https://marc.info/?l=linux-netdev&m=159526378131542&w=2
[2] https://marc.info/?l=linux-netdev&m=159499772225350&w=2
[3] https://marc.info/?l=linux-netdev&m=159499772425352&w=2

This series adds support for Parallel Redundancy Protocol (PRP)
in the Linux HSR driver as defined in IEC-62439-3. PRP Uses a
Redundancy Control Trailer (RCT) the format of which is
similar to HSR Tag. This is used for implementing redundancy.
RCT consists of 6 bytes similar to HSR tag and contain following
fields:-

- 16-bit sequence number (SeqNr);
- 4-bit LAN identifier (LanId);
- 12 bit frame size (LSDUsize);
- 16-bit suffix (PRPsuffix).

The PRPsuffix identifies PRP frames and distinguishes PRP frames
from other protocols that also append a trailer to their useful
data. The LSDUsize field allows the receiver to distinguish PRP
frames from random, nonredundant frames as an additional check.
LSDUsize is the size of the Ethernet payload inclusive of the
RCT. Sequence number along with LanId is used for duplicate
detection and discard.

PRP node is also known as Dual Attached Node (DAN-P) since it
is typically attached to two different LAN for redundancy.
DAN-P duplicates each of L2 frames and send it over the two
Ethernet links. Each outgoing frame is appended with RCT.
Unlike HSR, these are added to the end of L2 frame and will be
treated as pad by bridges and therefore would be work with
traditional bridges or switches, where as HSR wouldn't as Tag
is prefixed to the Ethenet frame. At the remote end, these are
received and the duplicate frame is discarded before the stripped
frame is send up the networking stack. Like HSR, PRP also sends
periodic Supervision frames to the network. These frames are
received and MAC address from the SV frames are populated in a
database called Node Table. The above functions are grouped into
a block called Link Redundancy Entity (LRE) in the IEC spec.

As there are many similarities between HSR and PRP protocols,
this patch re-uses the code from HSR driver to implement PRP
driver. As per feedback from the RFC series, the implementation
uses the existing HSR Netlink socket interface to create the
PRP interface by adding a new proto parameter to the ip link
command to identify the PRP protocol. iproute2 is enhanced to
implement this new parameter. The hsr_netlink.c is enhanced
to handle the new proto parameter. As suggested during the RFC
review, the driver introduced a proto_ops structure to hold
protocol specfic functions to handle HSR and PRP specific
function pointers and use them in the code based on the
protocol to handle protocol specific part differently in the
driver.

Please review this and provide me feedback so that I can work to
incorporate them and spin the next version if needed.

The patch was tested using two TI AM57x IDK boards for PRP which
are connected back to back over two CPSW Ethernet ports.

PRP Test setup
---------------

--------eth0             eth0 --------
|AM572x|----------------------|AM572x|
|      |----------------------|      |
--------eth1             eth1 --------

To build, enable CONFIG_HSR=y or m
make omap2plus_defconfig
make zImage; make modules; make dtbs
Copy the zImage and dtb files to the file system on SD card
and power on the AM572x boards.
This can be tested on any platforms with 2 Ethernet interfaces.
So will appreciate if you can give it a try and provide your
Tested-by.

Command to create PRP interface
-------------------------------
ifconfig eth0 0.0.0.0 down
ifconfig eth1 0.0.0.0 down
ifconfig eth0 hw ether 70:FF:76:1C:0E:8C
ifconfig eth1 hw ether 70:FF:76:1C:0E:8C
ifconfig eth0 up
ifconfig eth1 up
ip link add name prp0 type hsr slave1 eth0 slave2 eth1 supervision 45 proto 1
ifconfig prp0 192.168.2.10

ifconfig eth0 0.0.0.0 down
ifconfig eth1 0.0.0.0 down
ifconfig eth0 hw ether 70:FF:76:1C:0E:8D
ifconfig eth1 hw ether 70:FF:76:1C:0E:8D
ifconfig eth0 up
ifconfig eth1 up
ip link add name prp0 type hsr slave1 eth0 slave2 eth1 supervision 45 proto 1
ifconfig prp0 192.168.2.20

command to show node table
----------------------------
Ping the peer board after the prp0 interface is up.

The remote node (DAN-P) will be shown in the node table as below.

root@am57xx-evm:~# cat /sys/kernel/debug/hsr/prp0/node_table
Node Table entries for (PRP) device
MAC-Address-A,    MAC-Address-B,    time_in[A], time_in[B], Address-B port, SAN-A, SAN-B, DAN-P
70:ff:76:1c:0e:8c 00:00:00:00:00:00   ffffe83f,   ffffe83f,              0,     0,     0,     1

Try to capture the raw PRP frames at the eth0 interface as
tcpdump -i eth0 -xxx

Sample Supervision frames and ARP frames shown below.

==================================================================================
Successive Supervision frames captured with tcpdump (with RCT at the end):

03:43:29.500999 70:ff:76:1c:0e:8d (oui Unknown) > 01:15:4e:00:01:2d (oui Unknown), ethertype Unknown (0x88f
        0x0000:  0115 4e00 012d 70ff 761c 0e8d 88fb 0001
        0x0010:  7e0a 1406 70ff 761c 0e8d 0000 0000 0000
        0x0020:  0000 0000 0000 0000 0000 0000 0000 0000
        0x0030:  0000 0000 0000 0000 0000 0000 fc2b a034
        0x0040:  88fb

03:43:31.581025 70:ff:76:1c:0e:8d (oui Unknown) > 01:15:4e:00:01:2d (oui Unknown), ethertype Unknown (0x88f
        0x0000:  0115 4e00 012d 70ff 761c 0e8d 88fb 0001
        0x0010:  7e0b 1406 70ff 761c 0e8d 0000 0000 0000
        0x0020:  0000 0000 0000 0000 0000 0000 0000 0000
        0x0030:  0000 0000 0000 0000 0000 0000 fc2c a034
        0x0040:  88fb

ICMP Echo request frame with RCT
03:43:33.805354 IP 192.168.2.20 > 192.168.2.10: ICMP echo request, id 63748, seq 1, length 64
        0x0000:  70ff 761c 0e8c 70ff 761c 0e8d 0800 4500
        0x0010:  0054 26a4 4000 4001 8e96 c0a8 0214 c0a8
        0x0020:  020a 0800 c28e f904 0001 202e 1c3d 0000
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000
        0x0040:  0000 0000 0000 0000 0000 0000 0000 0000
        0x0050:  0000 0000 0000 0000 0000 0000 0000 0000
        0x0060:  0000 fc31 a05a 88fb
==================================================================================
The iperf3 traffic test logs can be accessed at the links below.
DUT-1: https://pastebin.ubuntu.com/p/8SkQzWJMn8/
DUT-2: https://pastebin.ubuntu.com/p/j2BZvvs7p4/

Other tests done.
 - Connect a SAN (eth0 and eth1 without prp interface) and
   do ping test from eth0 (192.168.2.40) to prp0 (192.168.2.10)
   verify the SAN node shows at the correct link A and B as shown
   in the node table dump
 - Regress HSR interface using 3 nodes connected in a ring topology.
   create hsr link version 0. Do iperf3 test between all nodes
   create hsr link version 1. Do iperf3 test between all nodes.

         --------eth0             eth1 --------eth0      eth1-------|
         |AM572x|----------------------|AM572x|--------------|AM572x|
         |      |                      |      |        ------|      |
         --------eth1---|               -------        | eth0 -------
                        |-------------------------------

   command used for HSR interface

   HSR V0

   ifconfig eth0 0.0.0.0 down
   ifconfig eth1 0.0.0.0 down
   ifconfig eth0 hw ether 70:FF:76:1C:0E:8C
   ifconfig eth1 hw ether 70:FF:76:1C:0E:8C
   ifconfig eth0 up
   ifconfig eth1 up
   ip link add name hsr0 type hsr slave1 eth0 slave2 eth1 supervision 45 version 0
   ifconfig hsr0 192.168.2.10

   HSR V1

   ifconfig eth0 0.0.0.0 down
   ifconfig eth1 0.0.0.0 down
   ifconfig eth0 hw ether 70:FF:76:1C:0E:8C
   ifconfig eth1 hw ether 70:FF:76:1C:0E:8C
   ifconfig eth0 up
   ifconfig eth1 up
   ip link add name hsr0 type hsr slave1 eth0 slave2 eth1 supervision 45 version 1
   ifconfig hsr0 192.168.2.10

   Logs at
   DUT-1 : https://pastebin.ubuntu.com/p/6PSJbZwQ6y/
   DUT-2 : https://pastebin.ubuntu.com/p/T8TqJsPRHc/
   DUT-3 : https://pastebin.ubuntu.com/p/VNzpv6HzKj/
 - Build tests :-
   Build with CONFIG_HSR=m
   allmodconfig build
   build with CONFIG_HSR=y and rebuild with sparse checker
   make C=1 zImage; make modules

Version history:
  v5 : Fixed comments about Kconfig changes on Patch 1/7 against v4
       Rebased to netnext/master branch.

  v4 : fixed following vs v3
       reverse xmas tree for local variables
       check for return type in call to skb_put_padto()

  v3 : Separated bug fixes from this series and send them for immediate merge
       But for that this is same as v2.

  v2 : updated comments on RFC. Following are the main changes:-
       - Removed the hsr_prp prefix
       - Added PRP information in header files to indicate
         the support for PRP explicitely
       - Re-use netlink socket interface with an added
         parameter proto for identifying PRP.
       - Use function pointers using a proto_ops struct
         to do things differently for PRP vs HSR.

   RFC: initial version posted and discussed at
       https://www.spinics.net/lists/netdev/msg656229.html
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

65ccbbda

net: prp: enhance debugfs to display PRP info · 795ec4f5

Murali Karicheri authored Jul 22, 2020



Print PRP specific information from node table as part of debugfs
node table display. Also display the node as DAN-H or DAN-P depending
on the info from node table.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

795ec4f5

net: prp: add packet handling support · 451d8123

Murali Karicheri authored Jul 22, 2020



DAN-P (Dual Attached Nodes PRP) nodes are expected to receive
traditional IP packets as well as PRP (Parallel Redundancy
Protocol) tagged (trailer) packets. PRP trailer is 6 bytes
of PRP protocol unit called RCT, Redundancy Control Trailer
(RCT) similar to HSR tag. PRP network can have traditional
devices such as bridges/switches or PC attached to it and
should be able to communicate. Regular Ethernet devices treat
the RCT as pads.  This patch adds logic to format L2 frames
from network stack to add a trailer (RCT) and send it as
duplicates over the slave interfaces when the protocol is
PRP as per IEC 62439-3. At the ingress, it strips the trailer,
do duplicate detection and rejection and forward a stripped
frame up the network stack. PRP device should accept frames
from Singly Attached Nodes (SAN) and thus the driver mark
the link where the frame came from in the node table.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

451d8123