  1. Aug 03, 2021
  2. Aug 02, 2021
    • net/ipv4: Replace one-element array with flexible-array member · 2d3e5caf
      Gustavo A. R. Silva authored
      There is a regular need in the kernel to provide a way to declare having
      a dynamically sized set of trailing elements in a structure. Kernel code
      should always use “flexible array members”[1] for these cases. The older
      style of one-element or zero-length arrays should no longer be used[2].
      
      Use an anonymous union with a couple of anonymous structs in order to
      keep userspace unchanged:
      
      $ pahole -C ip_msfilter net/ipv4/ip_sockglue.o
      struct ip_msfilter {
      	union {
      		struct {
      			__be32     imsf_multiaddr_aux;   /*     0     4 */
      			__be32     imsf_interface_aux;   /*     4     4 */
      			__u32      imsf_fmode_aux;       /*     8     4 */
      			__u32      imsf_numsrc_aux;      /*    12     4 */
      			__be32     imsf_slist[1];        /*    16     4 */
      		};                                       /*     0    20 */
      		struct {
      			__be32     imsf_multiaddr;       /*     0     4 */
      			__be32     imsf_interface;       /*     4     4 */
      			__u32      imsf_fmode;           /*     8     4 */
      			__u32      imsf_numsrc;          /*    12     4 */
      			__be32     imsf_slist_flex[0];   /*    16     0 */
      		};                                       /*     0    16 */
      	};                                               /*     0    20 */
      
      	/* size: 20, cachelines: 1, members: 1 */
      	/* last cacheline: 20 bytes */
      };
      
      Also, refactor the code accordingly and make use of the struct_size()
      and flex_array_size() helpers.
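As a rough userspace illustration of the idea, the sketch below sizes and fills a structure that ends in a flexible-array member. The type `struct src_filter` and the helpers `filter_array_size()`, `filter_struct_size()` and `filter_alloc()` are hypothetical names invented for this example; they only mimic the role of the kernel's flex_array_size() and struct_size() helpers and are not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace analogue: a header followed by trailing elements. */
struct src_filter {
	uint32_t numsrc;   /* number of valid entries in slist[] */
	uint32_t slist[];  /* flexible-array member: adds nothing to sizeof */
};

/* Bytes occupied by n trailing elements, saturating at SIZE_MAX on
 * overflow (a role comparable to flex_array_size()). */
static size_t filter_array_size(size_t n)
{
	if (n > SIZE_MAX / sizeof(uint32_t))
		return SIZE_MAX;
	return n * sizeof(uint32_t);
}

/* Header plus n trailing elements, saturating at SIZE_MAX on overflow
 * (a role comparable to struct_size()). */
static size_t filter_struct_size(size_t n)
{
	size_t arr = filter_array_size(n);

	if (arr > SIZE_MAX - sizeof(struct src_filter))
		return SIZE_MAX;
	return sizeof(struct src_filter) + arr;
}

/* Allocate a filter and copy n source addresses into it.
 * Returns NULL if the size overflows or the allocation fails. */
static struct src_filter *filter_alloc(const uint32_t *srcs, size_t n)
{
	size_t bytes = filter_struct_size(n);
	struct src_filter *f;

	assert(srcs || n == 0);
	if (bytes == SIZE_MAX || n > UINT32_MAX)
		return NULL;
	f = malloc(bytes);
	if (!f)
		return NULL;
	f->numsrc = (uint32_t)n;
	if (n)
		memcpy(f->slist, srcs, filter_array_size(n));
	return f;
}
```

Because the size computation saturates instead of wrapping, an absurd element count leads to a failed allocation rather than an undersized buffer.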
      
      This helps with the ongoing efforts to globally enable -Warray-bounds
      and get us closer to being able to tighten the FORTIFY_SOURCE routines
      on memcpy().
      
      [1] https://en.wikipedia.org/wiki/Flexible_array_member
      [2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays
      
      Link: https://github.com/KSPP/linux/issues/79
      Link: https://github.com/KSPP/linux/issues/109
      
      
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d3e5caf
    • Vladimir Oltean's avatar
      29a097b7
    • nfc: hci: pass callback data param as pointer in nci_request() · 35d7a6f1
      Krzysztof Kozlowski authored
      
      
      nci_request() receives a callback function and an unsigned long data
      argument "opt", which is passed to the callback.  Almost all
      nci_request() callers pass a pointer to a stack variable as the data
      argument.  Only a few pass a scalar value (e.g. u8).
      
      None of these callbacks modify the passed data argument, and in the
      previous commit they were made const.  However, passing pointers via
      unsigned long drops the const annotation: the callback could simply
      cast the unsigned long to a pointer to writable memory.
      
      Use "const void *" as the type of the "opt" argument to solve this and
      prevent modification of the pointed-to contents.  This is also
      consistent with the generic pattern of passing data arguments via
      "void *".  In the few places that pass scalar values, use casts via
      "unsigned long" to suppress any warnings.
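The pattern described above can be sketched in plain C as follows. This is a minimal illustration, not the NFC code: `run_request()`, `scan_req()`, `reset_req()`, `send_reset()` and `struct scan_param` are hypothetical names made up for the example, and the scalar case assumes a platform where unsigned long is as wide as a pointer, as on Linux.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback type: the argument is opaque and read-only. */
typedef int (*req_fn)(const void *opt);

/* Hypothetical request helper that forwards opt to the callback. */
static int run_request(req_fn fn, const void *opt)
{
	assert(fn);
	return fn(opt);
}

struct scan_param {
	uint8_t mode;
	uint8_t duration;
};

/* Callback that receives a pointer to caller-owned data. Because the
 * pointer is const-qualified, writing through it needs an explicit cast. */
static int scan_req(const void *opt)
{
	const struct scan_param *p = opt;

	return p->mode + p->duration;
}

/* Callback that receives a scalar carried in the pointer value itself. */
static int reset_req(const void *opt)
{
	uint8_t type = (uint8_t)(unsigned long)opt;

	return type;
}

/* Scalar callers cast through unsigned long to avoid
 * integer-to-pointer size warnings. */
static int send_reset(uint8_t type)
{
	return run_request(reset_req, (const void *)(unsigned long)type);
}
```

A pointer caller would simply write `run_request(scan_req, &param)`, with no cast needed in either direction.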
      
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      35d7a6f1
    • cavium: switch from 'pci_' to 'dma_' API · 1e0dd56e
      Christophe JAILLET authored
      
      
      The wrappers in include/linux/pci-dma-compat.h should go away.
      
      The patch has been generated with the coccinelle script below. It has been
      hand modified to use 'dma_set_mask_and_coherent()' instead of
      'pci_set_dma_mask()/pci_set_consistent_dma_mask()' when applicable.
      
      It has been compile tested.
      
      @@
      @@
      -    PCI_DMA_BIDIRECTIONAL
      +    DMA_BIDIRECTIONAL
      
      @@
      @@
      -    PCI_DMA_TODEVICE
      +    DMA_TO_DEVICE
      
      @@
      @@
      -    PCI_DMA_FROMDEVICE
      +    DMA_FROM_DEVICE
      
      @@
      @@
      -    PCI_DMA_NONE
      +    DMA_NONE
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_alloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_zalloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_free_consistent(e1, e2, e3, e4)
      +    dma_free_coherent(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_single(e1, e2, e3, e4)
      +    dma_map_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_single(e1, e2, e3, e4)
      +    dma_unmap_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4, e5;
      @@
      -    pci_map_page(e1, e2, e3, e4, e5)
      +    dma_map_page(&e1->dev, e2, e3, e4, e5)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_page(e1, e2, e3, e4)
      +    dma_unmap_page(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_sg(e1, e2, e3, e4)
      +    dma_map_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_sg(e1, e2, e3, e4)
      +    dma_unmap_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
      +    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_device(e1, e2, e3, e4)
      +    dma_sync_single_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
      +    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
      +    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2;
      @@
      -    pci_dma_mapping_error(e1, e2)
      +    dma_mapping_error(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_dma_mask(e1, e2)
      +    dma_set_mask(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_consistent_dma_mask(e1, e2)
      +    dma_set_coherent_mask(&e1->dev, e2)
      
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1e0dd56e
    • net: dsa: mt7530: drop paranoid checks in .get_tag_protocol() · 244f8a80
      Vladimir Oltean authored
      
      
      It is desirable to reduce the surface of DSA_TAG_PROTO_NONE as much as
      we can, because we now have options for switches without hardware
      support for DSA tagging, and the occurrence in the mt7530 driver is in
      fact quite gratuitous and easy to remove. Since ds->ops->get_tag_protocol()
      is only called for CPU ports, the checks for a CPU port in
      mtk_get_tag_protocol() are redundant and can be removed.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: DENG Qingfang <dqfext@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      244f8a80
    • Merge branch 'octeon-drr-config' · a3280efd
      David S. Miller authored
      
      
      Sunil Goutham says:
      
      ====================
      cn10k: DWRR MTU and weights configuration
      
      On OcteonTx2, the DWRR quantum is configured directly into each of
      the transmit scheduler queues, and PF/VF drivers were free to
      configure any value up to 2^24.
      
      On CN10K, the HW is modified: the quantum configuration at the scheduler
      queues is in terms of weight, and SW needs to set up a base DWRR MTU
      at NIX_AF_DWRR_RPM_MTU / NIX_AF_DWRR_SDP_MTU. HW will do
      'DWRR MTU * weight' to get the quantum.
      
      This patch series addresses this HW change on CN10K silicon;
      both the admin function and PF/VF drivers are modified.
      
      Also added support to program DWRR MTU via devlink params.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a3280efd
    • octeontx2-pf: cn10k: Config DWRR weight based on MTU · c39830a4
      Sunil Goutham authored
      
      
      Program SQ, MDQ, TL4 to TL2 transmit scheduler queues' DWRR
      weight based on DWRR MTU programmed at NIX_AF_DWRR_RPM_MTU.
      The DWRR MTU from admin function is retrieved via mbox.
      
      On OcteonTx2 silicon, the admin function driver responds with DWRR
      MTU as '1'. This helps to avoid silicon specific transmit
      scheduler DWRR quantum/weight configuration logic.
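The arithmetic can be sketched as below. The only relation taken from the text is quantum = DWRR MTU * weight; the helper names `dwrr_quantum()` and `dwrr_weight_for()` are hypothetical, and rounding the weight up so that the quantum covers a full frame is an assumption of this sketch, not something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/* Quantum derived by the hardware on CN10K: DWRR MTU * weight. */
static uint32_t dwrr_quantum(uint32_t dwrr_mtu, uint32_t weight)
{
	return dwrr_mtu * weight;
}

/* Hypothetical helper: smallest weight whose quantum is at least
 * frame_len bytes (rounding up is an assumption of this sketch). */
static uint32_t dwrr_weight_for(uint32_t frame_len, uint32_t dwrr_mtu)
{
	assert(dwrr_mtu != 0);
	return frame_len / dwrr_mtu + (frame_len % dwrr_mtu != 0);
}
```

With a DWRR MTU of 1, the computed weight equals the byte count itself, which is why reporting '1' lets the same code path produce a plain quantum on silicon that has no weight concept.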
      
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c39830a4
    • octeontx2-af: cn10k: DWRR MTU configuration · 76660df2
      Sunil Goutham authored
      
      
      On OcteonTx2, the DWRR quantum is configured directly into each of
      the transmit scheduler queues, and PF/VF drivers were free to
      configure any value up to 2^24.
      
      On CN10K, the HW is modified: the quantum configuration at the scheduler
      queues is in terms of weight, and SW needs to set up a base DWRR MTU
      at NIX_AF_DWRR_RPM_MTU / NIX_AF_DWRR_SDP_MTU. HW will do
      'DWRR MTU * weight' to get the quantum. For LBK traffic, the value
      programmed into the NIX_AF_DWRR_RPM_MTU register is used as the
      DWRR MTU.
      
      This patch programs a default DWRR MTU of 8192 into HW and also
      provides a way to change this via devlink params.
      
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      76660df2
    • selftests/net: remove min gso test in packet_snd · cfba3fb6
      Dust Li authored
      This patch removed the 'raw gso min size - 1' test which
      always fails now:
      ./in_netns.sh ./psock_snd -v -c -g -l "${mss}"
        raw gso min size - 1 (expected to fail)
        tx: 1524
        rx: 1472
        OK
      
      After commit 7c6d2ecb ("net: be more gentle about silly gso
      requests coming from user"), we relaxed the min gso_size
      check in virtio_net_hdr_to_skb().
      So when a packet is smaller than the gso_size,
      GSO will not be set for it, and the packet will be
      sent and received successfully.
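The relaxed behaviour can be pictured with a one-line predicate. This is only a sketch of the rule as described above; `wants_gso()` is a hypothetical name, and the exact comparison used in virtio_net_hdr_to_skb() is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: request segmentation only when the payload
 * is larger than one segment; smaller packets are passed through as-is
 * instead of being rejected. */
static bool wants_gso(size_t payload_len, size_t gso_size)
{
	assert(gso_size > 0);
	return payload_len > gso_size;
}
```

Under such a rule a payload one byte short of the segment size is simply sent unsegmented, so a test that expects that send to fail can no longer pass.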
      
      Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
      Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cfba3fb6
    • bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler() · 220ade77
      Yufeng Mo authored
      Some time ago, I reported a calltrace issue
      "did not find a suitable aggregator", please see[1].
      After a period of analysis and reproduction, I find
      that this problem is caused by concurrency.
      
      Before the problem occurs, the bond structure is as follows:
      
      bond0 - slaver0(eth0) - agg0.lag_ports -> port0 - port1
                            \
                              port0
            \
              slaver1(eth1) - agg1.lag_ports -> NULL
                            \
                              port1
      
      If we run 'ifenslave bond0 -d eth1', the process is as follows:
      
      executing __bond_release_one()
      |
      bond_upper_dev_unlink()[step1]
      |                       |                       |
      |                       |                       bond_3ad_lacpdu_recv()
      |                       |                       ->bond_3ad_rx_indication()
      |                       |                       spin_lock_bh()
      |                       |                       ->ad_rx_machine()
      |                       |                       ->__record_pdu()[step2]
      |                       |                       spin_unlock_bh()
      |                       |                       |
      |                       bond_3ad_state_machine_handler()
      |                       spin_lock_bh()
      |                       ->ad_port_selection_logic()
      |                       ->try to find free aggregator[step3]
      |                       ->try to find suitable aggregator[step4]
      |                       ->did not find a suitable aggregator[step5]
      |                       spin_unlock_bh()
      |                       |
      |                       |
      bond_3ad_unbind_slave() |
      spin_lock_bh()
      spin_unlock_bh()
      
      step1: already removed slaver1(eth1) from list, but port1 remains
      step2: receive a lacpdu and update port0
      step3: port0 will be removed from agg0.lag_ports. The struct is
             "agg0.lag_ports -> port1" now, and agg0 is not free. At the
             same time, slaver1/agg1 has been removed from the list by step1.
             So we can't find a free aggregator now.
      step4: can't find suitable aggregator because of step2
      step5: cause a calltrace since port->aggregator is NULL
      
      To solve this concurrency problem, put bond_upper_dev_unlink()
      after bond_3ad_unbind_slave(). In this way, we can invalidate the port
      first and skip this port in bond_3ad_state_machine_handler(). This
      eliminates the situation where the slaver has been removed from the
      list but the port is still valid.
      
      [1]https://lore.kernel.org/netdev/10374.1611947473@famine/
      
      
      
      Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      220ade77
    • net_sched: refactor TC action init API · 695176bf
      Cong Wang authored
      
      
      The TC action ->init() API has 10 parameters, which makes it hard
      to read. Some of them are just booleans and can be replaced
      by flags. The same applies to the internal APIs tcf_action_init()
      and tcf_exts_validate().
      
      This patch converts them to flags and folds them into
      the upper 16 bits of "flags", whose lower 16 bits are still
      reserved for user-space. More specifically, the following
      kernel flags are introduced:
      
      TCA_ACT_FLAGS_POLICE replaces 'name' in a few contexts, to
      distinguish whether it is compatible with policer.
      
      TCA_ACT_FLAGS_BIND replaces 'bind', to indicate whether
      this action is bound to a filter.
      
      TCA_ACT_FLAGS_REPLACE replaces 'ovr' in most contexts,
      meaning we are replacing an existing action.
      
      TCA_ACT_FLAGS_NO_RTNL replaces 'rtnl_held' but has the
      opposite meaning, because we still hold RTNL in most
      cases.
      
      The only user-space flag TCA_ACT_FLAGS_NO_PERCPU_STATS is
      untouched and still stored as before.
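A minimal sketch of the flag layout described above follows. The `EX_` macro names, the specific bit positions and the helpers `pack_flags()` and `user_flags()` are hypothetical and chosen only for illustration; the one property taken from the text is that user-space flags occupy the lower 16 bits and kernel-internal flags the upper 16 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: lower 16 bits for user space, upper 16 for the kernel. */
#define EX_FLAGS_USER_BITS 16
#define EX_FLAGS_USER_MASK 0xffffu

#define EX_FLAG_POLICE  (1u << EX_FLAGS_USER_BITS)
#define EX_FLAG_BIND    (1u << (EX_FLAGS_USER_BITS + 1))
#define EX_FLAG_REPLACE (1u << (EX_FLAGS_USER_BITS + 2))
#define EX_FLAG_NO_RTNL (1u << (EX_FLAGS_USER_BITS + 3))

/* Fold the former boolean parameters into one flags word. Note the
 * inverted sense of rtnl_held: the flag is set when RTNL is NOT held. */
static uint32_t pack_flags(uint32_t user, bool bind, bool ovr, bool rtnl_held)
{
	uint32_t flags = user;

	/* Kernel-only bits must never arrive from user space. */
	assert((user & ~EX_FLAGS_USER_MASK) == 0);
	if (bind)
		flags |= EX_FLAG_BIND;
	if (ovr)
		flags |= EX_FLAG_REPLACE;
	if (!rtnl_held)
		flags |= EX_FLAG_NO_RTNL;
	return flags;
}

/* Recover the user-space portion, e.g. before storing or dumping it. */
static uint32_t user_flags(uint32_t flags)
{
	return flags & EX_FLAGS_USER_MASK;
}
```

Keeping the two halves disjoint means the user-visible value round-trips unchanged no matter which internal flags are set along the way.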
      
      I have tested this patch with tdc and I do not see any
      failure related to this patch.
      
      Tested-by: Vlad Buslov <vladbu@nvidia.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      695176bf
    • niu: read property length only if we use it · 451395f7
      Martin Kaiser authored
      
      
      In three places, the driver calls of_get_property and reads the property
      length although the length is not used. Update the calls to not request
      the length.
      
      Signed-off-by: Martin Kaiser <martin@kaiser.cx>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      451395f7
  3. Aug 01, 2021
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · d39e8b92
      Jakub Kicinski authored
      Andrii Nakryiko says:
      
      ====================
      bpf-next 2021-07-30
      
      We've added 64 non-merge commits during the last 15 day(s) which contain
      a total of 83 files changed, 5027 insertions(+), 1808 deletions(-).
      
      The main changes are:
      
      1) BTF-guided binary data dumping libbpf API, from Alan.
      
      2) Internal factoring out of libbpf CO-RE relocation logic, from Alexei.
      
      3) Ambient BPF run context and cgroup storage cleanup, from Andrii.
      
      4) Few small API additions for libbpf 1.0 effort, from Evgeniy and Hengqi.
      
      5) bpf_program__attach_kprobe_opts() fixes in libbpf, from Jiri.
      
      6) bpf_{get,set}sockopt() support in BPF iterators, from Martin.
      
      7) BPF map pinning improvements in libbpf, from Martynas.
      
      8) Improved module BTF support in libbpf and bpftool, from Quentin.
      
      9) Bpftool cleanups and documentation improvements, from Quentin.
      
      10) Libbpf improvements for supporting CO-RE on old kernels, from Shuyi.
      
      11) Increased maximum cgroup storage size, from Stanislav.
      
      12) Small fixes and improvements to BPF tests and samples, from various folks.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
        tools: bpftool: Complete metrics list in "bpftool prog profile" doc
        tools: bpftool: Document and add bash completion for -L, -B options
        selftests/bpf: Update bpftool's consistency script for checking options
        tools: bpftool: Update and synchronise option list in doc and help msg
        tools: bpftool: Complete and synchronise attach or map types
        selftests/bpf: Check consistency between bpftool source, doc, completion
        tools: bpftool: Slightly ease bash completion updates
        unix_bpf: Fix a potential deadlock in unix_dgram_bpf_recvmsg()
        libbpf: Add btf__load_vmlinux_btf/btf__load_module_btf
        tools: bpftool: Support dumping split BTF by id
        libbpf: Add split BTF support for btf__load_from_kernel_by_id()
        tools: Replace btf__get_from_id() with btf__load_from_kernel_by_id()
        tools: Free BTF objects at various locations
        libbpf: Rename btf__get_from_id() as btf__load_from_kernel_by_id()
        libbpf: Rename btf__load() as btf__load_into_kernel()
        libbpf: Return non-null error on failures in libbpf_find_prog_btf_id()
        bpf: Emit better log message if bpf_iter ctx arg btf_id == 0
        tools/resolve_btfids: Emit warnings and patch zero id for missing symbols
        bpf: Increase supported cgroup storage value size
        libbpf: Fix race when pinning maps in parallel
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20210730225606.1897330-1-andrii@kernel.org
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d39e8b92