  1. Jan 06, 2022
    • bpf, docs: Add a section to explain the basic instruction encoding · 62e46838
      Christoph Hellwig authored
      
      
      The eBPF instruction set document does not currently document the basic
      instruction encoding.  Add a section to do that.
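
      As a rough illustration of the encoding that section covers, the sketch
      below packs one instruction in userspace C. It assumes the 64-bit layout
      of struct bpf_insn from the kernel UAPI headers (8-bit opcode, two 4-bit
      register fields, 16-bit offset, 32-bit immediate) on a little-endian
      host. The struct name sketch_insn, the SKETCH_* constants and the helper
      functions are hypothetical names chosen for this example, not part of
      the patch.

```c
#include <stdint.h>

/* Hypothetical userspace mirror of the 64-bit eBPF instruction layout. */
struct sketch_insn {
    uint8_t opcode; /* operation, source bit and instruction class */
    uint8_t regs;   /* low nibble: dst_reg, high nibble: src_reg */
    int16_t off;    /* signed offset */
    int32_t imm;    /* signed immediate */
};

/* Opcode pieces for ALU/JMP-style instructions (values assumed from UAPI). */
#define SKETCH_CLASS_ALU64 0x07 /* low 3 bits: instruction class */
#define SKETCH_SRC_K       0x00 /* bit 3 clear: operand is the immediate */
#define SKETCH_OP_MOV      0xb0 /* high 4 bits: operation code */

/* Build "dst = imm" as a 64-bit move of an immediate. */
static struct sketch_insn sketch_mov64_imm(uint8_t dst, int32_t imm)
{
    struct sketch_insn insn = {
        .opcode = SKETCH_OP_MOV | SKETCH_SRC_K | SKETCH_CLASS_ALU64,
        .regs = dst & 0x0f,
        .off = 0,
        .imm = imm,
    };
    return insn;
}

/* Field accessors that undo the packing above. */
static uint8_t sketch_insn_class(const struct sketch_insn *insn)
{
    return insn->opcode & 0x07;
}

static uint8_t sketch_insn_dst(const struct sketch_insn *insn)
{
    return insn->regs & 0x0f;
}

static uint8_t sketch_insn_src(const struct sketch_insn *insn)
{
    return insn->regs >> 4;
}
```

      Calling sketch_mov64_imm(1, 42) yields an 8-byte instruction whose
      opcode byte combines the three opcode pieces, with the destination
      register in the low nibble of the register byte.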
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220103183556.41040-2-hch@lst.de
    • bpf, selftests: Add verifier test for mem_or_null register with offset. · ca796fe6
      Daniel Borkmann authored
      
      
      Add a new test case with mem_or_null typed register with off > 0 to ensure
      it gets rejected by the verifier:
      
        # ./test_verifier 1011
        #1009/u check with invalid reg offset 0 OK
        #1009/p check with invalid reg offset 0 OK
        Summary: 2 PASSED, 0 SKIPPED, 0 FAILED
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Don't promote bogus looking registers after null check. · e60b0d12
      Daniel Borkmann authored
      If we ever get to a point again where we convert a bogus looking <ptr>_or_null
      typed register containing a non-zero fixed or variable offset, then let's not
      reset these bounds to zero since they are not, and also don't promote the register
      to a <ptr> type, but instead leave it as <ptr>_or_null. Converting to an unknown
      register could be an avenue as well, but then if we run into this case it would
      allow leaking a kernel pointer this way.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, sockmap: Fix double bpf_prog_put on error case in map_link · 218d747a
      John Fastabend authored
      sock_map_link() is called to update a sockmap entry with a sk. But, if the
      sock_map_init_proto() call fails then we return an error to the map_update
      op against the sockmap. In the error path though we need to cleanup psock
      and dec the refcnt on any programs associated with the map, because we
      refcnt them early in the update process to ensure they are pinned for the
      psock. (This avoids a race where user deletes programs while also updating
      the map with new socks.)
      
      In current code we do the prog refcnt dec explicitly by calling
      bpf_prog_put() when the program was found in the map. But, after commit
      '38207a5e' in this error path we've already done the prog to psock
      assignment so the programs have a reference from the psock as well. This
      then causes the psock tear down logic, invoked by sk_psock_put() in the
      error path, to similarly call bpf_prog_put on the programs there.
      
      To be explicit this logic does the prog->psock assignment:
      
        if (msg_*)
          psock_set_prog(...)
      
      Then the error path under the out_progs label does a similar check and
      dec with:
      
        if (msg_*)
           bpf_prog_put(...)
      
      And the teardown logic sk_psock_put() does ...
      
        psock_set_prog(msg_*, NULL)
      
      ... triggering another bpf_prog_put(...). Then KASAN gives us this splat,
      found by syzbot, because we've created an imbalance between bpf_prog_inc and
      bpf_prog_put by calling put twice on the program.
      
        BUG: KASAN: vmalloc-out-of-bounds in __bpf_prog_put kernel/bpf/syscall.c:1812 [inline]
        BUG: KASAN: vmalloc-out-of-bounds in __bpf_prog_put kernel/bpf/syscall.c:1812 [inline] kernel/bpf/syscall.c:1829
        BUG: KASAN: vmalloc-out-of-bounds in bpf_prog_put+0x8c/0x4f0 kernel/bpf/syscall.c:1829 kernel/bpf/syscall.c:1829
        Read of size 8 at addr ffffc90000e76038 by task syz-executor020/3641
      
      To fix this, clean up the error path so it doesn't do the bpf_prog_put
      once progs are assigned, and instead relies on the normal psock
      tear down logic to do the complete cleanup.
      
      For completeness we also cover the case where sk_psock_init_strp() fails,
      but this is not expected because it indicates an incorrect socket type
      and should be caught earlier.
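
      The imbalance described above can be reproduced with a toy reference
      count in userspace C. This is only a sketch under stated assumptions:
      the structs toy_prog and toy_psock and every toy_* function are
      hypothetical stand-ins for the kernel objects named in this message,
      and the model tracks nothing but the counter.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a BPF program and a psock. */
struct toy_prog { int refcnt; };
struct toy_psock { struct toy_prog *msg_parser; };

static void toy_prog_inc(struct toy_prog *prog) { prog->refcnt++; }
static void toy_prog_put(struct toy_prog *prog) { prog->refcnt--; }

/* Models psock_set_prog(): install a program and drop the previous one. */
static void toy_psock_set_prog(struct toy_prog **slot, struct toy_prog *prog)
{
    struct toy_prog *old = *slot;

    *slot = prog;
    if (old)
        toy_prog_put(old);
}

/* Models the teardown done via sk_psock_put(): clear the slot. */
static void toy_psock_teardown(struct toy_psock *psock)
{
    toy_psock_set_prog(&psock->msg_parser, NULL);
}

/*
 * Simulate a link attempt that fails after the program was assigned to
 * the psock. Returns the program's final reference count; the map itself
 * holds one reference throughout, so the balanced result is 1.
 */
static int toy_failed_link(int explicit_put_in_error_path)
{
    struct toy_prog prog = { .refcnt = 1 }; /* reference held by the map */
    struct toy_psock psock = { .msg_parser = NULL };

    toy_prog_inc(&prog);                          /* early pin for the psock */
    toy_psock_set_prog(&psock.msg_parser, &prog); /* psock owns that pin now */

    /* ... protocol init fails here ... */
    if (explicit_put_in_error_path)
        toy_prog_put(&prog);                      /* old error-path behavior */
    toy_psock_teardown(&psock);                   /* drops the psock's pin */

    return prog.refcnt;
}
```

      With the explicit put, the count ends one lower than the map's own
      reference, which is the double put; without it, teardown alone
      restores the balance.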
      
      Fixes: 38207a5e ("bpf, sockmap: Attach map progs to psock early for feature probes")
      Reported-by: <syzbot+bb73e71cf4b8fd376a4f@syzkaller.appspotmail.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20220104214645.290900-1-john.fastabend@gmail.com
    • bpf, sockmap: Fix return codes from tcp_bpf_recvmsg_parser() · 5b2c5540
      John Fastabend authored
      Applications can be confused slightly because we do not always return the
      same error code as expected, e.g. what the TCP stack normally returns. For
      example on a sock err sk->sk_err instead of returning the sock_error we
      return EAGAIN. This usually means the application will 'try again'
      instead of aborting immediately. Another example, when a shutdown event
      is received we should immediately abort instead of waiting for data when
      the user provides a timeout.
      
      These tend to not be fatal, applications usually recover, but they introduce
      bogus errors to the user or unexpected latency. Before
      'c5d2177a' we fell back to the TCP stack when no data was available
      so we managed to catch many of the cases here, although with the extra
      latency cost of calling tcp_msg_wait_data() first.
      
      To fix this, let's duplicate the error handling in the TCP stack into tcp_bpf so
      that we get the same error codes.
      
      These were found in our CI tests that run applications against sockmap
      and do longer lived testing, at least compared to test_sockmap that
      does short-lived ping/pong tests, and in some of our test clusters
      we deploy.
      
      It's non-trivial to do these in the shorter form CI tests that would be
      appropriate for BPF selftests, but we are looking into it so we can
      ensure this keeps working going forward. As a preview one idea is to
      pull in the packetdrill testing which catches some of this.
      
      Fixes: c5d2177a ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20220104205918.286416-1-john.fastabend@gmail.com
    • bpf, arm64: Use emit_addr_mov_i64() for BPF_PSEUDO_FUNC · e4a41c2c
      Hou Tao authored
      The following error is reported when running "./test_progs -t for_each"
      under arm64:
      
        bpf_jit: multi-func JIT bug 58 != 56
        [...]
        JIT doesn't support bpf-to-bpf calls
      
      The root cause is that the JITed size of the BPF_PSEUDO_FUNC instruction
      increases from 2 to 3 instructions after the address of the called
      bpf-function is settled, and there are two bpf-to-bpf calls in
      test_pkt_access. The generated instructions are shown below:
      
        0x48:  21 00 C0 D2    movz x1, #0x1, lsl #32
        0x4c:  21 00 80 F2    movk x1, #0x1
      
        0x48:  E1 3F C0 92    movn x1, #0x1ff, lsl #32
        0x4c:  41 FE A2 F2    movk x1, #0x17f2, lsl #16
        0x50:  81 70 9F F2    movk x1, #0xfb84
      
      Fix it by using emit_addr_mov_i64() for BPF_PSEUDO_FUNC, so
      the size of the jited image will not change.
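
      Why the size changes between JIT passes can be sketched with a
      simplified cost model in userspace C. The model is an assumption made
      for illustration, not the arm64 JIT's real selection logic: it starts
      from movz or movn, whichever leaves fewer 16-bit halfwords to patch,
      and counts one movk per remaining halfword. The function names are
      hypothetical, the fixed cost of 3 simply mirrors the three-instruction
      sequence listed above, and the two test values are derived from the
      disassembly above.

```c
#include <stdint.h>

/* Count the 16-bit halfwords of val that differ from a background value. */
static int sketch_halfwords_not(uint64_t val, uint16_t background)
{
    int count = 0;

    for (int shift = 0; shift < 64; shift += 16)
        if (((val >> shift) & 0xffff) != background)
            count++;
    return count;
}

/*
 * Simplified model of a value-dependent move: movz fills untouched
 * halfwords with 0x0000, movn fills them with 0xffff, and each remaining
 * halfword costs one instruction. At least one instruction is needed.
 */
static int sketch_variable_len_mov(uint64_t val)
{
    int with_movz = sketch_halfwords_not(val, 0x0000);
    int with_movn = sketch_halfwords_not(val, 0xffff);
    int count = with_movz < with_movn ? with_movz : with_movn;

    return count ? count : 1;
}

/* Model of an address move that always emits the same number of instructions. */
static int sketch_fixed_len_addr_mov(uint64_t val)
{
    (void)val; /* the length does not depend on the value */
    return 3;
}
```

      Under this model the placeholder 0x0000000100000001 costs two
      instructions and the resolved address 0xfffffe0017f2fb84 costs three,
      while the fixed-length variant costs the same for both, so the image
      size stays stable across passes.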
      
      Fixes: 69c087ba ("bpf: Add bpf_for_each_map_elem() helper")
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20211231151018.3781550-1-houtao1@huawei.com
  2. Jan 05, 2022
    • bpf/selftests: Fix namespace mount setup in tc_redirect · 5e22dd18
      Jiri Olsa authored
      
      
      The tc_redirect test umounts /sys in the new namespace, which can be
      mounted as shared and cause a global umount. The lazy umount also
      takes down mounted trees under /sys like debugfs, which won't be
      available after sysfs is mounted again and could cause failures in other
      tests.
      
        # cat /proc/self/mountinfo | grep debugfs
        34 23 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime shared:14 - debugfs debugfs rw
        # cat /proc/self/mountinfo | grep sysfs
        23 86 0:22 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw
        # mount | grep debugfs
        debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
      
        # ./test_progs -t tc_redirect
        #164 tc_redirect:OK
        Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED
      
        # mount | grep debugfs
        # cat /proc/self/mountinfo | grep debugfs
        # cat /proc/self/mountinfo | grep sysfs
        25 86 0:22 / /sys rw,relatime shared:2 - sysfs sysfs rw
      
      Make sysfs private under the new namespace so the umount won't
      trigger the global sysfs umount.
      
      Reported-by: Hangbin Liu <haliu@redhat.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jussi Maki <joamaki@gmail.com>
      Link: https://lore.kernel.org/bpf/20220104121030.138216-1-jolsa@kernel.org
    • bpftool: Probe for instruction set extensions · 0fd800b2
      Paul Chaignon authored
      This patch introduces new probes to check whether the kernel supports
      instruction set extensions v2 and v3. The first introduced eBPF
      instructions BPF_J{LT,LE,SLT,SLE} in commit 92b31a9a ("bpf: add
      BPF_J{LT,LE,SLT,SLE} instructions"). The second introduces 32-bit
      variants of all jump instructions in commit 092ed096 ("bpf:
      verifier support JMP32").
      
      These probes are useful for userspace BPF projects that want to use newer
      instruction set extensions on newer kernels, to reduce the programs'
      sizes or their complexity. LLVM already provides an mcpu=probe option to
      automatically probe the kernel and select the newest-supported
      instruction set extension. That is however not flexible enough for all
      use cases. For example, in Cilium, we only want to use the v3
      instruction set extension on v5.10+, even though it is supported on all
      kernels v5.1+.
      
      Signed-off-by: Paul Chaignon <paul@isovalent.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/bpf/3bfedcd9898c1f41ac67ca61f144fec84c6c3a92.1641314075.git.paul@isovalent.com
    • bpftool: Probe for bounded loop support · c04fb2b0
      Paul Chaignon authored
      This patch introduces a new probe to check whether the verifier supports
      bounded loops as introduced in commit 2589726d ("bpf: introduce
      bounded loops"). This patch will allow BPF users such as Cilium to probe
      for loop support on startup and only unconditionally unroll loops on
      older kernels.
      
      The results are displayed as part of the miscellaneous section, as shown
      below.
      
        $ bpftool feature probe | grep loops
        Bounded loop support is available
        $ bpftool feature probe macro | grep LOOPS
        #define HAVE_BOUNDED_LOOPS
        $ bpftool feature probe -j | jq .misc
        {
          "have_large_insn_limit": true,
          "have_bounded_loops": true
        }
      
      Signed-off-by: Paul Chaignon <paul@isovalent.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/bpf/f7807c0b27d79f48e71de7b5a99c680ca4bd0151.1641314075.git.paul@isovalent.com
    • bpftool: Refactor misc. feature probe · b22bf1b9
      Paul Chaignon authored
      
      
      There is currently a single miscellaneous feature probe,
      HAVE_LARGE_INSN_LIMIT, to check for the 1M instructions limit in the
      verifier. Subsequent patches will add additional miscellaneous probes,
      which follow the same pattern as the existing probe. This patch
      therefore refactors the probe to avoid code duplication in subsequent
      patches.
      
      The BPF program type and the checked error numbers in the
      HAVE_LARGE_INSN_LIMIT probe are changed to better generalize to other
      probes. The feature probe retains its current behavior despite those
      changes.
      
      Signed-off-by: Paul Chaignon <paul@isovalent.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/bpf/956c9329a932c75941194f91790d01f31dfbe01b.1641314075.git.paul@isovalent.com
    • Merge branch 'lan966x-extend-switchdev-and-mdb-support' · c5bcdd82
      David S. Miller authored
      
      
      Horatiu Vultur says:
      
      ====================
      net: lan966x: Extend switchdev with mdb support
      
      This patch series extends lan966x with mdb support by implementing
      the switchdev callbacks: SWITCHDEV_OBJ_ID_PORT_MDB and
      SWITCHDEV_OBJ_ID_HOST_MDB.
      It adds support for both ipv4/ipv6 entries and l2 entries.
      
      v2->v3:
      - rename PGID_FIRST and PGID_LAST to PGID_GP_START and PGID_GP_END
      - don't forget and relearn an entry for the CPU if there are more
        references to the cpu.
      
      v1->v2:
      - rename lan966x_mac_learn_impl to __lan966x_mac_learn
      - rename lan966x_mac_cpu_copy to lan966x_mac_ip_learn
      - fix grammar and typos in comments and commit messages
      - add reference counter for entries that copy frames to CPU
      ====================
      
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: lan966x: Extend switchdev with mdb support · 7aacb894
      Horatiu Vultur authored
      
      
      Extend lan966x driver with mdb support by implementing the switchdev
      calls: SWITCHDEV_OBJ_ID_PORT_MDB and SWITCHDEV_OBJ_ID_HOST_MDB.
      Both ipv4/ipv6 entries and l2 entries can be added. Adding
      ipv4/ipv6 entries does not require the PGID table, while adding l2
      entries does. The PGID table is much smaller than the MAC table,
      so fewer l2 entries can be added.
      
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: lan966x: Add PGID_GP_START and PGID_GP_END · 11b0a277
      Horatiu Vultur authored
      
      
      The first entries in the PGID table are used by the front ports while
      the last entries are used for different purposes like flooding mask,
      copy to CPU, etc. So add these macros to define which entries can be
      used for general purpose.
      
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: lan966x: Add function lan966x_mac_ip_learn() · fc0c3fe7
      Horatiu Vultur authored
      
      
      Extend mac functionality with the function lan966x_mac_ip_learn. This
      function adds an entry in the MAC table for IP multicast addresses.
      These entries can copy a frame to the CPU but can also forward it on the
      front ports.
      This functionality is needed for mdb support, in case the CPU and some
      of the front ports subscribe to an IP multicast address.
      
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mtk_eth_soc-refactoring-and-clause45' · 2a5ab39b
      David S. Miller authored
      
      
      Daniel Golle says:
      
      ====================
      net: ethernet: mtk_eth_soc: refactoring and Clause 45
      
      Rework value and type of mdio read and write functions in mtk_eth_soc
      and generally clean up and unify both functions.
      Then add support to access Clause 45 phy registers, using newly
      introduced helper inline functions added by a patch Russell King has
      suggested in a reply to an earlier version of this series [1].
      
      All three commits are tested on the Bananapi BPi-R64 board having
      MediaTek MT7531BE DSA gigE switch using clause 22 MDIO and
      Ubiquiti UniFi 6 LR access point having Aquantia AQR112C PHY using
      clause 45 MDIO.
      
      [1]: https://lore.kernel.org/netdev/Ycr5Cna76eg2B0An@shell.armlinux.org.uk/
      
      v11: also address return value of mtk_mdio_busy_wait
      v10: correct order of SoB lines in 2/3, change patch order in series
      v9: improved formatting and Cc missing maintainer
      v8: add patch from Russell King, switch to bitfield helper macros
      v7: remove unneeded variables and order OR-ed call parameters
      v6: further clean up functions and more cleanly separate patches
      v5: fix wrong variable name in first patch covered by follow-up patch
      v4: clean-up return values and types, split into two commits
      v3: return -1 instead of 0xffff on error in _mtk_mdio_write
      v2: use MII_DEVADDR_C45_SHIFT and MII_REGADDR_C45_MASK to extract
          device id and register address. Unify read and write functions to
          have identical types and parameter names where possible as we are
          anyway already replacing both function bodies.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: mtk_eth_soc: implement Clause 45 MDIO access · e2e7f6e2
      Daniel Golle authored
      
      
      Implement read and write access to IEEE 802.3 Clause 45 Ethernet
      phy registers while making use of new mdiobus_c45_regad and
      mdiobus_c45_devad helpers.
      
      Tested on the Ubiquiti UniFi 6 LR access point featuring
      MediaTek MT7622BV WiSoC with Aquantia AQR112C.
      
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mdio: add helpers to extract clause 45 regad and devad fields · c6af53f0
      Russell King (Oracle) authored
      
      
      Add a couple of helpers and definitions to extract the clause 45 regad
      and devad fields from the regnum passed into MDIO drivers.
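
      What such extraction helpers do can be sketched in userspace C. The
      sketch assumes the regnum layout used for clause 45 at the time of
      this series: a flag in bit 30, the device address in bits 20:16 and
      the register address in bits 15:0. The SKETCH_* constants and
      sketch_* functions are hypothetical names for this example and are
      not the helpers added by the patch.

```c
#include <stdint.h>

/* Assumed clause 45 regnum layout (hypothetical constant names). */
#define SKETCH_ADDR_C45    (1u << 30)                     /* clause 45 flag */
#define SKETCH_DEVAD_SHIFT 16
#define SKETCH_DEVAD_MASK  (0x1fu << SKETCH_DEVAD_SHIFT)  /* bits 20:16 */
#define SKETCH_REGAD_MASK  0xffffu                        /* bits 15:0 */

/* Non-zero when regnum carries a clause 45 access. */
static int sketch_is_c45(uint32_t regnum)
{
    return (regnum & SKETCH_ADDR_C45) != 0;
}

/* Extract the 5-bit device address. */
static uint16_t sketch_c45_devad(uint32_t regnum)
{
    return (uint16_t)((regnum & SKETCH_DEVAD_MASK) >> SKETCH_DEVAD_SHIFT);
}

/* Extract the 16-bit register address. */
static uint16_t sketch_c45_regad(uint32_t regnum)
{
    return (uint16_t)(regnum & SKETCH_REGAD_MASK);
}

/* Compose a clause 45 regnum, mainly to exercise the extractors. */
static uint32_t sketch_c45_regnum(uint16_t devad, uint16_t regad)
{
    return SKETCH_ADDR_C45 |
           (((uint32_t)devad << SKETCH_DEVAD_SHIFT) & SKETCH_DEVAD_MASK) |
           regad;
}
```

      A driver's read or write callback would first test the flag and then
      pass the two extracted fields to the hardware, instead of open-coding
      the shifts and masks.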
      
      Tested-by: Daniel Golle <daniel@makrotopia.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: mtk_eth_soc: fix return values and refactor MDIO ops · eda80b24
      Daniel Golle authored
      Instead of returning -1 (-EPERM) when MDIO bus is stuck busy
      while writing or 0xffff if it happens while reading, return the
      appropriate -ETIMEDOUT. Also fix return type to int instead of u32.
      Refactor functions to use bitfield helpers instead of having various
      masking and shifting constants in the code, which also results in the
      register definitions in the header file being more obviously related
      to what is stated in MediaTek's Reference Manual.
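
      The reason for returning int with a negative errno can be sketched
      against a fake bus in userspace C. Everything here is hypothetical:
      struct fake_mdio, fake_mdio_busy() and sketch_mdio_read() only model
      a bounded busy-wait and are not the driver's functions.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical fake bus: busy for a number of polls, then holds data. */
struct fake_mdio {
    int busy_polls_left;
    uint16_t data;
};

/* Report whether the fake bus is still busy, consuming one poll. */
static int fake_mdio_busy(struct fake_mdio *bus)
{
    if (bus->busy_polls_left > 0) {
        bus->busy_polls_left--;
        return 1;
    }
    return 0;
}

/*
 * Bounded busy-wait followed by a read. Register values occupy
 * 0..0xffff, so any negative return is unambiguously an error and
 * 0xffff remains a valid register value.
 */
static int sketch_mdio_read(struct fake_mdio *bus, int max_polls)
{
    for (int i = 0; i < max_polls; i++)
        if (!fake_mdio_busy(bus))
            return bus->data;

    return -ETIMEDOUT;
}
```

      A caller checks for a negative return first and otherwise treats the
      result as the 16-bit register value.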
      
      Fixes: 656e7052 ("net-next: mediatek: add support for MT7623 ethernet")
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "net: wwan: iosm: Keep device at D0 for s2idle case" · ffd32ea6
      M Chetan Kumar authored
      Depending on BIOS configuration IOSM driver exchanges
      protocol required for putting device into D3L2 or D3L1.2.
      
      ipc_pcie_suspend_s2idle() is implemented to put device to D3L1.2.
      
      The reverted patch forces the PCI core to know this device should stay at D0.
      - pci_save_state() is expensive since it does a lot of slow PCI
      config reads.
      
      The reported issue is not observed on x86 platform. The spurious
      wake on AMD platform needs to be further debugged with the original patch
      submitter [1]. Also the impact of adding pci_save_state() needs to be
      assessed by testing it on other platforms.
      
      This reverts commit f4dd5174 ("net: wwan: iosm: Keep device
      at D0 for s2idle case").
      
      [1] https://lore.kernel.org/all/20211224081914.345292-2-kai.heng.feng@canonical.com/
      
      Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
      Link: https://lore.kernel.org/r/20220104150213.1894-1-m.chetan.kumar@linux.intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'mac80211-next-for-net-next-2022-01-04' of... · 18343b80
      Jakub Kicinski authored
      
      Merge tag 'mac80211-next-for-net-next-2022-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
      
      Johannes Berg says:
      
      ====================
      Just a few more changes:
       - mac80211: allow non-standard VHT MCSes 10/11
       - mac80211: add sleepable station iterator for drivers
       - nl80211: clarify a comment
       - mac80211: small cleanup to use typed element helpers
      
      * tag 'mac80211-next-for-net-next-2022-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next:
        mac80211: use ieee80211_bss_get_elem()
        nl80211: clarify comment for mesh PLINK_BLOCKED state
        mac80211: Add stations iterator where the iterator function may sleep
        mac80211: allow non-standard VHT MCS-10/11
      ====================
      
      Link: https://lore.kernel.org/r/20220104153403.69749-1-johannes@sipsolutions.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. Jan 04, 2022
  4. Jan 03, 2022