  1. Sep 05, 2019
  2. Sep 04, 2019
  3. Sep 03, 2019
  4. Aug 31, 2019
    • Merge branch 'bpf-xdp-unaligned-chunk' · bdb15a29
      Daniel Borkmann authored
      Kevin Laatz says:
      
      ====================
      This patch set adds the ability to use unaligned chunks in the XDP umem.
      
      Currently, all chunk addresses passed to the umem are masked to be chunk
      size aligned (max is PAGE_SIZE). This limits where we can place chunks
      within the umem as well as limiting the packet sizes that are supported.
      
      The changes in this patch set remove these restrictions, allowing XDP to
      be more flexible in where it can place a chunk within a umem. By relaxing
      where the chunks can be placed, it allows us to use an arbitrary buffer
      size and place that wherever we have a free address in the umem. These
      changes add the ability to support arbitrary frame sizes up to 4k
      (PAGE_SIZE) and make it easy to integrate with other existing frameworks
      that have their own memory management systems, such as DPDK.
      In DPDK, for example, there is already support for AF_XDP with zero-copy.
      However, with this patch set the integration will be much more seamless.
      You can find the DPDK AF_XDP driver at:
      https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
      
      Since we are now dealing with arbitrary frame sizes, we also need to
      update how we pass around addresses. Currently, the addresses can simply be
      masked to 2k to get back to the original address. This becomes less trivial
      when using frame sizes that are not a 'power of 2' size. This patch set
      modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
      field for an offset value, leaving the lower 48-bits for the address (this
      leaves us with 256 Terabytes, which should be enough!). We only need to use
      the upper 16-bits to store the offset when running in unaligned mode.
      Rather than adding the offset (headroom etc) to the address, we will store
      it in the upper 16-bits of the address field. This way, we can easily add
      the offset to the address where we need it, using some bit manipulation and
      addition, and we can also easily get the original address wherever we need
      it (for example in i40e_zca_free) by simply masking to get the lower
      48-bits of the address field.
      
      The patch set was tested with the following set up:
        - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
        - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
        - Driver: i40e
        - Application: xdpsock with l2fwd (single interface)
        - Turbo disabled in BIOS
      
      There are no changes to performance before and after these patches for SKB
      mode and Copy mode. Zero-copy mode saw a performance degradation of ~1.5%.
      
      This patch set has been applied against
      commit 0bb52b0d ("tools: bpftool: add 'bpftool map freeze' subcommand")
      
      Structure of the patch set:
      
      Patch 1:
        - Remove unnecessary masking and headroom addition during zero-copy Rx
          buffer recycling in i40e. This change is required in order for the
          buffer recycling to work in the unaligned chunk mode.
      
      Patch 2:
        - Remove unnecessary masking and headroom addition during
          zero-copy Rx buffer recycling in ixgbe. This change is required in
          order for the buffer recycling to work in the unaligned chunk mode.
      
      Patch 3:
        - Add infrastructure for unaligned chunks. Since we are dealing with
          unaligned chunks that could potentially cross a physical page boundary,
          we add checks to keep track of that information. We can later use this
          information to correctly handle buffers that are placed at an address
          where they cross a page boundary.  This patch also modifies the
          existing Rx and Tx functions to use the new descriptor format. To
          handle addresses correctly, we need to mask appropriately based on
          whether we are in aligned or unaligned mode.
      
      Patch 4:
        - This patch updates the i40e driver to make use of the new descriptor
          format.
      
      Patch 5:
        - This patch updates the ixgbe driver to make use of the new descriptor
          format.
      
      Patch 6:
        - This patch updates the mlx5e driver to make use of the new descriptor
          format. These changes are required to handle the new descriptor format
          and for unaligned chunks support.
      
      Patch 7:
        - This patch allows XSK frames smaller than page size in the mlx5e
          driver. Relax the requirements to the XSK frame size to allow it to be
          smaller than a page and even not a power of two. The current
          implementation can work in this mode, both with Striding RQ and without
          it.
      
      Patch 8:
        - Add flags for umem configuration to libbpf. Since we increase the size
          of the struct by adding flags, we also need to add the ABI versioning
          in this patch.
      
      Patch 9:
        - Modify xdpsock application to add a command line option for
          unaligned chunks
      
      Patch 10:
        - Since we can now run the application in unaligned chunk mode, we need
          to make sure we recycle the buffers appropriately.
      
      Patch 11:
        - Adds hugepage support to the xdpsock application
      
      Patch 12:
        - Documentation update to include the unaligned chunk scenario. We need
          to explicitly state that the incoming addresses are only masked in the
          aligned chunk mode and not the unaligned chunk mode.
      
      v2:
        - fixed checkpatch issues
        - fixed Rx buffer recycling for unaligned chunks in xdpsock
        - removed unused defines
        - fixed how chunk_size is calculated in xsk_diag.c
        - added some performance numbers to cover letter
        - modified descriptor format to make it easier to retrieve original
          address
        - removed patch adding off_t off to the zero copy allocator. This is no
          longer needed with the new descriptor format.
      
      v3:
        - added patch for mlx5 driver changes needed for unaligned chunks
        - moved offset handling to new helper function
        - changed value used for the umem chunk_mask. Now using the new
          descriptor format to save us doing the calculations in a number of
          places meaning more of the code is left unchanged while adding
          unaligned chunk support.
      
      v4:
        - reworked the next_pg_contig field in the xdp_umem_page struct. We now
          use the low 12 bits of the addr for flags rather than adding an extra
          field in the struct.
        - modified unaligned chunks flag define
        - fixed page_start calculation in __xsk_rcv_memcpy().
        - move offset handling to the xdp_umem_get_* functions
        - modified the len field in xdp_umem_reg struct. We now use 16 bits from
          this for the flags field.
        - fixed headroom addition to handle in the mlx5e driver
        - other minor changes based on review comments
      
      v5:
        - Added ABI versioning in the libbpf patch
        - Removed bitfields in the xdp_umem_reg struct. Adding new flags field.
        - Added accessors for getting addr and offset.
        - Added helper function for adding the offset to the addr.
        - Fixed conflicts with 'bpf-af-xdp-wakeup' which was merged recently.
        - Fixed typo in mlx driver patch.
        - Moved libbpf patch to later in the set (7/11, just before the sample
          app changes)
      
      v6:
        - Added support for XSK frames smaller than page in mlx5e driver (Maxim
          Mikityanskiy <maximmi@mellanox.com>).
        - Fixed offset handling in xsk_generic_rcv.
        - Added check for base address in xskq_is_valid_addr_unaligned.
      ====================
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      bdb15a29
    • doc/af_xdp: include unaligned chunk case · d57f172c
      Kevin Laatz authored
      
      
      With the addition of unaligned chunks mode, the documentation needs to be
      updated to indicate that the incoming addr to the fill ring will only be
      masked if the user application is run in the aligned chunk mode. This patch
      also adds a line to explicitly indicate that the incoming addr will not be
      masked if running the user application in the unaligned chunk mode.
      
      Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      d57f172c
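
The difference between the two modes that this documentation change spells out can be sketched as follows. This is a minimal illustration, not the kernel's code: `fill_addr_as_seen` is a hypothetical helper, and it assumes a power-of-two chunk size in aligned mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel or libbpf function): returns the
 * address as it would be interpreted after being placed on the fill ring.
 */
static inline uint64_t fill_addr_as_seen(uint64_t addr, uint64_t chunk_size,
                                         bool unaligned)
{
    /* Unaligned chunk mode: the incoming address is not masked. */
    if (unaligned)
        return addr;

    /* Aligned chunk mode: assumes chunk_size is a power of two. */
    assert(chunk_size != 0 && (chunk_size & (chunk_size - 1)) == 0);

    /* Round down to the start of the chunk that contains addr. */
    return addr & ~(chunk_size - 1);
}
```

With a 2k chunk size, an address of 2148 would be treated as 2048 in aligned mode and left as 2148 in unaligned mode.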
    • samples/bpf: use hugepages in xdpsock app · 3945b37a
      Kevin Laatz authored
      
      
      This patch modifies xdpsock to use mmap instead of posix_memalign. With
      this change, we can use hugepages when running the application in unaligned
      chunks mode. Using hugepages makes it more likely that we have physically
      contiguous memory, which supports the unaligned chunk mode better.
      
      Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      3945b37a
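
A minimal sketch of allocating a buffer area with mmap, optionally backed by hugepages, might look like the following. It is not the xdpsock code: `umem_area_alloc` is a hypothetical helper, and the use of `MAP_HUGETLB` (a Linux-specific mmap flag) is an assumption about how hugepages would be requested.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* exposes MAP_ANONYMOUS and MAP_HUGETLB on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map an anonymous, private region of `size` bytes.
 * When use_hugepages is non-zero the mapping is requested from the
 * hugepage pool, which only succeeds if hugepages have been reserved and
 * size is a multiple of the hugepage size. Returns NULL on failure.
 */
static void *umem_area_alloc(size_t size, int use_hugepages)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void *area;

    assert(size > 0);         /* mmap rejects zero-length mappings */

    if (use_hugepages)
        flags |= MAP_HUGETLB;

    area = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    return area == MAP_FAILED ? NULL : area;
}
```

A caller would release the region with `munmap(area, size)`. Unlike `posix_memalign`, mmap accepts flags that control the backing pages, which is what makes the hugepage option possible.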
    • samples/bpf: add buffer recycling for unaligned chunks to xdpsock · 03895e63
      Kevin Laatz authored
      
      
      This patch adds buffer recycling support for unaligned buffers. Since we
      don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
      sure we give back the correct (original) addr to the fill queue. We achieve
      this using the new descriptor format and associated masks. The new format
      uses the upper 16-bits for the offset and the lower 48-bits for the addr.
      Since we have a field for the offset, we no longer need to modify the
      actual address. As such, all we have to do to get back the original address
      is mask for the lower 48 bits (i.e. strip the offset and we get the address
      on its own).
      
      Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
      Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      03895e63
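
The address handling described in this commit can be sketched in a few lines of C. Only the split into an upper 16-bit offset and a lower 48-bit address comes from the commit message; the macro and function names below are illustrative and are not taken from the kernel or libbpf headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only the 16/48-bit split comes from the commit text. */
#define DESC_OFFSET_SHIFT 48
#define DESC_ADDR_MASK    ((UINT64_C(1) << DESC_OFFSET_SHIFT) - 1)

/* Pack a base address (lower 48 bits) and an offset (upper 16 bits). */
static inline uint64_t desc_pack(uint64_t base, uint64_t offset)
{
    assert(base <= DESC_ADDR_MASK);
    assert(offset <= UINT16_MAX);
    return (offset << DESC_OFFSET_SHIFT) | base;
}

/* Original address: strip the offset by masking to the lower 48 bits. */
static inline uint64_t desc_extract_addr(uint64_t desc)
{
    return desc & DESC_ADDR_MASK;
}

/* Offset (headroom etc.) stored in the upper 16 bits. */
static inline uint64_t desc_extract_offset(uint64_t desc)
{
    return desc >> DESC_OFFSET_SHIFT;
}

/* Address of the packet data: base address plus offset. */
static inline uint64_t desc_data_addr(uint64_t desc)
{
    return desc_extract_addr(desc) + desc_extract_offset(desc);
}
```

When recycling a buffer, the value handed back to the fill queue would be `desc_extract_addr(desc)`, since the base address is never modified.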
    • samples/bpf: add unaligned chunks mode support to xdpsock · c543f546
      Kevin Laatz authored
      
      
      This patch adds support for the unaligned chunks mode. The addition of the
      unaligned chunks option will allow users to run the application with more
      relaxed chunk placement in the XDP umem.
      
      Unaligned chunks mode can be used with the '-u' or '--unaligned' command
      line options.
      
      Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
      Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      c543f546
    • libbpf: add flags to umem config · 10d30e30
      Kevin Laatz authored
      
      
      This patch adds a 'flags' field to the umem_config and umem_reg structs.
      This will allow for more options to be added for configuring umems.
      
      The first use for the flags field is to add a flag for unaligned chunks
      mode. These flags can either be user-provided or filled with a default.
      
      Since we change the size of the xsk_umem_config struct, we need to version
      the ABI. This patch includes the ABI versioning for xsk_umem__create. The
      Makefile was also updated to handle multiple function versions in
      check-abi.
      
      Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
      Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      10d30e30
    • net/mlx5e: Allow XSK frames smaller than a page · 282c0c79
      Maxim Mikityanskiy authored
      
      
      Relax the requirements to the XSK frame size to allow it to be smaller
      than a page and even not a power of two. The current implementation can
      work in this mode, both with Striding RQ and without it.
      
      The code that checks `mtu + headroom <= XSK frame size` is modified
      accordingly. Any frame size between 2048 and PAGE_SIZE is accepted.
      
      Functions that worked with pages only now work with XSK frames, even if
      their size is different from PAGE_SIZE.
      
      With XSK queues, regardless of the frame size, Striding RQ uses the
      stride size of PAGE_SIZE, and UMR MTTs are posted using starting
      addresses of frames, but PAGE_SIZE as page size. MTU guarantees that no
      packet data will overlap with other frames. UMR MTT size is made equal
      to the stride size of the RQ, because UMEM frames may come in random
      order, and we need to handle them one by one. PAGE_SIZE is just a power
      of two that is bigger than any allowed XSK frame size, and also it
      doesn't require making additional changes to the code.
      
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      282c0c79
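
The frame-size rule stated in this commit (any size between 2048 and PAGE_SIZE, with `mtu + headroom <= XSK frame size`) can be expressed as a small predicate. This is a sketch rather than the driver's code: `xsk_frame_fits` and the `SKETCH_*` constants are hypothetical, and a 4 KiB page size is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */
#define SKETCH_MIN_FRAME 2048u   /* lower bound named in the commit */

/*
 * Hypothetical check: the frame size must lie in [2048, PAGE_SIZE] and
 * must be large enough to hold the headroom plus an MTU-sized packet.
 * The frame size does not need to be a power of two.
 */
static inline bool xsk_frame_fits(uint32_t mtu, uint32_t headroom,
                                  uint32_t frame_size)
{
    if (frame_size < SKETCH_MIN_FRAME || frame_size > SKETCH_PAGE_SIZE)
        return false;

    /* Widen before adding so the sum cannot wrap around. */
    return (uint64_t)mtu + headroom <= frame_size;
}
```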
    • mlx5e: modify driver for handling offsets · beb3e4b2
      Kevin Laatz authored
      
      
      With the addition of the unaligned chunks option, we need to make sure we
      handle the offsets accordingly based on the mode we are currently running
      in. This patch modifies the driver to appropriately mask the address for
      each case.
      
      Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      beb3e4b2
    • ixgbe: modify driver for handling offsets · d8c3061e
      Kevin Laatz authored
      
      
      With the addition of the unaligned chunks option, we need to make sure we
      handle the offsets accordingly based on the mode we are currently running
      in. This patch modifies the driver to appropriately mask the address for
      each case.
      
      Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      d8c3061e
    • i40e: modify driver for handling offsets · 2f86c806
      Kevin Laatz authored
      
      
      With the addition of the unaligned chunks option, we need to make sure we
      handle the offsets accordingly based on the mode we are currently running
      in. This patch modifies the driver to appropriately mask the address for
      each case.
      
      Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
      Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      2f86c806
    • xsk: add support to allow unaligned chunk placement · c05cd364
      Kevin Laatz authored
      
      
      Currently, addresses are chunk size aligned. This means we are very
      restricted in terms of where we can place chunks within the umem. For
      example, if we have a chunk size of 2k, then our chunks can only be placed
      at 0, 2k, 4k, 6k, 8k... and so on (i.e. every 2k starting from 0).
      
      This patch introduces the ability to use unaligned chunks. With these
      changes, we are no longer bound to having to place chunks at a 2k (or
      whatever your chunk size is) interval. Since we are no longer dealing with
      aligned chunks, they can now cross page boundaries. Checks for page
      contiguity have been added in order to keep track of which pages are
      followed by a physically contiguous page.
      
      Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
      Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
      Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      c05cd364
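
To see why unaligned placement brings page boundaries into play, consider a simple check for whether a chunk spans more than one page. This is an illustrative sketch only: `chunk_crosses_page` is a hypothetical helper, a 4 KiB page size is assumed, and it does not show how the kernel tracks physical contiguity.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096)   /* assumption: 4 KiB pages */

/* True when the byte range [addr, addr + len) touches more than one page. */
static inline bool chunk_crosses_page(uint64_t addr, uint64_t len)
{
    if (len == 0)
        return false;

    /* Compare the page index of the first and the last byte. */
    return (addr / SKETCH_PAGE_SIZE) != ((addr + len - 1) / SKETCH_PAGE_SIZE);
}
```

A 2k chunk at an aligned address such as 0 or 2048 never crosses a page, while the same chunk placed at address 3000 does. In the second case the data is only usable as one buffer if the next page is physically contiguous, which is what the added contiguity checks keep track of.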
    • ixgbe: simplify Rx buffer recycle · b35a2d3e
      Kevin Laatz authored
      
      
      Currently, the dma, addr and handle are modified when we reuse Rx buffers
      in zero-copy mode. However, this is not required as the inputs to the
      function are copies, not the original values themselves. As we use the
      copies within the function, we can use the original 'obi' values
      directly without having to mask and add the headroom.
      
      Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      b35a2d3e
    • i40e: simplify Rx buffer recycle · 10912fc9
      Kevin Laatz authored
      
      
      Currently, the dma, addr and handle are modified when we reuse Rx buffers
      in zero-copy mode. However, this is not required as the inputs to the
      function are copies, not the original values themselves. As we use the
      copies within the function, we can use the original 'old_bi' values
      directly without having to mask and add the headroom.
      
      Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      10912fc9
    • selftests/bpf: Fix a typo in test_offload.py · 1c6d6e02
      Masanari Iida authored
      
      
      This patch fixes a spelling typo in test_offload.py.
      
      Signed-off-by: Masanari Iida <standby24x7@gmail.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      1c6d6e02
    • bpf: fix error check in bpf_tcp_gen_syncookie · 0741be35
      Petar Penkov authored
      If a SYN cookie is not issued by tcp_v#_gen_syncookie, then the return
      value will be exactly 0, rather than <= 0. Let's change the check to
      reflect that, especially since mss is an unsigned value and cannot be
      negative.
      
      Fixes: 70d66244 ("bpf: add bpf_tcp_gen_syncookie helper")
      Reported-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Petar Penkov <ppenkov@google.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      0741be35
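
The reasoning behind this fix can be illustrated with a small, self-contained example. Everything here is hypothetical: `gen_cookie_stub` stands in for a cookie generator that returns an MSS value or 0, and the `-ENOENT` error code is chosen only for illustration.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a cookie generator: returns a non-zero MSS
 * when a cookie is issued and exactly 0 when it is not.
 */
static uint32_t gen_cookie_stub(int issue)
{
    return issue ? 1460u : 0u;
}

static int64_t handle_cookie(int issue)
{
    uint32_t mss = gen_cookie_stub(issue);

    /*
     * mss is unsigned, so it can never be negative: a test written as
     * "mss <= 0" can only ever match mss == 0. Comparing against 0
     * directly states the actual failure condition.
     */
    if (mss == 0)
        return -ENOENT;   /* error code is an assumption for this sketch */

    return mss;
}
```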
    • Merge branch 'bpf-nfp-map-op-cache' · 736a5530
      Daniel Borkmann authored
      
      
      Jakub Kicinski says:
      
      ====================
      This set adds a small batching and cache mechanism to the driver.
      Map dumps require two operations per element - get next, and
      lookup. Each of those needs a round trip to the device, and on
      a loaded system scheduling out and in of the dumping process.
      This set makes the driver request a number of entries at the same
      time, and if no operation which would modify the map happens
      from the host side those entries are used to serve lookup
      requests for up to 250us, at which point they are considered
      stale.
      
      This set has been measured to provide almost 4x dumping speed
      improvement, Jaco says:
      
      OLD dump times
          500 000 elements: 26.1s
        1 000 000 elements: 54.5s
      
      NEW dump times
          500 000 elements: 7.6s
        1 000 000 elements: 16.5s
      ====================
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      736a5530
    • nfp: bpf: add simple map op cache · f24e2909
      Jakub Kicinski authored
      
      
      Each get_next and lookup call requires a round trip to the device.
      However, the device is capable of giving us a few entries back,
      instead of just one.
      
      In this patch we ask for a small yet reasonable number of entries
      (4) on every get_next call, and on subsequent get_next/lookup calls
      check this little cache for a hit. The cache is only kept for 250us,
      and is invalidated on every operation which may modify the map
      (e.g. delete or update call). Note that operations may be performed
      simultaneously, so we have to keep track of operations in flight.
      
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      f24e2909
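
The caching scheme described in this commit (a batch of 4 entries, valid for 250us, dropped on any modifying operation) can be sketched as below. This is not the nfp driver code: all type and function names are hypothetical, keys and values are simplified to 32-bit integers, the caller supplies timestamps in microseconds, and the tracking of operations in flight mentioned in the commit is left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CACHE_ENTRIES 4     /* batch size mentioned in the commit */
#define CACHE_TTL_US  250   /* validity window mentioned in the commit */

struct cache_entry {
    uint32_t key;
    uint32_t value;
};

struct op_cache {
    struct cache_entry entries[CACHE_ENTRIES];
    unsigned int count;
    uint64_t filled_at_us;
    bool valid;
};

/* Store a batch of entries returned by the device. */
static void cache_fill(struct op_cache *c, const struct cache_entry *batch,
                       unsigned int n, uint64_t now_us)
{
    if (n > CACHE_ENTRIES)
        n = CACHE_ENTRIES;
    memcpy(c->entries, batch, n * sizeof(*batch));
    c->count = n;
    c->filled_at_us = now_us;
    c->valid = true;
}

/* Called on any operation that may modify the map (update, delete). */
static void cache_invalidate(struct op_cache *c)
{
    c->valid = false;
}

/* Returns true on a hit; stale or invalidated caches always miss. */
static bool cache_lookup(const struct op_cache *c, uint32_t key,
                         uint64_t now_us, uint32_t *value)
{
    unsigned int i;

    if (!c->valid || now_us - c->filled_at_us > CACHE_TTL_US)
        return false;

    for (i = 0; i < c->count; i++) {
        if (c->entries[i].key == key) {
            *value = c->entries[i].value;
            return true;
        }
    }
    return false;
}
```

On a miss the caller would fall back to a round trip to the device and refill the cache from the reply.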
    • nfp: bpf: rework MTU checking · bc2796db
      Jakub Kicinski authored
      
      
      If control channel MTU is too low to support map operations a warning
      will be printed. This is not enough; we want to make sure probe fails
      in such a scenario, as this would clearly be a faulty configuration.
      
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      bc2796db
    • Merge branch 'bpf-bpftool-build-improvements' · c5a2c734
      Daniel Borkmann authored
      
      
      Quentin Monnet says:
      
      ====================
      This set attempts to make it easier to build bpftool, in particular when
      passing a specific output directory. This is a follow-up to the
      conversation held last month by Lorenz, Ilya and Jakub [0].
      
      The first patch is a minor fix to bpftool's Makefile, regarding the
      retrieval of kernel version (which currently prints a non-relevant make
      warning on some invocations).
      
      The second patch improves the Makefile commands to support more "make"
      invocations, or to fix building with custom output directory. On Jakub's
      suggestion, a script is also added to BPF selftests in order to keep track
      of the supported build variants.
      
      Building bpftool with "make tools/bpf" from the top of the repository
      generates files in "libbpf/" and "feature/" directories under tools/bpf/
      and tools/bpf/bpftool/. The third patch ensures such directories are taken
      care of on "make clean", and adds them to the relevant .gitignore files.
      
      Finally, the fourth patch is a slightly modified version of Ilya's fix
      regarding libbpf.a appearing twice on the linking command for bpftool.
      
      [0] https://lore.kernel.org/bpf/CACAyw9-CWRHVH3TJ=Tke2x8YiLsH47sLCijdp=V+5M836R9aAA@mail.gmail.com/
      
      v2:
      - Return error from check script if one of the make invocations returns
        non-zero (even if binary is successfully produced).
      - Run "make clean" from bpf/ and not only bpf/bpftool/ in that same script,
        when relevant.
      - Add a patch to clean up generated "feature/" and "libbpf/" directories.
      ====================
      
      Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Lorenz Bauer <lmb@cloudflare.com>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      c5a2c734
    • tools: bpftool: do not link twice against libbpf.a in Makefile · 5b84ad2e
      Quentin Monnet authored
      
      
      In bpftool's Makefile, $(LIBS) includes $(LIBBPF), therefore the library
      is used twice in the linking command. No need to have $(LIBBPF) (from
      $^) on that command, let's do with "$(OBJS) $(LIBS)" (but move $(LIBBPF)
      _before_ the -l flags in $(LIBS)).
      
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      5b84ad2e
    • tools: bpf: account for generated feature/ and libbpf/ directories · fbdb620b
      Quentin Monnet authored
      
      
      When building "tools/bpf" from the top of the Linux repository, the
      build system passes a value for the $(OUTPUT) Makefile variable to
      tools/bpf/Makefile and tools/bpf/bpftool/Makefile, which results in
      generating "libbpf/" (for bpftool) and "feature/" (bpf and bpftool)
      directories inside the tree.
      
      This commit adds such directories to the relevant .gitignore files, and
      edits the Makefiles to ensure they are removed on "make clean". The use
      of "rm" is also made consistent throughout those Makefiles (relies on
      the $(RM) variable, uses "--" to prevent interpreting
      $(OUTPUT)/$(DESTDIR) as options).
      
      v2:
      - New patch.
      
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      fbdb620b
    • tools: bpftool: improve and check builds for different make invocations · 45c5589d
      Quentin Monnet authored
      
      
      There are a number of alternative "make" invocations that can be used to
      compile bpftool. The following invocations are expected to work:
      
        - through the kbuild system, from the top of the repository
          (make tools/bpf)
        - by telling make to change to the bpftool directory
          (make -C tools/bpf/bpftool)
        - by building the BPF tools from tools/
          (cd tools && make bpf)
        - by running make from bpftool directory
          (cd tools/bpf/bpftool && make)
      
      Additionally, setting the O or OUTPUT variables should tell the build
      system to use a custom output path, for each of these alternatives.
      
      This patch fixes the following invocations:
      
        $ make tools/bpf
        $ make tools/bpf O=<dir>
        $ make -C tools/bpf/bpftool OUTPUT=<dir>
        $ make -C tools/bpf/bpftool O=<dir>
        $ cd tools/ && make bpf O=<dir>
        $ cd tools/bpf/bpftool && make OUTPUT=<dir>
        $ cd tools/bpf/bpftool && make O=<dir>
      
      After this commit, the build still fails for two variants when passing
      the OUTPUT variable:
      
        $ make tools/bpf OUTPUT=<dir>
        $ cd tools/ && make bpf OUTPUT=<dir>
      
      In order to remember and check what make invocations are supposed to
      work, and to document the ones which do not, a new script is added to
      the BPF selftests. Note that some invocations require the kernel to be
      configured, so the script skips them if no .config file is found.
      
      v2:
      - In make_and_clean(), set $ERROR to 1 when "make" returns non-zero,
        even if the binary was produced.
      - Run "make clean" from the correct directory (bpf/ instead of bpftool/,
        when relevant).
      
      Reported-by: Lorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      45c5589d
    • tools: bpftool: ignore make built-in rules for getting kernel version · e0a43aa3
      Quentin Monnet authored
      
      
      Bpftool calls the toplevel Makefile to get the kernel version for the
      sources it is built from. But when the utility is built from the top of
      the kernel repository, it may dump the following error message for
      certain architectures (including x86):
      
          $ make tools/bpf
          [...]
          make[3]: *** [checkbin] Error 1
          [...]
      
      This does not prevent bpftool compilation, but may feel disconcerting.
      The "checkbin" arch-dependent target is not supposed to be called for
      target "kernelversion", which is a simple "echo" of the version number.
      
      It turns out this is caused by the make invocation in tools/bpf/bpftool,
      which attempts to find implicit rules to apply. Extract from debug
      output:
      
          Reading makefiles...
          Reading makefile 'Makefile'...
          Reading makefile 'scripts/Kbuild.include' (search path) (no ~ expansion)...
          Reading makefile 'scripts/subarch.include' (search path) (no ~ expansion)...
          Reading makefile 'arch/x86/Makefile' (search path) (no ~ expansion)...
          Reading makefile 'scripts/Makefile.kcov' (search path) (no ~ expansion)...
          Reading makefile 'scripts/Makefile.gcc-plugins' (search path) (no ~ expansion)...
          Reading makefile 'scripts/Makefile.kasan' (search path) (no ~ expansion)...
          Reading makefile 'scripts/Makefile.extrawarn' (search path) (no ~ expansion)...
          Reading makefile 'scripts/Makefile.ubsan' (search path) (no ~ expansion)...
          Updating makefiles....
           Considering target file 'scripts/Makefile.ubsan'.
            Looking for an implicit rule for 'scripts/Makefile.ubsan'.
            Trying pattern rule with stem 'Makefile.ubsan'.
          [...]
            Trying pattern rule with stem 'Makefile.ubsan'.
            Trying implicit prerequisite 'scripts/Makefile.ubsan.o'.
            Looking for a rule with intermediate file 'scripts/Makefile.ubsan.o'.
             Avoiding implicit rule recursion.
             Trying pattern rule with stem 'Makefile.ubsan'.
             Trying rule prerequisite 'prepare'.
             Trying rule prerequisite 'FORCE'.
            Found an implicit rule for 'scripts/Makefile.ubsan'.
              Considering target file 'prepare'.
               File 'prepare' does not exist.
                Considering target file 'prepare0'.
                 File 'prepare0' does not exist.
                  Considering target file 'archprepare'.
                   File 'archprepare' does not exist.
                    Considering target file 'archheaders'.
                     File 'archheaders' does not exist.
                     Finished prerequisites of target file 'archheaders'.
                    Must remake target 'archheaders'.
          Putting child 0x55976f4f6980 (archheaders) PID 31743 on the chain.
      
      To avoid that, pass the -r and -R flags to eliminate the use of make
      built-in rules (and while at it, built-in variables) when running
      command "make kernelversion" from bpftool's Makefile.
      
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      e0a43aa3
    • bpf: s390: add JIT support for multi-function programs · 1c8f9b91
      Yauheni Kaliuta authored
      This adds support for bpf-to-bpf function calls in the s390 JIT
      compiler. The JIT compiler converts the bpf call instructions to
      native branch instructions. After a round of the usual passes, the
      start addresses of the JITed images for the callee functions are
      known. Finally, to fixup the branch target addresses, we need to
      perform an extra pass.
      
      Because of the address range in which JITed images are allocated on
      s390, the offsets of the start addresses of these images from
      __bpf_call_base are as large as 64 bits. So, for a function call,
      the imm field of the instruction cannot be used to determine the
      callee's address. Use bpf_jit_get_func_addr() helper instead.
      
      The patch borrows a lot from:
      
      commit 8c11ea5c ("bpf, arm64: fix getting subprog addr from aux
      for calls")
      
      commit e2c95a61 ("bpf, ppc64: generalize fetching subprog into
      bpf_jit_get_func_addr")
      
      commit 8484ce83 ("bpf: powerpc64: add JIT support for
      multi-function programs")
      
      (including the commit message).
      
      test_verifier (5.3-rc6 with CONFIG_BPF_JIT_ALWAYS_ON=y):
      
      without patch:
      Summary: 1501 PASSED, 0 SKIPPED, 47 FAILED
      
      with patch:
      Summary: 1540 PASSED, 0 SKIPPED, 8 FAILED
      
      Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
      Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      1c8f9b91
  5. Aug 28, 2019