  1. May 24, 2018
    • nfp: assign vNIC id as phys_port_name of vNICs which are not ports · 51c1df83
      Jakub Kicinski authored
      
      
      When NFP is modelled as a switch we assign phys_port_name to respective
      port (representor)s:
      
       vNIC0 - | - PF port (pf%d)     MAC/PHY (p%d[s%d]) - |E==
      
      In most cases there is only one vNIC for communication with the switch.
      If there is more than one, we need to be able to tell them apart.  Use
      the vNIC id, formatted as %d, as phys_port_name of the vNICs.
      
      We don't have to pass ID to nfp_net_debugfs_vnic_add() separately any
      more.
      
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: use split in naming of PCIe PF ports · 290f54db
      Jakub Kicinski authored
      
      
      PCI PFs can host more than one logical endpoint.  In NFP terms
      this means having more than one vNIC per PCIe PF.  The vNICs
      usually correspond 1:1 to Ethernet ports.  In core NIC
      we use the legacy idea of vNIC *being* the Ethernet port,
      hence netdevs put pX(sY) in their phys_port_name, like Ethernet
      ports would.  When ASIC ports are fully represented we need to
      be able to name different PCIe PF ports, too.  Use a scheme
      similar to Ethernet ports - pfXsY, for PCIe PF number X,
      sub-port Y.
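
The naming scheme described above can be illustrated with a small user-space C sketch. The type and function names (`port_desc`, `format_phys_port_name`) are hypothetical and are not the driver's code; only the `pX[sY]` and `pfX[sY]` formats are taken from the commit messages.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical port kinds, for illustration only. */
enum port_kind { PORT_ETH, PORT_PCIE_PF };

/* Hypothetical description of a port; not a driver structure. */
struct port_desc {
	enum port_kind kind;
	unsigned int id;       /* Ethernet port label or PCIe PF number (X) */
	bool is_split;         /* true if the port has sub-ports */
	unsigned int split_id; /* sub-port number (Y) */
};

/* Format the name into buf; return 0 on success, -1 if it does not fit. */
int format_phys_port_name(const struct port_desc *p, char *buf, size_t len)
{
	const char *prefix = p->kind == PORT_PCIE_PF ? "pf" : "p";
	int n;

	if (p->is_split)
		n = snprintf(buf, len, "%s%us%u", prefix, p->id, p->split_id);
	else
		n = snprintf(buf, len, "%s%u", prefix, p->id);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With these assumptions, PCIe PF 0 sub-port 1 is named `pf0s1`, and an unsplit Ethernet port 2 is named `p2`.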
      
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: abm: force Ethernet port up · 1f700367
      Jakub Kicinski authored
      
      
      Current control firmware does not cater too well to multi-host
      applications.  There is no way to check which hosts are up or
      otherwise negotiate what the state of the external port (the
      Ethernet port) should be.  Make sure the link is up when driver
      loads, and don't take it down when Ethernet port netdev is
      closed.
      
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: abm: spawn port netdevs · d05d902e
      Jakub Kicinski authored
      
      
      To configure buffering points we need a full set of netdevs:
      
                                    ASIC
      
       user netdev  -- | -- PCIe port   MAC port -- | --
      
      Configuring egress qdiscs on the user netdev configures standard
      Linux TC software qdiscs, while configuring PCIe port qdiscs will
      provide a way of setting ASIC queuing parameters for the PCIe block.
      MAC port netdev egress qdiscs correspond to ASIC MAC Traffic
      Manager block.
      
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: add devlink_eswitch_mode_set callback · 4afa3af4
      Jakub Kicinski authored
      
      
      All our previous apps assumed a single eswitch mode (legacy
      or switchdev) without the ability to change it.  ABM NIC will
      want to support the switch so plumb devlink_eswitch_mode_set through.
      The devlink_eswitch_mode_set is expected to spawn representors and
      potentially devlink ports so it's called under big devlink lock and
      pf->lock.
      
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: don't take instance lock around eswitch mode set · 7ac1cc9a
      Jakub Kicinski authored
      
      
      Changing switch mode may want to register and unregister devlink
      ports.  Therefore similarly to DEVLINK_CMD_PORT_SPLIT/UNSPLIT it
      should not take the instance lock.  Drivers don't depend on existing
      locking since it's a very recent addition.
      
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: add app pointer to port representors · 634c6b7a
      Jakub Kicinski authored
      
      
      nfp_apps can currently associate their structures with vNICs but
      not representors.  Add app priv pointer to representors as well.
      
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: abm: create project-specific vNIC structure · cc54dc28
      Jakub Kicinski authored
      
      
      ABM NIC requires more complex vNIC handling, allocate
      per-vNIC structure.  Find out RX queue base and PCI PF id.
      There will be multiple PFs sharing the same MAC port, therefore
      the MAC address assigned to the vNIC must be looked up in the
      HWInfo database.
      
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: abm: add initial active buffer management NIC skeleton · c4c8f39a
      Jakub Kicinski authored
      
      
      Add very rudimentary active buffer management NIC support.
      For now it's like a core NIC without SR-IOV support.  Next
      commits will extend its functionality.
      
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: core: allow 4-byte aligned accesses to Memory Units · b586c77b
      Jakub Kicinski authored
      
      
      Current code doesn't enforce length requirements on 32bit accesses
      with action NFP_CPP_ACTION_RW to memory units, but if the access
      is only aligned to 4 bytes as well we will fall into the explicit
      access case and error out.  Such accesses are correct, allow them
      by lowering the width earlier.
      
      While at it use a switch statement to improve readability.
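
The ordering issue described above (lower the width first, check alignment afterwards) can be sketched roughly as follows. This is a simplified user-space illustration, not the driver's code: the enum, the function name `classify_access` and the exact alignment conditions are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of classifying one access. */
enum access_path { ACCESS_DIRECT_32, ACCESS_DIRECT_64, ACCESS_EXPLICIT };

/*
 * Sketch: decide how to perform an access whose natural width is
 * `width` bytes (4 or 8).  `mu_rw` is true for Memory Unit accesses
 * that use the plain read/write action.
 */
enum access_path classify_access(uint64_t offset, uint64_t length,
				 unsigned int width, bool mu_rw)
{
	if (width != 4 && width != 8)
		return ACCESS_EXPLICIT;

	/* Lower the width first when the access is only 4-byte aligned. */
	if (mu_rw && (offset % 8 == 4 || length % 8 == 4))
		width = 4;

	/* Whatever is still misaligned for the chosen width goes explicit. */
	if (offset % width || length % width)
		return ACCESS_EXPLICIT;

	switch (width) {
	case 4:
		return ACCESS_DIRECT_32;
	case 8:
		return ACCESS_DIRECT_64;
	default:
		return ACCESS_EXPLICIT;
	}
}
```

If the alignment check ran before the width was lowered, a 4-byte access at offset 4 would be sent down the explicit path, which is the case the commit message says errors out.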
      
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: add shared buffer configuration · a0d163f4
      Jakub Kicinski authored
      
      
      Allow app FW to advertise its shared buffer pool information.
      Use the per-PF mailbox to configure them from devlink.
      
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: add support for per-PCI PF mailbox · 0c693323
      Jakub Kicinski authored
      
      
      When working with devlink-related functionality it is easier, for
      locking reasons, to create a new mailbox per PCI PF device than to
      try to use one of the netdev/vNIC mailboxes.
      
      Define new mailbox structure and resolve its symbol during probe.
      For forward compatibility allow silent truncation of mailbox command
      data.
      
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: move rtsym helpers to pf code · 8f6196f6
      Jakub Kicinski authored
      
      
      nfp_net_pf_rtsym_read_optional() and nfp_net_pf_map_rtsym() are not
      really related to networking code.  Move them to the PF code and
      remove the net from their names.  They will soon be needed by code
      outside of nfp_net_main.c anyway.
      
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bpfilter' · e95a5f54
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      bpfilter
      
      v2->v3:
      - followed Luis's suggestion and significantly simplified first patch
        with shmem_kernel_file_setup+kernel_write. Added kdoc for new helper
      - fixed typos and race to access pipes with mutex
      - tested with bpfilter being 'builtin'. CONFIG_BPFILTER_UMH=y|m both work.
        Interesting to see a usermode executable being embedded inside vmlinux.
      - it doesn't hurt to enable bpfilter in .config.
        ip_setsockopt commands are sent to usermode via pipes and -ENOPROTOOPT is
        returned from userspace, so kernel falls back to original iptables code
      
      v1->v2:
      this patch set is almost a full rewrite of the earlier umh modules approach
      The v1 of patches and follow up discussion was covered by LWN:
      https://lwn.net/Articles/749108/
      
      
      
      I believe the v2 addresses all issues brought up by Andy and others.
      Mainly there are zero changes to kernel/module.c
      Instead of teaching module loading logic to recognize special
      umh module, let normal kernel modules execute part of their own
      .init.rodata as a new user space process (Andy's idea)
      Patch 1 introduces this new helper:
      int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
      Input:
        data + len == executable file
      Output:
        struct umh_info {
             struct file *pipe_to_umh;
             struct file *pipe_from_umh;
             pid_t pid;
        };
      
      Advantages vs v1:
      - the embedded user mode executable is stored as .init.rodata inside
        normal kernel module. These pages are freed when .ko finishes loading
      - the elf file is copied into tmpfs file. The user mode process is swappable.
      - the communication between user mode process and 'parent' kernel module
        is done via two unix pipes, hence protocol is not exposed to
        user space
      - impossible to launch umh on its own (that was the main issue of v1)
        and impossible to be man-in-the-middle due to pipes
      - bpfilter.ko consists of tiny kernel part that passes the data
        between kernel and umh via pipes and much bigger umh part that
        does all the work
      - 'lsmod' shows bpfilter.ko as usual.
        'rmmod bpfilter' removes kernel module and kills corresponding umh
      - signed bpfilter.ko covers the whole image including umh code
      
      Few issues:
      - the user can still attach to the process and debug it with
        'gdb /proc/pid/exe pid', but 'gdb -p pid' doesn't work.
        (a bit worse compared to v1)
      - tinyconfig will notice a small increase in .text
        +766 | TEXT | 7c8b94806bec umh: introduce fork_usermode_blob() helper
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add skeleton of bpfilter kernel module · d2ba09c1
      Alexei Starovoitov authored
      
      
      bpfilter.ko consists of bpfilter_kern.c (normal kernel module code)
      and user mode helper code that is embedded into bpfilter.ko
      
      The steps to build bpfilter.ko are the following:
      - main.c is compiled by HOSTCC into the bpfilter_umh elf executable file
      - with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file
        is converted into bpfilter_umh.o object file
        with _binary_net_bpfilter_bpfilter_umh_start and _end symbols
        Example:
        $ nm ./bld_x64/net/bpfilter/bpfilter_umh.o
        0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end
        0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size
        0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start
      - bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko
      
      bpfilter_kern.c is normal kernel module code that calls
      the fork_usermode_blob() helper to execute part of its own data
      as a user mode process.
      
      Notice that _binary_net_bpfilter_bpfilter_umh_start - end
      is placed into .init.rodata section, so it's freed as soon as __init
      function of bpfilter.ko is finished.
      As part of __init, bpfilter.ko performs a first request/reply exchange
      via the two unix pipes provided by the fork_usermode_blob() helper to
      make sure that umh is healthy. If not, it will kill it via pid.
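
The health check described above can be approximated in user space. The sketch below is only an analogy and is not bpfilter code: it forks a stand-in helper, sends one request byte over a pipe, waits for the reply with a timeout, and kills the helper by pid if no valid reply arrives. All names (`struct helper`, `start_checked_helper`, `stop_helper`) and the one-byte protocol are hypothetical.

```c
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handle: two pipe ends plus the pid of the helper. */
struct helper {
	int to_helper;
	int from_helper;
	pid_t pid;
};

/* Stand-in helper: echo each request byte back if "healthy". */
static void helper_loop(int in, int out, int healthy)
{
	char c;

	while (read(in, &c, 1) == 1) {
		if (healthy && write(out, &c, 1) != 1)
			break;
	}
	_exit(0);
}

/* Kill and reap the helper, then release the pipe ends. */
void stop_helper(struct helper *h)
{
	kill(h->pid, SIGKILL);
	waitpid(h->pid, NULL, 0);
	close(h->to_helper);
	close(h->from_helper);
}

/* Return 0 if the helper answered the first request, -1 otherwise. */
int start_checked_helper(struct helper *h, int healthy, int timeout_ms)
{
	int to[2], from[2];
	char req = 'P', rsp = 0;
	struct pollfd pfd;

	if (pipe(to))
		return -1;
	if (pipe(from)) {
		close(to[0]);
		close(to[1]);
		return -1;
	}

	h->pid = fork();
	if (h->pid < 0) {
		close(to[0]);
		close(to[1]);
		close(from[0]);
		close(from[1]);
		return -1;
	}
	if (h->pid == 0) {
		/* Child: keep only its own pipe ends. */
		close(to[1]);
		close(from[0]);
		helper_loop(to[0], from[1], healthy);
	}

	/* Parent: keep the opposite ends. */
	close(to[0]);
	close(from[1]);
	h->to_helper = to[1];
	h->from_helper = from[0];

	/* First request/reply exchange; any failure means "not healthy". */
	pfd.fd = h->from_helper;
	pfd.events = POLLIN;
	if (write(h->to_helper, &req, 1) == 1 &&
	    poll(&pfd, 1, timeout_ms) == 1 &&
	    read(h->from_helper, &rsp, 1) == 1 && rsp == req)
		return 0;

	stop_helper(h);
	return -1;
}
```

A caller that gets 0 back owns the helper and is expected to call `stop_helper()` when it is done, which mirrors the module killing umh from its exit path.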
      
      Later bpfilter_process_sockopt() will be called from bpfilter hooks
      in get/setsockopt() to pass iptables commands into umh via bpfilter.ko
      
      If admin does 'rmmod bpfilter' the __exit code of bpfilter.ko will
      kill umh as well.
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • umh: introduce fork_usermode_blob() helper · 449325b5
      Alexei Starovoitov authored
      
      
      Introduce helper:
      int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
      struct umh_info {
             struct file *pipe_to_umh;
             struct file *pipe_from_umh;
             pid_t pid;
      };
      
      that GPLed kernel modules (signed or unsigned) can use to execute part
      of their own data as a swappable user mode process.
      
      The kernel will do:
      - allocate a unique file in tmpfs
      - populate that file with [data, data + len) bytes
      - user-mode-helper code will do_execve that file and, before the process
        starts, the kernel will create two unix pipes for bidirectional
        communication between kernel module and umh
      - close tmpfs file, effectively deleting it
      - the fork_usermode_blob will return zero on success and populate
        'struct umh_info' with two unix pipes and the pid of the user process
      
      As the first step in the development of the bpfilter project
      the fork_usermode_blob() helper is introduced to allow user mode code
      to be invoked from a kernel module. The idea is that user mode code plus
      normal kernel module code are built as part of the kernel build
      and installed as a traditional kernel module into a distro-specified location,
      such that from a distribution point of view, there is
      no difference between regular kernel modules and kernel modules + umh code.
      Such modules can be signed, modprobed, rmmod, etc. The use of this new helper
      by a kernel module doesn't make it special in any way from the kernel and
      user space tooling point of view.
      
      Such an approach enables the kernel to delegate functionality traditionally done
      by kernel modules to user space processes (either root or !root) and
      reduces the security attack surface of the new code. Buggy umh code would crash
      the user process, but not the kernel. Another advantage is that umh code
      of the kernel module can be debugged and tested out of user space
      (e.g. opening the possibility to run clang sanitizers, fuzzers or
      user space test suites on the umh code).
      In case of the bpfilter project such architecture allows complex control plane
      to be done in the user space while bpf based data plane stays in the kernel.
      
      Since umh can crash, be oom-killed by the kernel, or be killed by the admin,
      the kernel module that uses it (like bpfilter) needs to manage the
      lifetime of umh on its own via two unix pipes and the pid of umh.
      
      The exit code of such kernel module should kill the umh it started,
      so that rmmod of the kernel module will cleanup the corresponding umh.
      Just like if the kernel module does kmalloc() it should kfree() it
      in the exit code.
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. May 23, 2018