  1. Aug 04, 2017
  2. Aug 03, 2017
    • Merge branch 'per-nexthop-offload' · 5a4d148f
      David S. Miller authored
      
      
      Jiri Pirko says:
      
      ====================
      ipv4: fib: Provide per-nexthop offload indication
      
      Ido says:
      
      Offload indication for IPv4 routes is currently set in the FIB info's
      flags. When multipath routes are employed, this can lead to a route being
      marked as offloaded although only one of its nexthops is actually
      offloaded.
      
      Instead, this patchset aims to provide a higher resolution for the offload
      indication and report it on a per-nexthop basis.
      
      Example output from patched iproute:
      
      $ ip route show 192.168.200.0/24
      192.168.200.0/24
              nexthop via 192.168.100.2 dev enp3s0np7 weight 1 offload
              nexthop via 192.168.101.3 dev enp3s0np8 weight 1
      
      And once the second gateway is resolved:
      
      $ ip route show 192.168.200.0/24
      192.168.200.0/24
              nexthop via 192.168.100.2 dev enp3s0np7 weight 1 offload
              nexthop via 192.168.101.3 dev enp3s0np8 weight 1 offload
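The difference between the two resolutions can be modeled in a few lines. This is a minimal sketch, not kernel code: `struct sketch_nh` and both helpers are hypothetical, and `SKETCH_RTNH_F_OFFLOAD` only mirrors the value of `RTNH_F_OFFLOAD` in `<linux/rtnetlink.h>`. The route-level helper approximates the old behaviour described above (a route counts as offloaded if any nexthop is), while the per-nexthop helper reports each nexthop on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors RTNH_F_OFFLOAD from <linux/rtnetlink.h>; defined locally so the
 * sketch builds without kernel headers. */
#define SKETCH_RTNH_F_OFFLOAD 8u

/* Hypothetical, heavily simplified nexthop: only the flags matter here. */
struct sketch_nh {
	unsigned int nh_flags;
};

/* Old resolution (approximation): one answer for the whole route, true as
 * soon as any single nexthop is in hardware. */
static bool route_marked_offloaded(const struct sketch_nh *nhs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (nhs[i].nh_flags & SKETCH_RTNH_F_OFFLOAD)
			return true;
	return false;
}

/* New resolution: each nexthop carries its own offload indication. */
static bool nh_offloaded(const struct sketch_nh *nh)
{
	assert(nh != NULL);
	return (nh->nh_flags & SKETCH_RTNH_F_OFFLOAD) != 0;
}
```

With only the first gateway resolved, the route-level answer is already "offloaded" even though the second nexthop is not, which is exactly the ambiguity the per-nexthop flag removes.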
      
      The first patch teaches the kernel to look for the offload indication in
      the nexthop flags. Patches 2-5 adjust the currently capable drivers to
      provide offload indication on a per-nexthop basis. The last patch removes
      the now unused functions that set the offload indication in the FIB
      info's flags.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: fib: Remove unused functions · 2202e35d
      Ido Schimmel authored
      
      
      Previous patches converted users of these functions to provide offload
      indication using the nexthop's flags instead of the FIB info's.
      
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Refresh offload indication upon group refresh · 77d964e6
      Ido Schimmel authored
      
      
      Now that we provide offload indication using the nexthop's flags, we must
      refresh the offload indication whenever the offload state within the
      group changes.
      
      This didn't matter until now, as offload indication was provided using
      the FIB info flags and multipath routes were marked as offloaded as long
      as one of the nexthops was offloaded.
      
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Tested-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Don't check state when refreshing offload indication · 1353ee70
      Ido Schimmel authored
      
      
      Previous patch removed the reliance on the counter in the FIB info to
      set the offload indication, so we no longer need to keep an offload
      state on each FIB entry and can just set or unset the RTNH_F_OFFLOAD
      flag in each nexthop.
      
      This is also necessary because we're going to need to refresh the
      offload indication whenever the nexthop group associated with the FIB
      entry is refreshed. The current check would prevent us from marking a newly
      resolved nexthop as offloaded if the FIB entry is already marked as
      offloaded.
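The problem with the state check can be reproduced in a toy model. This is a sketch under stated assumptions, not the mlxsw code: the structures, the `in_hw` field (standing in for "the driver programmed this nexthop") and the `offloaded` per-entry state are all hypothetical. Both refresh variants walk the nexthops and set or clear the offload flag on each one; the first variant bails out early once the entry is marked, so a nexthop resolved later never gets its flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_RTNH_F_OFFLOAD 8u   /* mirrors RTNH_F_OFFLOAD */

struct sketch_nh {
	bool in_hw;             /* hypothetical: nexthop programmed in hardware */
	unsigned int nh_flags;  /* flags reported to user space */
};

struct sketch_fib_entry {
	bool offloaded;         /* hypothetical per-entry state the fix drops */
	struct sketch_nh *nhs;
	size_t nhs_count;
};

/* Set or unset the offload flag on every nexthop from its current state. */
static void refresh_nh_flags(struct sketch_fib_entry *e)
{
	for (size_t i = 0; i < e->nhs_count; i++) {
		if (e->nhs[i].in_hw)
			e->nhs[i].nh_flags |= SKETCH_RTNH_F_OFFLOAD;
		else
			e->nhs[i].nh_flags &= ~SKETCH_RTNH_F_OFFLOAD;
	}
}

/* With a state check: once the entry is marked, later refreshes are skipped. */
static void refresh_with_state_check(struct sketch_fib_entry *e)
{
	if (e->offloaded)
		return;
	e->offloaded = true;
	refresh_nh_flags(e);
}

/* Without it: every refresh re-evaluates each nexthop. */
static void refresh_unconditionally(struct sketch_fib_entry *e)
{
	assert(e != NULL);
	refresh_nh_flags(e);
}
```

In the checked variant, resolving the second gateway after the first refresh leaves its flag clear; the unconditional variant picks it up on the next group refresh.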
      
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Tested-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Provide offload indication using nexthop flags · 3984d1a8
      Ido Schimmel authored
      
      
      In a similar fashion to the previous patch, use the nexthop flags to provide
      offload indication instead of the FIB info's flags.
      
      In case a nexthop in a multipath route can't be offloaded (gateway's MAC
      can't be resolved, for example), then its offload flag isn't set.
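A small model of this rule, connected to the `ip route` output in the cover letter. This is a hypothetical sketch: `neigh_valid` stands in for "the gateway's MAC address is resolved", the fixed `weight 1` matches the example output only, and the formatting merely imitates the patched iproute rather than reproducing its code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_RTNH_F_OFFLOAD 8u   /* mirrors RTNH_F_OFFLOAD */

struct sketch_nh {
	const char *gw;        /* gateway address, as text */
	const char *dev;       /* egress device name */
	bool neigh_valid;      /* hypothetical: gateway MAC resolved */
	unsigned int nh_flags;
};

/* Only nexthops whose gateway MAC is known are marked as offloaded. */
static void set_nh_offload(struct sketch_nh *nh)
{
	if (nh->neigh_valid)
		nh->nh_flags |= SKETCH_RTNH_F_OFFLOAD;
	else
		nh->nh_flags &= ~SKETCH_RTNH_F_OFFLOAD;
}

/* Render one nexthop roughly like the example output (weight fixed at 1). */
static int format_nh(char *buf, size_t len, const struct sketch_nh *nh)
{
	assert(buf != NULL && nh != NULL);
	return snprintf(buf, len, "nexthop via %s dev %s weight 1%s",
			nh->gw, nh->dev,
			(nh->nh_flags & SKETCH_RTNH_F_OFFLOAD) ? " offload" : "");
}
```

Running `set_nh_offload()` again once the second gateway resolves is what turns the first `ip route show` output into the second.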
      
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Tested-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rocker: Provide offload indication using nexthop flags · 63e701c1
      Ido Schimmel authored
      
      
      We want to stop using the FIB info's flags to provide the offload
      indication and instead do that on a per-nexthop basis.
      
      Convert rocker to do just that. It only supports one nexthop per route,
      so conversion is simple.
      
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: fib: Set offload indication according to nexthop flags · 475abbf1
      Ido Schimmel authored
      
      
      We're going to have capable drivers indicate route offload using the
      nexthop flags, but for non-multipath routes these flags aren't dumped to
      user space.
      
      Instead, set the offload indication in the route message flags.
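A rough model of the dump-side logic. For multipath routes, per-nexthop flags travel in each `struct rtnexthop` (`rtnh_flags`) inside the `RTA_MULTIPATH` attribute; a single-nexthop route has no such nest, so the bit has to be lifted into `rtm_flags`. The structures and `dump_rtm_flags()` below are hypothetical simplifications for illustration, not the actual `fib_dump_info()` code.

```c
#include <assert.h>

#define SKETCH_RTNH_F_OFFLOAD 8u   /* mirrors RTNH_F_OFFLOAD */

struct sketch_nh {
	unsigned int nh_flags;
};

struct sketch_fib_info {
	unsigned int fib_flags;
	int fib_nhs;                /* number of nexthops */
	struct sketch_nh *fib_nh;   /* array of fib_nhs entries */
};

/* Compute the flags placed in the route message header (rtm_flags). */
static unsigned int dump_rtm_flags(const struct sketch_fib_info *fi)
{
	unsigned int rtm_flags = fi->fib_flags;

	assert(fi->fib_nhs >= 1);
	/* Single-nexthop routes carry no per-nexthop flags to user space,
	 * so copy the offload bit into the message flags instead. Multipath
	 * routes report it per nexthop and are left alone here. */
	if (fi->fib_nhs == 1 &&
	    (fi->fib_nh[0].nh_flags & SKETCH_RTNH_F_OFFLOAD))
		rtm_flags |= SKETCH_RTNH_F_OFFLOAD;
	return rtm_flags;
}
```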
      
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core: Use correct EMAD transaction ID in debug message · 9820355f
      Ido Schimmel authored
      
      
      'trans->tid' is only assigned later in the function, resulting in a zero
      transaction ID. Use 'tid' instead.
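The ordering bug generalizes to any struct field that is logged before it is assigned. Below is a hypothetical sketch, not the mlxsw core code: `struct sketch_trans`, `sketch_emad_transmit()` and the `tid=0x...` log format are all made up for illustration. Printing `trans->tid` at that point would always show 0 because the field is only filled in afterwards, so the message uses the local `tid`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct sketch_trans {
	uint64_t tid;   /* zero until assigned below */
};

/* Hypothetical transmit path: the debug message is produced before
 * trans->tid is set, so it must use the local 'tid'. */
static int sketch_emad_transmit(struct sketch_trans *trans, uint64_t tid,
				char *log, size_t len)
{
	int n;

	assert(trans != NULL && log != NULL);
	/* Using trans->tid here would print the stale value 0. */
	n = snprintf(log, len, "tid=0x%llx", (unsigned long long)tid);
	trans->tid = tid;   /* assigned only after the debug message */
	return n;
}
```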
      
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'netvsc-transparent-VF-support' · f6775a28
      David S. Miller authored
      
      
      Stephen Hemminger says:
      
      ====================
      netvsc: transparent VF support
      
      This patch set changes how SR-IOV Virtual Function devices are managed
      in the Hyper-V network driver. This version is rebased onto current net-next.
      
      Background
      
      In Hyper-V, SR-IOV can be enabled (and disabled) by changing guest settings
      on the host. When SR-IOV is enabled, a matching PCI device is hot-plugged and
      visible in the guest. The VF device is an add-on to an existing netvsc
      device, and has the same MAC address.
      
      How is this different?
      
      The original VF support relied on the bonding driver in active-backup
      mode to handle the VF device.
      
      With the new netvsc VF logic, the Linux Hyper-V network
      virtual driver directly manages the link to the SR-IOV VF device.
      When the VF device is detected (hot plug), it is automatically made a
      slave device of the netvsc device. The VF device state reflects
      the state of the netvsc device; i.e., if netvsc is set down, the
      VF is set down, and if netvsc is set up, the VF is brought up.
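The state-mirroring rule can be sketched as two hooks. This is an illustration under assumptions, not the netvsc driver: `struct sketch_netdev`, `sketch_vf_register()` and `sketch_netvsc_set_up()` are hypothetical names, and a single `up` flag stands in for the device's administrative state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_netdev {
	bool up;                      /* administrative state */
	struct sketch_netdev *vf;     /* enslaved VF, or NULL */
};

/* Hypothetical hot-plug hook: enslave the VF and copy the netvsc state. */
static void sketch_vf_register(struct sketch_netdev *netvsc,
			       struct sketch_netdev *vf)
{
	assert(netvsc != NULL && vf != NULL);
	netvsc->vf = vf;
	vf->up = netvsc->up;
}

/* Setting the netvsc device up or down is propagated to its VF. */
static void sketch_netvsc_set_up(struct sketch_netdev *netvsc, bool up)
{
	netvsc->up = up;
	if (netvsc->vf != NULL)
		netvsc->vf->up = up;
}
```

The user only ever touches the netvsc device; the VF follows it.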
      
      Packet flow is independent of VF status; all packets are sent and
      received as if they were associated with the netvsc device. If the VF is
      removed or its link is down, the synthetic VMBUS path is used.
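The fallback rule amounts to a per-packet path choice. A minimal sketch, assuming a hypothetical `struct sketch_vf` with only a link flag and a hypothetical `sketch_select_path()`; the real driver's transmit path is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_path { PATH_SYNTHETIC_VMBUS, PATH_VF };

struct sketch_vf {
	bool link_up;
};

/* Choose the path for a packet sent on the netvsc device: use the VF when
 * it is present with link up, otherwise fall back to VMBUS. */
static enum sketch_path sketch_select_path(const struct sketch_vf *vf)
{
	if (vf != NULL && vf->link_up)
		return PATH_VF;
	return PATH_SYNTHETIC_VMBUS;   /* VF removed or link down */
}
```

Because the choice is made underneath the netvsc device, addresses and firewall rules configured on eth0 apply regardless of which path a packet takes.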
      
      What was wrong with using the bonding script?
      
      A lot of work went into getting the bonding script to work on all
      distributions, but it was a major struggle. Linux network devices
      can be configured in many, many ways, and there is no single userspace
      solution that makes it all work. What is really hard is when
      configuration is attached to the synthetic device (eth0) during boot, and
      then the same addresses and firewall rules also need to work later when
      bonding is added. The new code gets around all of this.
      
      How does VF work during initialization?
      
      Since all packets are sent and received through the logical netvsc
      device, initialization is much easier. Just configure the regular
      netvsc Ethernet device; when/if SR-IOV is enabled, it just
      works. Provisioning and cloud-init only need to worry about setting up
      the netvsc device (eth0). If SR-IOV is enabled (even as a later step), the
      addresses and rules stay the same.
      
      What devices show up?
      
      Both netvsc and PCI devices are visible in the system. The netvsc
      device is active and named in the usual manner (eth0). The PCI device is
      visible to Linux and gets renamed by udev to a persistent name
      (enP2p3s0). The PCI device name is now irrelevant.
      
      The logic also sets the SLAVE flag on the PCI VF network device, so
      network tools can see the relationship if they are smart enough to
      understand how layered devices work.
      
      This is a lot like what I see on Windows: the VF device is visible
      in Device Manager, but is not configured.
      
      Is there any performance impact?

      There is no visible change in performance. The bonding
      and netvsc drivers both perform equivalent steps.
      
      Is it compatible with the old bonding script?

      It turns out that if you use the old bonding script, everything
      still works, but in a suboptimal manner. What happens is that bonding
      is unable to steal the VF from the netvsc device, so it creates a
      one-legged bond. Packet flow then is:
      	bond0 <--> eth0 <--> VF (enP2p3s0).
      In other words, if you get it wrong, it still works, just more
      awkwardly and more slowly.
      
      What if I add an address or firewall rule onto the VF?

      The same problems occur now as already occur with bonding, bridging,
      and teaming on Linux if the user incorrectly applies configuration to
      an underlying slave device. It will sort of work: packets will come in
      and out, but the Linux kernel gets confused and things like ARP don’t
      work right. There is no way to block manipulation of the slave
      device, and I am sure someone will find some special use case where
      they want it.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netvsc: remove bonding setup script · 12aa7469
      stephen hemminger authored
      
      
      No longer needed, now all managed by transparent VF logic.
      
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>