Skip to content
  1. Nov 24, 2018
    • Fabio Estevam's avatar
      dt-bindings: dsa: Fix typo in "probed" · e7b9fb4f
      Fabio Estevam authored
      
      
      The correct form is "can be probed", so fix the typo.
      
      Signed-off-by: default avatarFabio Estevam <festevam@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e7b9fb4f
    • Lorenzo Bianconi's avatar
      net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue · ef2a7cf1
      Lorenzo Bianconi authored
      Reset snd_queue tso_hdrs pointer to NULL in nicvf_free_snd_queue routine
      since it is used to check if tso dma descriptor queue has been previously
      allocated. The issue can be triggered with the following reproducer:
      
      $ip link set dev enP2p1s0v0 xdpdrv obj xdp_dummy.o
      $ip link set dev enP2p1s0v0 xdpdrv off
      
      [  341.467649] WARNING: CPU: 74 PID: 2158 at mm/vmalloc.c:1511 __vunmap+0x98/0xe0
      [  341.515010] Hardware name: GIGABYTE H270-T70/MT70-HD0, BIOS T49 02/02/2018
      [  341.521874] pstate: 60400005 (nZCv daif +PAN -UAO)
      [  341.526654] pc : __vunmap+0x98/0xe0
      [  341.530132] lr : __vunmap+0x98/0xe0
      [  341.533609] sp : ffff00001c5db860
      [  341.536913] x29: ffff00001c5db860 x28: 0000000000020000
      [  341.542214] x27: ffff810feb5090b0 x26: ffff000017e57000
      [  341.547515] x25: 0000000000000000 x24: 00000000fbd00000
      [  341.552816] x23: 0000000000000000 x22: ffff810feb5090b0
      [  341.558117] x21: 0000000000000000 x20: 0000000000000000
      [  341.563418] x19: ffff000017e57000 x18: 0000000000000000
      [  341.568719] x17: 0000000000000000 x16: 0000000000000000
      [  341.574020] x15: 0000000000000010 x14: ffffffffffffffff
      [  341.579321] x13: ffff00008985eb27 x12: ffff00000985eb2f
      [  341.584622] x11: ffff0000096b3000 x10: ffff00001c5db510
      [  341.589923] x9 : 00000000ffffffd0 x8 : ffff0000086868e8
      [  341.595224] x7 : 3430303030303030 x6 : 00000000000006ef
      [  341.600525] x5 : 00000000003fffff x4 : 0000000000000000
      [  341.605825] x3 : 0000000000000000 x2 : ffffffffffffffff
      [  341.611126] x1 : ffff0000096b3728 x0 : 0000000000000038
      [  341.616428] Call trace:
      [  341.618866]  __vunmap+0x98/0xe0
      [  341.621997]  vunmap+0x3c/0x50
      [  341.624961]  arch_dma_free+0x68/0xa0
      [  341.628534]  dma_direct_free+0x50/0x80
      [  341.632285]  nicvf_free_resources+0x160/0x2d8 [nicvf]
      [  341.637327]  nicvf_config_data_transfer+0x174/0x5e8 [nicvf]
      [  341.642890]  nicvf_stop+0x298/0x340 [nicvf]
      [  341.647066]  __dev_close_many+0x9c/0x108
      [  341.650977]  dev_close_many+0xa4/0x158
      [  341.654720]  rollback_registered_many+0x140/0x530
      [  341.659414]  rollback_registered+0x54/0x80
      [  341.663499]  unregister_netdevice_queue+0x9c/0xe8
      [  341.668192]  unregister_netdev+0x28/0x38
      [  341.672106]  nicvf_remove+0xa4/0xa8 [nicvf]
      [  341.676280]  nicvf_shutdown+0x20/0x30 [nicvf]
      [  341.680630]  pci_device_shutdown+0x44/0x88
      [  341.684720]  device_shutdown+0x144/0x250
      [  341.688640]  kernel_restart_prepare+0x44/0x50
      [  341.692986]  kernel_restart+0x20/0x68
      [  341.696638]  __se_sys_reboot+0x210/0x238
      [  341.700550]  __arm64_sys_reboot+0x24/0x30
      [  341.704555]  el0_svc_handler+0x94/0x110
      [  341.708382]  el0_svc+0x8/0xc
      [  341.711252] ---[ end trace 3f4019c8439959c9 ]---
      [  341.715874] page:ffff7e0003ef4000 count:0 mapcount:0 mapping:0000000000000000 index:0x4
      [  341.723872] flags: 0x1fffe000000000()
      [  341.727527] raw: 001fffe000000000 ffff7e0003f1a008 ffff7e0003ef4048 0000000000000000
      [  341.735263] raw: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
      [  341.742994] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
      
      where xdp_dummy.c is a simple bpf program that forwards the incoming
      frames to the network stack (available here:
      https://github.com/altoor/xdp_walkthrough_examples/blob/master/sample_1/xdp_dummy.c)
      
      Fixes: 05c773f5 ("net: thunderx: Add basic XDP support")
      Fixes: 4863dea3
      
       ("net: Adding support for Cavium ThunderX network controller")
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ef2a7cf1
    • Yangtao Li's avatar
      net: amd: add missing of_node_put() · c44c749d
      Yangtao Li authored
      
      
      of_find_node_by_path() acquires a reference to the node
      returned by it and that reference needs to be dropped by its caller.
      This place doesn't do that, so fix it.
      
      Signed-off-by: default avatarYangtao Li <tiny.windzz@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c44c749d
    • Hangbin Liu's avatar
      team: no need to do team_notify_peers or team_mcast_rejoin when disabling port · 5ed9dc99
      Hangbin Liu authored
      
      
      team_notify_peers() will send ARP and NA to notify peers. team_mcast_rejoin()
      will send multicast join group message to notify peers. We should do this when
      enabling/changed to a new port. But it doesn't make sense to do it when a port
      is disabled.
      
      On the other hand, when we set mcast_rejoin_count to 2, and do a failover,
      team_port_disable() will increase mcast_rejoin.count_pending to 2 and then
      team_port_enable() will increase mcast_rejoin.count_pending to 4. We will send
      4 mcast rejoin messages at latest, which will make user confused. The same
      with notify_peers.count.
      
      Fix it by deleting team_notify_peers() and team_mcast_rejoin() in
      team_port_disable().
      
      Reported-by: default avatarLiang Li <liali@redhat.com>
      Fixes: fc423ff0 ("team: add peer notification")
      Fixes: 492b200e
      
       ("team: add support for sending multicast rejoins")
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5ed9dc99
    • Jason Wang's avatar
      virtio-net: fail XDP set if guest csum is negotiated · 18ba58e1
      Jason Wang authored
      We don't support partial csumed packet since its metadata will be lost
      or incorrect during XDP processing. So fail the XDP set if guest_csum
      feature is negotiated.
      
      Fixes: f600b690
      
       ("virtio_net: Add XDP support")
      Reported-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Pavel Popa <pashinho1990@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      18ba58e1
    • Jason Wang's avatar
      virtio-net: disable guest csum during XDP set · e59ff2c4
      Jason Wang authored
      We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we
      can receive partial csumed packets with metadata kept in the
      vnet_hdr. This may have several side effects:
      
      - It could be overridden by header adjustment, thus is might be not
        correct after XDP processing.
      - There's no way to pass such metadata information through
        XDP_REDIRECT to another driver.
      - XDP does not support checksum offload right now.
      
      So simply disable guest csum if possible in this the case of XDP.
      
      Fixes: 3f93522f
      
       ("virtio-net: switch off offloads on demand if possible on XDP set")
      Reported-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Pavel Popa <pashinho1990@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e59ff2c4
    • Davide Caratti's avatar
      net/sched: act_police: add missing spinlock initialization · 484afd1b
      Davide Caratti authored
      commit f2cbd485 ("net/sched: act_police: fix race condition on state
      variables") introduces a new spinlock, but forgets its initialization.
      Ensure that tcf_police_init() initializes 'tcfp_lock' every time a 'police'
      action is newly created, to avoid the following lockdep splat:
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       <...>
       Call Trace:
        dump_stack+0x85/0xcb
        register_lock_class+0x581/0x590
        __lock_acquire+0xd4/0x1330
        ? tcf_police_init+0x2fa/0x650 [act_police]
        ? lock_acquire+0x9e/0x1a0
        lock_acquire+0x9e/0x1a0
        ? tcf_police_init+0x2fa/0x650 [act_police]
        ? tcf_police_init+0x55a/0x650 [act_police]
        _raw_spin_lock_bh+0x34/0x40
        ? tcf_police_init+0x2fa/0x650 [act_police]
        tcf_police_init+0x2fa/0x650 [act_police]
        tcf_action_init_1+0x384/0x4c0
        tcf_action_init+0xf6/0x160
        tcf_action_add+0x73/0x170
        tc_ctl_action+0x122/0x160
        rtnetlink_rcv_msg+0x2a4/0x490
        ? netlink_deliver_tap+0x99/0x400
        ? validate_linkmsg+0x370/0x370
        netlink_rcv_skb+0x4d/0x130
        netlink_unicast+0x196/0x230
        netlink_sendmsg+0x2e5/0x3e0
        sock_sendmsg+0x36/0x40
        ___sys_sendmsg+0x280/0x2f0
        ? _raw_spin_unlock+0x24/0x30
        ? handle_pte_fault+0xafe/0xf30
        ? find_held_lock+0x2d/0x90
        ? syscall_trace_enter+0x1df/0x360
        ? __sys_sendmsg+0x5e/0xa0
        __sys_sendmsg+0x5e/0xa0
        do_syscall_64+0x60/0x210
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
       RIP: 0033:0x7f1841c7cf10
       Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
       RSP: 002b:00007ffcf9df4d68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f1841c7cf10
       RDX: 0000000000000000 RSI: 00007ffcf9df4dc0 RDI: 0000000000000003
       RBP: 000000005bf56105 R08: 0000000000000002 R09: 00007ffcf9df8edc
       R10: 00007ffcf9df47e0 R11: 0000000000000246 R12: 0000000000671be0
       R13: 00007ffcf9df4e84 R14: 0000000000000008 R15: 0000000000000000
      
      Fixes: f2cbd485
      
       ("net/sched: act_police: fix race condition on state variables")
      Reported-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      484afd1b
    • Paolo Abeni's avatar
      net: don't keep lonely packets forever in the gro hash · 605108ac
      Paolo Abeni authored
      
      
      Eric noted that with UDP GRO and NAPI timeout, we could keep a single
      UDP packet inside the GRO hash forever, if the related NAPI instance
      calls napi_gro_complete() at an higher frequency than the NAPI timeout.
      Willem noted that even TCP packets could be trapped there, till the
      next retransmission.
      This patch tries to address the issue, flushing the old packets -
      those with a NAPI_GRO_CB age before the current jiffy - before scheduling
      the NAPI timeout. The rationale is that such a timeout should be
      well below a jiffy and we are not flushing packets eligible for sane GRO.
      
      v1  -> v2:
       - clarified the commit message and comment
      
      RFC -> v1:
       - added 'Fixes tags', cleaned-up the wording.
      
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Fixes: 3b47d303
      
       ("net: gro: add a per device gro flush timer")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      605108ac
    • Hangbin Liu's avatar
      net/ipv6: re-do dad when interface has IFF_NOARP flag change · 896585d4
      Hangbin Liu authored
      
      
      When we add a new IPv6 address, we should also join corresponding solicited-node
      multicast address, unless the interface has IFF_NOARP flag, as function
      addrconf_join_solict() did. But if we remove IFF_NOARP flag later, we do
      not do dad and add the mcast address. So we will drop corresponding neighbour
      discovery message that came from other nodes.
      
      A typical example is after creating a ipvlan with mode l3, setting up an ipv6
      address and changing the mode to l2. Then we will not be able to ping this
      address as the interface doesn't join related solicited-node mcast address.
      
      Fix it by re-doing dad when interface changed IFF_NOARP flag. Then we will add
      corresponding mcast group and check if there is a duplicate address on the
      network.
      
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Reviewed-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      896585d4
    • Willem de Bruijn's avatar
      packet: copy user buffers before orphan or clone · 5cd8d46e
      Willem de Bruijn authored
      tpacket_snd sends packets with user pages linked into skb frags. It
      notifies that pages can be reused when the skb is released by setting
      skb->destructor to tpacket_destruct_skb.
      
      This can cause data corruption if the skb is orphaned (e.g., on
      transmit through veth) or cloned (e.g., on mirror to another psock).
      
      Create a kernel-private copy of data in these cases, same as tun/tap
      zerocopy transmission. Reuse that infrastructure: mark the skb as
      SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx).
      
      Unlike other zerocopy packets, do not set shinfo destructor_arg to
      struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify
      when the original skb is released and a timestamp is recorded. Do not
      change this timestamp behavior. The ubuf_info->callback is not needed
      anyway, as no zerocopy notification is expected.
      
      Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The
      resulting value is not a valid ubuf_info pointer, nor a valid
      tpacket_snd frame address. Add skb_zcopy_.._nouarg helpers for this.
      
      The fix relies on features introduced in commit 52267790 ("sock:
      add MSG_ZEROCOPY"), so can be backported as is only to 4.14.
      
      Tested with from `./in_netns.sh ./txring_overwrite` from
      http://github.com/wdebruij/kerneltools/tests
      
      Fixes: 69e3c75f
      
       ("net: TX_RING and packet mmap")
      Reported-by: default avatarAnand H. Krishnan <anandhkrishnan@gmail.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5cd8d46e
  2. Nov 23, 2018
  3. Nov 22, 2018
    • Vincent Chen's avatar
      net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts · 426a593e
      Vincent Chen authored
      
      
      In the original ftmac100_interrupt(), the interrupts are only disabled when
      the condition "netif_running(netdev)" is true. However, this condition
      causes kerenl hang in the following case. When the user requests to
      disable the network device, kernel will clear the bit __LINK_STATE_START
      from the dev->state and then call the driver's ndo_stop function. Network
      device interrupts are not blocked during this process. If an interrupt
      occurs between clearing __LINK_STATE_START and stopping network device,
      kernel cannot disable the interrupts due to the condition
      "netif_running(netdev)" in the ISR. Hence, kernel will hang due to the
      continuous interruption of the network device.
      
      In order to solve the above problem, the interrupts of the network device
      should always be disabled in the ISR without being restricted by the
      condition "netif_running(netdev)".
      
      [V2]
      Remove unnecessary curly braces.
      
      Signed-off-by: default avatarVincent Chen <vincentc@andestech.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      426a593e
    • David S. Miller's avatar
      Merge branch 'smc-fixes' · 395048eb
      David S. Miller authored
      
      
      Ursula Braun says:
      
      ====================
      net/smc: fixes 2018-11-12
      
      here is V4 of some net/smc fixes in different areas for the net tree.
      
      v1->v2:
         do not define 8-byte alignment for union smcd_cdc_cursor in
         patch 4/5 "net/smc: atomic SMCD cursor handling"
      v2->v3:
         stay with 8-byte alignment for union smcd_cdc_cursor in
         patch 4/5 "net/smc: atomic SMCD cursor handling", but get rid of
         __packed for struct smcd_cdc_msg
      v3->v4:
         get rid of another __packed for struct smc_cdc_msg in
         patch 4/5 "net/smc: atomic SMCD cursor handling"
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      395048eb
    • Ursula Braun's avatar
      net/smc: use after free fix in smc_wr_tx_put_slot() · e438bae4
      Ursula Braun authored
      
      
      In smc_wr_tx_put_slot() field pend->idx is used after being
      cleared. That means always idx 0 is cleared in the wr_tx_mask.
      This results in a broken administration of available WR send
      payload buffers.
      
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e438bae4
    • Ursula Braun's avatar
      net/smc: atomic SMCD cursor handling · b9a22dd9
      Ursula Braun authored
      
      
      Running uperf tests with SMCD on LPARs results in corrupted cursors.
      SMCD cursors should be treated atomically to fix cursor corruption.
      
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b9a22dd9
    • Hans Wippel's avatar
      net/smc: add SMC-D shutdown signal · 0512f69e
      Hans Wippel authored
      
      
      When a SMC-D link group is freed, a shutdown signal should be sent to
      the peer to indicate that the link group is invalid. This patch adds the
      shutdown signal to the SMC code.
      
      Signed-off-by: default avatarHans Wippel <hwippel@linux.ibm.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0512f69e
    • Karsten Graul's avatar
      net/smc: use queue pair number when matching link group · ee05ff7a
      Karsten Graul authored
      
      
      When searching for an existing link group the queue pair number is also
      to be taken into consideration. When the SMC server sends a new number
      in a CLC packet (keeping all other values equal) then a new link group
      is to be created on the SMC client side.
      
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ee05ff7a
    • Hans Wippel's avatar
      net/smc: abort CLC connection in smc_release · f07920ad
      Hans Wippel authored
      
      
      In case of a non-blocking SMC socket, the initial CLC handshake is
      performed over a blocking TCP connection in a worker. If the SMC socket
      is released, smc_release has to wait for the blocking CLC socket
      operations (e.g., kernel_connect) inside the worker.
      
      This patch aborts a CLC connection when the respective non-blocking SMC
      socket is released to avoid waiting on socket operations or timeouts.
      
      Signed-off-by: default avatarHans Wippel <hwippel@linux.ibm.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f07920ad
    • David S. Miller's avatar
      Merge tag 'wireless-drivers-for-davem-2018-11-20' of... · 1e2b1046
      David S. Miller authored
      Merge tag 'wireless-drivers-for-davem-2018-11-20' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for 4.20
      
      First set of fixes for 4.20, this time we have quite a few them but
      all very small.
      
      ath9k
      
      * fix a locking regression found by a static checker
      
      wlcore
      
      * fix a crash which was a regression with wakeirq handling
      
      brcm80211
      
      * yet another fix for 160 MHz channel handling
      
      mt76
      
      * fix a longstaning build problem when CONFIG_LEDS_CLASS is disabled
      
      * don't use uninitialised mutex
      
      iwlwifi
      
      * do note that the iwlwifi merge tag (commit 4ec321c1
      
      ) seems to
        contain wrong list of changes so ignore that
      
      * fix ACPI data handling, a memory leak and other smaller fixes
      
      ath10k
      
      * fix a crash during suspend which was a recent regression
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1e2b1046
    • Eric Dumazet's avatar
      tcp: defer SACK compression after DupThresh · 86de5921
      Eric Dumazet authored
      Jean-Louis reported a TCP regression and bisected to recent SACK
      compression.
      
      After a loss episode (receiver not able to keep up and dropping
      packets because its backlog is full), linux TCP stack is sending
      a single SACK (DUPACK).
      
      Sender waits a full RTO timer before recovering losses.
      
      While RFC 6675 says in section 5, "Algorithm Details",
      
         (2) If DupAcks < DupThresh but IsLost (HighACK + 1) returns true --
             indicating at least three segments have arrived above the current
             cumulative acknowledgment point, which is taken to indicate loss
             -- go to step (4).
      ...
         (4) Invoke fast retransmit and enter loss recovery as follows:
      
      there are old TCP stacks not implementing this strategy, and
      still counting the dupacks before starting fast retransmit.
      
      While these stacks probably perform poorly when receivers implement
      LRO/GRO, we should be a little more gentle to them.
      
      This patch makes sure we do not enable SACK compression unless
      3 dupacks have been sent since last rcv_nxt update.
      
      Ideally we should even rearm the timer to send one or two
      more DUPACK if no more packets are coming, but that will
      be work aiming for linux-4.21.
      
      Many thanks to Jean-Louis for bisecting the issue, providing
      packet captures and testing this patch.
      
      Fixes: 5d9f4262
      
       ("tcp: add SACK compression")
      Reported-by: default avatarJean-Louis Dupond <jean-louis@dupond.be>
      Tested-by: default avatarJean-Louis Dupond <jean-louis@dupond.be>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      86de5921
    • Petr Machata's avatar
      net: skb_scrub_packet(): Scrub offload_fwd_mark · b5dd186d
      Petr Machata authored
      When a packet is trapped and the corresponding SKB marked as
      already-forwarded, it retains this marking even after it is forwarded
      across veth links into another bridge. There, since it ingresses the
      bridge over veth, which doesn't have offload_fwd_mark, it triggers a
      warning in nbp_switchdev_frame_mark().
      
      Then nbp_switchdev_allowed_egress() decides not to allow egress from
      this bridge through another veth, because the SKB is already marked, and
      the mark (of 0) of course matches. Thus the packet is incorrectly
      blocked.
      
      Solve by resetting offload_fwd_mark() in skb_scrub_packet(). That
      function is called from tunnels and also from veth, and thus catches the
      cases where traffic is forwarded between bridges and transformed in a
      way that invalidates the marking.
      
      Fixes: 6bc506b4 ("bridge: switchdev: Add forward mark support for stacked devices")
      Fixes: abf4bb6b
      
       ("skbuff: Add the offload_mr_fwd_mark field")
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Suggested-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b5dd186d
  4. Nov 21, 2018
  5. Nov 20, 2018