Skip to content
  1. Dec 08, 2019
  2. Dec 07, 2019
    • David S. Miller's avatar
      Merge branch 'tcp-fix-handling-of-stale-syncookies-timestamps' · 5532946e
      David S. Miller authored
      
      
      Guillaume Nault says:
      
      ====================
      tcp: fix handling of stale syncookies timestamps
      
      The synflood timestamps (->ts_recent_stamp and ->synq_overflow_ts) are
      only refreshed when the syncookie protection triggers. Therefore, their
      value can become very far apart from jiffies if no synflood happens for
      a long time.
      
      If jiffies grows too much and wraps while the synflood timestamp isn't
      refreshed, then time_after32() might consider the later to be in the
      future. This can trick tcp_synq_no_recent_overflow() into returning
      erroneous values and rejecting valid ACKs.
      
      Patch 1 handles the case of ACKs using legitimate syncookies.
      Patch 2 handles the case of stray ACKs.
      Patch 3 annotates lockless timestamp operations with READ_ONCE() and
      WRITE_ONCE().
      
      Changes from v3:
        - Fix description of time_between32() (found by Eric Dumazet).
        - Use more accurate Fixes tag in patch 3 (suggested by Eric Dumazet).
      
      Changes from v2:
        - Define and use time_between32() instead of a pair of
          time_before32/time_after32 (suggested by Eric Dumazet).
        - Use 'last_overflow - HZ' as lower bound in
          tcp_synq_no_recent_overflow(), to accommodate for concurrent
          timestamp updates (found by Eric Dumazet).
        - Add a third patch to annotate lockless accesses to .ts_recent_stamp.
      
      Changes from v1:
        - Initialising timestamps at socket creation time is not enough
          because jiffies wraps in 24 days with HZ=1000 (Eric Dumazet).
          Handle stale timestamps in tcp_synq_overflow() and
          tcp_synq_no_recent_overflow() instead.
        - Rework commit description.
        - Add a second patch to handle the case of stray ACKs.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5532946e
    • Guillaume Nault's avatar
      tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE() · 721c8daf
      Guillaume Nault authored
      Syncookies borrow the ->rx_opt.ts_recent_stamp field to store the
      timestamp of the last synflood. Protect them with READ_ONCE() and
      WRITE_ONCE() since reads and writes aren't serialised.
      
      Use of .rx_opt.ts_recent_stamp for storing the synflood timestamp was
      introduced by a0f82f64 ("syncookies: remove last_synq_overflow from
      struct tcp_sock"). But unprotected accesses were already there when
      timestamp was stored in .last_synq_overflow.
      
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      721c8daf
    • Guillaume Nault's avatar
      tcp: tighten acceptance of ACKs not matching a child socket · cb44a08f
      Guillaume Nault authored
      
      
      When no synflood occurs, the synflood timestamp isn't updated.
      Therefore it can be so old that time_after32() can consider it to be
      in the future.
      
      That's a problem for tcp_synq_no_recent_overflow() as it may report
      that a recent overflow occurred while, in fact, it's just that jiffies
      has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.
      
      Spurious detection of recent overflows lead to extra syncookie
      verification in cookie_v[46]_check(). At that point, the verification
      should fail and the packet dropped. But we should have dropped the
      packet earlier as we didn't even send a syncookie.
      
      Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
      only if jiffies is within the
      [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
      way, no spurious recent overflow is reported when jiffies wraps and
      'last_overflow' becomes in the future from the point of view of
      time_after32().
      
      However, if jiffies wraps and enters the
      [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
      'last_overflow' being a stale synflood timestamp), then
      tcp_synq_no_recent_overflow() still erroneously reports an
      overflow. In such cases, we have to rely on syncookie verification
      to drop the packet. We unfortunately have no way to differentiate
      between a fresh and a stale syncookie timestamp.
      
      In practice, using last_overflow as lower bound is problematic.
      If the synflood timestamp is concurrently updated between the time
      we read jiffies and the moment we store the timestamp in
      'last_overflow', then 'now' becomes smaller than 'last_overflow' and
      tcp_synq_no_recent_overflow() returns true, potentially dropping a
      valid syncookie.
      
      Reading jiffies after loading the timestamp could fix the problem,
      but that'd require a memory barrier. Let's just accommodate for
      potential timestamp growth instead and extend the interval using
      'last_overflow - HZ' as lower bound.
      
      Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cb44a08f
    • Guillaume Nault's avatar
      tcp: fix rejected syncookies due to stale timestamps · 04d26e7b
      Guillaume Nault authored
      If no synflood happens for a long enough period of time, then the
      synflood timestamp isn't refreshed and jiffies can advance so much
      that time_after32() can't accurately compare them any more.
      
      Therefore, we can end up in a situation where time_after32(now,
      last_overflow + HZ) returns false, just because these two values are
      too far apart. In that case, the synflood timestamp isn't updated as
      it should be, which can trick tcp_synq_no_recent_overflow() into
      rejecting valid syncookies.
      
      For example, let's consider the following scenario on a system
      with HZ=1000:
      
        * The synflood timestamp is 0, either because that's the timestamp
          of the last synflood or, more commonly, because we're working with
          a freshly created socket.
      
        * We receive a new SYN, which triggers synflood protection. Let's say
          that this happens when jiffies == 2147484649 (that is,
          'synflood timestamp' + HZ + 2^31 + 1).
      
        * Then tcp_synq_overflow() doesn't update the synflood timestamp,
          because time_after32(2147484649, 1000) returns false.
          With:
            - 2147484649: the value of jiffies, aka. 'now'.
            - 1000: the value of 'last_overflow' + HZ.
      
        * A bit later, we receive the ACK completing the 3WHS. But
          cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow()
          says that we're not under synflood. That's because
          time_after32(2147484649, 120000) returns false.
          With:
            - 2147484649: the value of jiffies, aka. 'now'.
            - 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID.
      
          Of course, in reality jiffies would have increased a bit, but this
          condition will last for the next 119 seconds, which is far enough
          to accommodate for jiffie's growth.
      
      Fix this by updating the overflow timestamp whenever jiffies isn't
      within the [last_overflow, last_overflow + HZ] range. That shouldn't
      have any performance impact since the update still happens at most once
      per second.
      
      Now we're guaranteed to have fresh timestamps while under synflood, so
      tcp_synq_no_recent_overflow() can safely use it with time_after32() in
      such situations.
      
      Stale timestamps can still make tcp_synq_no_recent_overflow() return
      the wrong verdict when not under synflood. This will be handled in the
      next patch.
      
      For 64 bits architectures, the problem was introduced with the
      conversion of ->tw_ts_recent_stamp to 32 bits integer by commit
      cca9bab1 ("tcp: use monotonic timestamps for PAWS").
      The problem has always been there on 32 bits architectures.
      
      Fixes: cca9bab1 ("tcp: use monotonic timestamps for PAWS")
      Fixes: 1da177e4
      
       ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      04d26e7b
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2019-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 537d0779
      David S. Miller authored
      
      
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2019-12-05
      
      This series introduces some fixes to mlx5 driver.
      
      Please pull and let me know if there is any problem.
      
      For -stable v4.19:
       ('net/mlx5e: Query global pause state before setting prio2buffer')
      
      For -stable v5.3
       ('net/mlx5e: Fix SFF 8472 eeprom length')
       ('net/mlx5e: Fix translation of link mode into speed')
       ('net/mlx5e: Fix freeing flow with kfree() and not kvfree()')
       ('net/mlx5e: ethtool, Fix analysis of speed setting')
       ('net/mlx5e: Fix TXQ indices to be sequential')
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      537d0779
    • Bruno Carneiro da Cunha's avatar
      lpc_eth: kernel BUG on remove · 04aa1bc4
      Bruno Carneiro da Cunha authored
      
      
      We may have found a bug in the nxp/lpc_eth.c driver. The function
      platform_set_drvdata() is called twice, the second time it is called,
      in lpc_mii_init(), it overwrites the struct net_device which should be
      at pdev->dev->driver_data with pldat->mii_bus. When trying to remove
      the driver, in lpc_eth_drv_remove(), platform_get_drvdata() will
      return the pldat->mii_bus pointer and try to use it as a struct
      net_device pointer. This causes unregister_netdev to segfault and
      generate a kernel BUG. Is this reproducible?
      
      Signed-off-by: default avatarDaniel Martinez <linux@danielsmartinez.com>
      Signed-off-by: default avatarBruno Carneiro da Cunha <brunocarneirodacunha@usp.br>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      04aa1bc4
    • Eric Dumazet's avatar
      tcp: md5: fix potential overestimation of TCP option space · 9424e2e7
      Eric Dumazet authored
      Back in 2008, Adam Langley fixed the corner case of packets for flows
      having all of the following options : MD5 TS SACK
      
      Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block
      can be cooked from the remaining 8 bytes.
      
      tcp_established_options() correctly sets opts->num_sack_blocks
      to zero, but returns 36 instead of 32.
      
      This means TCP cooks packets with 4 extra bytes at the end
      of options, containing unitialized bytes.
      
      Fixes: 33ad798c
      
       ("tcp: options clean up")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9424e2e7
    • David S. Miller's avatar
      Merge branch 'net-tc-indirect-block-relay' · 9a74542e
      David S. Miller authored
      
      
      John Hurley says:
      
      ====================
      Ensure egress un/bind are relayed with indirect blocks
      
      On register and unregister for indirect blocks, a command is called that
      sends a bind/unbind event to the registering driver. This command assumes
      that the bind to indirect block will be on ingress. However, drivers such
      as NFP have allowed binding to clsact qdiscs as well as ingress qdiscs
      from mainline Linux 5.2. A clsact qdisc binds to an ingress and an egress
      block.
      
      Rather than assuming that an indirect bind is always ingress, modify the
      function names to remove the ingress tag (patch 1). In cls_api, which is
      used by NFP to offload TC flower, generate bind/unbind message for both
      ingress and egress blocks on the event of indirectly
      registering/unregistering from that block. Doing so mimics the behaviour
      of both ingress and clsact qdiscs on initialise and destroy.
      
      This now ensures that drivers such as NFP receive the correct binder type
      for the indirect block registration.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9a74542e
    • John Hurley's avatar
      net: sched: allow indirect blocks to bind to clsact in TC · 25a443f7
      John Hurley authored
      When a device is bound to a clsact qdisc, bind events are triggered to
      registered drivers for both ingress and egress. However, if a driver
      registers to such a device using the indirect block routines then it is
      assumed that it is only interested in ingress offload and so only replays
      ingress bind/unbind messages.
      
      The NFP driver supports the offload of some egress filters when
      registering to a block with qdisc of type clsact. However, on unregister,
      if the block is still active, it will not receive an unbind egress
      notification which can prevent proper cleanup of other registered
      callbacks.
      
      Modify the indirect block callback command in TC to send messages of
      ingress and/or egress bind depending on the qdisc in use. NFP currently
      supports egress offload for TC flower offload so the changes are only
      added to TC.
      
      Fixes: 4d12ba42
      
       ("nfp: flower: allow offloading of matches on 'internal' ports")
      Signed-off-by: default avatarJohn Hurley <john.hurley@netronome.com>
      Acked-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      25a443f7
    • John Hurley's avatar
      net: core: rename indirect block ingress cb function · dbad3408
      John Hurley authored
      With indirect blocks, a driver can register for callbacks from a device
      that is does not 'own', for example, a tunnel device. When registering to
      or unregistering from a new device, a callback is triggered to generate
      a bind/unbind event. This, in turn, allows the driver to receive any
      existing rules or to properly clean up installed rules.
      
      When first added, it was assumed that all indirect block registrations
      would be for ingress offloads. However, the NFP driver can, in some
      instances, support clsact qdisc binds for egress offload.
      
      Change the name of the indirect block callback command in flow_offload to
      remove the 'ingress' identifier from it. While this does not change
      functionality, a follow up patch will implement a more more generic
      callback than just those currently just supporting ingress offload.
      
      Fixes: 4d12ba42
      
       ("nfp: flower: allow offloading of matches on 'internal' ports")
      Signed-off-by: default avatarJohn Hurley <john.hurley@netronome.com>
      Acked-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dbad3408
    • Jouni Hogander's avatar
      net-sysfs: Call dev_hold always in netdev_queue_add_kobject · e0b60903
      Jouni Hogander authored
      Dev_hold has to be called always in netdev_queue_add_kobject.
      Otherwise usage count drops below 0 in case of failure in
      kobject_init_and_add.
      
      Fixes: b8eb7183
      
       ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject")
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: David Miller <davem@davemloft.net>
      Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e0b60903
    • Alexander Lobakin's avatar
      net: dsa: fix flow dissection on Tx path · 8bef0af0
      Alexander Lobakin authored
      Commit 43e66528 ("net-next: dsa: fix flow dissection") added an
      ability to override protocol and network offset during flow dissection
      for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
      in order to fix skb hashing for RPS on Rx path.
      
      However, skb_hash() and added part of code can be invoked not only on
      Rx, but also on Tx path if we have a multi-queued device and:
       - kernel is running on UP system or
       - XPS is not configured.
      
      The call stack in this two cases will be like: dev_queue_xmit() ->
      __dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
      skb_tx_hash() -> skb_get_hash().
      
      The problem is that skbs queued for Tx have both network offset and
      correct protocol already set up even after inserting a CPU tag by DSA
      tagger, so calling tag_ops->flow_dissect() on this path actually only
      breaks flow dissection and hashing.
      
      This can be observed by adding debug prints just before and right after
      tag_ops->flow_dissect() call to the related block of code:
      
      Before the patch:
      
      Rx path (RPS):
      
      [   19.240001] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   19.244271] tag_ops->flow_dissect()
      [   19.247811] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */
      
      [   19.215435] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   19.219746] tag_ops->flow_dissect()
      [   19.223241] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */
      
      [   18.654057] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   18.658332] tag_ops->flow_dissect()
      [   18.661826] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */
      
      Tx path (UP system):
      
      [   18.759560] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */
      [   18.763933] tag_ops->flow_dissect()
      [   18.767485] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      [   22.800020] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */
      [   22.804392] tag_ops->flow_dissect()
      [   22.807921] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      [   16.898342] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */
      [   16.902705] tag_ops->flow_dissect()
      [   16.906227] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      After:
      
      Rx path (RPS):
      
      [   16.520993] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   16.525260] tag_ops->flow_dissect()
      [   16.528808] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */
      
      [   15.484807] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   15.490417] tag_ops->flow_dissect()
      [   15.495223] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */
      
      [   17.134621] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   17.138895] tag_ops->flow_dissect()
      [   17.142388] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */
      
      Tx path (UP system):
      
      [   15.499558] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */
      
      [   20.664689] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */
      
      [   18.565782] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */
      
      In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)'
      to prevent code from calling tag_ops->flow_dissect() on Tx.
      I also decided to initialize 'offset' variable so tagger callbacks can
      now safely leave it untouched without provoking a chaos.
      
      Fixes: 43e66528
      
       ("net-next: dsa: fix flow dissection")
      Signed-off-by: default avatarAlexander Lobakin <alobakin@dlink.ru>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8bef0af0
    • Valentin Vidic's avatar
      net/tls: Fix return values to avoid ENOTSUPP · 4a5cdc60
      Valentin Vidic authored
      
      
      ENOTSUPP is not available in userspace, for example:
      
        setsockopt failed, 524, Unknown error 524
      
      Signed-off-by: default avatarValentin Vidic <vvidic@valentin-vidic.from.hr>
      Acked-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4a5cdc60
    • Eric Dumazet's avatar
      net: avoid an indirect call in ____sys_recvmsg() · 1af66221
      Eric Dumazet authored
      
      
      CONFIG_RETPOLINE=y made indirect calls expensive.
      
      gcc seems to add an indirect call in ____sys_recvmsg().
      
      Rewriting the code slightly makes sure to avoid this indirection.
      
      Alternative would be to not call sock_recvmsg() and instead
      use security_socket_recvmsg() and sock_recvmsg_nosec(),
      but this is less readable IMO.
      
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: David Laight <David.Laight@aculab.com>
      Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1af66221
    • Chuhong Yuan's avatar
      phy: mdio-thunder: add missed pci_release_regions in remove · 462f8554
      Chuhong Yuan authored
      
      
      The driver forgets to call pci_release_regions() in remove like that
      in probe failure.
      Add the missed call to fix it.
      
      Signed-off-by: default avatarChuhong Yuan <hslester96@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      462f8554
    • Taehee Yoo's avatar
      tipc: fix ordering of tipc module init and exit routine · 9cf1cd8e
      Taehee Yoo authored
      In order to set/get/dump, the tipc uses the generic netlink
      infrastructure. So, when tipc module is inserted, init function
      calls genl_register_family().
      After genl_register_family(), set/get/dump commands are immediately
      allowed and these callbacks internally use the net_generic.
      net_generic is allocated by register_pernet_device() but this
      is called after genl_register_family() in the __init function.
      So, these callbacks would use un-initialized net_generic.
      
      Test commands:
          #SHELL1
          while :
          do
              modprobe tipc
              modprobe -rv tipc
          done
      
          #SHELL2
          while :
          do
              tipc link list
          done
      
      Splat looks like:
      [   59.616322][ T2788] kasan: CONFIG_KASAN_INLINE enabled
      [   59.617234][ T2788] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [   59.618398][ T2788] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [   59.619389][ T2788] CPU: 3 PID: 2788 Comm: tipc Not tainted 5.4.0+ #194
      [   59.620231][ T2788] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   59.621428][ T2788] RIP: 0010:tipc_bcast_get_broadcast_mode+0x131/0x310 [tipc]
      [   59.622379][ T2788] Code: c7 c6 ef 8b 38 c0 65 ff 0d 84 83 c9 3f e8 d7 a5 f2 e3 48 8d bb 38 11 00 00 48 b8 00 00 00 00
      [   59.622550][ T2780] NET: Registered protocol family 30
      [   59.624627][ T2788] RSP: 0018:ffff88804b09f578 EFLAGS: 00010202
      [   59.624630][ T2788] RAX: dffffc0000000000 RBX: 0000000000000011 RCX: 000000008bc66907
      [   59.624631][ T2788] RDX: 0000000000000229 RSI: 000000004b3cf4cc RDI: 0000000000001149
      [   59.624633][ T2788] RBP: ffff88804b09f588 R08: 0000000000000003 R09: fffffbfff4fb3df1
      [   59.624635][ T2788] R10: fffffbfff50318f8 R11: ffff888066cadc18 R12: ffffffffa6cc2f40
      [   59.624637][ T2788] R13: 1ffff11009613eba R14: ffff8880662e9328 R15: ffff8880662e9328
      [   59.624639][ T2788] FS:  00007f57d8f7b740(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000
      [   59.624645][ T2788] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   59.625875][ T2780] tipc: Started in single node mode
      [   59.626128][ T2788] CR2: 00007f57d887a8c0 CR3: 000000004b140002 CR4: 00000000000606e0
      [   59.633991][ T2788] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   59.635195][ T2788] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   59.636478][ T2788] Call Trace:
      [   59.637025][ T2788]  tipc_nl_add_bc_link+0x179/0x1470 [tipc]
      [   59.638219][ T2788]  ? lock_downgrade+0x6e0/0x6e0
      [   59.638923][ T2788]  ? __tipc_nl_add_link+0xf90/0xf90 [tipc]
      [   59.639533][ T2788]  ? tipc_nl_node_dump_link+0x318/0xa50 [tipc]
      [   59.640160][ T2788]  ? mutex_lock_io_nested+0x1380/0x1380
      [   59.640746][ T2788]  tipc_nl_node_dump_link+0x4fd/0xa50 [tipc]
      [   59.641356][ T2788]  ? tipc_nl_node_reset_link_stats+0x340/0x340 [tipc]
      [   59.642088][ T2788]  ? __skb_ext_del+0x270/0x270
      [   59.642594][ T2788]  genl_lock_dumpit+0x85/0xb0
      [   59.643050][ T2788]  netlink_dump+0x49c/0xed0
      [   59.643529][ T2788]  ? __netlink_sendskb+0xc0/0xc0
      [   59.644044][ T2788]  ? __netlink_dump_start+0x190/0x800
      [   59.644617][ T2788]  ? __mutex_unlock_slowpath+0xd0/0x670
      [   59.645177][ T2788]  __netlink_dump_start+0x5a0/0x800
      [   59.645692][ T2788]  genl_rcv_msg+0xa75/0xe90
      [   59.646144][ T2788]  ? __lock_acquire+0xdfe/0x3de0
      [   59.646692][ T2788]  ? genl_family_rcv_msg_attrs_parse+0x320/0x320
      [   59.647340][ T2788]  ? genl_lock_dumpit+0xb0/0xb0
      [   59.647821][ T2788]  ? genl_unlock+0x20/0x20
      [   59.648290][ T2788]  ? genl_parallel_done+0xe0/0xe0
      [   59.648787][ T2788]  ? find_held_lock+0x39/0x1d0
      [   59.649276][ T2788]  ? genl_rcv+0x15/0x40
      [   59.649722][ T2788]  ? lock_contended+0xcd0/0xcd0
      [   59.650296][ T2788]  netlink_rcv_skb+0x121/0x350
      [   59.650828][ T2788]  ? genl_family_rcv_msg_attrs_parse+0x320/0x320
      [   59.651491][ T2788]  ? netlink_ack+0x940/0x940
      [   59.651953][ T2788]  ? lock_acquire+0x164/0x3b0
      [   59.652449][ T2788]  genl_rcv+0x24/0x40
      [   59.652841][ T2788]  netlink_unicast+0x421/0x600
      [ ... ]
      
      Fixes: 7e436905 ("tipc: fix a slab object leak")
      Fixes: a62fbcce
      
       ("tipc: make subscriber server support net namespace")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9cf1cd8e
    • Vladyslav Tarasiuk's avatar
      mqprio: Fix out-of-bounds access in mqprio_dump · 9f104c77
      Vladyslav Tarasiuk authored
      When user runs a command like
      tc qdisc add dev eth1 root mqprio
      KASAN stack-out-of-bounds warning is emitted.
      Currently, NLA_ALIGN macro used in mqprio_dump provides too large
      buffer size as argument for nla_put and memcpy down the call stack.
      The flow looks like this:
      1. nla_put expects exact object size as an argument;
      2. Later it provides this size to memcpy;
      3. To calculate correct padding for SKB, nla_put applies NLA_ALIGN
         macro itself.
      
      Therefore, NLA_ALIGN should not be applied to the nla_put parameter.
      Otherwise it will lead to out-of-bounds memory access in memcpy.
      
      Fixes: 4e8b86c0
      
       ("mqprio: Introduce new hardware offload mode and shaper in mqprio")
      Signed-off-by: default avatarVladyslav Tarasiuk <vladyslavt@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9f104c77
    • Jongsung Kim's avatar
      net: stmmac: reset Tx desc base address before restarting Tx · f421031e
      Jongsung Kim authored
      
      
      Refer to the databook of DesignWare Cores Ethernet MAC Universal:
      
      6.2.1.5 Register 4 (Transmit Descriptor List Address Register
      
      If this register is not changed when the ST bit is set to 0, then
      the DMA takes the descriptor address where it was stopped earlier.
      
      The stmmac_tx_err() does zero indices to Tx descriptors, but does
      not reset HW current Tx descriptor address. To fix inconsistency,
      the base address of the Tx descriptors should be rewritten before
      restarting Tx.
      
      Signed-off-by: default avatarJongsung Kim <neidhard.kim@lge.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f421031e
    • Yangbo Lu's avatar
      enetc: disable EEE autoneg by default · a6a10d45
      Yangbo Lu authored
      
      
      The EEE support has not been enabled on ENETC, but it may connect
      to a PHY which supports EEE and advertises EEE by default, while
      its link partner also advertises EEE. If this happens, the PHY enters
      low power mode when the traffic rate is low and causes packet loss.
      This patch disables EEE advertisement by default for any PHY that
      ENETC connects to, to prevent the above unwanted outcome.
      
      Signed-off-by: default avatarYangbo Lu <yangbo.lu@nxp.com>
      Reviewed-by: default avatarClaudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a6a10d45
  3. Dec 06, 2019