Skip to content
  1. Dec 18, 2019
    • Greg Kroah-Hartman's avatar
    • Jonathan Lemon's avatar
      xdp: obtain the mem_id mutex before trying to remove an entry. · 94fc256e
      Jonathan Lemon authored
      
      
      [ Upstream commit 86c76c09 ]
      
      A lockdep splat was observed when trying to remove an xdp memory
      model from the table since the mutex was obtained when trying to
      remove the entry, but not before the table walk started:
      
      Fix the splat by obtaining the lock before starting the table walk.
      
      Fixes: c3f812ce ("page_pool: do not release pool until inflight == 0.")
      Reported-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Tested-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: default avatarIlias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      94fc256e
    • Jonathan Lemon's avatar
      page_pool: do not release pool until inflight == 0. · bf22306d
      Jonathan Lemon authored
      
      
      [ Upstream commit c3f812ce ]
      
      The page pool keeps track of the number of pages in flight, and
      it isn't safe to remove the pool until all pages are returned.
      
      Disallow removing the pool until all pages are back, so the pool
      is always available for page producers.
      
      Make the page pool responsible for its own delayed destruction
      instead of relying on XDP, so the page pool can be used without
      the xdp memory model.
      
      When all pages are returned, free the pool and notify xdp if the
      pool is registered with the xdp memory system.  Have the callback
      perform a table walk since some drivers (cpsw) may share the pool
      among multiple xdp_rxq_info.
      
      Note that the increment of pages_state_release_cnt may result in
      inflight == 0, resulting in the pool being released.
      
      Fixes: d956a048 ("xdp: force mem allocator removal and periodic warning")
      Signed-off-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: default avatarIlias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bf22306d
    • Eran Ben Elisha's avatar
      net/mlx5e: Fix TXQ indices to be sequential · 573da3cf
      Eran Ben Elisha authored
      
      
      [ Upstream commit c55d8b10 ]
      
      Cited patch changed (channel index, tc) => (TXQ index) mapping to be a
      static one, in order to keep indices consistent when changing number of
      channels or TCs.
      
      For 32 channels (OOB) and 8 TCs, real num of TXQs is 256.
      When reducing the amount of channels to 8, the real num of TXQs will be
      changed to 64.
      This indices method is buggy:
      - Channel #0, TC 3, the TXQ index is 96.
      - Index 8 is not valid, as there is no such TXQ from driver perspective
        (As it represents channel #8, TC 0, which is not valid with the above
        configuration).
      
      As part of driver's select queue, it calls netdev_pick_tx which returns an
      index in the range of real number of TXQs. Depends on the return value,
      with the examples above, driver could have returned index larger than the
      real number of tx queues, or crash the kernel as it tries to read invalid
      address of SQ which was not allocated.
      
      Fix that by allocating sequential TXQ indices, and hold a new mapping
      between (channel index, tc) => (real TXQ index). This mapping will be
      updated as part of priv channels activation, and is used in
      mlx5e_select_queue to find the selected queue index.
      
      The existing indices mapping (channel_tc2txq) is no longer needed, as it
      is used only for statistics structures and can be calculated on run time.
      Delete its definintion and updates.
      
      Fixes: 8bfaf07f ("net/mlx5e: Present SW stats when state is not opened")
      Signed-off-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      573da3cf
    • Martin Varghese's avatar
      net: Fixed updating of ethertype in skb_mpls_push() · 280cec44
      Martin Varghese authored
      
      
      [ Upstream commit d04ac224 ]
      
      The skb_mpls_push was not updating ethertype of an ethernet packet if
      the packet was originally received from a non ARPHRD_ETHER device.
      
      In the below OVS data path flow, since the device corresponding to
      port 7 is an l3 device (ARPHRD_NONE) the skb_mpls_push function does
      not update the ethertype of the packet even though the previous
      push_eth action had added an ethernet header to the packet.
      
      recirc_id(0),in_port(7),eth_type(0x0800),ipv4(tos=0/0xfc,ttl=64,frag=no),
      actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
      push_mpls(label=13,tc=0,ttl=64,bos=1,eth_type=0x8847),4
      
      Fixes: 8822e270 ("net: core: move push MPLS functionality from OvS to core helper")
      Signed-off-by: default avatarMartin Varghese <martin.varghese@nokia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      280cec44
    • Taehee Yoo's avatar
      hsr: fix a NULL pointer dereference in hsr_dev_xmit() · 9d59e754
      Taehee Yoo authored
      
      
      [ Upstream commit df95467b ]
      
      hsr_dev_xmit() calls hsr_port_get_hsr() to find master node and that would
      return NULL if master node is not existing in the list.
      But hsr_dev_xmit() doesn't check return pointer so a NULL dereference
      could occur.
      
      Test commands:
          ip netns add nst
          ip link add veth0 type veth peer name veth1
          ip link add veth2 type veth peer name veth3
          ip link set veth1 netns nst
          ip link set veth3 netns nst
          ip link set veth0 up
          ip link set veth2 up
          ip link add hsr0 type hsr slave1 veth0 slave2 veth2
          ip a a 192.168.100.1/24 dev hsr0
          ip link set hsr0 up
          ip netns exec nst ip link set veth1 up
          ip netns exec nst ip link set veth3 up
          ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3
          ip netns exec nst ip a a 192.168.100.2/24 dev hsr1
          ip netns exec nst ip link set hsr1 up
          hping3 192.168.100.2 -2 --flood &
          modprobe -rv hsr
      
      Splat looks like:
      [  217.351122][ T1635] kasan: CONFIG_KASAN_INLINE enabled
      [  217.352969][ T1635] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [  217.354297][ T1635] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  217.355507][ T1635] CPU: 1 PID: 1635 Comm: hping3 Not tainted 5.4.0+ #192
      [  217.356472][ T1635] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  217.357804][ T1635] RIP: 0010:hsr_dev_xmit+0x34/0x90 [hsr]
      [  217.373010][ T1635] Code: 48 8d be 00 0c 00 00 be 04 00 00 00 48 83 ec 08 e8 21 be ff ff 48 8d 78 10 48 ba 00 b
      [  217.376919][ T1635] RSP: 0018:ffff8880cd8af058 EFLAGS: 00010202
      [  217.377571][ T1635] RAX: 0000000000000000 RBX: ffff8880acde6840 RCX: 0000000000000002
      [  217.379465][ T1635] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: 0000000000000010
      [  217.380274][ T1635] RBP: ffff8880acde6840 R08: ffffed101b440d5d R09: 0000000000000001
      [  217.381078][ T1635] R10: 0000000000000001 R11: ffffed101b440d5c R12: ffff8880bffcc000
      [  217.382023][ T1635] R13: ffff8880bffcc088 R14: 0000000000000000 R15: ffff8880ca675c00
      [  217.383094][ T1635] FS:  00007f060d9d1740(0000) GS:ffff8880da000000(0000) knlGS:0000000000000000
      [  217.384289][ T1635] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  217.385009][ T1635] CR2: 00007faf15381dd0 CR3: 00000000d523c001 CR4: 00000000000606e0
      [  217.385940][ T1635] Call Trace:
      [  217.386544][ T1635]  dev_hard_start_xmit+0x160/0x740
      [  217.387114][ T1635]  __dev_queue_xmit+0x1961/0x2e10
      [  217.388118][ T1635]  ? check_object+0xaf/0x260
      [  217.391466][ T1635]  ? __alloc_skb+0xb9/0x500
      [  217.392017][ T1635]  ? init_object+0x6b/0x80
      [  217.392629][ T1635]  ? netdev_core_pick_tx+0x2e0/0x2e0
      [  217.393175][ T1635]  ? __alloc_skb+0xb9/0x500
      [  217.393727][ T1635]  ? rcu_read_lock_sched_held+0x90/0xc0
      [  217.394331][ T1635]  ? rcu_read_lock_bh_held+0xa0/0xa0
      [  217.395013][ T1635]  ? kasan_unpoison_shadow+0x30/0x40
      [  217.395668][ T1635]  ? __kasan_kmalloc.constprop.4+0xa0/0xd0
      [  217.396280][ T1635]  ? __kmalloc_node_track_caller+0x3a8/0x3f0
      [  217.399007][ T1635]  ? __kasan_kmalloc.constprop.4+0xa0/0xd0
      [  217.400093][ T1635]  ? __kmalloc_reserve.isra.46+0x2e/0xb0
      [  217.401118][ T1635]  ? memset+0x1f/0x40
      [  217.402529][ T1635]  ? __alloc_skb+0x317/0x500
      [  217.404915][ T1635]  ? arp_xmit+0xca/0x2c0
      [ ... ]
      
      Fixes: 311633b6 ("hsr: switch ->dellink() to ->ndo_uninit()")
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d59e754
    • Martin Varghese's avatar
      Fixed updating of ethertype in function skb_mpls_pop · 37f523ab
      Martin Varghese authored
      
      
      [ Upstream commit 040b5cfb ]
      
      The skb_mpls_pop was not updating ethertype of an ethernet packet if the
      packet was originally received from a non ARPHRD_ETHER device.
      
      In the below OVS data path flow, since the device corresponding to port 7
      is an l3 device (ARPHRD_NONE) the skb_mpls_pop function does not update
      the ethertype of the packet even though the previous push_eth action had
      added an ethernet header to the packet.
      
      recirc_id(0),in_port(7),eth_type(0x8847),
      mpls(label=12/0xfffff,tc=0/0,ttl=0/0x0,bos=1/1),
      actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),
      pop_mpls(eth_type=0x800),4
      
      Fixes: ed246cee ("net: core: move pop MPLS functionality from OvS to core helper")
      Signed-off-by: default avatarMartin Varghese <martin.varghese@nokia.com>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      37f523ab
    • Cong Wang's avatar
      gre: refetch erspan header from skb->data after pskb_may_pull() · 338ac295
      Cong Wang authored
      
      
      [ Upstream commit 0e494092 ]
      
      After pskb_may_pull() we should always refetch the header
      pointers from the skb->data in case it got reallocated.
      
      In gre_parse_header(), the erspan header is still fetched
      from the 'options' pointer which is fetched before
      pskb_may_pull().
      
      Found this during code review of a KMSAN bug report.
      
      Fixes: cb73ee40 ("net: ip_gre: use erspan key field for tunnel lookup")
      Cc: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Acked-by: default avatarWilliam Tu <u9012063@gmail.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      338ac295
    • Guillaume Nault's avatar
      tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE() · d317809e
      Guillaume Nault authored
      
      
      [ Upstream commit 721c8daf ]
      
      Syncookies borrow the ->rx_opt.ts_recent_stamp field to store the
      timestamp of the last synflood. Protect them with READ_ONCE() and
      WRITE_ONCE() since reads and writes aren't serialised.
      
      Use of .rx_opt.ts_recent_stamp for storing the synflood timestamp was
      introduced by a0f82f64 ("syncookies: remove last_synq_overflow from
      struct tcp_sock"). But unprotected accesses were already there when
      timestamp was stored in .last_synq_overflow.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d317809e
    • Guillaume Nault's avatar
      tcp: tighten acceptance of ACKs not matching a child socket · e0082abd
      Guillaume Nault authored
      
      
      [ Upstream commit cb44a08f ]
      
      When no synflood occurs, the synflood timestamp isn't updated.
      Therefore it can be so old that time_after32() can consider it to be
      in the future.
      
      That's a problem for tcp_synq_no_recent_overflow() as it may report
      that a recent overflow occurred while, in fact, it's just that jiffies
      has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.
      
      Spurious detection of recent overflows lead to extra syncookie
      verification in cookie_v[46]_check(). At that point, the verification
      should fail and the packet dropped. But we should have dropped the
      packet earlier as we didn't even send a syncookie.
      
      Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
      only if jiffies is within the
      [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
      way, no spurious recent overflow is reported when jiffies wraps and
      'last_overflow' becomes in the future from the point of view of
      time_after32().
      
      However, if jiffies wraps and enters the
      [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
      'last_overflow' being a stale synflood timestamp), then
      tcp_synq_no_recent_overflow() still erroneously reports an
      overflow. In such cases, we have to rely on syncookie verification
      to drop the packet. We unfortunately have no way to differentiate
      between a fresh and a stale syncookie timestamp.
      
      In practice, using last_overflow as lower bound is problematic.
      If the synflood timestamp is concurrently updated between the time
      we read jiffies and the moment we store the timestamp in
      'last_overflow', then 'now' becomes smaller than 'last_overflow' and
      tcp_synq_no_recent_overflow() returns true, potentially dropping a
      valid syncookie.
      
      Reading jiffies after loading the timestamp could fix the problem,
      but that'd require a memory barrier. Let's just accommodate for
      potential timestamp growth instead and extend the interval using
      'last_overflow - HZ' as lower bound.
      
      Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e0082abd
    • Guillaume Nault's avatar
      tcp: fix rejected syncookies due to stale timestamps · baeee854
      Guillaume Nault authored
      
      
      [ Upstream commit 04d26e7b ]
      
      If no synflood happens for a long enough period of time, then the
      synflood timestamp isn't refreshed and jiffies can advance so much
      that time_after32() can't accurately compare them any more.
      
      Therefore, we can end up in a situation where time_after32(now,
      last_overflow + HZ) returns false, just because these two values are
      too far apart. In that case, the synflood timestamp isn't updated as
      it should be, which can trick tcp_synq_no_recent_overflow() into
      rejecting valid syncookies.
      
      For example, let's consider the following scenario on a system
      with HZ=1000:
      
        * The synflood timestamp is 0, either because that's the timestamp
          of the last synflood or, more commonly, because we're working with
          a freshly created socket.
      
        * We receive a new SYN, which triggers synflood protection. Let's say
          that this happens when jiffies == 2147484649 (that is,
          'synflood timestamp' + HZ + 2^31 + 1).
      
        * Then tcp_synq_overflow() doesn't update the synflood timestamp,
          because time_after32(2147484649, 1000) returns false.
          With:
            - 2147484649: the value of jiffies, aka. 'now'.
            - 1000: the value of 'last_overflow' + HZ.
      
        * A bit later, we receive the ACK completing the 3WHS. But
          cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow()
          says that we're not under synflood. That's because
          time_after32(2147484649, 120000) returns false.
          With:
            - 2147484649: the value of jiffies, aka. 'now'.
            - 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID.
      
          Of course, in reality jiffies would have increased a bit, but this
          condition will last for the next 119 seconds, which is far enough
          to accommodate for jiffie's growth.
      
      Fix this by updating the overflow timestamp whenever jiffies isn't
      within the [last_overflow, last_overflow + HZ] range. That shouldn't
      have any performance impact since the update still happens at most once
      per second.
      
      Now we're guaranteed to have fresh timestamps while under synflood, so
      tcp_synq_no_recent_overflow() can safely use it with time_after32() in
      such situations.
      
      Stale timestamps can still make tcp_synq_no_recent_overflow() return
      the wrong verdict when not under synflood. This will be handled in the
      next patch.
      
      For 64 bits architectures, the problem was introduced with the
      conversion of ->tw_ts_recent_stamp to 32 bits integer by commit
      cca9bab1 ("tcp: use monotonic timestamps for PAWS").
      The problem has always been there on 32 bits architectures.
      
      Fixes: cca9bab1 ("tcp: use monotonic timestamps for PAWS")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      baeee854
    • Sabrina Dubroca's avatar
      net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup · fe0cacb0
      Sabrina Dubroca authored
      
      
      [ Upstream commit 6c8991f4 ]
      
      ipv6_stub uses the ip6_dst_lookup function to allow other modules to
      perform IPv6 lookups. However, this function skips the XFRM layer
      entirely.
      
      All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the
      ip_route_output_key and ip_route_output helpers) for their IPv4 lookups,
      which calls xfrm_lookup_route(). This patch fixes this inconsistent
      behavior by switching the stub to ip6_dst_lookup_flow, which also calls
      xfrm_lookup_route().
      
      This requires some changes in all the callers, as these two functions
      take different arguments and have different return types.
      
      Fixes: 5f81bd2e ("ipv6: export a stub for IPv6 symbols used by vxlan")
      Reported-by: default avatarXiumei Mu <xmu@redhat.com>
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fe0cacb0
    • Sabrina Dubroca's avatar
      net: ipv6: add net argument to ip6_dst_lookup_flow · c583801c
      Sabrina Dubroca authored
      
      
      [ Upstream commit c4e85f73 ]
      
      This will be used in the conversion of ipv6_stub to ip6_dst_lookup_flow,
      as some modules currently pass a net argument without a socket to
      ip6_dst_lookup. This is equivalent to commit 343d60aa ("ipv6: change
      ipv6_stub_impl.ipv6_dst_lookup to take net argument").
      
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c583801c
    • Huy Nguyen's avatar
      net/mlx5e: Query global pause state before setting prio2buffer · 9ce39e40
      Huy Nguyen authored
      
      
      [ Upstream commit 73e65516 ]
      
      When the user changes prio2buffer mapping while global pause is
      enabled, mlx5 driver incorrectly sets all active buffers
      (buffer that has at least one priority mapped) to lossy.
      
      Solution:
      If global pause is enabled, set all the active buffers to lossless
      in prio2buffer command.
      Also, add error message when buffer size is not enough to meet
      xoff threshold.
      
      Fixes: 0696d608 ("net/mlx5e: Receive buffer configuration")
      Signed-off-by: default avatarHuy Nguyen <huyn@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9ce39e40
    • Taehee Yoo's avatar
      tipc: fix ordering of tipc module init and exit routine · a3873b4c
      Taehee Yoo authored
      
      
      [ Upstream commit 9cf1cd8e ]
      
      In order to set/get/dump, the tipc uses the generic netlink
      infrastructure. So, when tipc module is inserted, init function
      calls genl_register_family().
      After genl_register_family(), set/get/dump commands are immediately
      allowed and these callbacks internally use the net_generic.
      net_generic is allocated by register_pernet_device() but this
      is called after genl_register_family() in the __init function.
      So, these callbacks would use un-initialized net_generic.
      
      Test commands:
          #SHELL1
          while :
          do
              modprobe tipc
              modprobe -rv tipc
          done
      
          #SHELL2
          while :
          do
              tipc link list
          done
      
      Splat looks like:
      [   59.616322][ T2788] kasan: CONFIG_KASAN_INLINE enabled
      [   59.617234][ T2788] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [   59.618398][ T2788] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [   59.619389][ T2788] CPU: 3 PID: 2788 Comm: tipc Not tainted 5.4.0+ #194
      [   59.620231][ T2788] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   59.621428][ T2788] RIP: 0010:tipc_bcast_get_broadcast_mode+0x131/0x310 [tipc]
      [   59.622379][ T2788] Code: c7 c6 ef 8b 38 c0 65 ff 0d 84 83 c9 3f e8 d7 a5 f2 e3 48 8d bb 38 11 00 00 48 b8 00 00 00 00
      [   59.622550][ T2780] NET: Registered protocol family 30
      [   59.624627][ T2788] RSP: 0018:ffff88804b09f578 EFLAGS: 00010202
      [   59.624630][ T2788] RAX: dffffc0000000000 RBX: 0000000000000011 RCX: 000000008bc66907
      [   59.624631][ T2788] RDX: 0000000000000229 RSI: 000000004b3cf4cc RDI: 0000000000001149
      [   59.624633][ T2788] RBP: ffff88804b09f588 R08: 0000000000000003 R09: fffffbfff4fb3df1
      [   59.624635][ T2788] R10: fffffbfff50318f8 R11: ffff888066cadc18 R12: ffffffffa6cc2f40
      [   59.624637][ T2788] R13: 1ffff11009613eba R14: ffff8880662e9328 R15: ffff8880662e9328
      [   59.624639][ T2788] FS:  00007f57d8f7b740(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000
      [   59.624645][ T2788] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   59.625875][ T2780] tipc: Started in single node mode
      [   59.626128][ T2788] CR2: 00007f57d887a8c0 CR3: 000000004b140002 CR4: 00000000000606e0
      [   59.633991][ T2788] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   59.635195][ T2788] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   59.636478][ T2788] Call Trace:
      [   59.637025][ T2788]  tipc_nl_add_bc_link+0x179/0x1470 [tipc]
      [   59.638219][ T2788]  ? lock_downgrade+0x6e0/0x6e0
      [   59.638923][ T2788]  ? __tipc_nl_add_link+0xf90/0xf90 [tipc]
      [   59.639533][ T2788]  ? tipc_nl_node_dump_link+0x318/0xa50 [tipc]
      [   59.640160][ T2788]  ? mutex_lock_io_nested+0x1380/0x1380
      [   59.640746][ T2788]  tipc_nl_node_dump_link+0x4fd/0xa50 [tipc]
      [   59.641356][ T2788]  ? tipc_nl_node_reset_link_stats+0x340/0x340 [tipc]
      [   59.642088][ T2788]  ? __skb_ext_del+0x270/0x270
      [   59.642594][ T2788]  genl_lock_dumpit+0x85/0xb0
      [   59.643050][ T2788]  netlink_dump+0x49c/0xed0
      [   59.643529][ T2788]  ? __netlink_sendskb+0xc0/0xc0
      [   59.644044][ T2788]  ? __netlink_dump_start+0x190/0x800
      [   59.644617][ T2788]  ? __mutex_unlock_slowpath+0xd0/0x670
      [   59.645177][ T2788]  __netlink_dump_start+0x5a0/0x800
      [   59.645692][ T2788]  genl_rcv_msg+0xa75/0xe90
      [   59.646144][ T2788]  ? __lock_acquire+0xdfe/0x3de0
      [   59.646692][ T2788]  ? genl_family_rcv_msg_attrs_parse+0x320/0x320
      [   59.647340][ T2788]  ? genl_lock_dumpit+0xb0/0xb0
      [   59.647821][ T2788]  ? genl_unlock+0x20/0x20
      [   59.648290][ T2788]  ? genl_parallel_done+0xe0/0xe0
      [   59.648787][ T2788]  ? find_held_lock+0x39/0x1d0
      [   59.649276][ T2788]  ? genl_rcv+0x15/0x40
      [   59.649722][ T2788]  ? lock_contended+0xcd0/0xcd0
      [   59.650296][ T2788]  netlink_rcv_skb+0x121/0x350
      [   59.650828][ T2788]  ? genl_family_rcv_msg_attrs_parse+0x320/0x320
      [   59.651491][ T2788]  ? netlink_ack+0x940/0x940
      [   59.651953][ T2788]  ? lock_acquire+0x164/0x3b0
      [   59.652449][ T2788]  genl_rcv+0x24/0x40
      [   59.652841][ T2788]  netlink_unicast+0x421/0x600
      [ ... ]
      
      Fixes: 7e436905 ("tipc: fix a slab object leak")
      Fixes: a62fbcce ("tipc: make subscriber server support net namespace")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a3873b4c
    • Eric Dumazet's avatar
      tcp: md5: fix potential overestimation of TCP option space · 13572323
      Eric Dumazet authored
      
      
      [ Upstream commit 9424e2e7 ]
      
      Back in 2008, Adam Langley fixed the corner case of packets for flows
      having all of the following options : MD5 TS SACK
      
      Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block
      can be cooked from the remaining 8 bytes.
      
      tcp_established_options() correctly sets opts->num_sack_blocks
      to zero, but returns 36 instead of 32.
      
      This means TCP cooks packets with 4 extra bytes at the end
      of options, containing unitialized bytes.
      
      Fixes: 33ad798c ("tcp: options clean up")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      13572323
    • Aaron Conole's avatar
      openvswitch: support asymmetric conntrack · 9d47e135
      Aaron Conole authored
      
      
      [ Upstream commit 5d50aa83 ]
      
      The openvswitch module shares a common conntrack and NAT infrastructure
      exposed via netfilter.  It's possible that a packet needs both SNAT and
      DNAT manipulation, due to e.g. tuple collision.  Netfilter can support
      this because it runs through the NAT table twice - once on ingress and
      again after egress.  The openvswitch module doesn't have such capability.
      
      Like netfilter hook infrastructure, we should run through NAT twice to
      keep the symmetry.
      
      Fixes: 05752523 ("openvswitch: Interface with NAT.")
      Signed-off-by: default avatarAaron Conole <aconole@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d47e135
    • Valentin Vidic's avatar
      net/tls: Fix return values to avoid ENOTSUPP · 284504c2
      Valentin Vidic authored
      
      
      [ Upstream commit 4a5cdc60 ]
      
      ENOTSUPP is not available in userspace, for example:
      
        setsockopt failed, 524, Unknown error 524
      
      Signed-off-by: default avatarValentin Vidic <vvidic@valentin-vidic.from.hr>
      Acked-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      284504c2
    • Mian Yousaf Kaukab's avatar
      net: thunderx: start phy before starting autonegotiation · 01e73a8c
      Mian Yousaf Kaukab authored
      
      
      [ Upstream commit a350d2e7 ]
      
      Since commit 2b3e88ea ("net: phy: improve phy state checking")
      phy_start_aneg() expects phy state to be >= PHY_UP. Call phy_start()
      before calling phy_start_aneg() during probe so that autonegotiation
      is initiated.
      
      As phy_start() takes care of calling phy_start_aneg(), drop the explicit
      call to phy_start_aneg().
      
      Network fails without this patch on Octeon TX.
      
      Fixes: 2b3e88ea ("net: phy: improve phy state checking")
      Signed-off-by: default avatarMian Yousaf Kaukab <ykaukab@suse.de>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      01e73a8c
    • Dust Li's avatar
      net: sched: fix dump qlen for sch_mq/sch_mqprio with NOLOCK subqueues · d38ae226
      Dust Li authored
      
      
      [ Upstream commit 2f23cd42 ]
      
      sch->q.len hasn't been set if the subqueue is a NOLOCK qdisc
       in mq_dump() and mqprio_dump().
      
      Fixes: ce679e8d ("net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio")
      Signed-off-by: default avatarDust Li <dust.li@linux.alibaba.com>
      Signed-off-by: default avatarTony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d38ae226
    • Grygorii Strashko's avatar
      net: ethernet: ti: cpsw: fix extra rx interrupt · a2298468
      Grygorii Strashko authored
      
      
      [ Upstream commit 51302f77 ]
      
      Now RX interrupt is triggered twice every time, because in
      cpsw_rx_interrupt() it is asked first and then disabled. So there will be
      pending interrupt always, when RX interrupt is enabled again in NAPI
      handler.
      
      Fix it by first disabling IRQ and then do ask.
      
      Fixes: 870915fe ("drivers: net: cpsw: remove disable_irq/enable_irq as irq can be masked from cpsw itself")
      Signed-off-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a2298468
    • Alexander Lobakin's avatar
      net: dsa: fix flow dissection on Tx path · 85578e67
      Alexander Lobakin authored
      
      
      [ Upstream commit 8bef0af0 ]
      
      Commit 43e66528 ("net-next: dsa: fix flow dissection") added an
      ability to override protocol and network offset during flow dissection
      for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
      in order to fix skb hashing for RPS on Rx path.
      
      However, skb_hash() and added part of code can be invoked not only on
      Rx, but also on Tx path if we have a multi-queued device and:
       - kernel is running on UP system or
       - XPS is not configured.
      
      The call stack in this two cases will be like: dev_queue_xmit() ->
      __dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
      skb_tx_hash() -> skb_get_hash().
      
      The problem is that skbs queued for Tx have both network offset and
      correct protocol already set up even after inserting a CPU tag by DSA
      tagger, so calling tag_ops->flow_dissect() on this path actually only
      breaks flow dissection and hashing.
      
      This can be observed by adding debug prints just before and right after
      tag_ops->flow_dissect() call to the related block of code:
      
      Before the patch:
      
      Rx path (RPS):
      
      [   19.240001] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   19.244271] tag_ops->flow_dissect()
      [   19.247811] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */
      
      [   19.215435] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   19.219746] tag_ops->flow_dissect()
      [   19.223241] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */
      
      [   18.654057] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   18.658332] tag_ops->flow_dissect()
      [   18.661826] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */
      
      Tx path (UP system):
      
      [   18.759560] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */
      [   18.763933] tag_ops->flow_dissect()
      [   18.767485] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      [   22.800020] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */
      [   22.804392] tag_ops->flow_dissect()
      [   22.807921] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      [   16.898342] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */
      [   16.902705] tag_ops->flow_dissect()
      [   16.906227] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      After:
      
      Rx path (RPS):
      
      [   16.520993] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   16.525260] tag_ops->flow_dissect()
      [   16.528808] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */
      
      [   15.484807] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   15.490417] tag_ops->flow_dissect()
      [   15.495223] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */
      
      [   17.134621] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   17.138895] tag_ops->flow_dissect()
      [   17.142388] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */
      
      Tx path (UP system):
      
      [   15.499558] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */
      
      [   20.664689] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */
      
      [   18.565782] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */
      
      In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)'
      to prevent code from calling tag_ops->flow_dissect() on Tx.
      I also decided to initialize 'offset' variable so tagger callbacks can
      now safely leave it untouched without provoking a chaos.
      
      Fixes: 43e66528 ("net-next: dsa: fix flow dissection")
      Signed-off-by: default avatarAlexander Lobakin <alobakin@dlink.ru>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      85578e67
    • Nikolay Aleksandrov's avatar
      net: bridge: deny dev_set_mac_address() when unregistering · ceb2b651
      Nikolay Aleksandrov authored
      
      
      [ Upstream commit c4b4c421 ]
      
      We have an interesting memory leak in the bridge when it is being
      unregistered and is a slave to a master device which would change the
      mac of its slaves on unregister (e.g. bond, team). This is a very
      unusual setup but we do end up leaking 1 fdb entry because
      dev_set_mac_address() would cause the bridge to insert the new mac address
      into its table after all fdbs are flushed, i.e. after dellink() on the
      bridge has finished and we call NETDEV_UNREGISTER the bond/team would
      release it and will call dev_set_mac_address() to restore its original
      address and that in turn will add an fdb in the bridge.
      One fix is to check for the bridge dev's reg_state in its
      ndo_set_mac_address callback and return an error if the bridge is not in
      NETREG_REGISTERED.
      
      Easy steps to reproduce:
       1. add bond in mode != A/B
       2. add any slave to the bond
       3. add bridge dev as a slave to the bond
       4. destroy the bridge device
      
      Trace:
       unreferenced object 0xffff888035c4d080 (size 128):
         comm "ip", pid 4068, jiffies 4296209429 (age 1413.753s)
         hex dump (first 32 bytes):
           41 1d c9 36 80 88 ff ff 00 00 00 00 00 00 00 00  A..6............
           d2 19 c9 5e 3f d7 00 00 00 00 00 00 00 00 00 00  ...^?...........
         backtrace:
           [<00000000ddb525dc>] kmem_cache_alloc+0x155/0x26f
           [<00000000633ff1e0>] fdb_create+0x21/0x486 [bridge]
           [<0000000092b17e9c>] fdb_insert+0x91/0xdc [bridge]
           [<00000000f2a0f0ff>] br_fdb_change_mac_address+0xb3/0x175 [bridge]
           [<000000001de02dbd>] br_stp_change_bridge_id+0xf/0xff [bridge]
           [<00000000ac0e32b1>] br_set_mac_address+0x76/0x99 [bridge]
           [<000000006846a77f>] dev_set_mac_address+0x63/0x9b
           [<00000000d30738fc>] __bond_release_one+0x3f6/0x455 [bonding]
           [<00000000fc7ec01d>] bond_netdev_event+0x2f2/0x400 [bonding]
           [<00000000305d7795>] notifier_call_chain+0x38/0x56
           [<0000000028885d4a>] call_netdevice_notifiers+0x1e/0x23
           [<000000008279477b>] rollback_registered_many+0x353/0x6a4
           [<0000000018ef753a>] unregister_netdevice_many+0x17/0x6f
           [<00000000ba854b7a>] rtnl_delete_link+0x3c/0x43
           [<00000000adf8618d>] rtnl_dellink+0x1dc/0x20a
           [<000000009b6395fd>] rtnetlink_rcv_msg+0x23d/0x268
      
      Fixes: 43598813 ("bridge: add local MAC address to forwarding table (v2)")
      Reported-by: default avatar <syzbot+2add91c08eb181fea1bf@syzkaller.appspotmail.com>
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ceb2b651
    • Vladyslav Tarasiuk's avatar
      mqprio: Fix out-of-bounds access in mqprio_dump · 856cf5d1
      Vladyslav Tarasiuk authored
      
      
      [ Upstream commit 9f104c77 ]
      
      When user runs a command like
      tc qdisc add dev eth1 root mqprio
      KASAN stack-out-of-bounds warning is emitted.
      Currently, NLA_ALIGN macro used in mqprio_dump provides too large
      buffer size as argument for nla_put and memcpy down the call stack.
      The flow looks like this:
      1. nla_put expects exact object size as an argument;
      2. Later it provides this size to memcpy;
      3. To calculate correct padding for SKB, nla_put applies NLA_ALIGN
         macro itself.
      
      Therefore, NLA_ALIGN should not be applied to the nla_put parameter.
      Otherwise it will lead to out-of-bounds memory access in memcpy.
      
      Fixes: 4e8b86c0 ("mqprio: Introduce new hardware offload mode and shaper in mqprio")
      Signed-off-by: default avatarVladyslav Tarasiuk <vladyslavt@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      856cf5d1
    • Eric Dumazet's avatar
      inet: protect against too small mtu values. · 5945be95
      Eric Dumazet authored
      
      
      [ Upstream commit 501a90c9 ]
      
      syzbot was once again able to crash a host by setting a very small mtu
      on loopback device.
      
      Let's make inetdev_valid_mtu() available in include/net/ip.h,
      and use it in ip_setup_cork(), so that we protect both ip_append_page()
      and __ip_append_data()
      
      Also add a READ_ONCE() when the device mtu is read.
      
      Pairs this lockless read with one WRITE_ONCE() in __dev_set_mtu(),
      even if other code paths might write over this field.
      
      Add a big comment in include/linux/netdevice.h about dev->mtu
      needing READ_ONCE()/WRITE_ONCE() annotations.
      
      Hopefully we will add the missing ones in followup patches.
      
      [1]
      
      refcount_t: saturated; leaking memory.
      WARNING: CPU: 0 PID: 9464 at lib/refcount.c:22 refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 9464 Comm: syz-executor850 Not tainted 5.4.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x197/0x210 lib/dump_stack.c:118
       panic+0x2e3/0x75c kernel/panic.c:221
       __warn.cold+0x2f/0x3e kernel/panic.c:582
       report_bug+0x289/0x300 lib/bug.c:195
       fixup_bug arch/x86/kernel/traps.c:174 [inline]
       fixup_bug arch/x86/kernel/traps.c:169 [inline]
       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
       do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
       invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
      RIP: 0010:refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
      Code: 06 31 ff 89 de e8 c8 f5 e6 fd 84 db 0f 85 6f ff ff ff e8 7b f4 e6 fd 48 c7 c7 e0 71 4f 88 c6 05 56 a6 a4 06 01 e8 c7 a8 b7 fd <0f> 0b e9 50 ff ff ff e8 5c f4 e6 fd 0f b6 1d 3d a6 a4 06 31 ff 89
      RSP: 0018:ffff88809689f550 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff815e4336 RDI: ffffed1012d13e9c
      RBP: ffff88809689f560 R08: ffff88809c50a3c0 R09: fffffbfff15d31b1
      R10: fffffbfff15d31b0 R11: ffffffff8ae98d87 R12: 0000000000000001
      R13: 0000000000040100 R14: ffff888099041104 R15: ffff888218d96e40
       refcount_add include/linux/refcount.h:193 [inline]
       skb_set_owner_w+0x2b6/0x410 net/core/sock.c:1999
       sock_wmalloc+0xf1/0x120 net/core/sock.c:2096
       ip_append_page+0x7ef/0x1190 net/ipv4/ip_output.c:1383
       udp_sendpage+0x1c7/0x480 net/ipv4/udp.c:1276
       inet_sendpage+0xdb/0x150 net/ipv4/af_inet.c:821
       kernel_sendpage+0x92/0xf0 net/socket.c:3794
       sock_sendpage+0x8b/0xc0 net/socket.c:936
       pipe_to_sendpage+0x2da/0x3c0 fs/splice.c:458
       splice_from_pipe_feed fs/splice.c:512 [inline]
       __splice_from_pipe+0x3ee/0x7c0 fs/splice.c:636
       splice_from_pipe+0x108/0x170 fs/splice.c:671
       generic_splice_sendpage+0x3c/0x50 fs/splice.c:842
       do_splice_from fs/splice.c:861 [inline]
       direct_splice_actor+0x123/0x190 fs/splice.c:1035
       splice_direct_to_actor+0x3b4/0xa30 fs/splice.c:990
       do_splice_direct+0x1da/0x2a0 fs/splice.c:1078
       do_sendfile+0x597/0xd00 fs/read_write.c:1464
       __do_sys_sendfile64 fs/read_write.c:1525 [inline]
       __se_sys_sendfile64 fs/read_write.c:1511 [inline]
       __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x441409
      Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fffb64c4f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441409
      RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000005
      RBP: 0000000000073b8a R08: 0000000000000010 R09: 0000000000000010
      R10: 0000000000010001 R11: 0000000000000246 R12: 0000000000402180
      R13: 0000000000402210 R14: 0000000000000000 R15: 0000000000000000
      Kernel Offset: disabled
      Rebooting in 86400 seconds..
      
      Fixes: 1470ddf7 ("inet: Remove explicit write references to sk/inet in ip_append_data")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5945be95
    • Greg Kroah-Hartman's avatar
    • Robert Richter's avatar
      EDAC/ghes: Do not warn when incrementing refcount on 0 · aadb4f25
      Robert Richter authored
      
      
      [ Upstream commit 16214bd9 ]
      
      The following warning from the refcount framework is seen during ghes
      initialization:
      
        EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
        ------------[ cut here ]------------
        refcount_t: increment on 0; use-after-free.
        WARNING: CPU: 36 PID: 1 at lib/refcount.c:156 refcount_inc_checked
       [...]
        Call trace:
         refcount_inc_checked
         ghes_edac_register
         ghes_probe
         ...
      
      It warns if the refcount is incremented from zero. This warning is
      reasonable as a kernel object is typically created with a refcount of
      one and freed once the refcount is zero. Afterwards the object would be
      "used-after-free".
      
      For GHES, the refcount is initialized with zero, and that is why this
      message is seen when initializing the first instance. However, whenever
      the refcount is zero, the device will be allocated and registered. Since
      the ghes_reg_mutex protects the refcount and serializes allocation and
      freeing of ghes devices, a use-after-free cannot happen here.
      
      Instead of using refcount_inc() for the first instance, use
      refcount_set(). This can be used here because the refcount is zero at
      this point and can not change due to its protection by the mutex.
      
      Fixes: 23f61b9f ("EDAC/ghes: Fix locking and memory barrier issues")
      Reported-by: default avatarJohn Garry <john.garry@huawei.com>
      Signed-off-by: default avatarRobert Richter <rrichter@marvell.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Tested-by: default avatarJohn Garry <john.garry@huawei.com>
      Cc: <huangming23@huawei.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: <linuxarm@huawei.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: <tanxiaofei@huawei.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: <wanghuiqiang@huawei.com>
      Link: https://lkml.kernel.org/r/20191121213628.21244-1-rrichter@marvell.com
      
      
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      aadb4f25
    • Andreas Gruenbacher's avatar
      block: fix "check bi_size overflow before merge" · 9d2710fe
      Andreas Gruenbacher authored
      
      
      [ Upstream commit cc90bc68 ]
      
      This partially reverts commit e3a5d8e3.
      
      Commit e3a5d8e3 ("check bi_size overflow before merge") adds a bio_full
      check to __bio_try_merge_page.  This will cause __bio_try_merge_page to fail
      when the last bi_io_vec has been reached.  Instead, what we want here is only
      the bi_size overflow check.
      
      Fixes: e3a5d8e3 ("block: check bi_size overflow before merge")
      Cc: stable@vger.kernel.org # v5.4+
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9d2710fe
    • Andre Przywara's avatar
      arm64: dts: allwinner: a64: Re-add PMU node · d0400f59
      Andre Przywara authored
      
      
      [ Upstream commit 6b832a14 ]
      
      As it was found recently, the Performance Monitoring Unit (PMU) on the
      Allwinner A64 SoC was not generating (the right) interrupts. With the
      SPI numbers from the manual the kernel did not receive any overflow
      interrupts, so perf was not happy at all.
      It turns out that the numbers were just off by 4, so the PMU interrupts
      are from 148 to 151, not from 152 to 155 as the manual describes.
      
      This was found by playing around with U-Boot, which typically does not
      use interrupts, so the GIC is fully available for experimentation:
      With *every* PPI and SPI enabled, an overflowing PMU cycle counter was
      found to set a bit in one of the GICD_ISPENDR registers, with careful
      counting this was determined to be number 148.
      
      Tested with perf record and perf top on a Pine64-LTS. Also tested with
      tasksetting to every core to confirm the assignment between IRQs and
      cores.
      
      This somewhat "revert-fixes" commit ed3e9406 ("arm64: dts: allwinner:
      a64: Drop PMU node").
      
      Fixes: 34a97fcc ("arm64: dts: allwinner: a64: Add PMU node")
      Fixes: ed3e9406 ("arm64: dts: allwinner: a64: Drop PMU node")
      Signed-off-by: default avatarAndre Przywara <andre.przywara@arm.com>
      Signed-off-by: default avatarMaxime Ripard <maxime@cerno.tech>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d0400f59
    • Eric Dumazet's avatar
      net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add() · 5a93c71a
      Eric Dumazet authored
      
      
      [ Upstream commit 2dd5616e ]
      
      Use the new tcf_proto_check_kind() helper to make sure user
      provided value is well formed.
      
      BUG: KMSAN: uninit-value in string_nocheck lib/vsprintf.c:606 [inline]
      BUG: KMSAN: uninit-value in string+0x4be/0x600 lib/vsprintf.c:668
      CPU: 0 PID: 12358 Comm: syz-executor.1 Not tainted 5.4.0-rc8-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x220 lib/dump_stack.c:118
       kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108
       __msan_warning+0x64/0xc0 mm/kmsan/kmsan_instr.c:245
       string_nocheck lib/vsprintf.c:606 [inline]
       string+0x4be/0x600 lib/vsprintf.c:668
       vsnprintf+0x218f/0x3210 lib/vsprintf.c:2510
       __request_module+0x2b1/0x11c0 kernel/kmod.c:143
       tcf_proto_lookup_ops+0x171/0x700 net/sched/cls_api.c:139
       tc_chain_tmplt_add net/sched/cls_api.c:2730 [inline]
       tc_ctl_chain+0x1904/0x38a0 net/sched/cls_api.c:2850
       rtnetlink_rcv_msg+0x115a/0x1580 net/core/rtnetlink.c:5224
       netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
       rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5242
       netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
       netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1328
       netlink_sendmsg+0x110f/0x1330 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg net/socket.c:657 [inline]
       ___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311
       __sys_sendmsg net/socket.c:2356 [inline]
       __do_sys_sendmsg net/socket.c:2365 [inline]
       __se_sys_sendmsg+0x305/0x460 net/socket.c:2363
       __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2363
       do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45a649
      Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f0790795c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 000000000045a649
      RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000006
      RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f07907966d4
      R13: 00000000004c8db5 R14: 00000000004df630 R15: 00000000ffffffff
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline]
       kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132
       kmsan_slab_alloc+0x97/0x100 mm/kmsan/kmsan_hooks.c:86
       slab_alloc_node mm/slub.c:2773 [inline]
       __kmalloc_node_track_caller+0xe27/0x11a0 mm/slub.c:4381
       __kmalloc_reserve net/core/skbuff.c:141 [inline]
       __alloc_skb+0x306/0xa10 net/core/skbuff.c:209
       alloc_skb include/linux/skbuff.h:1049 [inline]
       netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline]
       netlink_sendmsg+0x783/0x1330 net/netlink/af_netlink.c:1892
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg net/socket.c:657 [inline]
       ___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311
       __sys_sendmsg net/socket.c:2356 [inline]
       __do_sys_sendmsg net/socket.c:2365 [inline]
       __se_sys_sendmsg+0x305/0x460 net/socket.c:2363
       __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2363
       do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 6f96c3c6 ("net_sched: fix backward compatibility for TCA_KIND")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5a93c71a
    • Chuck Lever's avatar
      SUNRPC: Fix another issue with MIC buffer space · af075ec9
      Chuck Lever authored
      
      
      [ Upstream commit e8d70b32 ]
      
      xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len.
      This can happen when xdr_buf_read_mic() is given an xdr_buf with
      a small page array (like, only a few bytes).
      
      Instead, just cap the number of bytes that xdr_shrink_pagelen()
      will move.
      
      Fixes: 5f1bc399 ("SUNRPC: Fix buffer handling of GSS MIC ... ")
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Reviewed-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      af075ec9
    • Roman Bolshakov's avatar
      scsi: qla2xxx: Change discovery state before PLOGI · 11c3ef90
      Roman Bolshakov authored
      [ Upstream commit 58e39a2c ]
      
      When a port sends PLOGI, discovery state should be changed to login
      pending, otherwise RELOGIN_NEEDED bit is set in
      qla24xx_handle_plogi_done_event(). RELOGIN_NEEDED triggers another PLOGI,
      and it never goes out of the loop until login timer expires.
      
      Fixes: 8777e431 ("scsi: qla2xxx: Migrate NVME N2N handling into state machine")
      Fixes: 8b5292bc ("scsi: qla2xxx: Fix Relogin to prevent modifying scan_state flag")
      Cc: Quinn Tran <qutran@marvell.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20191125165702.1013-6-r.bolshakov@yadro.com
      
      
      Acked-by: default avatarHimanshu Madhani <hmadhani@marvell.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Tested-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarRoman Bolshakov <r.bolshakov@yadro.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      11c3ef90
    • Guoqing Jiang's avatar
      raid5: need to set STRIPE_HANDLE for batch head · 51952b3c
      Guoqing Jiang authored
      
      
      [ Upstream commit a7ede3d1 ]
      
      With commit 6ce220dd ("raid5: don't set
      STRIPE_HANDLE to stripe which is in batch list"), we don't want to set
      STRIPE_HANDLE flag for sh which is already in batch list.
      
      However, the stripe which is the head of batch list should set this flag,
      otherwise panic could happen inside init_stripe at BUG_ON(sh->batch_head),
      it is reproducible with raid5 on top of nvdimm devices per Xiao oberserved.
      
      Thanks for Xiao's effort to verify the change.
      
      Fixes: 6ce220dd ("raid5: don't set STRIPE_HANDLE to stripe which is in batch list")
      Reported-by: default avatarXiao Ni <xni@redhat.com>
      Tested-by: default avatarXiao Ni <xni@redhat.com>
      Signed-off-by: default avatarGuoqing Jiang <guoqing.jiang@cloud.ionos.com>
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      51952b3c
    • Tejun Heo's avatar
      workqueue: Fix missing kfree(rescuer) in destroy_workqueue() · 9b025e3f
      Tejun Heo authored
      
      
      commit 8efe1223 upstream.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarQian Cai <cai@lca.pw>
      Fixes: def98c84 ("workqueue: Fix spurious sanity check failures in destroy_workqueue()")
      Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9b025e3f
    • Ming Lei's avatar
      blk-mq: make sure that line break can be printed · e484f554
      Ming Lei authored
      
      
      commit d2c9be89 upstream.
      
      8962842c ("blk-mq: avoid sysfs buffer overflow with too many CPU cores")
      avoids sysfs buffer overflow, and reserves one character for line break.
      However, the last snprintf() doesn't get correct 'size' parameter passed
      in, so fixed it.
      
      Fixes: 8962842c ("blk-mq: avoid sysfs buffer overflow with too many CPU cores")
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e484f554
    • yangerkun's avatar
      ext4: fix a bug in ext4_wait_for_tail_page_commit · e880a844
      yangerkun authored
      
      
      commit 565333a1 upstream.
      
      No need to wait for any commit once the page is fully truncated.
      Besides, it may confuse e.g. concurrent ext4_writepage() with the page
      still be dirty (will be cleared by truncate_pagecache() in
      ext4_setattr()) but buffers has been freed; and then trigger a bug
      show as below:
      
      [   26.057508] ------------[ cut here ]------------
      [   26.058531] kernel BUG at fs/ext4/inode.c:2134!
      ...
      [   26.088130] Call trace:
      [   26.088695]  ext4_writepage+0x914/0xb28
      [   26.089541]  writeout.isra.4+0x1b4/0x2b8
      [   26.090409]  move_to_new_page+0x3b0/0x568
      [   26.091338]  __unmap_and_move+0x648/0x988
      [   26.092241]  unmap_and_move+0x48c/0xbb8
      [   26.093096]  migrate_pages+0x220/0xb28
      [   26.093945]  kernel_mbind+0x828/0xa18
      [   26.094791]  __arm64_sys_mbind+0xc8/0x138
      [   26.095716]  el0_svc_common+0x190/0x490
      [   26.096571]  el0_svc_handler+0x60/0xd0
      [   26.097423]  el0_svc+0x8/0xc
      
      Run the procedure (generate by syzkaller) parallel with ext3.
      
      void main()
      {
      	int fd, fd1, ret;
      	void *addr;
      	size_t length = 4096;
      	int flags;
      	off_t offset = 0;
      	char *str = "12345";
      
      	fd = open("a", O_RDWR | O_CREAT);
      	assert(fd >= 0);
      
      	/* Truncate to 4k */
      	ret = ftruncate(fd, length);
      	assert(ret == 0);
      
      	/* Journal data mode */
      	flags = 0xc00f;
      	ret = ioctl(fd, _IOW('f', 2, long), &flags);
      	assert(ret == 0);
      
      	/* Truncate to 0 */
      	fd1 = open("a", O_TRUNC | O_NOATIME);
      	assert(fd1 >= 0);
      
      	addr = mmap(NULL, length, PROT_WRITE | PROT_READ,
      					MAP_SHARED, fd, offset);
      	assert(addr != (void *)-1);
      
      	memcpy(addr, str, 5);
      	mbind(addr, length, 0, 0, 0, MPOL_MF_MOVE);
      }
      
      And the bug will be triggered once we seen the below order.
      
      reproduce1                         reproduce2
      
      ...                            |   ...
      truncate to 4k                 |
      change to journal data mode    |
                                     |   memcpy(set page dirty)
      truncate to 0:                 |
      ext4_setattr:                  |
      ...                            |
      ext4_wait_for_tail_page_commit |
                                     |   mbind(trigger bug)
      truncate_pagecache(clean dirty)|   ...
      ...                            |
      
      mbind will call ext4_writepage() since the page still be dirty, and then
      report the bug since the buffers has been free. Fix it by return
      directly once offset equals to 0 which means the page has been fully
      truncated.
      
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avataryangerkun <yangerkun@huawei.com>
      Link: https://lore.kernel.org/r/20190919063508.1045-1-yangerkun@huawei.com
      
      
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e880a844
    • Darrick J. Wong's avatar
      splice: only read in as much information as there is pipe buffer space · acdd9804
      Darrick J. Wong authored
      
      
      commit 3253d9d0 upstream.
      
      Andreas Grünbacher reports that on the two filesystems that support
      iomap directio, it's possible for splice() to return -EAGAIN (instead of
      a short splice) if the pipe being written to has less space available in
      its pipe buffers than the length supplied by the calling process.
      
      Months ago we fixed splice_direct_to_actor to clamp the length of the
      read request to the size of the splice pipe.  Do the same to do_splice.
      
      Fixes: 17614445 ("splice: don't read more than available pipe space")
      Reported-by: default avatar <syzbot+3c01db6025f26530cf8d@syzkaller.appspotmail.com>
      Reported-by: default avatarAndreas Grünbacher <andreas.gruenbacher@gmail.com>
      Reviewed-by: default avatarAndreas Grünbacher <andreas.gruenbacher@gmail.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      acdd9804
    • Alexandre Belloni's avatar
      rtc: disable uie before setting time and enable after · ec968952
      Alexandre Belloni authored
      
      
      commit 7e7c005b upstream.
      
      When setting the time in the future with the uie timer enabled,
      rtc_timer_do_work will loop for a while because the expiration of the uie
      timer was way before the current RTC time and a new timer will be enqueued
      until the current rtc time is reached.
      
      If the uie timer is enabled, disable it before setting the time and enable
      it after expiring current timers (which may actually be an alarm).
      
      This is the safest thing to do to ensure the uie timer is still
      synchronized with the RTC, especially in the UIE emulation case.
      
      Reported-by: default avatar <syzbot+08116743f8ad6f9a6de7@syzkaller.appspotmail.com>
      Fixes: 6610e089 ("RTC: Rework RTC code to use timerqueue for events")
      Link: https://lore.kernel.org/r/20191020231320.8191-1-alexandre.belloni@bootlin.com
      
      
      Signed-off-by: default avatarAlexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ec968952
    • Chen Jun's avatar
      mm/shmem.c: cast the type of unmap_start to u64 · 3b353b2b
      Chen Jun authored
      commit aa71ecd8 upstream.
      
      In 64bit system. sb->s_maxbytes of shmem filesystem is MAX_LFS_FILESIZE,
      which equal LLONG_MAX.
      
      If offset > LLONG_MAX - PAGE_SIZE, offset + len < LLONG_MAX in
      shmem_fallocate, which will pass the checking in vfs_fallocate.
      
      	/* Check for wrap through zero too */
      	if (((offset + len) > inode->i_sb->s_maxbytes) || ((offset + len) < 0))
      		return -EFBIG;
      
      loff_t unmap_start = round_up(offset, PAGE_SIZE) in shmem_fallocate
      causes a overflow.
      
      Syzkaller reports a overflow problem in mm/shmem:
      
        UBSAN: Undefined behaviour in mm/shmem.c:2014:10
        signed integer overflow: '9223372036854775807 + 1' cannot be represented in type 'long long int'
        CPU: 0 PID:17076 Comm: syz-executor0 Not tainted 4.1.46+ #1
        Hardware name: linux, dummy-virt (DT)
        Call trace:
           dump_backtrace+0x0/0x2c8 arch/arm64/kernel/traps.c:100
           show_stack+0x20/0x30 arch/arm64/kernel/traps.c:238
           __dump_stack lib/dump_stack.c:15 [inline]
           ubsan_epilogue+0x18/0x70 lib/ubsan.c:164
           handle_overflow+0x158/0x1b0 lib/ubsan.c:195
           shmem_fallocate+0x6d0/0x820 mm/shmem.c:2104
           vfs_fallocate+0x238/0x428 fs/open.c:312
           SYSC_fallocate fs/open.c:335 [inline]
           SyS_fallocate+0x54/0xc8 fs/open.c:239
      
      The highest bit of unmap_start will be appended with sign bit 1
      (overflow) when calculate shmem_falloc.start:
      
          shmem_falloc.start = unmap_start >> PAGE_SHIFT.
      
      Fix it by casting the type of unmap_start to u64, when right shifted.
      
      This bug is found in LTS Linux 4.1.  It also seems to exist in mainline.
      
      Link: http://lkml.kernel.org/r/1573867464-5107-1-git-send-email-chenjun102@huawei.com
      
      
      Signed-off-by: default avatarChen Jun <chenjun102@huawei.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b353b2b
    • Gerald Schaefer's avatar
      s390/kaslr: store KASLR offset for early dumps · 58aa93a2
      Gerald Schaefer authored
      
      
      commit a9f2f686 upstream.
      
      The KASLR offset is added to vmcoreinfo in arch_crash_save_vmcoreinfo(),
      so that it can be found by crash when processing kernel dumps.
      
      However, arch_crash_save_vmcoreinfo() is called during a subsys_initcall,
      so if the kernel crashes before that, we have no vmcoreinfo and no KASLR
      offset.
      
      Fix this by storing the KASLR offset in the lowcore, where the vmcore_info
      pointer will be stored, and where it can be found by crash. In order to
      make it distinguishable from a real vmcore_info pointer, mark it as uneven
      (KASLR offset itself is aligned to THREAD_SIZE).
      
      When arch_crash_save_vmcoreinfo() stores the real vmcore_info pointer in
      the lowcore, it overwrites the KASLR offset. At that point, the KASLR
      offset is not yet added to vmcoreinfo, so we also need to move the
      mem_assign_absolute() behind the vmcoreinfo_append_str().
      
      Fixes: b2d24b97 ("s390/kernel: add support for kernel address space layout randomization (KASLR)")
      Cc: <stable@vger.kernel.org> # v5.2+
      Signed-off-by: default avatarGerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: default avatarVasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      58aa93a2
Loading