  1. Aug 09, 2019
    • net: sched: Fix a possible null-pointer dereference in dequeue_func() · d82dc254
      Jia-Ju Bai authored
      [ Upstream commit 051c7b39 ]
      
      In dequeue_func(), there is an if statement on line 74 to check whether
      skb is NULL:
          if (skb)
      
      When skb is NULL, it is used on line 77:
          prefetch(&skb->end);
      
      Thus, a possible null-pointer dereference may occur.
      
      To fix this bug, skb->end is accessed only when skb is not NULL.

      This bug was found by STCheck, a static analysis tool written by us.
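
      The guarded access can be sketched in userspace as follows. This is
      only an illustration, not the actual patch: struct pkt and
      account_dequeued() are hypothetical stand-ins for struct sk_buff and
      dequeue_func(), and the GCC/Clang builtin __builtin_prefetch() stands
      in for the kernel's prefetch().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff; only the fields used here. */
struct pkt {
	unsigned int len;
	unsigned char *end;
};

/*
 * Account for a dequeued packet. Returns the number of bytes removed
 * from the backlog, or 0 when the queue was empty (p == NULL).
 */
static unsigned int account_dequeued(struct pkt *p, unsigned int *backlog)
{
	assert(backlog != NULL);

	if (!p)
		return 0;

	*backlog -= p->len;
	/* p is only touched on the non-NULL path. */
	__builtin_prefetch(&p->end);
	return p->len;
}
```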
      
      Fixes: 76e3cc12 ("codel: Controlled Delay AQM")
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: qualcomm: rmnet: Fix incorrect UL checksum offload logic · 44b96a38
      Subash Abhinov Kasiviswanathan authored
      [ Upstream commit a7cf3d24 ]
      
      The udp_ip4_ind bit is set only for IPv4 UDP non-fragmented packets
      so that the hardware can flip the checksum to 0xFFFF if the computed
      checksum is 0 per RFC768.
      
      However, per hardware requirements, this bit must also be set for IPv6
      UDP non-fragmented packets. Otherwise, IPv6 UDP packets with a computed
      checksum of 0 were transmitted by the hardware and were dropped in the
      network.
      
      In addition to setting this bit for IPv6 UDP, the field is also
      appropriately renamed to udp_ind as part of this change.
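
      The RFC 768 rule that the hardware applies when this bit is set can be
      sketched in software as follows. This is an illustration only: the
      function names are hypothetical, and the UDP pseudo-header is left out
      for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum over big-endian 16-bit words, folded and inverted. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;	/* pad the odd byte */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* fold the carries */
	return (uint16_t)~sum;
}

/* RFC 768: a computed UDP checksum of 0 is transmitted as all ones. */
static uint16_t udp_checksum_on_wire(uint16_t computed)
{
	return computed == 0 ? 0xffff : computed;
}
```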
      
      Fixes: 5eb5f860 ("net: qualcomm: rmnet: Add support for TX checksum offload")
      Cc: Sean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: phylink: Fix flow control for fixed-link · c8b05980
      René van Dorst authored
      [ Upstream commit 8aace4f3 ]
      
      In phylink_parse_fixedlink() the pl->link_config.advertising bits are
      ANDed with pl->supported. pl->supported is zeroed and only the
      speed/duplex modes and MII bits are set, so
      pl->link_config.advertising always loses the flow control/pause bits.

      By setting the Pause and Asym_Pause bits in pl->supported, flow control
      works again when devicetree "pause" is set in the fixed-link node and
      the MAC advertises that it supports pause.
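
      The masking effect can be sketched with plain bit flags. This is a
      simplified illustration: the MODE_* bits and fixed_link_advertising()
      are hypothetical, whereas the kernel operates on ETHTOOL_LINK_MODE_*
      bitmaps.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions standing in for link mode bits. */
enum {
	MODE_1000_FULL  = 1u << 0,
	MODE_MII        = 1u << 1,
	MODE_PAUSE      = 1u << 2,
	MODE_ASYM_PAUSE = 1u << 3,
};

/*
 * Build the "supported" mask for a fixed link and AND it into
 * "advertising". With keep_pause == false the pause bits are always
 * masked out (the bug); with keep_pause == true they survive if they
 * were advertised (the fix).
 */
static unsigned int fixed_link_advertising(unsigned int advertising,
					   bool keep_pause)
{
	unsigned int supported = MODE_1000_FULL | MODE_MII;
	unsigned int result;

	if (keep_pause)
		supported |= MODE_PAUSE | MODE_ASYM_PAUSE;

	result = advertising & supported;
	/* The result never advertises a mode that is not supported. */
	assert((result & ~supported) == 0);
	return result;
}
```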
      
      Results with this patch:

      Legend:
      - DT = 'Pause' is set in the fixed-link node in devicetree.
      - validate() = 'Yes' means phylink_set(mask, Pause) is set in
        validate().
      - flow = result reported by the "Link is Up" line.
      
      +-----+------------+-------+
      | DT  | validate() | flow  |
      +-----+------------+-------+
      | Yes | Yes        | rx/tx |
      | No  | Yes        | off   |
      | Yes | No         | off   |
      +-----+------------+-------+
      
      Fixes: 9525ae83 ("phylink: add phylink infrastructure")
      Signed-off-by: René van Dorst <opensource@vdorst.com>
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/mlx5: Use reversed order when unregister devices · 4dddd08b
      Mark Zhang authored
      [ Upstream commit 08aa5e7d ]
      
      When lag is active, which is controlled by the bonded mlx5e netdev, mlx5
      interface unregistering must happen in reverse order: rdma is
      unregistered (unloaded) first, to guarantee that all references to the
      lag context in hardware are removed, and then the mlx5e netdev interface
      is removed, which cleans up the lag context from hardware.
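
      The ordering rule can be sketched as a list that is torn down in the
      reverse of its registration order. This is a userspace illustration
      with hypothetical names (struct iface_list, iface_register(),
      iface_unregister_last()), and it assumes the netdev interface was
      registered before the rdma one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_IFACES 8

/* Hypothetical interface table; names are kept in registration order. */
struct iface_list {
	const char *names[MAX_IFACES];
	size_t count;
};

static int iface_register(struct iface_list *l, const char *name)
{
	assert(l != NULL && name != NULL);

	if (l->count == MAX_IFACES)
		return -1;
	l->names[l->count++] = name;
	return 0;
}

/* Remove the most recently registered interface, or return NULL if none. */
static const char *iface_unregister_last(struct iface_list *l)
{
	assert(l != NULL);

	if (l->count == 0)
		return NULL;
	return l->names[--l->count];
}
```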
      
      Without this fix, we observed the following errors while destroying the
      LAG interface:
       * mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed,
         status bad parameter(0x3), syndrome (0xe4ac33)
       * mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed,
         status bad parameter(0x3), syndrome (0xa5aee8).
      
      Fixes: a31208b1 ("net/mlx5_core: New init and exit flow for mlx5_core")
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Mark Zhang <markz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/mlx5e: always initialize frag->last_in_page · 858f82c6
      Qian Cai authored
      [ Upstream commit 60d60c8f ]
      
      Commit 069d1146 ("net/mlx5e: RX, Enhance legacy Receive Queue
      memory scheme") introduced the undefined behaviour below, because
      "frag->last_in_page" is only initialized in mlx5e_init_frags_partition()
      when

      if (next_frag.offset + frag_info[f].frag_stride > PAGE_SIZE)

      is true, or after leaving the loop

      for (i = 0; i < mlx5_wq_cyc_get_size(&rq->wqe.wq); i++)

      As a result, some "frag" entries can have an uninitialized
      "last_in_page" value.

      Later, get_frag() obtains those "frag" entries and
      mlx5e_put_rx_frag() checks "frag->last_in_page", which triggers the
      error during boot. Fix it by always initializing "frag->last_in_page"
      to "false" in mlx5e_init_frags_partition().
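
      The idea of the fix can be sketched as follows. This is a reduced
      userspace illustration, not the driver code: struct frag and
      init_frags() are hypothetical, and every entry gets last_in_page
      written unconditionally before any entry is marked as closing a page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced fragment descriptor. */
struct frag {
	size_t offset;
	bool last_in_page;
};

/*
 * Lay out n fragments of the given stride in pages of page_size bytes.
 * Every entry gets last_in_page written, not only those closing a page.
 */
static void init_frags(struct frag *frags, size_t n, size_t stride,
		       size_t page_size)
{
	size_t offset = 0;
	size_t i;

	assert(stride > 0 && stride <= page_size);

	for (i = 0; i < n; i++) {
		if (offset + stride > page_size) {
			/* Next fragment does not fit: close the current page. */
			if (i > 0)
				frags[i - 1].last_in_page = true;
			offset = 0;
		}
		frags[i].offset = offset;
		frags[i].last_in_page = false;	/* unconditional init */
		offset += stride;
	}
	/* The final fragment always closes its page. */
	if (n > 0)
		frags[n - 1].last_in_page = true;
}
```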
      
      UBSAN: Undefined behaviour in
      drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:325:12
      load of value 170 is not a valid value for type 'bool' (aka '_Bool')
      Call trace:
       dump_backtrace+0x0/0x264
       show_stack+0x20/0x2c
       dump_stack+0xb0/0x104
       __ubsan_handle_load_invalid_value+0x104/0x128
       mlx5e_handle_rx_cqe+0x8e8/0x12cc [mlx5_core]
       mlx5e_poll_rx_cq+0xca8/0x1a94 [mlx5_core]
       mlx5e_napi_poll+0x17c/0xa30 [mlx5_core]
       net_rx_action+0x248/0x940
       __do_softirq+0x350/0x7b8
       irq_exit+0x200/0x26c
       __handle_domain_irq+0xc8/0x128
       gic_handle_irq+0x138/0x228
       el1_irq+0xb8/0x140
       arch_cpu_idle+0x1a4/0x348
       do_idle+0x114/0x1b0
       cpu_startup_entry+0x24/0x28
       rest_init+0x1ac/0x1dc
       arch_call_rest_init+0x10/0x18
       start_kernel+0x4d4/0x57c
      
      Fixes: 069d1146 ("net/mlx5e: RX, Enhance legacy Receive Queue memory scheme")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: fix ifindex collision during namespace removal · edb7ad69
      Jiri Pirko authored
      [ Upstream commit 55b40dbf ]
      
      Commit aca51397 ("netns: Fix arbitrary net_device-s corruptions
      on net_ns stop.") introduced the possibility of hitting a BUG when a
      device is returning to init_net and the following two conditions are met:
      1) The dev->ifindex value is used in the name of another "dev%d"
         device in init_net.
      2) dev->name is used by another device in init_net.

      Under real-life circumstances this is hard to hit, so the bug has
      been present for over 10 years. To reproduce:
      
      $ ip a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff
      3: enp0s2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
      $ ip netns add ns1
      $ ip -n ns1 link add dummy1ns1 type dummy
      $ ip -n ns1 link add dummy2ns1 type dummy
      $ ip link set enp0s2 netns ns1
      $ ip -n ns1 link set enp0s2 name dummy0
      [  100.858894] virtio_net virtio0 dummy0: renamed from enp0s2
      $ ip link add dev4 type dummy
      $ ip -n ns1 a
      1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      2: dummy1ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 16:63:4c:38:3e:ff brd ff:ff:ff:ff:ff:ff
      3: dummy2ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether aa:9e:86:dd:6b:5d brd ff:ff:ff:ff:ff:ff
      4: dummy0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
      $ ip a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff
      4: dev4: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 5a:e1:4a:b6:ec:f8 brd ff:ff:ff:ff:ff:ff
      $ ip netns del ns1
      [  158.717795] default_device_exit: failed to move dummy0 to init_net: -17
      [  158.719316] ------------[ cut here ]------------
      [  158.720591] kernel BUG at net/core/dev.c:9824!
      [  158.722260] invalid opcode: 0000 [#1] SMP KASAN PTI
      [  158.723728] CPU: 0 PID: 56 Comm: kworker/u2:1 Not tainted 5.3.0-rc1+ #18
      [  158.725422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
      [  158.727508] Workqueue: netns cleanup_net
      [  158.728915] RIP: 0010:default_device_exit.cold+0x1d/0x1f
      [  158.730683] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e
      [  158.736854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282
      [  158.738752] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000
      [  158.741369] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64
      [  158.743418] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c
      [  158.745626] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000
      [  158.748405] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72
      [  158.750638] FS:  0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
      [  158.752944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  158.755245] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0
      [  158.757654] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  158.760012] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  158.762758] Call Trace:
      [  158.763882]  ? dev_change_net_namespace+0xbb0/0xbb0
      [  158.766148]  ? devlink_nl_cmd_set_doit+0x520/0x520
      [  158.768034]  ? dev_change_net_namespace+0xbb0/0xbb0
      [  158.769870]  ops_exit_list.isra.0+0xa8/0x150
      [  158.771544]  cleanup_net+0x446/0x8f0
      [  158.772945]  ? unregister_pernet_operations+0x4a0/0x4a0
      [  158.775294]  process_one_work+0xa1a/0x1740
      [  158.776896]  ? pwq_dec_nr_in_flight+0x310/0x310
      [  158.779143]  ? do_raw_spin_lock+0x11b/0x280
      [  158.780848]  worker_thread+0x9e/0x1060
      [  158.782500]  ? process_one_work+0x1740/0x1740
      [  158.784454]  kthread+0x31b/0x420
      [  158.786082]  ? __kthread_create_on_node+0x3f0/0x3f0
      [  158.788286]  ret_from_fork+0x3a/0x50
      [  158.789871] ---[ end trace defd6c657c71f936 ]---
      [  158.792273] RIP: 0010:default_device_exit.cold+0x1d/0x1f
      [  158.795478] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e
      [  158.804854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282
      [  158.807865] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000
      [  158.811794] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64
      [  158.816652] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c
      [  158.820930] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000
      [  158.825113] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72
      [  158.829899] FS:  0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
      [  158.834923] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  158.838164] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0
      [  158.841917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  158.845149] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fix this by checking whether a device with the same name exists in
      init_net and, if it does, falling back to the original code, which uses
      dev%d to allocate the name.
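
      The fallback logic can be sketched in userspace as follows. This is an
      illustration under assumptions, not the kernel code: the namespace is
      modelled as a plain array of names, pick_fallback_name() and
      name_in_use() are hypothetical, and the loop only approximates what a
      "dev%d" pattern allocation does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return true if name is already taken in the (hypothetical) name table. */
static bool name_in_use(const char *const *names, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(names[i], name) == 0)
			return true;
	}
	return false;
}

/*
 * Choose a fallback name for a device returning to the initial namespace.
 * Returns 0 and writes the name to out, or -1 if no free name was found.
 */
static int pick_fallback_name(const char *const *names, size_t n,
			      int ifindex, char *out, size_t outlen)
{
	int i;

	assert(out != NULL && outlen > 0);

	/* Preferred fallback: "dev<ifindex>". */
	snprintf(out, outlen, "dev%d", ifindex);
	if (!name_in_use(names, n, out))
		return 0;

	/* Collision: take the first free "dev<N>", as a "dev%d" pattern would. */
	for (i = 0; i < 1024; i++) {
		snprintf(out, outlen, "dev%d", i);
		if (!name_in_use(names, n, out))
			return 0;
	}
	return -1;
}
```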
      
      This was found using syzkaller.
      
      Fixes: aca51397 ("netns: Fix arbitrary net_device-s corruptions on net_ns stop.")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: bridge: mcast: don't delete permanent entries when fast leave is enabled · a19d4e34
      Nikolay Aleksandrov authored
      [ Upstream commit 5c725b6b ]
      
      When permanent entries were introduced by the commit below, they were
      exempt from timing out, and thus igmp leave wouldn't affect them unless
      fast leave was enabled on the port. Fast leave was added before
      permanent entries existed. It shouldn't matter whether fast leave is
      enabled or not: if the user added a permanent entry, it shouldn't be
      deleted on igmp leave.
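
      The intended behaviour can be sketched as follows. This is a reduced
      illustration with a hypothetical struct mdb_entry and fast_leave()
      helper, not the bridge code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced multicast database entry. */
struct mdb_entry {
	bool present;
	bool permanent;
};

/*
 * Handle an IGMP leave on a port with fast leave enabled.
 * Returns true if the entry was deleted; permanent entries are kept.
 */
static bool fast_leave(struct mdb_entry *e)
{
	assert(e != NULL);

	if (!e->present || e->permanent)
		return false;
	e->present = false;
	return true;
}
```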
      
      Before:
      $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
      $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
      $ bridge mdb show
      dev br0 port eth4 grp 229.1.1.1 permanent
      
      < join and leave 229.1.1.1 on eth4 >
      
      $ bridge mdb show
      $
      
      After:
      $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave
      $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent
      $ bridge mdb show
      dev br0 port eth4 grp 229.1.1.1 permanent
      
      < join and leave 229.1.1.1 on eth4 >
      
      $ bridge mdb show
      dev br0 port eth4 grp 229.1.1.1 permanent
      
      Fixes: ccb1c31a ("bridge: add flags to distinguish permanent mdb entires")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: bridge: delete local fdb on device init failure · 639239be
      Nikolay Aleksandrov authored
      [ Upstream commit d7bae09f ]
      
      On initialization failure we have to delete the local fdb which was
      inserted due to the default pvid creation. This problem has been present
      since the inception of default_pvid. Note that currently there are 2 cases:
      1) in br_dev_init() when br_multicast_init() fails
      2) if register_netdevice() fails after calling ndo_init()
      
      This patch takes care of both since br_vlan_flush() is called on both
      occasions. Also the new fdb delete would be a no-op on normal bridge
      device destruction since the local fdb would've been already flushed by
      br_dev_delete(). This is not an issue for ports since nbp_vlan_init() is
      called last when adding a port thus nothing can fail after it.
      
      Reported-by: <syzbot+88533dc8b582309bf3ee@syzkaller.appspotmail.com>
      Fixes: 5be5a2df ("bridge: Add filtering support for default_pvid")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mvpp2: refactor MTU change code · b3645a48
      Matteo Croce authored
      [ Upstream commit 230bd958 ]
      
      The MTU change code can call napi_disable() with the device already down,
      leading to a deadlock. Also, a lot of code is duplicated unnecessarily.
      
      Rework mvpp2_change_mtu() to avoid the deadlock and remove duplicated code.
      
      Fixes: 3f518509 ("ethernet: Add new driver for Marvell Armada 375 network unit")
      Signed-off-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mvpp2: fix panic on module removal · ffab47bf
      Matteo Croce authored
      [ Upstream commit 944a83a2 ]
      
      mvpp2 uses a delayed workqueue to gather traffic statistics.
      On module removal the workqueue can be destroyed before
      cancel_delayed_work_sync() is called on its work items.
      Fix it by moving the destroy_workqueue() call after mvpp2_port_remove().
      Also remove an unneeded call to flush_workqueue().
      
          # rmmod mvpp2
          [ 2743.311722] mvpp2 f4000000.ethernet eth1: phy link down 10gbase-kr/10Gbps/Full
          [ 2743.320063] mvpp2 f4000000.ethernet eth1: Link is Down
          [ 2743.572263] mvpp2 f4000000.ethernet eth2: phy link down sgmii/1Gbps/Full
          [ 2743.580076] mvpp2 f4000000.ethernet eth2: Link is Down
          [ 2744.102169] mvpp2 f2000000.ethernet eth0: phy link down 10gbase-kr/10Gbps/Full
          [ 2744.110441] mvpp2 f2000000.ethernet eth0: Link is Down
          [ 2744.115614] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
          [ 2744.115615] Mem abort info:
          [ 2744.115616]   ESR = 0x96000005
          [ 2744.115617]   Exception class = DABT (current EL), IL = 32 bits
          [ 2744.115618]   SET = 0, FnV = 0
          [ 2744.115619]   EA = 0, S1PTW = 0
          [ 2744.115620] Data abort info:
          [ 2744.115621]   ISV = 0, ISS = 0x00000005
          [ 2744.115622]   CM = 0, WnR = 0
          [ 2744.115624] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000422681000
          [ 2744.115626] [0000000000000000] pgd=0000000000000000, pud=0000000000000000
          [ 2744.115630] Internal error: Oops: 96000005 [#1] SMP
          [ 2744.115632] Modules linked in: mvpp2(-) algif_hash af_alg nls_iso8859_1 nls_cp437 vfat fat xhci_plat_hcd m25p80 spi_nor xhci_hcd mtd usbcore i2c_mv64xxx sfp usb_common marvell10g phy_generic spi_orion mdio_i2c i2c_core mvmdio phylink sbsa_gwdt ip_tables x_tables autofs4 [last unloaded: mvpp2]
          [ 2744.115654] CPU: 3 PID: 8357 Comm: kworker/3:2 Not tainted 5.3.0-rc2 #1
          [ 2744.115655] Hardware name: Marvell 8040 MACCHIATOBin Double-shot (DT)
          [ 2744.115665] Workqueue: events_power_efficient phylink_resolve [phylink]
          [ 2744.115669] pstate: a0000085 (NzCv daIf -PAN -UAO)
          [ 2744.115675] pc : __queue_work+0x9c/0x4d8
          [ 2744.115677] lr : __queue_work+0x170/0x4d8
          [ 2744.115678] sp : ffffff801001bd50
          [ 2744.115680] x29: ffffff801001bd50 x28: ffffffc422597600
          [ 2744.115684] x27: ffffff80109ae6f0 x26: ffffff80108e4018
          [ 2744.115688] x25: 0000000000000003 x24: 0000000000000004
          [ 2744.115691] x23: ffffff80109ae6e0 x22: 0000000000000017
          [ 2744.115694] x21: ffffffc42c030000 x20: ffffffc42209e8f8
          [ 2744.115697] x19: 0000000000000000 x18: 0000000000000000
          [ 2744.115699] x17: 0000000000000000 x16: 0000000000000000
          [ 2744.115701] x15: 0000000000000010 x14: ffffffffffffffff
          [ 2744.115702] x13: ffffff8090e2b95f x12: ffffff8010e2b967
          [ 2744.115704] x11: ffffff8010906000 x10: 0000000000000040
          [ 2744.115706] x9 : ffffff80109223b8 x8 : ffffff80109223b0
          [ 2744.115707] x7 : ffffffc42bc00068 x6 : 0000000000000000
          [ 2744.115709] x5 : ffffffc42bc00000 x4 : 0000000000000000
          [ 2744.115710] x3 : 0000000000000000 x2 : 0000000000000000
          [ 2744.115712] x1 : 0000000000000008 x0 : ffffffc42c030000
          [ 2744.115714] Call trace:
          [ 2744.115716]  __queue_work+0x9c/0x4d8
          [ 2744.115718]  delayed_work_timer_fn+0x28/0x38
          [ 2744.115722]  call_timer_fn+0x3c/0x180
          [ 2744.115723]  expire_timers+0x60/0x168
          [ 2744.115724]  run_timer_softirq+0xbc/0x1e8
          [ 2744.115727]  __do_softirq+0x128/0x320
          [ 2744.115731]  irq_exit+0xa4/0xc0
          [ 2744.115734]  __handle_domain_irq+0x70/0xc0
          [ 2744.115735]  gic_handle_irq+0x58/0xa8
          [ 2744.115737]  el1_irq+0xb8/0x140
          [ 2744.115738]  console_unlock+0x3a0/0x568
          [ 2744.115740]  vprintk_emit+0x200/0x2a0
          [ 2744.115744]  dev_vprintk_emit+0x1c8/0x1e4
          [ 2744.115747]  dev_printk_emit+0x6c/0x7c
          [ 2744.115751]  __netdev_printk+0x104/0x1d8
          [ 2744.115752]  netdev_printk+0x60/0x70
          [ 2744.115756]  phylink_resolve+0x38c/0x3c8 [phylink]
          [ 2744.115758]  process_one_work+0x1f8/0x448
          [ 2744.115760]  worker_thread+0x54/0x500
          [ 2744.115762]  kthread+0x12c/0x130
          [ 2744.115764]  ret_from_fork+0x10/0x1c
          [ 2744.115768] Code: aa1403e0 97fffbbe aa0003f5 b4000700 (f9400261)
      
      Fixes: 118d6298 ("net: mvpp2: add ethtool GOP statistics")
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Matteo Croce <mcroce@redhat.com>
      Acked-by: Antoine Tenart <antoine.tenart@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mlxsw: spectrum: Fix error path in mlxsw_sp_module_init() · 3c46905f
      Jiri Pirko authored
      [ Upstream commit 28fe7900 ]
      
      In case sp2 pci driver registration fails, fix the error path to
      start with unregistering the sp1 pci driver.
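
      The unwinding pattern can be sketched as follows. This is a userspace
      illustration, not the mlxsw code: the driver identifiers,
      drv_register(), drv_unregister() and module_init_sketch() are
      hypothetical, and fail_at only simulates which registration step
      fails. Each error label undoes the step registered just before the
      one that failed.

```c
#include <assert.h>

/* Hypothetical driver identifiers, tracked as bits in a global mask. */
enum { CORE_SP1 = 1, CORE_SP2 = 2, PCI_SP1 = 4, PCI_SP2 = 8 };

static int registered;

static int drv_register(int drv, int fail)
{
	if (fail)
		return -1;
	registered |= drv;
	return 0;
}

static void drv_unregister(int drv)
{
	registered &= ~drv;
}

/* fail_at selects the step that fails: 1..4, or 0 for no failure. */
static int module_init_sketch(int fail_at)
{
	int err;

	assert(fail_at >= 0 && fail_at <= 4);

	err = drv_register(CORE_SP1, fail_at == 1);
	if (err)
		return err;
	err = drv_register(CORE_SP2, fail_at == 2);
	if (err)
		goto err_core_sp2;
	err = drv_register(PCI_SP1, fail_at == 3);
	if (err)
		goto err_pci_sp1;
	err = drv_register(PCI_SP2, fail_at == 4);
	if (err)
		goto err_pci_sp2;
	return 0;

err_pci_sp2:
	drv_unregister(PCI_SP1);	/* sp2 pci failed: undo sp1 pci first */
err_pci_sp1:
	drv_unregister(CORE_SP2);
err_core_sp2:
	drv_unregister(CORE_SP1);
	return err;
}
```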
      
      Fixes: c3ab4354 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipip: validate header length in ipip_tunnel_xmit · f186fb5c
      Haishuang Yan authored
      [ Upstream commit 47d858d0 ]
      
      We need the same checks introduced by commit cb9f1b78
      ("ip: validate header length on virtual device xmit") for
      ipip tunnel.
      
      Fixes: cb9f1b78 ("ip: validate header length on virtual device xmit")
      Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip6_tunnel: fix possible use-after-free on xmit · 1bb2dd37
      Haishuang Yan authored
      [ Upstream commit 01f5bffa ]
      
      ip4ip6/ip6ip6 tunnels run iptunnel_handle_offloads on xmit which
      can cause a possible use-after-free accessing iph/ipv6h pointer
      since the packet will be 'uncloned' running pskb_expand_head if
      it is a cloned gso skb.
      
      Fixes: 0e9a7095 ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets")
      Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip6_gre: reload ipv6h in prepare_ip6gre_xmit_ipv6 · fdcefa46
      Haishuang Yan authored
      [ Upstream commit 3bc817d6 ]
      
      ip6_tnl_parse_tlv_enc_lim() can call pskb_may_pull(), which may
      change skb->data, so we need to reload ipv6h at the right place.
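
      The hazard has a close userspace analogue in realloc(), which may also
      move the underlying buffer. The sketch below is an illustration only:
      struct buf, buf_grow() and header_byte_after_grow() are hypothetical,
      with buf_grow() playing the role of the call that may change the data
      pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for the packet data. */
struct buf {
	unsigned char *data;
	size_t len;
};

/* Grow the buffer to new_len bytes; this may move b->data. */
static int buf_grow(struct buf *b, size_t new_len)
{
	unsigned char *p;

	if (new_len < b->len)
		return -1;
	p = realloc(b->data, new_len);
	if (!p)
		return -1;
	memset(p + b->len, 0, new_len - b->len);
	b->data = p;
	b->len = new_len;
	return 0;
}

/* Read the first header byte after an operation that may reallocate. */
static int header_byte_after_grow(struct buf *b, size_t new_len)
{
	const unsigned char *hdr;

	assert(b->data != NULL && b->len > 0);

	if (buf_grow(b, new_len))
		return -1;
	/* Load the header pointer only after the call that may move the data. */
	hdr = b->data;
	return hdr[0];
}
```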
      
      Fixes: 898b2979 ("ip6_gre: Refactor ip6gre xmit codes")
      Cc: William Tu <u9012063@gmail.com>
      Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Acked-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ife: error out when nla attributes are empty · c4c88993
      Cong Wang authored
      [ Upstream commit c8ec4632 ]
      
      act_ife at least requires TCA_IFE_PARMS, so we have to bail out
      when there is no attribute passed in.
      
      Reported-by: <syzbot+fbb5b288c9cb6a2eeac4@syzkaller.appspotmail.com>
      Fixes: ef6980b6 ("introduce IFE action")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bnx2x: Disable multi-cos feature. · 774358df
      Sudarsana Reddy Kalluru authored
      [ Upstream commit d1f0b5dc ]
      
      Commit 3968d389 ("bnx2x: Fix Multi-Cos."), which enabled the multi-cos
      feature in the driver after a prolonged time, introduced a regression
      causing numerous issues (sudden reboots, tx timeouts, etc.) reported by
      customers. We plan to back out this commit and submit a proper fix once
      we have found the root cause of the issues reported with this feature
      enabled.
      
      Fixes: 3968d389 ("bnx2x: Fix Multi-Cos.")
      Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • atm: iphase: Fix Spectre v1 vulnerability · cb462678
      Gustavo A. R. Silva authored
      [ Upstream commit ea443e5e ]
      
      board is controlled by user-space, hence leading to a potential
      exploitation of the Spectre variant 1 vulnerability.
      
      This issue was detected with the help of Smatch:
      
      drivers/atm/iphase.c:2765 ia_ioctl() warn: potential spectre issue 'ia_dev' [r] (local cap)
      drivers/atm/iphase.c:2774 ia_ioctl() warn: possible spectre second half.  'iadev'
      drivers/atm/iphase.c:2782 ia_ioctl() warn: possible spectre second half.  'iadev'
      drivers/atm/iphase.c:2816 ia_ioctl() warn: possible spectre second half.  'iadev'
      drivers/atm/iphase.c:2823 ia_ioctl() warn: possible spectre second half.  'iadev'
      drivers/atm/iphase.c:2830 ia_ioctl() warn: potential spectre issue '_ia_dev' [r] (local cap)
      drivers/atm/iphase.c:2845 ia_ioctl() warn: possible spectre second half.  'iadev'
      drivers/atm/iphase.c:2856 ia_ioctl() warn: possible spectre second half.  'iadev'
      
      Fix this by sanitizing board before using it to index ia_dev and _ia_dev.
      
      Notice that given that speculation windows are large, the policy is
      to kill the speculation on the first load and not worry if it can be
      completed with a dependent load/store [1].
      
      [1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/
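
      The index sanitizing can be sketched as branchless masking, in the
      spirit of the kernel's array_index_nospec() helper from
      <linux/nospec.h>. The userspace code below only illustrates the
      arithmetic and is not a Spectre mitigation by itself: index_mask()
      and clamp_index() are hypothetical names, and the sketch assumes an
      arithmetic right shift of negative values, as GCC and Clang provide.

```c
#include <assert.h>
#include <limits.h>

/*
 * Return all ones when index < size and 0 otherwise, without branching.
 * Relies on arithmetic right shift of a negative long (GCC and Clang).
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	assert(size > 0 && size <= (unsigned long)LONG_MAX);

	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Clamp an out-of-bounds index to 0; in-bounds values pass unchanged. */
static unsigned long clamp_index(unsigned long index, unsigned long size)
{
	return index & index_mask(index, size);
}
```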
      
      
      
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB: directly cast the sockaddr union to aockaddr · 8440cdc7
      Greg Kroah-Hartman authored
      Like commit 641114d2 ("RDMA: Directly cast the sockaddr union to
      sockaddr") we need to quiet gcc 9 from warning about this crazy union.
      That commit did not fix all of the warnings in 4.19 and older kernels
      because the logic in roce_resolve_route_from_path() was rewritten
      between 4.19 and 5.2 when that change happened.
      
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • HID: Add quirk for HP X1200 PIXART OEM mouse · 608cfdfa
      Sebastian Parschauer authored
      commit 49869d2e upstream.
      
      The PixArt OEM mice are known for disconnecting every minute in
      runlevel 1 or 3 if they are not always polled. So add quirk
      ALWAYS_POLL for this one as well.
      
      Jonathan Teh (@jonathan-teh) reported and tested the quirk.
      Reference: https://github.com/sriemer/fix-linux-mouse/issues/15
      
      
      
      Signed-off-by: Sebastian Parschauer <s.parschauer@gmx.de>
      CC: stable@vger.kernel.org
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • HID: wacom: fix bit shift for Cintiq Companion 2 · e830c2c3
      Aaron Armstrong Skomra authored
      commit 693c3dab upstream.
      
      The bit indicating BTN_6 on this device is overshifted
      by 2 bits, resulting in the incorrect button being
      reported.
      
      Also fix copy-paste mistake in comments.
      
      Signed-off-by: Aaron Armstrong Skomra <aaron.skomra@wacom.com>
      Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
      Link: https://github.com/linuxwacom/xf86-input-wacom/issues/71
      Fixes: c7f0522a ("HID: wacom: Slim down wacom_intuos_pad processing")
      Cc: <stable@vger.kernel.org> # v4.5+
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock · 2364ed0d
      Dan Williams authored
      commit ca6bf264 upstream.
      
      A multithreaded namespace creation/destruction stress test currently
      deadlocks with the following lockup signature:
      
          INFO: task ndctl:2924 blocked for more than 122 seconds.
                Tainted: G           OE     5.2.0-rc4+ #3382
          "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
          ndctl           D    0  2924   1176 0x00000000
          Call Trace:
           ? __schedule+0x27e/0x780
           schedule+0x30/0xb0
           wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
           ? finish_wait+0x80/0x80
           uuid_store+0xe6/0x2e0 [libnvdimm]
           kernfs_fop_write+0xf0/0x1a0
           vfs_write+0xb7/0x1b0
           ksys_write+0x5c/0xd0
           do_syscall_64+0x60/0x240
      
           INFO: task ndctl:2923 blocked for more than 122 seconds.
                 Tainted: G           OE     5.2.0-rc4+ #3382
           "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
           ndctl           D    0  2923   1175 0x00000000
           Call Trace:
            ? __schedule+0x27e/0x780
            ? __mutex_lock+0x489/0x910
            schedule+0x30/0xb0
            schedule_preempt_disabled+0x11/0x20
            __mutex_lock+0x48e/0x910
            ? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
            ? __lock_acquire+0x23f/0x1710
            ? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
            nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
            __dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
            ? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
            dax_pmem_probe+0xc/0x20 [dax_pmem]
            nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
            really_probe+0xef/0x390
            driver_probe_device+0xb4/0x100
      
      In this sequence an 'nd_dax' device is being probed and trying to take
      the lock on its backing namespace to validate that the 'nd_dax' device
      indeed has exclusive access to the backing namespace. Meanwhile, another
      thread is trying to update the uuid property of that same backing
      namespace. So one thread is in the probe path trying to acquire the
      lock, and the other thread has acquired the lock and tries to flush the
      probe path.
      
      Fix this deadlock by not holding the namespace device_lock over the
      wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
      the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
      subsequently dropped internally to wait_nvdimm_bus_probe_idle().
      
      Cc: <stable@vger.kernel.org>
      Fixes: bf9bccc1 ("libnvdimm: pmem label sets and namespace instantiation")
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Tested-by: Jane Chu <jane.chu@oracle.com>
      Link: https://lore.kernel.org/r/156341210094.292348.2384694131126767789.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2364ed0d
    • Dan Williams's avatar
      libnvdimm/bus: Prepare the nd_ioctl() path to be re-entrant · 7f000e7b
      Dan Williams authored
      commit 6de5d06e upstream.
      
      In preparation for not holding a lock over the execution of nd_ioctl(),
      update the implementation to allow multiple threads to be attempting
      ioctls at the same time. The bus lock still prevents multiple in-flight
      ->ndctl() invocations from corrupting each other's state, but static
      global staging buffers are moved to the heap.
      
      Reported-by: Vishal Verma <vishal.l.verma@intel.com>
      Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
      Tested-by: Vishal Verma <vishal.l.verma@intel.com>
      Link: https://lore.kernel.org/r/156341208947.292348.10560140326807607481.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7f000e7b
    • Dan Williams's avatar
      libnvdimm/region: Register badblocks before namespaces · 32485369
      Dan Williams authored
      commit 700cd033 upstream.
      
      Namespace activation expects to be able to reference region badblocks.
      The following warning sometimes triggers when asynchronous namespace
      activation races in front of the completion of namespace probing. Move
      all possible namespace probing after region badblocks initialization.
      
      Otherwise, lockdep sometimes catches the uninitialized state of the
      badblocks seqlock with stack trace signatures like:
      
          INFO: trying to register non-static key.
          pmem2: detected capacity change from 0 to 136365211648
          the code is fine but needs lockdep annotation.
          turning off the locking correctness validator.
          CPU: 9 PID: 358 Comm: kworker/u80:5 Tainted: G           OE     5.2.0-rc4+ #3382
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
          Workqueue: events_unbound async_run_entry_fn
          Call Trace:
           dump_stack+0x85/0xc0
          pmem1.12: detected capacity change from 0 to 8589934592
           register_lock_class+0x56a/0x570
           ? check_object+0x140/0x270
           __lock_acquire+0x80/0x1710
           ? __mutex_lock+0x39d/0x910
           lock_acquire+0x9e/0x180
           ? nd_pfn_validate+0x28f/0x440 [libnvdimm]
           badblocks_check+0x93/0x1f0
           ? nd_pfn_validate+0x28f/0x440 [libnvdimm]
           nd_pfn_validate+0x28f/0x440 [libnvdimm]
           ? lockdep_hardirqs_on+0xf0/0x180
           nd_dax_probe+0x9a/0x120 [libnvdimm]
           nd_pmem_probe+0x6d/0x180 [nd_pmem]
           nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
      
      Fixes: 48af2f7e ("libnvdimm, pfn: during init, clear errors...")
      Cc: <stable@vger.kernel.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
      Link: https://lore.kernel.org/r/156341208365.292348.1547528796026249120.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      32485369
    • Dan Williams's avatar
      libnvdimm/bus: Prevent duplicate device_unregister() calls · d16bbdbb
      Dan Williams authored
      commit 8aac0e23 upstream.
      
      A multithreaded namespace creation/destruction stress test currently
      fails with signatures like the following:
      
          sysfs group 'power' not found for kobject 'dax1.1'
          RIP: 0010:sysfs_remove_group+0x76/0x80
          Call Trace:
           device_del+0x73/0x370
           device_unregister+0x16/0x50
           nd_async_device_unregister+0x1e/0x30 [libnvdimm]
           async_run_entry_fn+0x39/0x160
           process_one_work+0x23c/0x5e0
           worker_thread+0x3c/0x390
      
          BUG: kernel NULL pointer dereference, address: 0000000000000020
          RIP: 0010:klist_put+0x1b/0x6c
          Call Trace:
           klist_del+0xe/0x10
           device_del+0x8a/0x2c9
           ? __switch_to_asm+0x34/0x70
           ? __switch_to_asm+0x40/0x70
           device_unregister+0x44/0x4f
           nd_async_device_unregister+0x22/0x2d [libnvdimm]
           async_run_entry_fn+0x47/0x15a
           process_one_work+0x1a2/0x2eb
           worker_thread+0x1b8/0x26e
      
      Use the kill_device() helper to atomically resolve the race between
      multiple threads issuing kill (device_unregister()) requests.
      
      Reported-by: Jane Chu <jane.chu@oracle.com>
      Reported-by: Erwin Tsaur <erwin.tsaur@oracle.com>
      Fixes: 4d88a97a ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
      Cc: <stable@vger.kernel.org>
      Link: https://github.com/pmem/ndctl/issues/96
      Tested-by: Jane Chu <jane.chu@oracle.com>
      Link: https://lore.kernel.org/r/156341207846.292348.10435719262819764054.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d16bbdbb
    • Dan Williams's avatar
      drivers/base: Introduce kill_device() · c23106d4
      Dan Williams authored
      commit 00289cd8 upstream.
      
      The libnvdimm subsystem arranges for devices to be destroyed as a result
      of a sysfs operation. Since device_unregister() cannot be called from
      an actively running sysfs attribute of the same device, libnvdimm
      arranges for device_unregister() to be performed in an out-of-line async
      context.
      
      The driver core maintains a 'dead' state for coordinating its own racing
      async registration / de-registration requests. Rather than add local
      'dead' state tracking infrastructure to libnvdimm device objects, export
      the existing state tracking via a new kill_device() helper.
      
      The kill_device() helper simply marks the device as dead, i.e. that it
      is on its way to device_del(), or returns that the device was already
      dead. This can be used in advance of calling device_unregister() for
      subsystems like libnvdimm that might need to handle multiple user
      threads racing to delete a device.
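      
      The mark-dead-once idea described above can be sketched in plain
      user-space C. This is only an illustration, not the driver core code:
      struct toy_device, toy_kill_device() and toy_request_unregister() are
      hypothetical names, and the device lock that the caller is assumed to
      hold is represented by comments only.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a device object; not the kernel's struct device. */
struct toy_device {
	bool dead;            /* set once the device is headed for removal */
	int unregister_calls; /* counts how often teardown actually ran */
};

/*
 * Mark the device dead. Returns true only for the first caller and false
 * if the device was already dead. The caller is assumed to hold the
 * device lock, which is what would make this test-and-set race-free.
 */
static bool toy_kill_device(struct toy_device *dev)
{
	if (dev->dead)
		return false;
	dev->dead = true;
	return true;
}

/* A teardown request that several racing threads might issue. */
static void toy_request_unregister(struct toy_device *dev)
{
	/* taking the device lock would go here */
	bool first = toy_kill_device(dev);
	/* dropping the device lock would go here */

	if (first)
		dev->unregister_calls++; /* stands in for device_unregister() */
}
```

      With this shape, a second request for the same device sees the dead
      flag and returns without running the teardown again.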
      
      This refactoring does not change any behavior, but it is a pre-requisite
      for follow-on fixes and therefore marked for -stable.
      
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Fixes: 4d88a97a ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
      Cc: <stable@vger.kernel.org>
      Tested-by: Jane Chu <jane.chu@oracle.com>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Link: https://lore.kernel.org/r/156341207332.292348.14959761496009347574.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c23106d4
    • Alexander Duyck's avatar
      driver core: Establish order of operations for device_add and device_del via bitflag · 7c43f84e
      Alexander Duyck authored
      commit 3451a495 upstream.
      
      Add an additional bit flag to the device_private struct named "dead".
      
      This additional flag provides a guarantee that when a device_del is
      executed on a given interface an async worker will not attempt to attach
      the driver following the earlier device_del call. Previously this
      guarantee was not present and could result in the device_del call
      attempting to remove a driver from an interface only to have the async
      worker attempt to probe the driver later when it finally completes the
      asynchronous probe call.
      
      One additional change added was that I pulled the check for dev->driver
      out of the __device_attach_driver call and instead placed it in the
      __device_attach_async_helper call. This was motivated by the fact that the
      only other caller of this, __device_attach, had already taken the
      device_lock() and checked for dev->driver. Instead of testing for this
      twice in this path it makes more sense to just consolidate the dev->dead
      and dev->driver checks together into one set of checks.
      
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7c43f84e
    • Linus Torvalds's avatar
      gcc-9: don't warn about uninitialized variable · a152a7b4
      Linus Torvalds authored
      commit cf676908 upstream.
      
      I'm not sure what made gcc warn about this code now.  The 'ret' variable
      does end up initialized in all cases, but it's definitely not obvious,
      so the compiler is quite reasonable to warn about this.
      
      So just add initialization to make it all much more obvious both to
      compilers and to humans.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a152a7b4
    • Hannes Reinecke's avatar
      scsi: fcoe: Embed fc_rport_priv in fcoe_rport structure · 93d6f084
      Hannes Reinecke authored
      commit 023358b1 upstream.
      
      Gcc-9 complains for a memset across pointer boundaries, which happens as
      the code tries to allocate a flexible array on the stack.  Turns out we
      cannot do this without relying on gcc-isms, so with this patch we'll embed
      the fc_rport_priv structure into fcoe_rport, can use the normal
      'container_of' outcast, and will only have to do a memset over one
      structure.
      
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      93d6f084
  2. Aug 07, 2019
    • Greg Kroah-Hartman's avatar
      Linux 4.19.65 · cc4c818b
      Greg Kroah-Hartman authored
      v4.19.65
      cc4c818b
    • Josh Poimboeuf's avatar
      Documentation: Add swapgs description to the Spectre v1 documentation · 7634b9cd
      Josh Poimboeuf authored
      commit 4c920576 upstream
      
      Add documentation to the Spectre document about the new swapgs variant of
      Spectre v1.
      
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7634b9cd
    • Thomas Gleixner's avatar
      x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS · b88241ae
      Thomas Gleixner authored
      commit f36cf386 upstream
      
      Intel provided the following information:
      
       On all current Atom processors, instructions that use a segment register
       value (e.g. a load or store) will not speculatively execute before the
       last writer of that segment retires. Thus they will not use a
       speculatively written segment value.
      
      That means on ATOMs there is no speculation through SWAPGS, so the SWAPGS
      entry paths can be excluded from the extra LFENCE if PTI is disabled.
      
      Create a separate bug flag for the through SWAPGS speculation and mark all
      out-of-order ATOMs and AMD/HYGON CPUs as not affected. The in-order ATOMs
      are excluded from the whole mitigation mess anyway.
      
      Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b88241ae
    • Josh Poimboeuf's avatar
      x86/entry/64: Use JMP instead of JMPQ · 931b6bfe
      Josh Poimboeuf authored
      commit 64dbc122 upstream
      
      Somehow the swapgs mitigation entry code patch ended up with a JMPQ
      instruction instead of JMP, where only the short jump is needed.  Some
      assembler versions apparently fail to optimize JMPQ into a two-byte JMP
      when possible, instead always using a 7-byte JMP with relocation.  For
      some reason that makes the entry code explode with a #GP during boot.
      
      Change it back to "JMP" as originally intended.
      
      Fixes: 18ec54fd ("x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations")
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      931b6bfe
    • Josh Poimboeuf's avatar
      x86/speculation: Enable Spectre v1 swapgs mitigations · 23e7a7b3
      Josh Poimboeuf authored
      commit a2059825 upstream
      
      The previous commit added macro calls in the entry code which mitigate the
      Spectre v1 swapgs issue if the X86_FEATURE_FENCE_SWAPGS_* features are
      enabled.  Enable those features where applicable.
      
      The mitigations may be disabled with "nospectre_v1" or "mitigations=off".
      
      There are different features which can affect the risk of attack:
      
      - When FSGSBASE is enabled, unprivileged users are able to place any
        value in GS, using the wrgsbase instruction.  This means they can
        write a GS value which points to any value in kernel space, which can
        be useful with the following gadget in an interrupt/exception/NMI
        handler:
      
      	if (coming from user space)
      		swapgs
      	mov %gs:<percpu_offset>, %reg1
      	// dependent load or store based on the value of %reg1
      	// for example: mov (%reg1), %reg2
      
        If an interrupt is coming from user space, and the entry code
        speculatively skips the swapgs (due to user branch mistraining), it
        may speculatively execute the GS-based load and a subsequent dependent
        load or store, exposing the kernel data to an L1 side channel leak.
      
        Note that, on Intel, a similar attack exists in the above gadget when
        coming from kernel space, if the swapgs gets speculatively executed to
        switch back to the user GS.  On AMD, this variant isn't possible
        because swapgs is serializing with respect to future GS-based
        accesses.
      
        NOTE: The FSGSBASE patch set hasn't been merged yet, so the above case
      	doesn't exist quite yet.
      
      - When FSGSBASE is disabled, the issue is mitigated somewhat because
        unprivileged users must use prctl(ARCH_SET_GS) to set GS, which
        restricts GS values to user space addresses only.  That means the
        gadget would need an additional step, since the target kernel address
        needs to be read from user space first.  Something like:
      
      	if (coming from user space)
      		swapgs
      	mov %gs:<percpu_offset>, %reg1
      	mov (%reg1), %reg2
      	// dependent load or store based on the value of %reg2
      	// for example: mov (%reg2), %reg3
      
        It's difficult to audit for this gadget in all the handlers, so while
        there are no known instances of it, it's entirely possible that it
        exists somewhere (or could be introduced in the future).  Without
        tooling to analyze all such code paths, consider it vulnerable.
      
        Effects of SMAP on the !FSGSBASE case:
      
        - If SMAP is enabled, and the CPU reports RDCL_NO (i.e., not
          susceptible to Meltdown), the kernel is prevented from speculatively
          reading user space memory, even L1 cached values.  This effectively
          disables the !FSGSBASE attack vector.
      
        - If SMAP is enabled, but the CPU *is* susceptible to Meltdown, SMAP
          still prevents the kernel from speculatively reading user space
          memory.  But it does *not* prevent the kernel from reading the
          user value from L1, if it has already been cached.  This is probably
          only a small hurdle for an attacker to overcome.
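      
      The SMAP/Meltdown cases above, together with the PTI point from the
      entry-code preparation commit listed on this page (a serializing CR3
      write makes the user-entry lfence unnecessary), can be read as a small
      decision function. The following is a simplified user-space sketch of
      that reading, not the kernel's selection code: struct toy_cpu, struct
      toy_fences and both functions are hypothetical, and the FSGSBASE case
      is left out because the text notes it does not exist yet.

```c
#include <stdbool.h>

/* Hypothetical summary of the CPU/kernel properties named in the text. */
struct toy_cpu {
	bool smap;     /* SMAP is enabled */
	bool meltdown; /* CPU is susceptible to Meltdown (no RDCL_NO) */
	bool pti;      /* page table isolation is enabled */
};

struct toy_fences {
	bool user;   /* lfence in the "from user" swapgs path */
	bool kernel; /* lfence in the "from kernel" non-swapgs path */
};

/* SMAP only blocks speculative user reads if the CPU is not Meltdown-prone. */
static bool toy_smap_works_speculatively(const struct toy_cpu *cpu)
{
	return cpu->smap && !cpu->meltdown;
}

static struct toy_fences toy_select_fences(const struct toy_cpu *cpu)
{
	struct toy_fences f = { false, false };

	/* The !FSGSBASE attack vector is effectively closed: no fences. */
	if (toy_smap_works_speculatively(cpu))
		return f;

	/* With PTI the CR3 write already serializes the user entry path. */
	f.user = !cpu->pti;
	/* The kernel entry paths do not depend on PTI. */
	f.kernel = true;
	return f;
}
```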
      
      Thanks to Dave Hansen for contributing the speculative_smap() function.
      
      Thanks to Andrew Cooper for providing the inside scoop on whether swapgs
      is serializing on AMD.
      
      [ tglx: Fixed the USER fence decision and polished the comment as suggested
        	by Dave Hansen ]
      
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      23e7a7b3
    • Josh Poimboeuf's avatar
      x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations · befb822c
      Josh Poimboeuf authored
      commit 18ec54fd upstream
      
      Spectre v1 isn't only about array bounds checks.  It can affect any
      conditional checks.  The kernel entry code interrupt, exception, and NMI
      handlers all have conditional swapgs checks.  Those may be problematic in
      the context of Spectre v1, as kernel code can speculatively run with a user
      GS.
      
      For example:
      
      	if (coming from user space)
      		swapgs
      	mov %gs:<percpu_offset>, %reg
      	mov (%reg), %reg1
      
      When coming from user space, the CPU can speculatively skip the swapgs, and
      then do a speculative percpu load using the user GS value.  So the user can
      speculatively force a read of any kernel value.  If a gadget exists which
      uses the percpu value as an address in another load/store, then the
      contents of the kernel value may become visible via an L1 side channel
      attack.
      
      A similar attack exists when coming from kernel space.  The CPU can
      speculatively do the swapgs, causing the user GS to get used for the rest
      of the speculative window.
      
      The mitigation is similar to a traditional Spectre v1 mitigation, except:
      
        a) index masking isn't possible, because the index (percpu offset)
           isn't user-controlled; and
      
        b) an lfence is needed in both the "from user" swapgs path and the
           "from kernel" non-swapgs path (because of the two attacks described
           above).
      
      The user entry swapgs paths already have SWITCH_TO_KERNEL_CR3, which has a
      CR3 write when PTI is enabled.  Since CR3 writes are serializing, the
      lfences can be skipped in those cases.
      
      On the other hand, the kernel entry swapgs paths don't depend on PTI.
      
      To avoid unnecessary lfences for the user entry case, create two separate
      features for alternative patching:
      
        X86_FEATURE_FENCE_SWAPGS_USER
        X86_FEATURE_FENCE_SWAPGS_KERNEL
      
      Use these features in entry code to patch in lfences where needed.
      
      The features aren't enabled yet, so there's no functional change.
      
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      befb822c
    • Fenghua Yu's avatar
      x86/cpufeatures: Combine word 11 and 12 into a new scattered features word · b5dd7f61
      Fenghua Yu authored
      commit acec0ce0 upstream
      
      It's a waste for the four X86_FEATURE_CQM_* feature bits to occupy two
      whole feature bits words. To better utilize feature words, re-define
      word 11 to host scattered features and move the four X86_FEATURE_CQM_*
      features into Linux defined word 11. More scattered features can be
      added in word 11 in the future.
      
      Rename leaf 11 in cpuid_leafs to CPUID_LNX_4 to reflect it's a
      Linux-defined leaf.
      
      Rename leaf 12 as CPUID_DUMMY which will be replaced by a meaningful
      name in the next patch when CPUID.7.1:EAX occupies word 12.
      
      Maximum number of RMID and cache occupancy scale are retrieved from
      CPUID.0xf.1 after scattered CQM features are enumerated. Carve out the
      code into a separate function.
      
      KVM doesn't support resctrl now. So it's safe to move the
      X86_FEATURE_CQM_* features to scattered features word 11 for KVM.
      
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Aaron Lewis <aaronlewis@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Babu Moger <babu.moger@amd.com>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
      Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
      Cc: x86 <x86@kernel.org>
      Link: https://lkml.kernel.org/r/1560794416-217638-2-git-send-email-fenghua.yu@intel.com
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b5dd7f61
    • Borislav Petkov's avatar
      x86/cpufeatures: Carve out CQM features retrieval · 16ad0b63
      Borislav Petkov authored
      commit 45fc56e6 upstream
      
      ... into a separate function for better readability. Split out from a
      patch from Fenghua Yu <fenghua.yu@intel.com> to keep the mechanical,
      sole code movement separate for easy review.
      
      No functional changes.
      
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: x86@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      16ad0b63
    • Suganath Prabu's avatar
      scsi: mpt3sas: Use 63-bit DMA addressing on SAS35 HBA · 9e034c61
      Suganath Prabu authored
      commit df9a6061 upstream.
      
      Although SAS3 & SAS3.5 IT HBA controllers support 64-bit DMA addressing, as
      per hardware design, if DMA-able range contains all 64-bits
      set (0xFFFFFFFF-FFFFFFFF) then it results in a firmware fault.
      
      E.g. SGE's start address is 0xFFFFFFFF-FFFFF000 and data length is 0x1000
      bytes. When the HBA tries to DMA the data at the 0xFFFFFFFF-FFFFFFFF
      location, the HBA will fault the firmware.
      
      Driver will set 63-bit DMA mask to ensure the above address will not be
      used.
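      
      The arithmetic behind the 63-bit mask can be sketched as follows. This
      is a stand-alone illustration rather than driver code:
      toy_dma_bit_mask() mirrors the idea of the kernel's DMA_BIT_MASK()
      macro, and toy_range_fits() is a hypothetical helper that checks
      whether the last byte of a buffer stays under a given mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low n bits set, for 1 <= n <= 64. */
static uint64_t toy_dma_bit_mask(unsigned int n)
{
	return n == 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

/*
 * True if every byte of [start, start + len) is addressable under mask.
 * len must be at least 1.
 */
static bool toy_range_fits(uint64_t start, uint64_t len, uint64_t mask)
{
	uint64_t last = start + (len - 1);

	if (last < start) /* the range wrapped past the top of the space */
		return false;
	return last <= mask;
}
```

      A 0x1000-byte buffer starting at 0xFFFFFFFFFFFFF000 passes the check
      under a 64-bit mask, since its last byte is the all-ones address, but
      is rejected under a 63-bit mask, which is the effect the patch relies
      on.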
      
      Cc: <stable@vger.kernel.org> # 4.19.63
      Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e034c61
    • Andy Lutomirski's avatar
      x86/vdso: Prevent segfaults due to hoisted vclock reads · 3732a473
      Andy Lutomirski authored
      commit ff17bbe0 upstream.
      
      GCC 5.5.0 sometimes cleverly hoists reads of the pvclock and/or hvclock
      pages before the vclock mode checks.  This creates a path through
      vclock_gettime() in which no vclock is enabled at all (due to disabled
      TSC on old CPUs, for example) but the pvclock or hvclock page is
      nevertheless read.  This will segfault on bare metal.
      
      This fixes commit 459e3a21 ("gcc-9: properly declare the
      {pv,hv}clock_page storage") in the sense that, before that commit, GCC
      didn't seem to generate the offending code.  There was nothing wrong
      with that commit per se, and -stable maintainers should backport this to
      all supported kernels regardless of whether the offending commit was
      present, since the same crash could just as easily be triggered by the
      phase of the moon.
      
      On GCC 9.1.1, this doesn't seem to affect the generated code at all, so
      I'm not too concerned about performance regressions from this fix.
      
      Cc: stable@vger.kernel.org
      Cc: x86@kernel.org
      Cc: Borislav Petkov <bp@alien8.de>
      Reported-by: Duncan Roe <duncan_roe@optusnet.com.au>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3732a473
    • Linus Torvalds's avatar
      gcc-9: properly declare the {pv,hv}clock_page storage · 8320768d
      Linus Torvalds authored
      commit 459e3a21 upstream.
      
      The pvclock_page and hvclock_page variables are (as the name implies)
      addresses to pages, created by the linker script.
      
      But we declared them as just "extern u8" variables, which _works_, but
      now that gcc does some more bounds checking, it causes warnings like
      
          warning: array subscript 1 is outside array bounds of ‘u8[1]’
      
      when we then access more than one byte from those variables.
      
      Fix this by simply making the declaration of the variables match
      reality, which makes the compiler happy too.
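      
      The difference between the two declaration styles can be illustrated
      outside the kernel. In this sketch the page is defined locally so the
      example links on its own, whereas in the vDSO the storage comes from
      the linker script; toy_clock_page, TOY_PAGE_SIZE and toy_read_u32() are
      made-up names for illustration.

```c
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

typedef uint8_t u8;

/*
 * Declaring the symbol as "extern u8 toy_clock_page;" would tell the
 * compiler the object is a single byte, so any access past offset 0
 * looks out of bounds to it. Declaring it as an array states the real
 * size of the storage.
 */
extern u8 toy_clock_page[TOY_PAGE_SIZE];

/* Local definition standing in for linker-script-provided storage. */
u8 toy_clock_page[TOY_PAGE_SIZE];

/* Read 4 bytes at offset; the caller keeps offset + 4 within the page. */
static uint32_t toy_read_u32(size_t offset)
{
	uint32_t value;

	memcpy(&value, &toy_clock_page[offset], sizeof(value));
	return value;
}
```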
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8320768d
    • Josh Poimboeuf's avatar
      objtool: Support GCC 9 cold subfunction naming scheme · 354887ae
      Josh Poimboeuf authored
      commit bcb6fb5d upstream.
      
      Starting with GCC 8, a lot of unlikely code was moved out of line to
      "cold" subfunctions in .text.unlikely.
      
      For example, the unlikely bits of:
      
        irq_do_set_affinity()
      
      are moved out to the following subfunction:
      
        irq_do_set_affinity.cold.49()
      
      Starting with GCC 9, the numbered suffix has been removed.  So in the
      above example, the cold subfunction is instead:
      
        irq_do_set_affinity.cold()
      
      Tweak the objtool subfunction detection logic so that it detects both
      GCC 8 and GCC 9 naming schemes.
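      
      A name check that accepts both schemes might look like the sketch
      below. It is not objtool's implementation: toy_cold_parent() is a
      hypothetical helper that recovers the parent function name from either
      "<parent>.cold.<N>" (GCC 8) or "<parent>.cold" (GCC 9).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * If name looks like "<parent>.cold" or "<parent>.cold.<suffix>", copy
 * <parent> into buf (NUL-terminated) and return true. Return false for
 * any other name or if buf is too small.
 */
static bool toy_cold_parent(const char *name, char *buf, size_t buflen)
{
	const char *cold = strstr(name, ".cold");
	size_t len;

	if (!cold || cold == name)
		return false;

	/* After ".cold" allow only end of string (GCC 9) or '.' (GCC 8). */
	if (cold[5] != '\0' && cold[5] != '.')
		return false;

	len = (size_t)(cold - name);
	if (len + 1 > buflen)
		return false;

	memcpy(buf, name, len);
	buf[len] = '\0';
	return true;
}
```

      Matching on the bare ".cold" marker instead of ".cold." is what lets
      one check cover both compilers; the extra character test only keeps
      unrelated names such as "foo.coldstart" from matching.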
      
      Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/015e9544b1f188d36a7f02fa31e9e95629aa5f50.1541040800.git.jpoimboe@redhat.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      354887ae