  1. Dec 01, 2021
    • mlxsw: Verify the accessed index doesn't exceed the array length · 33d89128
      Danielle Ratson authored
      
      
      [ Upstream commit 837ec05c ]
      
      There are a few cases in which an array index queried from a firmware
      register is used without any validation that it doesn't exceed the
      array length.
      
      Add a proper length validation, so accessing memory past the end of an
      array will be forbidden.
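
      The kind of check described above can be sketched in plain userspace C.
      This is only an illustration: MAX_PORTS, struct port and port_lookup()
      are hypothetical names, not taken from the mlxsw driver.

```c
#include <stddef.h>

#define MAX_PORTS 4 /* hypothetical array length */

struct port {
	int id;
};

static struct port *ports[MAX_PORTS];

/*
 * Look up a port by an index that came from an untrusted source
 * (for example a value read back from a firmware register).
 * Returns NULL instead of reading past the end of the array.
 */
static struct port *port_lookup(unsigned int local_port)
{
	if (local_port >= MAX_PORTS)
		return NULL;
	return ports[local_port];
}
```

      A caller would then treat a NULL result as an error rather than
      dereferencing whatever happens to sit behind the array.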
      
      Signed-off-by: Danielle Ratson <danieller@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/smc: Ensure the active closing peer first closes clcsock · 29e1b573
      Tony Lu authored
      [ Upstream commit 606a63c9 ]
      
      The clcsock of the side that actively closes the socket doesn't enter
      TIME_WAIT state, but that of the passive side does. It should show the
      same behavior as TCP sockets.
      
      Consider this: when the client actively closes the socket, the clcsock
      on the server enters TIME_WAIT state, which means the address is
      occupied and won't be reused until TIME_WAIT expires. If we restarted
      the server, the service would be unavailable for a long time.
      
      To solve this issue, shut down the clcsock in [A] and perform the TCP
      active close process first, before the passively closing side closes
      it, so that the actively closing side enters TIME_WAIT, not the
      passive one.
      
      Client                                            |  Server
      close() // client actively close                  |
        smc_release()                                   |
            smc_close_active() // PEERCLOSEWAIT1        |
                smc_close_final() // abort or closed = 1|
                    smc_cdc_get_slot_and_msg_send()     |
                [A]                                     |
                                                        |smc_cdc_msg_recv_action() // ACTIVE
                                                        |  queue_work(smc_close_wq, &conn->close_work)
                                                        |    smc_close_passive_work() // PROCESSABORT or APPCLOSEWAIT1
                                                        |      smc_close_passive_abort_received() // only in abort
                                                        |
                                                        |close() // server recv zero, close
                                                        |  smc_release() // PROCESSABORT or APPCLOSEWAIT1
                                                        |    smc_close_active()
                                                        |      smc_close_abort() or smc_close_final() // CLOSED
                                                        |        smc_cdc_get_slot_and_msg_send() // abort or closed = 1
      smc_cdc_msg_recv_action()                         |    smc_clcsock_release()
        queue_work(smc_close_wq, &conn->close_work)     |      sock_release(tcp) // actively close clc, enter TIME_WAIT
          smc_close_passive_work() // PEERCLOSEWAIT1    |    smc_conn_free()
            smc_close_passive_abort_received() // CLOSED|
            smc_conn_free()                             |
            smc_clcsock_release()                       |
              sock_release(tcp) // passive close clc    |
      
      Link: https://www.spinics.net/lists/netdev/msg780407.html
      
      
      Fixes: b38d7324 ("smc: socket closing and linkgroup cleanup")
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • erofs: fix deadlock when shrink erofs slab · 77d9c2ef
      Huang Jianan authored
      [ Upstream commit 57bbeacd ]
      
      We observed the following deadlock in the stress test under low
      memory scenario:
      
      Thread A                               Thread B
      - erofs_shrink_scan
       - erofs_try_to_release_workgroup
        - erofs_workgroup_try_to_freeze -- A
                                             - z_erofs_do_read_page
                                              - z_erofs_collection_begin
                                               - z_erofs_register_collection
                                                - erofs_insert_workgroup
                                                 - xa_lock(&sbi->managed_pslots) -- B
                                                 - erofs_workgroup_get
                                                  - erofs_wait_on_workgroup_freezed -- A
        - xa_erase
         - xa_lock(&sbi->managed_pslots) -- B
      
      To fix this, xa_lock must be held before freezing the workgroup, since
      the XArray is touched afterwards. So let's hold the lock before
      accessing each workgroup, just as we did with the radix tree
      before.
      
      [ Gao Xiang: Jianhua Hao also reports this issue at
        https://lore.kernel.org/r/b10b85df30694bac8aadfe43537c897a@xiaomi.com ]
      
      Link: https://lore.kernel.org/r/20211118135844.3559-1-huangjianan@oppo.com
      
      
      Fixes: 64094a04 ("erofs: convert workstn to XArray")
      Reviewed-by: Chao Yu <chao@kernel.org>
      Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      Signed-off-by: Huang Jianan <huangjianan@oppo.com>
      Reported-by: Jianhua Hao <haojianhua1@xiaomi.com>
      Signed-off-by: Gao Xiang <xiang@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • scsi: scsi_debug: Zero clear zones at reset write pointer · 9f540c7f
      Shin'ichiro Kawasaki authored
      [ Upstream commit 2d62253e ]
      
      When a reset is requested the position of the write pointer is updated but
      the data in the corresponding zone is not cleared. Instead scsi_debug
      returns any data written before the write pointer was reset. This is an
      error and prevents using scsi_debug for stale page cache testing of the
      BLKRESETZONE ioctl.
      
      Zero written data in the zone when resetting the write pointer.
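
      As a rough userspace sketch of the behavior described above (not the
      scsi_debug code itself), a reset can zero everything between the zone
      start and the old write pointer. struct zone, SECTOR_SIZE and
      zone_reset_wp() are hypothetical names chosen for the illustration.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u /* hypothetical sector size for the sketch */

struct zone {
	uint64_t start; /* first sector of the zone */
	uint64_t wp;    /* write pointer: next sector to be written */
};

/*
 * Reset the write pointer of a zone and zero the sectors that had been
 * written, so a later read cannot return stale data.
 * 'store' is the backing memory of the whole emulated device.
 */
static void zone_reset_wp(struct zone *z, uint8_t *store)
{
	if (z->wp > z->start)
		memset(store + z->start * SECTOR_SIZE, 0,
		       (z->wp - z->start) * SECTOR_SIZE);
	z->wp = z->start;
}
```

      Only the written part of the zone is cleared; data outside the zone is
      left untouched.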
      
      Link: https://lore.kernel.org/r/20211122061223.298890-1-shinichiro.kawasaki@wdc.com
      
      
      Fixes: f0d1cf93 ("scsi: scsi_debug: Add ZBC zone commands")
      Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Acked-by: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • scsi: core: sysfs: Fix setting device state to SDEV_RUNNING · 725ba128
      Mike Christie authored
      [ Upstream commit eb97545d ]
      
      This fixes an issue added in commit 4edd8cd4 ("scsi: core: sysfs: Fix
      hang when device state is set via sysfs") where, if userspace requests
      to set the device state to SDEV_RUNNING when the state is already
      SDEV_RUNNING, we return -EINVAL instead of count. The commit above set
      ret to count for this case, when it should have set it to 0.
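
      The shape of the bug can be sketched in userspace C. This assumes, based
      only on the description above, that the handler translates ret into
      count or -EINVAL at the very end; enum dev_state, set_state() and
      store_state() are hypothetical names, not the SCSI core functions.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

enum dev_state { DEV_OFFLINE, DEV_RUNNING }; /* hypothetical states */

/* Hypothetical state setter: 0 on success, negative errno on failure. */
static int set_state(enum dev_state *cur, enum dev_state next)
{
	*cur = next;
	return 0;
}

/*
 * Shape of a sysfs-style store handler: it must return the number of
 * bytes consumed on success. 'ret' holds 0 or a negative errno and is
 * only translated to 'count' at the very end.
 */
static ssize_t store_state(enum dev_state *cur, enum dev_state next,
			   size_t count)
{
	int ret;

	if (*cur == DEV_RUNNING && next == DEV_RUNNING)
		ret = 0; /* nothing to do; ret = count here would fail below */
	else
		ret = set_state(cur, next);

	return ret == 0 ? (ssize_t)count : -EINVAL;
}
```

      With ret set to count in the first branch, the final check would see a
      non-zero value and report -EINVAL even though nothing went wrong.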
      
      Link: https://lore.kernel.org/r/20211120164917.4924-1-michael.christie@oracle.com
      
      
      Fixes: 4edd8cd4 ("scsi: core: sysfs: Fix hang when device state is set via sysfs")
      Reviewed-by: Lee Duncan <lduncan@suse.com>
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ice: avoid bpf_prog refcount underflow · e65a8707
      Marta Plantykow authored
      
      
      [ Upstream commit f65ee535 ]
      
      The ice driver has routines for managing XDP resources that are shared
      between the ndo_bpf op and the VSI rebuild flow. The latter takes
      place, for example, when the user changes the queue count on an
      interface via ethtool's set_channels().
      
      There is an issue around the bpf_prog refcounting when the VSI is being
      rebuilt: since ice_prepare_xdp_rings() is called with vsi->xdp_prog as
      an argument that is used later on by ice_vsi_assign_bpf_prog(), the
      same bpf_prog pointer is swapped with itself. It is then also
      interpreted as an 'old_prog', which in turn causes us to call
      bpf_prog_put() on it, decrementing its refcount.
      
      The splat below can be read as follows: because its refcount dropped to
      zero, the bpf_prog was freed while the kernel still tries to refer to
      it:
      
      [  481.069429] BUG: unable to handle page fault for address: ffffc9000640f038
      [  481.077390] #PF: supervisor read access in kernel mode
      [  481.083335] #PF: error_code(0x0000) - not-present page
      [  481.089276] PGD 100000067 P4D 100000067 PUD 1001cb067 PMD 106d2b067 PTE 0
      [  481.097141] Oops: 0000 [#1] PREEMPT SMP PTI
      [  481.101980] CPU: 12 PID: 3339 Comm: sudo Tainted: G           OE     5.15.0-rc5+ #1
      [  481.110840] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.1605190235 05/19/2016
      [  481.122021] RIP: 0010:dev_xdp_prog_id+0x25/0x40
      [  481.127265] Code: 80 00 00 00 00 0f 1f 44 00 00 89 f6 48 c1 e6 04 48 01 fe 48 8b 86 98 08 00 00 48 85 c0 74 13 48 8b 50 18 31 c0 48 85 d2 74 07 <48> 8b 42 38 8b 40 20 c3 48 8b 96 90 08 00 00 eb e8 66 2e 0f 1f 84
      [  481.148991] RSP: 0018:ffffc90007b63868 EFLAGS: 00010286
      [  481.155034] RAX: 0000000000000000 RBX: ffff889080824000 RCX: 0000000000000000
      [  481.163278] RDX: ffffc9000640f000 RSI: ffff889080824010 RDI: ffff889080824000
      [  481.171527] RBP: ffff888107af7d00 R08: 0000000000000000 R09: ffff88810db5f6e0
      [  481.179776] R10: 0000000000000000 R11: ffff8890885b9988 R12: ffff88810db5f4bc
      [  481.188026] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      [  481.196276] FS:  00007f5466d5bec0(0000) GS:ffff88903fb00000(0000) knlGS:0000000000000000
      [  481.205633] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  481.212279] CR2: ffffc9000640f038 CR3: 000000014429c006 CR4: 00000000003706e0
      [  481.220530] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  481.228771] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  481.237029] Call Trace:
      [  481.239856]  rtnl_fill_ifinfo+0x768/0x12e0
      [  481.244602]  rtnl_dump_ifinfo+0x525/0x650
      [  481.249246]  ? __alloc_skb+0xa5/0x280
      [  481.253484]  netlink_dump+0x168/0x3c0
      [  481.257725]  netlink_recvmsg+0x21e/0x3e0
      [  481.262263]  ____sys_recvmsg+0x87/0x170
      [  481.266707]  ? __might_fault+0x20/0x30
      [  481.271046]  ? _copy_from_user+0x66/0xa0
      [  481.275591]  ? iovec_from_user+0xf6/0x1c0
      [  481.280226]  ___sys_recvmsg+0x82/0x100
      [  481.284566]  ? sock_sendmsg+0x5e/0x60
      [  481.288791]  ? __sys_sendto+0xee/0x150
      [  481.293129]  __sys_recvmsg+0x56/0xa0
      [  481.297267]  do_syscall_64+0x3b/0xc0
      [  481.301395]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  481.307238] RIP: 0033:0x7f5466f39617
      [  481.311373] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bd 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2f 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
      [  481.342944] RSP: 002b:00007ffedc7f4308 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
      [  481.361783] RAX: ffffffffffffffda RBX: 00007ffedc7f5460 RCX: 00007f5466f39617
      [  481.380278] RDX: 0000000000000000 RSI: 00007ffedc7f5360 RDI: 0000000000000003
      [  481.398500] RBP: 00007ffedc7f53f0 R08: 0000000000000000 R09: 000055d556f04d50
      [  481.416463] R10: 0000000000000077 R11: 0000000000000246 R12: 00007ffedc7f5360
      [  481.434131] R13: 00007ffedc7f5350 R14: 00007ffedc7f5344 R15: 0000000000000e98
      [  481.451520] Modules linked in: ice(OE) af_packet binfmt_misc nls_iso8859_1 ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp mxm_wmi mei_me coretemp mei ipmi_si ipmi_msghandler wmi acpi_pad acpi_power_meter ip_tables x_tables autofs4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel ahci crypto_simd cryptd libahci lpc_ich [last unloaded: ice]
      [  481.528558] CR2: ffffc9000640f038
      [  481.542041] ---[ end trace d1f24c9ecf5b61c1 ]---
      
      Fix this by only calling ice_vsi_assign_bpf_prog() inside
      ice_prepare_xdp_rings() when current vsi->xdp_prog pointer is NULL.
      This way set_channels() flow will not attempt to swap the vsi->xdp_prog
      pointers with itself.
      
      Also, add some comments that explain how the driver and the kernel
      interact in terms of the bpf_prog refcount.
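
      The underflow and the guard can be modeled with a simplified userspace
      sketch. struct prog, struct vsi, vsi_assign_prog() and
      vsi_prepare_rings() are hypothetical stand-ins for the driver code, and
      the plain integer counter only imitates a real reference count.

```c
#include <stddef.h>

struct prog {
	int refcnt; /* stand-in for the bpf_prog reference count */
};

static void prog_put(struct prog *p)
{
	p->refcnt--;
}

struct vsi {
	struct prog *xdp_prog;
};

/*
 * Swap in a new program and drop the reference held on the old one.
 * If 'prog' is the program already attached, the "old" program is the
 * same object, and the put below drops a reference nobody handed over.
 */
static void vsi_assign_prog(struct vsi *vsi, struct prog *prog)
{
	struct prog *old = vsi->xdp_prog;

	vsi->xdp_prog = prog;
	if (old)
		prog_put(old);
}

/* Rebuild path: only assign when nothing is attached yet. */
static void vsi_prepare_rings(struct vsi *vsi, struct prog *prog)
{
	if (!vsi->xdp_prog)
		vsi_assign_prog(vsi, prog);
}
```

      Calling vsi_prepare_rings() with the already attached program leaves the
      counter alone, while calling vsi_assign_prog() directly in that
      situation decrements it.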
      
      Fixes: efc2214b ("ice: Add support for XDP")
      Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: Marta Plantykow <marta.a.plantykow@intel.com>
      Co-developed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ice: fix vsi->txq_map sizing · 1eb5395a
      Maciej Fijalkowski authored
      
      
      [ Upstream commit 792b2086 ]
      
      The approach of having an XDP queue per CPU regardless of the user's
      setting exposed a hidden bug that can occur when the Rx queue count
      differs from the Tx queue count. Currently vsi->txq_map's size is equal
      to double vsi->alloc_txq, which is not correct because XDP rings were
      previously based on the Rx queue count. The splat below can be seen
      when ethtool -L is used and XDP rings are configured:
      
      [  682.875339] BUG: kernel NULL pointer dereference, address: 000000000000000f
      [  682.883403] #PF: supervisor read access in kernel mode
      [  682.889345] #PF: error_code(0x0000) - not-present page
      [  682.895289] PGD 0 P4D 0
      [  682.898218] Oops: 0000 [#1] PREEMPT SMP PTI
      [  682.903055] CPU: 42 PID: 2878 Comm: ethtool Tainted: G           OE     5.15.0-rc5+ #1
      [  682.912214] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.1605190235 05/19/2016
      [  682.923380] RIP: 0010:devres_remove+0x44/0x130
      [  682.928527] Code: 49 89 f4 55 48 89 fd 4c 89 ff 53 48 83 ec 10 e8 92 b9 49 00 48 8b 9d a8 02 00 00 48 8d 8d a0 02 00 00 49 89 c2 48 39 cb 74 0f <4c> 3b 63 10 74 25 48 8b 5b 08 48 39 cb 75 f1 4c 89 ff 4c 89 d6 e8
      [  682.950237] RSP: 0018:ffffc90006a679f0 EFLAGS: 00010002
      [  682.956285] RAX: 0000000000000286 RBX: ffffffffffffffff RCX: ffff88908343a370
      [  682.964538] RDX: 0000000000000001 RSI: ffffffff81690d60 RDI: 0000000000000000
      [  682.972789] RBP: ffff88908343a0d0 R08: 0000000000000000 R09: 0000000000000000
      [  682.981040] R10: 0000000000000286 R11: 3fffffffffffffff R12: ffffffff81690d60
      [  682.989282] R13: ffffffff81690a00 R14: ffff8890819807a8 R15: ffff88908343a36c
      [  682.997535] FS:  00007f08c7bfa740(0000) GS:ffff88a03fd00000(0000) knlGS:0000000000000000
      [  683.006910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  683.013557] CR2: 000000000000000f CR3: 0000001080a66003 CR4: 00000000003706e0
      [  683.021819] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  683.030075] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  683.038336] Call Trace:
      [  683.041167]  devm_kfree+0x33/0x50
      [  683.045004]  ice_vsi_free_arrays+0x5e/0xc0 [ice]
      [  683.050380]  ice_vsi_rebuild+0x4c8/0x750 [ice]
      [  683.055543]  ice_vsi_recfg_qs+0x9a/0x110 [ice]
      [  683.060697]  ice_set_channels+0x14f/0x290 [ice]
      [  683.065962]  ethnl_set_channels+0x333/0x3f0
      [  683.070807]  genl_family_rcv_msg_doit+0xea/0x150
      [  683.076152]  genl_rcv_msg+0xde/0x1d0
      [  683.080289]  ? channels_prepare_data+0x60/0x60
      [  683.085432]  ? genl_get_cmd+0xd0/0xd0
      [  683.089667]  netlink_rcv_skb+0x50/0xf0
      [  683.094006]  genl_rcv+0x24/0x40
      [  683.097638]  netlink_unicast+0x239/0x340
      [  683.102177]  netlink_sendmsg+0x22e/0x470
      [  683.106717]  sock_sendmsg+0x5e/0x60
      [  683.110756]  __sys_sendto+0xee/0x150
      [  683.114894]  ? handle_mm_fault+0xd0/0x2a0
      [  683.119535]  ? do_user_addr_fault+0x1f3/0x690
      [  683.134173]  __x64_sys_sendto+0x25/0x30
      [  683.148231]  do_syscall_64+0x3b/0xc0
      [  683.161992]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fix this by taking into account the value that num_possible_cpus()
      yields in addition to vsi->alloc_txq instead of doubling the latter.
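
      The sizing difference is simple arithmetic, sketched here in userspace C
      with hypothetical helper names. It assumes one XDP ring per possible
      CPU, as the description above suggests.

```c
#include <stddef.h>

/* Old sizing: assumes there are exactly as many XDP rings as Tx queues. */
static size_t txq_map_len_old(size_t alloc_txq)
{
	return 2 * alloc_txq;
}

/* New sizing: regular Tx queues plus one XDP ring per possible CPU. */
static size_t txq_map_len_new(size_t alloc_txq, size_t possible_cpus)
{
	return alloc_txq + possible_cpus;
}
```

      For example, with 4 Tx queues on a machine with 16 possible CPUs, the
      old formula yields 8 entries while 4 + 16 = 20 rings need a slot.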
      
      Fixes: efc2214b ("ice: Add support for XDP")
      Fixes: 22bf877e ("ice: introduce XDP_TX fallback path")
      Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group · 26ed13d0
      Nikolay Aleksandrov authored
      
      
      [ Upstream commit 1005f19b ]
      
      When replacing a nexthop group, we must release the IPv6 per-cpu dsts of
      the removed nexthop entries after an RCU grace period because they
      contain references to the nexthop's net device and to the fib6 info.
      With specific series of events[1] we can reach net device refcount
      imbalance which is unrecoverable. IPv4 is not affected because dsts
      don't take a refcount on the route.
      
      [1]
       $ ip nexthop list
        id 200 via 2002:db8::2 dev bridge.10 scope link onlink
        id 201 via 2002:db8::3 dev bridge scope link onlink
        id 203 group 201/200
       $ ip -6 route
        2001:db8::10 nhid 203 metric 1024 pref medium
           nexthop via 2002:db8::3 dev bridge weight 1 onlink
           nexthop via 2002:db8::2 dev bridge.10 weight 1 onlink
      
      Create rt6_info through one of the multipath legs, e.g.:
       $ taskset -a -c 1  ./pkt_inj 24 bridge.10 2001:db8::10
       (pkt_inj is just a custom packet generator, nothing special)
      
      Then remove that leg from the group by replace (let's assume it is id
      200 in this case):
       $ ip nexthop replace id 203 group 201
      
      Now remove the IPv6 route:
       $ ip -6 route del 2001:db8::10/128
      
      The route won't be really deleted due to the stale rt6_info holding 1
      refcnt in nexthop id 200.
      At this point we have the following reference count dependency:
       (deleted) IPv6 route holds 1 reference over nhid 203
       nh 203 holds 1 ref over id 201
       nh 200 holds 1 ref over the net device and the route due to the stale
       rt6_info
      
      Now to create circular dependency between nh 200 and the IPv6 route, and
      also to get a reference over nh 200, restore nhid 200 in the group:
       $ ip nexthop replace id 203 group 201/200
      
      And now we have a permanent circular dependency because nhid 203 holds a
      reference over nh 200 and 201, but the route holds a ref over nh 203 and
      is deleted.
      
      To trigger the bug just delete the group (nhid 203):
       $ ip nexthop del id 203
      
      It won't really be deleted due to the IPv6 route dependency, and now we
      have 2 unlinked and deleted objects that reference each other: the group
      and the IPv6 route. Since the group drops the reference it holds over its
      entries at free time (i.e. its own refcount needs to drop to 0) that will
      never happen and we get a permanent ref on them, since one of the entries
      holds a reference over the IPv6 route it will also never be released.
      
      At this point the dependencies are:
       (deleted, only unlinked) IPv6 route holds reference over group nh 203
       (deleted, only unlinked) group nh 203 holds reference over nh 201 and 200
       nh 200 holds 1 ref over the net device and the route due to the stale
       rt6_info
      
      This is the last point where it can be fixed by running traffic through
      nh 200, and specifically through the same CPU so the rt6_info (dst) will
      get released due to the IPv6 genid, that in turn will free the IPv6
      route, which in turn will free the ref count over the group nh 203.
      
      If nh 200 is deleted at this point, it will never be released due to the
      ref from the unlinked group 203, it will only be unlinked:
       $ ip nexthop del id 200
       $ ip nexthop
       $
      
      Now we can never release that stale rt6_info, we have IPv6 route with ref
      over group nh 203, group nh 203 with ref over nh 200 and 201, nh 200 with
      rt6_info (dst) with ref over the net device and the IPv6 route. All of
      these objects are only unlinked, and cannot be released, thus they can't
      release their ref counts.
      
       Message from syslogd@dev at Nov 19 14:04:10 ...
        kernel:[73501.828730] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3
       Message from syslogd@dev at Nov 19 14:04:20 ...
        kernel:[73512.068811] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3
      
      Fixes: 7bf4796d ("nexthops: add support for replace")
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: ipv6: add fib6_nh_release_dsts stub · 3c405845
      Nikolay Aleksandrov authored
      
      
      [ Upstream commit 8837cbbf ]
      
      We need a way to release a fib6_nh's per-cpu dsts when replacing
      nexthops otherwise we can end up with stale per-cpu dsts which hold net
      device references, so add a new IPv6 stub called fib6_nh_release_dsts.
      It must be used after an RCU grace period, so no new dsts can be created
      through a group's nexthop entry.
      Similar to fib6_nh_release it shouldn't be used if fib6_nh_init has failed
      so it doesn't need a dummy stub when IPv6 is not enabled.
      
      Fixes: 7bf4796d ("nexthops: add support for replace")
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: stmmac: retain PTP clock time during SIOCSHWTSTAMP ioctls · dc2f7e9d
      Holger Assmann authored
      
      
      [ Upstream commit a6da2bbb ]
      
      Currently, when user space emits SIOCSHWTSTAMP ioctl calls such as
      enabling/disabling timestamping or changing filter settings, the driver
      reads the current CLOCK_REALTIME value and programs it into the
      NIC's hardware clock. This might be necessary during system
      initialization, but at runtime, when the PTP clock has already been
      synchronized to a grandmaster, a reset of the timestamp settings might
      result in a clock jump. Furthermore, if the clock is also controlled by
      phc2sys in automatic mode (where the UTC offset is queried from ptp4l),
      that UTC-to-TAI offset (currently 37 seconds in 2021) would be
      temporarily reset to 0, and it would take a long time for phc2sys to
      readjust so that CLOCK_REALTIME and the PHC are apart by 37 seconds
      again.
      
      To address the issue, we introduce a new function called
      stmmac_init_tstamp_counter(), which gets called during ndo_open().
      It contains the code snippet moved from stmmac_hwtstamp_set() that
      manages the time synchronization. Besides, the sub second increment
      configuration is also moved here since the related values are hardware
      dependent and runtime invariant.
      
      Furthermore, the hardware clock must be kept running even when no time
      stamping mode is selected in order to retain the synchronized time base.
      That way, timestamping can be enabled again at any time only with the
      need to compensate the clock's natural drifting.
      
      As a side effect, this patch fixes the issue that ptp_clock_info::enable
      can be called before SIOCSHWTSTAMP and the driver (which looks at
      priv->systime_flags) was not prepared to handle that ordering.
      
      Fixes: 92ba6888 ("stmmac: add the support for PTP hw clock driver")
      Reported-by: Michael Olbrich <m.olbrich@pengutronix.de>
      Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
      Signed-off-by: Holger Assmann <h.assmann@pengutronix.de>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: stmmac: fix system hang caused by eee_ctrl_timer during suspend/resume · 79068e6b
      Joakim Zhang authored
      
      
      [ Upstream commit 276aae37 ]
      
      Commit 5f585913 ("net: stmmac: delete the eee_ctrl_timer after
      napi disabled") tried to fix a system hang caused by eee_ctrl_timer.
      Unfortunately, it only resolves the hang for the system reboot stress
      test. The hang can still be reproduced easily during a system
      suspend/resume stress test with NFS mounted on the i.MX8MP EVK board.
      
      In the stmmac driver, the EEE feature is tied into the phylink
      framework. On system suspend, phylink_stop() queues delayed work that
      invokes stmmac_mac_link_down(), which deactivates eee_ctrl_timer
      synchronously. The above commit tried to fix the issue by deactivating
      eee_ctrl_timer explicitly, but that is not enough. The eee_ctrl_timer
      expiry callback, stmmac_eee_ctrl_timer(), can enable hardware EEE mode
      again. What is unexpected is that the LPI interrupt
      (MAC_Interrupt_Enable.LPIEN bit) is always asserted. This interrupt can
      be raised when the MAC enters or exits the LPI state, and at that time
      the clock could already have been disabled. The result is a system hang
      when the driver tries to touch a register from the interrupt handler.
      
      The reason the above commit fixes the system hang in stmmac_release()
      is that it deactivates eee_ctrl_timer not just after napi is disabled,
      but also after the irq is freed.
      
      In conclusion, the hardware can generate an LPI interrupt after the
      clock has been disabled during suspend or resume, since the hardware is
      in EEE mode and the LPI interrupt is enabled.
      
      Interrupts from the MAC, MTL and DMA levels are enabled and never
      disabled on system suspend, so postponing clock management from the
      suspend stage to the noirq suspend stage should be safer.
      
      Fixes: 5f585913 ("net: stmmac: delete the eee_ctrl_timer after napi disabled")
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • nfp: checking parameter process for rx-usecs/tx-usecs is invalid · cc301ad3
      Diana Wang authored
      
      
      [ Upstream commit 3bd6b2a8 ]
      
      Use nn->tlv_caps.me_freq_mhz instead of nn->me_freq_mhz to check whether
      rx-usecs/tx-usecs is valid.
      
      This is because nn->tlv_caps.me_freq_mhz represents the clock_freq (MHz) of
      the flow processing cores (FPC) on the NIC, while nn->me_freq_mhz is not
      set.
      
      Fixes: ce991ab6 ("nfp: read ME frequency from vNIC ctrl memory")
      Signed-off-by: Diana Wang <na.wang@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ipv6: fix typos in __ip6_finish_output() · 9b44cb67
      Eric Dumazet authored
      
      
      [ Upstream commit 19d36c5f ]
      
      We deal with IPv6 packets, so we need to use IP6CB(skb)->flags and
      IP6SKB_REROUTED, instead of IPCB(skb)->flags and IPSKB_REROUTED.
      
      Found by code inspection, please double check that fixing this bug
      does not surface other bugs.
      
      Fixes: 09ee9dba ("ipv6: Reinject IPv6 packets if IPsec policy matches after SNAT")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tobias Brunner <tobias@strongswan.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: David Ahern <dsahern@kernel.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Tested-by: Tobias Brunner <tobias@strongswan.org>
      Acked-by: Tobias Brunner <tobias@strongswan.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • firmware: smccc: Fix check for ARCH_SOC_ID not implemented · 6d9e8dab
      Michael Kelley authored
      
      
      [ Upstream commit e95d8eae ]
      
      The ARCH_FEATURES function ID is a 32-bit SMC call, which returns
      a 32-bit result per the SMCCC spec.  Current code is doing a 64-bit
      comparison against -1 (SMCCC_RET_NOT_SUPPORTED) to detect that the
      feature is unimplemented.  That check doesn't work in a Hyper-V VM,
      where the upper 32-bits are zero as allowed by the spec.
      
      Cast the result as an 'int' so the comparison works. The change also
      makes the code consistent with other similar checks in this file.
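
      Why the 64-bit comparison fails can be shown with a small userspace
      sketch. The helper names and RET_NOT_SUPPORTED are hypothetical; the
      latter only mirrors the -1 value of SMCCC_RET_NOT_SUPPORTED mentioned
      above. The narrowing casts rely on the usual two's-complement wrapping
      that common compilers implement.

```c
#include <stdint.h>

#define RET_NOT_SUPPORTED (-1) /* mirrors SMCCC_RET_NOT_SUPPORTED */

/* 64-bit comparison: misses a 32-bit -1 whose upper half is zero. */
static int not_supported_64(uint64_t a0)
{
	return (int64_t)a0 == RET_NOT_SUPPORTED;
}

/* Truncate to a signed 32-bit value first, then compare. */
static int not_supported_32(uint64_t a0)
{
	return (int32_t)a0 == RET_NOT_SUPPORTED;
}
```

      A register value of 0x00000000FFFFFFFF is a 32-bit -1 with zeroed upper
      bits: the 64-bit check sees 4294967295 and does not match, while the
      32-bit check does.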
      
      Fixes: 821b67fa ("firmware: smccc: Add ARCH_SOC_ID support")
      Signed-off-by: Michael Kelley <mikelley@microsoft.com>
      Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mptcp: fix delack timer · bbd1683e
      Eric Dumazet authored
      
      
      [ Upstream commit ee50e67b ]
      
      To compute the rtx timeout schedule_3rdack_retransmission() does multiple
      things in the wrong way: srtt_us is measured in usec/8 and the timeout
      itself is an absolute value.
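
      For illustration only (this is not the code from the patch, and the
      margin the real fix applies on top of the RTT is not reproduced here):
      a userspace sketch of the two unit issues named above. The stored
      smoothed RTT is eight times the RTT in microseconds, and the timer
      expects an absolute deadline rather than a relative delay. The tick
      rate and all function names are assumptions.

```c
#include <stdint.h>

/* Assumed tick rate for this sketch; in the kernel this is the HZ setting. */
#define TICKS_PER_SEC 1000u

/* Convert microseconds to ticks, rounding up. */
uint64_t usecs_to_ticks(uint64_t usecs)
{
	return (usecs * TICKS_PER_SEC + 999999u) / 1000000u;
}

/*
 * srtt_us8 holds the smoothed RTT scaled by 8 (usec/8 units), so shift
 * right by 3 to get microseconds, then add the current time to turn the
 * relative delay into an absolute deadline.
 */
uint64_t rtx_deadline(uint64_t now_ticks, uint32_t srtt_us8)
{
	uint64_t rtt_us = srtt_us8 >> 3;

	return now_ticks + usecs_to_ticks(rtt_us);
}
```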
      
      Fixes: ec3edaa7 ("mptcp: Add handling of outgoing MP_JOIN requests")
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      bbd1683e
    • Pierre-Louis Bossart's avatar
      ALSA: intel-dsp-config: add quirk for JSL devices based on ES8336 codec · 06154281
      Pierre-Louis Bossart authored
      [ Upstream commit fa9730b4 ]
      
      These devices are based on an I2C/I2S device, so we need to force the use
      of the SOF driver; otherwise the legacy HDaudio driver will be loaded and
      only HDMI will be supported.
      
      We previously added support for other Intel platforms but missed
      JasperLake.
      
      BugLink: https://github.com/thesofproject/linux/issues/3210
      
      
      Fixes: 9d36ceab ('ALSA: intel-dsp-config: add quirk for APL/GLK/TGL devices based on ES8336 codec')
      Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
      Reviewed-by: Kai Vehmanen <kai.vehmanen@intel.com>
      Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
      Link: https://lore.kernel.org/r/20211027023254.24955-1-yung-chuan.liao@linux.intel.com


      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      06154281
    • Nitesh B Venkatesh's avatar
      iavf: Prevent changing static ITR values if adaptive moderation is on · f5af2def
      Nitesh B Venkatesh authored
      
      
      [ Upstream commit e792779e ]
      
      Prevent static ITR values from being changed on a VF while adaptive
      interrupt moderation is enabled.

      This is fixed by checking that the requested interrupt settings do not
      combine a change of a static value with adaptive interrupt moderation
      being turned on.
      
      Without this fix, the user would be able to change static values
      on VF with adaptive moderation enabled.
      
      Fixes: 65e87c03 ("i40evf: support queue-specific settings for interrupt moderation")
      Signed-off-by: Nitesh B Venkatesh <nitesh.b.venkatesh@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f5af2def
    • Volodymyr Mytnyk's avatar
      net: marvell: prestera: fix double free issue on err path · 5dca8eff
      Volodymyr Mytnyk authored
      
      
      [ Upstream commit e8d03250 ]
      
      Fix error path handling in prestera_bridge_port_join() that
      causes the prestera driver to crash (see below).
      
       Trace:
         Internal error: Oops: 96000044 [#1] SMP
         Modules linked in: prestera_pci prestera uio_pdrv_genirq
         CPU: 1 PID: 881 Comm: ip Not tainted 5.15.0 #1
         pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
         pc : prestera_bridge_destroy+0x2c/0xb0 [prestera]
         lr : prestera_bridge_port_join+0x2cc/0x350 [prestera]
         sp : ffff800011a1b0f0
         ...
         x2 : ffff000109ca6c80 x1 : dead000000000100 x0 : dead000000000122
          Call trace:
         prestera_bridge_destroy+0x2c/0xb0 [prestera]
         prestera_bridge_port_join+0x2cc/0x350 [prestera]
         prestera_netdev_port_event.constprop.0+0x3c4/0x450 [prestera]
         prestera_netdev_event_handler+0xf4/0x110 [prestera]
         raw_notifier_call_chain+0x54/0x80
         call_netdevice_notifiers_info+0x54/0xa0
         __netdev_upper_dev_link+0x19c/0x380
      
      Fixes: e1189d9a ("net: marvell: prestera: Add Switchdev driver implementation")
      Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5dca8eff
    • Dan Carpenter's avatar
      drm/vc4: fix error code in vc4_create_object() · b33c5c82
      Dan Carpenter authored
      
      
      [ Upstream commit 96c5f82e ]
      
      The ->gem_create_object() functions are supposed to return NULL if there
      is an error.  None of the callers expect error pointers so returning one
      will lead to an Oops.  See drm_gem_vram_create(), for example.
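
      For illustration only (this is not the code from the patch): a
      userspace sketch of the convention mismatch. The err_ptr()/is_err()
      helpers mimic the kernel's ERR_PTR()/IS_ERR() idea, and the object
      type, create_object() and use_object() are hypothetical. A caller that
      tests only for NULL would dereference an error pointer, so a hook with
      a "NULL on failure" contract must return NULL.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct object {
	int size;
};

/* Hypothetical creation hook that follows the "NULL on failure" contract. */
struct object *create_object(int size)
{
	struct object *obj;

	if (size <= 0)
		return NULL;	/* not err_ptr(-EINVAL): callers test for NULL */

	obj = calloc(1, sizeof(*obj));
	if (!obj)
		return NULL;

	obj->size = size;
	return obj;
}

/* Hypothetical caller: it checks only for NULL, never for error pointers. */
int use_object(int size)
{
	struct object *obj = create_object(size);
	int ret;

	if (!obj)
		return -ENOMEM;

	ret = obj->size;
	free(obj);
	return ret;
}
```

      Note that an error pointer is non-NULL, which is exactly why it slips
      past the caller's check.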
      
      Fixes: c826a6e1 ("drm/vc4: Add a BO cache.")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Maxime Ripard <maxime@cerno.tech>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211118111416.GC1147@kili


      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b33c5c82
    • Sreekanth Reddy's avatar
      scsi: mpt3sas: Fix kernel panic during drive powercycle test · 2bf9c5a5
      Sreekanth Reddy authored
      [ Upstream commit 0ee4ba13 ]
      
      While looping over shost's sdev list it is possible that one
      of the drives is getting removed and its sas_target object is
      freed but its sdev object remains intact.
      
      Consequently, a kernel panic can occur while the driver is trying to access
      the sas_address field of sas_target object without also checking the
      sas_target object for NULL.
      
      Link: https://lore.kernel.org/r/20211117104909.2069-1-sreekanth.reddy@broadcom.com
      
      
      Fixes: f92363d1 ("[SCSI] mpt3sas: add new driver supporting 12GB SAS")
      Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2bf9c5a5
    • Dan Carpenter's avatar
      drm/nouveau/acr: fix a couple NULL vs IS_ERR() checks · 29ecb4c0
      Dan Carpenter authored
      
      
      [ Upstream commit b371fd13 ]
      
      The nvkm_acr_lsfw_add() function never returns NULL.  It returns error
      pointers on error.
      
      Fixes: 22dcda45 ("drm/nouveau/acr: implement new subdev to replace "secure boot"")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
      Signed-off-by: Karol Herbst <kherbst@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211118111314.GB1147@kili


      Signed-off-by: Sasha Levin <sashal@kernel.org>
      29ecb4c0
    • Takashi Iwai's avatar
      ARM: socfpga: Fix crash with CONFIG_FORTIRY_SOURCE · 0effb7f5
      Takashi Iwai authored
      
      
      [ Upstream commit 187bea47 ]
      
      When CONFIG_FORTIFY_SOURCE is set, memcpy() checks for a potential
      buffer overflow and panics.  The memcpy() calls in the socfpga
      bootstrapping code are mistakenly evaluated against a shorter size,
      so they trigger a panic as if they were overflowing.
      
      This patch changes the secondary_trampoline and *_end definitions
      to arrays for avoiding the false-positive crash above.
      
      Fixes: 9c4566a1 ("ARM: socfpga: Enable SMP for socfpga")
      Suggested-by: Kees Cook <keescook@chromium.org>
      Buglink: https://bugzilla.suse.com/show_bug.cgi?id=1192473
      Link: https://lore.kernel.org/r/20211117193244.31162-1-tiwai@suse.de


      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0effb7f5
    • Trond Myklebust's avatar
      NFSv42: Don't fail clone() unless the OP_CLONE operation failed · 86c5adc7
      Trond Myklebust authored
      
      
      [ Upstream commit d3c45824 ]
      
      The failure to retrieve post-op attributes has no bearing on whether or
      not the clone operation itself was successful. We must therefore ignore
      the return value of decode_getfattr() when looking at the success or
      failure of nfs4_xdr_dec_clone().
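
      For illustration only (this is not the code from the patch): a tiny
      sketch of the rule stated above, with a hypothetical function that
      receives the status of the clone step and the status of the post-op
      attribute decode. Only the clone status decides the overall result.

```c
/*
 * Hypothetical stand-in for the decode path: clone_status is the result
 * of decoding the clone operation, getattr_status the result of decoding
 * the post-op attributes.
 */
int clone_decode_result(int clone_status, int getattr_status)
{
	if (clone_status)
		return clone_status;

	/* A failed attribute decode does not mean the clone failed. */
	(void)getattr_status;
	return 0;
}
```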
      
      Fixes: 36022770 ("nfs42: add CLONE xdr functions")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      86c5adc7
    • Peng Fan's avatar
      firmware: arm_scmi: pm: Propagate return value to caller · c9ba7864
      Peng Fan authored
      [ Upstream commit 1446fc6c ]
      
      of_genpd_add_provider_onecell() may return an error, so propagate
      its return value to the caller.
      
      Link: https://lore.kernel.org/r/20211116064227.20571-1-peng.fan@oss.nxp.com
      
      
      Fixes: 898216c9 ("firmware: arm_scmi: add device power domain support using genpd")
      Signed-off-by: Peng Fan <peng.fan@nxp.com>
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c9ba7864
    • Alexander Aring's avatar
      net: ieee802154: handle iftypes as u32 · 8730a679
      Alexander Aring authored
      
      
      [ Upstream commit 451dc48c ]
      
      This patch fixes an issue where a u32 netlink value is handled as a
      signed enum value, which does not fit into the range of the u32 netlink
      type. If it is handled as -1, some BIT() evaluations end in a
      shift-out-of-bounds issue. To solve this, we set the first enum value to
      u32 max, which is the s32 "-1" value, to keep backwards compatibility, and
      let the following enum values start counting at 0. This makes the compiler
      never treat the enum as signed, and a check whether the value is above
      NL802154_IFTYPE_MAX should filter -1 out.
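
      For illustration only (this is not the code from the patch): a
      userspace sketch of the range check. The IFTYPE_* names and values are
      hypothetical stand-ins, written as unsigned constants so that the
      "unspecified" value is u32 max (the same bit pattern as s32 -1). Any
      value above the maximum is rejected before it is used as a shift
      count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interface types, all unsigned 32-bit values. */
#define IFTYPE_UNSPEC  UINT32_MAX	/* same bits as s32 -1 */
#define IFTYPE_NODE    UINT32_C(0)
#define IFTYPE_MONITOR UINT32_C(1)
#define IFTYPE_COORD   UINT32_C(2)
#define IFTYPE_MAX     IFTYPE_COORD

/* A value above IFTYPE_MAX, including IFTYPE_UNSPEC, is not usable. */
bool iftype_valid(uint32_t type)
{
	return type <= IFTYPE_MAX;
}

/* Build the bit for a type; returns 0 instead of shifting out of bounds. */
uint32_t iftype_bit(uint32_t type)
{
	return iftype_valid(type) ? (UINT32_C(1) << type) : 0;
}
```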
      
      Fixes: f3ea5e44 ("ieee802154: add new interface command")
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Link: https://lore.kernel.org/r/20211112030916.685793-1-aahringo@redhat.com


      Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8730a679
    • Srinivas Kandagatla's avatar
      ASoC: codecs: wcd934x: return error code correctly from hw_params · 2925aadd
      Srinivas Kandagatla authored
      
      
      [ Upstream commit 006ea27c ]
      
      Errors returned from wcd934x_slim_set_hw_params() are not passed to the
      upper layer. This can mislead the user, who may start sending a stream,
      leading to unnecessary errors.
      
      Fix this by properly returning the errors.
      
      Fixes: a61f3b4f ("ASoC: wcd934x: add support to wcd9340/wcd9341 codec")
      Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Link: https://lore.kernel.org/r/20211116114623.11891-3-srinivas.kandagatla@linaro.org


      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2925aadd
    • Takashi Iwai's avatar
      ASoC: topology: Add missing rwsem around snd_ctl_remove() calls · 3a25def0
      Takashi Iwai authored
      
      
      [ Upstream commit 7e567b5a ]
      
      snd_ctl_remove() has to be called with card->controls_rwsem held (when
      called after the card instantiation).  This patch adds the missing
      rwsem calls around it.
      
      Fixes: 8a978234 ("ASoC: topology: Add topology core")
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Link: https://lore.kernel.org/r/20211116071812.18109-1-tiwai@suse.de


      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3a25def0
    • Srinivas Kandagatla's avatar
      ASoC: qdsp6: q6asm: fix q6asm_dai_prepare error handling · 4a4f900e
      Srinivas Kandagatla authored
      
      
      [ Upstream commit 721a94b4 ]
      
      Error handling in q6asm_dai_prepare() seems to be completely broken.
      Fix this by handling it properly.
      
      Fixes: 2a9e92d3 ("ASoC: qdsp6: q6asm: Add q6asm dai driver")
      Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Link: https://lore.kernel.org/r/20211116114721.12517-4-srinivas.kandagatla@linaro.org


      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4a4f900e
    • Srinivas Kandagatla's avatar
      ASoC: qdsp6: q6routing: Conditionally reset FrontEnd Mixer · 9196a685
      Srinivas Kandagatla authored
      
      
      [ Upstream commit 861afeac ]
      
      Stream IDs are reused across multiple BackEnd mixers, do not reset the
      stream mixers if they are not already set for that particular FrontEnd.
      
      Ex:
      amixer cset iface=MIXER,name='SLIMBUS_0_RX Audio Mixer MultiMedia1' 1
      
      would set the MultiMedia1 stream for SLIMBUS_0_RX; however, running the
      command below will reset the previously set up MultiMedia1 stream, because
      both of them use the MultiMedia1 PCM stream.
      
      amixer cset iface=MIXER,name='SLIMBUS_2_RX Audio Mixer MultiMedia1' 0
      
      Reset the FrontEnd Mixers conditionally to fix this issue.

      This is more noticeable in a desktop setup, where alsactl tries to restore
      the ALSA state and overwrites the previous mixer settings.
      
      Fixes: e3a33673 ("ASoC: qdsp6: q6routing: Add q6routing driver")
      Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Link: https://lore.kernel.org/r/20211116114721.12517-3-srinivas.kandagatla@linaro.org


      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9196a685
    • Florian Fainelli's avatar
      ARM: dts: bcm2711: Fix PCIe interrupts · 2be17eca
      Florian Fainelli authored
      
      
      [ Upstream commit 98481f3d ]
      
      The PCIe host bridge has two interrupt lines, one that goes towards its
      PCIE_INTR2 second level interrupt controller and one for its MSI second
      level interrupt controller. The first interrupt line is not currently
      managed by the driver, which is why it was not a functional problem.
      
      The interrupt-map property was also only listing the PCI_INTA interrupts
      when there are also the INTB, C and D.
      
      Reported-by: Jim Quinlan <jim2101024@gmail.com>
      Fixes: d5c8dc0d ("ARM: dts: bcm2711: Enable PCIe controller")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2be17eca
    • Florian Fainelli's avatar
      ARM: dts: BCM5301X: Add interrupt properties to GPIO node · 9db1d4a3
      Florian Fainelli authored
      
      
      [ Upstream commit 40f7342f ]
      
      The GPIO controller is also an interrupt controller provider and is
      currently missing the appropriate 'interrupt-controller' and
      '#interrupt-cells' properties to denote that.
      
      Fixes: fb026d3d ("ARM: BCM5301X: Add Broadcom's bus-axi to the DTS file")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9db1d4a3
    • Florian Fainelli's avatar
      ARM: dts: BCM5301X: Fix I2C controller interrupt · b2cd6fdc
      Florian Fainelli authored
      
      
      [ Upstream commit 754c4050 ]
      
      The I2C controller interrupt line is off by 32 because the datasheet
      describes interrupt inputs into the GIC which are for Shared Peripheral
      Interrupts and are starting at offset 32. The ARM GIC binding expects
      the SPI interrupts to be numbered from 0 relative to the SPI base.
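
      For illustration only (this is not part of the patch, and the numbers
      used are arbitrary rather than the actual I2C interrupt): a small
      sketch of the arithmetic described above. A datasheet number that
      counts GIC inputs from the start of the Shared Peripheral Interrupt
      range at 32 must have 32 subtracted to get the SPI-relative number the
      binding expects. The function name is hypothetical.

```c
/* Shared Peripheral Interrupts start at GIC interrupt ID 32. */
#define GIC_SPI_BASE 32u

/*
 * Convert an absolute GIC interrupt ID into the SPI-relative number.
 * Returns -1 if the ID lies below the SPI range.
 */
int gic_id_to_spi(unsigned int gic_id)
{
	if (gic_id < GIC_SPI_BASE)
		return -1;

	return (int)(gic_id - GIC_SPI_BASE);
}
```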
      
      Fixes: bb097e3e ("ARM: dts: BCM5301X: Add I2C support to the DT")
      Tested-by: Christian Lamparter <chunkeey@gmail.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b2cd6fdc
    • Will Mortensen's avatar
      netfilter: flowtable: fix IPv6 tunnel addr match · b7ef25e8
      Will Mortensen authored
      
      
      [ Upstream commit 39f6eed4 ]
      
      Previously the IPv6 addresses in the key were clobbered and the mask was
      left unset.
      
      I haven't tested this; I noticed it while skimming the code to
      understand an unrelated issue.
      
      Fixes: cfab6dbd ("netfilter: flowtable: add tunnel match offload support")
      Cc: wenxu <wenxu@ucloud.cn>
      Signed-off-by: Will Mortensen <willmo@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b7ef25e8
    • yangxingwu's avatar
      netfilter: ipvs: Fix reuse connection if RS weight is 0 · d689176e
      yangxingwu authored
      
      
      [ Upstream commit c95c0783 ]
      
      We are changing expire_nodest_conn to work even for reused connections when
      conn_reuse_mode=0, just as what was done with commit dc7b3eb9 ("ipvs:
      Fix reuse connection if real server is dead").
      
      For controlled and persistent connections, the new connection will get the
      needed real server depending on the rules in ip_vs_check_template().
      
      Fixes: d752c364 ("ipvs: allow rescheduling of new connections when port reuse is detected")
      Co-developed-by: Chuanqi Liu <legend050709@qq.com>
      Signed-off-by: Chuanqi Liu <legend050709@qq.com>
      Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
      Acked-by: Simon Horman <horms@verge.net.au>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d689176e
    • Florent Fourcot's avatar
      netfilter: ctnetlink: do not erase error code with EINVAL · 994065f6
      Florent Fourcot authored
      
      
      [ Upstream commit 77522ff0 ]
      
      And be consistent in error management for both orig/reply filtering
      
      Fixes: cb8aa9a3 ("netfilter: ctnetlink: add kernel side filtering for dump")
      Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      994065f6
    • Florent Fourcot's avatar
      netfilter: ctnetlink: fix filtering with CTA_TUPLE_REPLY · a3d829e5
      Florent Fourcot authored
      
      
      [ Upstream commit ad81d4da ]
      
      filter->orig_flags was used for a reply context.
      
      Fixes: cb8aa9a3 ("netfilter: ctnetlink: add kernel side filtering for dump")
      Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a3d829e5
    • David Hildenbrand's avatar
      proc/vmcore: fix clearing user buffer by properly using clear_user() · a8a91705
      David Hildenbrand authored
      commit c1e63117 upstream.
      
      To clear a user buffer we cannot simply use memset, we have to use
      clear_user().  With a virtio-mem device that registers a vmcore_cb and
      has some logically unplugged memory inside an added Linux memory block,
      I can easily trigger a BUG by copying the vmcore via "cp":
      
        systemd[1]: Starting Kdump Vmcore Save Service...
        kdump[420]: Kdump is using the default log level(3).
        kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
        kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
        kdump[465]: saving vmcore-dmesg.txt complete
        kdump[467]: saving vmcore
        BUG: unable to handle page fault for address: 00007f2374e01000
        #PF: supervisor write access in kernel mode
        #PF: error_code(0x0003) - permissions violation
        PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867
        Oops: 0003 [#1] PREEMPT SMP NOPTI
        CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
        RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
        Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
        RSP: 0018:ffffc9000073be08 EFLAGS: 00010212
        RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000
        RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008
        RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50
        R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000
        R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8
        FS:  00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0
        Call Trace:
         read_vmcore+0x236/0x2c0
         proc_reg_read+0x55/0xa0
         vfs_read+0x95/0x190
         ksys_read+0x4f/0xc0
         do_syscall_64+0x3b/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access
      Prevention (SMAP)", which is used to detect wrong access from the kernel
      to user buffers like this: SMAP triggers a permissions violation on
      wrong access.  In the x86-64 variant of clear_user(), SMAP is properly
      handled via clac()+stac().
      
      To fix, properly use clear_user() when we're dealing with a user buffer.
      
      Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com
      
      
      Fixes: 997c136f ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Philipp Rudo <prudo@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8a91705
    • Pali Rohár's avatar
      PCI: aardvark: Fix link training · 1f520a0d
      Pali Rohár authored
      commit f76b36d4 upstream.
      
      Fix multiple link training issues in aardvark driver. The main reason of
      these issues was misunderstanding of what certain registers do, since their
      names and comments were misleading: before commit 96be36db ("PCI:
      aardvark: Replace custom macros by standard linux/pci_regs.h macros"), the
      pci-aardvark.c driver used custom macros for accessing standard PCIe Root
      Bridge registers, and misleading comments did not help to understand what
      the code was really doing.
      
      After doing more tests and experiments I've come to the conclusion that the
      SPEED_GEN register in aardvark sets the PCIe revision / generation
      compliance and forces maximal link speed. Both GEN3 and GEN2 values set the
      read-only PCI_EXP_FLAGS_VERS bits (PCIe capabilities version of Root
      Bridge) to value 2, while GEN1 value sets PCI_EXP_FLAGS_VERS to 1, which
      matches with PCI Express specifications revisions 3, 2 and 1 respectively.
      Changing SPEED_GEN also sets the read-only bits PCI_EXP_LNKCAP_SLS and
      PCI_EXP_LNKCAP2_SLS to corresponding speed.
      
      (Note that PCI Express rev 1 specification does not define PCI_EXP_LNKCAP2
       and PCI_EXP_LNKCTL2 registers and when SPEED_GEN is set to GEN1 (which
       also sets PCI_EXP_FLAGS_VERS to 1), lspci cannot access
       PCI_EXP_LNKCAP2 and PCI_EXP_LNKCTL2 registers.)
      
      Changing PCIe link speed can be done via PCI_EXP_LNKCTL2_TLS bits of
      PCI_EXP_LNKCTL2 register. Armada 3700 Functional Specifications says that
      the default value of PCI_EXP_LNKCTL2_TLS is based on SPEED_GEN value, but
      tests showed that the default value is always 8.0 GT/s, independently of
      speed set by SPEED_GEN. So after setting SPEED_GEN, we must also set value
      in PCI_EXP_LNKCTL2 register via PCI_EXP_LNKCTL2_TLS bits.
      
      Triggering PCI_EXP_LNKCTL_RL bit immediately after setting LINK_TRAINING_EN
      bit actually doesn't do anything. Tests have shown that a delay is needed
      after enabling LINK_TRAINING_EN bit. As triggering PCI_EXP_LNKCTL_RL
      currently does nothing, remove it.
      
      Commit 43fc679c ("PCI: aardvark: Improve link training") introduced
      code which sets SPEED_GEN register based on negotiated link speed from
      PCI_EXP_LNKSTA_CLS bits of PCI_EXP_LNKSTA register. This code was added to
      fix detection of Compex WLE900VX (Atheros QCA9880) WiFi GEN1 PCIe cards, as
      otherwise these cards were "invisible" on PCIe bus (probably because they
      crashed). But apparently more people reported the same issues with these
      cards also with other PCIe controllers [1] and I was able to reproduce this
      issue also with other "noname" WiFi cards based on Atheros QCA9890 chip
      (with the same PCI vendor/device ids as Atheros QCA9880). So this is not an
      issue in aardvark but rather an issue in Atheros QCA98xx chips. Also, this
      issue only exists if the kernel is compiled with PCIe ASPM support, and a
      generic workaround for this is to change PCIe Bridge to 2.5 GT/s link speed
      via PCI_EXP_LNKCTL2_TLS_2_5GT bits in PCI_EXP_LNKCTL2 register [2], before
      triggering PCI_EXP_LNKCTL_RL bit. This workaround also works when SPEED_GEN
      is set to value GEN2 (5 GT/s). So remove this hack completely in the
      aardvark driver and always set SPEED_GEN to value from 'max-link-speed' DT
      property. Fix for Atheros QCA98xx chips is handled separately by patch [2].
      
      These two things (code for triggering PCI_EXP_LNKCTL_RL bit and changing
      SPEED_GEN value) also explain why commit 69644945 ("PCI: aardvark:
      Train link immediately after enabling training") somehow fixed detection of
      those problematic Compex cards with Atheros chips: if triggering link
      retraining (via PCI_EXP_LNKCTL_RL bit) was done immediately after enabling
      link training (via LINK_TRAINING_EN), it did nothing. If there was a
      specific delay, aardvark HW already initialized PCIe link and therefore
      triggering link retraining caused the above issue. Compex cards triggered
      link down event and disappeared from the PCIe bus.
      
      Commit f4c7d053 ("PCI: aardvark: Wait for endpoint to be ready before
      training link") added 100ms sleep before calling 'Start link training'
      command and explained that it is a requirement of PCI Express
      specification. But the code after this 100ms sleep was not doing 'Start
      link training', rather it triggered PCI_EXP_LNKCTL_RL bit via PCIe Root
      Bridge to put link into Recovery state.
      
      The required delay after fundamental reset is already done in function
      advk_pcie_wait_for_link() which also checks whether PCIe link is up.
      So after removing the code which triggers PCI_EXP_LNKCTL_RL bit on PCIe
      Root Bridge, there is no need to wait 100ms again. Remove the extra
      msleep() call and update comment about the delay required by the PCI
      Express specification.
      
      According to Marvell Armada 3700 Functional Specifications, Link training
      should be enabled via aardvark register LINK_TRAINING_EN after selecting
      PCIe generation and x1 lane. There is no need to disable it prior to resetting
      card via PERST# signal. This disabling code was introduced in commit
      5169a985 ("PCI: aardvark: Issue PERST via GPIO") as a workaround for
      some Atheros cards. It turns out that this is also an Atheros-specific issue
      and affects any PCIe controller, not only aardvark. Moreover this Atheros
      issue was triggered by juggling with PCI_EXP_LNKCTL_RL, LINK_TRAINING_EN
      and SPEED_GEN bits interleaved with sleeps. Now, after removing triggering
      PCI_EXP_LNKCTL_RL, there is no need to explicitly disable LINK_TRAINING_EN
      bit. So remove this code too. The problematic Compex cards described in
      previous git commits are correctly detected in advk_pcie_train_link()
      function even after applying all these changes.
      
      Note that with this patch, and also prior to this patch, some NVMe disks which
      support PCIe GEN3 with 8 GT/s speed are negotiated only at the lowest link
      speed 2.5 GT/s, independently of SPEED_GEN value. After manually triggering
      PCI_EXP_LNKCTL_RL bit (e.g. from userspace via setpci), these NVMe disks
      change link speed to 5 GT/s when SPEED_GEN was configured to GEN2. This
      issue first needs to be properly investigated. I will send a fix in the
      future.
      
      On the other hand, some other GEN2 PCIe cards with 5 GT/s speed are
      autonegotiated autonomously by HW at the full 5 GT/s speed without any
      software interaction.
      
      Armada 3700 Functional Specifications describes the following steps for
      link training: set SPEED_GEN to GEN2, enable LINK_TRAINING_EN, poll until
      link training is complete, trigger PCI_EXP_LNKCTL_RL, poll until signal
      rate is 5 GT/s, poll until link training is complete, enable ASPM L0s.
      
      The requirement for triggering PCI_EXP_LNKCTL_RL can be explained by the
      need to achieve 5 GT/s speed (as changing link speed is done through the
      Recovery state entered via PCI_EXP_LNKCTL_RL) or maybe as a part of enabling
      ASPM L0s (but in this case ASPM L0s should have been enabled prior to
      PCI_EXP_LNKCTL_RL).
      
      It is unknown why the original pci-aardvark.c driver was triggering
      PCI_EXP_LNKCTL_RL bit before waiting for the link to be up. This aligns
      with neither the PCIe base specification nor the Armada 3700 Functional
      Specification. (Note that in older versions of aardvark, this bit was
      incorrectly called PCIE_CORE_LINK_TRAINING, so this may be the reason.)
      
      It is also unknown why Armada 3700 Functional Specification says that it is
      needed to trigger PCI_EXP_LNKCTL_RL for GEN2 mode, as according to PCIe
      base specification 5 GT/s speed negotiation is supposed to be entirely
      autonomous, even if initial speed is 2.5 GT/s.
      
      [1] - https://lore.kernel.org/linux-pci/87h7l8axqp.fsf@toke.dk/
      [2] - https://lore.kernel.org/linux-pci/20210326124326.21163-1-pali@kernel.org/
      
      Link: https://lore.kernel.org/r/20211005180952.6812-12-kabel@kernel.org
      
      
      Signed-off-by: Pali Rohár <pali@kernel.org>
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1f520a0d
    • PCI: aardvark: Simplify initialization of rootcap on virtual bridge · aec0751f
      Pali Rohár authored
      commit 454c5327 upstream.
      
      PCIe config space can also be initialized before the
      pci_bridge_emul_init() call, so move rootcap initialization after PCI
      config space initialization.
      
      This simplifies the function a little since it removes one if (ret < 0)
      check.
      
      Link: https://lore.kernel.org/r/20211005180952.6812-11-kabel@kernel.org
      
      
      Signed-off-by: Pali Rohár <pali@kernel.org>
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      aec0751f
    • PCI: aardvark: Implement re-issuing config requests on CRS response · df574809
      Pali Rohár authored
      commit 223dec14 upstream.
      
      Commit 43f5c77b ("PCI: aardvark: Fix reporting CRS value") fixed the
      handling of CRS responses and, when the CRSSVE flag was not enabled,
      marked a CRS response as a failed transaction (for simplicity).

      But the pci-aardvark.c driver is already waiting up to the PIO_RETRY_CNT
      count for a PIO config response, so with a small change we can implement
      re-issuing of config requests as described in the PCIe base specification.
      
      This change implements re-issuing of config requests when the response is
      CRS. Set the upper bound of wait cycles to around PIO_RETRY_CNT; afterwards
      the transaction is marked as failed and an all-ones value is returned as
      before.

      We do this by returning appropriate error codes from the function
      advk_pcie_check_pio_status(). On CRS we return -EAGAIN and the caller then
      reissues the transaction.
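
      The retry pattern described here can be sketched as follows. This is a
      minimal, self-contained illustration and not the driver's code:
      struct fake_dev, fake_check_pio_status(), fake_config_read() and
      RETRY_LIMIT are hypothetical stand-ins (RETRY_LIMIT only plays the role
      of PIO_RETRY_CNT and does not use its real value), and the -1 returned
      on failure is an assumption made for the example.

```c
#include <errno.h>
#include <stdint.h>

#define RETRY_LIMIT 10 /* illustrative bound standing in for PIO_RETRY_CNT */

/* Hypothetical device that answers with CRS for the first crs_left requests. */
struct fake_dev {
	int crs_left;
	uint32_t value;
};

/* Plays the role of the status check: 0 on success, -EAGAIN on CRS. */
static int fake_check_pio_status(struct fake_dev *dev, uint32_t *val)
{
	if (dev->crs_left > 0) {
		dev->crs_left--;
		return -EAGAIN;
	}
	*val = dev->value;
	return 0;
}

/* Caller side: reissue the request while the status check reports CRS. */
static int fake_config_read(struct fake_dev *dev, uint32_t *val)
{
	int attempt;
	int ret;

	for (attempt = 0; attempt < RETRY_LIMIT; attempt++) {
		ret = fake_check_pio_status(dev, val);
		if (ret == 0)
			return 0;      /* completed successfully */
		if (ret != -EAGAIN)
			break;         /* any other error is final */
	}

	/* Retries exhausted or hard error: failed, all-ones value. */
	*val = 0xffffffffu;
	return -1;
}
```

      A device that reports CRS three times is read successfully on the fourth
      attempt, while one that keeps reporting CRS past RETRY_LIMIT yields a
      failure and an all-ones value.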
      
      Link: https://lore.kernel.org/r/20211005180952.6812-10-kabel@kernel.org
      
      
      Signed-off-by: Pali Rohár <pali@kernel.org>
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      df574809