  1. Aug 26, 2021
    • bnxt: don't lock the tx queue from napi poll · bffcedb5
      Jakub Kicinski authored
      [ Upstream commit 3c603136 ]
      
      We can't take the tx lock from the napi poll routine, because
      netpoll can poll napi at any moment, including with the tx lock
      already held.
      
      The tx lock is protecting against two paths - the disable
      path, and (as Michael points out) the NETDEV_TX_BUSY case
      which may occur if NAPI completions race with start_xmit
      and both decide to re-enable the queue.
      
      For the disable/ifdown path use synchronize_net() to make sure
      closing the device does not race with restarting the queues.
      Annotate accesses to dev_state against data races.
      
      For the NAPI cleanup vs start_xmit path - appropriate barriers
      are already in place in the main spot where Tx queue is stopped
      but we need to do the same careful dance in the TX_BUSY case.
      
      Fixes: c0c050c5 ("bnxt_en: New Broadcom ethernet driver.")
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • vhost: Fix the calculation in vhost_overflow() · 152962a7
      Xie Yongji authored
      [ Upstream commit f7ad318e ]
      
      This fixes the incorrect calculation for integer overflow
      when the last address of iova range is 0xffffffff.
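
The off-by-one can be illustrated in isolation. The sketch below is not the vhost code: range_overflows(), range_overflows_naive() and the limit parameter are hypothetical names chosen for illustration. It shows why a check of the form "addr > limit - size" wrongly rejects a range whose last byte is exactly the limit, and how comparing against "limit - size + 1" avoids that.

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive check: treats addr + size (one past the end) as the last address. */
static bool range_overflows_naive(uint64_t addr, uint64_t size, uint64_t limit)
{
	return addr > limit || size > limit || addr > limit - size;
}

/* Corrected check: the last address of the range is addr + size - 1. */
static bool range_overflows(uint64_t addr, uint64_t size, uint64_t limit)
{
	if (addr > limit || size > limit)
		return true;
	if (size == 0)
		return false;	/* an empty range cannot overflow */
	return addr > limit - size + 1;
}
```

With limit = 0xffffffff, a range starting at 0xfffff000 with size 0x1000 ends exactly at the limit; the naive form reports an overflow while the corrected form accepts it.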
      
      Fixes: ec33d031 ("vhost: detect 32 bit integer wrap around")
      Reported-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Link: https://lore.kernel.org/r/20210728130756.97-2-xieyongji@bytedance.com
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • dccp: add do-while-0 stubs for dccp_pr_debug macros · 91cc40b9
      Randy Dunlap authored
      [ Upstream commit 86aab09a ]
      
      GCC complains about empty macros in an 'if' statement, so convert
      them to 'do {} while (0)' macros.
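
A minimal sketch of the pattern, assuming a hypothetical demo_pr_debug() macro and DEMO_DEBUG switch (neither is the dccp code): when debugging is compiled out, the macro expands to a complete but empty statement rather than to nothing, so an unbraced if/else around it keeps a real body.

```c
#include <stdio.h>

/* Hypothetical debug switch; define DEMO_DEBUG to enable output. */
#ifdef DEMO_DEBUG
#define demo_pr_debug(...)	printf(__VA_ARGS__)
#else
/* An empty expansion would leave "if (err) ;" and trigger -Wempty-body. */
#define demo_pr_debug(...)	do {} while (0)
#endif

static int report(int err)
{
	if (err)
		demo_pr_debug("transmit returned err=%d\n", err);
	else
		demo_pr_debug("transmit ok\n");
	return err ? -1 : 0;
}
```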
      
      Fixes these build warnings:
      
      net/dccp/output.c: In function 'dccp_xmit_packet':
      ../net/dccp/output.c:283:71: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
        283 |                 dccp_pr_debug("transmit_skb() returned err=%d\n", err);
      net/dccp/ackvec.c: In function 'dccp_ackvec_update_old':
      ../net/dccp/ackvec.c:163:80: warning: suggest braces around empty body in an 'else' statement [-Wempty-body]
        163 |                                               (unsigned long long)seqno, state);
      
      Fixes: dc841e30 ("dccp: Extend CCID packet dequeueing interface")
      Fixes: 38024086 ("dccp ccid-2: Update code for the Ack Vector input/registration routine")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: dccp@vger.kernel.org
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Bluetooth: hidp: use correct wait queue when removing ctrl_wait · 49146a68
      Ole Bjørn Midtbø authored
      [ Upstream commit cca342d9 ]
      
      A different wait queue was used when removing ctrl_wait than when adding
      it. This effectively left the remove operation unlocked with respect
      to other operations on the wait queue ctrl_wait was part of. This caused
      issues like the ones below, where dead000000000100 is LIST_POISON1 and
      dead000000000200 is LIST_POISON2.
      
       list_add corruption. next->prev should be prev (ffffffc1b0a33a08), \
      	but was dead000000000200. (next=ffffffc03ac77de0).
       ------------[ cut here ]------------
       CPU: 3 PID: 2138 Comm: bluetoothd Tainted: G           O    4.4.238+ #9
       ...
       ---[ end trace 0adc2158f0646eac ]---
       Call trace:
       [<ffffffc000443f78>] __list_add+0x38/0xb0
       [<ffffffc0000f0d04>] add_wait_queue+0x4c/0x68
       [<ffffffc00020eecc>] __pollwait+0xec/0x100
       [<ffffffc000d1556c>] bt_sock_poll+0x74/0x200
       [<ffffffc000bdb8a8>] sock_poll+0x110/0x128
       [<ffffffc000210378>] do_sys_poll+0x220/0x480
       [<ffffffc0002106f0>] SyS_poll+0x80/0x138
       [<ffffffc00008510c>] __sys_trace_return+0x0/0x4
      
       Unable to handle kernel paging request at virtual address dead000000000100
       ...
       CPU: 4 PID: 5387 Comm: kworker/u15:3 Tainted: G        W  O    4.4.238+ #9
       ...
       Call trace:
        [<ffffffc0000f079c>] __wake_up_common+0x7c/0xa8
        [<ffffffc0000f0818>] __wake_up+0x50/0x70
        [<ffffffc000be11b0>] sock_def_wakeup+0x58/0x60
        [<ffffffc000de5e10>] l2cap_sock_teardown_cb+0x200/0x224
        [<ffffffc000d3f2ac>] l2cap_chan_del+0xa4/0x298
        [<ffffffc000d45ea0>] l2cap_conn_del+0x118/0x198
        [<ffffffc000d45f8c>] l2cap_disconn_cfm+0x6c/0x78
        [<ffffffc000d29934>] hci_event_packet+0x564/0x2e30
        [<ffffffc000d19b0c>] hci_rx_work+0x10c/0x360
        [<ffffffc0000c2218>] process_one_work+0x268/0x460
        [<ffffffc0000c2678>] worker_thread+0x268/0x480
        [<ffffffc0000c94e0>] kthread+0x118/0x128
        [<ffffffc000085070>] ret_from_fork+0x10/0x20
        ---[ end trace 0adc2158f0646ead ]---
      
      Signed-off-by: Ole Bjørn Midtbø <omidtbo@cisco.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: usb: lan78xx: don't modify phy_device state concurrently · d15995a9
      Ivan T. Ivanov authored
      [ Upstream commit 6b67d4d6 ]
      
      Currently the phy_device state can be left inconsistent, as shown by
      the following alert message [1]. This is because phy_read_status can
      be called concurrently from lan78xx_delayedwork, phy_state_machine and
      __ethtool_get_link. Fix this by making sure that phy_device state is
      updated atomically.
      
      [1] lan78xx 1-1.1.1:1.0 eth0: No phy led trigger registered for speed(-1)
      
      Signed-off-by: Ivan T. Ivanov <iivanov@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ARM: dts: nomadik: Fix up interrupt controller node names · f7d86403
      Sudeep Holla authored
      [ Upstream commit 47091f47 ]
      
      Once the new schema interrupt-controller/arm,vic.yaml is added, we get
      the below warnings:
      
      	arch/arm/boot/dts/ste-nomadik-nhk15.dt.yaml:
      	intc@10140000: $nodename:0: 'intc@10140000' does not match
      	'^interrupt-controller(@[0-9a-f,]+)*$'
      
      Fix the node names for the interrupt controller to conform
      to the standard node name interrupt-controller@..
      
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Link: https://lore.kernel.org/r/20210617210825.3064367-2-sudeep.holla@arm.com
      Link: https://lore.kernel.org/r/20210626000103.830184-1-linus.walleij@linaro.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • scsi: core: Avoid printing an error if target_alloc() returns -ENXIO · 53fb1ce7
      Sreekanth Reddy authored
      [ Upstream commit 70edd2e6 ]
      
      Avoid printing a 'target allocation failed' error if the driver
      target_alloc() callback function returns -ENXIO. This return value
      indicates that the corresponding H:C:T:L entry is empty.
      
      Removing this error reduces the scan time if the user issues a
      SCAN_WILD_CARD scan operation through the sysfs parameter on a host
      with a lot of empty H:C:T:L entries.
      
      Avoiding the printk on -ENXIO matches the behavior of the other callback
      functions during scanning.
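
The decision described above can be sketched as a small predicate. This is an illustration only; alloc_error_is_loggable() is a hypothetical helper and not a function in the SCSI core.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: is a target_alloc()-style return value worth logging? */
static bool alloc_error_is_loggable(int error)
{
	/* -ENXIO means "nothing at this address", expected during a wildcard scan. */
	return error != 0 && error != -ENXIO;
}
```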
      
      Link: https://lore.kernel.org/r/20210726115402.1936-1-sreekanth.reddy@broadcom.com
      Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • scsi: scsi_dh_rdac: Avoid crash during rdac_bus_attach() · 19cbfd31
      Ye Bin authored
      [ Upstream commit bc546c0c ]
      
      The following BUG_ON() was observed during RDAC scan:
      
      [595952.944297] kernel BUG at drivers/scsi/device_handler/scsi_dh_rdac.c:427!
      [595952.951143] Internal error: Oops - BUG: 0 [#1] SMP
      ......
      [595953.251065] Call trace:
      [595953.259054]  check_ownership+0xb0/0x118
      [595953.269794]  rdac_bus_attach+0x1f0/0x4b0
      [595953.273787]  scsi_dh_handler_attach+0x3c/0xe8
      [595953.278211]  scsi_dh_add_device+0xc4/0xe8
      [595953.282291]  scsi_sysfs_add_sdev+0x8c/0x2a8
      [595953.286544]  scsi_probe_and_add_lun+0x9fc/0xd00
      [595953.291142]  __scsi_scan_target+0x598/0x630
      [595953.295395]  scsi_scan_target+0x120/0x130
      [595953.299481]  fc_user_scan+0x1a0/0x1c0 [scsi_transport_fc]
      [595953.304944]  store_scan+0xb0/0x108
      [595953.308420]  dev_attr_store+0x44/0x60
      [595953.312160]  sysfs_kf_write+0x58/0x80
      [595953.315893]  kernfs_fop_write+0xe8/0x1f0
      [595953.319888]  __vfs_write+0x60/0x190
      [595953.323448]  vfs_write+0xac/0x1c0
      [595953.326836]  ksys_write+0x74/0xf0
      [595953.330221]  __arm64_sys_write+0x24/0x30
      
      Code is in check_ownership:
      
      	list_for_each_entry_rcu(tmp, &h->ctlr->dh_list, node) {
      		/* h->sdev should always be valid */
      		BUG_ON(!tmp->sdev);
      		tmp->sdev->access_state = access_state;
      	}
      
      	rdac_bus_attach
      		initialize_controller
      			list_add_rcu(&h->node, &h->ctlr->dh_list);
      			h->sdev = sdev;
      
      	rdac_bus_detach
      		list_del_rcu(&h->node);
      		h->sdev = NULL;
      
      Fix the race between rdac_bus_attach() and rdac_bus_detach() where h->sdev
      is NULL when processing the RDAC attach.
      
      Link: https://lore.kernel.org/r/20210113063103.2698953-1-yebin10@huawei.com
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • scsi: megaraid_mm: Fix end of loop tests for list_for_each_entry() · 3b90de19
      Harshvardhan Jha authored
      [ Upstream commit 77541f78 ]
      
      The list_for_each_entry() iterator, "adapter" in this code, can never be
      NULL.  If we exit the loop without finding the correct adapter then
      "adapter" points to invalid memory that is an offset from the list head.
      This will eventually lead to memory corruption and presumably a kernel crash.
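
A common way to avoid testing the iterator after the loop is to return a match from inside the loop and fall through to NULL otherwise. The sketch below is a userspace illustration, not the megaraid_mm code: struct demo_adapter, demo_container_of() and demo_find_adapter() are hypothetical stand-ins for the kernel's list helpers.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_adapter {
	int unique_id;
	struct list_head list;
};

/* Return the matching adapter, or NULL if the walk reaches the head again. */
static struct demo_adapter *demo_find_adapter(struct list_head *head, int id)
{
	struct list_head *pos;

	for (pos = head->next; pos != head; pos = pos->next) {
		struct demo_adapter *a =
			demo_container_of(pos, struct demo_adapter, list);

		if (a->unique_id == id)
			return a;	/* only returned on a real match */
	}
	return NULL;
}
```

Callers then check the returned pointer, which really is NULL when nothing matched, instead of the loop cursor.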
      
      Link: https://lore.kernel.org/r/20210708074642.23599-1-harshvardhan.jha@oracle.com
      Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
      Signed-off-by: Harshvardhan Jha <harshvardhan.jha@oracle.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • dmaengine: of-dma: router_xlate to return -EPROBE_DEFER if controller is not yet available · 8175f082
      Peter Ujfalusi authored
      [ Upstream commit eda97cb0 ]
      
      If router_xlate cannot find the controller among the available DMA
      devices, it should return -EPROBE_DEFER in the same way as
      of_dma_request_slave_channel() does.

      The issue can be reproduced if the event router is registered before the
      DMA controller itself and a driver requests a channel before the
      controller is registered.
      In of_dma_request_slave_channel():
      1. of_dma_find_controller() would find the dma_router
      2. ofdma->of_dma_xlate() would fail and return NULL
      3. -ENODEV is returned as the error code

      With this patch we return the correct -EPROBE_DEFER in this case, and
      the client can try to request the channel again later.
      
      Signed-off-by: Peter Ujfalusi <peter.ujfalusi@gmail.com>
      Link: https://lore.kernel.org/r/20210717190021.21897-1-peter.ujfalusi@gmail.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ARM: dts: am43x-epos-evm: Reduce i2c0 bus speed for tps65218 · b5df9e60
      Dave Gerlach authored
      [ Upstream commit 20a6b3fd ]
      
      Based on the latest timing specifications for the TPS65218 from the data
      sheet, http://www.ti.com/lit/ds/symlink/tps65218.pdf, document SLDS206
      from November 2014, we must change the i2c bus speed to better fit within
      the minimum high SCL time required for proper i2c transfer.
      
      When running at 400 kHz, measurements show that SCL spends
      0.8125 us/1.666 us high/low, which violates the requirement for the
      minimum high period of SCL given in datasheet Table 7.6, which is 1 us.
      Switching to 100 kHz gives us 5 us/5 us high/low, which both fall above
      the minimum values given for 100 kHz, 4.0 us/4.7 us high/low.
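
The comparison above can be written down as a small check. This is a sketch for illustration: struct scl_timing and scl_timing_ok() are hypothetical names, and the numbers are the ones quoted in this message. The minimum low time for the 400 kHz case is not stated here, so a caller would have to supply it from the datasheet.

```c
#include <stdbool.h>

/* SCL high and low times in nanoseconds. */
struct scl_timing {
	double high_ns;
	double low_ns;
};

/* True if both measured phases meet or exceed the required minimums. */
static bool scl_timing_ok(struct scl_timing measured, struct scl_timing minimum)
{
	return measured.high_ns >= minimum.high_ns &&
	       measured.low_ns >= minimum.low_ns;
}
```

For example, a measured high time of 812.5 ns fails against a 1000 ns minimum, while 5000 ns/5000 ns passes against 4000 ns/4700 ns.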
      
      Without this patch occasionally a voltage set operation from the kernel
      will appear to have worked but the actual voltage reflected on the PMIC
      will not have updated, causing problems especially with cpufreq that may
      update to a higher OPP without actually raising the voltage on DCDC2,
      leading to a hang.
      
      Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
      Signed-off-by: Kevin Hilman <khilman@baylibre.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • dmaengine: usb-dmac: Fix PM reference leak in usb_dmac_probe() · 9a24b6a1
      Yu Kuai authored
      [ Upstream commit 1da569fa ]
      
      pm_runtime_get_sync() increments the PM usage counter even if it fails.
      Forgetting the matching put operation results in a reference leak here.
      Fix it by moving the error_pm label above the pm_runtime_put() in
      the error path.
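
The shape of the fix can be shown with a mock usage counter. Everything below is hypothetical (struct demo_dev, demo_get_sync(), demo_put(), demo_probe()); it only mimics the documented behaviour that the counter is incremented even when the resume fails, and is not the usb-dmac driver code.

```c
/* Mock of a runtime-PM usage counter; all names here are hypothetical. */
struct demo_dev {
	int usage_count;
	int resume_fails;
};

/* Like pm_runtime_get_sync(): the count is bumped even when resume fails. */
static int demo_get_sync(struct demo_dev *dev)
{
	dev->usage_count++;
	return dev->resume_fails ? -1 : 0;
}

static void demo_put(struct demo_dev *dev)
{
	dev->usage_count--;
}

static int demo_probe(struct demo_dev *dev)
{
	int ret = demo_get_sync(dev);

	if (ret < 0)
		goto error_pm;

	/* ... further setup would go here ... */
	return 0;

error_pm:
	demo_put(dev);	/* balance the increment taken by demo_get_sync() */
	return ret;
}
```

Because the label sits above the put, a failed get still drops the reference it took.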
      
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Link: https://lore.kernel.org/r/20210706124521.1371901-1-yukuai3@huawei.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ath9k: Postpone key cache entry deletion for TXQ frames reference it · 61b014a8
      Jouni Malinen authored
      commit ca284802 upstream.
      
      Do not delete a key cache entry that is still being referenced by
      pending frames in TXQs. This avoids reuse of the key cache entry while a
      frame might still be transmitted using it.
      
      To avoid having to do any additional operations during the main TX path
      operations, track pending key cache entries in a new bitmap and check
      whether any pending entries can be deleted before every new key
      add/remove operation. Also clear any remaining entries when stopping the
      interface.
      
      Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20201214172118.18100-6-jouni@codeaurora.org
      Cc: Pali Rohár <pali@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ath: Modify ath_key_delete() to not need full key entry · f4d4f447
      Jouni Malinen authored
      commit 144cd24d upstream.
      
      tkip_keymap can be used internally to avoid the reference to key->cipher
      and with this, only the key index value itself is needed. This allows
      ath_key_delete() call to be postponed to be handled after the upper
      layer STA and key entry have already been removed. This is needed to
      make ath9k key cache management safer.
      
      Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20201214172118.18100-5-jouni@codeaurora.org
      Cc: Pali Rohár <pali@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ath: Export ath_hw_keysetmac() · 995586a5
      Jouni Malinen authored
      commit d2d3e364 upstream.
      
      ath9k is going to use this for safer management of key cache entries.
      
      Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20201214172118.18100-4-jouni@codeaurora.org
      Cc: Pali Rohár <pali@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ath9k: Clear key cache explicitly on disabling hardware · 20e7de09
      Jouni Malinen authored
      commit 73488cb2 upstream.
      
      Now that ath/key.c may not be explicitly clearing keys from the key
      cache, clear all key cache entries when disabling hardware to make sure
      no keys are left behind beyond this point.
      
      Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20201214172118.18100-3-jouni@codeaurora.org
      Cc: Pali Rohár <pali@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ath: Use safer key clearing with key cache entries · 2cbb22fd
      Jouni Malinen authored
      commit 56c5485c upstream.
      
      It is possible for there to be pending frames in TXQs with a reference
      to the key cache entry that is being deleted. If such a key cache entry
      is cleared, those pending frames in the TXQ might get transmitted without
      proper encryption. It is safer to leave the previously used key in the
      key cache in such cases. Instead, only clear the MAC address to prevent
      RX processing from using this key cache entry.

      This is needed in particular in AP mode where the TXQs cannot be
      flushed on station disconnection. This change alone may not be able to
      address all cases where the key cache entry might get reused for other
      purposes immediately (the key cache entry should be released for reuse
      only once the TXQs do not have any remaining references to them), but
      this makes it less likely to get unprotected frames and the more
      complete changes may end up being significantly more complex.
      
      Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20201214172118.18100-2-jouni@codeaurora.org
      Cc: Pali Rohár <pali@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/fpu: Make init_fpstate correct with optimized XSAVE · af3da506
      Thomas Gleixner authored
      commit f9dfb5e3 upstream.
      
      The XSAVE init code initializes all enabled and supported components with
      XRSTOR(S) to init state. Then it XSAVEs the state of the components back
      into init_fpstate which is used in several places to fill in the init state
      of components.
      
      This works correctly with XSAVE, but not with XSAVEOPT and XSAVES because
      those use the init optimization and skip writing state of components which
      are in init state. So init_fpstate.xsave still contains all zeroes after
      this operation.
      
      There are two ways to solve that:
      
         1) Use XSAVE unconditionally, but that requires to reshuffle the buffer when
            XSAVES is enabled because XSAVES uses compacted format.
      
         2) Save the components which are known to have a non-zero init state by other
            means.
      
      Looking deeper, #2 is the right thing to do because all components the
      kernel supports have all-zeroes init state except the legacy features (FP,
      SSE). Those cannot be hard coded because the states are not identical on all
      CPUs, but they can be saved with FXSAVE which avoids all conditionals.
      
      Use FXSAVE to save the legacy FP/SSE components in init_fpstate along with
      a BUILD_BUG_ON() which reminds developers to validate that a newly added
      component has all zeroes init state. As a bonus remove the now unused
      copy_xregs_to_kernel_booting() crutch.
      
      The XSAVE and reshuffle method can still be implemented in the unlikely
      case that components are added which have a non-zero init state and no
      other means to save them. For now, FXSAVE is just simple and good enough.
      
        [ bp: Fix a typo or two in the text. ]
      
      Fixes: 6bad06b7 ("x86, xsave: Use xsaveopt in context-switch path when supported")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210618143444.587311343@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl (CVE-2021-3653) · 26af47bd
      Maxim Levitsky authored
      [ upstream commit 0f923e07 ]
      
      * Invert the mask of bits that we pick from L2 in
        nested_vmcb02_prepare_control
      
      * Invert and explicitly use VIRQ related bits bitmask in svm_clear_vintr
      
      This fixes a security issue that allowed a malicious L1 to run L2 with
      AVIC enabled, which allowed the L2 to exploit the uninitialized and enabled
      AVIC to read/write the host physical memory at some offsets.
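
One way to read "invert the mask" is as a switch from listing the bits that must not come from the guest to listing the only bits that may. The sketch below illustrates that allow-list merge in general terms; the bit names and demo_merge_ctl() are hypothetical and do not reflect the real int_ctl layout or the KVM code.

```c
#include <stdint.h>

/* Hypothetical bit layout for illustration only. */
#define DEMO_CTL_SAFE_A		(1u << 0)
#define DEMO_CTL_SAFE_B		(1u << 1)
#define DEMO_CTL_PRIVILEGED	(1u << 31)

/* Allow-list: only these bits may be taken from the nested guest's value. */
#define DEMO_CTL_FROM_GUEST	(DEMO_CTL_SAFE_A | DEMO_CTL_SAFE_B)

static uint32_t demo_merge_ctl(uint32_t host_ctl, uint32_t guest_ctl)
{
	return (host_ctl & ~DEMO_CTL_FROM_GUEST) |
	       (guest_ctl & DEMO_CTL_FROM_GUEST);
}
```

With an allow-list, a control bit that is added later is ignored by default when it arrives from the guest, instead of being passed through by default.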
      
      Fixes: 3d6368ef ("KVM: SVM: Add VMRUN handler")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: nSVM: always intercept VMLOAD/VMSAVE when nested (CVE-2021-3656) · 6ed19838
      Maxim Levitsky authored
      [ upstream commit c7dfa400 ]
      
      If L1 disables VMLOAD/VMSAVE intercepts, and doesn't enable
      Virtual VMLOAD/VMSAVE (currently not supported for the nested hypervisor),
      then VMLOAD/VMSAVE must operate on the L1 physical memory, which is only
      possible by making L0 intercept these instructions.
      
      Failure to do so allowed the nested guest to run VMLOAD/VMSAVE unintercepted,
      and thus read/write portions of the host physical memory.
      
      Fixes: 89c8a498 ("KVM: SVM: Enable Virtual VMLOAD VMSAVE feature")
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mac80211: drop data frames without key on encrypted links · 60986d10
      Johannes Berg authored
      commit a0761a30 upstream.
      
      If we know that we have an encrypted link (based on having had
      a key configured for TX in the past) then drop all data frames
      in the key selection handler if there's no key anymore.
      
      This fixes an issue with mac80211 internal TXQs - there we can
      buffer frames for an encrypted link, but then if the key is no
      longer there when they're dequeued, the frames are sent without
      encryption. This happens if a station is disconnected while the
      frames are still on the TXQ.
      
      Detecting that a link should be encrypted based on a first key
      having been configured for TX is fine as there are no use cases
      for a connection going from with encryption to no encryption.
      With extended key IDs, however, there is a case of having a key
      configured for only decryption, so we can't just trigger this
      behaviour on a key being configured.
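
The rule described above can be sketched as a small decision function. This is an illustration under stated assumptions: struct demo_sta, its fields and demo_select_key() are hypothetical and are not the mac80211 data structures or handler.

```c
#include <stdbool.h>

enum demo_tx_action { DEMO_TX_CONTINUE, DEMO_TX_DROP };

/* Hypothetical per-station state. */
struct demo_sta {
	bool tx_key_was_set;	/* a TX key has been configured at some point */
	bool has_key;		/* a key is currently available */
};

static enum demo_tx_action demo_select_key(const struct demo_sta *sta,
					   bool is_data)
{
	if (sta->has_key)
		return DEMO_TX_CONTINUE;	/* frame will be encrypted */
	if (is_data && sta->tx_key_was_set)
		return DEMO_TX_DROP;		/* no plaintext on an encrypted link */
	return DEMO_TX_CONTINUE;
}
```

Tracking "a TX key was set" rather than "any key was set" reflects the extended key ID case mentioned above, where a key may be configured for decryption only.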
      
      Cc: stable@vger.kernel.org
      Reported-by: Jouni Malinen <j@w1.fi>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Link: https://lore.kernel.org/r/iwlwifi.20200326150855.6865c7f28a14.I9fb1d911b064262d33e33dfba730cdeef83926ca@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      [pali: Backported to 4.19 and older versions]
      Signed-off-by: Pali Rohár <pali@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • vmlinux.lds.h: Handle clang's module.{c,d}tor sections · 7b77a6ce
      Nathan Chancellor authored
      commit 84837881 upstream.
      
      A recent change in LLVM causes module_{c,d}tor sections to appear when
      CONFIG_K{A,C}SAN are enabled, which results in orphan section warnings
      because these are not handled anywhere:
      
      ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_ctor) is being placed in '.text.asan.module_ctor'
      ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_dtor) is being placed in '.text.asan.module_dtor'
      ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor'
      
      Fangrui explains: "the function asan.module_ctor has the SHF_GNU_RETAIN
      flag, so it is in a separate section even with -fno-function-sections
      (default)".
      
      Place them in the TEXT_TEXT section so that these technologies continue
      to work with the newer compiler versions. All of the KASAN and KCSAN
      KUnit tests continue to pass after this change.
      
      Cc: stable@vger.kernel.org
      Link: https://github.com/ClangBuiltLinux/linux/issues/1432
      Link: https://github.com/llvm/llvm-project/commit/7b789562244ee941b7bf2cefeb3fc08a59a01865
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Fangrui Song <maskray@google.com>
      Acked-by: Marco Elver <elver@google.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20210731023107.1932981-1-nathan@kernel.org
      [nc: Fix conflicts due to lack of cf68fffb and 266ff2a8]
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI/MSI: Enforce MSI[X] entry updates to be visible · 603c94b1
      Thomas Gleixner authored
      commit b9255a7c upstream.
      
      Nothing enforces the posted writes to be visible when the function
      returns. Flush them even if the flush might be redundant when the entry is
      masked already as the unmask will flush as well. This is either setup or a
      rare affinity change event so the extra flush is not the end of the world.
      
      While this is more of a theoretical issue, the logic in the X86
      specific msi_set_affinity() function in particular relies on the
      assumption that the update has reached the hardware when the function
      returns.
      
      Again, as this never has been enforced the Fixes tag refers to a commit in:
         git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      
      Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210729222542.515188147@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI/MSI: Enforce that MSI-X table entry is masked for update · e206635e
      Thomas Gleixner authored
      commit da181dc9 upstream.
      
      The specification (PCIe r5.0, sec 6.1.4.5) states:
      
          For MSI-X, a function is permitted to cache Address and Data values
          from unmasked MSI-X Table entries. However, anytime software unmasks a
          currently masked MSI-X Table entry either by clearing its Mask bit or
          by clearing the Function Mask bit, the function must update any Address
          or Data values that it cached from that entry. If software changes the
          Address or Data value of an entry while the entry is unmasked, the
          result is undefined.
      
      The Linux kernel's MSI-X support never enforced that the entry is masked
      before the entry is modified hence the Fixes tag refers to a commit in:
            git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      
      Enforce the entry to be masked across the update.
      
      There is no point in enforcing this to be handled at all possible call
      sites as this is just pointless code duplication and the common update
      function is the obvious place to enforce this.
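
The "mask across the update" rule can be sketched against an in-memory stand-in for one MSI-X table entry. This is an illustration, not the kernel's update function: struct demo_msix_entry and demo_write_msg() are hypothetical, and real hardware access would go through MMIO accessors rather than plain stores.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MSIX_CTRL_MASKBIT	0x1u	/* bit 0 of Vector Control masks the entry */

/* In-memory stand-in for one MSI-X table entry. */
struct demo_msix_entry {
	uint32_t addr_lo;
	uint32_t addr_hi;
	uint32_t data;
	uint32_t vector_ctrl;
};

static void demo_write_msg(struct demo_msix_entry *e,
			   uint32_t addr_lo, uint32_t addr_hi, uint32_t data)
{
	bool unmasked = !(e->vector_ctrl & DEMO_MSIX_CTRL_MASKBIT);

	if (unmasked)
		e->vector_ctrl |= DEMO_MSIX_CTRL_MASKBIT;	/* mask before the update */

	e->addr_lo = addr_lo;
	e->addr_hi = addr_hi;
	e->data = data;

	if (unmasked)
		e->vector_ctrl &= ~DEMO_MSIX_CTRL_MASKBIT;	/* restore previous state */
}
```

Doing this in the one common update function means every caller gets the masking without having to repeat it.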
      
      Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support")
      Reported-by: Kevin Tian <kevin.tian@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210729222542.462096385@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e206635e
    • Thomas Gleixner's avatar
      PCI/MSI: Mask all unused MSI-X entries · e1d5e8a5
      Thomas Gleixner authored
      commit 7d5ec3d3 upstream.
      
      When MSI-X is enabled the ordering of calls is:
      
        msix_map_region();
        msix_setup_entries();
        pci_msi_setup_msi_irqs();
        msix_program_entries();
      
      This has a few interesting issues:
      
       1) msix_setup_entries() allocates the MSI descriptors and initializes them
          except for the msi_desc:masked member which is left zero initialized.
      
       2) pci_msi_setup_msi_irqs() allocates the interrupt descriptors and sets
          up the MSI interrupts which ends up in pci_write_msi_msg() unless the
          interrupt chip provides its own irq_write_msi_msg() function.
      
       3) msix_program_entries() does not do what the name suggests. It solely
          updates the entries array (if not NULL) and initializes the masked
          member for each MSI descriptor by reading the hardware state and then
          masks the entry.
      
      Obviously this has some issues:
      
       1) The uninitialized masked member of msi_desc prevents the enforcement
          of masking the entry in pci_write_msi_msg() depending on the cached
          masked bit. Aside from that, half-initialized data is a no-no in general.
      
       2) msix_program_entries() only ensures that the actually allocated entries
          are masked. This is wrong as experimentation with crash testing and
          crash kernel kexec has shown.
      
          This limited testing unearthed that when the production kernel had more
          entries in use and unmasked when it crashed and the crash kernel
          allocated a smaller amount of entries, then a full scan of all entries
          found unmasked entries which were in use in the production kernel.
      
          This is obviously a device or emulation issue as the device reset
          should mask all MSI-X table entries, but obviously that's just part
          of the paper specification.
      
      Cure this by:
      
       1) Masking all table entries in hardware
       2) Initializing msi_desc::masked in msix_setup_entries()
       3) Removing the mask dance in msix_program_entries()
       4) Renaming msix_program_entries() to msix_update_entries() to
          reflect the purpose of that function.
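
      Step 1 of the cure can be sketched as follows, assuming the Vector
      Control words of the table are modelled as a plain array (the function
      name and the array are illustrative, not the kernel's code). The loop
      runs over the whole table size rather than over the allocated entries:

```c
#include <stddef.h>
#include <stdint.h>

#define ENTRY_CTRL_MASKBIT 0x1u /* bit 0 of Vector Control masks the entry */

/*
 * vector_ctrl[] is a hypothetical stand-in for the Vector Control words of
 * the MSI-X table. Every entry is masked, whether or not it is in use.
 */
void fake_msix_mask_all(uint32_t *vector_ctrl, size_t table_size)
{
    for (size_t i = 0; i < table_size; i++)
        vector_ctrl[i] |= ENTRY_CTRL_MASKBIT;
}
```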
      
      As the masking of unused entries has never been done the Fixes tag refers
      to a commit in:
         git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      
      Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210729222542.403833459@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e1d5e8a5
    • Thomas Gleixner's avatar
      PCI/MSI: Protect msi_desc::masked for multi-MSI · dd3df556
      Thomas Gleixner authored
      commit 77e89afc upstream.
      
      Multi-MSI uses a single MSI descriptor and there is a single mask register
      when the device supports per vector masking. To avoid reading back the mask
      register the value is cached in the MSI descriptor and updates are done by
      clearing and setting bits in the cache and writing it to the device.
      
      But nothing protects msi_desc::masked and the mask register from being
      modified concurrently on two different CPUs for two different Linux
      interrupts which belong to the same multi-MSI descriptor.
      
      Add a lock to struct device and protect any operation on the mask and the
      mask register with it.
      
      This makes the update of msi_desc::masked unconditional, but there is no
      place which requires a modification of the hardware register without
      updating the masked cache.
      
      msi_mask_irq() is now an empty wrapper which will be cleaned up in follow
      up changes.
      
      The problem goes way back to the initial support of multi-MSI, but picking
      the commit which introduced the mask cache is a valid cut off point
      (2.6.30).
      
      Fixes: f2440d9a ("PCI MSI: Refactor interrupt masking code")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210729222542.726833414@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      dd3df556
    • Thomas Gleixner's avatar
      PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown() · e45820c3
      Thomas Gleixner authored
      commit d28d4ad2 upstream.
      
      No point in using the raw write function from shutdown. Preparatory change
      to introduce proper serialization for the msi_desc::masked cache.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210729222542.674391354@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e45820c3
    • Thomas Gleixner's avatar
      PCI/MSI: Correct misleading comments · aacbb0d7
      Thomas Gleixner authored
      commit 689e6b53 upstream.
      
      The comments about preserving the cached state in pci_msi[x]_shutdown() are
      misleading as the MSI descriptors are freed right after those functions
      return. So there is nothing to restore. Preparatory change.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210729222542.621609423@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      aacbb0d7
    • Thomas Gleixner's avatar
      PCI/MSI: Do not set invalid bits in MSI mask · a4e85454
      Thomas Gleixner authored
      commit 361fd373 upstream.
      
      msi_mask_irq() takes a mask and a flags argument. The mask argument is used
      to mask out bits from the cached mask and the flags argument to set bits.
      
      Some places invoke it with a flags argument which sets bits which are not
      used by the device, i.e. when the device supports up to 8 vectors a full
      unmask in some places sets the mask to 0xFFFFFF00. While devices probably
      do not care, it's still bad practice.
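
      The arithmetic can be illustrated with a small sketch. The helper names
      are hypothetical; the only assumption is that the number of vectors is a
      power of two given as its log2, as in the MSI capability. For 8 vectors
      the valid bits are 0xFF, and passing the complement as the "set"
      argument is what produces 0xFFFFFF00:

```c
#include <stdint.h>

/* Mask bits that the device actually implements for 2^log2_vectors vectors. */
uint32_t valid_mask_bits(unsigned int log2_vectors)
{
    /* Do not shift by the full width of the type. */
    if (log2_vectors >= 5)
        return 0xffffffffu;
    return (1u << (1u << log2_vectors)) - 1u;
}

/* Clear the bits in 'clear' from the cached mask, then set the bits in 'set'. */
uint32_t update_cached_mask(uint32_t cached, uint32_t clear, uint32_t set)
{
    return (cached & ~clear) | set;
}
```

      Restricting both arguments to valid_mask_bits() keeps the cached value
      within the bits the device implements.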
      
      Fixes: 7ba1930d ("PCI MSI: Unmask MSI if setup failed")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210729222542.568173099@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a4e85454
    • Thomas Gleixner's avatar
      PCI/MSI: Enable and mask MSI-X early · a7e53436
      Thomas Gleixner authored
      commit 43855395 upstream.
      
      The ordering of MSI-X enable in hardware is dysfunctional:
      
       1) MSI-X is disabled in the control register
       2) Various setup functions
       3) pci_msi_setup_msi_irqs() is invoked which ends up accessing
          the MSI-X table entries
       4) MSI-X is enabled and masked in the control register with the
          comment that enabling is required for some hardware to access
          the MSI-X table
      
      Step #4 obviously contradicts #3. The history of this is an issue with the
      NIU hardware. When #4 was introduced the table access actually happened in
      msix_program_entries() which was invoked after enabling and masking MSI-X.
      
      This was changed in commit d71d6432 ("PCI/MSI: Kill redundant call of
      irq_set_msi_desc() for MSI-X interrupts") which removed the table write
      from msix_program_entries().
      
      Interestingly enough nobody noticed and either NIU still works or it did
      not get any testing with a kernel 3.19 or later.
      
      Nevertheless this is inconsistent and there is no reason why MSI-X can't be
      enabled and masked in the control register early on, i.e. move step #4
      above to step #1. This preserves the NIU workaround and has no side effects
      on other hardware.
      
      Fixes: d71d6432 ("PCI/MSI: Kill redundant call of irq_set_msi_desc() for MSI-X interrupts")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Ashok Raj <ashok.raj@intel.com>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210729222542.344136412@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a7e53436
    • Babu Moger's avatar
      x86/resctrl: Fix default monitoring groups reporting · a9cedbba
      Babu Moger authored
      commit 064855a6 upstream.
      
      Creating a new sub monitoring group in the root /sys/fs/resctrl leads to
      getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes
      on the entire filesystem.
      
      Steps to reproduce:
      
        1. mount -t resctrl resctrl /sys/fs/resctrl/
      
        2. cd /sys/fs/resctrl/
      
        3. cat mon_data/mon_L3_00/mbm_total_bytes
           23189832
      
        4. Create sub monitor group:
        mkdir mon_groups/test1
      
        5. cat mon_data/mon_L3_00/mbm_total_bytes
           Unavailable
      
      When a new monitoring group is created, a new RMID is assigned to the
      new group. But the RMID is not active yet. When the events are read on
      the new RMID, it is expected to report the status as "Unavailable".
      
      When the user reads the events on the default monitoring group with
      multiple subgroups, the events on all subgroups are consolidated
      together. Currently, if any of the RMID reads report as "Unavailable",
      then everything will be reported as "Unavailable".
      
      Fix the issue by discarding the "Unavailable" reads and reporting all
      the successful RMID reads. This is not a problem on Intel systems as
      Intel reports 0 on Inactive RMIDs.
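
      A rough sketch of the consolidation, using a hypothetical struct
      rmid_read and helper name. It sums only the successful reads; reporting
      an error when no read at all succeeded is an assumption of this sketch:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of reading one RMID: ok == 0 means "Unavailable". */
struct rmid_read {
    int ok;
    uint64_t bytes;
};

/*
 * Sum the successful reads of a group and its subgroups. Returns 0 and
 * stores the total when at least one read succeeded, -1 otherwise.
 */
int sum_group_reads(const struct rmid_read *reads, size_t n, uint64_t *total)
{
    uint64_t sum = 0;
    int any_ok = 0;

    for (size_t i = 0; i < n; i++) {
        if (!reads[i].ok)
            continue; /* discard "Unavailable" instead of failing the sum */
        sum += reads[i].bytes;
        any_ok = 1;
    }
    if (!any_ok)
        return -1;
    *total = sum;
    return 0;
}
```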
      
      Fixes: d89b7379 ("x86/intel_rdt/cqm: Add mon_data")
      Reported-by: Paweł Szulik <pawel.szulik@intel.com>
      Signed-off-by: Babu Moger <Babu.Moger@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Reinette Chatre <reinette.chatre@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311
      Link: https://lkml.kernel.org/r/162793309296.9224.15871659871696482080.stgit@bmoger-ubuntu
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a9cedbba
    • Randy Dunlap's avatar
      x86/tools: Fix objdump version check again · 33c2c00a
      Randy Dunlap authored
      [ Upstream commit 839ad22f ]
      
      Skip (omit) any version string info that is parenthesized.
      
      Warning: objdump version 15) is older than 2.19
      Warning: Skipping posttest.
      
      where 'objdump -v' says:
      GNU objdump (GNU Binutils; SUSE Linux Enterprise 15) 2.35.1.20201123-7.18
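
      The actual check is an awk script; the following C sketch only
      illustrates the parsing rule (skip anything inside parentheses, then
      take the first "major.minor" token). The function name is hypothetical:

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Extract major.minor from an 'objdump -v' banner line, ignoring any text
 * inside parentheses. Returns 0 on success, -1 if no version was found.
 */
int parse_objdump_version(const char *line, int *major, int *minor)
{
    int depth = 0;

    for (const char *p = line; *p; p++) {
        if (*p == '(') {
            depth++;
            continue;
        }
        if (*p == ')') {
            if (depth > 0)
                depth--;
            continue;
        }
        if (depth > 0)
            continue; /* parenthesized vendor info, e.g. "Enterprise 15" */

        /* A candidate token starts with a digit at a word boundary. */
        if (isdigit((unsigned char)*p) &&
            (p == line || isspace((unsigned char)p[-1]))) {
            if (sscanf(p, "%d.%d", major, minor) == 2)
                return 0;
        }
    }
    return -1;
}
```

      With the SUSE banner shown above, the "15)" inside the parentheses is
      skipped and the version is taken from "2.35.1.20201123-7.18".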
      
      Fixes: 8bee738b ("x86: Fix objdump version check in chkobjdump.awk for different formats.")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Link: https://lore.kernel.org/r/20210731000146.2720-1-rdunlap@infradead.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      33c2c00a
    • Pu Lehui's avatar
      powerpc/kprobes: Fix kprobe Oops happens in booke · 8f262080
      Pu Lehui authored
      [ Upstream commit 43e8f760 ]
      
      When using kprobe on a powerpc booke series processor, an Oops happens
      as shown below:
      
      / # echo "p:myprobe do_nanosleep" > /sys/kernel/debug/tracing/kprobe_events
      / # echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
      / # sleep 1
      [   50.076730] Oops: Exception in kernel mode, sig: 5 [#1]
      [   50.077017] BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500
      [   50.077221] Modules linked in:
      [   50.077462] CPU: 0 PID: 77 Comm: sleep Not tainted 5.14.0-rc4-00022-g251a1524293d #21
      [   50.077887] NIP:  c0b9c4e0 LR: c00ebecc CTR: 00000000
      [   50.078067] REGS: c3883de0 TRAP: 0700   Not tainted (5.14.0-rc4-00022-g251a1524293d)
      [   50.078349] MSR:  00029000 <CE,EE,ME>  CR: 24000228  XER: 20000000
      [   50.078675]
      [   50.078675] GPR00: c00ebdf0 c3883e90 c313e300 c3883ea0 00000001 00000000 c3883ecc 00000001
      [   50.078675] GPR08: c100598c c00ea250 00000004 00000000 24000222 102490c2 bff4180c 101e60d4
      [   50.078675] GPR16: 00000000 102454ac 00000040 10240000 10241100 102410f8 10240000 00500000
      [   50.078675] GPR24: 00000002 00000000 c3883ea0 00000001 00000000 0000c350 3b9b8d50 00000000
      [   50.080151] NIP [c0b9c4e0] do_nanosleep+0x0/0x190
      [   50.080352] LR [c00ebecc] hrtimer_nanosleep+0x14c/0x1e0
      [   50.080638] Call Trace:
      [   50.080801] [c3883e90] [c00ebdf0] hrtimer_nanosleep+0x70/0x1e0 (unreliable)
      [   50.081110] [c3883f00] [c00ec004] sys_nanosleep_time32+0xa4/0x110
      [   50.081336] [c3883f40] [c001509c] ret_from_syscall+0x0/0x28
      [   50.081541] --- interrupt: c00 at 0x100a4d08
      [   50.081749] NIP:  100a4d08 LR: 101b5234 CTR: 00000003
      [   50.081931] REGS: c3883f50 TRAP: 0c00   Not tainted (5.14.0-rc4-00022-g251a1524293d)
      [   50.082183] MSR:  0002f902 <CE,EE,PR,FP,ME>  CR: 24000222  XER: 00000000
      [   50.082457]
      [   50.082457] GPR00: 000000a2 bf980040 1024b4d0 bf980084 bf980084 64000000 00555345 fefefeff
      [   50.082457] GPR08: 7f7f7f7f 101e0000 00000069 00000003 28000422 102490c2 bff4180c 101e60d4
      [   50.082457] GPR16: 00000000 102454ac 00000040 10240000 10241100 102410f8 10240000 00500000
      [   50.082457] GPR24: 00000002 bf9803f4 10240000 00000000 00000000 100039e0 00000000 102444e8
      [   50.083789] NIP [100a4d08] 0x100a4d08
      [   50.083917] LR [101b5234] 0x101b5234
      [   50.084042] --- interrupt: c00
      [   50.084238] Instruction dump:
      [   50.084483] 4bfffc40 60000000 60000000 60000000 9421fff0 39400402 914200c0 38210010
      [   50.084841] 4bfffc20 00000000 00000000 00000000 <7fe00008> 7c0802a6 7c892378 93c10048
      [   50.085487] ---[ end trace f6fffe98e2fa8f3e ]---
      [   50.085678]
      Trace/breakpoint trap
      
      There is no real mode for booke arch and the MMU translation is
      always on. The corresponding MSR_IS/MSR_DS bit in booke is used
      to switch the address space, but not for real mode judgment.
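
      A sketch of why the check misfires, assuming the usual powerpc bit
      positions (MSR_IR = 0x20, MSR_DR = 0x10, shared with MSR_IS/MSR_DS on
      booke) and hypothetical helper names. With the MSR value 00029000 from
      the Oops above, both bits are clear, so a plain "real mode" test would
      wrongly ignore the trap:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; on booke the same bits are MSR_IS and MSR_DS. */
#define MSR_IR 0x20u /* instruction relocate */
#define MSR_DR 0x10u /* data relocate */

/* True when either translation bit is clear. */
bool looks_like_real_mode(uint32_t msr)
{
    return !(msr & MSR_IR) || !(msr & MSR_DR);
}

/* On booke the bits select an address space, so they must not be consulted. */
bool should_ignore_trap(uint32_t msr, bool is_booke)
{
    return !is_booke && looks_like_real_mode(msr);
}
```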
      
      Fixes: 21f8b2fa ("powerpc/kprobes: Ignore traps that happened in real mode")
      Signed-off-by: Pu Lehui <pulehui@huawei.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210809023658.218915-1-pulehui@huawei.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8f262080
    • Longpeng(Mike)'s avatar
      vsock/virtio: avoid potential deadlock when vsock device remove · 8dc6941d
      Longpeng(Mike) authored
      [ Upstream commit 49b0b6ff ]
      
      There's a potential deadlock case when removing the vsock device or
      processing the RESET event:
      
        vsock_for_each_connected_socket:
            spin_lock_bh(&vsock_table_lock) ----------- (1)
            ...
                virtio_vsock_reset_sock:
                    lock_sock(sk) --------------------- (2)
            ...
            spin_unlock_bh(&vsock_table_lock)
      
      lock_sock() may voluntarily schedule when the 'sk' is owned by another
      thread at the same time, in which case we would receive a
      "scheduling while atomic" warning.
      
      Even worse, if the next task (selected by the scheduler) tries to
      release a 'sk', it needs to acquire vsock_table_lock and a deadlock
      occurs, putting the system into a softlockup state.
        Call trace:
         queued_spin_lock_slowpath
         vsock_remove_bound
         vsock_remove_sock
         virtio_transport_release
         __vsock_release
         vsock_release
         __sock_release
         sock_close
         __fput
         ____fput
      
      So we should not require sk_lock in this case, just like the behavior
      in vhost_vsock or vmci.
      
      Fixes: 0ea9e1d3 ("VSOCK: Introduce virtio_transport.ko")
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Link: https://lore.kernel.org/r/20210812053056.1699-1-longpeng2@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8dc6941d
    • Maximilian Heyne's avatar
      xen/events: Fix race in set_evtchn_to_irq · 47501793
      Maximilian Heyne authored
      [ Upstream commit 88ca2521 ]
      
      There is a TOCTOU issue in set_evtchn_to_irq. Rows in the evtchn_to_irq
      mapping are lazily allocated in this function. The check whether the row
      is already present and the row initialization are not synchronized. Two
      threads can at the same time allocate a new row for evtchn_to_irq and
      add the irq mapping to their newly allocated row. One thread will
      overwrite what the other has set for evtchn_to_irq[row] and therefore
      the irq mapping is lost. This will trigger a BUG_ON later in
      bind_evtchn_to_cpu:
      
        INFO: pci 0000:1a:15.4: [1d0f:8061] type 00 class 0x010802
        INFO: nvme 0000:1a:12.1: enabling device (0000 -> 0002)
        INFO: nvme nvme77: 1/0/0 default/read/poll queues
        CRIT: kernel BUG at drivers/xen/events/events_base.c:427!
        WARN: invalid opcode: 0000 [#1] SMP NOPTI
        WARN: Workqueue: nvme-reset-wq nvme_reset_work [nvme]
        WARN: RIP: e030:bind_evtchn_to_cpu+0xc2/0xd0
        WARN: Call Trace:
        WARN:  set_affinity_irq+0x121/0x150
        WARN:  irq_do_set_affinity+0x37/0xe0
        WARN:  irq_setup_affinity+0xf6/0x170
        WARN:  irq_startup+0x64/0xe0
        WARN:  __setup_irq+0x69e/0x740
        WARN:  ? request_threaded_irq+0xad/0x160
        WARN:  request_threaded_irq+0xf5/0x160
        WARN:  ? nvme_timeout+0x2f0/0x2f0 [nvme]
        WARN:  pci_request_irq+0xa9/0xf0
        WARN:  ? pci_alloc_irq_vectors_affinity+0xbb/0x130
        WARN:  queue_request_irq+0x4c/0x70 [nvme]
        WARN:  nvme_reset_work+0x82d/0x1550 [nvme]
        WARN:  ? check_preempt_wakeup+0x14f/0x230
        WARN:  ? check_preempt_curr+0x29/0x80
        WARN:  ? nvme_irq_check+0x30/0x30 [nvme]
        WARN:  process_one_work+0x18e/0x3c0
        WARN:  worker_thread+0x30/0x3a0
        WARN:  ? process_one_work+0x3c0/0x3c0
        WARN:  kthread+0x113/0x130
        WARN:  ? kthread_park+0x90/0x90
        WARN:  ret_from_fork+0x3a/0x50
      
      This patch sets evtchn_to_irq rows via a cmpxchg operation so that they
      will be set only once. The row is now cleared before writing it to
      evtchn_to_irq in order to not create a race once the row is visible for
      other threads.
      
      While at it, do not require the page to be zeroed, because it will be
      overwritten with -1's in clear_evtchn_to_irq_row anyway.
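
      The pattern can be sketched in userspace with C11 atomics standing in
      for the kernel's cmpxchg. The table, sizes and function names below are
      illustrative assumptions. The row is filled with -1 before it is
      published, and a thread that loses the race frees its own row and uses
      the winner's:

```c
#include <stdatomic.h>
#include <stdlib.h>

#define NR_ROWS 4u
#define ROW_LEN 8u

/* Lazily allocated rows; a NULL slot means "row not present yet". */
static _Atomic(int *) table[NR_ROWS];

int set_mapping(unsigned int row, unsigned int col, int irq)
{
    if (row >= NR_ROWS || col >= ROW_LEN)
        return -1;

    int *cur = atomic_load(&table[row]);
    if (!cur) {
        int *fresh = malloc(ROW_LEN * sizeof(*fresh));
        if (!fresh)
            return -1;
        /* Clear the row before it becomes visible to other threads. */
        for (unsigned int i = 0; i < ROW_LEN; i++)
            fresh[i] = -1;

        int *expected = NULL;
        if (atomic_compare_exchange_strong(&table[row], &expected, fresh)) {
            cur = fresh;    /* we published the row */
        } else {
            free(fresh);    /* someone else won; use their row */
            cur = expected;
        }
    }
    cur[col] = irq;
    return 0;
}

int get_mapping(unsigned int row, unsigned int col)
{
    if (row >= NR_ROWS || col >= ROW_LEN)
        return -1;
    int *cur = atomic_load(&table[row]);
    return cur ? cur[col] : -1;
}
```

      Since every slot is overwritten with -1 before publication, the
      allocation does not need to be zeroed, which matches the last paragraph
      above.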
      
      Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
      Fixes: d0b075ff ("xen/events: Refactor evtchn_to_irq array to be dynamically allocated")
      Link: https://lore.kernel.org/r/20210812130930.127134-1-mheyne@amazon.de
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      47501793
    • Neal Cardwell's avatar
      tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets · e0f5a8ed
      Neal Cardwell authored
      [ Upstream commit 6de035fe ]
      
      Currently if BBR congestion control is initialized after more than 2B
      packets have been delivered, depending on the phase of the
      tp->delivered counter the tracking of BBR round trips can get stuck.
      
      The bug arises because if tp->delivered is between 2^31 and 2^32 at
      the time the BBR congestion control module is initialized, then the
      initialization of bbr->next_rtt_delivered to 0 will cause the logic to
      believe that the end of the round trip is still billions of packets in
      the future. More specifically, the following check will fail
      repeatedly:
      
        !before(rs->prior_delivered, bbr->next_rtt_delivered)
      
      and thus the connection will take up to 2B packets delivered before
      that check will pass and the connection will set:
      
        bbr->round_start = 1;
      
      This could cause many mechanisms in BBR to fail to trigger, for
      example bbr_check_full_bw_reached() would likely never exit STARTUP.
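
      The wrap can be reproduced with a small sketch. seq_before() mirrors the
      usual wrapping comparison on u32 counters; round_would_start() is a
      hypothetical helper for the check quoted above. Seeding the second
      argument from the current delivered count, rather than 0, is shown as
      one way the check can pass again (an assumption of this sketch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrapping "a is before b" comparison on 32-bit counters. */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* The round-start check: !before(prior_delivered, next_rtt_delivered). */
bool round_would_start(uint32_t prior_delivered, uint32_t next_rtt_delivered)
{
    return !seq_before(prior_delivered, next_rtt_delivered);
}
```

      With prior_delivered = 3000000000 (between 2^31 and 2^32) and
      next_rtt_delivered = 0, the signed difference is negative, so the
      counter appears to be "before" 0 and the round never starts.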
      
      This bug is 5 years old and has not been observed, and as a practical
      matter this would likely rarely trigger, since it would require
      transferring at least 2B packets, or likely more than 3 terabytes of
      data, before switching congestion control algorithms to BBR.
      
      This patch is a stable candidate for kernels as far back as v4.9,
      when tcp_bbr.c was added.
      
      Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control")
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Yuchung Cheng <ycheng@google.com>
      Reviewed-by: Kevin Yang <yyd@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20210811024056.235161-1-ncardwell@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e0f5a8ed
    • Yang Yingliang's avatar
      net: bridge: fix memleak in br_add_if() · b123e6b2
      Yang Yingliang authored
      [ Upstream commit 519133de ]
      
      I got a memleak report:
      
      BUG: memory leak
      unreferenced object 0x607ee521a658 (size 240):
      comm "syz-executor.0", pid 955, jiffies 4294780569 (age 16.449s)
      hex dump (first 32 bytes, cpu 1):
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      backtrace:
      [<00000000d830ea5a>] br_multicast_add_port+0x1c2/0x300 net/bridge/br_multicast.c:1693
      [<00000000274d9a71>] new_nbp net/bridge/br_if.c:435 [inline]
      [<00000000274d9a71>] br_add_if+0x670/0x1740 net/bridge/br_if.c:611
      [<0000000012ce888e>] do_set_master net/core/rtnetlink.c:2513 [inline]
      [<0000000012ce888e>] do_set_master+0x1aa/0x210 net/core/rtnetlink.c:2487
      [<0000000099d1cafc>] __rtnl_newlink+0x1095/0x13e0 net/core/rtnetlink.c:3457
      [<00000000a01facc0>] rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3488
      [<00000000acc9186c>] rtnetlink_rcv_msg+0x369/0xa10 net/core/rtnetlink.c:5550
      [<00000000d4aabb9c>] netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2504
      [<00000000bc2e12a3>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
      [<00000000bc2e12a3>] netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1340
      [<00000000e4dc2d0e>] netlink_sendmsg+0x789/0xc70 net/netlink/af_netlink.c:1929
      [<000000000d22c8b3>] sock_sendmsg_nosec net/socket.c:654 [inline]
      [<000000000d22c8b3>] sock_sendmsg+0x139/0x170 net/socket.c:674
      [<00000000e281417a>] ____sys_sendmsg+0x658/0x7d0 net/socket.c:2350
      [<00000000237aa2ab>] ___sys_sendmsg+0xf8/0x170 net/socket.c:2404
      [<000000004f2dc381>] __sys_sendmsg+0xd3/0x190 net/socket.c:2433
      [<0000000005feca6c>] do_syscall_64+0x37/0x90 arch/x86/entry/common.c:47
      [<000000007304477d>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      On the error path of br_add_if(), p->mcast_stats allocated in
      new_nbp() needs to be freed, or it will be leaked.
      
      Fixes: 1080ab95 ("net: bridge: add support for IGMP/MLD stats and export them via netlink")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Link: https://lore.kernel.org/r/20210809132023.978546-1-yangyingliang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b123e6b2
    • Takeshi Misawa's avatar
      net: Fix memory leak in ieee802154_raw_deliver · a0d5422c
      Takeshi Misawa authored
      [ Upstream commit 1090340f ]
      
      If an IEEE-802.15.4-RAW socket is closed before a queued skb is
      received, the skb is leaked. Fix this by freeing sk_receive_queue in
      sk->sk_destruct().
      
      syzbot report:
      BUG: memory leak
      unreferenced object 0xffff88810f644600 (size 232):
        comm "softirq", pid 0, jiffies 4294967032 (age 81.270s)
        hex dump (first 32 bytes):
          10 7d 4b 12 81 88 ff ff 10 7d 4b 12 81 88 ff ff  .}K......}K.....
          00 00 00 00 00 00 00 00 40 7c 4b 12 81 88 ff ff  ........@|K.....
        backtrace:
          [<ffffffff83651d4a>] skb_clone+0xaa/0x2b0 net/core/skbuff.c:1496
          [<ffffffff83fe1b80>] ieee802154_raw_deliver net/ieee802154/socket.c:369 [inline]
          [<ffffffff83fe1b80>] ieee802154_rcv+0x100/0x340 net/ieee802154/socket.c:1070
          [<ffffffff8367cc7a>] __netif_receive_skb_one_core+0x6a/0xa0 net/core/dev.c:5384
          [<ffffffff8367cd07>] __netif_receive_skb+0x27/0xa0 net/core/dev.c:5498
          [<ffffffff8367cdd9>] netif_receive_skb_internal net/core/dev.c:5603 [inline]
          [<ffffffff8367cdd9>] netif_receive_skb+0x59/0x260 net/core/dev.c:5662
          [<ffffffff83fe6302>] ieee802154_deliver_skb net/mac802154/rx.c:29 [inline]
          [<ffffffff83fe6302>] ieee802154_subif_frame net/mac802154/rx.c:102 [inline]
          [<ffffffff83fe6302>] __ieee802154_rx_handle_packet net/mac802154/rx.c:212 [inline]
          [<ffffffff83fe6302>] ieee802154_rx+0x612/0x620 net/mac802154/rx.c:284
          [<ffffffff83fe59a6>] ieee802154_tasklet_handler+0x86/0xa0 net/mac802154/main.c:35
          [<ffffffff81232aab>] tasklet_action_common.constprop.0+0x5b/0x100 kernel/softirq.c:557
          [<ffffffff846000bf>] __do_softirq+0xbf/0x2ab kernel/softirq.c:345
          [<ffffffff81232f4c>] do_softirq kernel/softirq.c:248 [inline]
          [<ffffffff81232f4c>] do_softirq+0x5c/0x80 kernel/softirq.c:235
          [<ffffffff81232fc1>] __local_bh_enable_ip+0x51/0x60 kernel/softirq.c:198
          [<ffffffff8367a9a4>] local_bh_enable include/linux/bottom_half.h:32 [inline]
          [<ffffffff8367a9a4>] rcu_read_unlock_bh include/linux/rcupdate.h:745 [inline]
          [<ffffffff8367a9a4>] __dev_queue_xmit+0x7f4/0xf60 net/core/dev.c:4221
          [<ffffffff83fe2db4>] raw_sendmsg+0x1f4/0x2b0 net/ieee802154/socket.c:295
          [<ffffffff8363af16>] sock_sendmsg_nosec net/socket.c:654 [inline]
          [<ffffffff8363af16>] sock_sendmsg+0x56/0x80 net/socket.c:674
          [<ffffffff8363deec>] __sys_sendto+0x15c/0x200 net/socket.c:1977
          [<ffffffff8363dfb6>] __do_sys_sendto net/socket.c:1989 [inline]
          [<ffffffff8363dfb6>] __se_sys_sendto net/socket.c:1985 [inline]
          [<ffffffff8363dfb6>] __x64_sys_sendto+0x26/0x30 net/socket.c:1985
      
      Fixes: 9ec76716 ("net: add IEEE 802.15.4 socket family implementation")
      Reported-and-tested-by: <syzbot+1f68113fa907bf0695a8@syzkaller.appspotmail.com>
      Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com>
      Acked-by: Alexander Aring <aahringo@redhat.com>
      Link: https://lore.kernel.org/r/20210805075414.GA15796@DESKTOP
      Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a0d5422c
    • Roi Dayan's avatar
      psample: Add a fwd declaration for skbuff · 0ef55cb3
      Roi Dayan authored
      [ Upstream commit beb7f2de ]
      
      Without this there is a warning if source files include psample.h
      before skbuff.h or don't include it at all.
      
      Fixes: 6ae0a628 ("net: Introduce psample, a new genetlink channel for packet sampling")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Link: https://lore.kernel.org/r/20210808065242.1522535-1-roid@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0ef55cb3
    • Pali Rohár's avatar
      ppp: Fix generating ifname when empty IFLA_IFNAME is specified · d0ba3f1f
      Pali Rohár authored
      [ Upstream commit 2459dcb9 ]
      
      IFLA_IFNAME is a NUL-terminated string, which means that the IFLA_IFNAME
      buffer can be larger than the length of the string it contains.
      
      Function __rtnl_newlink() generates new own ifname if either IFLA_IFNAME
      was not specified at all or userspace passed empty nul-term string.
      
      It is expected that if userspace does not specify ifname for new ppp netdev
      then kernel generates one in format "ppp<id>" where id matches to the ppp
      unit id which can be later obtained by PPPIOCGUNIT ioctl.
      
      And it works in this way if IFLA_IFNAME is not specified at all. But it
      does not work when IFLA_IFNAME is specified with empty string.
      
      So fix this logic for an empty IFLA_IFNAME in ppp_nl_newlink() as well,
      and correctly generate the ifname based on the ppp unit identifier if
      userspace did not provide a preferred ifname.
      
      Without this patch when IFLA_IFNAME was specified with empty string then
      kernel created a new ppp interface in format "ppp<id>" but id did not
      match ppp unit id returned by PPPIOCGUNIT ioctl. In this case id was some
      number generated by __rtnl_newlink() function.
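
      The decision can be sketched in plain C. The helper names and the
      buffer-plus-length representation of the attribute are illustrative
      assumptions, not the netlink API. A name counts as "set" only if the
      attribute is present, non-empty, and does not start with a NUL byte:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* attr == NULL models a missing IFLA_IFNAME attribute. */
bool ifname_is_set(const char *attr, size_t attr_len)
{
    return attr != NULL && attr_len > 0 && attr[0] != '\0';
}

/* Use the requested name if one was given, otherwise "ppp<unit>". */
void choose_ifname(char *out, size_t out_len, const char *attr,
                   size_t attr_len, int unit)
{
    if (ifname_is_set(attr, attr_len))
        snprintf(out, out_len, "%.*s", (int)attr_len, attr);
    else
        snprintf(out, out_len, "ppp%d", unit);
}
```

      An empty string and a missing attribute then take the same path, so the
      generated name matches the unit id in both cases.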
      
      Signed-off-by: Pali Rohár <pali@kernel.org>
      Fixes: bb8082f6 ("ppp: build ifname using unit identifier for rtnl based devices")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d0ba3f1f