  1. Jun 06, 2022
    • x86/sgx: Obtain backing storage page with enclave mutex held · 69432ff1
      Reinette Chatre authored
      commit 0e4e729a upstream.
      
      Haitao reported encountering a WARN triggered by the ENCLS[ELDU]
      instruction faulting with a #GP.
      
      The WARN is encountered when the reclaimer evicts a range of
      pages from the enclave and the same pages are faulted back
      right away.
      
      The SGX backing storage is accessed on two paths: when there
      are insufficient free pages in the EPC the reclaimer works
      to move enclave pages to the backing storage and as enclaves
      access pages that have been moved to the backing storage
      they are retrieved from there as part of page fault handling.
      
      An oversubscribed SGX system will often run the reclaimer and
      page fault handler concurrently and needs to ensure that the
      backing store is accessed safely between the reclaimer and
      the page fault handler. This is not the case because the
      reclaimer accesses the backing store without the enclave mutex
      while the page fault handler accesses the backing store with
      the enclave mutex.
      
      Consider the scenario where a page is faulted while a page sharing
      a PCMD page with the faulted page is being reclaimed. The
      consequence is a race between the reclaimer and page fault
      handler: the reclaimer attempts to access a PCMD page at the
      same time it is truncated by the page fault handler. This
      could result in lost PCMD data. Data may still be
      lost if the reclaimer wins the race; this is addressed in
      the following patch.
      
      The reclaimer accesses pages from the backing storage without
      holding the enclave mutex and runs the risk of concurrently
      accessing the backing storage with the page fault handler that
      does access the backing storage with the enclave mutex held.
      
      In the scenario below a PCMD page is truncated from the backing
      store after all its pages have been loaded into the enclave
      at the same time the PCMD page is loaded from the backing store
      when one of its pages is reclaimed:
      
      sgx_reclaim_pages() {              sgx_vma_fault() {
                                           ...
                                           mutex_lock(&encl->lock);
                                           ...
                                           __sgx_encl_eldu() {
                                             ...
                                             if (pcmd_page_empty) {
      /*
       * EPC page being reclaimed              /*
       * shares a PCMD page with an             * PCMD page truncated
       * enclave page that is being             * while requested from
       * faulted in.                            * reclaimer.
       */                                       */
      sgx_encl_get_backing()  <---------->      sgx_encl_truncate_backing_page()
                                              }
                                             mutex_unlock(&encl->lock);
      }                                    }
      
      In this scenario there is a race between the reclaimer and the page fault
      handler when the reclaimer attempts to get access to the same PCMD page
      that is being truncated. This could result in the reclaimer writing to
      the PCMD page that is then truncated, causing the PCMD data to be lost,
      or in a new PCMD page being allocated. PCMD data may still be lost
      after protecting the backing store access with the mutex - this is fixed
      in the next patch. By ensuring the backing store is accessed with the mutex
      held, the enclave page state can be made accurate with the
      SGX_ENCL_PAGE_BEING_RECLAIMED flag accurately reflecting that a page
      is in the process of being reclaimed.
      
      Consistently protect the reclaimer's backing store access with the
      enclave's mutex to ensure that it can safely run concurrently with the
      page fault handler.
      
      Cc: stable@vger.kernel.org
      Fixes: 1728ab54 ("x86/sgx: Add a page reclaimer")
      Reported-by: Haitao Huang <haitao.huang@intel.com>
      Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
      Tested-by: Haitao Huang <haitao.huang@intel.com>
      Link: https://lkml.kernel.org/r/fa2e04c561a8555bfe1f4e7adc37d60efc77387b.1652389823.git.reinette.chatre@intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/sgx: Mark PCMD page as dirty when modifying contents · 876053dd
      Reinette Chatre authored
      commit 2154e1c1 upstream.
      
      Recent commit 08999b24 ("x86/sgx: Free backing memory
      after faulting the enclave page") expanded __sgx_encl_eldu()
      to clear an enclave page's PCMD (Paging Crypto MetaData)
      from the PCMD page in the backing store after the enclave
      page is restored to the enclave.
      
      Since the PCMD page in the backing store is modified the page
      should be marked as dirty to ensure the modified data is retained.
      
      Cc: stable@vger.kernel.org
      Fixes: 08999b24 ("x86/sgx: Free backing memory after faulting the enclave page")
      Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Tested-by: Haitao Huang <haitao.huang@intel.com>
      Link: https://lkml.kernel.org/r/00cd2ac480db01058d112e347b32599c1a806bc4.1652389823.git.reinette.chatre@intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/sgx: Disconnect backing page references from dirty status · 5ded81f4
      Reinette Chatre authored
      commit 6bd42964 upstream.
      
      SGX uses shmem backing storage to store encrypted enclave pages
      and their crypto metadata when enclave pages are moved out of
      enclave memory. Two shmem backing storage pages are associated with
      each enclave page - one backing page to contain the encrypted
      enclave page data and one backing page (shared by a few
      enclave pages) to contain the crypto metadata used by the
      processor to verify the enclave page when it is loaded back into
      the enclave.
      
      sgx_encl_put_backing() is used to release references to the
      backing storage and, optionally, mark both backing store pages
      as dirty.
      
      Managing references and dirty status together in this way results
      in both backing store pages being marked as dirty, even if only one of
      the backing store pages is changed.
      
      Additionally, waiting until the page reference is dropped to set
      the page dirty risks a race with the page fault handler that
      may load outdated data into the enclave when a page is faulted
      right after it is reclaimed.
      
      Consider what happens if the reclaimer writes a page to the backing
      store and the page is immediately faulted back, before the reclaimer
      is able to set the dirty bit of the page:
      
      sgx_reclaim_pages() {                    sgx_vma_fault() {
        ...
        sgx_encl_get_backing();
        ...                                      ...
        sgx_reclaimer_write() {
          mutex_lock(&encl->lock);
          /* Write data to backing store */
          mutex_unlock(&encl->lock);
        }
                                                 mutex_lock(&encl->lock);
                                                 __sgx_encl_eldu() {
                                                   ...
                                                   /*
                                                    * Enclave backing store
                                                    * page not released
                                                    * nor marked dirty -
                                                    * contents may not be
                                                    * up to date.
                                                    */
                                                    sgx_encl_get_backing();
                                                    ...
                                                    /*
                                                     * Enclave data restored
                                                     * from backing store
                                                     * and PCMD pages that
                                                     * are not up to date.
                                                     * ENCLS[ELDU] faults
                                                     * because of MAC or PCMD
                                                     * checking failure.
                                                     */
                                                     sgx_encl_put_backing();
                                                  }
                                                  ...
        /* set page dirty */
        sgx_encl_put_backing();
        ...
                                                  mutex_unlock(&encl->lock);
      }                                        }
      
      Remove the option for sgx_encl_put_backing() to set the backing
      pages as dirty, and instead set the needed pages as dirty right after
      they receive important data while the enclave mutex is held. This ensures that
      the page fault handler can get up-to-date data from a page and prepares
      the code for a following change where only one of the backing pages
      needs to be marked as dirty.
      
      Cc: stable@vger.kernel.org
      Fixes: 1728ab54 ("x86/sgx: Add a page reclaimer")
      Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Tested-by: Haitao Huang <haitao.huang@intel.com>
      Link: https://lore.kernel.org/linux-sgx/8922e48f-6646-c7cc-6393-7c78dcf23d23@intel.com/
      Link: https://lkml.kernel.org/r/fa9f98986923f43e72ef4c6702a50b2a0b3c42e3.1652389823.git.reinette.chatre@intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • HID: multitouch: add quirks to enable Lenovo X12 trackpoint · 6ad9dbb2
      Tao Jin authored
      commit 95cd2cdc upstream.
      
      This applies quirks similar to those used by previous-generation devices
      such as the X1 tablet to the X12 tablet, so that the trackpoint and buttons
      can work.
      
      This patch was applied and tested working on 5.17.1.
      
      Cc: stable@vger.kernel.org # 5.8+ given that it relies on 40d5bb87
      Signed-off-by: Tao Jin <tao-j@outlook.com>
      Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Link: https://lore.kernel.org/r/CO6PR03MB6241CB276FCDC7F4CEDC34F6E1E29@CO6PR03MB6241.namprd03.prod.outlook.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • HID: multitouch: Add support for Google Whiskers Touchpad · 557b6a9c
      Marek Maślanka authored
      commit 1d07cef7 upstream.
      
      The Google Whiskers touchpad does not work properly with the default
      multitouch configuration. Instead, use the same configuration as Google
      Rose.
      
      Signed-off-by: Marek Maslanka <mm@semihalf.com>
      Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fs/ntfs3: validate BOOT sectors_per_clusters · a2b69863
      Randy Dunlap authored
      commit a3b77434 upstream.
      
      When the NTFS BOOT sectors_per_clusters field is > 0x80, it represents a
      shift value.  Make sure that the shift value is not too large before using
      it (NTFS max cluster size is 2MB).  Return -EINVAL if it is too large.
      
      This prevents negative shift values and shift values that are larger than
      the field size.
      
      Prevents this UBSAN error:
      
       UBSAN: shift-out-of-bounds in ../fs/ntfs3/super.c:673:16
       shift exponent -192 is negative
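
      The check described above can be sketched in userspace C. This is
      only an illustration, not the kernel code: the function name, the
      reading of the field as a negated shift (256 - raw), and the
      512-byte sector size used to derive the shift limit of 12 are all
      assumptions.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical model of the validation (name and return convention are
 * assumptions).  Returns the number of sectors per cluster, or -EINVAL
 * when the encoded shift would exceed a 2MB cluster, assuming 512-byte
 * sectors (512 << 12 == 2MB).
 */
static long sectors_per_cluster(uint8_t raw)
{
	unsigned int shift;

	if (raw <= 0x80)
		return raw;		/* plain sector count */

	/* Assumed encoding: raw is a negated shift, 0xff -> 1, 0xfe -> 2 */
	shift = 0x100u - raw;
	if (shift > 12)
		return -EINVAL;		/* cluster would be larger than 2MB */

	return 1L << shift;
}
```

      With this model a raw value of 0xc0, which corresponds to the -192
      exponent in the UBSAN report, is rejected instead of being used as
      a shift count.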
      
      Link: https://lkml.kernel.org/r/20220502175342.20296-1-rdunlap@infradead.org
      Fixes: 82cae269 ("fs/ntfs3: Add initialization of super block")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: <syzbot+1631f09646bc214d2e76@syzkaller.appspotmail.com>
      Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
      Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Kari Argillander <kari.argillander@stargateuniverse.net>
      Cc: Namjae Jeon <linkinjeon@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • raid5: introduce MD_BROKEN · 8a395a21
      Mariusz Tkaczyk authored
      commit 57668f0a upstream.
      
      The raid456 module used to allow an array to reach the failed state.
      This was changed by fb73b357 ("raid5: block failing device if raid
      will be failed"). That fix introduced a bug: if raid5 fails during IO,
      it may result in a hung task that never completes. The Faulty flag on
      the device is necessary to process all requests and is checked many
      times, mainly in analyze_stripe().
      Allow setting Faulty on a drive again and set MD_BROKEN if the raid
      has failed.
      
      As a result, this level is allowed to reach the failed state again, but
      communication with userspace (via -EBUSY status) is preserved.
      
      This restores the possibility of failing an array via the
      "mdadm --set-faulty" command; that will be fixed by additional
      verification on the mdadm side.
      
      Reproduction steps:
       mdadm -CR imsm -e imsm -n 3 /dev/nvme[0-2]n1
       mdadm -CR r5 -e imsm -l5 -n3 /dev/nvme[0-2]n1 --assume-clean
       mkfs.xfs /dev/md126 -f
       mount /dev/md126 /mnt/root/
      
       fio --filename=/mnt/root/file --size=5GB --direct=1 --rw=randrw \
         --bs=64k --ioengine=libaio --iodepth=64 --runtime=240 --numjobs=4 \
         --time_based --group_reporting --name=throughput-test-job \
         --eta-newline=1 &
      
       echo 1 > /sys/block/nvme2n1/device/device/remove
       echo 1 > /sys/block/nvme1n1/device/device/remove
      
       [ 1475.787779] Call Trace:
       [ 1475.793111] __schedule+0x2a6/0x700
       [ 1475.799460] schedule+0x38/0xa0
       [ 1475.805454] raid5_get_active_stripe+0x469/0x5f0 [raid456]
       [ 1475.813856] ? finish_wait+0x80/0x80
       [ 1475.820332] raid5_make_request+0x180/0xb40 [raid456]
       [ 1475.828281] ? finish_wait+0x80/0x80
       [ 1475.834727] ? finish_wait+0x80/0x80
       [ 1475.841127] ? finish_wait+0x80/0x80
       [ 1475.847480] md_handle_request+0x119/0x190
       [ 1475.854390] md_make_request+0x8a/0x190
       [ 1475.861041] generic_make_request+0xcf/0x310
       [ 1475.868145] submit_bio+0x3c/0x160
       [ 1475.874355] iomap_dio_submit_bio.isra.20+0x51/0x60
       [ 1475.882070] iomap_dio_bio_actor+0x175/0x390
       [ 1475.889149] iomap_apply+0xff/0x310
       [ 1475.895447] ? iomap_dio_bio_actor+0x390/0x390
       [ 1475.902736] ? iomap_dio_bio_actor+0x390/0x390
       [ 1475.909974] iomap_dio_rw+0x2f2/0x490
       [ 1475.916415] ? iomap_dio_bio_actor+0x390/0x390
       [ 1475.923680] ? atime_needs_update+0x77/0xe0
       [ 1475.930674] ? xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
       [ 1475.938455] xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
       [ 1475.946084] xfs_file_read_iter+0xba/0xd0 [xfs]
       [ 1475.953403] aio_read+0xd5/0x180
       [ 1475.959395] ? _cond_resched+0x15/0x30
       [ 1475.965907] io_submit_one+0x20b/0x3c0
       [ 1475.972398] __x64_sys_io_submit+0xa2/0x180
       [ 1475.979335] ? do_io_getevents+0x7c/0xc0
       [ 1475.986009] do_syscall_64+0x5b/0x1a0
       [ 1475.992419] entry_SYSCALL_64_after_hwframe+0x65/0xca
       [ 1476.000255] RIP: 0033:0x7f11fc27978d
       [ 1476.006631] Code: Bad RIP value.
       [ 1476.073251] INFO: task fio:3877 blocked for more than 120 seconds.
      
      Cc: stable@vger.kernel.org
      Fixes: fb73b357 ("raid5: block failing device if raid will be failed")
      Reviewed-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm verity: set DM_TARGET_IMMUTABLE feature flag · 417c73db
      Sarthak Kukreti authored
      commit 4caae584 upstream.
      
      The device-mapper framework provides a mechanism to mark targets as
      immutable (and hence fail table reloads that try to change the target
      type). Add the DM_TARGET_IMMUTABLE flag to the dm-verity target's
      feature flags to prevent switching the verity target with a different
      target type.
      
      Fixes: a4ffc152 ("dm: add verity target")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sarthak Kukreti <sarthakkukreti@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm stats: add cond_resched when looping over entries · ddd5cd42
      Mikulas Patocka authored
      commit bfe2b014 upstream.
      
      dm-stats can be used with a very large number of entries (it is only
      limited by 1/4 of total system memory), so add rescheduling points to
      the loops that iterate over the entries.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm crypt: make printing of the key constant-time · eb27bd45
      Mikulas Patocka authored
      commit 567dd8f3 upstream.
      
      The device mapper dm-crypt target is using scnprintf("%02x", cc->key[i]) to
      report the current key to userspace. However, this is not a constant-time
      operation and it may leak information about the key via timing, via cache
      access patterns or via the branch predictor.
      
      Change dm-crypt's key printing to use "%c" instead of "%02x". Also
      introduce hex2asc() that carefully avoids any branching or memory
      accesses when converting a number in the range 0 ... 15 to an ascii
      character.
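
      A branch-free nibble conversion of the kind described can be
      sketched as follows. The arithmetic is one way to avoid branches
      and table lookups; key_to_hex() is a hypothetical helper added for
      illustration only, and whether the result is truly constant-time
      also depends on the compiler and the CPU.

```c
#include <stddef.h>

/*
 * Convert a value in the range 0 ... 15 to a lowercase hex digit
 * without branching or memory accesses.  For c <= 9 the shifted term is
 * 0; for c >= 10 the subtraction wraps as unsigned, the shift leaves
 * the low bits set, and the mask yields 0x27, so the result lands on
 * 'a' ... 'f'.
 */
static inline char hex2asc(unsigned char c)
{
	return c + '0' + ((unsigned)(9 - c) >> 4 & 0x27);
}

/*
 * Hypothetical helper (not from the commit): write 2 * len hex
 * characters plus a terminating NUL into out.
 */
static void key_to_hex(const unsigned char *key, size_t len, char *out)
{
	size_t i;

	for (i = 0; i < len; i++) {
		out[2 * i] = hex2asc(key[i] >> 4);
		out[2 * i + 1] = hex2asc(key[i] & 0xf);
	}
	out[2 * len] = '\0';
}
```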
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Tested-by: Milan Broz <gmazyland@gmail.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm integrity: fix error code in dm_integrity_ctr() · 7b057349
      Dan Carpenter authored
      commit d3f2a14b upstream.
      
      The "r" variable shadows an earlier "r" that has function scope.  It
      means that we accidentally return success instead of an error code.
      Smatch has a warning for this:
      
      	drivers/md/dm-integrity.c:4503 dm_integrity_ctr()
      	warn: missing error code 'r'
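
      The class of bug described above can be reproduced in a few lines.
      This is a generic sketch, not the dm-integrity code: step(),
      ctr_buggy() and ctr_fixed() are hypothetical names.

```c
#include <errno.h>

/* Hypothetical stand-in for a setup step that can fail. */
static int step(int fail)
{
	return fail ? -EINVAL : 0;
}

/* Buggy: the inner "r" shadows the function-scope "r". */
static int ctr_buggy(int fail)
{
	int r = 0;

	{
		int r = step(fail);	/* shadows the outer r */

		if (r)
			goto bad;
	}
	return 0;
bad:
	return r;	/* outer r, still 0: the error is lost */
}

/* Fixed: assign to the function-scope variable instead. */
static int ctr_fixed(int fail)
{
	int r = 0;

	r = step(fail);
	if (r)
		goto bad;
	return 0;
bad:
	return r;
}
```

      In the buggy variant the error path is taken, yet the caller still
      sees success because the value returned is the outer variable.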
      
      Fixes: 7eada909 ("dm: add integrity target")
      Cc: stable@vger.kernel.org
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM: dts: s5pv210: Correct interrupt name for bluetooth in Aries · 3ad6c173
      Jonathan Bakker authored
      commit 3f5e3d3a upstream.
      
      Correct the name of the bluetooth interrupt from host-wake to
      host-wakeup.
      
      Fixes: 1c65b618 ("ARM: dts: s5pv210: Correct BCM4329 bluetooth node")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Jonathan Bakker <xc-racer2@live.ca>
      Link: https://lore.kernel.org/r/CY4PR04MB0567495CFCBDC8D408D44199CB1C9@CY4PR04MB0567.namprd04.prod.outlook.com
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Bluetooth: hci_qca: Use del_timer_sync() before freeing · 2717654a
      Steven Rostedt authored
      commit 72ef9844 upstream.
      
      This was found while looking at a crash report on a timer list being
      corrupted, which usually happens when a timer is freed while still
      active. This is commonly triggered by code calling del_timer() instead
      of del_timer_sync() just before freeing.
      
      One possible culprit is the hci_qca driver, which does exactly that.
      
      Eric mentioned that wake_retrans_timer could be rearmed via the work
      queue, so also move the destruction of the work queue before
      del_timer_sync().
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: stable@vger.kernel.org
      Fixes: 0ff252c1 ("Bluetooth: hciuart: Add support QCA chipset for UART")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: usb-audio: Configure sync endpoints before data · 03141e3f
      Craig McLure authored
      commit 0e85a22d upstream.
      
      Devices such as the TC-Helicon GoXLR require the sync endpoint to be
      configured in advance of the data endpoint in order for sound output
      to work.
      
      This patch simply changes the ordering of EP configuration to resolve
      this.
      
      Fixes: bf6313a0 ("ALSA: usb-audio: Refactor endpoint management")
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215079
      Signed-off-by: Craig McLure <craig@mclure.net>
      Reviewed-by: Jaroslav Kysela <perex@perex.cz>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20220524062115.25968-1-tiwai@suse.de
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: usb-audio: Add missing ep_idx in fixed EP quirks · 8a8972b9
      Takashi Iwai authored
      commit 7b0efea4 upstream.
      
      The quirk entry for Focusrite Saffire 6 had no proper ep_idx for the
      capture endpoint, and this confused the driver, resulting in
      broken sound.  This patch adds the missing ep_idx in the entry.
      
      While we are at it, a couple of other entries (for Digidesign MBox and
      MOTU MicroBook II) seem to have the same problem, and those are
      covered as well.
      
      Fixes: bf6313a0 ("ALSA: usb-audio: Refactor endpoint management")
      Reported-by: André Kapelrud <a.kapelrud@gmail.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20220521065325.426-1-tiwai@suse.de
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: usb-audio: Workaround for clock setup on TEAC devices · ff2ce1bf
      Takashi Iwai authored
      commit 5ce0b06a upstream.
      
      Maris reported that the TEAC UD-501 (0644:8043) doesn't work on
      recent kernels, failing with the typical "clock source 41 is not valid,
      cannot use" errors.  The only known workaround so far is to restore
      (partially) what we used to do unconditionally at the clock setup;
      namely, re-setup the USB interface immediately after the clock is
      changed.  This patch re-introduces the behavior conditionally for TEAC
      devices.
      
      Further notes:
      - The USB interface shall be set later in
        snd_usb_endpoint_configure(), but this seems to be too late.
      - Even calling usb_set_interface() right after
        snd_usb_init_sample_rate() doesn't help; so this must be related
        to the clock validation, too.
      - The device may still spew the "clock source 41 is not valid" error
        at the first clock setup.  This seems to happen at the very first
        try of clock setup, but it disappears at later attempts.
        The error is likely harmless because the driver retries the clock
        setup (such an error is more or less expected on some devices).
      
      Fixes: bf6313a0 ("ALSA: usb-audio: Refactor endpoint management")
      Reported-and-tested-by: Maris Abele <maris7abele@gmail.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20220521064627.29292-1-tiwai@suse.de
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tools/memory-model/README: Update klitmus7 compat table · ae6ce355
      Akira Yokosawa authored
      commit 5b759db4 upstream.
      
      EXPORT_SYMBOL of do_exec() was removed in v5.17.  Unfortunately,
      kernel modules from klitmus7 7.56 have do_exec() at the end of
      each kthread.
      
      herdtools7 7.56.1 has addressed the issue.
      
      Update the compatibility table accordingly.
      
      Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
      Cc: Luc Maranget <luc.maranget@inria.fr>
      Cc: Jade Alglave <j.alglave@ucl.ac.uk>
      Cc: stable@vger.kernel.org # v5.17+
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • zsmalloc: fix races between asynchronous zspage free and page migration · c5402fb5
      Sultan Alsawaf authored
      commit 2505a981 upstream.
      
      The asynchronous zspage free worker tries to lock a zspage's entire page
      list without defending against page migration.  Since pages which haven't
      yet been locked can concurrently migrate off the zspage page list while
      lock_zspage() churns away, lock_zspage() can suffer from a few different
      lethal races.
      
      It can lock a page which no longer belongs to the zspage and unsafely
      dereference page_private(), it can unsafely dereference a torn pointer to
      the next page (since there's a data race), and it can observe a spurious
      NULL pointer to the next page and thus not lock all of the zspage's pages
      (since a single page migration will reconstruct the entire page list, and
      create_page_chain() unconditionally zeroes out each list pointer in the
      process).
      
      Fix the races by using migrate_read_lock() in lock_zspage() to synchronize
      with page migration.
      
      Link: https://lkml.kernel.org/r/20220509024703.243847-1-sultan@kerneltoast.com
      Fixes: 77ff4657 ("zsmalloc: zs_page_migrate: skip unnecessary loops but not return -EBUSY if zspage is not inuse")
      Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: qat - rework the VF2PF interrupt handling logic · 7f962066
      Marco Chiappero authored
      commit c690c7f6 upstream.
      
      Change the VF2PF interrupt handler in the PF ISR and the definition of
      the internal PFVF API to correct the current implementation, which can
      result in missed interrupts.
      
      More specifically, current HW generations consider a write to the mask
      register, regardless of the value, as an acknowledge of any pending
      VF2PF interrupt. Therefore, if there is an interrupt between the source
      register read and the mask register write, that interrupt will not be
      delivered and will be silently acknowledged, resulting in a lost VF2PF message.
      
      To work around the problem, rather than disabling specific interrupts,
      disable all the interrupts and re-enable only the ones that we are not
      serving (excluding the already disabled ones too). This will force any
      other pending interrupt to be triggered and be serviced by a subsequent
      ISR.
      
      This new approach requires, however, changes to the interrupt related
      pfvf_ops functions. In particular, get_vf2pf_sources() has now been
      removed in favor of disable_pending_vf2pf_interrupts(), which not only
      retrieves and returns the pending (and enabled) sources, but also
      disables them.
      As a consequence, introduce the adf_disable_pending_vf2pf_interrupts()
      utility in place of adf_disable_vf2pf_interrupts_irq(), which is no
      longer needed.
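
      The "disable all, then re-enable the rest" sequence can be modelled
      with plain bitmasks. This is a rough sketch under stated
      assumptions: struct fake_pf and its fields are hypothetical
      stand-ins for the source and mask registers, one bit per VF, and
      the model does not reproduce the hardware side effect of a mask
      write acknowledging pending interrupts.

```c
#include <stdint.h>

/* Hypothetical register model; real hardware access differs. */
struct fake_pf {
	uint32_t sources;	/* VF2PF interrupt sources currently raised */
	uint32_t mask;		/* a set bit means that VF's interrupt is disabled */
};

#define ALL_VFS 0xffffffffu

/*
 * Return the pending (raised and enabled) sources and leave them
 * disabled.  All interrupts are masked first; then only the sources
 * that are neither being served nor were already disabled are
 * re-enabled.
 */
static uint32_t disable_pending_vf2pf_interrupts(struct fake_pf *hw)
{
	uint32_t sources = hw->sources;
	uint32_t disabled = hw->mask;
	uint32_t pending = sources & ~disabled;

	if (!pending)
		return 0;

	hw->mask = ALL_VFS;		/* disable everything */
	hw->mask = sources | disabled;	/* re-enable the remaining VFs */

	return pending;
}
```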
      
      Cc: stable@vger.kernel.org
      Fixes: 993161d3 ("crypto: qat - fix handling of VF to PF interrupts")
      Signed-off-by: Marco Chiappero <marco.chiappero@intel.com>
      Co-developed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
      Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: ecrdsa - Fix incorrect use of vli_cmp · c98c48e0
      Vitaly Chikunov authored
      commit 7cc7ab73 upstream.
      
      Correctly compare values that shall be greater-or-equal and not just
      greater.
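
      The difference between the two comparisons can be shown with a
      small stand-in. cmp_digits() below is a hypothetical model that is
      assumed to return the sign of left - right (-1, 0 or 1) over
      little-endian 64-bit digits; below_buggy() and below_fixed() are
      illustrative names, not functions from the driver.

```c
#include <stdint.h>

/* Hypothetical model: returns the sign of left - right. */
static int cmp_digits(const uint64_t *left, const uint64_t *right,
		      unsigned int ndigits)
{
	int i;

	/* Digits are stored least significant first. */
	for (i = (int)ndigits - 1; i >= 0; i--) {
		if (left[i] > right[i])
			return 1;
		if (left[i] < right[i])
			return -1;
	}
	return 0;
}

/* Buggy range check: rejects only v > n, so v == n slips through. */
static int below_buggy(const uint64_t *v, const uint64_t *n,
		       unsigned int ndigits)
{
	return !(cmp_digits(v, n, ndigits) == 1);
}

/* Fixed range check: rejects v >= n. */
static int below_fixed(const uint64_t *v, const uint64_t *n,
		       unsigned int ndigits)
{
	return !(cmp_digits(v, n, ndigits) >= 0);
}
```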
      
      Fixes: 0d7a7864 ("crypto: ecrdsa - add EC-RDSA (GOST 34.10) algorithm")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: caam - fix i.MX6SX entropy delay value · 78ad61fa
      Fabio Estevam authored
      commit 4ee4cdad upstream.
      
      Since commit 358ba762 ("crypto: caam - enable prediction resistance
      in HRWNG") the following CAAM errors can be seen on i.MX6SX:
      
      caam_jr 2101000.jr: 20003c5b: CCB: desc idx 60: RNG: Hardware error
      hwrng: no data available
      
      This error is due to an incorrect entropy delay for i.MX6SX.
      
      Fix it by increasing the minimum entropy delay for i.MX6SX
      as done in U-Boot:
      https://patchwork.ozlabs.org/project/uboot/patch/20220415111049.2565744-1-gaurav.jain@nxp.com/
      
      As explained in the U-Boot patch:
      
      "RNG self tests are run to determine the correct entropy delay.
      Such tests are executed with different voltages and temperatures to identify
      the worst case value for the entropy delay. For i.MX6SX, it was determined
      that after adding a margin value of 1000 the minimum entropy delay should be
      at least 12000."
      
      Cc: <stable@vger.kernel.org>
      Fixes: 358ba762 ("crypto: caam - enable prediction resistance in HRWNG")
      Signed-off-by: Fabio Estevam <festevam@denx.de>
      Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
      Reviewed-by: Vabhav Sharma <vabhav.sharma@nxp.com>
      Reviewed-by: Gaurav Jain <gaurav.jain@nxp.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      78ad61fa
    • Ashish Kalra's avatar
      KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leak · 57a01725
      Ashish Kalra authored
      commit d22d2474 upstream.
      
      For some sev ioctl interfaces, the length parameter that is passed may be
      less than or equal to SEV_FW_BLOB_MAX_SIZE, but larger than the data
      that PSP firmware returns. In this case, kmalloc will allocate memory
      that is the size of the input rather than the size of the data.
      Since PSP firmware doesn't fully overwrite the allocated buffer, these
      sev ioctl interfaces may return uninitialized kernel slab memory.
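
A userspace analogy of why a zeroing allocator closes this kind of leak. This is a sketch under stated assumptions: `fake_firmware_fill()` and `get_blob_zeroed()` are hypothetical names, and `calloc()` merely stands in for `kzalloc()`; none of this is the actual SEV ioctl code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for firmware that fills only part of the buffer. */
static size_t fake_firmware_fill(unsigned char *buf, size_t len)
{
	size_t produced = len < 4 ? len : 4;

	memset(buf, 0xAB, produced);
	return produced;
}

/*
 * Allocate a caller-sized buffer, let the "firmware" fill it, and hand
 * the whole buffer back, as a copy of `len` bytes to the caller would.
 * calloc() plays the role of kzalloc(): bytes the firmware never wrote
 * read as 0 instead of whatever the allocator held there before.
 */
static unsigned char *get_blob_zeroed(size_t len)
{
	unsigned char *buf = calloc(1, len);

	if (!buf)
		return NULL;
	fake_firmware_fill(buf, len);
	return buf;
}
```

With a non-zeroing allocator in place of `calloc()`, the tail of the buffer would carry stale heap contents back to the caller.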
      
      Reported-by: Andy Nguyen <theflow@google.com>
      Suggested-by: David Rientjes <rientjes@google.com>
      Suggested-by: Peter Gonda <pgonda@google.com>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Fixes: eaf78265 ("KVM: SVM: Move SEV code to separate file")
      Fixes: 2c07ded0 ("KVM: SVM: add support for SEV attestation command")
      Fixes: 4cfdd47d ("KVM: SVM: Add KVM_SEV SEND_START command")
      Fixes: d3d1af85 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command")
      Fixes: eba04b20 ("KVM: x86: Account a variety of miscellaneous allocations")
      Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
      Reviewed-by: Peter Gonda <pgonda@google.com>
      Message-Id: <20220516154310.3685678-1-Ashish.Kalra@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      57a01725
    • Hou Wenlong's avatar
      KVM: x86/mmu: Don't rebuild page when the page is synced and no tlb flushing is required · bd6fce7d
      Hou Wenlong authored
      commit 8d5678a7 upstream.
      
      Before commit c3e5e415 ("KVM: X86: Change kvm_sync_page()
      to return true when remote flush is needed"), the return value
      of kvm_sync_page() indicated whether the page was synced, and
      kvm_mmu_get_page() would rebuild the page when the sync failed.
      But now, kvm_sync_page() returns false when the page is
      synced and no TLB flushing is required, which leads to the page
      being rebuilt in kvm_mmu_get_page(). So return the return
      value of mmu->sync_page() directly and check it in
      kvm_mmu_get_page(). If the sync fails, the page will be
      zapped and the invalid_list will not be empty, so setting flush
      to true is acceptable in mmu_sync_children().
      
      Cc: stable@vger.kernel.org
      Fixes: c3e5e415 ("KVM: X86: Change kvm_sync_page() to return true when remote flush is needed")
      Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
      Acked-by: Lai Jiangshan <jiangshanlai@gmail.com>
      Message-Id: <0dabeeb789f57b0d793f85d073893063e692032d.1647336064.git.houwenlong.hwl@antgroup.com>
      [mmu_sync_children should not flush if the page is zapped. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd6fce7d
    • Sean Christopherson's avatar
      KVM: x86: Drop WARNs that assert a triple fault never "escapes" from L2 · 7de373c9
      Sean Christopherson authored
      commit 45846661 upstream.
      
      Remove WARNs that sanity check that KVM never lets a triple fault for L2
      escape and incorrectly end up in L1.  In normal operation, the sanity
      check is perfectly valid, but it incorrectly assumes that it's impossible
      for userspace to induce KVM_REQ_TRIPLE_FAULT without bouncing through
      KVM_RUN (which guarantees kvm_check_nested_state() will see and handle
      the triple fault).
      
      The WARN can currently be triggered if userspace injects a machine check
      while L2 is active and CR4.MCE=0.  And a future fix to allow save/restore
      of KVM_REQ_TRIPLE_FAULT, e.g. so that a synthesized triple fault isn't
      lost on migration, will make it trivially easy for userspace to trigger
      the WARN.
      
      Clearing KVM_REQ_TRIPLE_FAULT when forcibly leaving guest mode is
      tempting, but wrong, especially if/when the request is saved/restored,
      e.g. if userspace restores events (including a triple fault) and then
      restores nested state (which may forcibly leave guest mode).  Ignoring
      the fact that KVM doesn't currently provide the necessary APIs, it's
      userspace's responsibility to manage pending events during save/restore.
      
        ------------[ cut here ]------------
        WARNING: CPU: 7 PID: 1399 at arch/x86/kvm/vmx/nested.c:4522 nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel]
        Modules linked in: kvm_intel kvm irqbypass
        CPU: 7 PID: 1399 Comm: state_test Not tainted 5.17.0-rc3+ #808
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel]
        Call Trace:
         <TASK>
         vmx_leave_nested+0x30/0x40 [kvm_intel]
         vmx_set_nested_state+0xca/0x3e0 [kvm_intel]
         kvm_arch_vcpu_ioctl+0xf49/0x13e0 [kvm]
         kvm_vcpu_ioctl+0x4b9/0x660 [kvm]
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x3b/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
        ---[ end trace 0000000000000000 ]---
      
      Fixes: cb6a32c2 ("KVM: x86: Handle triple fault in L2 without killing L1")
      Cc: stable@vger.kernel.org
      Cc: Chenyi Qiang <chenyi.qiang@intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220407002315.78092-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7de373c9
    • Yanfei Xu's avatar
      KVM: x86: Fix the intel_pt PMI handling wrongly considered from guest · f095b997
      Yanfei Xu authored
      commit ffd1925a upstream.
      
      When the kernel handles a VM-exit caused by an external interrupt or NMI,
      it always sets kvm_intr_type to tell whether it is dealing with an IRQ or
      an NMI. For the PMI scenario, it could be an IRQ or an NMI.
      
      However, intel_pt PMIs are only generated for HARDWARE perf events, and
      HARDWARE events are always configured to generate NMIs.  Use
      kvm_handling_nmi_from_guest() to precisely identify if the intel_pt PMI
      came from the guest; this avoids false positives if an intel_pt PMI/NMI
      arrives while the host is handling an unrelated IRQ VM-Exit.
      
      Fixes: db215756 ("KVM: x86: More precisely identify NMI from guest when handling PMI")
      Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
      Message-Id: <20220523140821.1345605-1-yanfei.xu@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f095b997
    • Maxim Levitsky's avatar
      KVM: x86: avoid loading a vCPU after .vm_destroy was called · c97c2730
      Maxim Levitsky authored
      commit 6fcee03d upstream.
      
      Loading a vCPU after .vm_destroy was called can cause various unexpected
      issues, since the VM is partially destroyed at that point.
      
      For example, when AVIC is enabled, this causes avic_vcpu_load to
      access a physical ID page entry which was already freed by .vm_destroy.
      
      Fixes: 8221c137 ("svm: Manage vcpu load/unload when enable AVIC")
      Cc: stable@vger.kernel.org
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220322172449.235575-2-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c97c2730
    • Sean Christopherson's avatar
      KVM: x86: avoid calling x86 emulator without a decoded instruction · 02ea15c0
      Sean Christopherson authored
      commit fee060cd upstream.
      
      Whenever x86_decode_emulated_instruction() detects a breakpoint, it
      returns the value that kvm_vcpu_check_breakpoint() writes into its
      pass-by-reference second argument.  Unfortunately this is completely
      bogus because the expected outcome of x86_decode_emulated_instruction
      is an EMULATION_* value.
      
      Then, if kvm_vcpu_check_breakpoint() does "*r = 0" (corresponding to
      a KVM_EXIT_DEBUG userspace exit), it is misunderstood as EMULATION_OK
      and x86_emulate_instruction() is called without having decoded the
      instruction.  This causes various havoc from running with a stale
      emulation context.
      
      The fix is to move the call to kvm_vcpu_check_breakpoint() where it was
      before commit 4aa2691d ("KVM: x86: Factor out x86 instruction
      emulation with decoding") introduced x86_decode_emulated_instruction().
      The other caller of the function does not need breakpoint checks,
      because it is invoked as part of a vmexit and the processor has already
      checked those before executing the instruction that #GP'd.
      
      This fixes CVE-2022-1852.
      
      Reported-by: Qiuhao Li <qiuhao@sysec.org>
      Reported-by: Gaoning Pan <pgn@zju.edu.cn>
      Reported-by: Yongkang Jia <kangel@zju.edu.cn>
      Fixes: 4aa2691d ("KVM: x86: Factor out x86 instruction emulation with decoding")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220311032801.3467418-2-seanjc@google.com>
      [Rewrote commit message according to Qiuhao's report, since a patch
       already existed to fix the bug. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      02ea15c0
    • Maxim Levitsky's avatar
      KVM: x86: fix typo in __try_cmpxchg_user causing non-atomicness · 7cef7042
      Maxim Levitsky authored
      commit 33fbe6be upstream.
      
      This shows up as a TDP MMU leak when running nested.  A non-working cmpxchg
      on L0 makes L1 install two different shadow pages under the same spte, and
      one of them is leaked.
      
      Fixes: 1c2361f6 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220512101420.306759-1-mlevitsk@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7cef7042
    • Sean Christopherson's avatar
      KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses · e964665c
      Sean Christopherson authored
      commit 1c2361f6 upstream.
      
      Use the recently introduced __try_cmpxchg_user() to emulate atomic guest
      accesses via the associated userspace address instead of mapping the
      backing pfn into kernel address space.  Using kvm_vcpu_map() is unsafe as
      it does not coordinate with KVM's mmu_notifier to ensure the hva=>pfn
      translation isn't changed/unmapped in the memremap() path, i.e. when
      there's no struct page and thus no elevated refcount.
      
      Fixes: 42e35f80 ("KVM/X86: Use kvm_vcpu_map in emulator_cmpxchg_emulated")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220202004945.2540433-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e964665c
    • Sean Christopherson's avatar
      KVM: x86: Use __try_cmpxchg_user() to update guest PTE A/D bits · 8089e5e1
      Sean Christopherson authored
      commit f122dfe4 upstream.
      
      Use the recently introduced __try_cmpxchg_user() to update guest PTE A/D
      bits instead of mapping the PTE into kernel address space.  The VM_PFNMAP
      path is broken as it assumes that vm_pgoff is the base pfn of the mapped
      VMA range, which is conceptually wrong as vm_pgoff is the offset relative
      to the file and has nothing to do with the pfn.  The horrific hack worked
      for the original use case (backing guest memory with /dev/mem), but leads
      to accessing "random" pfns for pretty much any other VM_PFNMAP case.
      
      Fixes: bd53cb35 ("X86/KVM: Handle PFNs outside of kernel reach when touching GPTEs")
      Debugged-by: Tadeusz Struk <tadeusz.struk@linaro.org>
      Tested-by: Tadeusz Struk <tadeusz.struk@linaro.org>
      Reported-by: <syzbot+6cde2282daa792c49ab8@syzkaller.appspotmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220202004945.2540433-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8089e5e1
    • Peter Zijlstra's avatar
      x86/uaccess: Implement macros for CMPXCHG on user addresses · 256ded2d
      Peter Zijlstra authored
      commit 989b5db2 upstream.
      
      Add support for CMPXCHG loops on userspace addresses.  Provide both an
      "unsafe" version for tight loops that do their own uaccess begin/end, as
      well as a "safe" version for use cases where the CMPXCHG is not buried in
      a loop, e.g. KVM will resume the guest instead of looping when emulation
      of a guest atomic access fails the CMPXCHG.
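
The two usage styles can be illustrated with C11 atomics in userspace. This is an analogy only: the function names are hypothetical, and the kernel macros additionally deal with faults on the user address, which plain `<stdatomic.h>` calls do not model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Tight-loop style: retry until the compare-exchange succeeds. */
static uint64_t set_bits_loop(_Atomic uint64_t *word, uint64_t bits)
{
	uint64_t old = atomic_load(word);

	/* On failure, 'old' is refreshed with the current value. */
	while (!atomic_compare_exchange_weak(word, &old, old | bits))
		;
	return old;
}

/* Single-shot style: report failure and let the caller decide what to do. */
static int try_update_once(_Atomic uint64_t *word, uint64_t expected,
			   uint64_t desired)
{
	return atomic_compare_exchange_strong(word, &expected, desired);
}
```

The single-shot form matches the case where a failed exchange is handled by the caller rather than by spinning.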
      
      Provide 8-byte versions for 32-bit kernels so that KVM can do CMPXCHG on
      guest PAE PTEs, which are accessed via userspace addresses.
      
      Guard the asm_volatile_goto() variation with CC_HAS_ASM_GOTO_TIED_OUTPUT,
      as the "+m" constraint fails on some compilers that otherwise support
      CC_HAS_ASM_GOTO_OUTPUT.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Co-developed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220202004945.2540433-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      256ded2d
    • Paolo Bonzini's avatar
      x86, kvm: use correct GFP flags for preemption disabled · 2bfcab29
      Paolo Bonzini authored
      commit baec4f5a upstream.
      
      Commit ddd7ed842627 ("x86/kvm: Alloc dummy async #PF token outside of
      raw spinlock") leads to the following Smatch static checker warning:
      
      	arch/x86/kernel/kvm.c:212 kvm_async_pf_task_wake()
      	warn: sleeping in atomic context
      
      arch/x86/kernel/kvm.c
          202         raw_spin_lock(&b->lock);
          203         n = _find_apf_task(b, token);
          204         if (!n) {
          205                 /*
          206                  * Async #PF not yet handled, add a dummy entry for the token.
          207                  * Allocating the token must be down outside of the raw lock
          208                  * as the allocator is preemptible on PREEMPT_RT kernels.
          209                  */
          210                 if (!dummy) {
          211                         raw_spin_unlock(&b->lock);
      --> 212                         dummy = kzalloc(sizeof(*dummy), GFP_KERNEL);
                                                                      ^^^^^^^^^^
      Smatch thinks the caller has preempt disabled.  The `smdb.py preempt
      kvm_async_pf_task_wake` output call tree is:
      
      sysvec_kvm_asyncpf_interrupt() <- disables preempt
      -> __sysvec_kvm_asyncpf_interrupt()
         -> kvm_async_pf_task_wake()
      
      The caller is this:
      
      arch/x86/kernel/kvm.c
         290        DEFINE_IDTENTRY_SYSVEC(sysvec_kvm_asyncpf_interrupt)
         291        {
         292                struct pt_regs *old_regs = set_irq_regs(regs);
         293                u32 token;
         294
         295                ack_APIC_irq();
         296
         297                inc_irq_stat(irq_hv_callback_count);
         298
         299                if (__this_cpu_read(apf_reason.enabled)) {
         300                        token = __this_cpu_read(apf_reason.token);
         301                        kvm_async_pf_task_wake(token);
         302                        __this_cpu_write(apf_reason.token, 0);
         303                        wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1);
         304                }
         305
         306                set_irq_regs(old_regs);
         307        }
      
      The DEFINE_IDTENTRY_SYSVEC() is a wrapper that calls this function
      from the call_on_irqstack_cond().  It's inside the call_on_irqstack_cond()
      where preempt is disabled (unless it's already disabled).  The
      irq_enter/exit_rcu() functions disable/enable preempt.
      
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2bfcab29
    • Sean Christopherson's avatar
      x86/kvm: Alloc dummy async #PF token outside of raw spinlock · 9fd15d9f
      Sean Christopherson authored
      commit 0547758a upstream.
      
      Drop the raw spinlock in kvm_async_pf_task_wake() before allocating the
      dummy async #PF token, as the allocator is preemptible on PREEMPT_RT
      kernels and must not be called from truly atomic contexts.
      
      Opportunistically document why it's ok to loop on allocation failure,
      i.e. why the function won't get stuck in an infinite loop.
      
      Reported-by: Yajun Deng <yajun.deng@linux.dev>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9fd15d9f
    • Sean Christopherson's avatar
      x86/fpu: KVM: Set the base guest FPU uABI size to sizeof(struct kvm_xsave) · c181acbd
      Sean Christopherson authored
      commit d187ba53 upstream.
      
      Set the starting uABI size of KVM's guest FPU to 'struct kvm_xsave',
      i.e. to KVM's historical uABI size.  When saving FPU state for userspace,
      KVM (well, now the FPU) sets the FP+SSE bits in the XSAVE header even if
      the host doesn't support XSAVE.  Setting the XSAVE header allows the VM
      to be migrated to a host that does support XSAVE without the new host
      having to handle FPU state that may or may not be compatible with XSAVE.
      
      Setting the uABI size to the host's default size results in out-of-bounds
      writes (setting the FP+SSE bits) and data corruption (that is thankfully
      caught by KASAN) when running on hosts without XSAVE, e.g. on Core2 CPUs.
      
      WARN if the default size is larger than KVM's historical uABI size; all
      features that can push the FPU size beyond the historical size must be
      opt-in.
      
        ==================================================================
        BUG: KASAN: slab-out-of-bounds in fpu_copy_uabi_to_guest_fpstate+0x86/0x130
        Read of size 8 at addr ffff888011e33a00 by task qemu-build/681
        CPU: 1 PID: 681 Comm: qemu-build Not tainted 5.18.0-rc5-KASAN-amd64 #1
        Hardware name:  /DG35EC, BIOS ECG3510M.86A.0118.2010.0113.1426 01/13/2010
        Call Trace:
         <TASK>
         dump_stack_lvl+0x34/0x45
         print_report.cold+0x45/0x575
         kasan_report+0x9b/0xd0
         fpu_copy_uabi_to_guest_fpstate+0x86/0x130
         kvm_arch_vcpu_ioctl+0x72a/0x1c50 [kvm]
         kvm_vcpu_ioctl+0x47f/0x7b0 [kvm]
         __x64_sys_ioctl+0x5de/0xc90
         do_syscall_64+0x31/0x50
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
        Allocated by task 0:
        (stack is not available)
        The buggy address belongs to the object at ffff888011e33800
         which belongs to the cache kmalloc-512 of size 512
        The buggy address is located 0 bytes to the right of
         512-byte region [ffff888011e33800, ffff888011e33a00)
        The buggy address belongs to the physical page:
        page:0000000089cd4adb refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11e30
        head:0000000089cd4adb order:2 compound_mapcount:0 compound_pincount:0
        flags: 0x4000000000010200(slab|head|zone=1)
        raw: 4000000000010200 dead000000000100 dead000000000122 ffff888001041c80
        raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
        page dumped because: kasan: bad access detected
        Memory state around the buggy address:
         ffff888011e33900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
         ffff888011e33980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        >ffff888011e33a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                           ^
         ffff888011e33a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
         ffff888011e33b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        ==================================================================
        Disabling lock debugging due to kernel taint
      
      Fixes: be50b206 ("kvm: x86: Add support for getting/setting expanded xstate buffer")
      Fixes: c60427dd ("x86/fpu: Add uabi_size to guest_fpu")
      Reported-by: Zdenek Kaspar <zkaspar82@gmail.com>
      Cc: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Tested-by: Zdenek Kaspar <zkaspar82@gmail.com>
      Message-Id: <20220504001219.983513-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c181acbd
    • Xiaomeng Tong's avatar
      KVM: PPC: Book3S HV: fix incorrect NULL check on list iterator · 558ecc74
      Xiaomeng Tong authored
      commit 300981ab upstream.
      
      The bug is here:
      	if (!p)
      		return ret;
      
      The list iterator value 'p' will *always* be set and non-NULL by
      list_for_each_entry(), so it is incorrect to assume that the iterator
      value will be NULL if the list is empty or no element is found.
      
      To fix the bug, use a new variable 'iter' as the list iterator, and use
      the old variable 'p' as a dedicated pointer to the found element.
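
The pattern can be reproduced in userspace with a simplified intrusive list. This is a sketch only: `struct item`, `item_of()`, and `for_each_item()` are hypothetical stand-ins that mimic how `list_for_each_entry()` derives the iterator from the list head, and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

struct item {
	int key;
	struct list_head node;
};

/* Recover the enclosing item from a pointer to its embedded node. */
#define item_of(ptr) \
	((struct item *)((char *)(ptr) - offsetof(struct item, node)))

/* Simplified analogue of list_for_each_entry() for struct item. */
#define for_each_item(pos, head) \
	for (pos = item_of((head)->next); &pos->node != (head); \
	     pos = item_of(pos->node.next))

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Fixed pattern: 'iter' walks the list, 'found' is set only on a match. */
static struct item *find_item(struct list_head *head, int key)
{
	struct item *iter, *found = NULL;

	for_each_item(iter, head) {
		if (iter->key == key) {
			found = iter;
			break;
		}
	}
	return found;
}

/* Shows what the iterator holds after a full walk with no early exit. */
static struct item *iterator_after_walk(struct list_head *head)
{
	struct item *iter;

	for_each_item(iter, head) {
		/* nothing to do, just run to the end */
	}
	return iter; /* computed from the list head itself, never NULL */
}
```

Because the iterator ends up pointing at an address derived from the head, a `NULL` test on it can never detect "not found"; only the separate `found` pointer can.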
      
      Fixes: dfaa973a ("KVM: PPC: Book3S HV: In H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs")
      Cc: stable@vger.kernel.org # v5.9+
      Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20220414062103.8153-1-xiam0nd.tong@gmail.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      558ecc74
    • Florian Westphal's avatar
      netfilter: conntrack: re-fetch conntrack after insertion · 04e4a11d
      Florian Westphal authored
      commit 56b14ece upstream.
      
      In case the conntrack is clashing, insertion can free skb->_nfct and
      set skb->_nfct to the already-confirmed entry.
      
      This wasn't found before because the conntrack entry and the extension
      space used to be freed after an rcu grace period, plus the race needs
      events enabled to trigger.
      
      Reported-by: <syzbot+793a590957d9c1b96620@syzkaller.appspotmail.com>
      Fixes: 71d8c47f ("netfilter: conntrack: introduce clash resolution on insertion race")
      Fixes: 2ad9d774 ("netfilter: conntrack: free extension area immediately")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      04e4a11d
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: double hook unregistration in netns path · 86c0154f
      Pablo Neira Ayuso authored
      commit f9a43007 upstream.
      
      __nft_release_hooks() is called from the netns pre_exit path, which
      unregisters the hooks; then the NETDEV_UNREGISTER event is triggered,
      which unregisters the hooks again.
      
      [  565.221461] WARNING: CPU: 18 PID: 193 at net/netfilter/core.c:495 __nf_unregister_net_hook+0x247/0x270
      [...]
      [  565.246890] CPU: 18 PID: 193 Comm: kworker/u64:1 Tainted: G            E     5.18.0-rc7+ #27
      [  565.253682] Workqueue: netns cleanup_net
      [  565.257059] RIP: 0010:__nf_unregister_net_hook+0x247/0x270
      [...]
      [  565.297120] Call Trace:
      [  565.300900]  <TASK>
      [  565.304683]  nf_tables_flowtable_event+0x16a/0x220 [nf_tables]
      [  565.308518]  raw_notifier_call_chain+0x63/0x80
      [  565.312386]  unregister_netdevice_many+0x54f/0xb50
      
      Unregister and destroy the netdev hook from netns pre_exit via kfree_rcu
      so that the NETDEV_UNREGISTER path sees unregistered hooks.
      
      Fixes: 767d1216 ("netfilter: nftables: fix possible UAF over chains from packet path in netns")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      86c0154f
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: hold mutex on netns pre_exit path · cc7c6e0a
      Pablo Neira Ayuso authored
      commit 3923b1e4 upstream.
      
      clean_net() runs in a workqueue, so grab the mutex while walking over the lists.
      
      Fixes: 767d1216 ("netfilter: nftables: fix possible UAF over chains from packet path in netns")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc7c6e0a
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: sanitize nft_set_desc_concat_parse() · c9a46a3d
      Pablo Neira Ayuso authored
      commit fecf31ee upstream.
      
      Add several sanity checks for nft_set_desc_concat_parse():
      
      - validate that desc->field_count is not larger than the desc->field_len array.
      - field length cannot be larger than desc->field_len (i.e. U8_MAX).
      - total length of the concatenation cannot be larger than the register array.
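
The three checks listed above could look roughly like the following. This is a hedged sketch: `struct concat_desc`, `concat_add_field()`, and both size constants are hypothetical and chosen for illustration; they are not the nf_tables definitions.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FIELDS    16 /* hypothetical capacity of the field_len array */
#define MAX_TOTAL_LEN 64 /* hypothetical register array size, in bytes */

struct concat_desc {
	uint8_t field_len[MAX_FIELDS];
	uint8_t field_count;
};

/* Append one field length; return 0 on success, -1 if any check fails. */
static int concat_add_field(struct concat_desc *desc, uint32_t len,
			    uint32_t *total)
{
	if (desc->field_count >= MAX_FIELDS) /* no room left in field_len[] */
		return -1;
	if (len > UINT8_MAX)                 /* must fit in a u8 slot */
		return -1;
	if (*total + len > MAX_TOTAL_LEN)    /* would overrun the registers */
		return -1;

	desc->field_len[desc->field_count++] = (uint8_t)len;
	*total += len;
	return 0;
}
```

Each check runs before anything is written, so a rejected field leaves the descriptor and the running total untouched.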
      
      Joint work with Florian Westphal.
      
      Fixes: f3a2181e ("netfilter: nf_tables: Support for sets with multiple ranged fields")
      Reported-by: <zhangziming.zzm@antgroup.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9a46a3d
    • Phil Sutter's avatar
      netfilter: nft_limit: Clone packet limits' cost value · a51c6c58
      Phil Sutter authored
      commit 558254b0 upstream.
      
      When cloning a packet-based limit expression, copy the cost value as
      well. Otherwise the new limit is not functional anymore.
      
      Fixes: 3b9e2ea6 ("netfilter: nft_limit: move stateful fields out of expression data")
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a51c6c58