  1. Feb 09, 2022
    • usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge · 291038f7
      Alan Stern authored
      commit 5b67b315 upstream.
      
      Two people have reported (and mentioned numerous other reports on the
      web) that VIA's VL817 USB-SATA bridge does not work with the uas
      driver.  Typical log messages are:
      
      [ 3606.232149] sd 14:0:0:0: [sdg] tag#2 uas_zap_pending 0 uas-tag 1 inflight: CMD
      [ 3606.232154] sd 14:0:0:0: [sdg] tag#2 CDB: Write(16) 8a 00 00 00 00 00 18 0c c9 80 00 00 00 80 00 00
      [ 3606.306257] usb 4-4.4: reset SuperSpeed Plus Gen 2x1 USB device number 11 using xhci_hcd
      [ 3606.328584] scsi host14: uas_eh_device_reset_handler success
      
      Surprisingly, the devices do seem to work okay for some other people.
      The cause of the differing behaviors is not known.
      
      In the hope of getting the devices to work for the most users, even at
      the possible cost of degraded performance for some, this patch adds an
      unusual_devs entry for the VL817 to block it from binding to the uas
      driver by default.  Users will be able to override this entry by means
      of a module parameter, if they want.
      
      CC: <stable@vger.kernel.org>
      Reported-by: DocMAX <mail@vacharakis.de>
      Reported-and-tested-by: Thomas Weißschuh <linux@weissschuh.net>
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Link: https://lore.kernel.org/r/Ye8IsK2sjlEv1rqU@rowland.harvard.edu
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tty: Add support for Brainboxes UC cards. · be8c0496
      Cameron Williams authored
      commit 152d1afa upstream.
      
      This commit adds support for some of the Brainboxes PCI range of
      cards, including the UC-101, UC-235/246, UC-257, UC-268, UC-275/279,
      UC-302, UC-310, UC-313, UC-320/324, UC-346, UC-357, UC-368
      and UC-420/431.
      
      Signed-off-by: Cameron Williams <cang1@live.co.uk>
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/AM5PR0202MB2564688493F7DD9B9C610827C45E9@AM5PR0202MB2564.eurprd02.prod.outlook.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tty: n_gsm: fix SW flow control encoding/handling · dea3412d
      daniel.starke@siemens.com authored
      commit 8838b2af upstream.
      
      n_gsm is based on the 3GPP 07.010 and its newer version is the 3GPP 27.010.
      See https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=1516
      The changes from 07.010 to 27.010 are non-functional. Therefore, I refer to
      the newer 27.010 here. Chapter 5.2.7.3 states that DC1 (XON) and DC3 (XOFF)
      are the control characters defined in ISO/IEC 646. These shall be quoted if
      seen in the data stream to avoid interpretation as flow control characters.
      
      ISO/IEC 646 refers to the set of ISO standards described as the ISO
      7-bit coded character set for information interchange. Its final version
      is also known as ITU T.50.
      See https://www.itu.int/rec/T-REC-T.50-199209-I/en
      
      To abide by the standard, DC1 and DC3 must be quoted correctly if these
      are seen as data bytes and not as control characters. The current
      implementation already tries to enforce this but fails to catch all
      defined cases. 3GPP 27.010 chapter 5.2.7.3 clearly states that the most
      significant bit shall be ignored for DC1 and DC3 handling. The current
      implementation handles only the case with the most significant bit set 0.
      Cases in which DC1 and DC3 have the most significant bit set 1 are left
      unhandled.
      
      This patch fixes this by masking the data bytes with ISO_IEC_646_MASK (only
      the 7 least significant bits set 1) before comparing them with XON
      (a.k.a. DC1) and XOFF (a.k.a. DC3) when testing which byte values need
      quotation via byte stuffing.
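
      For illustration only, a minimal userspace sketch of the masked
      comparison. The values 0x11 (DC1), 0x13 (DC3) and 0x7F for
      ISO_IEC_646_MASK follow from the description above; the helper name
      is made up here and this is not the n_gsm code itself:

```c
#include <stdbool.h>
#include <stdint.h>

#define XON              0x11  /* DC1 */
#define XOFF             0x13  /* DC3 */
#define ISO_IEC_646_MASK 0x7F  /* only the 7 least significant bits set */

/*
 * Hypothetical helper: return true if a data byte would be taken for
 * DC1/DC3 once the most significant bit is ignored, i.e. if it has to
 * be quoted via byte stuffing.
 */
static bool is_flow_ctrl_byte(uint8_t c)
{
	uint8_t low7 = c & ISO_IEC_646_MASK;

	return low7 == XON || low7 == XOFF;
}
```

      With the mask applied, 0x91 and 0x93 are treated the same way as
      0x11 and 0x13, which is the case the old comparison missed.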
      
      Fixes: e1eaea46 ("tty: n_gsm line discipline")
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Starke <daniel.starke@siemens.com>
      Link: https://lore.kernel.org/r/20220120101857.2509-1-daniel.starke@siemens.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • serial: stm32: fix software flow control transfer · 63e5f610
      Valentin Caron authored
      commit 037b91ec upstream.
      
      x_char is ignored by stm32_usart_start_tx() when the xmit buffer is empty.
      
      Fix the start_tx condition to allow x_char to be sent.
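
      A simplified sketch of the condition change, assuming a made-up
      port structure (struct port_sketch and both helper names are
      hypothetical and do not match the driver's data structures):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the port state. */
struct port_sketch {
	size_t xmit_pending;   /* bytes waiting in the transmit buffer */
	unsigned char x_char;  /* pending XON/XOFF character, 0 if none */
};

/* Old behaviour: nothing is started whenever the buffer is empty. */
static bool should_start_tx_old(const struct port_sketch *p)
{
	return p->xmit_pending != 0;
}

/* Fixed behaviour: also start when only x_char is pending. */
static bool should_start_tx_fixed(const struct port_sketch *p)
{
	return p->xmit_pending != 0 || p->x_char != 0;
}
```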
      
      Fixes: 48a6092f ("serial: stm32-usart: Add STM32 USART Driver")
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Erwan Le Ray <erwan.leray@foss.st.com>
      Signed-off-by: Valentin Caron <valentin.caron@foss.st.com>
      Link: https://lore.kernel.org/r/20220111164441.6178-3-valentin.caron@foss.st.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PM: wakeup: simplify the output logic of pm_show_wakelocks() · 661d011a
      Greg Kroah-Hartman authored
      commit c9d967b2 upstream.
      
      The buffer handling in pm_show_wakelocks() is tricky, and hopefully
      correct.  Ensure it really is correct by using sysfs_emit_at() which
      handles all of the tricky string handling logic in a PAGE_SIZE buffer
      for us automatically as this is a sysfs file being read from.
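
      As a rough userspace analogue of the "emit at offset" pattern (not
      the kernel helper itself): emit_at(), show_names() and BUF_SIZE are
      names invented for this sketch, with BUF_SIZE standing in for
      PAGE_SIZE.

```c
#include <stdarg.h>
#include <stdio.h>

#define BUF_SIZE 32  /* stands in for PAGE_SIZE; kept small for illustration */

/*
 * Format into buf starting at offset `at`, never writing past BUF_SIZE,
 * and return the number of characters actually stored (excluding the
 * terminating NUL).
 */
static int emit_at(char *buf, int at, const char *fmt, ...)
{
	va_list args;
	int room, len;

	if (at < 0 || at >= BUF_SIZE)
		return 0;

	room = BUF_SIZE - at;
	va_start(args, fmt);
	len = vsnprintf(buf + at, room, fmt, args);
	va_end(args);

	if (len < 0)
		return 0;
	if (len >= room)	/* output was truncated */
		len = room - 1;
	return len;
}

/* Append each name followed by a space, then finish with a newline. */
static int show_names(char *buf, const char *const *names, int count)
{
	int len = 0;

	for (int i = 0; i < count; i++)
		len += emit_at(buf, len, "%s ", names[i]);

	len += emit_at(buf, len, "\n");
	return len;
}
```

      The caller only accumulates the returned lengths; all bounds
      handling lives in one helper.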
      
      Reviewed-by: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • udf: Fix NULL ptr deref when converting from inline format · f24454e4
      Jan Kara authored
      commit 7fc3b7c2 upstream.
      
      udf_expand_file_adinicb() calls ->writepage directly to write data
      expanded into a page. This, however, fails to set up the inode for
      writeback properly, so we can crash on an inode->i_wb dereference when
      submitting the page for IO, like:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000158
        #PF: supervisor read access in kernel mode
      ...
        <TASK>
        __folio_start_writeback+0x2ac/0x350
        __block_write_full_page+0x37d/0x490
        udf_expand_file_adinicb+0x255/0x400 [udf]
        udf_file_write_iter+0xbe/0x1b0 [udf]
        new_sync_write+0x125/0x1c0
        vfs_write+0x28e/0x400
      
      Fix the problem by marking the page dirty and going through the standard
      writeback path to write the page. Strictly speaking we would not even
      have to write the page but we want to catch e.g. ENOSPC errors early.
      
      Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
      CC: stable@vger.kernel.org
      Fixes: 52ebea74 ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks")
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • udf: Restore i_lenAlloc when inode expansion fails · de10d14c
      Jan Kara authored
      commit ea856919 upstream.
      
      When we fail to expand inode from inline format to a normal format, we
      restore inode to contain the original inline formatting but we forgot to
      set i_lenAlloc back. The mismatch between i_lenAlloc and i_size was then
      causing further problems such as warnings and lost data down the line.
      
      Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
      CC: stable@vger.kernel.org
      Fixes: 7e49b6f2 ("udf: Convert UDF to new truncate calling sequence")
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices · 58f48bfc
      Steffen Maier authored
      commit 8c9db667 upstream.
      
      Suppose we have an environment with a number of non-NPIV FCP devices
      (virtual HBAs / FCP devices / zfcp "adapter"s) sharing the same physical
      FCP channel (HBA port) and its I_T nexus. Plus a number of storage target
      ports zoned to such shared channel. Now one target port logs out of the
      fabric causing an RSCN. Zfcp reacts with an ADISC ELS and subsequent port
      recovery depending on the ADISC result. This happens on all such FCP
      devices (in different Linux images) concurrently as they all receive a copy
      of this RSCN. In the following we look at one of those FCP devices.
      
      Requests other than FSF_QTCB_FCP_CMND can be slow until they get a
      response.
      
      Depending on which requests are affected by slow responses, there are
      different recovery outcomes. Here we want to fix failed recoveries on port
      or adapter level by avoiding recovery requests that can be slow.
      
      We need the cached N_Port_ID for the remote port "link" test with ADISC.
      Just before sending the ADISC, we now intentionally forget the old cached
      N_Port_ID. The idea is that on receiving an RSCN for a port, we have to
      assume that any cached information about this port is stale.  This forces a
      fresh new GID_PN [FC-GS] nameserver lookup on any subsequent recovery for
      the same port. Since we typically can still communicate with the nameserver
      efficiently, we now reach steady state quicker: Either the nameserver still
      does not know about the port so we stop recovery, or the nameserver already
      knows the port potentially with a new N_Port_ID and we can successfully and
      quickly perform open port recovery.  For the one case, where ADISC returns
      successfully, we re-initialize port->d_id because that case does not
      involve any port recovery.
      
      This also solves a problem if the storage WWPN quickly logs into the fabric
      again but with a different N_Port_ID, such as on virtual WWPN takeover
      during target NPIV failover
      [https://www.redbooks.ibm.com/abstracts/redp5477.html]. In that case the
      RSCN from the storage FDISC was ignored by zfcp and we could not
      successfully recover the failover. On some later failback on the storage,
      we could have been lucky if the virtual WWPN got the same old N_Port_ID
      from the SAN switch as we still had cached.  Then the related RSCN
      triggered a successful port reopen recovery.  However, there is no
      guarantee to get the same N_Port_ID on NPIV FDISC.
      
      Even though NPIV-enabled FCP devices are not affected by this problem, this
      code change optimizes recovery time for gone remote ports as a side effect.
      The timely drop of cached N_Port_IDs prevents unnecessary slow open port
      attempts.
      
      While the problem might have been in code before v2.6.32 commit
      799b76d0 ("[SCSI] zfcp: Decouple gid_pn requests from erp") this fix
      depends on the gid_pn_work introduced with that commit, so we mark it as
      culprit to satisfy fix dependencies.
      
      Note: Point-to-point remote port is already handled separately and gets its
      N_Port_ID from the cached peer_d_id. So resetting port->d_id in general
      does not affect PtP.
      
      Link: https://lore.kernel.org/r/20220118165803.3667947-1-maier@linux.ibm.com
      Fixes: 799b76d0 ("[SCSI] zfcp: Decouple gid_pn requests from erp")
      Cc: <stable@vger.kernel.org> #2.6.32+
      Suggested-by: Benjamin Block <bblock@linux.ibm.com>
      Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
      Signed-off-by: Steffen Maier <maier@linux.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/hypfs: include z/VM guests with access control group set · 3637eac1
      Vasily Gorbik authored
      commit 663d34c8 upstream.
      
      Currently, if a z/VM guest is allowed to retrieve hypervisor performance
      data globally for all guests (privilege class B), the query is formed in a
      way to include all guests but the group name is left empty. As a result,
      z/VM guests which have an access control group set are not included
      in the results (even the local VM).
      
      Change the query group identifier from empty to "any" to retrieve
      information about all guests from any groups (or without a group set).
      
      Cc: stable@vger.kernel.org
      Fixes: 31cb4bd3 ("[S390] Hypervisor filesystem (s390_hypfs) for z/VM")
      Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Bluetooth: refactor malicious adv data check · 7889b38a
      Brian Gix authored
      commit 899663be upstream.
      
      The check for an out-of-bounds read was being performed at the end of the
      num_reports while loop, and would fill the journal with false positives.
      Move the check to the beginning of loop processing so that it is not
      performed after ptr has been advanced.
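
      A minimal sketch of checking bounds at the top of the loop. The wire
      format here (one length byte followed by that many data bytes) and
      the function name are assumptions made for illustration; the real
      advertising report layout is different:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical format: each report is one length byte followed by that
 * many data bytes. Returns the number of reports parsed before running
 * out of reports or hitting one that does not fit in the buffer.
 */
static int parse_reports(const uint8_t *buf, size_t buf_len, int num_reports)
{
	const uint8_t *ptr = buf;
	const uint8_t *end = buf + buf_len;
	int parsed = 0;

	while (num_reports--) {
		size_t left = (size_t)(end - ptr);

		/* Bounds check first, before this report is touched. */
		if (left < 1 || left - 1 < ptr[0])
			break;

		/* ... process ptr[1] .. ptr[ptr[0]] here ... */
		parsed++;

		/* Advance only after the report is known to fit. */
		ptr += 1 + (size_t)ptr[0];
	}
	return parsed;
}
```

      Because the check runs before each report rather than after the
      pointer is advanced, a buffer that is consumed exactly to its end is
      not flagged.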
      
      Signed-off-by: Brian Gix <brian.gix@intel.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      Cc: syphyr <syphyr@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • can: bcm: fix UAF of bcm op · 3fb20d1a
      Ziyang Xuan authored
      Stopping the tasklet and hrtimer in bcm_remove_op() relies on checking
      the active state of the tasklet and the hrtimer sequentially; the op
      object is freed once both are inactive. Assume the hrtimer timeout is
      short: the hrtimer callback can execute after the tasklet check (which
      must be false after the previous round's tasklet_kill()) and before the
      hrtimer_active() check, so hrtimer_active() is also false by the time it
      is evaluated. The bug is then triggered because the stopping loop ends
      and the op object is freed while the tasklet is still scheduled,
      resulting in a use-after-free (UAF) on the op object's resources.
      
      Move hrtimer_cancel() behind tasklet_kill() and switch 'while () {...}'
      to 'do {...} while ()' to fix the op UAF problem.
      
      Fixes: a06393ed ("can: bcm: fix hrtimer/tasklet termination in bcm op removal")
      Reported-by: <syzbot+5ca851459ed04c778d1d@syzkaller.appspotmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. Jan 29, 2022
  3. Jan 27, 2022
    • Linux 4.9.298 · b5308593
      Greg Kroah-Hartman authored
      
      
      Link: https://lore.kernel.org/r/20220124183932.787526760@linuxfoundation.org
      Tested-by: Shuah Khan <skhan@linuxfoundation.org>
      Tested-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20220125155253.051565866@linuxfoundation.org
      Tested-by: Florian Fainelli <f.fainelli@gmail.com>
      Tested-by: Jon Hunter <jonathanh@nvidia.com>
      Tested-by: Shuah Khan <skhan@linuxfoundation.org>
      Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      v4.9.298
    • KVM: do not allow mapping valid but non-reference-counted pages · f4b2bfed
      Nicholas Piggin authored
      commit f8be156b upstream.
      
      It's possible to create a region which maps valid but non-refcounted
      pages (e.g., tail pages of non-compound higher order allocations). These
      host pages can then be returned by gfn_to_page, gfn_to_pfn, etc., family
      of APIs, which take a reference to the page, which takes it from 0 to 1.
      When the reference is dropped, this will free the page incorrectly.
      
      Fix this by only taking a reference on valid pages if it was non-zero,
      which indicates it is participating in normal refcounting (and can be
      released with put_page).
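
      The "only take a reference if it was non-zero" idea can be sketched
      in userspace with C11 atomics. struct page_sketch and
      get_ref_unless_zero() are hypothetical names for this illustration
      and not the KVM code:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page with a reference count. */
struct page_sketch {
	atomic_int refcount;
};

/*
 * Take a reference only if the count is already non-zero. A count of
 * zero means the page is not under normal refcounting, so it is left
 * alone instead of being bumped from 0 to 1.
 */
static bool get_ref_unless_zero(struct page_sketch *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		/* On failure, `old` is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;
}
```

      A caller that gets false back must not later drop a reference,
      since none was taken.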
      
      This addresses CVE-2021-22543.
      
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Tested-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: Use kvm_pfn_t for local PFN variable in hva_to_pfn_remapped() · 29efa6b0
      Sean Christopherson authored
      commit a9545779 upstream.
      
      Use kvm_pfn_t, a.k.a. u64, for the local 'pfn' variable when retrieving
      a so called "remapped" hva/pfn pair.  In theory, the hva could resolve to
      a pfn in high memory on a 32-bit kernel.
      
      This bug was inadvertently exposed by commit bd2fae8d ("KVM: do not
      assume PTE is writable after follow_pfn"), which added an error PFN value
      to the mix, causing gcc to complain about overflowing the unsigned long.
      
        arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function ‘hva_to_pfn_remapped’:
        include/linux/kvm_host.h:89:30: error: conversion from ‘long long unsigned int’
                                        to ‘long unsigned int’ changes value from
                                        ‘9218868437227405314’ to ‘2’ [-Werror=overflow]
         89 | #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2)
            |                              ^
      virt/kvm/kvm_main.c:1935:9: note: in expansion of macro ‘KVM_PFN_ERR_RO_FAULT’
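
      The truncation reported by gcc can be reproduced in a small
      userspace sketch. The constant is the value quoted in the compiler
      message; the macro and function names are invented here, and
      uint32_t stands in for unsigned long on a 32-bit kernel:

```c
#include <stdint.h>

/* Value taken from the compiler message: KVM_PFN_ERR_MASK + 2. */
#define PFN_ERR_RO_FAULT_SKETCH UINT64_C(9218868437227405314)

/* Stand-in for storing the PFN in a 32-bit unsigned long. */
static uint32_t store_in_32bit_ulong(uint64_t pfn)
{
	return (uint32_t)pfn;	/* upper 32 bits are discarded */
}

/* Stand-in for storing it in a 64-bit type such as kvm_pfn_t (u64). */
static uint64_t store_in_u64(uint64_t pfn)
{
	return pfn;
}
```

      The 32-bit variant returns 2 for the error value, matching the
      "changes value from 9218868437227405314 to 2" diagnostic.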
      
      Cc: stable@vger.kernel.org
      Fixes: add6a0cd ("KVM: MMU: try to fix up page faults before giving up")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210208201940.1258328-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: do not assume PTE is writable after follow_pfn · 854a6e01
      Paolo Bonzini authored
      commit bd2fae8d upstream.
      
      In order to convert an HVA to a PFN, KVM usually tries to use
      the get_user_pages family of functions.  This however is not
      possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.
      
      In doing this however KVM loses the information on whether the
      PFN is writable.  That is usually not a problem because the main
      use of VM_IO vmas with KVM is for BARs in PCI device assignment,
      however it is a bug.  To fix it, use follow_pte and check pte_write
      while under the protection of the PTE lock.  The information can
      be used to fail hva_to_pfn_remapped or passed back to the
      caller via *writable.
      
      Usage of follow_pfn was introduced in commit add6a0cd ("KVM: MMU: try to fix
      up page faults before giving up", 2016-07-05); however, even older versions
      have the same issue, all the way back to commit 2e2e3738 ("KVM:
      Handle vma regions with no backing page", 2008-07-20), as they also did
      not check whether the PFN was writable.
      
      Fixes: 2e2e3738 ("KVM: Handle vma regions with no backing page")
      Reported-by: David Stevens <stevensd@google.com>
      Cc: 3pvd@google.com
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      [OP: backport to 4.19, adjust follow_pte() -> follow_pte_pmd()]
      Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backport to 4.9: follow_pte_pmd() does not take start or end
       parameters]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: add follow_pte_pmd() · 0dd4d649
      Ross Zwisler authored
      commit 09796395 upstream.
      
      Patch series "Write protect DAX PMDs in *sync path".
      
      Currently dax_mapping_entry_mkclean() fails to clean and write protect
      the pmd_t of a DAX PMD entry during an *sync operation.  This can result
      in data loss, as detailed in patch 2.
      
      This series is based on Dan's "libnvdimm-pending" branch, which is the
      current home for Jan's "dax: Page invalidation fixes" series.  You can
      find a working tree here:
      
        https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=dax_pmd_clean
      
      This patch (of 2):
      
      Similar to follow_pte(), follow_pte_pmd() allows either a PTE leaf or a
      huge page PMD leaf to be found and returned.
      
      Link: http://lkml.kernel.org/r/1482272586-21177-2-git-send-email-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Suggested-by: Dave Hansen <dave.hansen@intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 4.9: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • lib/timerqueue: Rely on rbtree semantics for next timer · ef2e6403
      Davidlohr Bueso authored
      commit 511885d7 upstream.
      
      Simplify the timerqueue code by using cached rbtrees and rely on the tree
      leftmost node semantics to get the timer with earliest expiration time.
      This is a drop in conversion, and therefore semantics remain untouched.
      
      The runtime overhead of cached rbtrees is pretty much the same as the
      current head->next method, noting that when removing the leftmost node,
      a common operation for the timerqueue, the rb_next(leftmost) is O(1) as
      well, so the next timer will either be the right node or its parent.
      Therefore no extra pointer chasing. Finally, the size of the struct
      timerqueue_head remains the same.
      
      Passes several hours of rcutorture.
      
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190724152323.bojciei3muvfxalm@linux-r8p5
      [bwh: While this was supposed to be just refactoring, it also fixed a
       security flaw (CVE-2021-20317).  Backported to 4.9:
       - Deleted code in timerqueue_del() is different before commit d852d394
         "timerqueue: Use rb_entry_safe() instead of open-coding it"
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rbtree: cache leftmost node internally · c89a7680
      Davidlohr Bueso authored
      commit cd9e61ed upstream.
      
      Patch series "rbtree: Cache leftmost node internally", v4.
      
      A series to extend rbtrees to internally cache the leftmost node such
      that we can have fast overlap check optimization for all interval tree
      users[1].  The benefits of this series are that:
      
      (i)   Unify users that do internal leftmost node caching.
      (ii)  Optimize all interval tree users.
      (iii) Convert at least two new users (epoll and procfs) to the new interface.
      
      This patch (of 16):
      
      Red-black tree semantics imply that nodes with smaller or greater (or
      equal for duplicates) keys are always to the left and right,
      respectively.  For the kernel this is extremely evident when considering
      our rb_first() semantics.  Enabling lookups for the smallest node in the
      tree in O(1) can save a good chunk of cycles in not having to walk down
      the tree each time.  To this end there are a few core users that
      explicitly do this, such as the scheduler and rtmutexes.  There is also
      the desire for interval trees to have this optimization allowing faster
      overlap checking.
      
      This patch introduces a new 'struct rb_root_cached' which is just the
      root with a cached pointer to the leftmost node.  The reason why the
      regular rb_root was not extended instead of adding a new structure was
      that this allows the user to have the choice between memory footprint
      and actual tree performance.  The new wrappers on top of the regular
      rb_root calls are:
      
       - rb_first_cached(cached_root) -- which is a fast replacement
           for rb_first.
      
       - rb_insert_color_cached(node, cached_root, new)
      
       - rb_erase_cached(node, cached_root)
      
      In addition, augmented cached interfaces are also added for basic
      insertion and deletion operations, which become important for the
      interval tree changes.
      
      With the exception of the inserts, which add a bool for updating the
      new leftmost, the interfaces are kept the same.  To this end, porting rb
      users to the cached version becomes really trivial, and keeping current
      rbtree semantics for users that don't care about the optimization
      requires zero overhead.
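
      The idea of a root with a cached leftmost pointer can be sketched
      with a plain, unbalanced binary search tree. This is only an
      illustration: all names below are invented, rebalancing and erase
      are left out, and the descent that the kernel leaves to the caller
      (who passes the resulting bool to the insert helper) is folded into
      the insert function here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Plain binary search tree node; rebalancing is deliberately left out. */
struct node_sketch {
	int key;
	struct node_sketch *left, *right;
};

/* Root plus a cached pointer to the leftmost (smallest) node. */
struct root_cached_sketch {
	struct node_sketch *root;
	struct node_sketch *leftmost;
};

/* O(1) lookup of the smallest node: no walk down the tree. */
static struct node_sketch *first_cached(const struct root_cached_sketch *t)
{
	return t->leftmost;
}

static void insert_cached(struct root_cached_sketch *t, struct node_sketch *n)
{
	struct node_sketch **link = &t->root;
	bool leftmost = true;

	n->left = n->right = NULL;
	while (*link) {
		if (n->key < (*link)->key) {
			link = &(*link)->left;
		} else {
			link = &(*link)->right;
			leftmost = false;	/* went right at least once */
		}
	}
	*link = n;

	/* Only a node reached purely via left links is the new minimum. */
	if (leftmost)
		t->leftmost = n;
}
```

      The extra pointer in the root is the memory cost mentioned above;
      users that do not need the fast minimum can keep the plain root.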
      
      Link: http://lkml.kernel.org/r/20170719014603.19029-2-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cipso,calipso: resolve a number of problems with the DOI refcounts · f49f0e65
      Paul Moore authored
      commit ad5d07f4 upstream.
      
      The current CIPSO and CALIPSO refcounting scheme for the DOI
      definitions is a bit flawed in that we:
      
      1. Don't correctly match gets/puts in netlbl_cipsov4_list().
      2. Decrement the refcount on each attempt to remove the DOI from the
         DOI list, only removing it from the list once the refcount drops
         to zero.
      
      This patch fixes these problems by adding the missing "puts" to
      netlbl_cipsov4_list() and introduces a more conventional, i.e.
      not-buggy, refcounting mechanism to the DOI definitions.  Upon the
      addition of a DOI to the DOI list, it is initialized with a refcount
      of one, removing a DOI from the list removes it from the list and
      drops the refcount by one; "gets" and "puts" behave as expected with
      respect to refcounts, increasing and decreasing the DOI's refcount by
      one.
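
      A single-threaded sketch of the refcounting rules described above.
      struct doi_sketch and the doi_*() helpers are hypothetical names;
      the real code uses atomic reference counts and frees the
      definition, which is modelled here with a flag:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a DOI definition. */
struct doi_sketch {
	int refcount;
	bool on_list;
	bool freed;
};

/* "get": increase the refcount by one. */
static void doi_get(struct doi_sketch *doi)
{
	doi->refcount++;
}

/* "put": decrease the refcount by one, releasing on the last put. */
static void doi_put(struct doi_sketch *doi)
{
	if (--doi->refcount == 0)
		doi->freed = true;	/* a real implementation would free here */
}

/* Adding to the list starts the DOI with a refcount of one. */
static void doi_add(struct doi_sketch *doi)
{
	doi->refcount = 1;
	doi->on_list = true;
	doi->freed = false;
}

/* Removal always unlinks, then drops the list's reference exactly once. */
static bool doi_remove(struct doi_sketch *doi)
{
	if (!doi->on_list)
		return false;
	doi->on_list = false;
	doi_put(doi);
	return true;
}
```

      A second removal attempt finds the DOI already unlinked and does
      not touch the refcount again, unlike the old scheme.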
      
      Fixes: b1edeb10 ("netlabel: Replace protocol/NetLabel linking with refrerence counts")
      Fixes: d7cce015 ("netlabel: Add support for removing a CALIPSO DOI.")
      Reported-by: <syzbot+9ec037722d2603a9f52e@syzkaller.appspotmail.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 4.9: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • gianfar: fix jumbo packets+napi+rx overrun crash · 2cf34285
      Michael Braun authored
      commit d8861bab upstream.
      
      When using jumbo packets and overrunning rx queue with napi enabled,
      the following sequence is observed in gfar_add_rx_frag:
      
         | lstatus                              |       | skb                   |
      t  | lstatus,  size, flags                | first | len, data_len, *ptr   |
      ---+--------------------------------------+-------+-----------------------+
      13 | 18002348, 9032, INTERRUPT LAST       | 0     | 9600, 8000,  f554c12e |
      12 | 10000640, 1600, INTERRUPT            | 0     | 8000, 6400,  f554c12e |
      11 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  f554c12e |
      10 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  f554c12e |
      09 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  f554c12e |
      08 | 14000640, 1600, INTERRUPT FIRST      | 0     | 1600, 0,     f554c12e |
      07 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     f554c12e |
      06 | 1c000080, 128,  INTERRUPT LAST FIRST | 1     | 0,    0,     abf3bd6e |
      05 | 18002348, 9032, INTERRUPT LAST       | 0     | 8000, 6400,  c5a57780 |
      04 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  c5a57780 |
      03 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  c5a57780 |
      02 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  c5a57780 |
      01 | 10000640, 1600, INTERRUPT            | 0     | 1600, 0,     c5a57780 |
      00 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     c5a57780 |
      
      So at t=7 a new packet is started but not finished, probably due to rx
      overrun - but rx overrun is not indicated in the flags. Instead a new
      packet starts at t=8. This results in skb->len exceeding size for the LAST
      fragment at t=13 and thus a negative fragment size added to the skb.
      
      This then crashes:
      
      kernel BUG at include/linux/skbuff.h:2277!
      Oops: Exception in kernel mode, sig: 5 [#1]
      ...
      NIP [c04689f4] skb_pull+0x2c/0x48
      LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844
      Call Trace:
      [ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable)
      [ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4
      [ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c
      [ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0
      [ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc
      [ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70
      [ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc
      [ec4bff08] [c0062718] kthread+0x140/0x144
      [ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c
      
      This patch fixes this by checking the computed LAST fragment size, so a
      negative-sized fragment is never added.
      In order to prevent the newer rx frame from getting corrupted, the FIRST
      flag is checked to discard the incomplete older frame.
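
      The size check described above can be sketched in user space as follows.
      This is a minimal model, not the driver code: struct rx_frame and
      add_rx_frag() are hypothetical names, and len stands in for skb->len.

```c
#include <stdbool.h>

/* Hypothetical model of a frame being assembled; len plays the role of skb->len. */
struct rx_frame {
    int len; /* bytes accumulated so far */
};

/*
 * Add one non-first fragment to the frame.
 * For a LAST descriptor the reported size is the length of the whole frame,
 * so the fragment size is the difference to what has been collected already.
 * Returns false when that difference is negative, so nothing is added.
 */
static bool add_rx_frag(struct rx_frame *f, int size, bool last)
{
    if (last)
        size -= f->len;
    if (size < 0)
        return false; /* bogus frame: caller should drop it */
    f->len += size;
    return true;
}
```

      With the values from the table, a frame that has reached len 9600 and then
      sees a LAST descriptor of size 9032 is rejected, while the healthy frame at
      t=5 (len 8000, LAST size 9032) gains a 1032-byte fragment.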
      
      Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2cf34285
    • Andy Spencer's avatar
      gianfar: simplify FCS handling and fix memory leak · 8d18509b
      Andy Spencer authored
      commit d903ec77 upstream.
      
      Previously, buffer descriptors containing only the frame check sequence
      (FCS) were skipped and not added to the skb. However, the page reference
      count was still incremented, leading to a memory leak.
      
      Fixing this inside gfar_add_rx_frag() is difficult due to reserved
      memory handling and page reuse. Instead, move the FCS handling to
      gfar_process_frame() and trim off the FCS before passing the skb up the
      networking stack.
      
      Signed-off-by: Andy Spencer <aspencer@spacex.com>
      Signed-off-by: Jim Gruen <jgruen@spacex.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d18509b
    • Dave Airlie's avatar
      drm/ttm/nouveau: don't call tt destroy callback on alloc failure. · 70f44dfb
      Dave Airlie authored
      commit 5de5b6ec upstream.
      
      This is confusing, and from my reading of all the drivers only
      nouveau got this right.
      
      Just make the API act under driver control of its own allocation
      failing, and don't call destroy: if the page table fails to
      create, there is nothing to clean up here.
      
      (I'm willing to believe I've missed something here, so please
      review deeply).
      
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200728041736.20689-1-airlied@gmail.com
      [bwh: Backported to 4.14:
       - Drop change in ttm_sg_tt_init()
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      70f44dfb
    • Linus Torvalds's avatar
      gup: document and work around "COW can break either way" issue · 0c29640b
      Linus Torvalds authored
      commit 9bbd42e7 upstream.
      
      Doing a "get_user_pages()" on a copy-on-write page for reading can be
      ambiguous: the page can be COW'ed at any time afterwards, and the
      direction of a COW event isn't defined.
      
      Yes, whoever writes to it will generally do the COW, but if the thread
      that did the get_user_pages() unmapped the page before the write (and
      that could happen due to memory pressure in addition to any outright
      action), the writer could also just take over the old page instead.
      
      End result: the get_user_pages() call might result in a page pointer
      that is no longer associated with the original VM, and is associated
      with - and controlled by - another VM having taken it over instead.
      
      So when doing a get_user_pages() on a COW mapping, the only really safe
      thing to do would be to break the COW when getting the page, even when
      only getting it for reading.
      
      At the same time, some users simply don't even care.
      
      For example, the perf code wants to look up the page not because it
      cares about the page, but because the code simply wants to look up the
      physical address of the access for informational purposes, and doesn't
      really care about races when a page might be unmapped and remapped
      elsewhere.
      
      This adds logic to force a COW event by setting FOLL_WRITE on any
      copy-on-write mapping when FOLL_GET (or FOLL_PIN) is used to get a page
      pointer as a result.
      
      The current semantics end up being:
      
       - __get_user_pages_fast(): no change. If you don't ask for a write,
         you won't break COW. You'd better know what you're doing.
      
       - get_user_pages_fast(): the fast-case "look it up in the page tables
         without anything getting mmap_sem" now refuses to follow a read-only
         page, since it might need COW breaking.  Which happens in the slow
         path - the fast path doesn't know if the memory might be COW or not.
      
       - get_user_pages() (including the slow-path fallback for gup_fast()):
         for a COW mapping, turn on FOLL_WRITE for FOLL_GET/FOLL_PIN, with
         very similar semantics to FOLL_FORCE.
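
      The slow-path rule in the last item can be modelled in user space as
      below. This is only a sketch: the flag values are illustrative and do not
      match the kernel's definitions, and effective_gup_flags() is a
      hypothetical helper name. The definition of a COW mapping used here
      (private and possibly writable) is an assumption of the sketch.

```c
#include <stdbool.h>

/* Illustrative flag values only; the kernel's real definitions differ. */
#define VM_SHARED   0x1u
#define VM_MAYWRITE 0x2u
#define FOLL_WRITE  0x1u
#define FOLL_GET    0x2u
#define FOLL_PIN    0x4u

/* Assumed definition: a private mapping that may be written is a COW mapping. */
static bool is_cow_mapping(unsigned int vm_flags)
{
    return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/*
 * Hypothetical helper: return the flags the lookup would effectively use.
 * When the caller takes a page reference on a COW mapping, FOLL_WRITE is
 * turned on so that the COW is broken before the page pointer is returned.
 */
static unsigned int effective_gup_flags(unsigned int vm_flags,
                                        unsigned int gup_flags)
{
    if (is_cow_mapping(vm_flags) && (gup_flags & (FOLL_GET | FOLL_PIN)))
        gup_flags |= FOLL_WRITE;
    return gup_flags;
}
```

      In this model a shared mapping, or a lookup that takes no page reference,
      keeps its flags unchanged.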
      
      If it turns out that we want finer granularity (ie "only break COW when
      it might actually matter" - things like the zero page are special and
      don't need to be broken) we might need to push these semantics deeper
      into the lookup fault path.  So if people care enough, it's possible
      that we might end up adding a new internal FOLL_BREAK_COW flag to go
      with the internal FOLL_COW flag we already have for tracking "I had a
      COW".
      
      Alternatively, if it turns out that different callers might want to
      explicitly control the forced COW break behavior, we might even want to
      make such a flag visible to the users of get_user_pages() instead of
      using the above default semantics.
      
      But for now, this is mostly commentary on the issue (this commit message
      being a lot bigger than the patch, and that patch in turn is almost all
      comments), with that minimal "enable COW breaking early" logic using the
      existing FOLL_WRITE behavior.
      
      [ It might be worth noting that we've always had this ambiguity, and it
        could arguably be seen as a user-space issue.
      
        You only get private COW mappings that could break either way in
        situations where user space is doing cooperative things (ie fork()
        before an execve() etc), but it _is_ surprising and very subtle, and
        fork() is supposed to give you independent address spaces.
      
        So let's treat this as a kernel issue and make the semantics of
        get_user_pages() easier to understand. Note that obviously a true
        shared mapping will still get a page that can change under us, so this
        does _not_ mean that get_user_pages() somehow returns any "stable"
        page ]
      
      [surenb: backport notes
      	Replaced (gup_flags | FOLL_WRITE) with write=1 in gup_pgd_range.
      	Removed FOLL_PIN usage in should_force_cow_break since it's missing in
      	the earlier kernels.]
      
      Reported-by: Jann Horn <jannh@google.com>
      Tested-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Kirill Shutemov <kirill@shutemov.name>
      Acked-by: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [surenb: backport to 4.19 kernel]
      Cc: stable@vger.kernel.org # 4.19.x
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 4.9:
       - Generic get_user_pages_fast() calls __get_user_pages_fast() here,
         so make it pass write=1
       - Various architectures have their own implementations of
         get_user_pages_fast(), so apply the corresponding change there
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c29640b
    • Ben Hutchings's avatar
      Revert "gup: document and work around "COW can break either way" issue" · 6fbb8383
      Ben Hutchings authored
      This reverts commit 9bbd42e7, which was commit 17839856 upstream.  The
      backport was incorrect and incomplete:
      
      * It forced the write flag on in the generic __get_user_pages_fast(),
        whereas only get_user_pages_fast() was supposed to do that.
      * It only fixed the generic RCU-based implementation used by arm,
        arm64, and powerpc.  Before Linux 4.13, several other architectures
        had their own implementations: mips, s390, sparc, sh, and x86.
      
      This will be followed by a (hopefully) correct backport.
      
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6fbb8383
    • Miaoqian Lin's avatar
      lib82596: Fix IRQ check in sni_82596_probe · 31f96167
      Miaoqian Lin authored
      commit 99218cbf upstream.
      
      platform_get_irq() returns a negative error number instead of 0 on failure.
      The documentation of platform_get_irq() provides a usage example:
      
          int irq = platform_get_irq(pdev, 0);
          if (irq < 0)
              return irq;
      
      Fix the check of return value to catch errors correctly.
      
      Fixes: 11597885 ("i825xx: Move the Intel 82586/82593/82596 based drivers")
      Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      31f96167
    • Matthias Schiffer's avatar
      scripts/dtc: dtx_diff: remove broken example from help text · 0febebd3
      Matthias Schiffer authored
      commit d8adf5b9 upstream.
      
      dtx_diff suggests using <(...) syntax to pipe two inputs into it, but
      this has never worked: The /proc/self/fds/... paths passed by the shell
      will fail the `[ -f "${dtx}" ] && [ -r "${dtx}" ]` check in compile_to_dts,
      but even with this check removed, the function cannot work: hexdump will
      eat up the DTB magic, making the subsequent dtc call fail, as a pipe
      cannot be rewound.
      
      Simply remove this broken example, as there is already an alternative one
      that works fine.
      
      Fixes: 10eadc25 ("dtc: create tool to diff device trees")
      Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
      Reviewed-by: Frank Rowand <frank.rowand@sony.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
      Link: https://lore.kernel.org/r/20220113081918.10387-1-matthias.schiffer@ew.tq-group.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0febebd3
    • Sergey Shtylyov's avatar
      bcmgenet: add WOL IRQ check · 5d486412
      Sergey Shtylyov authored
      commit 9deb48b5 upstream.
      
      The driver neglects to check the result of platform_get_irq_optional()'s
      call and blithely passes the negative error codes to devm_request_irq()
      (which takes *unsigned* IRQ #), causing it to fail with -EINVAL.
      Stop calling devm_request_irq() with the invalid IRQ #s.
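
      Why a negative error code is a problem for an unsigned IRQ parameter can
      be seen in a small user-space sketch. Both functions are hypothetical
      stand-ins, and the "greater than zero" guard is an assumption about what
      a valid IRQ number looks like, not a quote from the patch.

```c
#include <stdbool.h>

/*
 * Stand-in for an API whose IRQ parameter is unsigned, as the commit
 * message says of devm_request_irq(). It returns what the callee sees.
 */
static unsigned int irq_as_seen_by_callee(unsigned int irq)
{
    return irq;
}

/* Guard assumed for this sketch: only positive values are usable IRQs. */
static bool wol_irq_is_usable(int irq)
{
    return irq > 0;
}
```

      Passing an error such as -6 to irq_as_seen_by_callee() yields a very large
      unsigned number rather than an error, which is why the check has to
      happen before the call.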
      
      Fixes: 8562056f ("net: bcmgenet: request Wake-on-LAN interrupt")
      Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5d486412
    • Kevin Bracey's avatar
      net_sched: restore "mpu xxx" handling · 019f0458
      Kevin Bracey authored
      commit fb80445c upstream.
      
      commit 56b765b7 ("htb: improved accuracy at high rates") broke
      "overhead X", "linklayer atm" and "mpu X" attributes.
      
      "overhead X" and "linklayer atm" have already been fixed. This restores
      the "mpu X" handling, as might be used by DOCSIS or Ethernet shaping:
      
          tc class add ... htb rate X overhead 4 mpu 64
      
      The code being fixed is used by htb, tbf and act_police. Cake has its
      own mpu handling. qdisc_calculate_pkt_len still uses the size table
      containing values adjusted for mpu by user space.
      
      iproute2 tc has always passed mpu into the kernel via a tc_ratespec
      structure, but the kernel never directly acted on it, merely stored it
      so that it could be read back by `tc class show`.
      
      Rather, tc would generate length-to-time tables that included the mpu
      (and linklayer) in their construction, and the kernel used those tables.
      
      Since v3.7, the tables were no longer used. Along with "mpu", this also
      broke "overhead" and "linklayer" which were fixed in 01cb71d2
      ("net_sched: restore "overhead xxx" handling", v3.10) and 8a8e3d84
      ("net_sched: restore "linklayer atm" handling", v3.11).
      
      "overhead" was fixed by simply restoring use of tc_ratespec::overhead -
      this had originally been used by the kernel but was initially omitted
      from the new non-table-based calculations.
      
      "linklayer" had been handled in the table like "mpu", but the mode was
      not originally passed in tc_ratespec. The new implementation was made to
      handle it by getting new versions of tc to pass the mode in an extended
      tc_ratespec, and for older versions of tc the table contents were analysed
      at load time to deduce linklayer.
      
      As "mpu" has always been given to the kernel in tc_ratespec,
      accompanying the mpu-based table, we can restore system functionality
      with no userspace change by making the kernel act on the tc_ratespec
      value.
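
      A minimal sketch of what acting on the tc_ratespec value means for the
      length-to-time calculation, assuming a simplified parameter struct:
      struct rate_params, billed_len() and len_to_ns() are hypothetical names,
      and only the overhead and mpu fields correspond to tc_ratespec.

```c
#include <stdint.h>

/* Hypothetical subset of the shaping parameters. */
struct rate_params {
    uint64_t rate_bytes_ps; /* shaping rate in bytes per second */
    uint16_t overhead;      /* per-packet overhead in bytes */
    uint16_t mpu;           /* minimum packet unit in bytes */
};

/* Length charged for a packet: add the overhead, then round up to mpu. */
static uint32_t billed_len(const struct rate_params *r, uint32_t len)
{
    len += r->overhead;
    if (len < r->mpu)
        len = r->mpu;
    return len;
}

/* Transmission time in nanoseconds for the charged length. */
static uint64_t len_to_ns(const struct rate_params *r, uint32_t len)
{
    return (uint64_t)billed_len(r, len) * 1000000000ull / r->rate_bytes_ps;
}
```

      With "overhead 4 mpu 64" as in the tc example above, a 40-byte packet is
      charged as 64 bytes, while a 100-byte packet is charged as 104 bytes.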
      
      Fixes: 56b765b7 ("htb: improved accuracy at high rates")
      Signed-off-by: Kevin Bracey <kevin@bracey.fi>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Vimalkumar <j.vimal@gmail.com>
      Link: https://lore.kernel.org/r/20220112170210.1014351-1-kevin@bracey.fi
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      019f0458
    • Tudor Ambarus's avatar
      dmaengine: at_xdmac: Fix at_xdmac_lld struct definition · 94ca32fe
      Tudor Ambarus authored
      commit 912f7c6f upstream.
      
      The hardware channel next descriptor view structure contains just
      fields of 32 bits, while dma_addr_t can be of type u64 or u32
      depending on CONFIG_ARCH_DMA_ADDR_T_64BIT. Force u32 to comply with
      what the hardware expects.
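
      The layout problem can be illustrated with two hypothetical structs. The
      field names are made up for this sketch and are not the driver's; the
      point is only that widening an address field moves every field after it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout the controller reads: every field is 32 bits wide. */
struct lld_hw_view {
    uint32_t next_desc;
    uint32_t ubc;
    uint32_t src;
    uint32_t dst;
};

/* Same fields when the address type happens to be 64 bits wide. */
struct lld_wide_view {
    uint64_t next_desc;
    uint32_t ubc;
    uint64_t src;
    uint64_t dst;
};

/* Offset of the second field in each layout. */
static size_t hw_ubc_offset(void)
{
    return offsetof(struct lld_hw_view, ubc);
}

static size_t wide_ubc_offset(void)
{
    return offsetof(struct lld_wide_view, ubc);
}
```

      In the 32-bit layout the second word sits at byte offset 4, where the
      hardware expects it; in the wide layout it moves to offset 8.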
      
      Fixes: e1f7c9ee ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver")
      Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
      Link: https://lore.kernel.org/r/20211215110115.191749-11-tudor.ambarus@microchip.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      94ca32fe
    • Tudor Ambarus's avatar
      dmaengine: at_xdmac: Fix lld view setting · bc882453
      Tudor Ambarus authored
      commit 1385eb4d upstream.
      
      AT_XDMAC_CNDC_NDVIEW_NDV3 was set even for AT_XDMAC_MBR_UBC_NDV2,
      because of the wrong bit handling. Fix it.
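
      One way such a mix-up can arise with a multi-bit field is sketched below.
      The macro names, bit position and values are assumptions chosen for
      illustration, not the driver's definitions, and the commit message does
      not say that this is the exact form the bug took.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative two-bit "view" field; position and values are assumptions. */
#define VIEW_SHIFT 27
#define VIEW_MASK  (UINT32_C(0x3) << VIEW_SHIFT)
#define VIEW_NDV2  (UINT32_C(0x2) << VIEW_SHIFT)
#define VIEW_NDV3  (UINT32_C(0x3) << VIEW_SHIFT)

/* Buggy pattern: any shared bit makes the test succeed, so NDV2 matches too. */
static bool is_view3_buggy(uint32_t ubc)
{
    return (ubc & VIEW_NDV3) != 0;
}

/* Correct pattern: isolate the whole field, then compare its value. */
static bool is_view3(uint32_t ubc)
{
    return (ubc & VIEW_MASK) == VIEW_NDV3;
}
```

      Because the two values share a bit, the plain AND test reports a view 2
      descriptor as view 3, while the masked comparison keeps them apart.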
      
      Fixes: ee0fe35c ("dmaengine: xdmac: Handle descriptor's view 3 registers")
      Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
      Link: https://lore.kernel.org/r/20211215110115.191749-10-tudor.ambarus@microchip.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bc882453