  1. Jan 27, 2022
    • Linux 4.9.298 · b5308593
      Greg Kroah-Hartman authored
      
      
      Link: https://lore.kernel.org/r/20220124183932.787526760@linuxfoundation.org
      Tested-by: Shuah Khan <skhan@linuxfoundation.org>
      Tested-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20220125155253.051565866@linuxfoundation.org
      Tested-by: Florian Fainelli <f.fainelli@gmail.com>
      Tested-by: Jon Hunter <jonathanh@nvidia.com>
      Tested-by: Shuah Khan <skhan@linuxfoundation.org>
      Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      v4.9.298
    • KVM: do not allow mapping valid but non-reference-counted pages · f4b2bfed
      Nicholas Piggin authored
      commit f8be156b upstream.
      
      It's possible to create a region which maps valid but non-refcounted
      pages (e.g., tail pages of non-compound higher order allocations). These
      host pages can then be returned by gfn_to_page, gfn_to_pfn, etc., family
      of APIs, which take a reference to the page, which takes it from 0 to 1.
      When the reference is dropped, this will free the page incorrectly.
      
      Fix this by only taking a reference on valid pages if it was non-zero,
      which indicates it is participating in normal refcounting (and can be
      released with put_page).
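
The "take a reference only if the count is already non-zero" rule can be sketched in user-space C. This is a minimal illustration, not the kernel code: `struct fake_page`, `get_ref_unless_zero()` and `put_ref()` are hypothetical stand-ins for `struct page` and the kernel's page refcount helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the reference count. */
struct fake_page {
    atomic_int refcount;
};

/*
 * Take a reference only if the count is already non-zero, i.e. the page
 * participates in normal refcounting. A page whose count is 0 is left
 * untouched and the caller is told that no reference was taken.
 */
static bool get_ref_unless_zero(struct fake_page *page)
{
    int old = atomic_load(&page->refcount);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when the count reaches zero ("free"). */
static bool put_ref(struct fake_page *page)
{
    int old = atomic_fetch_sub(&page->refcount, 1);

    assert(old > 0); /* dropping a reference that was never taken is a bug */
    return old == 1;
}
```

With an unconditional increment, a page at count 0 would go to 1 and then be "freed" by the matching put; the conditional version never creates that first reference.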
      
      This addresses CVE-2021-22543.
      
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Tested-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: Use kvm_pfn_t for local PFN variable in hva_to_pfn_remapped() · 29efa6b0
      Sean Christopherson authored
      commit a9545779 upstream.
      
      Use kvm_pfn_t, a.k.a. u64, for the local 'pfn' variable when retrieving
      a so called "remapped" hva/pfn pair.  In theory, the hva could resolve to
      a pfn in high memory on a 32-bit kernel.
      
      This bug was inadvertently exposed by commit bd2fae8d ("KVM: do not
      assume PTE is writable after follow_pfn"), which added an error PFN value
      to the mix, causing gcc to complain about overflowing the unsigned long.
      
        arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function ‘hva_to_pfn_remapped’:
        include/linux/kvm_host.h:89:30: error: conversion from ‘long long unsigned int’
                                        to ‘long unsigned int’ changes value from
                                        ‘9218868437227405314’ to ‘2’ [-Werror=overflow]
         89 | #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2)
            |                              ^
      virt/kvm/kvm_main.c:1935:9: note: in expansion of macro ‘KVM_PFN_ERR_RO_FAULT’
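
The truncation reported in the diagnostic above can be reproduced in a small user-space sketch. The mask value `0x7ff << 52` is an assumption chosen because it matches the number printed by gcc; `ulong32` and `store_in_ulong32()` are hypothetical names that model an `unsigned long` on a 32-bit kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definitions, consistent with the value in the gcc message. */
#define KVM_PFN_ERR_MASK      (0x7ffULL << 52)
#define KVM_PFN_ERR_RO_FAULT  (KVM_PFN_ERR_MASK + 2)

typedef uint64_t kvm_pfn_t;  /* always 64 bits wide */
typedef uint32_t ulong32;    /* models 'unsigned long' on a 32-bit kernel */

static_assert(sizeof(kvm_pfn_t) == 8, "kvm_pfn_t must hold a 64-bit PFN");

/* Storing a pfn in the narrower type keeps only the low 32 bits. */
static ulong32 store_in_ulong32(kvm_pfn_t pfn)
{
    return (ulong32)pfn;
}

/* An error pfn is recognised by the high mask bits. */
static int is_error_pfn(kvm_pfn_t pfn)
{
    return (pfn & KVM_PFN_ERR_MASK) != 0;
}
```

After the narrowing store, the error value collapses to 2 and no longer looks like an error pfn, which is why the local variable needs the 64-bit type.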
      
      Cc: stable@vger.kernel.org
      Fixes: add6a0cd ("KVM: MMU: try to fix up page faults before giving up")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210208201940.1258328-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: do not assume PTE is writable after follow_pfn · 854a6e01
      Paolo Bonzini authored
      commit bd2fae8d upstream.
      
      In order to convert an HVA to a PFN, KVM usually tries to use
      the get_user_pages family of functions.  This however is not
      possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.
      
      In doing this however KVM loses the information on whether the
      PFN is writable.  That is usually not a problem because the main
      use of VM_IO vmas with KVM is for BARs in PCI device assignment,
      however it is a bug.  To fix it, use follow_pte and check pte_write
      while under the protection of the PTE lock.  The information can
      be used to fail hva_to_pfn_remapped or passed back to the
      caller via *writable.
      
      Usage of follow_pfn was introduced in commit add6a0cd ("KVM: MMU: try to fix
      up page faults before giving up", 2016-07-05); however, even older versions
      have the same issue, all the way back to commit 2e2e3738 ("KVM:
      Handle vma regions with no backing page", 2008-07-20), as they also did
      not check whether the PFN was writable.
      
      Fixes: 2e2e3738 ("KVM: Handle vma regions with no backing page")
      Reported-by: David Stevens <stevensd@google.com>
      Cc: 3pvd@google.com
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      [OP: backport to 4.19, adjust follow_pte() -> follow_pte_pmd()]
      Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backport to 4.9: follow_pte_pmd() does not take start or end
       parameters]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: add follow_pte_pmd() · 0dd4d649
      Ross Zwisler authored
      commit 09796395 upstream.
      
      Patch series "Write protect DAX PMDs in *sync path".
      
      Currently dax_mapping_entry_mkclean() fails to clean and write protect
      the pmd_t of a DAX PMD entry during an *sync operation.  This can result
      in data loss, as detailed in patch 2.
      
      This series is based on Dan's "libnvdimm-pending" branch, which is the
      current home for Jan's "dax: Page invalidation fixes" series.  You can
      find a working tree here:
      
        https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=dax_pmd_clean
      
      This patch (of 2):
      
      Similar to follow_pte(), follow_pte_pmd() allows either a PTE leaf or a
      huge page PMD leaf to be found and returned.
      
      Link: http://lkml.kernel.org/r/1482272586-21177-2-git-send-email-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Suggested-by: Dave Hansen <dave.hansen@intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 4.9: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • lib/timerqueue: Rely on rbtree semantics for next timer · ef2e6403
      Davidlohr Bueso authored
      commit 511885d7 upstream.
      
      Simplify the timerqueue code by using cached rbtrees and rely on the tree
      leftmost node semantics to get the timer with earliest expiration time.
      This is a drop in conversion, and therefore semantics remain untouched.
      
      The runtime overhead of cached rbtrees is pretty much the same as the
      current head->next method, noting that when removing the leftmost node,
      a common operation for the timerqueue, the rb_next(leftmost) is O(1) as
      well, so the next timer will either be the right node or its parent.
      Therefore no extra pointer chasing. Finally, the size of the struct
      timerqueue_head remains the same.
      
      Passes several hours of rcutorture.
      
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190724152323.bojciei3muvfxalm@linux-r8p5
      [bwh: While this was supposed to be just refactoring, it also fixed a
       security flaw (CVE-2021-20317).  Backported to 4.9:
       - Deleted code in timerqueue_del() is different before commit d852d394
         "timerqueue: Use rb_entry_safe() instead of open-coding it"
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rbtree: cache leftmost node internally · c89a7680
      Davidlohr Bueso authored
      commit cd9e61ed upstream.
      
      Patch series "rbtree: Cache leftmost node internally", v4.
      
      A series extending rbtrees to internally cache the leftmost node such
      that we can have fast overlap check optimization for all interval tree
      users[1].  The benefits of this series are that:
      
      (i)   Unify users that do internal leftmost node caching.
      (ii)  Optimize all interval tree users.
      (iii) Convert at least two new users (epoll and procfs) to the new interface.
      
      This patch (of 16):
      
      Red-black tree semantics imply that nodes with smaller or greater (or
      equal for duplicates) keys are always to the left and right,
      respectively.  For the kernel this is extremely evident when considering
      our rb_first() semantics.  Enabling lookups for the smallest node in the
      tree in O(1) can save a good chunk of cycles in not having to walk down
      the tree each time.  To this end there are a few core users that
      explicitly do this, such as the scheduler and rtmutexes.  There is also
      the desire for interval trees to have this optimization allowing faster
      overlap checking.
      
      This patch introduces a new 'struct rb_root_cached' which is just the
      root with a cached pointer to the leftmost node.  The reason why the
      regular rb_root was not extended instead of adding a new structure was
      that this allows the user to have the choice between memory footprint
      and actual tree performance.  The new wrappers on top of the regular
      rb_root calls are:
      
       - rb_first_cached(cached_root) -- which is a fast replacement
           for rb_first.
      
       - rb_insert_color_cached(node, cached_root, new)
      
       - rb_erase_cached(node, cached_root)
      
      In addition, augmented cached interfaces are also added for basic
      insertion and deletion operations; which becomes important for the
      interval tree changes.
      
      With the exception of the inserts, which add a bool for updating the
      new leftmost, the interfaces are kept the same.  To this end, porting rb
      users to the cached version becomes really trivial, and keeping current
      rbtree semantics for users that don't care about the optimization
      requires zero overhead.
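
The idea of a root that carries a cached leftmost pointer can be sketched as follows. This is a simplified illustration on an unbalanced binary search tree, not the kernel rbtree: `struct root_cached`, `insert_cached()`, `first_cached()` and `pop_first_cached()` are hypothetical names that only mirror the shape of the interfaces listed above, including the bool that tracks whether an insert became the new leftmost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
    int key;
    struct node *parent, *left, *right;
};

/* Analogue of 'struct rb_root_cached': root plus cached leftmost node. */
struct root_cached {
    struct node *root;
    struct node *leftmost;
};

/* O(1) lookup of the smallest node, in the spirit of rb_first_cached(). */
static struct node *first_cached(const struct root_cached *tree)
{
    return tree->leftmost;
}

static void insert_cached(struct root_cached *tree, struct node *new)
{
    struct node **link = &tree->root, *parent = NULL;
    bool leftmost = true; /* stays true only if we never step right */

    assert(tree && new);
    new->left = new->right = NULL;
    while (*link) {
        parent = *link;
        if (new->key < parent->key) {
            link = &parent->left;
        } else {
            link = &parent->right;
            leftmost = false;
        }
    }
    new->parent = parent;
    *link = new;
    if (leftmost)
        tree->leftmost = new;
}

/* Remove the smallest node and refresh the cached pointer. */
static struct node *pop_first_cached(struct root_cached *tree)
{
    struct node *victim = tree->leftmost, *next;

    if (!victim)
        return NULL;

    /* The leftmost node has no left child: splice in its right child. */
    if (victim->right)
        victim->right->parent = victim->parent;
    if (victim->parent)
        victim->parent->left = victim->right;
    else
        tree->root = victim->right;

    /* Next smallest: bottom-left of the right subtree, else the parent. */
    next = victim->right ? victim->right : victim->parent;
    while (next && next->left)
        next = next->left;
    tree->leftmost = next;
    return victim;
}
```

In this unbalanced sketch the successor search may walk down a right subtree; the balancing rules of a real red-black tree are what keep that step cheap.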
      
      Link: http://lkml.kernel.org/r/20170719014603.19029-2-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cipso,calipso: resolve a number of problems with the DOI refcounts · f49f0e65
      Paul Moore authored
      commit ad5d07f4 upstream.
      
      The current CIPSO and CALIPSO refcounting scheme for the DOI
      definitions is a bit flawed in that we:
      
      1. Don't correctly match gets/puts in netlbl_cipsov4_list().
      2. Decrement the refcount on each attempt to remove the DOI from the
         DOI list, only removing it from the list once the refcount drops
         to zero.
      
      This patch fixes these problems by adding the missing "puts" to
      netlbl_cipsov4_list() and introduces a more conventional, i.e.
      not-buggy, refcounting mechanism to the DOI definitions.  Upon the
      addition of a DOI to the DOI list, it is initialized with a refcount
      of one, removing a DOI from the list removes it from the list and
      drops the refcount by one; "gets" and "puts" behave as expected with
      respect to refcounts, increasing and decreasing the DOI's refcount by
      one.
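
The refcounting convention described above can be sketched in a few lines. This is a single-threaded illustration with hypothetical names (`struct doi_def`, `doi_add()`, `doi_remove()`, `doi_get()`, `doi_put()`); the real code also needs locking and RCU, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical DOI definition entry. */
struct doi_def {
    int refcount;
    bool on_list;
    bool freed;
};

/* Adding to the list initialises the count to one (the list's reference). */
static void doi_add(struct doi_def *d)
{
    d->refcount = 1;
    d->on_list = true;
    d->freed = false;
}

static void doi_get(struct doi_def *d)
{
    d->refcount++;
}

static void doi_put(struct doi_def *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0)
        d->freed = true; /* the last reference releases the definition */
}

/*
 * Removal unlinks the entry immediately and drops only the list's
 * reference. A second removal attempt finds nothing and changes nothing.
 */
static bool doi_remove(struct doi_def *d)
{
    if (!d->on_list)
        return false;
    d->on_list = false;
    doi_put(d);
    return true;
}
```

The point of the design is that repeated remove requests can no longer drain references held by other users.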
      
      Fixes: b1edeb10 ("netlabel: Replace protocol/NetLabel linking with refrerence counts")
      Fixes: d7cce015 ("netlabel: Add support for removing a CALIPSO DOI.")
      Reported-by: <syzbot+9ec037722d2603a9f52e@syzkaller.appspotmail.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 4.9: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • gianfar: fix jumbo packets+napi+rx overrun crash · 2cf34285
      Michael Braun authored
      commit d8861bab upstream.
      
      When using jumbo packets and overrunning rx queue with napi enabled,
      the following sequence is observed in gfar_add_rx_frag:
      
         | lstatus                              |       | skb                   |
      t  | lstatus,  size, flags                | first | len, data_len, *ptr   |
      ---+--------------------------------------+-------+-----------------------+
      13 | 18002348, 9032, INTERRUPT LAST       | 0     | 9600, 8000,  f554c12e |
      12 | 10000640, 1600, INTERRUPT            | 0     | 8000, 6400,  f554c12e |
      11 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  f554c12e |
      10 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  f554c12e |
      09 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  f554c12e |
      08 | 14000640, 1600, INTERRUPT FIRST      | 0     | 1600, 0,     f554c12e |
      07 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     f554c12e |
      06 | 1c000080, 128,  INTERRUPT LAST FIRST | 1     | 0,    0,     abf3bd6e |
      05 | 18002348, 9032, INTERRUPT LAST       | 0     | 8000, 6400,  c5a57780 |
      04 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  c5a57780 |
      03 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  c5a57780 |
      02 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  c5a57780 |
      01 | 10000640, 1600, INTERRUPT            | 0     | 1600, 0,     c5a57780 |
      00 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     c5a57780 |
      
      So at t=7 a new packet is started but not finished, probably due to rx
      overrun - but rx overrun is not indicated in the flags. Instead a new
      packet starts at t=8. This causes skb->len to exceed the size reported for
      the LAST fragment at t=13, and thus a negative fragment size is added to
      the skb.
      
      This then crashes:
      
      kernel BUG at include/linux/skbuff.h:2277!
      Oops: Exception in kernel mode, sig: 5 [#1]
      ...
      NIP [c04689f4] skb_pull+0x2c/0x48
      LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844
      Call Trace:
      [ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable)
      [ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4
      [ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c
      [ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0
      [ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc
      [ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70
      [ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc
      [ec4bff08] [c0062718] kthread+0x140/0x144
      [ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c
      
      This patch fixes this by checking for computed LAST fragment size, so a
      negative sized fragment is never added.
      In order to prevent the newer rx frame from getting corrupted, the FIRST
      flag is checked to discard the incomplete older frame.
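
The size check for the LAST fragment can be sketched with the numbers from the table above. This is an illustration under assumptions, not the driver code: `rx_frag_size()` is a hypothetical helper, and `BD_LENGTH_MASK` is assumed to select the low 16 bits of `lstatus`, which matches the lstatus/size pairs in the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BD_LENGTH_MASK 0x0000ffffu /* assumed: low 16 bits carry the length */

/*
 * Compute the size of the fragment to append. For a LAST descriptor the
 * length field covers the whole frame, so the bytes already in the skb are
 * subtracted. Returns false instead of producing a negative size.
 */
static bool rx_frag_size(uint32_t lstatus, bool last, uint32_t skb_len,
                         uint32_t *size)
{
    int64_t len = lstatus & BD_LENGTH_MASK;

    assert(size != NULL);
    if (last)
        len -= skb_len;
    if (len < 0)
        return false; /* inconsistent descriptor chain: drop the frame */
    *size = (uint32_t)len;
    return true;
}
```

With the t=13 row, 9032 bytes reported against 9600 bytes already accumulated would give -568, which the check rejects.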
      
      Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • gianfar: simplify FCS handling and fix memory leak · 8d18509b
      Andy Spencer authored
      commit d903ec77 upstream.
      
      Previously, buffer descriptors containing only the frame check sequence
      (FCS) were skipped and not added to the skb. However, the page reference
      count was still incremented, leading to a memory leak.
      
      Fixing this inside gfar_add_rx_frag() is difficult due to reserved
      memory handling and page reuse. Instead, move the FCS handling to
      gfar_process_frame() and trim off the FCS before passing the skb up the
      networking stack.
      
      Signed-off-by: Andy Spencer <aspencer@spacex.com>
      Signed-off-by: Jim Gruen <jgruen@spacex.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/ttm/nouveau: don't call tt destroy callback on alloc failure. · 70f44dfb
      Dave Airlie authored
      commit 5de5b6ec upstream.
      
      This is confusing, and from my reading of all the drivers only
      nouveau got this right.
      
      Just make the API act under driver control of its own allocation
      failing, and don't call destroy: if the page table fails to
      create, there is nothing to clean up here.
      
      (I'm willing to believe I've missed something here, so please
      review deeply).
      
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200728041736.20689-1-airlied@gmail.com
      [bwh: Backported to 4.14:
       - Drop change in ttm_sg_tt_init()
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • gup: document and work around "COW can break either way" issue · 0c29640b
      Linus Torvalds authored
      commit 9bbd42e7 upstream.
      
      Doing a "get_user_pages()" on a copy-on-write page for reading can be
      ambiguous: the page can be COW'ed at any time afterwards, and the
      direction of a COW event isn't defined.
      
      Yes, whoever writes to it will generally do the COW, but if the thread
      that did the get_user_pages() unmapped the page before the write (and
      that could happen due to memory pressure in addition to any outright
      action), the writer could also just take over the old page instead.
      
      End result: the get_user_pages() call might result in a page pointer
      that is no longer associated with the original VM, and is associated
      with - and controlled by - another VM having taken it over instead.
      
      So when doing a get_user_pages() on a COW mapping, the only really safe
      thing to do would be to break the COW when getting the page, even when
      only getting it for reading.
      
      At the same time, some users simply don't even care.
      
      For example, the perf code wants to look up the page not because it
      cares about the page, but because the code simply wants to look up the
      physical address of the access for informational purposes, and doesn't
      really care about races when a page might be unmapped and remapped
      elsewhere.
      
      This adds logic to force a COW event by setting FOLL_WRITE on any
      copy-on-write mapping when FOLL_GET (or FOLL_PIN) is used to get a page
      pointer as a result.
      
      The current semantics end up being:
      
       - __get_user_pages_fast(): no change. If you don't ask for a write,
         you won't break COW. You'd better know what you're doing.
      
       - get_user_pages_fast(): the fast-case "look it up in the page tables
         without anything getting mmap_sem" now refuses to follow a read-only
         page, since it might need COW breaking.  Which happens in the slow
         path - the fast path doesn't know if the memory might be COW or not.
      
       - get_user_pages() (including the slow-path fallback for gup_fast()):
         for a COW mapping, turn on FOLL_WRITE for FOLL_GET/FOLL_PIN, with
         very similar semantics to FOLL_FORCE.
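
The third rule in the list above (turn on FOLL_WRITE for FOLL_GET on a COW mapping) can be sketched as a pure function over flag words. This is a simplified illustration: the flag values are placeholders for this example, `adjust_gup_flags()` is a hypothetical wrapper, and FOLL_PIN is left out, as in the backport notes further down.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag values for this sketch only. */
#define VM_SHARED   0x08u
#define VM_MAYWRITE 0x20u
#define FOLL_WRITE  0x01u
#define FOLL_GET    0x04u

static_assert((VM_SHARED & VM_MAYWRITE) == 0, "flags must be distinct bits");

/* A COW mapping is private (not shared) but may become writable. */
static bool is_cow_mapping(unsigned int vm_flags)
{
    return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* Force a COW break only when a page reference is actually being taken. */
static bool should_force_cow_break(unsigned int vm_flags,
                                   unsigned int gup_flags)
{
    return is_cow_mapping(vm_flags) && (gup_flags & FOLL_GET) != 0;
}

/* Hypothetical wrapper: returns the flags the lookup would proceed with. */
static unsigned int adjust_gup_flags(unsigned int vm_flags,
                                     unsigned int gup_flags)
{
    if (should_force_cow_break(vm_flags, gup_flags))
        gup_flags |= FOLL_WRITE;
    return gup_flags;
}
```

Shared mappings and lookups that take no page reference pass through unchanged, which is the "some users simply don't even care" case.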
      
      If it turns out that we want finer granularity (ie "only break COW when
      it might actually matter" - things like the zero page are special and
      don't need to be broken) we might need to push these semantics deeper
      into the lookup fault path.  So if people care enough, it's possible
      that we might end up adding a new internal FOLL_BREAK_COW flag to go
      with the internal FOLL_COW flag we already have for tracking "I had a
      COW".
      
      Alternatively, if it turns out that different callers might want to
      explicitly control the forced COW break behavior, we might even want to
      make such a flag visible to the users of get_user_pages() instead of
      using the above default semantics.
      
      But for now, this is mostly commentary on the issue (this commit message
      being a lot bigger than the patch, and that patch in turn is almost all
      comments), with that minimal "enable COW breaking early" logic using the
      existing FOLL_WRITE behavior.
      
      [ It might be worth noting that we've always had this ambiguity, and it
        could arguably be seen as a user-space issue.
      
        You only get private COW mappings that could break either way in
        situations where user space is doing cooperative things (ie fork()
        before an execve() etc), but it _is_ surprising and very subtle, and
        fork() is supposed to give you independent address spaces.
      
        So let's treat this as a kernel issue and make the semantics of
        get_user_pages() easier to understand. Note that obviously a true
        shared mapping will still get a page that can change under us, so this
        does _not_ mean that get_user_pages() somehow returns any "stable"
        page ]
      
      [surenb: backport notes
      	Replaced (gup_flags | FOLL_WRITE) with write=1 in gup_pgd_range.
      	Removed FOLL_PIN usage in should_force_cow_break since it's missing in
      	the earlier kernels.]
      
      Reported-by: Jann Horn <jannh@google.com>
      Tested-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Kirill Shutemov <kirill@shutemov.name>
      Acked-by: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [surenb: backport to 4.19 kernel]
      Cc: stable@vger.kernel.org # 4.19.x
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 4.9:
       - Generic get_user_pages_fast() calls __get_user_pages_fast() here,
         so make it pass write=1
       - Various architectures have their own implementations of
         get_user_pages_fast(), so apply the corresponding change there
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "gup: document and work around "COW can break either way" issue" · 6fbb8383
      Ben Hutchings authored
      This reverts commit 9bbd42e7, which was commit 17839856 upstream.  The
      backport was incorrect and incomplete:
      
      * It forced the write flag on in the generic __get_user_pages_fast(),
        whereas only get_user_pages_fast() was supposed to do that.
      * It only fixed the generic RCU-based implementation used by arm,
        arm64, and powerpc.  Before Linux 4.13, several other architectures
        had their own implementations: mips, s390, sparc, sh, and x86.
      
      This will be followed by a (hopefully) correct backport.
      
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • lib82596: Fix IRQ check in sni_82596_probe · 31f96167
      Miaoqian Lin authored
      commit 99218cbf upstream.
      
      platform_get_irq() returns a negative error number, not 0, on failure.
      The documentation of platform_get_irq() provides a usage example:
      
          int irq = platform_get_irq(pdev, 0);
          if (irq < 0)
              return irq;
      
      Fix the check of return value to catch errors correctly.
      
      Fixes: 11597885 ("i825xx: Move the Intel 82586/82593/82596 based drivers")
      Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scripts/dtc: dtx_diff: remove broken example from help text · 0febebd3
      Matthias Schiffer authored
      commit d8adf5b9 upstream.
      
      dtx_diff suggests using <(...) syntax to pipe two inputs into it, but
      this has never worked: The /proc/self/fds/... paths passed by the shell
      will fail the `[ -f "${dtx}" ] && [ -r "${dtx}" ]` check in compile_to_dts,
      but even with this check removed, the function cannot work: hexdump will
      eat up the DTB magic, making the subsequent dtc call fail, as a pipe
      cannot be rewound.
      
      Simply remove this broken example, as there is already an alternative one
      that works fine.
      
      Fixes: 10eadc25 ("dtc: create tool to diff device trees")
      Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
      Reviewed-by: Frank Rowand <frank.rowand@sony.com>
      Signed-off-by: Rob Herring <robh@kernel.org>
      Link: https://lore.kernel.org/r/20220113081918.10387-1-matthias.schiffer@ew.tq-group.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bcmgenet: add WOL IRQ check · 5d486412
      Sergey Shtylyov authored
      commit 9deb48b5 upstream.
      
      The driver neglects to check the result of platform_get_irq_optional()'s
      call and blithely passes the negative error codes to devm_request_irq()
      (which takes *unsigned* IRQ #), causing it to fail with -EINVAL.
      Stop calling devm_request_irq() with the invalid IRQ #s.
      
      Fixes: 8562056f ("net: bcmgenet: request Wake-on-LAN interrupt")
      Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net_sched: restore "mpu xxx" handling · 019f0458
      Kevin Bracey authored
      commit fb80445c upstream.
      
      commit 56b765b7 ("htb: improved accuracy at high rates") broke
      "overhead X", "linklayer atm" and "mpu X" attributes.
      
      "overhead X" and "linklayer atm" have already been fixed. This restores
      the "mpu X" handling, as might be used by DOCSIS or Ethernet shaping:
      
          tc class add ... htb rate X overhead 4 mpu 64
      
      The code being fixed is used by htb, tbf and act_police. Cake has its
      own mpu handling. qdisc_calculate_pkt_len still uses the size table
      containing values adjusted for mpu by user space.
      
      iproute2 tc has always passed mpu into the kernel via a tc_ratespec
      structure, but the kernel never directly acted on it, merely stored it
      so that it could be read back by `tc class show`.
      
      Rather, tc would generate length-to-time tables that included the mpu
      (and linklayer) in their construction, and the kernel used those tables.
      
      Since v3.7, the tables were no longer used. Along with "mpu", this also
      broke "overhead" and "linklayer" which were fixed in 01cb71d2
      ("net_sched: restore "overhead xxx" handling", v3.10) and 8a8e3d84
      ("net_sched: restore "linklayer atm" handling", v3.11).
      
      "overhead" was fixed by simply restoring use of tc_ratespec::overhead -
      this had originally been used by the kernel but was initially omitted
      from the new non-table-based calculations.
      
      "linklayer" had been handled in the table like "mpu", but the mode was
      not originally passed in tc_ratespec. The new implementation was made to
      handle it by getting new versions of tc to pass the mode in an extended
      tc_ratespec, and for older versions of tc the table contents were analysed
      at load time to deduce linklayer.
      
      As "mpu" has always been given to the kernel in tc_ratespec,
      accompanying the mpu-based table, we can restore system functionality
      with no userspace change by making the kernel act on the tc_ratespec
      value.
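
How a minimum packet unit could enter a rate calculation that no longer uses tables can be sketched as follows. This is an assumption-laden illustration, not the kernel implementation: `struct rate_cfg`, `effective_len()` and `len_to_ns()` are hypothetical names, and applying the mpu after adding the overhead is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the rate parameters handed over by user space. */
struct rate_cfg {
    uint64_t rate_bytes_ps; /* shaping rate in bytes per second */
    uint16_t overhead;      /* per-packet overhead in bytes */
    uint16_t mpu;           /* minimum packet unit in bytes */
};

/* Length charged for a packet: add overhead, then round up to the mpu. */
static uint32_t effective_len(const struct rate_cfg *r, uint32_t len)
{
    len += r->overhead;
    if (len < r->mpu)
        len = r->mpu;
    return len;
}

/* Transmission time in nanoseconds for the charged length. */
static uint64_t len_to_ns(const struct rate_cfg *r, uint32_t len)
{
    assert(r->rate_bytes_ps != 0);
    return (uint64_t)effective_len(r, len) * 1000000000ULL / r->rate_bytes_ps;
}
```

With "overhead 4 mpu 64", as in the tc example above, a 40-byte packet is charged as 64 bytes while a 100-byte packet is charged as 104 bytes.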
      
      Fixes: 56b765b7 ("htb: improved accuracy at high rates")
      Signed-off-by: Kevin Bracey <kevin@bracey.fi>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Vimalkumar <j.vimal@gmail.com>
      Link: https://lore.kernel.org/r/20220112170210.1014351-1-kevin@bracey.fi
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dmaengine: at_xdmac: Fix at_xdmac_lld struct definition · 94ca32fe
      Tudor Ambarus authored
      commit 912f7c6f upstream.
      
      The hardware channel next descriptor view structure contains just
      fields of 32 bits, while dma_addr_t can be of type u64 or u32
      depending on CONFIG_ARCH_DMA_ADDR_T_64BIT. Force u32 to comply with
      what the hardware expects.
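The layout problem can be illustrated with two stand-in structures. This is a trimmed sketch: the struct names are hypothetical and only four fields are shown, so it does not reproduce the driver's full descriptor.

```c
#include <stddef.h>
#include <stdint.h>

/* What the layout becomes if address fields follow a 64-bit dma_addr_t. */
struct lld_wide {
    uint64_t mbr_nda;   /* next descriptor address */
    uint32_t mbr_ubc;   /* microblock control */
    uint64_t mbr_sa;    /* source address */
    uint64_t mbr_da;    /* destination address */
};

/* What the hardware reads: every field is exactly 32 bits wide. */
struct lld_hw {
    uint32_t mbr_nda;
    uint32_t mbr_ubc;
    uint32_t mbr_sa;
    uint32_t mbr_da;
};

/* The 32-bit layout is packed back to back with no padding. */
_Static_assert(sizeof(struct lld_hw) == 16, "four 32-bit words");
_Static_assert(offsetof(struct lld_hw, mbr_sa) == 8, "third word");
```

With 64-bit fields the offsets shift, so the controller would read the wrong words when it walks the descriptor.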
      
      Fixes: e1f7c9ee ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver")
      Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
      Link: https://lore.kernel.org/r/20211215110115.191749-11-tudor.ambarus@microchip.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      94ca32fe
    • Tudor Ambarus's avatar
      dmaengine: at_xdmac: Fix lld view setting · bc882453
      Tudor Ambarus authored
      commit 1385eb4d upstream.
      
      AT_XDMAC_CNDC_NDVIEW_NDV3 was set even for AT_XDMAC_MBR_UBC_NDV2,
      because of the wrong bit handling. Fix it.
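The kind of bit-handling mistake described here can be sketched as follows. The macro names and the two-bit field at bit 27 are assumptions made for illustration; they are not copied from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: a two-bit "view" field. Values are illustrative. */
#define NDV_SHIFT 27
#define NDV_MASK  (UINT32_C(0x3) << NDV_SHIFT)
#define NDV2      (UINT32_C(0x2) << NDV_SHIFT)
#define NDV3      (UINT32_C(0x3) << NDV_SHIFT)

/* Buggy test: any bit shared with NDV3 matches, so NDV2 matches too. */
static bool is_view3_buggy(uint32_t ubc)
{
    return (ubc & NDV3) != 0;
}

/* Fixed test: isolate the whole field, then compare it for equality. */
static bool is_view3_fixed(uint32_t ubc)
{
    return (ubc & NDV_MASK) == NDV3;
}
```

Because the view is a multi-bit field rather than a flag, a plain AND cannot tell view 2 from view 3.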
      
      Fixes: ee0fe35c ("dmaengine: xdmac: Handle descriptor's view 3 registers")
      Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
      Link: https://lore.kernel.org/r/20211215110115.191749-10-tudor.ambarus@microchip.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bc882453
    • Tudor Ambarus's avatar
      dmaengine: at_xdmac: Print debug message after realeasing the lock · 3e279f7a
      Tudor Ambarus authored
      commit 5edc24ac upstream.
      
      It is desirable to do the prints without the lock held if possible, so
      move the print after the lock is released.
      
      Fixes: e1f7c9ee ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver")
      Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
      Link: https://lore.kernel.org/r/20211215110115.191749-4-tudor.ambarus@microchip.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3e279f7a
    • Tudor Ambarus's avatar
      dmaengine: at_xdmac: Don't start transactions at tx_submit level · d7ab44f5
      Tudor Ambarus authored
      commit bccfb96b upstream.
      
      tx_submit is supposed to push the current transaction descriptor to a
      pending queue, waiting for issue_pending() to be called. issue_pending()
      must start the transfer, not tx_submit(), thus remove
      at_xdmac_start_xfer() from at_xdmac_tx_submit(). Clients of at_xdmac that
      assume that tx_submit() starts the transfer must be updated to call
      dma_async_issue_pending() if they do not already call it (one example
      is atmel_serial).
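The split between queueing and starting can be modelled with a toy channel. This is a user-space sketch under stated assumptions: `struct toy_chan` and both functions are hypothetical and only mirror the contract described above, not the dmaengine API.

```c
#include <stdbool.h>

/* Hypothetical toy channel: counts queued descriptors and tracks activity. */
struct toy_chan {
    unsigned int queued;  /* descriptors waiting on the pending queue */
    bool running;         /* true once a transfer has been started */
};

/* Submit only queues the descriptor; it must not start the hardware. */
static unsigned int toy_tx_submit(struct toy_chan *c)
{
    return ++c->queued;
}

/* Issue-pending is the only place where a queued transfer is started. */
static void toy_issue_pending(struct toy_chan *c)
{
    if (!c->running && c->queued > 0) {
        c->queued--;
        c->running = true;
    }
}
```

A client that stops after the submit step leaves the descriptor queued indefinitely, which is why callers relying on the old behaviour need the extra call.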
      
      As the at_xdmac_start_xfer() is now called only from
      at_xdmac_advance_work() when !at_xdmac_chan_is_enabled(), the
      at_xdmac_chan_is_enabled() check is no longer needed in
      at_xdmac_start_xfer(), thus remove it.
      
      Fixes: e1f7c9ee ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver")
      Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
      Link: https://lore.kernel.org/r/20211215110115.191749-2-tudor.ambarus@microchip.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7ab44f5
    • Guillaume Nault's avatar
      libcxgb: Don't accidentally set RTO_ONLINK in cxgb_find_route() · 6272a314
      Guillaume Nault authored
      commit a915deaa upstream.
      
      Mask the ECN bits before calling ip_route_output_ports(). The tos
      variable might be passed directly from an IPv4 header, so it may have
      the last ECN bit set. This interferes with the route lookup process as
      ip_route_output_key_hash() interprets this bit specially (to restrict
      the route scope).
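The masking step can be sketched in isolation. The constants below use the values 0x03 for the ECN bits and 0x01 for the on-link flag; `route_tos()` is a hypothetical helper written for this illustration.

```c
#include <stdint.h>

#define ECN_MASK   0x03u  /* low two bits of the IPv4 TOS byte carry ECN */
#define RTO_ONLINK 0x01u  /* flag that shares the lowest TOS bit in lookups */

/* Strip the ECN bits so a header-derived TOS cannot set the on-link flag. */
static uint8_t route_tos(uint8_t hdr_tos)
{
    return (uint8_t)(hdr_tos & ~ECN_MASK);
}
```

A TOS byte of 0xB9 has its lowest bit set, so passing it through unmasked would look like a request for an on-link route. After masking it becomes 0xB8 and the flag bit is clear.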
      
      Found by code inspection, compile tested only.
      
      Fixes: 804c2f3e ("libcxgb,iw_cxgb4,cxgbit: add cxgb_find_route()")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6272a314
    • Eric Dumazet's avatar
      netns: add schedule point in ops_exit_list() · b2fd2514
      Eric Dumazet authored
      commit 2836615a upstream.
      
      Under stress, cleanup_net() may have to dismantle a large number
      of netns. ops_exit_list() currently calls many helpers [1] that
      have no schedule point, so we can end up with soft lockups,
      particularly on hosts with many cpus.
      
      Even for a moderate number of netns processed by cleanup_net(),
      this patch avoids latency spikes.
      
      [1] Some of these helpers like fib_sync_up() and fib_sync_down_dev()
      are very slow because net/ipv4/fib_semantics.c uses host-wide hash tables,
      and ifindex is used as the only input of two hash functions.
          ifindexes tend to be the same for all netns (lo.ifindex==1 per instance)
          This will be fixed in a separate patch.
      
      Fixes: 72ad937a ("net: Add support for batching network namespace cleanups")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2fd2514
    • Robert Hancock's avatar
      net: axienet: fix number of TX ring slots for available check · 81fb2351
      Robert Hancock authored
      commit aba57a82 upstream.
      
      The check for the number of available TX ring slots was off by 1 since a
      slot is required for the skb header as well as each fragment. This could
      result in overwriting a TX ring slot that was still in use.
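The corrected slot accounting can be sketched as below. Both helper names are hypothetical; only the "one slot for the header plus one per fragment" rule comes from the description above.

```c
#include <stdbool.h>

/* One slot for the skb header plus one slot per fragment. */
static unsigned int tx_slots_needed(unsigned int num_frag)
{
    return num_frag + 1;
}

/* True only if the ring can hold the whole packet. */
static bool tx_ring_has_room(unsigned int free_slots, unsigned int num_frag)
{
    return free_slots >= tx_slots_needed(num_frag);
}
```

With three fragments, a check against `num_frag` alone would accept three free slots even though four are written, which is the overwrite described above.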
      
      Fixes: 8a3b7a25 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
      Signed-off-by: Robert Hancock <robert.hancock@calian.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      81fb2351
    • Robert Hancock's avatar
      net: axienet: Wait for PhyRstCmplt after core reset · 9f1a3c13
      Robert Hancock authored
      commit b400c2f4 upstream.
      
      When resetting the device, wait for the PhyRstCmplt bit to be set
      in the interrupt status register before continuing initialization, to
      ensure that the core is actually ready. When using an external PHY, this
      also ensures we do not start trying to access the PHY while it is still
      in reset. The PHY reset is initiated by the core reset which is
      triggered just above, but remains asserted for 5ms after the core is
      reset according to the documentation.
      
      The MgtRdy bit could also be waited for, but unfortunately when using
      7-series devices, the bit does not appear to work as documented (it
      seems to behave as some sort of link state indication and not just an
      indication the transceiver is ready) so it can't really be relied on for
      this purpose.
      
      Fixes: 8a3b7a25 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
      Signed-off-by: Robert Hancock <robert.hancock@calian.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f1a3c13
    • Eric Dumazet's avatar
      af_unix: annote lockless accesses to unix_tot_inflight & gc_in_progress · f87f80d8
      Eric Dumazet authored
      commit 9d6d7f1c upstream.
      
      wait_for_unix_gc() reads unix_tot_inflight & gc_in_progress
      without synchronization.
      
      Add READ_ONCE()/WRITE_ONCE() and their associated comments
      to better document the intent.
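The annotation pattern can be approximated in user space with volatile accesses. This is a sketch, not the kernel's READ_ONCE()/WRITE_ONCE() implementation: the `_U32` macros, the counter and both helpers are hypothetical, and the sketch omits the lock that serialises writers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space approximations: force one load or one store. */
#define READ_ONCE_U32(x)     (*(const volatile uint32_t *)&(x))
#define WRITE_ONCE_U32(x, v) (*(volatile uint32_t *)&(x) = (v))

static uint32_t tot_inflight;  /* stand-in for the shared counter */

/* Writer side: assumed to be serialised by a lock in the real code. */
static void inflight_inc(void)
{
    WRITE_ONCE_U32(tot_inflight, tot_inflight + 1);
}

/* Lockless reader: the marked load documents that a race is expected. */
static bool over_limit(uint32_t limit)
{
    return READ_ONCE_U32(tot_inflight) > limit;
}
```

The marked accesses do not add ordering. They state that the racy read is intentional and prevent the compiler from tearing or repeating the access.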
      
      BUG: KCSAN: data-race in unix_inflight / wait_for_unix_gc
      
      write to 0xffffffff86e2b7c0 of 4 bytes by task 9380 on cpu 0:
       unix_inflight+0x1e8/0x260 net/unix/scm.c:63
       unix_attach_fds+0x10c/0x1e0 net/unix/scm.c:121
       unix_scm_to_skb net/unix/af_unix.c:1674 [inline]
       unix_dgram_sendmsg+0x679/0x16b0 net/unix/af_unix.c:1817
       unix_seqpacket_sendmsg+0xcc/0x110 net/unix/af_unix.c:2258
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2409
       ___sys_sendmsg net/socket.c:2463 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2549
       __do_sys_sendmmsg net/socket.c:2578 [inline]
       __se_sys_sendmmsg net/socket.c:2575 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2575
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffffffff86e2b7c0 of 4 bytes by task 9375 on cpu 1:
       wait_for_unix_gc+0x24/0x160 net/unix/garbage.c:196
       unix_dgram_sendmsg+0x8e/0x16b0 net/unix/af_unix.c:1772
       unix_seqpacket_sendmsg+0xcc/0x110 net/unix/af_unix.c:2258
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2409
       ___sys_sendmsg net/socket.c:2463 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2549
       __do_sys_sendmmsg net/socket.c:2578 [inline]
       __se_sys_sendmmsg net/socket.c:2575 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2575
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00000002 -> 0x00000004
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 9375 Comm: syz-executor.1 Not tainted 5.16.0-rc7-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 9915672d ("af_unix: limit unix_tot_inflight")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220114164328.2038499-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f87f80d8
    • Miaoqian Lin's avatar
      parisc: pdc_stable: Fix memory leak in pdcs_register_pathentries · 46e96071
      Miaoqian Lin authored
      commit d24846a4 upstream.
      
      kobject_init_and_add() takes a reference even when it fails.
      According to the doc of kobject_init_and_add():
      
         If this function returns an error, kobject_put() must be called to
         properly clean up the memory associated with the object.
      
      Fix memory leak by calling kobject_put().
      
      Fixes: 73f368cf ("Kobject: change drivers/parisc/pdc_stable.c to use kobject_init_and_add")
      Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      46e96071
    • Tobias Waldekranz's avatar
      net/fsl: xgmac_mdio: Fix incorrect iounmap when removing module · 738f88c9
      Tobias Waldekranz authored
      commit 3f7c239c upstream.
      
      As reported by sparse: In the remove path, the driver would attempt to
      unmap its own priv pointer - instead of the io memory that it mapped
      in probe.
      
      Fixes: 9f35a734 ("net/fsl: introduce Freescale 10G MDIO driver")
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      738f88c9
    • Tobias Waldekranz's avatar
      powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses · 68a03b3d
      Tobias Waldekranz authored
      commit 0d375d61 upstream.
      
      This block is used in (at least) T1024 and T1040, including their
      variants like T1023 etc.
      
      Fixes: d55ad296 ("powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA FMan")
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      68a03b3d
    • Chengguang Xu's avatar
      RDMA/rxe: Fix a typo in opcode name · a046a42f
      Chengguang Xu authored
      commit 8d1cfb88 upstream.
      
      There is a redundant ']' in the name of opcode IB_OPCODE_RC_SEND_MIDDLE,
      so just fix it.
      
      Fixes: 8700e3e7 ("Soft RoCE driver")
      Link: https://lore.kernel.org/r/20211218112320.3558770-1-cgxu519@mykernel.net
      Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
      Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
      Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a046a42f
    • Yixing Liu's avatar
      RDMA/hns: Modify the mapping attribute of doorbell to device · 902650b0
      Yixing Liu authored
      commit 39d5534b upstream.
      
      It is more general for ARM device drivers to use the device attribute to
      map PCI BAR spaces.
      
      Fixes: 9a443537 ("IB/hns: Add driver files for hns RoCE driver")
      Link: https://lore.kernel.org/r/20211206133652.27476-1-liangwenpeng@huawei.com
      Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
      Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      902650b0
    • Christian König's avatar
      drm/radeon: fix error handling in radeon_driver_open_kms · 47ed5eed
      Christian König authored
      commit 4722f463 upstream.
      
      The return value was never initialized, so the cleanup code could
      execute even when it was not necessary.
      
      Just add proper error handling.
      
      Fixes: ab50cb9d ("drm/radeon/radeon_kms: Fix a NULL pointer dereference in radeon_driver_open_kms()")
      Signed-off-by: Christian König <christian.koenig@amd.com>
      Tested-by: Jan Stancek <jstancek@redhat.com>
      Tested-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      47ed5eed
    • Amir Goldstein's avatar
      fuse: fix live lock in fuse_iget() · fde32bbe
      Amir Goldstein authored
      commit 775c5033 upstream.
      
      Commit 5d069dbe ("fuse: fix bad inode") replaced make_bad_inode()
      in fuse_iget() with a private implementation fuse_make_bad().
      
      The private implementation fails to remove the bad inode from inode
      cache, so the retry loop with iget5_locked() finds the same bad inode
      and marks it bad forever.
      
      kmsg snip:
      
      [ ] rcu: INFO: rcu_sched self-detected stall on CPU
      ...
      [ ]  ? bit_wait_io+0x50/0x50
      [ ]  ? fuse_init_file_inode+0x70/0x70
      [ ]  ? find_inode.isra.32+0x60/0xb0
      [ ]  ? fuse_init_file_inode+0x70/0x70
      [ ]  ilookup5_nowait+0x65/0x90
      [ ]  ? fuse_init_file_inode+0x70/0x70
      [ ]  ilookup5.part.36+0x2e/0x80
      [ ]  ? fuse_init_file_inode+0x70/0x70
      [ ]  ? fuse_inode_eq+0x20/0x20
      [ ]  iget5_locked+0x21/0x80
      [ ]  ? fuse_inode_eq+0x20/0x20
      [ ]  fuse_iget+0x96/0x1b0
      
      Fixes: 5d069dbe ("fuse: fix bad inode")
      Cc: stable@vger.kernel.org # 5.10+
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fde32bbe
    • Miklos Szeredi's avatar
      fuse: fix bad inode · 3a2f8823
      Miklos Szeredi authored
      commit 5d069dbe upstream.
      
      Jan Kara's analysis of the syzbot report (edited):
      
        The reproducer opens a directory on FUSE filesystem, it then attaches
        dnotify mark to the open directory.  After that a fuse_do_getattr() call
        finds that attributes returned by the server are inconsistent, and calls
        make_bad_inode() which, among other things does:
      
                inode->i_mode = S_IFREG;
      
        This then confuses dnotify which doesn't tear down its structures
        properly and eventually crashes.
      
      Avoid calling make_bad_inode() on a live inode: switch to a private flag on
      the fuse inode.  Also add the test to ops which the bad_inode_ops would
      have caught.
      
      This bug goes back to the initial merge of fuse in 2.6.14...
      
      Reported-by: <syzbot+f427adf9324b92652ccc@syzkaller.appspotmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Tested-by: Jan Kara <jack@suse.cz>
      Cc: <stable@vger.kernel.org>
      [bwh: Backported to 4.9:
       - Drop changes in fuse_dir_fsync(), fuse_readahead(), fuse_evict_inode()
       - In fuse_get_link(), return ERR_PTR(-EIO) for bad inodes
       - Convert some additional calls to is_bad_inode()
       - Adjust filename, context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a2f8823
    • Theodore Ts'o's avatar
      ext4: don't use the orphan list when migrating an inode · ab5edcdd
      Theodore Ts'o authored
      commit 6eeaf88f upstream.
      
      We probably want to remove the indirect block to extents migration
      feature after a deprecation window, but until then, let's fix a
      potential data loss problem caused by the fact that we put the
      tmp_inode on the orphan list.  In the unlikely case where we crash and
      do a journal recovery, the data blocks belonging to the inode being
      migrated are also represented in the tmp_inode on the orphan list ---
      and so its data blocks will get marked unallocated, and available for
      reuse.
      
      Instead, stop putting the tmp_inode on the orphan list.  So in the
      case where we crash while migrating the inode, we'll leak an inode,
      which is not a disaster.  It will be easily fixed the next time we run
      fsck, and it's better than potentially having blocks getting claimed
      by two different files, and losing data as a result.
      
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Lukas Czerner <lczerner@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab5edcdd
    • Ye Bin's avatar
      ext4: Fix BUG_ON in ext4_bread when write quota data · 581658a6
      Ye Bin authored
      commit 380a0091 upstream.
      
      We hit the following issue when running syzkaller:
      [  167.936972] EXT4-fs error (device loop0): __ext4_remount:6314: comm rep: Abort forced by user
      [  167.938306] EXT4-fs (loop0): Remounting filesystem read-only
      [  167.981637] Assertion failure in ext4_getblk() at fs/ext4/inode.c:847: '(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) || handle != NULL || create == 0'
      [  167.983601] ------------[ cut here ]------------
      [  167.984245] kernel BUG at fs/ext4/inode.c:847!
      [  167.984882] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
      [  167.985624] CPU: 7 PID: 2290 Comm: rep Tainted: G    B             5.16.0-rc5-next-20211217+ #123
      [  167.986823] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
      [  167.988590] RIP: 0010:ext4_getblk+0x17e/0x504
      [  167.989189] Code: c6 01 74 28 49 c7 c0 a0 a3 5c 9b b9 4f 03 00 00 48 c7 c2 80 9c 5c 9b 48 c7 c6 40 b6 5c 9b 48 c7 c7 20 a4 5c 9b e8 77 e3 fd ff <0f> 0b 8b 04 244
      [  167.991679] RSP: 0018:ffff8881736f7398 EFLAGS: 00010282
      [  167.992385] RAX: 0000000000000094 RBX: 1ffff1102e6dee75 RCX: 0000000000000000
      [  167.993337] RDX: 0000000000000001 RSI: ffffffff9b6e29e0 RDI: ffffed102e6dee66
      [  167.994292] RBP: ffff88816a076210 R08: 0000000000000094 R09: ffffed107363fa09
      [  167.995252] R10: ffff88839b1fd047 R11: ffffed107363fa08 R12: ffff88816a0761e8
      [  167.996205] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000001
      [  167.997158] FS:  00007f6a1428c740(0000) GS:ffff88839b000000(0000) knlGS:0000000000000000
      [  167.998238] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  167.999025] CR2: 00007f6a140716c8 CR3: 0000000133216000 CR4: 00000000000006e0
      [  167.999987] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  168.000944] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  168.001899] Call Trace:
      [  168.002235]  <TASK>
      [  168.007167]  ext4_bread+0xd/0x53
      [  168.007612]  ext4_quota_write+0x20c/0x5c0
      [  168.010457]  write_blk+0x100/0x220
      [  168.010944]  remove_free_dqentry+0x1c6/0x440
      [  168.011525]  free_dqentry.isra.0+0x565/0x830
      [  168.012133]  remove_tree+0x318/0x6d0
      [  168.014744]  remove_tree+0x1eb/0x6d0
      [  168.017346]  remove_tree+0x1eb/0x6d0
      [  168.019969]  remove_tree+0x1eb/0x6d0
      [  168.022128]  qtree_release_dquot+0x291/0x340
      [  168.023297]  v2_release_dquot+0xce/0x120
      [  168.023847]  dquot_release+0x197/0x3e0
      [  168.024358]  ext4_release_dquot+0x22a/0x2d0
      [  168.024932]  dqput.part.0+0x1c9/0x900
      [  168.025430]  __dquot_drop+0x120/0x190
      [  168.025942]  ext4_clear_inode+0x86/0x220
      [  168.026472]  ext4_evict_inode+0x9e8/0xa22
      [  168.028200]  evict+0x29e/0x4f0
      [  168.028625]  dispose_list+0x102/0x1f0
      [  168.029148]  evict_inodes+0x2c1/0x3e0
      [  168.030188]  generic_shutdown_super+0xa4/0x3b0
      [  168.030817]  kill_block_super+0x95/0xd0
      [  168.031360]  deactivate_locked_super+0x85/0xd0
      [  168.031977]  cleanup_mnt+0x2bc/0x480
      [  168.033062]  task_work_run+0xd1/0x170
      [  168.033565]  do_exit+0xa4f/0x2b50
      [  168.037155]  do_group_exit+0xef/0x2d0
      [  168.037666]  __x64_sys_exit_group+0x3a/0x50
      [  168.038237]  do_syscall_64+0x3b/0x90
      [  168.038751]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      In order to reproduce this problem, the following conditions need to be met:
      1. Ext4 filesystem with no journal;
      2. Filesystem image with incorrect quota data;
      3. Abort filesystem forced by user;
      4. umount filesystem;
      
      As in ext4_quota_write:
      ...
               if (EXT4_SB(sb)->s_journal && !handle) {
                       ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
                               " cancelled because transaction is not started",
                               (unsigned long long)off, (unsigned long long)len);
                       return -EIO;
               }
      ...
      The handle is checked for NULL only when the filesystem has a journal.
      It needs to be checked even when the filesystem has no journal.
      
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20211223015506.297766-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      581658a6
    • Luís Henriques's avatar
      ext4: set csum seed in tmp inode while migrating to extents · 0626ef10
      Luís Henriques authored
      commit e81c9302 upstream.
      
      When migrating to extents, the temporary inode will have its own checksum
      seed.  This means that, when swapping the inodes' data, the inode
      checksums will be incorrect.
      
      This can be fixed by recalculating the extent checksums, or simply by
      copying the seed into the temporary inode.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=213357
      Reported-by: Jeroen van Wolffelaar <jeroen@wolffelaar.nl>
      Signed-off-by: Luís Henriques <lhenriques@suse.de>
      Link: https://lore.kernel.org/r/20211214175058.19511-1-lhenriques@suse.de
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0626ef10
    • Ilan Peer's avatar
      iwlwifi: mvm: Increase the scan timeout guard to 30 seconds · e0b2eba8
      Ilan Peer authored
      commit ced50f11 upstream.
      
      With the introduction of 6GHz channels the scan guard timeout should
      be adjusted to account for the following extreme case:
      
      - All 6GHz channels are scanned passively: 58 channels.
      - The scan is fragmented with the following parameters: 3 fragments,
        95 TUs suspend time, 44 TUs maximal out of channel time.
      
      The above would result in a scan time of more than 24 seconds. Thus,
      set the timeout to 30 seconds.
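One way to arrive at a figure above 24 seconds from these parameters is sketched below. The commit does not spell out its formula, so the product used here (channels × fragments × (suspend + out-of-channel time)) is an assumption, and both helper names are hypothetical. A TU is 1024 microseconds.

```c
#include <stdint.h>

#define TU_USEC 1024u  /* one 802.11 time unit in microseconds */

/* Assumed worst case: every fragment of every channel costs a full
 * suspend period plus the maximal out-of-channel time. */
static uint32_t worst_case_scan_tus(uint32_t channels, uint32_t fragments,
                                    uint32_t suspend_tus,
                                    uint32_t out_of_channel_tus)
{
    return channels * fragments * (suspend_tus + out_of_channel_tus);
}

/* Convert time units to microseconds without overflowing 32 bits. */
static uint64_t tus_to_usec(uint32_t tus)
{
    return (uint64_t)tus * TU_USEC;
}
```

For 58 channels, 3 fragments, 95 TUs and 44 TUs this gives 24186 TUs, or about 24.77 seconds, which fits under a 30-second guard.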
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Ilan Peer <ilan.peer@intel.com>
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Link: https://lore.kernel.org/r/iwlwifi.20211210090244.3c851b93aef5.I346fa2e1d79220a6770496e773c6f87a2ad9e6c4@changeid
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e0b2eba8
    • Petr Cvachoucek's avatar
      ubifs: Error path in ubifs_remount_rw() seems to wrongly free write buffers · ab76022f
      Petr Cvachoucek authored
      commit 3fea4d9d upstream.
      
      It seems that freeing the write buffers in the error path of
      ubifs_remount_rw() is wrong. It later leads to a kernel oops like this:
      
      [10016.431274] UBIFS (ubi0:0): start fixing up free space
      [10090.810042] UBIFS (ubi0:0): free space fixup complete
      [10090.814623] UBIFS error (ubi0:0 pid 512): ubifs_remount_fs: cannot
      spawn "ubifs_bgt0_0", error -4
      [10101.915108] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started,
      PID 517
      [10105.275498] Unable to handle kernel NULL pointer dereference at
      virtual address 0000000000000030
      [10105.284352] Mem abort info:
      [10105.287160]   ESR = 0x96000006
      [10105.290252]   EC = 0x25: DABT (current EL), IL = 32 bits
      [10105.295592]   SET = 0, FnV = 0
      [10105.298652]   EA = 0, S1PTW = 0
      [10105.301848] Data abort info:
      [10105.304723]   ISV = 0, ISS = 0x00000006
      [10105.308573]   CM = 0, WnR = 0
      [10105.311564] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000f03d1000
      [10105.318034] [0000000000000030] pgd=00000000f6cee003,
      pud=00000000f4884003, pmd=0000000000000000
      [10105.326783] Internal error: Oops: 96000006 [#1] PREEMPT SMP
      [10105.332355] Modules linked in: ath10k_pci ath10k_core ath mac80211
      libarc4 cfg80211 nvme nvme_core cryptodev(O)
      [10105.342468] CPU: 3 PID: 518 Comm: touch Tainted: G           O
      5.4.3 #1
      [10105.349517] Hardware name: HYPEX CPU (DT)
      [10105.353525] pstate: 40000005 (nZcv daif -PAN -UAO)
      [10105.358324] pc : atomic64_try_cmpxchg_acquire.constprop.22+0x8/0x34
      [10105.364596] lr : mutex_lock+0x1c/0x34
      [10105.368253] sp : ffff000075633aa0
      [10105.371563] x29: ffff000075633aa0 x28: 0000000000000001
      [10105.376874] x27: ffff000076fa80c8 x26: 0000000000000004
      [10105.382185] x25: 0000000000000030 x24: 0000000000000000
      [10105.387495] x23: 0000000000000000 x22: 0000000000000038
      [10105.392807] x21: 000000000000000c x20: ffff000076fa80c8
      [10105.398119] x19: ffff000076fa8000 x18: 0000000000000000
      [10105.403429] x17: 0000000000000000 x16: 0000000000000000
      [10105.408741] x15: 0000000000000000 x14: fefefefefefefeff
      [10105.414052] x13: 0000000000000000 x12: 0000000000000fe0
      [10105.419364] x11: 0000000000000fe0 x10: ffff000076709020
      [10105.424675] x9 : 0000000000000000 x8 : 00000000000000a0
      [10105.429986] x7 : ffff000076fa80f4 x6 : 0000000000000030
      [10105.435297] x5 : 0000000000000000 x4 : 0000000000000000
      [10105.440609] x3 : 0000000000000000 x2 : ffff00006f276040
      [10105.445920] x1 : ffff000075633ab8 x0 : 0000000000000030
      [10105.451232] Call trace:
      [10105.453676]  atomic64_try_cmpxchg_acquire.constprop.22+0x8/0x34
      [10105.459600]  ubifs_garbage_collect+0xb4/0x334
      [10105.463956]  ubifs_budget_space+0x398/0x458
      [10105.468139]  ubifs_create+0x50/0x180
      [10105.471712]  path_openat+0x6a0/0x9b0
      [10105.475284]  do_filp_open+0x34/0x7c
      [10105.478771]  do_sys_open+0x78/0xe4
      [10105.482170]  __arm64_sys_openat+0x1c/0x24
      [10105.486180]  el0_svc_handler+0x84/0xc8
      [10105.489928]  el0_svc+0x8/0xc
      [10105.492808] Code: 52800013 17fffffb d2800003 f9800011 (c85ffc05)
      [10105.498903] ---[ end trace 46b721d93267a586 ]---
      
      To reproduce the problem:
      
      1. Filesystem initially mounted read-only, free space fixup flag set.
      
      2. mount -o remount,rw <mountpoint>
      
      3. it takes some time (free space fixup running)
          ... try to terminate running mount by CTRL-C
          ... does not respond, only after free space fixup is complete
          ... then "ubifs_remount_fs: cannot spawn "ubifs_bgt0_0", error -4"
      
      4. mount -o remount,rw <mountpoint>
          ... now finished instantly (fixup already done).
      
      5. Create file or just unmount the filesystem and we get the oops.
      
      Cc: <stable@vger.kernel.org>
      Fixes: b50b9f40 ("UBIFS: do not free write-buffers when in R/O mode")
      Signed-off-by: Petr Cvachoucek <cvachoucek@gmail.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab76022f
    • Yauhen Kharuzhy's avatar
      power: bq25890: Enable continuous conversion for ADC at charging · 032e2829
      Yauhen Kharuzhy authored
      [ Upstream commit 80211be1 ]
      
      Instead of a one-shot ADC run at the beginning of charging, run
      continuous conversion to ensure that all charging-related values are
      monitored properly (input voltage, input current, temperature, etc.).
      
      Signed-off-by: Yauhen Kharuzhy <jekhor@gmail.com>
      Reviewed-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      032e2829