  1. Jul 16, 2020
    • ocfs2: fix value of OCFS2_INVALID_SLOT · 6afad432
      Junxiao Bi authored
      commit 9277f833 upstream.
      
      In the ocfs2 disk layout, the slot number is 16 bits, but in the ocfs2
      implementation it is 32 bits.  Usually this causes no issue, because the
      slot number is simply converted from u16 to u32.  However,
      OCFS2_INVALID_SLOT was defined as -1, so an invalid slot number read
      from disk had the value (u16)-1, and converting it to u32 did not yield
      a value equal to OCFS2_INVALID_SLOT.  As a result, the following check
      in get_local_system_inode() was always skipped:
      
       static struct inode **get_local_system_inode(struct ocfs2_super *osb,
                                                     int type,
                                                     u32 slot)
       {
       	BUG_ON(slot == OCFS2_INVALID_SLOT);
      	...
       }
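
The width mismatch described above can be reproduced in user space. This is a minimal sketch, not the kernel code: the macro and function names are hypothetical, and the 16-bit sentinel is an assumption about how a fixed definition could look, since the commit text does not show the new value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the kernel's definitions. */
#define INVALID_SLOT_AS_MINUS_ONE (-1)            /* compares as 0xffffffff against a u32 */
#define INVALID_SLOT_AS_U16       ((uint16_t)-1)  /* 0xffff, same as the on-disk sentinel */

/* Widen a 16-bit on-disk slot number to the 32-bit in-memory type. */
static uint32_t slot_from_disk(uint16_t disk_slot)
{
    uint32_t slot = disk_slot;   /* zero-extended: 0xffff becomes 0x0000ffff */
    assert(slot <= UINT16_MAX);  /* widening can never set the upper 16 bits */
    return slot;
}

/* Check against a sentinel defined as -1 (the situation the commit describes). */
static bool matches_minus_one(uint32_t slot)
{
    return slot == (uint32_t)INVALID_SLOT_AS_MINUS_ONE;
}

/* Check against a sentinel that has the same width as the on-disk field. */
static bool matches_u16_sentinel(uint32_t slot)
{
    return slot == INVALID_SLOT_AS_U16;
}
```

With an on-disk value of 0xffff, `matches_minus_one()` is false while `matches_u16_sentinel()` is true, which is why a check against -1 can never fire for a slot number that came from disk.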
      
      Link: http://lkml.kernel.org/r/20200616183829.87211-5-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • ocfs2: load global_inode_alloc · 178275dd
      Junxiao Bi authored
      commit 7569d3c7 upstream.
      
      Set global_inode_alloc as OCFS2_FIRST_ONLINE_SYSTEM_INODE so that it is
      loaded during mount.  It can then be used to test whether some
      global/system inodes are valid.  One use case is nfsd testing whether
      the root inode is valid.
      
      Link: http://lkml.kernel.org/r/20200616183829.87211-3-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • ocfs2: avoid inode removal while nfsd is accessing it · 967b7e40
      Junxiao Bi authored
      commit 4cd9973f upstream.
      
      Patch series "ocfs2: fix nfsd over ocfs2 issues", v2.
      
      This is a series of patches to fix issues with nfsd over ocfs2.  Patch 1
      avoids an inode being removed while nfsd is accessing it; patches 2 & 3
      fix a panic issue.
      
      This patch (of 4):
      
      When nfsd gets a file dentry from a handle, or the parent dentry of some
      dentry, a cluster lock is used to keep the inode from being removed by
      another node.  The inode could still be removed on the local node, so
      use a rw lock to prevent that.
      
      Link: http://lkml.kernel.org/r/20200616183829.87211-1-junxiao.bi@oracle.com
      Link: http://lkml.kernel.org/r/20200616183829.87211-2-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • mm/slab: use memzero_explicit() in kzfree() · 539cafc4
      Waiman Long authored
      commit 8982ae52 upstream.
      
      The kzfree() function is normally used to clear some sensitive
      information, like encryption keys, in the buffer before freeing it back to
      the pool.  Memset() is currently used for buffer clearing.  However
      unlikely, there is still a non-zero probability that the compiler may
      choose to optimize away the memory clearing especially if LTO is being
      used in the future.
      
      To make sure that this optimization will never happen,
      memzero_explicit(), which was introduced in v3.18, is now used in
      kzfree() to future-proof it.
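
As a rough user-space illustration of the idea, the sketch below follows a plain memset() with a compiler barrier so the stores cannot be treated as dead. The function names are hypothetical and the empty asm statement is a GCC/Clang extension; this is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Zero a buffer and then emit a compiler barrier. The empty asm takes the
 * buffer pointer as an input and clobbers memory, so the compiler has to
 * assume the zeroed bytes may be observed and keeps the memset().
 */
static void zero_explicit(void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len == 0)
        return;
    memset(buf, 0, len);
    __asm__ __volatile__("" : : "r"(buf) : "memory");
}

/* Hypothetical helper: report whether every byte of buf is zero. */
static int all_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```

A caller holding key material would invoke `zero_explicit(key, sizeof(key))` immediately before releasing the buffer.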
      
      Link: http://lkml.kernel.org/r/20200616154311.12314-2-longman@redhat.com
      Fixes: 3ef0e5ba ("slab: introduce kzfree()")
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • btrfs: fix failure of RWF_NOWAIT write into prealloc extent beyond eof · 109e0b0d
      Filipe Manana authored
      commit 4b194628 upstream.
      
      If we attempt to write to a prealloc extent located after eof using a
      RWF_NOWAIT write, we always fail with -EAGAIN.
      
      We do actually check if we have an allocated extent for the write at
      the start of btrfs_file_write_iter() through a call to check_can_nocow(),
      but later when we go into the actual direct IO write path we simply
      return -EAGAIN if the write starts at or beyond EOF.
      
      Trivial to reproduce:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ touch /mnt/foo
        $ chattr +C /mnt/foo
      
        $ xfs_io -d -c "pwrite -S 0xab 0 64K" /mnt/foo
        wrote 65536/65536 bytes at offset 0
        64 KiB, 16 ops; 0.0004 sec (135.575 MiB/sec and 34707.1584 ops/sec)
      
        $ xfs_io -c "falloc -k 64K 1M" /mnt/foo
      
        $ xfs_io -d -c "pwrite -N -V 1 -S 0xfe -b 64K 64K 64K" /mnt/foo
        pwrite: Resource temporarily unavailable
      
      On xfs and ext4 the write succeeds, as expected.
      
      Fix this by removing the wrong check at btrfs_direct_IO().
      
      Fixes: edf064e7 ("btrfs: nowait aio support")
      CC: stable@vger.kernel.org # 4.14+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • btrfs: fix data block group relocation failure due to concurrent scrub · b39e286f
      Filipe Manana authored
      commit 432cd2a1 upstream.
      
      When running relocation of a data block group while scrub is running in
      parallel, it is possible that the relocation will fail and abort the
      current transaction with an -EINVAL error:
      
         [134243.988595] BTRFS info (device sdc): found 14 extents, stage: move data extents
         [134243.999871] ------------[ cut here ]------------
         [134244.000741] BTRFS: Transaction aborted (error -22)
         [134244.001692] WARNING: CPU: 0 PID: 26954 at fs/btrfs/ctree.c:1071 __btrfs_cow_block+0x6a7/0x790 [btrfs]
         [134244.003380] Modules linked in: btrfs blake2b_generic xor raid6_pq (...)
         [134244.012577] CPU: 0 PID: 26954 Comm: btrfs Tainted: G        W         5.6.0-rc7-btrfs-next-58 #5
         [134244.014162] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
         [134244.016184] RIP: 0010:__btrfs_cow_block+0x6a7/0x790 [btrfs]
         [134244.017151] Code: 48 c7 c7 (...)
         [134244.020549] RSP: 0018:ffffa41607863888 EFLAGS: 00010286
         [134244.021515] RAX: 0000000000000000 RBX: ffff9614bdfe09c8 RCX: 0000000000000000
         [134244.022822] RDX: 0000000000000001 RSI: ffffffffb3d63980 RDI: 0000000000000001
         [134244.024124] RBP: ffff961589e8c000 R08: 0000000000000000 R09: 0000000000000001
         [134244.025424] R10: ffffffffc0ae5955 R11: 0000000000000000 R12: ffff9614bd530d08
         [134244.026725] R13: ffff9614ced41b88 R14: ffff9614bdfe2a48 R15: 0000000000000000
         [134244.028024] FS:  00007f29b63c08c0(0000) GS:ffff9615ba600000(0000) knlGS:0000000000000000
         [134244.029491] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         [134244.030560] CR2: 00007f4eb339b000 CR3: 0000000130d6e006 CR4: 00000000003606f0
         [134244.031997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
         [134244.033153] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
         [134244.034484] Call Trace:
         [134244.034984]  btrfs_cow_block+0x12b/0x2b0 [btrfs]
         [134244.035859]  do_relocation+0x30b/0x790 [btrfs]
         [134244.036681]  ? do_raw_spin_unlock+0x49/0xc0
         [134244.037460]  ? _raw_spin_unlock+0x29/0x40
         [134244.038235]  relocate_tree_blocks+0x37b/0x730 [btrfs]
         [134244.039245]  relocate_block_group+0x388/0x770 [btrfs]
         [134244.040228]  btrfs_relocate_block_group+0x161/0x2e0 [btrfs]
         [134244.041323]  btrfs_relocate_chunk+0x36/0x110 [btrfs]
         [134244.041345]  btrfs_balance+0xc06/0x1860 [btrfs]
         [134244.043382]  ? btrfs_ioctl_balance+0x27c/0x310 [btrfs]
         [134244.045586]  btrfs_ioctl_balance+0x1ed/0x310 [btrfs]
         [134244.045611]  btrfs_ioctl+0x1880/0x3760 [btrfs]
         [134244.049043]  ? do_raw_spin_unlock+0x49/0xc0
         [134244.049838]  ? _raw_spin_unlock+0x29/0x40
         [134244.050587]  ? __handle_mm_fault+0x11b3/0x14b0
         [134244.051417]  ? ksys_ioctl+0x92/0xb0
         [134244.052070]  ksys_ioctl+0x92/0xb0
         [134244.052701]  ? trace_hardirqs_off_thunk+0x1a/0x1c
         [134244.053511]  __x64_sys_ioctl+0x16/0x20
         [134244.054206]  do_syscall_64+0x5c/0x280
         [134244.054891]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
         [134244.055819] RIP: 0033:0x7f29b51c9dd7
         [134244.056491] Code: 00 00 00 (...)
         [134244.059767] RSP: 002b:00007ffcccc1dd08 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
         [134244.061168] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f29b51c9dd7
         [134244.062474] RDX: 00007ffcccc1dda0 RSI: 00000000c4009420 RDI: 0000000000000003
         [134244.063771] RBP: 0000000000000003 R08: 00005565cea4b000 R09: 0000000000000000
         [134244.065032] R10: 0000000000000541 R11: 0000000000000202 R12: 00007ffcccc2060a
         [134244.066327] R13: 00007ffcccc1dda0 R14: 0000000000000002 R15: 00007ffcccc1dec0
         [134244.067626] irq event stamp: 0
         [134244.068202] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
         [134244.069351] hardirqs last disabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020
         [134244.070909] softirqs last  enabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020
         [134244.072392] softirqs last disabled at (0): [<0000000000000000>] 0x0
         [134244.073432] ---[ end trace bd7c03622e0b0a99 ]---
      
      The -EINVAL error comes from the following chain of function calls:
      
        __btrfs_cow_block() <-- aborts the transaction
          btrfs_reloc_cow_block()
            replace_file_extents()
              get_new_location() <-- returns -EINVAL
      
      When relocating a data block group, for each allocated extent of the block
      group, we preallocate another extent (at prealloc_file_extent_cluster()),
      associated with the data relocation inode, and then dirty all its pages.
      These preallocated extents have, and must have, the same size that extents
      from the data block group being relocated have.
      
      Later before we start the relocation stage that updates pointers (bytenr
      field of file extent items) to point to the new extents, we trigger
      writeback for the data relocation inode. The expectation is that writeback
      will write the pages to the previously preallocated extents, that is, that
      it follows the NOCOW path. That is generally the case, however, if a scrub
      is running it may have turned the block group that contains those extents
      into RO mode, in which case writeback falls back to the COW path.
      
      However in the COW path instead of allocating exactly one extent with the
      expected size, the allocator may end up allocating several smaller extents
      due to free space fragmentation - because we tell it at cow_file_range()
      that the minimum allocation size can match the filesystem's sector size.
      This later breaks the relocation's expectation that an extent associated
      to a file extent item in the data relocation inode has the same size as
      the respective extent pointed by a file extent item in another tree - in
      this case the extent to which the relocation inode points is smaller,
      causing relocation.c:get_new_location() to return -EINVAL.
      
      For example, if we are relocating a data block group X that has a logical
      address of X and the block group has an extent allocated at the logical
      address X + 128KiB with a size of 64KiB:
      
      1) At prealloc_file_extent_cluster() we allocate an extent for the data
         relocation inode with a size of 64KiB and associate it to the file
         offset 128KiB (X + 128KiB - X) of the data relocation inode. This
         preallocated extent was allocated at block group Z;
      
      2) A scrub running in parallel turns block group Z into RO mode and
         starts scrubbing its extents;
      
      3) Relocation triggers writeback for the data relocation inode;
      
      4) When running delalloc (btrfs_run_delalloc_range()), we try first the
         NOCOW path because the data relocation inode has BTRFS_INODE_PREALLOC
         set in its flags. However, because block group Z is in RO mode, the
         NOCOW path (run_delalloc_nocow()) falls back into the COW path, by
         calling cow_file_range();
      
      5) At cow_file_range(), in the first iteration of the while loop we call
         btrfs_reserve_extent() to allocate a 64KiB extent and pass it a minimum
         allocation size of 4KiB (fs_info->sectorsize). Due to free space
         fragmentation, btrfs_reserve_extent() ends up allocating two extents
         of 32KiB each, each one on a different iteration of that while loop;
      
      6) Writeback of the data relocation inode completes;
      
      7) Relocation proceeds and ends up at relocation.c:replace_file_extents(),
         with a leaf which has a file extent item that points to the data extent
         from block group X, that has a logical address (bytenr) of X + 128KiB
         and a size of 64KiB. Then it calls get_new_location(), which does a
         lookup in the data relocation tree for a file extent item starting at
         offset 128KiB (X + 128KiB - X) and belonging to the data relocation
         inode. It finds a corresponding file extent item, however that item
         points to an extent that has a size of 32KiB, which doesn't match the
         expected size of 64KiB, resulting in -EINVAL being returned from this
         function and propagated up to __btrfs_cow_block(), which aborts the
         current transaction.
      
      To fix this make sure that at cow_file_range() when we call the allocator
      we pass it a minimum allocation size corresponding to the desired extent
      size if the inode belongs to the data relocation tree, otherwise pass it
      the filesystem's sector size as the minimum allocation size.
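
The selection rule in the paragraph above can be sketched as a small pure function. This is only an illustration under assumed names: `pick_min_alloc_size()` and its parameters are hypothetical and do not correspond to identifiers in the btrfs sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose the minimum allocation size passed to the
 * extent allocator. For the data relocation inode the whole wanted size is
 * required, so the allocator cannot split the range into smaller extents.
 * For every other inode the sector size remains the minimum.
 */
static uint64_t pick_min_alloc_size(bool is_data_reloc_inode,
                                    uint64_t wanted_size,
                                    uint64_t sectorsize)
{
    assert(sectorsize > 0 && wanted_size >= sectorsize);
    return is_data_reloc_inode ? wanted_size : sectorsize;
}
```

Applied to the 64KiB example from the numbered steps, the relocation inode would request a 64KiB minimum, so the two 32KiB allocations from step 5 could no longer occur.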
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • x86/asm/64: Align start of __clear_user() loop to 16-bytes · d3806c61
      Matt Fleming authored
      commit bb5570ad upstream.
      
      x86 CPUs can suffer severe performance drops if a tight loop, such as
      the ones in __clear_user(), straddles a 16-byte instruction fetch
      window, or worse, a 64-byte cacheline. This issue was discovered in the
      SUSE kernel with the following commit,
      
        11539337 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")
      
      which increased the code object size from 10 bytes to 15 bytes and
      caused the 8-byte copy loop in __clear_user() to be split across a
      64-byte cacheline.
      
      Aligning the start of the loop to 16-bytes makes this fit neatly inside
      a single instruction fetch window again and restores the performance of
      __clear_user() which is used heavily when reading from /dev/zero.
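
Whether a block of code straddles a fetch window or cacheline is simple address arithmetic. The sketch below is a user-space illustration only: the helper names, the start offset and the loop length used in the usage note are made-up values, not measurements from the kernel image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the byte range [start, start + len) crosses a window boundary. */
static bool straddles_window(uint64_t start, uint64_t len, uint64_t window)
{
    assert(len > 0 && window > 0);
    return start / window != (start + len - 1) / window;
}

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

For a hypothetical 8-byte loop body starting at offset 0x3c, `straddles_window(0x3c, 8, 64)` is true. After `align_up(0x3c, 16)` moves the start to 0x40, the same body fits inside one 16-byte window and one cacheline.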
      
      Here are some numbers from running libmicro's read_z* and pread_z*
      microbenchmarks which read from /dev/zero:
      
        Zen 1 (Naples)
      
        libmicro-file
                                              5.7.0-rc6              5.7.0-rc6              5.7.0-rc6
                                                          revert-11539337+               align16+
        Time mean95-pread_z100k       9.9195 (   0.00%)      5.9856 (  39.66%)      5.9938 (  39.58%)
        Time mean95-pread_z10k        1.1378 (   0.00%)      0.7450 (  34.52%)      0.7467 (  34.38%)
        Time mean95-pread_z1k         0.2623 (   0.00%)      0.2251 (  14.18%)      0.2252 (  14.15%)
        Time mean95-pread_zw100k      9.9974 (   0.00%)      6.0648 (  39.34%)      6.0756 (  39.23%)
        Time mean95-read_z100k        9.8940 (   0.00%)      5.9885 (  39.47%)      5.9994 (  39.36%)
        Time mean95-read_z10k         1.1394 (   0.00%)      0.7483 (  34.33%)      0.7482 (  34.33%)
      
      Note that this doesn't affect Haswell or Broadwell microarchitectures
      which seem to avoid the alignment issue by executing the loop straight
      out of the Loop Stream Detector (verified using perf events).
      
      Fixes: 11539337 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org> # v4.19+
      Link: https://lkml.kernel.org/r/20200618102002.30034-1-matt@codeblueprint.co.uk
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • x86/cpu: Use pinning mask for CR4 bits needing to be 0 · fec14c85
      Kees Cook authored
      commit a13b9d0b upstream.
      
      The X86_CR4_FSGSBASE bit of CR4 should not change after boot[1]. Older
      kernels should enforce this bit to zero, and newer kernels need to
      enforce it depending on boot-time configuration (e.g. "nofsgsbase").
      To support a pinned bit being either 1 or 0, use an explicit mask in
      combination with the expected pinned bit values.
      
      [1] https://lore.kernel.org/lkml/20200527103147.GI325280@hirez.programming.kicks-ass.net
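
The mask-plus-expected-values scheme can be sketched as follows. The function and macro names are hypothetical, and the bit positions are used purely for illustration; this is a sketch of the technique, not the kernel's CR4 write path.

```c
#include <assert.h>

/* Illustrative bit positions; treat them as assumptions of this sketch. */
#define BIT_FSGSBASE (1UL << 16)
#define BIT_SMEP     (1UL << 20)

/*
 * Force every bit covered by pinned_mask to the value recorded in
 * pinned_bits, leaving all other bits of val untouched. A pinned bit can be
 * held at 1 (set in both mask and bits) or at 0 (set in the mask only).
 */
static unsigned long apply_pinning(unsigned long val,
                                   unsigned long pinned_mask,
                                   unsigned long pinned_bits)
{
    assert((pinned_bits & ~pinned_mask) == 0);
    return (val & ~pinned_mask) | pinned_bits;
}
```

With `pinned_mask = BIT_SMEP | BIT_FSGSBASE` and `pinned_bits = BIT_SMEP`, a write that tries to clear SMEP and set FSGSBASE comes back with SMEP set and FSGSBASE clear.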
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/202006082013.71E29A42@keescook
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • KVM: nVMX: Plumb L2 GPA through to PML emulation · 7bc7a239
      Sean Christopherson authored
      commit 2dbebf7a upstream.
      
      Explicitly pass the L2 GPA to kvm_arch_write_log_dirty(), which for all
      intents and purposes is vmx_write_pml_buffer(), instead of having the
      latter pull the GPA from vmcs.GUEST_PHYSICAL_ADDRESS.  If the dirty bit
      update is the result of KVM emulation (rare for L2), then the GPA in the
      VMCS may be stale and/or hold a completely unrelated GPA.
      
      Fixes: c5f983f6 ("nVMX: Implement emulated Page Modification Logging")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200622215832.22090-2-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • KVM: X86: Fix MSR range of APIC registers in X2APIC mode · e8421cc7
      Xiaoyao Li authored
      commit bf10bd0b upstream.
      
      Only MSR address range 0x800 through 0x8ff is architecturally reserved
      and dedicated for accessing APIC registers in x2APIC mode.
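
A range check matching the statement above might look like the sketch below. The macro and function names are hypothetical; the register-offset helper assumes the usual x2APIC convention that each MSR corresponds to one 16-byte-spaced APIC register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X2APIC_MSR_FIRST 0x800u  /* first MSR of the x2APIC range */
#define X2APIC_MSR_LAST  0x8ffu  /* last MSR of the x2APIC range (inclusive) */

/* True only for MSR addresses inside 0x800 through 0x8ff. */
static bool is_x2apic_msr(uint32_t msr)
{
    return msr >= X2APIC_MSR_FIRST && msr <= X2APIC_MSR_LAST;
}

/* Map an x2APIC MSR to the byte offset of the matching APIC register. */
static uint32_t x2apic_msr_to_reg_offset(uint32_t msr)
{
    assert(is_x2apic_msr(msr));
    return (msr - X2APIC_MSR_FIRST) << 4;
}
```

An address just past the range, such as 0x900, is rejected by `is_x2apic_msr()` and would have to be handled as an ordinary MSR.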
      
      Fixes: 0105d1a5 ("KVM: x2apic interface to lapic")
      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Message-Id: <20200616073307.16440-1-xiaoyao.li@intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack it · 5c7f82ae
      Paolo Bonzini authored
      commit b07a5c53 upstream.
      
      If X86_FEATURE_RTM is disabled, the guest should not be able to access
      MSR_IA32_TSX_CTRL.  We can therefore use it in KVM to force all
      transactions from the guest to abort.
      
      Tested-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality · edede026
      Paolo Bonzini authored
      commit c11f83e0 upstream.
      
      The current guest mitigation of TAA is both too heavy and not really
      sufficient.  It is too heavy because it will cause some affected CPUs
      (those that have MDS_NO but lack TAA_NO) to fall back to VERW and
      get the corresponding slowdown.  It is not really sufficient because
      it will cause the MDS_NO bit to disappear upon microcode update, so
      that VMs started before the microcode update will not be runnable
      anymore afterwards, even with tsx=on.
      
      Instead, if tsx=on on the host, we can emulate MSR_IA32_TSX_CTRL for
      the guest and let it run without the VERW mitigation.  Even though
      MSR_IA32_TSX_CTRL is quite heavyweight, and we do not want to write
      it on every vmentry, we can use the shared MSR functionality because
      the host kernel need not protect itself from TSX-based side-channels.
      
      Tested-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • erofs: fix partially uninitialized misuse in z_erofs_onlinepage_fixup · 5fc06f87
      Gao Xiang authored
      commit 3c597282 upstream.
      
      Hongyu reported that "id != index" in z_erofs_onlinepage_fixup() is easy
      to hit in a specific aarch64 environment, which had not been seen before.

      After digging into that, I found that the high 32 bits of page->private
      were set to 0xaaaaaaaa rather than 0 (due to z_erofs_onlinepage_init
      behavior with specific compiler options). Actually we only use the low
      32 bits to keep the page information since page->private is only 4
      bytes on most 32-bit platforms. However z_erofs_onlinepage_fixup()
      uses the upper 32 bits by mistake.
      
      Let's fix it now.
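
The effect of stale upper bits can be shown with a small user-space model. Everything here is an assumption made for illustration: the shift width, the field layout and the function names are hypothetical and are not taken from the erofs sources.

```c
#include <assert.h>
#include <stdint.h>

#define INDEX_SHIFT 2  /* hypothetical width of the low-order counter field */

/* Derive the index from the whole 64-bit word (sees any upper-half garbage). */
static uint64_t index_from_full_word(uint64_t private_word)
{
    return private_word >> INDEX_SHIFT;
}

/* Derive the index from the low 32 bits only, ignoring the upper half. */
static uint64_t index_from_low_half(uint64_t private_word)
{
    assert(INDEX_SHIFT < 32);
    return (uint32_t)private_word >> INDEX_SHIFT;
}
```

If the low half encodes index 5 and the upper half holds 0xaaaaaaaa, `index_from_low_half()` still returns 5 while `index_from_full_word()` returns an unrelated large value, which is the kind of "id != index" mismatch described above.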
      
      Reported-and-tested-by: Hongyu Jin <hongyu.jin@unisoc.com>
      Fixes: 3883a79a ("staging: erofs: introduce VLE decompression support")
      Cc: <stable@vger.kernel.org> # 4.19+
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Link: https://lore.kernel.org/r/20200618234349.22553-1-hsiangkao@aol.com
      Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [PG: use v4.19.131 stable version (pre relocate out of staging).]
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • ACPI: sysfs: Fix pm_profile_attr type · 10a10c0e
      Nathan Chancellor authored
      commit e6d701dc upstream.
      
      When running a kernel with Clang's Control Flow Integrity implemented,
      there is a violation that happens when accessing
      /sys/firmware/acpi/pm_profile:
      
      $ cat /sys/firmware/acpi/pm_profile
      0
      
      $ dmesg
      ...
      [   17.352564] ------------[ cut here ]------------
      [   17.352568] CFI failure (target: acpi_show_profile+0x0/0x8):
      [   17.352572] WARNING: CPU: 3 PID: 497 at kernel/cfi.c:29 __cfi_check_fail+0x33/0x40
      [   17.352573] Modules linked in:
      [   17.352575] CPU: 3 PID: 497 Comm: cat Tainted: G        W         5.7.0-microsoft-standard+ #1
      [   17.352576] RIP: 0010:__cfi_check_fail+0x33/0x40
      [   17.352577] Code: 48 c7 c7 50 b3 85 84 48 c7 c6 50 0a 4e 84 e8 a4 d8 60 00 85 c0 75 02 5b c3 48 c7 c7 dc 5e 49 84 48 89 de 31 c0 e8 7d 06 eb ff <0f> 0b 5b c3 00 00 cc cc 00 00 cc cc 00 85 f6 74 25 41 b9 ea ff ff
      [   17.352577] RSP: 0018:ffffaa6dc3c53d30 EFLAGS: 00010246
      [   17.352578] RAX: 331267e0c06cee00 RBX: ffffffff83d85890 RCX: ffffffff8483a6f8
      [   17.352579] RDX: ffff9cceabbb37c0 RSI: 0000000000000082 RDI: ffffffff84bb9e1c
      [   17.352579] RBP: ffffffff845b2bc8 R08: 0000000000000001 R09: ffff9cceabbba200
      [   17.352579] R10: 000000000000019d R11: 0000000000000000 R12: ffff9cc947766f00
      [   17.352580] R13: ffffffff83d6bd50 R14: ffff9ccc6fa80000 R15: ffffffff845bd328
      [   17.352582] FS:  00007fdbc8d13580(0000) GS:ffff9cce91ac0000(0000) knlGS:0000000000000000
      [   17.352582] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   17.352583] CR2: 00007fdbc858e000 CR3: 00000005174d0000 CR4: 0000000000340ea0
      [   17.352584] Call Trace:
      [   17.352586]  ? rev_id_show+0x8/0x8
      [   17.352587]  ? __cfi_check+0x45bac/0x4b640
      [   17.352589]  ? kobj_attr_show+0x73/0x80
      [   17.352590]  ? sysfs_kf_seq_show+0xc1/0x140
      [   17.352592]  ? ext4_seq_options_show.cfi_jt+0x8/0x8
      [   17.352593]  ? seq_read+0x180/0x600
      [   17.352595]  ? sysfs_create_file_ns.cfi_jt+0x10/0x10
      [   17.352596]  ? tlbflush_read_file+0x8/0x8
      [   17.352597]  ? __vfs_read+0x6b/0x220
      [   17.352598]  ? handle_mm_fault+0xa23/0x11b0
      [   17.352599]  ? vfs_read+0xa2/0x130
      [   17.352599]  ? ksys_read+0x6a/0xd0
      [   17.352601]  ? __do_sys_getpgrp+0x8/0x8
      [   17.352602]  ? do_syscall_64+0x72/0x120
      [   17.352603]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   17.352604] ---[ end trace 7b1fa81dc897e419 ]---
      
      When /sys/firmware/acpi/pm_profile is read, sysfs_kf_seq_show is called,
      which in turn calls kobj_attr_show, which gets the ->show callback
      member by calling container_of on attr (casting it to struct
      kobj_attribute) then calls it.
      
      There is a CFI violation because pm_profile_attr is of type
      struct device_attribute but kobj_attr_show calls ->show expecting it
      to be from struct kobj_attribute. CFI checking ensures that function
      pointer types match when doing indirect calls. Fix pm_profile_attr to
      be defined in terms of kobj_attribute so there is no violation or
      mismatch.
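
The dispatch pattern described above can be modelled in user space. The structures below are simplified stand-ins with hypothetical signatures, not the kernel's definitions; the point is only that the attribute is declared with the same wrapper type the dispatcher recovers through container_of, so the indirect call goes through a pointer of the matching type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins; the real kernel structures carry more fields. */
struct attribute {
    const char *name;
};

struct kobj_attribute {
    struct attribute attr;
    int (*show)(struct kobj_attribute *attr, char *buf, size_t len);
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Callback whose prototype matches the ->show member exactly. */
static int profile_show(struct kobj_attribute *attr, char *buf, size_t len)
{
    (void)attr;
    return snprintf(buf, len, "%d\n", 0);
}

/* Declared as the wrapper type that the dispatcher expects. */
static struct kobj_attribute pm_profile = {
    .attr = { .name = "pm_profile" },
    .show = profile_show,
};

/* Dispatcher: recover the wrapper from the embedded attribute, then call. */
static int attr_show(struct attribute *attr, char *buf, size_t len)
{
    struct kobj_attribute *kattr =
        container_of(attr, struct kobj_attribute, attr);

    assert(kattr->show != NULL);
    return kattr->show(kattr, buf, len);
}
```

Calling `attr_show(&pm_profile.attr, buf, sizeof(buf))` writes "0" followed by a newline into `buf`. Had `pm_profile` been declared as a different wrapper type with a differently typed callback, the same call would invoke a function through a mismatched pointer type, which is what CFI flags.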
      
      Fixes: 362b6460 ("ACPI: Export FADT pm_profile integer value to userspace")
      Link: https://github.com/ClangBuiltLinux/linux/issues/1051
      Reported-by: yuu ichii <byahu140@heisei.be>
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • ALSA: hda/realtek: Add mute LED and micmute LED support for HP systems · 81a6bdb5
      Kai-Heng Feng authored
      commit b2c22910 upstream.
      
      There are two more HP systems that control the mute LED from the HDA
      codec and need to expose the micmute LED class so SoF can control the
      micmute LED.
      
      Add quirks to support them.
      
      Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200617102906.16156-2-kai.heng.feng@canonical.com
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • ALSA: hda/realtek - Add quirk for MSI GE63 laptop · c9a18bdc
      Takashi Iwai authored
      commit a0b03952 upstream.
      
      MSI GE63 laptop with ALC1220 codec requires the very same quirk
      (ALC1220_FIXUP_CLEVO_P950) as other MSI devices for the proper sound
      output.
      
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208057
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200616132150.8778-1-tiwai@suse.de
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • ALSA: hda: Add NVIDIA codec IDs 9a & 9d through a0 to patch table · d5327252
      Aaron Plattner authored
      commit adb36a82 upstream.
      
      These IDs are for upcoming NVIDIA chips with audio functions that are largely
      similar to the existing ones.
      
      Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200611180845.39942-1-aplattner@nvidia.com
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
    • Yash Shah's avatar
      RISC-V: Don't allow write+exec only page mapping request in mmap · 02919bb1
      Yash Shah authored
      commit e0d17c84 upstream.
      
      As per the table 4.4 of version "20190608-Priv-MSU-Ratified" of the
      RISC-V instruction set manual[0], the PTE permission bit combination of
      "write+exec only" is reserved for future use. Hence, don't allow such
      mapping request in mmap call.
      
      David Abdurachmanov reported that RCU stalls are observed on RISC-V
      kernels while running stress-ng with the "sysbadaddr" argument.
      
      The issue arises when stress-sysbadaddr requests pages with
      "write+exec only" permission bits and then passes the address obtained
      from this mmap call to various system calls. For the riscv kernel, the
      mmap call should fail for this particular combination of permission bits
      since it is not valid.
      
      [0]: http://dabbelt.com/~palmer/keep/riscv-isa-manual/riscv-privileged-20190608-1.pdf
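      
      A minimal userspace sketch of the check described above, assuming the
      standard PROT_* flags from <sys/mman.h>. The helper name is hypothetical
      and this is not the kernel's actual code; it only illustrates rejecting
      the reserved "write+exec only" combination:
      
      ```c
      #include <sys/mman.h>
      
      /*
       * Hypothetical helper: returns 1 if the requested protection bits are
       * acceptable, 0 if they name the reserved "write+exec only" combination
       * (write and exec set, read clear).
       */
      static int riscv_prot_is_allowed(int prot)
      {
          if ((prot & PROT_WRITE) && (prot & PROT_EXEC) && !(prot & PROT_READ))
              return 0;
          return 1;
      }
      ```
      
      With such a check in the mmap path, a request for PROT_WRITE | PROT_EXEC
      would be refused up front, while PROT_READ | PROT_WRITE | PROT_EXEC
      would still be accepted.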
      
      Signed-off-by: Yash Shah <yash.shah@sifive.com>
      Reported-by: David Abdurachmanov <david.abdurachmanov@gmail.com>
      [Palmer: Refer to the latest ISA specification at the only link I could
      find, and update the terminology.]
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      02919bb1
    • Weiping Zhang's avatar
      block: update hctx map when use multiple maps · ba38f63a
      Weiping Zhang authored
      commit fe35ec58 upstream.
      
      There is an issue when tuning the number of read and write queues
      while the total queue count stays the same: hctx->type cannot
      be updated, since __blk_mq_update_nr_hw_queues returns directly
      if the total queue count has not changed.
      
      Reproduce:
      
      dmesg | grep "default/read/poll"
      [    2.607459] nvme nvme0: 48/0/0 default/read/poll queues
      cat /sys/kernel/debug/block/nvme0n1/hctx*/type | sort | uniq -c
           48 default
      
      Tune the write queues to 24:
      echo 24 > /sys/module/nvme/parameters/write_queues
      echo 1 > /sys/block/nvme0n1/device/reset_controller
      
      dmesg | grep "default/read/poll"
      [  433.547235] nvme nvme0: 24/24/0 default/read/poll queues
      
      cat /sys/kernel/debug/block/nvme0n1/hctx*/type | sort | uniq -c
           48 default
      
      The driver's hardware queue mapping is not the same as the block layer's.
      
      Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      ba38f63a
    • Luis Chamberlain's avatar
      blktrace: break out of blktrace setup on concurrent calls · 40b2dab9
      Luis Chamberlain authored
      commit 1b0b2836 upstream.
      
      We use one blktrace per request_queue, which means one for the entire
      disk.  So we cannot run one blktrace on say /dev/vda and then /dev/vda1,
      or just two calls on /dev/vda.
      
      We check for concurrent setup only at the very end of the blktrace setup though.
      
      If we try to run two concurrent blktraces on the same block device, the
      second one will fail, and the first one seems to go on. However, when
      one tries to kill the first one, the kernel shows messages like these:
      
      ```
      debugfs: File 'dropped' in directory 'nvme1n1' already present!
      debugfs: File 'msg' in directory 'nvme1n1' already present!
      debugfs: File 'trace0' in directory 'nvme1n1' already present!
      ```
      
      And userspace just sees this error message for the second call:
      
      ```
      blktrace /dev/nvme1n1
      BLKTRACESETUP(2) /dev/nvme1n1 failed: 5/Input/output error
      ```
      
      Process #1, the first userspace process, will also claim that the files
      were taken from underneath its nose. The files are taken
      away from the first process because, when the second blktrace
      fails, it follows up with a BLKTRACESTOP and BLKTRACETEARDOWN.
      This means that even if go-happy process #1 is waiting for blktrace
      data, we *have* been asked to tear down the blktrace.
      
      This can easily be reproduced with break-blktrace [0] run_0005.sh test.
      
      Just break out early if we know we're already going to fail. This
      prevents trying to create the files all over again, which we know still
      exist.
      
      [0] https://github.com/mcgrof/break-blktrace
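      
      The "break out early" idea can be sketched with a toy model. All types
      and names below are hypothetical stand-ins for the request_queue and
      blktrace objects, and the -EBUSY return value is an assumption of this
      sketch rather than something taken from the text above:
      
      ```c
      #include <errno.h>
      #include <stddef.h>
      
      /* Toy stand-ins for the kernel objects; all names are hypothetical. */
      struct toy_trace { int files_created; };
      struct toy_queue { struct toy_trace *trace; };
      
      /*
       * Refuse a second setup before doing any work, so that nothing is
       * created (or later torn down) on behalf of a call that must fail.
       */
      static int toy_trace_setup(struct toy_queue *q, struct toy_trace *bt)
      {
          if (q->trace)
              return -EBUSY;      /* a trace is already attached: bail out */
      
          bt->files_created = 1;  /* stands in for creating the debugfs files */
          q->trace = bt;
          return 0;
      }
      ```
      
      Because the check comes first, the losing caller never touches the
      files that belong to the trace already in progress.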
      
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      40b2dab9
    • Masami Hiramatsu's avatar
      kprobes: Suppress the suspicious RCU warning on kprobes · 8aa52599
      Masami Hiramatsu authored
      commit 6743ad43 upstream.
      
      Anders reported that lockdep warns about suspicious
      RCU list usage in register_kprobe() (detected by
      CONFIG_PROVE_RCU_LIST). This is because get_kprobe()
      accesses kprobe_table[] with hlist_for_each_entry_rcu()
      without holding rcu_read_lock().
      
      If we call get_kprobe() from the breakpoint handler context,
      it runs with preemption disabled, so this is not a problem.
      But in other cases, instead of rcu_read_lock(), we lock
      kprobe_mutex so that kprobe_table[] is not updated.
      So the current code is safe, but still not good from the
      RCU point of view.
      
      Joel suggested that we can silence that warning by passing
      lockdep_is_held() as the last argument of
      hlist_for_each_entry_rcu().
      
      Add lockdep_is_held(&kprobe_mutex) at the end of the
      hlist_for_each_entry_rcu() to suppress the warning.
      
      Link: http://lkml.kernel.org/r/158927055350.27680.10261450713467997503.stgit@devnote2
      
      Reported-by: Anders Roxell <anders.roxell@linaro.org>
      Suggested-by: Joel Fernandes <joel@joelfernandes.org>
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      8aa52599
    • Masahiro Yamada's avatar
      kbuild: improve cc-option to clean up all temporary files · aa5bbaec
      Masahiro Yamada authored
      commit f2f02ebd upstream.
      
      When cc-option and friends evaluate compiler flags, the temporary file
      $$TMP is created as an output object, and automatically cleaned up.
      The actual file path of $$TMP is .<pid>.tmp, where <pid> is the process
      ID of $(shell ...) invoked from cc-option. (Please note $$$$ is the
      escape sequence of $$).
      
      Such garbage files are cleaned up in most cases, but some compiler flags
      create additional output files.
      
      For example, -gsplit-dwarf creates a .dwo file.
      
      When CONFIG_DEBUG_INFO_SPLIT=y, you will see a bunch of .<pid>.dwo files
      left in the top of build directories. You may not notice them unless you
      do 'ls -a', but the garbage files will increase every time you run 'make'.
      
      This commit changes the temporary object path to .tmp_<pid>/tmp, and
      removes .tmp_<pid> directory when exiting. Separate build artifacts such
      as *.dwo will be cleaned up all together because their file paths are
      usually determined based on the base name of the object.
      
      Another example is -ftest-coverage, which outputs the coverage data into
      <base-name-of-object>.gcno
      
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      aa5bbaec
    • Will Deacon's avatar
      arm64: sve: Fix build failure when ARM64_SVE=y and SYSCTL=n · 2a7aeb4e
      Will Deacon authored
      commit e575fb9e upstream.
      
      When I squashed the 'allnoconfig' compiler warning about the
      set_sve_default_vl() function being defined but not used in commit
      1e570f51 ("arm64/sve: Eliminate data races on sve_default_vl"), I
      accidentally broke the build for configs where ARM64_SVE is enabled, but
      SYSCTL is not.
      
      Fix this by only compiling the SVE sysctl support if both
      CONFIG_ARM64_SVE=y and CONFIG_SYSCTL=y.
      
      Cc: Dave Martin <Dave.Martin@arm.com>
      Reported-by: Qian Cai <cai@lca.pw>
      Link: https://lore.kernel.org/r/20200616131808.GA1040@lca.pw
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      2a7aeb4e
    • Vincenzo Frascino's avatar
      s390/vdso: fix vDSO clock_getres() · 5a9b8950
      Vincenzo Frascino authored
      commit 478237a5 upstream.
      
      clock_getres in the vDSO library has to preserve the same behaviour
      of posix_get_hrtimer_res().
      
      In particular, posix_get_hrtimer_res() does:
          sec = 0;
          ns = hrtimer_resolution;
      and hrtimer_resolution depends on the enablement of the high
      resolution timers that can happen either at compile or at run time.
      
      Fix the s390 vdso implementation of clock_getres keeping a copy of
      hrtimer_resolution in vdso data and using that directly.
      
      Link: https://lkml.kernel.org/r/20200324121027.21665-1-vincenzo.frascino@arm.com
      Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      [heiko.carstens@de.ibm.com: use llgf for proper zero extension]
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      5a9b8950
    • Nathan Chancellor's avatar
      s390/vdso: Use $(LD) instead of $(CC) to link vDSO · 7b8539a3
      Nathan Chancellor authored
      commit 2b2a2584 upstream.
      
      Currently, the VDSO is being linked through $(CC). This does not match
      how the rest of the kernel links objects, which is through the $(LD)
      variable.
      
      When clang is built in a default configuration, it first attempts to use
      the target triple's default linker, which is just ld. However, the user
      can override this through the CLANG_DEFAULT_LINKER cmake define so that
      clang uses another linker by default, such as LLVM's own linker, ld.lld.
      This can be useful to get more optimized links across various different
      projects.
      
      However, this is problematic for the s390 vDSO because ld.lld does not
      have any s390 emulation support:
      
      https://github.com/llvm/llvm-project/blob/llvmorg-10.0.1-rc1/lld/ELF/Driver.cpp#L132-L150
      
      Thus, if a user is using a toolchain with ld.lld as the default, they
      will see an error, even if they have specified ld.bfd through the LD
      make variable:
      
      $ make -j"$(nproc)" -s ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- LLVM=1 \
                             LD=s390x-linux-gnu-ld \
                             defconfig arch/s390/kernel/vdso64/
      ld.lld: error: unknown emulation: elf64_s390
      clang-11: error: linker command failed with exit code 1 (use -v to see invocation)
      
      Normally, '-fuse-ld=bfd' could be used to get around this; however, this
      can be fragile, depending on paths and variable naming. The cleaner
      solution for the kernel is to take advantage of the fact that $(LD) can
      be invoked directly, which bypasses the heuristics of $(CC) and respects
      the user's choice. Similar changes have been done for ARM, ARM64, and
      MIPS.
      
      Link: https://lkml.kernel.org/r/20200602192523.32758-1-natechancellor@gmail.com
      Link: https://github.com/ClangBuiltLinux/linux/issues/1041
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      [heiko.carstens@de.ibm.com: add --build-id flag]
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      7b8539a3
    • Sven Schnelle's avatar
      s390/ptrace: fix setting syscall number · 20bbccad
      Sven Schnelle authored
      commit 873e5a76 upstream.
      
      When strace wants to update the syscall number, it sets GPR2
      to the desired number and updates the GPR via PTRACE_SETREGSET.
      It doesn't update regs->int_code, which would cause the old syscall
      to be executed on syscall restart. As we cannot change the ptrace ABI and
      don't have a field for the interruption code, check whether the tracee
      is in a syscall and the last instruction was svc. In that case assume
      that the tracer wants to update the syscall number and copy the GPR2
      value to regs->int_code.
      
      Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      20bbccad
    • Sven Schnelle's avatar
      s390/ptrace: pass invalid syscall numbers to tracing · 377141e1
      Sven Schnelle authored
      commit 00332c16 upstream.
      
      Tracing expects to see invalid syscalls, so pass them through.
      The syscall path in entry.S checks the syscall number before
      looking up the handler, so it is still safe.
      
      Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      377141e1
    • Aditya Pakki's avatar
      test_objagg: Fix potential memory leak in error handling · a06f07fc
      Aditya Pakki authored
      commit a6379f0a upstream.
      
      In case of failure of check_expect_hints_stats(), the resources
      allocated by objagg_hints_get should be freed. The patch fixes
      this issue.
      
      Signed-off-by: Aditya Pakki <pakki001@umn.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      a06f07fc
    • Zekun Shen's avatar
      net: alx: fix race condition in alx_remove · 360857fc
      Zekun Shen authored
      commit e89df5c4 upstream.
      
      A race condition exists during termination. The path is
      alx_stop and then alx_remove. An alx_schedule_link_check could be called
      before alx_stop by the interrupt handler and invoke alx_link_check later.
      alx_stop frees the napis, and alx_remove cancels any pending work.
      If any of the work is scheduled before termination and invoked before
      alx_remove, a null-ptr-deref occurs because both expect alx->napis[i].
      
      This patch fixes the race condition by moving the cancel_work_sync calls
      before alx_free_napis inside alx_stop. Because the interrupt handler can call
      alx_schedule_link_check again, alx_free_irq is moved before the
      cancel_work_sync calls too.
      
      Signed-off-by: Zekun Shen <bruceshenzk@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      360857fc
    • Thomas Falcon's avatar
      ibmvnic: Harden device login requests · f51a8bb4
      Thomas Falcon authored
      commit dff515a3 upstream.
      
      The VNIC driver's "login" command sequence is the final step
      in the driver's initialization process with device firmware,
      confirming the available device queue resources to be utilized
      by the driver. Under high system load, firmware may not respond
      to the request in a timely manner or may abort the request. In
      such cases, the driver should reattempt the login command
      sequence. In case of a device error, the number of retries
      is bounded.
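      
      A bounded retry loop of the kind described above might look like the
      following sketch. The retry limit, the callback type, the use of
      -EAGAIN to mean "timed out or aborted", and the fake firmware function
      are all assumptions made for illustration; none of them come from the
      driver:
      
      ```c
      #include <errno.h>
      
      #define TOY_MAX_LOGIN_RETRIES 10   /* assumed bound, illustration only */
      
      /* Hypothetical callback: 0 on success, -EAGAIN on timeout or abort. */
      typedef int (*toy_login_fn)(void *ctx);
      
      static int toy_login_with_retries(toy_login_fn send_login, void *ctx)
      {
          int attempt;
      
          for (attempt = 0; attempt < TOY_MAX_LOGIN_RETRIES; attempt++) {
              int rc = send_login(ctx);
      
              if (rc != -EAGAIN)
                  return rc;   /* success, or an error retrying cannot fix */
          }
          return -EIO;         /* retries exhausted */
      }
      
      /* Fake firmware: fails *remaining_failures times, then succeeds. */
      static int toy_flaky_login(void *ctx)
      {
          int *remaining_failures = ctx;
      
          if (*remaining_failures > 0) {
              (*remaining_failures)--;
              return -EAGAIN;
          }
          return 0;
      }
      ```
      
      Calling toy_login_with_retries(toy_flaky_login, &n) with a small n
      succeeds after n failed attempts, while a large n gives up once the
      bound is reached.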
      
      Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      f51a8bb4
    • Dinghao Liu's avatar
      hwrng: ks-sa - Fix runtime PM imbalance on error · c135e920
      Dinghao Liu authored
      commit 95459261 upstream.
      
      pm_runtime_get_sync() increments the runtime PM usage counter even
      when the call returns an error code. Thus a pairing decrement is needed
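      on the error handling path to keep the counter balanced.
      
      The counter imbalance can be illustrated with a toy model. The struct
      and the toy_* functions below are hypothetical and only mimic the
      behaviour described above (the count goes up even on failure); they are
      not the runtime PM API:
      
      ```c
      #include <errno.h>
      
      /* Toy model of a runtime-PM usage counter; not the kernel API. */
      struct toy_dev {
          int usage_count;
          int resume_fails;   /* set to 1 to simulate a failing resume */
      };
      
      /* Mirrors the described behaviour: the count rises even on failure. */
      static int toy_get_sync(struct toy_dev *dev)
      {
          dev->usage_count++;
          return dev->resume_fails ? -EIO : 0;
      }
      
      static void toy_put_noidle(struct toy_dev *dev)
      {
          dev->usage_count--;
      }
      
      static int toy_probe(struct toy_dev *dev)
      {
          int ret = toy_get_sync(dev);
      
          if (ret < 0) {
              toy_put_noidle(dev);   /* pairing decrement on the error path */
              return ret;
          }
          return 0;   /* on success the reference is kept until removal */
      }
      ```
      
      Without the decrement in the error branch, a failed toy_probe() would
      leave usage_count at 1 with nobody left to drop it.
      
      In short, the decrement belongs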
      on the error handling path to keep the counter balanced.
      
      Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
      Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      c135e920
    • Nathan Huckleberry's avatar
      riscv/atomic: Fix sign extension for RV64I · 1c449636
      Nathan Huckleberry authored
      commit 6c58f25e upstream.
      
      The argument passed to cmpxchg is not guaranteed to be sign
      extended, but lr.w sign extends on RV64I. This makes cmpxchg
      fail on clang built kernels when __old is negative.
      
      To fix this, we just cast __old to long which sign extends on
      RV64I. With this fix, clang built RISC-V kernels now boot.
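      
      The mismatch can be reproduced in plain C with fixed-width types. This
      is a sketch under the assumption that the unextended argument arrives
      zero-extended; the helper names are hypothetical and model register
      contents as int64_t values:
      
      ```c
      #include <stdint.h>
      
      /* What lr.w leaves in a 64-bit register: the word, sign-extended. */
      static int64_t toy_load_word_sign_extended(const int32_t *p)
      {
          return (int64_t)*p;
      }
      
      /* A 32-bit "old" value that reached the register zero-extended. */
      static int64_t toy_old_zero_extended(int32_t old)
      {
          return (int64_t)(uint32_t)old;
      }
      
      /* The same value after a sign-extending cast, as in the fix. */
      static int64_t toy_old_sign_extended(int32_t old)
      {
          return (int64_t)old;
      }
      ```
      
      For a negative value such as -1 the zero-extended form differs from
      what the load produces, so the comparison inside cmpxchg can never
      succeed; the sign-extended form compares equal. Non-negative values
      are unaffected either way.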
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/867
      Signed-off-by: Nathan Huckleberry <nhuck@google.com>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      1c449636
    • Denis Efremov's avatar
      drm/amd/display: Use kfree() to free rgb_user in calculate_user_regamma_ramp() · c92e32d8
      Denis Efremov authored
      commit 43a56277 upstream.
      
      Use kfree() instead of kvfree() to free rgb_user in
      calculate_user_regamma_ramp() because the memory is allocated with
      kcalloc().
      
      Signed-off-by: Denis Efremov <efremov@linux.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      c92e32d8
    • Ye Bin's avatar
      ata/libata: Fix usage of page address by page_address in ata_scsi_mode_select_xlat function · e2f71e14
      Ye Bin authored
      commit f650ef61 upstream.
      
      BUG: KASAN: use-after-free in ata_scsi_mode_select_xlat+0x10bd/0x10f0
      drivers/ata/libata-scsi.c:4045
      Read of size 1 at addr ffff88803b8cd003 by task syz-executor.6/12621
      
      CPU: 1 PID: 12621 Comm: syz-executor.6 Not tainted 4.19.95 #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      1.10.2-1ubuntu1 04/01/2014
      Call Trace:
      __dump_stack lib/dump_stack.c:77 [inline]
      dump_stack+0xac/0xee lib/dump_stack.c:118
      print_address_description+0x60/0x223 mm/kasan/report.c:253
      kasan_report_error mm/kasan/report.c:351 [inline]
      kasan_report mm/kasan/report.c:409 [inline]
      kasan_report.cold+0xae/0x2d8 mm/kasan/report.c:393
      ata_scsi_mode_select_xlat+0x10bd/0x10f0 drivers/ata/libata-scsi.c:4045
      ata_scsi_translate+0x2da/0x680 drivers/ata/libata-scsi.c:2035
      __ata_scsi_queuecmd drivers/ata/libata-scsi.c:4360 [inline]
      ata_scsi_queuecmd+0x2e4/0x790 drivers/ata/libata-scsi.c:4409
      scsi_dispatch_cmd+0x2ee/0x6c0 drivers/scsi/scsi_lib.c:1867
      scsi_queue_rq+0xfd7/0x1990 drivers/scsi/scsi_lib.c:2170
      blk_mq_dispatch_rq_list+0x1e1/0x19a0 block/blk-mq.c:1186
      blk_mq_do_dispatch_sched+0x147/0x3d0 block/blk-mq-sched.c:108
      blk_mq_sched_dispatch_requests+0x427/0x680 block/blk-mq-sched.c:204
      __blk_mq_run_hw_queue+0xbc/0x200 block/blk-mq.c:1308
      __blk_mq_delay_run_hw_queue+0x3c0/0x460 block/blk-mq.c:1376
      blk_mq_run_hw_queue+0x152/0x310 block/blk-mq.c:1413
      blk_mq_sched_insert_request+0x337/0x6c0 block/blk-mq-sched.c:397
      blk_execute_rq_nowait+0x124/0x320 block/blk-exec.c:64
      blk_execute_rq+0xc5/0x112 block/blk-exec.c:101
      sg_scsi_ioctl+0x3b0/0x6a0 block/scsi_ioctl.c:507
      sg_ioctl+0xd37/0x23f0 drivers/scsi/sg.c:1106
      vfs_ioctl fs/ioctl.c:46 [inline]
      file_ioctl fs/ioctl.c:501 [inline]
      do_vfs_ioctl+0xae6/0x1030 fs/ioctl.c:688
      ksys_ioctl+0x76/0xa0 fs/ioctl.c:705
      __do_sys_ioctl fs/ioctl.c:712 [inline]
      __se_sys_ioctl fs/ioctl.c:710 [inline]
      __x64_sys_ioctl+0x6f/0xb0 fs/ioctl.c:710
      do_syscall_64+0xa0/0x2e0 arch/x86/entry/common.c:293
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45c479
      Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89
      f7 48
      89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f
      83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fb0e9602c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00007fb0e96036d4 RCX: 000000000045c479
      RDX: 0000000020000040 RSI: 0000000000000001 RDI: 0000000000000003
      RBP: 000000000076bfc0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 000000000000046d R14: 00000000004c6e1a R15: 000000000076bfcc
      
      Allocated by task 12577:
      set_track mm/kasan/kasan.c:460 [inline]
      kasan_kmalloc mm/kasan/kasan.c:553 [inline]
      kasan_kmalloc+0xbf/0xe0 mm/kasan/kasan.c:531
      __kmalloc+0xf3/0x1e0 mm/slub.c:3749
      kmalloc include/linux/slab.h:520 [inline]
      load_elf_phdrs+0x118/0x1b0 fs/binfmt_elf.c:441
      load_elf_binary+0x2de/0x4610 fs/binfmt_elf.c:737
      search_binary_handler fs/exec.c:1654 [inline]
      search_binary_handler+0x15c/0x4e0 fs/exec.c:1632
      exec_binprm fs/exec.c:1696 [inline]
      __do_execve_file.isra.0+0xf52/0x1a90 fs/exec.c:1820
      do_execveat_common fs/exec.c:1866 [inline]
      do_execve fs/exec.c:1883 [inline]
      __do_sys_execve fs/exec.c:1964 [inline]
      __se_sys_execve fs/exec.c:1959 [inline]
      __x64_sys_execve+0x8a/0xb0 fs/exec.c:1959
      do_syscall_64+0xa0/0x2e0 arch/x86/entry/common.c:293
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 12577:
      set_track mm/kasan/kasan.c:460 [inline]
      __kasan_slab_free+0x129/0x170 mm/kasan/kasan.c:521
      slab_free_hook mm/slub.c:1370 [inline]
      slab_free_freelist_hook mm/slub.c:1397 [inline]
      slab_free mm/slub.c:2952 [inline]
      kfree+0x8b/0x1a0 mm/slub.c:3904
      load_elf_binary+0x1be7/0x4610 fs/binfmt_elf.c:1118
      search_binary_handler fs/exec.c:1654 [inline]
      search_binary_handler+0x15c/0x4e0 fs/exec.c:1632
      exec_binprm fs/exec.c:1696 [inline]
      __do_execve_file.isra.0+0xf52/0x1a90 fs/exec.c:1820
      do_execveat_common fs/exec.c:1866 [inline]
      do_execve fs/exec.c:1883 [inline]
      __do_sys_execve fs/exec.c:1964 [inline]
      __se_sys_execve fs/exec.c:1959 [inline]
      __x64_sys_execve+0x8a/0xb0 fs/exec.c:1959
      do_syscall_64+0xa0/0x2e0 arch/x86/entry/common.c:293
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The buggy address belongs to the object at ffff88803b8ccf00
      which belongs to the cache kmalloc-512 of size 512
      The buggy address is located 259 bytes inside of
      512-byte region [ffff88803b8ccf00, ffff88803b8cd100)
      The buggy address belongs to the page:
      page:ffffea0000ee3300 count:1 mapcount:0 mapping:ffff88806cc03080
      index:0xffff88803b8cc780 compound_mapcount: 0
      flags: 0x100000000008100(slab|head)
      raw: 0100000000008100 ffffea0001104080 0000000200000002 ffff88806cc03080
      raw: ffff88803b8cc780 00000000800c000b 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
      ffff88803b8ccf00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ffff88803b8ccf80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff88803b8cd000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ^
      ffff88803b8cd080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ffff88803b8cd100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      
      You can refer to "https://www.lkml.org/lkml/2019/1/17/474" to reproduce
      this error.
      
      The faulting code is "bd_len = p[3];". The value of "p" is ffff88803b8cd000,
      which belongs to the cache kmalloc-512 of size 512. The "page_address(sg_page(scsi_sglist(scmd)))"
      may come from the sg_scsi_ioctl function's "buffer", which is allocated by kzalloc, so "buffer"
      may not be page aligned.
      This also looks completely buggy on highmem systems and really needs to use a
      kmap_atomic.      --Christoph Hellwig
      To address the above bugs, Paolo Bonzini advised that it is simpler to just make a char array
      of size CACHE_MPAGE_LEN+8+8+4-2 (or just 64 to make it easy), use sg_copy_to_buffer
      to copy from the sglist into the buffer, and work there.
      
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      e2f71e14
    • Navid Emamdoost's avatar
      sata_rcar: handle pm_runtime_get_sync failure cases · b77b71c1
      Navid Emamdoost authored
      commit eea12388 upstream.
      
      Calling pm_runtime_get_sync increments the counter even in case of
      failure, causing incorrect ref count. Call pm_runtime_put if
      pm_runtime_get_sync fails.
      
      Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      b77b71c1
    • Juri Lelli's avatar
      sched/core: Fix PI boosting between RT and DEADLINE tasks · 049c3048
      Juri Lelli authored
      commit 740797ce upstream.
      
      syzbot reported the following warning:
      
       WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
       enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
      
      At deadline.c:628 we have:
      
       623 static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
       624 {
       625 	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
       626 	struct rq *rq = rq_of_dl_rq(dl_rq);
       627
       628 	WARN_ON(dl_se->dl_boosted);
       629 	WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
              [...]
           }
      
      Which means that setup_new_dl_entity() has been called on a task
      currently boosted. This shouldn't happen though, as setup_new_dl_entity()
      is only called when the 'dynamic' deadline of the new entity
      is in the past w.r.t. rq_clock and boosted tasks shouldn't verify this
      condition.
      
      Digging through the PI code I noticed that the above might in fact happen
      if an RT task blocks on an rt_mutex held by a DEADLINE task. In the
      first branch of boosting conditions we check only if a pi_task's 'dynamic'
      deadline is earlier than the mutex holder's, and in this case we set the mutex
      holder to be dl_boosted. However, since RT 'dynamic' deadlines are only
      initialized if such tasks get boosted at some point (or if they become
      DEADLINE of course), in general RT 'dynamic' deadlines are usually equal
      to 0 and this verifies the aforementioned condition.
      
      Fix it by checking that the potential donor task is actually running at
      DEADLINE priority (even if only temporarily, because it was itself boosted) before
      using its 'dynamic' deadline value.
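      
      The difference between the two checks can be shown with a toy model.
      The struct, its fields and both helpers are hypothetical, and the plain
      "<" comparison is a simplification of how the scheduler compares
      deadlines:
      
      ```c
      #include <stdint.h>
      
      /* Toy task model; field names are illustrative only. */
      struct toy_task {
          int is_deadline;    /* 1 if currently running at DEADLINE priority */
          uint64_t deadline;  /* 'dynamic' deadline; 0 if never initialized */
      };
      
      /* Pre-fix style test: compares the deadlines only. */
      static int toy_should_boost_old(const struct toy_task *donor,
                                      const struct toy_task *holder)
      {
          return donor->deadline < holder->deadline;
      }
      
      /* Post-fix style test: the donor must be at DEADLINE priority first. */
      static int toy_should_boost_new(const struct toy_task *donor,
                                      const struct toy_task *holder)
      {
          return donor->is_deadline && donor->deadline < holder->deadline;
      }
      ```
      
      An RT donor whose deadline field is still 0 passes the old test against
      any DEADLINE holder with a later deadline, but is rejected by the new
      one.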
      
      Fixes: 2d3d891d ("sched/deadline: Add SCHED_DEADLINE inheritance logic")
      Reported-by: <syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com>
      Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
      Tested-by: Daniel Wagner <dwagner@suse.de>
      Link: https://lkml.kernel.org/r/20181119153201.GB2119@localhost.localdomain
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      049c3048
    • Juri Lelli's avatar
      sched/deadline: Initialize ->dl_boosted · 13ac0959
      Juri Lelli authored
      commit ce9bc3b2 upstream.
      
      syzbot reported the following warning triggered via SYSC_sched_setattr():
      
        WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 setup_new_dl_entity /kernel/sched/deadline.c:594 [inline]
        WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 enqueue_dl_entity /kernel/sched/deadline.c:1370 [inline]
        WARNING: CPU: 0 PID: 6973 at kernel/sched/deadline.c:593 enqueue_task_dl+0x1c17/0x2ba0 /kernel/sched/deadline.c:1441
      
      This happens because the ->dl_boosted flag is currently not initialized by
      __dl_clear_params() (unlike the other flags) and setup_new_dl_entity()
      rightfully complains about it.
      
      Initialize dl_boosted to 0.
      
      Fixes: 2d3d891d ("sched/deadline: Add SCHED_DEADLINE inheritance logic")
      Reported-by: <syzbot+5ac8bac25f95e8b221e7@syzkaller.appspotmail.com>
      Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Tested-by: Daniel Wagner <dwagner@suse.de>
      Link: https://lkml.kernel.org/r/20200617072919.818409-1-juri.lelli@redhat.com
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      13ac0959
    • David Howells's avatar
      afs: Fix storage of cell names · 5f2fbffb
      David Howells authored
      commit 719fdd32 upstream.
      
      The cell name stored in the afs_cell struct is a 64-char + NUL buffer -
      when it needs to be able to handle up to AFS_MAXCELLNAME (256 chars) + NUL.
      
      Fix this by changing the array to a pointer and allocating the string.
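      
      A userspace sketch of the "pointer plus allocation" approach, assuming
      a 256-character limit to mirror AFS_MAXCELLNAME. The struct and helper
      names are hypothetical and use malloc/free rather than kernel
      allocators:
      
      ```c
      #include <stdlib.h>
      #include <string.h>
      
      #define TOY_MAXCELLNAME 256   /* mirrors AFS_MAXCELLNAME in the text */
      
      struct toy_cell {
          char *name;       /* heap-allocated, NUL-terminated */
          size_t name_len;
      };
      
      /* Returns a new cell, or NULL if the name is empty, too long, or
       * an allocation fails. */
      static struct toy_cell *toy_cell_alloc(const char *name, size_t len)
      {
          struct toy_cell *cell;
      
          if (len == 0 || len > TOY_MAXCELLNAME)
              return NULL;
      
          cell = calloc(1, sizeof(*cell));
          if (!cell)
              return NULL;
      
          cell->name = malloc(len + 1);   /* room for the trailing NUL */
          if (!cell->name) {
              free(cell);
              return NULL;
          }
          memcpy(cell->name, name, len);
          cell->name[len] = '\0';
          cell->name_len = len;
          return cell;
      }
      
      static void toy_cell_free(struct toy_cell *cell)
      {
          if (cell) {
              free(cell->name);
              free(cell);
          }
      }
      ```
      
      Sizing the allocation from the actual length means a 200-character
      name fits without the struct carrying a fixed 257-byte array for
      every cell.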
      
      Found using Coverity.
      
      Fixes: 989782dc ("afs: Overhaul cell database management")
      Reported-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      5f2fbffb
    • Mans Rullgard's avatar
      i2c: core: check returned size of emulated smbus block read · 10682b9e
      Mans Rullgard authored
      commit 40e05200 upstream.
      
      If the i2c bus driver ignores the I2C_M_RECV_LEN flag (as some of
      them do), it is possible for an I2C_SMBUS_BLOCK_DATA read issued
      on some random device to return an arbitrary value in the first
      byte (and nothing else).  When this happens, i2c_smbus_xfer_emulated()
      will happily write past the end of the supplied data buffer, thus
      causing Bad Things to happen.  To prevent this, check the size
      before copying the data block and return an error if it is too large.
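      
      The size check can be sketched as follows. The 32-byte limit matches
      the SMBus block maximum; the function name, the buffer layout and the
      choice of -EPROTO are assumptions of this sketch:
      
      ```c
      #include <errno.h>
      #include <stdint.h>
      #include <string.h>
      
      #define TOY_BLOCK_MAX 32   /* an SMBus block carries at most 32 bytes */
      
      /*
       * raw: what came off the bus, a length byte followed by data.
       * out: must have room for TOY_BLOCK_MAX + 1 bytes (length + data).
       */
      static int toy_copy_block(const uint8_t *raw, uint8_t *out)
      {
          uint8_t len = raw[0];
      
          if (len > TOY_BLOCK_MAX)
              return -EPROTO;   /* error code chosen for this sketch */
      
          memcpy(out, raw, (size_t)len + 1);
          return 0;
      }
      ```
      
      A bogus length byte such as 200 is rejected before any copy takes
      place, so the destination buffer is never overrun.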
      
      Fixes: 209d27c3 ("i2c: Emulate SMBus block read over I2C")
      Signed-off-by: Mans Rullgard <mans@mansr.com>
      [wsa: use better errno]
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      10682b9e
    • Eddie James's avatar
      i2c: fsi: Fix the port number field in status register · 33cf1bda
      Eddie James authored
      commit 502035e2 upstream.
      
      The port number field in the status register was not correct, so fix it.
      
      Fixes: d6ffb630 ("i2c: Add FSI-attached I2C master algorithm")
      Signed-off-by: Eddie James <eajames@linux.ibm.com>
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      33cf1bda