  1. Sep 28, 2022
    • Jan Kara's avatar
      ext4: make directory inode spreading reflect flexbg size · 547262c5
      Jan Kara authored
      commit 613c5a85 upstream.
      
      Currently the Orlov inode allocator searches for free inodes for a
      directory only in flex block groups with at most inodes_per_group/16
      more directory inodes than average per flex block group. However with
      growing size of flex block group this becomes unnecessarily strict.
      Scale allowed difference from average directory count per flex block
      group with flex block group size as we do with other metrics.
      
      Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
      Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
      Cc: stable@kernel.org
      Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220908092136.11770-3-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      547262c5
    • Jan Kara's avatar
      ext4: fixup possible uninitialized variable access in ext4_mb_choose_next_group_cr1() · cdefe8dd
      Jan Kara authored
      commit a078dff8 upstream.
      
      Variable 'grp' may be left uninitialized if there's no group with
      suitable average fragment size (or larger). Fix the problem by
      initializing it earlier.
      
      Link: https://lore.kernel.org/r/20220922091542.pkhedytey7wzp5fi@quack3
      Fixes: 83e80a6e ("ext4: use buckets for cr 1 block scan instead of rbtree")
      Cc: stable@kernel.org
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cdefe8dd
    • Christoph Hellwig's avatar
      Revert "block: freeze the queue earlier in del_gendisk" · 48a12961
      Christoph Hellwig authored
      commit 4c66a326 upstream.
      
      This reverts commit a09b3140.
      
      Dusty Mabe reported a consistent hang during CoreOS shutdown with an MD
      RAID1 setup.  Similar hangs apparently happened before, so this patch
      is most likely not the root cause, but it made them much more
      severe.  Revert it until we can figure out what is going on
      with the md driver.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220919144049.978907-1-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      48a12961
    • Jan Kara's avatar
      ext4: use buckets for cr 1 block scan instead of rbtree · 398a0fdb
      Jan Kara authored
      commit 83e80a6e upstream.
      
      Using rbtree for sorting groups by average fragment size is relatively
      expensive (needs rbtree update on every block freeing or allocation) and
      leads to wide spreading of allocations because selection of block group
      is very sensitive both to changes in free space and amount of blocks
      allocated. Furthermore selecting group with the best matching average
      fragment size is not necessary anyway, even more so because the
      variability of fragment sizes within a group is likely large so average
      is not telling much. We just need a group with large enough average
      fragment size so that we have high probability of finding large enough
      free extent and we don't want average fragment size to be too big so
      that we are likely to find free extent only somewhat larger than what we
      need.
      
      So instead of maintaining an rbtree of groups sorted by fragment size, keep
      bins (lists) of groups where average fragment size is in the interval
      [2^i, 2^(i+1)). This structure requires fewer updates on block allocation
      / freeing, generally avoids chaotic spreading of allocations into block
      groups, and still is able to quickly (even faster than the rbtree)
      provide a block group which is likely to have a suitably sized free
      space extent.
      
      This patch reduces number of block groups used when untarring archive
      with medium sized files (size somewhat above 64k which is default
      mballoc limit for avoiding locality group preallocation) to about half
      and thus improves write speeds for eMMC flash significantly.
      
      Fixes: 196e402a ("ext4: improve cr 0 / cr 1 group scanning")
      CC: stable@kernel.org
      Reported-and-tested-by: Stefan Wahren <stefan.wahren@i2se.com>
      Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
      Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/
      Link: https://lore.kernel.org/r/20220908092136.11770-5-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      398a0fdb
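The binning described in the commit above can be illustrated with a small userspace sketch. This is not the kernel's implementation: the helper name `avg_fragment_bucket` is hypothetical, and the in-kernel code may offset or clamp the index differently. It only shows how an average fragment size maps to the index i of the interval [2^i, 2^(i+1)).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): return the index i such
 * that avg_len lies in [2^i, 2^(i+1)), i.e. floor(log2(avg_len)).
 */
static int avg_fragment_bucket(uint32_t avg_len)
{
    int order = 0;

    assert(avg_len > 0);    /* a group with no free space has no bucket */
    while (avg_len >>= 1)
        order++;
    return order;
}
```

Because the index only changes when the average crosses a power of two, most allocations and frees leave a group in its current list, which is the reduction in update cost the commit message refers to.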
    • Jan Kara's avatar
      ext4: use locality group preallocation for small closed files · 52e8d671
      Jan Kara authored
      commit a9f2a293 upstream.
      
      Currently we don't use any preallocation when a file is already closed
      when allocating blocks (from writeback code when converting delayed
      allocation). However for small files, using locality group preallocation
      is actually desirable as that is not specific to a particular file.
      Rather it is a method to pack small files together to reduce
      fragmentation, and for that the fact that the file is closed is an even
      stronger hint that the file would benefit from packing. So change the logic
      to allow locality group preallocation in this case.
      
      Fixes: 196e402a ("ext4: improve cr 0 / cr 1 group scanning")
      CC: stable@kernel.org
      Reported-and-tested-by: Stefan Wahren <stefan.wahren@i2se.com>
      Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
      Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/
      Link: https://lore.kernel.org/r/20220908092136.11770-4-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      52e8d671
    • Jan Kara's avatar
      ext4: avoid unnecessary spreading of allocations among groups · 405a6094
      Jan Kara authored
      commit 1940265e upstream.
      
      mb_set_largest_free_order() updates lists containing groups with largest
      chunk of free space of given order. The way it updates it leads to
      always moving the group to the tail of the list. Thus allocations
      looking for free space of given order effectively end up cycling through
      all groups (and due to initialization in last to first order). This
      spreads allocations among block groups which reduces performance for
      rotating disks or low-end flash media. Change
      mb_set_largest_free_order() to only update lists if the order of the
      largest free chunk in the group changed.
      
      Fixes: 196e402a ("ext4: improve cr 0 / cr 1 group scanning")
      CC: stable@kernel.org
      Reported-and-tested-by: Stefan Wahren <stefan.wahren@i2se.com>
      Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
      Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/
      Link: https://lore.kernel.org/r/20220908092136.11770-2-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      405a6094
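The behavioral change in the commit above (only touch the per-order lists when the order actually changes) can be sketched in userspace as follows. The struct and function names are hypothetical, and a counter stands in for the real list removal and tail insertion; the kernel function operates on ext4 group info and real list heads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-group allocator state. */
struct group_sketch {
    int largest_free_order;    /* order of the largest free chunk */
    unsigned int list_moves;   /* how often the group was re-queued */
};

/*
 * Record the new largest free order. The group is moved between lists
 * only when the order changes, so an unchanged group keeps its place
 * instead of being pushed to the tail on every update.
 */
static bool set_largest_free_order(struct group_sketch *grp, int new_order)
{
    assert(grp != NULL);
    if (grp->largest_free_order == new_order)
        return false;          /* no change: leave list position alone */

    grp->largest_free_order = new_order;
    grp->list_moves++;         /* stands in for list_del + list_add_tail */
    return true;
}
```

With the early return in place, repeated allocations that do not change a group's largest free order no longer rotate the group to the end of its list, which is what caused allocations to cycle through all groups.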
    • Jan Kara's avatar
      ext4: make mballoc try target group first even with mb_optimize_scan · b82d312f
      Jan Kara authored
      commit 4fca50d4 upstream.
      
      One of the side-effects of mb_optimize_scan was that the optimized
      functions to select next group to try were called even before we tried
      the goal group. As a result we no longer allocate files close to
      corresponding inodes, nor do we try to expand the currently
      allocated extent in the same group. This results in a reaim regression
      with the workfile.disk workload of up to 8% with many clients on my test
      machine:
      
                           baseline               mb_optimize_scan
      Hmean     disk-1       2114.16 (   0.00%)     2099.37 (  -0.70%)
      Hmean     disk-41     87794.43 (   0.00%)    83787.47 *  -4.56%*
      Hmean     disk-81    148170.73 (   0.00%)   135527.05 *  -8.53%*
      Hmean     disk-121   177506.11 (   0.00%)   166284.93 *  -6.32%*
      Hmean     disk-161   220951.51 (   0.00%)   207563.39 *  -6.06%*
      Hmean     disk-201   208722.74 (   0.00%)   203235.59 (  -2.63%)
      Hmean     disk-241   222051.60 (   0.00%)   217705.51 (  -1.96%)
      Hmean     disk-281   252244.17 (   0.00%)   241132.72 *  -4.41%*
      Hmean     disk-321   255844.84 (   0.00%)   245412.84 *  -4.08%*
      
      Also this is causing huge regression (time increased by a factor of 5 or
      so) when untarring archive with lots of small files on some eMMC storage
      cards.
      
      Fix the problem by making sure we try goal group first.
      
      Fixes: 196e402a ("ext4: improve cr 0 / cr 1 group scanning")
      CC: stable@kernel.org
      Reported-and-tested-by: Stefan Wahren <stefan.wahren@i2se.com>
      Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
      Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
      Link: https://lore.kernel.org/all/20220727105123.ckwrhbilzrxqpt24@quack3/
      Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220908092136.11770-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b82d312f
    • Theodore Ts'o's avatar
      ext4: limit the number of retries after discarding preallocations blocks · 17eb9845
      Theodore Ts'o authored
      commit 80fa46d6 upstream.
      
      This patch avoids threads live-locking for hours when a large number of
      threads are competing over the last few free extents as the blocks
      get added to and removed from preallocation pools.  From our bug
      reporter:
      
         A reliable way for triggering this has multiple writers
         continuously write() to files when the filesystem is full, while
         small amounts of space are freed (e.g. by truncating a large file
         -1MiB at a time). In the local filesystem, this can be done by
         simply not checking the return code of write (0) and/or the error
         (ENOSPACE) that is set. Over NFS with an async mount, even clients
         with proper error checking will behave this way since the linux NFS
         client implementation will not propagate the server errors [the
         write syscalls immediately return success] until the file handle is
         closed. This leads to a situation where NFS clients send a
         continuous stream of WRITE rpcs which result in ERRNOSPACE -- but
         since the client isn't seeing this, the stream of writes continues
         at maximum network speed.
      
         When some space does appear, multiple writers will all attempt to
         claim it for their current write. For NFS, we may see dozens to
         hundreds of threads that do this.
      
         The real-world scenario of this is database backup tooling (in
         particular, github.com/mdkent/percona-xtrabackup) which may write
         large files (>1TiB) to NFS for safe keeping. Some temporary files
         are written, rewound, and read back -- all before closing the file
         handle (the temp file is actually unlinked, to trigger automatic
         deletion on close/crash.) An application like this operating on an
         async NFS mount will not see an error code until TiB have been
         written/read.
      
         The lockup was observed when running this database backup on large
         filesystems (64 TiB in this case) with a high number of block
         groups and no free space. Fragmentation is generally not a factor
         in this filesystem (~thousands of large files, mostly contiguous
         except for the parts written while the filesystem is at capacity.)
      
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      17eb9845
    • Luís Henriques's avatar
      ext4: fix bug in extents parsing when eh_entries == 0 and eh_depth > 0 · 2f5e9de1
      Luís Henriques authored
      commit 29a5b8a1 upstream.
      
      When walking through an inode's extents, the ext4_ext_binsearch_idx() function
      assumes that the extent header has been previously validated.  However, there
      are no checks that verify that the number of entries (eh->eh_entries) is
      non-zero when depth is > 0.  This leads to problems because
      EXT_FIRST_INDEX() and EXT_LAST_INDEX() will return garbage and result in this:
      
      [  135.245946] ------------[ cut here ]------------
      [  135.247579] kernel BUG at fs/ext4/extents.c:2258!
      [  135.249045] invalid opcode: 0000 [#1] PREEMPT SMP
      [  135.250320] CPU: 2 PID: 238 Comm: tmp118 Not tainted 5.19.0-rc8+ #4
      [  135.252067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
      [  135.255065] RIP: 0010:ext4_ext_map_blocks+0xc20/0xcb0
      [  135.256475] Code:
      [  135.261433] RSP: 0018:ffffc900005939f8 EFLAGS: 00010246
      [  135.262847] RAX: 0000000000000024 RBX: ffffc90000593b70 RCX: 0000000000000023
      [  135.264765] RDX: ffff8880038e5f10 RSI: 0000000000000003 RDI: ffff8880046e922c
      [  135.266670] RBP: ffff8880046e9348 R08: 0000000000000001 R09: ffff888002ca580c
      [  135.268576] R10: 0000000000002602 R11: 0000000000000000 R12: 0000000000000024
      [  135.270477] R13: 0000000000000000 R14: 0000000000000024 R15: 0000000000000000
      [  135.272394] FS:  00007fdabdc56740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
      [  135.274510] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  135.276075] CR2: 00007ffc26bd4f00 CR3: 0000000006261004 CR4: 0000000000170ea0
      [  135.277952] Call Trace:
      [  135.278635]  <TASK>
      [  135.279247]  ? preempt_count_add+0x6d/0xa0
      [  135.280358]  ? percpu_counter_add_batch+0x55/0xb0
      [  135.281612]  ? _raw_read_unlock+0x18/0x30
      [  135.282704]  ext4_map_blocks+0x294/0x5a0
      [  135.283745]  ? xa_load+0x6f/0xa0
      [  135.284562]  ext4_mpage_readpages+0x3d6/0x770
      [  135.285646]  read_pages+0x67/0x1d0
      [  135.286492]  ? folio_add_lru+0x51/0x80
      [  135.287441]  page_cache_ra_unbounded+0x124/0x170
      [  135.288510]  filemap_get_pages+0x23d/0x5a0
      [  135.289457]  ? path_openat+0xa72/0xdd0
      [  135.290332]  filemap_read+0xbf/0x300
      [  135.291158]  ? _raw_spin_lock_irqsave+0x17/0x40
      [  135.292192]  new_sync_read+0x103/0x170
      [  135.293014]  vfs_read+0x15d/0x180
      [  135.293745]  ksys_read+0xa1/0xe0
      [  135.294461]  do_syscall_64+0x3c/0x80
      [  135.295284]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      This patch simply adds an extra check in __ext4_ext_check(), verifying that
      eh_entries is not 0 when eh_depth is > 0.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215941
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216283
      Cc: Baokun Li <libaokun1@huawei.com>
      Cc: stable@kernel.org
      Signed-off-by: Luís Henriques <lhenriques@suse.de>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Baokun Li <libaokun1@huawei.com>
      Link: https://lore.kernel.org/r/20220822094235.2690-1-lhenriques@suse.de
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f5e9de1
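The extra validation described in the commit above amounts to rejecting an index node that claims to have no entries. A minimal userspace sketch follows, assuming a simplified header: the struct and function names are hypothetical, the real on-disk header has more fields and stores them little-endian, and the real check lives in __ext4_ext_check().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical view of an extent header (host byte order). */
struct extent_header_sketch {
    uint16_t entries;   /* number of valid entries */
    uint16_t max;       /* capacity of the node */
    uint16_t depth;     /* 0 for a leaf, > 0 for an index node */
};

/* Return true when the header is safe to walk. */
static bool extent_header_is_sane(const struct extent_header_sketch *eh)
{
    assert(eh != NULL);
    if (eh->entries > eh->max)
        return false;   /* more entries than the node can hold */
    if (eh->entries == 0 && eh->depth > 0)
        return false;   /* index node with nothing to descend into */
    return true;
}
```

An empty leaf (depth 0, no entries) is still accepted here, since an inode without extents is legitimate; only the combination of zero entries and non-zero depth is treated as corruption.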
    • Dan Williams's avatar
      devdax: Fix soft-reservation memory description · 034ef0c4
      Dan Williams authored
      commit 67feaba4 upstream.
      
      The "hmem" platform-devices that are created to represent the
      platform-advertised "Soft Reserved" memory ranges end up inserting a
      resource that causes the iomem_resource tree to look like this:
      
      340000000-43fffffff : hmem.0
        340000000-43fffffff : Soft Reserved
          340000000-43fffffff : dax0.0
      
      This is because insert_resource() reparents ranges when they completely
      intersect an existing range.
      
      This matters because code that uses region_intersects() to scan for a
      given IORES_DESC will only check that top-level 'hmem.0' resource and
      not the 'Soft Reserved' descendant.
      
      So, to support EINJ (via einj_error_inject()) to inject errors into
      memory hosted by a dax-device, be sure to describe the memory as
      IORES_DESC_SOFT_RESERVED. This is a follow-on to:
      
      commit b13a3e5f ("ACPI: APEI: Fix _EINJ vs EFI_MEMORY_SP")
      
      ...that fixed EINJ support for "Soft Reserved" ranges in the first
      instance.
      
      Fixes: 262b45ae ("x86/efi: EFI soft reservation to E820 enumeration")
      Reported-by: Ricardo Sandoval Torres <ricardo.sandoval.torres@intel.com>
      Tested-by: Ricardo Sandoval Torres <ricardo.sandoval.torres@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Omar Avelar <omar.avelar@intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mark Gross <markgross@kernel.org>
      Link: https://lore.kernel.org/r/166397075670.389916.7435722208896316387.stgit@dwillia2-xfh.jf.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      034ef0c4
    • Nick Desaulniers's avatar
      Makefile.debug: re-enable debug info for .S files · 27d5563e
      Nick Desaulniers authored
      [ Upstream commit 32ef9e50 ]
      
      Alexey reported that the fraction of unknown filename instances in
      kallsyms grew from ~0.3% to ~10% recently; Bill and Greg tracked it down
      to assembler defined symbols, which regressed as a result of:
      
      commit b8a90923 ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1")
      
      In that commit, I allude to restoring debug info for assembler defined
      symbols in a follow up patch, but it seems I forgot to do so in
      
      commit a66049e2 ("Kbuild: make DWARF version a choice")
      
      Link: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=31bf18645d98b4d3d7357353be840e320649a67d
      Fixes: b8a90923 ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1")
      Reported-by: Alexey Alexandrov <aalexand@google.com>
      Reported-by: Bill Wendling <morbo@google.com>
      Reported-by: Greg Thelen <gthelen@google.com>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      27d5563e
    • Nick Desaulniers's avatar
      Makefile.debug: set -g unconditional on CONFIG_DEBUG_INFO_SPLIT · 6ba8627f
      Nick Desaulniers authored
      [ Upstream commit 61f2b7c7 ]
      
      Dmitrii, Fangrui, and Masahiro note:
      
        Before GCC 11 and Clang 12 -gsplit-dwarf implicitly uses -g2.
      
      Fix CONFIG_DEBUG_INFO_SPLIT for gcc-11+ & clang-12+ which now need -g
      specified in order for -gsplit-dwarf to work at all.
      
      -gsplit-dwarf has been mutually exclusive with -g since support for
      CONFIG_DEBUG_INFO_SPLIT was introduced in
      commit 866ced95 ("kbuild: Support split debug info v4").
      I don't think it ever needed to be.
      
      Link: https://lore.kernel.org/lkml/20220815013317.26121-1-dmitrii.bundin.a@gmail.com/
      Link: https://lore.kernel.org/lkml/CAK7LNARPAmsJD5XKAw7m_X2g7Fi-CAAsWDQiP7+ANBjkg7R7ng@mail.gmail.com/
      Link: https://reviews.llvm.org/D80391
      Cc: Andi Kleen <ak@linux.intel.com>
      Reported-by: Dmitrii Bundin <dmitrii.bundin.a@gmail.com>
      Reported-by: Fangrui Song <maskray@google.com>
      Reported-by: Masahiro Yamada <masahiroy@kernel.org>
      Suggested-by: Dmitrii Bundin <dmitrii.bundin.a@gmail.com>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Stable-dep-of: 32ef9e50 ("Makefile.debug: re-enable debug info for .S files")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6ba8627f
    • Masahiro Yamada's avatar
      certs: make system keyring depend on built-in x509 parser · c4f8b89f
      Masahiro Yamada authored
      [ Upstream commit 2154aca2 ]
      
      Commit e9088629 ("certs: make system keyring depend on x509 parser")
      is not the right fix because x509_load_certificate_list() can be modular.
      
      The combination of CONFIG_SYSTEM_TRUSTED_KEYRING=y and
      CONFIG_X509_CERTIFICATE_PARSER=m still results in the following error:
      
          LD      .tmp_vmlinux.kallsyms1
        ld: certs/system_keyring.o: in function `load_system_certificate_list':
        system_keyring.c:(.init.text+0x8c): undefined reference to `x509_load_certificate_list'
        make: *** [Makefile:1169: vmlinux] Error 1
      
      Fixes: e9088629 ("certs: make system keyring depend on x509 parser")
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Tested-by: Adam Borowski <kilobyte@angband.pl>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c4f8b89f
    • Alex Deucher's avatar
      drm/amdgpu: don't register a dirty callback for non-atomic · c2eab6fa
      Alex Deucher authored
      [ Upstream commit abbc7a3d ]
      
      Some asics still support non-atomic code paths.
      
      Fixes: 66f99628 ("drm/amdgpu: use dirty framebuffer helper")
      Reported-by: Arthur Marsh <arthur.marsh@internode.on.net>
      Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c2eab6fa
    • Dan Carpenter's avatar
      i2c: mux: harden i2c_mux_alloc() against integer overflows · 7f0dcbb0
      Dan Carpenter authored
      [ Upstream commit b7af938f ]
      
      A couple of years back we went through the kernel and automatically
      converted size calculations to use struct_size() instead.  The
      struct_size() calculation is protected against integer overflows.
      
      However it does not make sense to use the result from struct_size()
      for additional math operations as that would negate any safeness.
      
      Fixes: 1f3b69b6 ("i2c: mux: Use struct_size() in devm_kzalloc()")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Peter Rosin <peda@axentia.se>
      Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7f0dcbb0
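The point made in the commit above, that further arithmetic on an overflow-protected size undoes the protection, can be shown with a userspace sketch. The helpers below are hypothetical stand-ins, not the kernel's own macros; they assume a GCC or Clang compiler for the `__builtin_*_overflow` builtins and saturate at SIZE_MAX so that a later allocation would fail rather than be undersized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating add: returns SIZE_MAX instead of wrapping around. */
static size_t sat_add(size_t a, size_t b)
{
    size_t r;

    return __builtin_add_overflow(a, b, &r) ? SIZE_MAX : r;
}

/* Saturating multiply: returns SIZE_MAX instead of wrapping around. */
static size_t sat_mul(size_t a, size_t b)
{
    size_t r;

    return __builtin_mul_overflow(a, b, &r) ? SIZE_MAX : r;
}

/*
 * Size of a header followed by `count` elements plus `extra` private
 * bytes. Every step saturates, so an overflow anywhere yields SIZE_MAX.
 */
static size_t total_alloc_size(size_t header, size_t elem, size_t count,
                               size_t extra)
{
    return sat_add(sat_add(header, sat_mul(elem, count)), extra);
}
```

If the final step used a plain `+` instead of `sat_add()`, a saturated intermediate result of SIZE_MAX plus 32 extra bytes would wrap to 31, and the allocation would silently succeed with far too little memory.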
    • Asmaa Mnebhi's avatar
      i2c: mlxbf: Fix frequency calculation · 4925e5e9
      Asmaa Mnebhi authored
      [ Upstream commit 37f071ec ]
      
      The i2c-mlxbf.c driver is currently broken because there is a bug
      in the calculation of the frequency. core_f, core_r and core_od
      are components read from hardware registers and are used to
      compute the frequency used to compute different timing parameters.
      The shifting mechanism used to get core_f, core_r and core_od is
      wrong. Use FIELD_GET to mask and shift the bitfields properly.
      
      Fixes: b5b5b320 (i2c: mlxbf: I2C SMBus driver for Mellanox BlueField SoC)
      Reviewed-by: Khalil Blaiech <kblaiech@nvidia.com>
      Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4925e5e9
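The mask-and-shift operation that FIELD_GET performs can be sketched in plain C as below. This is a userspace illustration, not the kernel macro: the function name is hypothetical, and the masks used in the usage note are made-up values, not the BlueField register layout.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical equivalent of a field extraction: keep the bits selected
 * by `mask`, then shift them down so the field starts at bit 0. The
 * lowest set bit of the mask is isolated with (mask & -mask); dividing
 * by that power of two is the same as shifting right by its position.
 */
static uint32_t field_get_sketch(uint32_t mask, uint32_t reg)
{
    assert(mask != 0);  /* an empty mask selects no field */
    return (reg & mask) / (mask & (~mask + 1u));
}
```

For example, with a register value of 0x12345678 and a made-up mask of 0x0000FF00, the helper returns 0x56. Deriving the shift from the mask itself removes the class of bug the commit fixes, where a hand-written shift amount does not match the field's position.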
    • Asmaa Mnebhi's avatar
      i2c: mlxbf: prevent stack overflow in mlxbf_i2c_smbus_start_transaction() · 3b5ab5fb
      Asmaa Mnebhi authored
      [ Upstream commit de24aceb ]
      
      memcpy() is called in a loop while 'operation->length' upper bound
      is not checked and 'data_idx' also increments.
      
      Fixes: b5b5b320 ("i2c: mlxbf: I2C SMBus driver for Mellanox BlueField SoC")
      Reviewed-by: Khalil Blaiech <kblaiech@nvidia.com>
      Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3b5ab5fb
    • Asmaa Mnebhi's avatar
      i2c: mlxbf: incorrect base address passed during io write · 5a7547ee
      Asmaa Mnebhi authored
      [ Upstream commit 2a5be6d1 ]
      
      Correct the base address used during io write.
      This bug had no impact over the overall functionality of the read and write
      transactions. MLXBF_I2C_CAUSE_OR_CLEAR=0x18 so writing to (smbus->io + 0x18)
      instead of (mst_cause->ioi + 0x18) actually writes to the sc_low_timeout
      register which just sets the timeout value before a read/write aborts.
      
      Fixes: b5b5b320 (i2c: mlxbf: I2C SMBus driver for Mellanox BlueField SoC)
      Reviewed-by: Khalil Blaiech <kblaiech@nvidia.com>
      Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5a7547ee
    • Uwe Kleine-König's avatar
      i2c: imx: If pm_runtime_get_sync() returned 1 device access is possible · e46e177f
      Uwe Kleine-König authored
      [ Upstream commit 085aacaa ]
      
      pm_runtime_get_sync() returning 1 also means the device is powered. So
      resetting the chip registers in .remove() is possible and should be
      done.
      
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: d98bdd3a ("i2c: imx: Make sure to unregister adapter on remove()")
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e46e177f
    • Tetsuo Handa's avatar
      workqueue: don't skip lockdep work dependency in cancel_work_sync() · c9245ea4
      Tetsuo Handa authored
      [ Upstream commit c0feea59 ]
      
      Like Hillf Danton mentioned
      
        syzbot should have been able to catch cancel_work_sync() in work context
        by checking lockdep_map in __flush_work() for both flush and cancel.
      
      in [1], failing to report the obvious deadlock scenario shown below is a
      bug. From a locking dependency perspective, the sync version of a cancel
      request should behave like a flush request, because it waits for completion
      of the work if that work has already started execution.
      
        ----------
        #include <linux/module.h>
        #include <linux/sched.h>
        static DEFINE_MUTEX(mutex);
        static void work_fn(struct work_struct *work)
        {
          schedule_timeout_uninterruptible(HZ / 5);
          mutex_lock(&mutex);
          mutex_unlock(&mutex);
        }
        static DECLARE_WORK(work, work_fn);
        static int __init test_init(void)
        {
          schedule_work(&work);
          schedule_timeout_uninterruptible(HZ / 10);
          mutex_lock(&mutex);
          cancel_work_sync(&work);
          mutex_unlock(&mutex);
          return -EINVAL;
        }
        module_init(test_init);
        MODULE_LICENSE("GPL");
        ----------
      
      The check this patch restores was added by commit 0976dfc1
      ("workqueue: Catch more locking problems with flush_work()").
      
      Then, lockdep's crossrelease feature was added by commit b09be676
      ("locking/lockdep: Implement the 'crossrelease' feature"). As a result,
      this check was once removed by commit fd1a5b04 ("workqueue: Remove
      now redundant lock acquisitions wrt. workqueue flushes").
      
      But lockdep's crossrelease feature was removed by commit e966eaee
      ("locking/lockdep: Remove the cross-release locking checks"). At this
      point, this check should have been restored.
      
      Then, commit d6e89786 ("workqueue: skip lockdep wq dependency in
      cancel_work_sync()") introduced a boolean flag in order to distinguish
      flush_work() and cancel_work_sync(), for checking "struct workqueue_struct"
      dependency when called from cancel_work_sync() was causing false positives.
      
      Then, commit 87915adc ("workqueue: re-add lockdep dependencies for
      flushing") tried to restore the "struct work_struct" dependency check, but
      mistakenly checked this boolean flag. As the example shown above indicates,
      the "struct work_struct" dependency needs to be checked for both flush_work()
      and cancel_work_sync().
      
      Link: https://lkml.kernel.org/r/20220504044800.4966-1-hdanton@sina.com [1]
      Reported-by: Hillf Danton <hdanton@sina.com>
      Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com>
      Fixes: 87915adc ("workqueue: re-add lockdep dependencies for flushing")
      Cc: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c9245ea4
    • Li Jinlin's avatar
      fsdax: Fix infinite loop in dax_iomap_rw() · 60644dff
      Li Jinlin authored
      [ Upstream commit 17d9c15c ]
      
      I got an infinite loop and a WARNING report when executing a tail command
      in virtiofs.
      
        WARNING: CPU: 10 PID: 964 at fs/iomap/iter.c:34 iomap_iter+0x3a2/0x3d0
        Modules linked in:
        CPU: 10 PID: 964 Comm: tail Not tainted 5.19.0-rc7
        Call Trace:
        <TASK>
        dax_iomap_rw+0xea/0x620
        ? __this_cpu_preempt_check+0x13/0x20
        fuse_dax_read_iter+0x47/0x80
        fuse_file_read_iter+0xae/0xd0
        new_sync_read+0xfe/0x180
        ? 0xffffffff81000000
        vfs_read+0x14d/0x1a0
        ksys_read+0x6d/0xf0
        __x64_sys_read+0x1a/0x20
        do_syscall_64+0x3b/0x90
        entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The tail command will call read() with a count of 0. In this case,
      iomap_iter() will report this WARNING and always return 1, which causes
      the infinite loop in dax_iomap_rw().
      
      Fix this by checking whether count is 0 in dax_iomap_rw().
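
A minimal userspace sketch of the failure mode and the fix, not the kernel code. The names `toy_iter`, `toy_iter_next()` and `toy_rw()` are hypothetical stand-ins. The only behavior taken from the description above is that the iterator keeps returning 1 for a zero-length request, so a caller looping on the return value never terminates unless it bails out early.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an iomap-style iterator. */
struct toy_iter {
    size_t len;          /* bytes still to process */
    size_t processed;    /* bytes handled by the previous loop pass */
    unsigned long calls; /* how often toy_iter_next() ran */
};

/*
 * Mimics the behavior described above: for a zero-length request it
 * always returns 1, so a caller looping on the result spins forever.
 */
static int toy_iter_next(struct toy_iter *it)
{
    it->calls++;
    if (it->len == 0)
        return 1;                  /* the problematic case */
    assert(it->processed <= it->len);
    it->len -= it->processed;
    it->processed = 0;
    return it->len > 0;
}

/*
 * Returns the number of bytes "transferred". max_calls exists only so
 * the demo can stop a loop that would otherwise never end.
 */
static size_t toy_rw(size_t count, int check_zero, unsigned long max_calls,
                     unsigned long *calls_out)
{
    struct toy_iter it = { .len = count };
    size_t done = 0;

    if (check_zero && count == 0) { /* the fix: nothing to do */
        *calls_out = 0;
        return 0;
    }

    while (toy_iter_next(&it) > 0) {
        if (it.calls >= max_calls)  /* demo guard against spinning */
            break;
        it.processed = it.len;      /* pretend the whole range was copied */
        done += it.processed;
    }
    *calls_out = it.calls;
    return done;
}
```

With `check_zero` disabled, a zero-length request only stops because of the demo guard; with it enabled, the iterator is never entered.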
      
      Fixes: ca289e0b ("fsdax: switch dax_iomap_rw to use iomap_iter")
      Signed-off-by: Li Jinlin <lijinlin3@huawei.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Link: https://lore.kernel.org/r/20220725032050.3873372-1-lijinlin3@huawei.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      60644dff
    • Jane Chu's avatar
      pmem: fix a name collision · 8054beba
      Jane Chu authored
      [ Upstream commit 149d1714 ]
      
      The kernel test robot detected a name collision when compiling for the
      'um' architecture. Rename "to_phys()" to "pmem_to_phys()".
      
      >> drivers/nvdimm/pmem.c:48:20: error: conflicting types for 'to_phys'; have 'phys_addr_t(struct pmem_device *, phys_addr_t)' {aka 'long long unsigned int(struct pmem_device *, long long unsigned int)'}
            48 | static phys_addr_t to_phys(struct pmem_device *pmem, phys_addr_t offset)
               |                    ^~~~~~~
         In file included from arch/um/include/asm/page.h:98,
                          from arch/um/include/asm/thread_info.h:15,
                          from include/linux/thread_info.h:60,
                          from include/asm-generic/preempt.h:5,
                          from ./arch/um/include/generated/asm/preempt.h:1,
      
         arch/um/include/shared/mem.h:12:29: note: previous definition of 'to_phys' with type 'long unsigned int(void *)'
            12 | static inline unsigned long to_phys(void *virt)
               |                             ^~~~~~~
      
      vim +48 drivers/nvdimm/pmem.c
          47
        > 48	static phys_addr_t to_phys(struct pmem_device *pmem, phys_addr_t offset)
          49	{
          50		return pmem->phys_addr + offset;
          51	}
          52
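
The rename can be sketched in userspace as follows. This is an illustration under assumptions, not the driver source: `phys_addr_t` is replaced by a `uint64_t` typedef, `struct pmem_device` is reduced to one field, and the first `to_phys()` merely stands in for the architecture header that already owns the short name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;   /* stand-in for the kernel type */

/* Stand-in for an arch header that already defines the short name. */
static inline unsigned long to_phys(void *virt)
{
    return (unsigned long)(uintptr_t)virt;
}

/* Hypothetical, reduced version of the driver structure. */
struct pmem_device {
    phys_addr_t phys_addr;
};

/* Driver-prefixed name: it no longer collides with to_phys() above. */
static phys_addr_t pmem_to_phys(struct pmem_device *pmem, phys_addr_t offset)
{
    assert(pmem);
    return pmem->phys_addr + offset;
}
```

Both functions now coexist in one translation unit, which is the situation the 'um' build produced.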
      
      Fixes: 9409c9b6 (pmem: refactor pmem_clear_poison())
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Jane Chu <jane.chu@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220630182802.3250449-1-jane.chu@oracle.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8054beba
    • Sergio Paracuellos's avatar
      gpio: mt7621: Make the irqchip immutable · c62322e6
      Sergio Paracuellos authored
      [ Upstream commit 09eed5a1 ]
      
      Commit 6c846d02 ("gpio: Don't fiddle with irqchips marked as
      immutable") added a warning to indicate if the gpiolib is altering the
      internals of irqchips.  Following this change the following warnings
      are now observed for the mt7621 driver:
      
      gpio gpiochip0: (1e000600.gpio-bank0): not an immutable chip, please consider fixing it!
      gpio gpiochip1: (1e000600.gpio-bank1): not an immutable chip, please consider fixing it!
      gpio gpiochip2: (1e000600.gpio-bank2): not an immutable chip, please consider fixing it!
      
      Fix this by making the irqchip in the mt7621 driver immutable.
      
      Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
      Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c62322e6
    • Nathan Huckleberry's avatar
      drm/rockchip: Fix return type of cdn_dp_connector_mode_valid · 2d57e46f
      Nathan Huckleberry authored
      [ Upstream commit b0b9408f ]
      
      The mode_valid field in drm_connector_helper_funcs is expected to be of
      type:
      enum drm_mode_status (* mode_valid) (struct drm_connector *connector,
      				     struct drm_display_mode *mode);
      
      The mismatched return type breaks forward edge kCFI since the underlying
      function definition does not match the function hook definition.
      
      The return type of cdn_dp_connector_mode_valid should be changed from
      int to enum drm_mode_status.
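
A reduced sketch of the type match that the paragraph above asks for. The struct and enum definitions here are simplified stand-ins, and `connector_helper_funcs` and `toy_connector_mode_valid()` are hypothetical names. The point is only that the callback's return type equals the hook's return type, so the function pointer is assigned without a cast.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the DRM types named above. */
enum drm_mode_status { MODE_OK = 0, MODE_CLOCK_HIGH };
struct drm_connector { int max_clock_khz; };
struct drm_display_mode { int clock_khz; };

/* Hypothetical hook table with the prototype quoted above. */
struct connector_helper_funcs {
    enum drm_mode_status (*mode_valid)(struct drm_connector *connector,
                                       struct drm_display_mode *mode);
};

/* Return type matches the hook exactly, so no cast is needed below. */
static enum drm_mode_status
toy_connector_mode_valid(struct drm_connector *connector,
                         struct drm_display_mode *mode)
{
    assert(connector != NULL && mode != NULL);
    return mode->clock_khz <= connector->max_clock_khz ? MODE_OK
                                                       : MODE_CLOCK_HIGH;
}

static const struct connector_helper_funcs toy_helper_funcs = {
    .mode_valid = toy_connector_mode_valid,
};
```

Had the callback been declared as returning `int`, the assignment would need a cast or draw an incompatible-pointer-type diagnostic, and an indirect call through the hook would not match the type an indirect-call checker expects.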
      
      Reported-by: Dan Carpenter <error27@gmail.com>
      Link: https://github.com/ClangBuiltLinux/linux/issues/1703
      Cc: llvm@lists.linux.dev
      Signed-off-by: Nathan Huckleberry <nhuck@google.com>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Heiko Stuebner <heiko@sntech.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220913205555.155149-1-nhuck@google.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2d57e46f
    • Nathan Chancellor's avatar
      drm/amd/display: Mark dml30's UseMinimumDCFCLK() as noinline for stack usage · 4822afcf
      Nathan Chancellor authored
      [ Upstream commit 41012d71 ]
      
      This function consumes a lot of stack space and it blows up the size of
      dml30_ModeSupportAndSystemConfigurationFull() with clang:
      
        drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn30/display_mode_vba_30.c:3542:6: error: stack frame size (2200) exceeds limit (2048) in 'dml30_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
        void dml30_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
             ^
        1 error generated.
      
      Commit a0f7e7f7 ("drm/amd/display: fix i386 frame size warning")
      aimed to address this for i386 but it did not help x86_64.
      
      To reduce the amount of stack space that
      dml30_ModeSupportAndSystemConfigurationFull() uses, mark
      UseMinimumDCFCLK() as noinline, using the _for_stack variant for
      documentation. While this will increase the total amount of stack usage
      between the two functions (1632 and 1304 bytes respectively), it will
      make sure both stay below the limit of 2048 bytes for these files. The
      aforementioned change does help reduce UseMinimumDCFCLK()'s stack usage
      so it should not be reverted in favor of this change.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/1681
      Reported-by: "Sudip Mukherjee (Codethink)" <sudipm.mukherjee@gmail.com>
      Tested-by: Maíra Canal <mairacanal@riseup.net>
      Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4822afcf
    • Nathan Chancellor's avatar
      drm/amd/display: Reduce number of arguments of dml31's CalculateFlipSchedule() · 6f14c55d
      Nathan Chancellor authored
      [ Upstream commit 21485d3d ]
      
      Most of the arguments are identical between the two call sites and they
      can be accessed through the 'struct vba_vars_st' pointer. This reduces
      the total amount of stack space that
      dml31_ModeSupportAndSystemConfigurationFull() uses by 112 bytes with
      LLVM 16 (1976 -> 1864), helping clear up the following clang warning:
      
        drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c:3908:6: error: stack frame size (2216) exceeds limit (2048) in 'dml31_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
        void dml31_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
            ^
        1 error generated.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/1681
      Reported-by: "Sudip Mukherjee (Codethink)" <sudipm.mukherjee@gmail.com>
      Tested-by: Maíra Canal <mairacanal@riseup.net>
      Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6f14c55d
    • Nathan Chancellor's avatar
      drm/amd/display: Reduce number of arguments of dml31's... · 8836e42e
      Nathan Chancellor authored
      drm/amd/display: Reduce number of arguments of dml31's CalculateWatermarksAndDRAMSpeedChangeSupport()
      
      [ Upstream commit 37934d41 ]
      
      Most of the arguments are identical between the two call sites and they
      can be accessed through the 'struct vba_vars_st' pointer. This reduces
      the total amount of stack space that
      dml31_ModeSupportAndSystemConfigurationFull() uses by 240 bytes with
      LLVM 16 (2216 -> 1976), helping clear up the following clang warning:
      
        drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c:3908:6: error: stack frame size (2216) exceeds limit (2048) in 'dml31_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
        void dml31_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
            ^
        1 error generated.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/1681
      Reported-by: "Sudip Mukherjee (Codethink)" <sudipm.mukherjee@gmail.com>
      Tested-by: Maíra Canal <mairacanal@riseup.net>
      Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8836e42e
    • Yao Wang1's avatar
      drm/amd/display: Limit user regamma to a valid value · 88e78969
      Yao Wang1 authored
      [ Upstream commit 3601d620 ]
      
      [Why]
      In HDR mode we get a total of 512 tf_point entries. After switching to
      SDR mode we actually get only 400, and the remaining points (401~512)
      still hold stale values from HDR mode. We should limit the remaining
      points to the max value.
      
      [How]
      Limit the value when coordinates_x.x > 1, just like what we do in
      translate_from_linear_space for other re-gamma build paths.
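
The clamping rule in [How] can be sketched as below. This is a float-based illustration under assumptions, not the display code: `toy_tf_point` and `toy_limit_regamma()` are hypothetical, and the maximum output value is passed in as `max_y` rather than taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, float-based stand-in for one transfer-function point. */
struct toy_tf_point {
    float x;   /* input coordinate; values above 1.0 are out of range */
    float y;   /* output value */
};

/*
 * Set every point whose x exceeds 1.0 to max_y, leaving in-range
 * points untouched. Returns the number of points that were limited.
 */
static size_t toy_limit_regamma(struct toy_tf_point *pts, size_t n,
                                float max_y)
{
    size_t limited = 0;

    assert(pts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pts[i].x > 1.0f) {
            pts[i].y = max_y;
            limited++;
        }
    }
    return limited;
}
```

Points left over from a longer table then carry a defined value instead of whatever the previous mode stored there.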
      
      Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
      Reviewed-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
      Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
      Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com>
      Signed-off-by: Yao Wang1 <Yao.Wang1@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      88e78969
    • Candice Li's avatar
      drm/amdgpu: Skip reset error status for psp v13_0_0 · 9757b3ad
      Candice Li authored
      [ Upstream commit 86875d55 ]
      
      There is no need to reset the error status, since only UMC RAS is
      supported on psp v13_0_0.
      
      Signed-off-by: Candice Li <candice.li@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9757b3ad
    • Alex Deucher's avatar
      drm/amdgpu: add HDP remap functionality to nbio 7.7 · 83dfcae6
      Alex Deucher authored
      [ Upstream commit 8c5708d3 ]
      
      This was missing before and would have resulted in a write to
      a non-existent register. Normally APUs don't use HDP, but
      other asics could use this code and APUs do use the HDP
      when used in passthrough.
      
      Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      83dfcae6
    • Yang Wang's avatar
      drm/amdgpu: change the alignment size of TMR BO to 1M · 386ca672
      Yang Wang authored
      [ Upstream commit 36de13fd ]
      
      Aligning the TMR BO to the TMR size is not necessary.
      Change the alignment to 1M to avoid failures when re-creating
      the BO under serious VRAM fragmentation.
      
      v2:
      add new macro PSP_TMR_ALIGNMENT for TMR BO alignment size
      
      Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      386ca672
    • Hamza Mahfooz's avatar
      drm/amdgpu: use dirty framebuffer helper · 8442bc84
      Hamza Mahfooz authored
      [ Upstream commit 66f99628 ]
      
      Currently, we aren't handling DRM_IOCTL_MODE_DIRTYFB. So, use
      drm_atomic_helper_dirtyfb() as the dirty callback in the amdgpu_fb_funcs
      struct.
      
      Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8442bc84
    • Guchun Chen's avatar
      drm/amd/pm: disable BACO entry/exit completely on several sienna cichlid cards · 444574f8
      Guchun Chen authored
      [ Upstream commit 7c6fb61a ]
      
      To avoid intermittent hardware failures.
      
      Signed-off-by: Guchun Chen <guchun.chen@amd.com>
      Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      444574f8
    • Linus Walleij's avatar
      gpio: ixp4xx: Make irqchip immutable · a5de0801
      Linus Walleij authored
      [ Upstream commit 94e9bc73 ]
      
      This turns the IXP4xx GPIO irqchip into an immutable
      irqchip, a bit different from the standard template due
      to being hierarchical.
      
      Tested on the IXP4xx which uses drivers/ata/pata_ixp4xx_cf.c
      for a rootfs on compact flash with IRQs from this GPIO
      block to the CF ATA controller.
      
      Cc: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Acked-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a5de0801
    • Hans de Goede's avatar
      drm/gma500: Fix (vblank) IRQs not working after suspend/resume · 7718cac8
      Hans de Goede authored
      [ Upstream commit 235fdbc3 ]
      
      Fix gnome-shell (and other page-flip users) hanging after suspend/resume
      because of the gma500's IRQs not working.
      
      This fixes 2 problems with the IRQ handling:
      
      1. gma_power_off() calls gma_irq_uninstall() which does a free_irq(), but
         gma_power_on() called gma_irq_preinstall() + gma_irq_postinstall() which
         do not call request_irq. Replace the pre- + post-install calls with
         gma_irq_install() which does prep + request + post.
      
      2. After fixing 1. IRQs still do not work on a Packard Bell Dot SC (Intel
         Atom N2600, cedarview) netbook.
      
         Cedarview uses MSI interrupts and it seems that the BIOS re-configures
         things back to normal APIC based interrupts during S3 suspend. There is
         some MSI PCI-config registers save/restore code which tries to deal with
         this, but on the Packard Bell Dot SC this is not sufficient to restore
         MSI IRQ functionality after a suspend/resume.
      
         Replace the PCI-config registers save/restore with pci_disable_msi() on
         suspend + pci_enable_msi() on resume. Fixing e.g. gnome-shell hanging.
      
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220906203852.527663-4-hdegoede@redhat.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7718cac8
    • Hans de Goede's avatar
      drm/gma500: Fix WARN_ON(lock->magic != lock) error · 55c077d9
      Hans de Goede authored
      [ Upstream commit b6f25c3b ]
      
      psb_gem_unpin() calls dma_resv_lock() but the underlying ww_mutex
      gets destroyed by drm_gem_object_release(). Move the
      drm_gem_object_release() call in psb_gem_free_object() to after
      the unpin to fix the warning below:
      
      [   79.693962] ------------[ cut here ]------------
      [   79.693992] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
      [   79.694015] WARNING: CPU: 0 PID: 240 at kernel/locking/mutex.c:582 __ww_mutex_lock.constprop.0+0x569/0xfb0
      [   79.694052] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer qrtr bnep ath9k ath9k_common ath9k_hw snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel ath3k snd_intel_dspcfg mac80211 snd_intel_sdw_acpi btusb snd_hda_codec btrtl btbcm btintel btmtk bluetooth at24 snd_hda_core snd_hwdep uvcvideo snd_seq libarc4 videobuf2_vmalloc ath videobuf2_memops videobuf2_v4l2 videobuf2_common snd_seq_device videodev acer_wmi intel_powerclamp coretemp mc snd_pcm joydev sparse_keymap ecdh_generic pcspkr wmi_bmof cfg80211 i2c_i801 i2c_smbus snd_timer snd r8169 rfkill lpc_ich soundcore acpi_cpufreq zram rtsx_pci_sdmmc mmc_core serio_raw rtsx_pci gma500_gfx(E) video wmi ip6_tables ip_tables i2c_dev fuse
      [   79.694436] CPU: 0 PID: 240 Comm: plymouthd Tainted: G        W   E      6.0.0-rc3+ #490
      [   79.694457] Hardware name: Packard Bell dot s/SJE01_CT, BIOS V1.10 07/23/2013
      [   79.694469] RIP: 0010:__ww_mutex_lock.constprop.0+0x569/0xfb0
      [   79.694496] Code: ff 85 c0 0f 84 15 fb ff ff 8b 05 ca 3c 11 01 85 c0 0f 85 07 fb ff ff 48 c7 c6 30 cb 84 aa 48 c7 c7 a3 e1 82 aa e8 ac 29 f8 ff <0f> 0b e9 ed fa ff ff e8 5b 83 8a ff 85 c0 74 10 44 8b 0d 98 3c 11
      [   79.694513] RSP: 0018:ffffad1dc048bbe0 EFLAGS: 00010282
      [   79.694623] RAX: 0000000000000028 RBX: 0000000000000000 RCX: 0000000000000000
      [   79.694636] RDX: 0000000000000001 RSI: ffffffffaa8b0ffc RDI: 00000000ffffffff
      [   79.694650] RBP: ffffad1dc048bc80 R08: 0000000000000000 R09: ffffad1dc048ba90
      [   79.694662] R10: 0000000000000003 R11: ffffffffaad62fe8 R12: ffff9ff302103138
      [   79.694675] R13: ffff9ff306ec8000 R14: ffff9ff307779078 R15: ffff9ff3014c0270
      [   79.694690] FS:  00007ff1cccf1740(0000) GS:ffff9ff3bc200000(0000) knlGS:0000000000000000
      [   79.694705] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   79.694719] CR2: 0000559ecbcb4420 CR3: 0000000013210000 CR4: 00000000000006f0
      [   79.694734] Call Trace:
      [   79.694749]  <TASK>
      [   79.694761]  ? __schedule+0x47f/0x1670
      [   79.694796]  ? psb_gem_unpin+0x27/0x1a0 [gma500_gfx]
      [   79.694830]  ? lock_is_held_type+0xe3/0x140
      [   79.694864]  ? ww_mutex_lock+0x38/0xa0
      [   79.694885]  ? __cond_resched+0x1c/0x30
      [   79.694902]  ww_mutex_lock+0x38/0xa0
      [   79.694925]  psb_gem_unpin+0x27/0x1a0 [gma500_gfx]
      [   79.694964]  psb_gem_unpin+0x199/0x1a0 [gma500_gfx]
      [   79.694996]  drm_gem_object_release_handle+0x50/0x60
      [   79.695020]  ? drm_gem_object_handle_put_unlocked+0xf0/0xf0
      [   79.695042]  idr_for_each+0x4b/0xb0
      [   79.695066]  ? _raw_spin_unlock_irqrestore+0x30/0x60
      [   79.695095]  drm_gem_release+0x1c/0x30
      [   79.695118]  drm_file_free.part.0+0x1ea/0x260
      [   79.695150]  drm_release+0x6a/0x120
      [   79.695175]  __fput+0x9f/0x260
      [   79.695203]  task_work_run+0x59/0xa0
      [   79.695227]  do_exit+0x387/0xbe0
      [   79.695250]  ? seqcount_lockdep_reader_access.constprop.0+0x82/0x90
      [   79.695275]  ? lockdep_hardirqs_on+0x7d/0x100
      [   79.695304]  do_group_exit+0x33/0xb0
      [   79.695331]  __x64_sys_exit_group+0x14/0x20
      [   79.695353]  do_syscall_64+0x58/0x80
      [   79.695376]  ? up_read+0x17/0x20
      [   79.695401]  ? lock_is_held_type+0xe3/0x140
      [   79.695429]  ? asm_exc_page_fault+0x22/0x30
      [   79.695450]  ? lockdep_hardirqs_on+0x7d/0x100
      [   79.695473]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [   79.695493] RIP: 0033:0x7ff1ccefe3f1
      [   79.695516] Code: Unable to access opcode bytes at RIP 0x7ff1ccefe3c7.
      [   79.695607] RSP: 002b:00007ffed4413378 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      [   79.695629] RAX: ffffffffffffffda RBX: 00007ff1cd0159e0 RCX: 00007ff1ccefe3f1
      [   79.695644] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
      [   79.695656] RBP: 0000000000000000 R08: ffffffffffffff80 R09: 00007ff1cd020b20
      [   79.695671] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ff1cd0159e0
      [   79.695684] R13: 0000000000000000 R14: 00007ff1cd01aee8 R15: 00007ff1cd01af00
      [   79.695733]  </TASK>
      [   79.695746] irq event stamp: 725979
      [   79.695757] hardirqs last  enabled at (725979): [<ffffffffa9132d54>] finish_task_switch.isra.0+0xe4/0x3f0
      [   79.695780] hardirqs last disabled at (725978): [<ffffffffa9eb4113>] __schedule+0xdd3/0x1670
      [   79.695803] softirqs last  enabled at (725974): [<ffffffffa90fbc9d>] __irq_exit_rcu+0xed/0x160
      [   79.695825] softirqs last disabled at (725969): [<ffffffffa90fbc9d>] __irq_exit_rcu+0xed/0x160
      [   79.695845] ---[ end trace 0000000000000000 ]---
      
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220906203852.527663-3-hdegoede@redhat.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      55c077d9
    • Hans de Goede's avatar
      drm/gma500: Fix BUG: sleeping function called from invalid context errors · a6ed7624
      Hans de Goede authored
      [ Upstream commit 63e37a79 ]
      
      gma_crtc_page_flip() was holding the event_lock spinlock while calling
      crtc_funcs->mode_set_base() which takes ww_mutex.
      
      The only reason to hold event_lock is to clear gma_crtc->page_flip_event
      on mode_set_base() errors.
      
      Instead, unlock it after setting gma_crtc->page_flip_event and, on
      errors, re-take the lock and clear gma_crtc->page_flip_event if
      it is still set.
      
      This fixes the following WARN/stacktrace:
      
      [  512.122953] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:870
      [  512.123004] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1253, name: gnome-shell
      [  512.123031] preempt_count: 1, expected: 0
      [  512.123048] RCU nest depth: 0, expected: 0
      [  512.123066] INFO: lockdep is turned off.
      [  512.123080] irq event stamp: 0
      [  512.123094] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
      [  512.123134] hardirqs last disabled at (0): [<ffffffff8d0ec28c>] copy_process+0x9fc/0x1de0
      [  512.123176] softirqs last  enabled at (0): [<ffffffff8d0ec28c>] copy_process+0x9fc/0x1de0
      [  512.123207] softirqs last disabled at (0): [<0000000000000000>] 0x0
      [  512.123233] Preemption disabled at:
      [  512.123241] [<0000000000000000>] 0x0
      [  512.123275] CPU: 3 PID: 1253 Comm: gnome-shell Tainted: G        W         5.19.0+ #1
      [  512.123304] Hardware name: Packard Bell dot s/SJE01_CT, BIOS V1.10 07/23/2013
      [  512.123323] Call Trace:
      [  512.123346]  <TASK>
      [  512.123370]  dump_stack_lvl+0x5b/0x77
      [  512.123412]  __might_resched.cold+0xff/0x13a
      [  512.123458]  ww_mutex_lock+0x1e/0xa0
      [  512.123495]  psb_gem_pin+0x2c/0x150 [gma500_gfx]
      [  512.123601]  gma_pipe_set_base+0x76/0x240 [gma500_gfx]
      [  512.123708]  gma_crtc_page_flip+0x95/0x130 [gma500_gfx]
      [  512.123808]  drm_mode_page_flip_ioctl+0x57d/0x5d0
      [  512.123897]  ? drm_mode_cursor2_ioctl+0x10/0x10
      [  512.123936]  drm_ioctl_kernel+0xa1/0x150
      [  512.123984]  drm_ioctl+0x21f/0x420
      [  512.124025]  ? drm_mode_cursor2_ioctl+0x10/0x10
      [  512.124070]  ? rcu_read_lock_bh_held+0xb/0x60
      [  512.124104]  ? lock_release+0x1ef/0x2d0
      [  512.124161]  __x64_sys_ioctl+0x8d/0xd0
      [  512.124203]  do_syscall_64+0x58/0x80
      [  512.124239]  ? do_syscall_64+0x67/0x80
      [  512.124267]  ? trace_hardirqs_on_prepare+0x55/0xe0
      [  512.124300]  ? do_syscall_64+0x67/0x80
      [  512.124340]  ? rcu_read_lock_sched_held+0x10/0x80
      [  512.124377]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [  512.124411] RIP: 0033:0x7fcc4a70740f
      [  512.124442] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
      [  512.124470] RSP: 002b:00007ffda73f5390 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [  512.124503] RAX: ffffffffffffffda RBX: 000055cc9e474500 RCX: 00007fcc4a70740f
      [  512.124524] RDX: 00007ffda73f5420 RSI: 00000000c01864b0 RDI: 0000000000000009
      [  512.124544] RBP: 00007ffda73f5420 R08: 000055cc9c0b0cb0 R09: 0000000000000034
      [  512.124564] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c01864b0
      [  512.124584] R13: 0000000000000009 R14: 000055cc9df484d0 R15: 000055cc9af5d0c0
      [  512.124647]  </TASK>
      
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220906203852.527663-2-hdegoede@redhat.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a6ed7624
    • Vitaly Kuznetsov's avatar
      Drivers: hv: Never allocate anything besides framebuffer from framebuffer memory region · 9812e9ed
      Vitaly Kuznetsov authored
      [ Upstream commit f0880e2c ]
      
      Passed-through PCI devices sometimes misbehave on Gen1 VMs when the
      Hyper-V DRM driver is also loaded. Looking at IOMEM assignment, we can see e.g.
      
      $ cat /proc/iomem
      ...
      f8000000-fffbffff : PCI Bus 0000:00
        f8000000-fbffffff : 0000:00:08.0
          f8000000-f8001fff : bb8c4f33-2ba2-4808-9f7f-02f3b4da22fe
      ...
      fe0000000-fffffffff : PCI Bus 0000:00
        fe0000000-fe07fffff : bb8c4f33-2ba2-4808-9f7f-02f3b4da22fe
          fe0000000-fe07fffff : 2ba2:00:02.0
            fe0000000-fe07fffff : mlx4_core
      
      the interesting part is the 'f8000000' region as it is actually the
      VM's framebuffer:
      
      $ lspci -v
      ...
      0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA (prog-if 00 [VGA controller])
      	Flags: bus master, fast devsel, latency 0, IRQ 11
      	Memory at f8000000 (32-bit, non-prefetchable) [size=64M]
      ...
      
       hv_vmbus: registering driver hyperv_drm
       hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] Synthvid Version major 3, minor 5
       hyperv_drm 0000:00:08.0: vgaarb: deactivate vga console
       hyperv_drm 0000:00:08.0: BAR 0: can't reserve [mem 0xf8000000-0xfbffffff]
       hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] Cannot request framebuffer, boot fb still active?
      
      Note: "Cannot request framebuffer" is not a fatal error in
      hyperv_setup_gen1() as the code assumes there's some other framebuffer
      device there but we actually have some other PCI device (mlx4 in this
      case) config space there!
      
      The problem appears to be that vmbus_allocate_mmio() can use dedicated
      framebuffer region to serve any MMIO request from any device. The
      semantics one might assume of a parameter named "fb_overlap_ok"
      aren't implemented because !fb_overlap_ok essentially has no effect.
      The existing semantics are really "prefer_fb_overlap". This patch
      implements the expected and needed semantics, which is to not allocate
      from the frame buffer space when !fb_overlap_ok.
      
      Note, Gen2 VMs are usually unaffected by the issue because
      framebuffer region is already taken by EFI fb (in case kernel supports
      it) but Gen1 VMs may have this region unclaimed by the time Hyper-V PCI
      pass-through driver tries allocating MMIO space if Hyper-V DRM/FB drivers
      load after it. Devices can be brought up in any sequence so let's
      resolve the issue by always ignoring 'fb_mmio' region for non-FB
      requests, even if the region is unclaimed.
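
The intended "fb_overlap_ok" semantics can be sketched in userspace as follows. This is a simplified model under assumptions, not the vmbus allocator: `toy_range`, `toy_overlaps()` and `toy_pick_window()` are hypothetical, and candidate windows are modeled as a flat array of inclusive ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive address range, e.g. the framebuffer window. */
struct toy_range {
    uint64_t start;
    uint64_t end;
};

static bool toy_overlaps(const struct toy_range *a, const struct toy_range *b)
{
    return a->start <= b->end && b->start <= a->end;
}

/*
 * Pick the first candidate window a request may use. When
 * fb_overlap_ok is false, any window touching the framebuffer range
 * is skipped, even if nothing has claimed that range yet.
 * Returns the index of the chosen window, or -1 if none qualifies.
 */
static int toy_pick_window(const struct toy_range *windows, size_t n,
                           const struct toy_range *fb, bool fb_overlap_ok)
{
    assert(windows != NULL && fb != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!fb_overlap_ok && toy_overlaps(&windows[i], fb))
            continue;
        return (int)i;
    }
    return -1;
}
```

Using the addresses from the /proc/iomem listing above, a non-framebuffer request skips the f8000000 window and lands in the fe0000000 one, while a framebuffer request may still take the first window.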
      
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Link: https://lore.kernel.org/r/20220827130345.1320254-4-vkuznets@redhat.com
      Signed-off-by: Wei Liu <wei.liu@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9812e9ed
    • Rafael Mendonca's avatar
      block: Do not call blk_put_queue() if gendisk allocation fails · 98756ca2
      Rafael Mendonca authored
      commit aa0c680c upstream.
      
      Commit 6f8191fd ("block: simplify disk shutdown") removed the call
      to blk_get_queue() during gendisk allocation but failed to remove the
      corresponding cleanup code blk_put_queue() for it. Thus, if the gendisk
      allocation fails, the request_queue refcount gets decremented and
      reaches 0, causing blk_mq_release() to be called with a hctx still
      alive. That triggers a WARNING report, as found by syzkaller:
      
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 23016 at block/blk-mq.c:3881
      blk_mq_release+0xf8/0x3e0 block/blk-mq.c:3881
      [...] stripped
      RIP: 0010:blk_mq_release+0xf8/0x3e0 block/blk-mq.c:3881
      [...] stripped
      Call Trace:
       <TASK>
       blk_release_queue+0x153/0x270 block/blk-sysfs.c:780
       kobject_cleanup lib/kobject.c:673 [inline]
       kobject_release lib/kobject.c:704 [inline]
       kref_put include/linux/kref.h:65 [inline]
       kobject_put+0x1c8/0x540 lib/kobject.c:721
       __alloc_disk_node+0x4f7/0x610 block/genhd.c:1388
       __blk_mq_alloc_disk+0x13b/0x1f0 block/blk-mq.c:3961
       loop_add+0x3e2/0xaf0 drivers/block/loop.c:1978
       loop_control_ioctl+0x133/0x620 drivers/block/loop.c:2150
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:870 [inline]
       __se_sys_ioctl fs/ioctl.c:856 [inline]
       __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:856
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [...] stripped
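
The reference-count imbalance behind this trace can be modeled in a few lines. This is a userspace sketch under assumptions, not the block layer: `toy_queue`, `toy_queue_get()`, `toy_queue_put()` and `toy_alloc_disk()` are hypothetical, and the allocation is hard-wired to fail so that only the error path is exercised.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for a request queue. */
struct toy_queue {
    int refs;
    bool released;
};

static void toy_queue_get(struct toy_queue *q)
{
    q->refs++;
}

static void toy_queue_put(struct toy_queue *q)
{
    assert(q->refs > 0);
    if (--q->refs == 0)
        q->released = true;    /* the final put tears the object down */
}

/*
 * Simulated disk allocation that always fails. It takes no reference
 * on the queue, so its error path must not drop one. Passing
 * buggy_cleanup = true reproduces the unbalanced put.
 */
static int toy_alloc_disk(struct toy_queue *q, bool buggy_cleanup)
{
    if (buggy_cleanup)
        toy_queue_put(q);      /* no matching get: drops the caller's ref */
    return -1;
}
```

With the stale put in place, the owner's only reference is dropped by the failing allocation and the object is released while the owner still expects it to be alive.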
      
      Fixes: 6f8191fd ("block: simplify disk shutdown")
      Reported-by: <syzbot+31c9594f6e43b9289b25@syzkaller.appspotmail.com>
      Suggested-by: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220811232338.254673-1-rafaelmendsr@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      98756ca2