  1. Aug 03, 2022
    • watch_queue: Fix missing rcu annotation · 093610f2
      David Howells authored
      
      
      commit e0339f03 upstream.
      
      Since __post_watch_notification() walks wlist->watchers with only the
      RCU read lock held, we need to use RCU methods to add to the list (we
      already use RCU methods to remove from the list).
      
      Fix add_watch_to_object() to use hlist_add_head_rcu() instead of
      hlist_add_head() for that list.
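
      The ordering requirement described above can be sketched in userspace with C11 atomics. This is only an analogue, not kernel code: watch_node, watch_list, watch_add_head() and watch_sum_ids() are hypothetical names, and a single writer (or writers serialised by a lock) is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a kernel hlist node; not kernel code. */
struct watch_node {
    int id;
    struct watch_node *_Atomic next;
};

struct watch_list {
    struct watch_node *_Atomic first;
};

/*
 * Publish a fully initialised node at the head of the list.
 * The release store makes the node's fields visible before the head
 * pointer is, which is the ordering hlist_add_head_rcu() obtains via
 * rcu_assign_pointer(). Assumes writers are serialised by the caller.
 */
static void watch_add_head(struct watch_list *list, struct watch_node *node)
{
    struct watch_node *old;

    assert(list != NULL && node != NULL);
    old = atomic_load_explicit(&list->first, memory_order_relaxed);
    atomic_store_explicit(&node->next, old, memory_order_relaxed);
    atomic_store_explicit(&list->first, node, memory_order_release);
}

/* Lockless reader: walk the list with acquire loads and sum the ids. */
static int watch_sum_ids(struct watch_list *list)
{
    int sum = 0;
    struct watch_node *n = atomic_load_explicit(&list->first,
                                                memory_order_acquire);

    while (n != NULL) {
        sum += n->id;
        n = atomic_load_explicit(&n->next, memory_order_acquire);
    }
    return sum;
}
```

      A plain store to the head pointer would give a concurrent reader no such guarantee, which is roughly the gap between hlist_add_head() and hlist_add_head_rcu().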
      
      Fixes: c73be61c ("pipe: Add general notification queue support")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid() · 11c1cc3f
      Nathan Chancellor authored
      commit 0c09bc33 upstream.
      
      When booting a kernel compiled with clang's CFI protection
      (CONFIG_CFI_CLANG), there is a CFI failure in
      drm_simple_kms_crtc_mode_valid() when trying to call
      simpledrm_simple_display_pipe_mode_valid() through ->mode_valid():
      
      [    0.322802] CFI failure (target: simpledrm_simple_display_pipe_mode_valid+0x0/0x8):
      ...
      [    0.324928] Call trace:
      [    0.324969]  __ubsan_handle_cfi_check_fail+0x58/0x60
      [    0.325053]  __cfi_check_fail+0x3c/0x44
      [    0.325120]  __cfi_slowpath_diag+0x178/0x200
      [    0.325192]  drm_simple_kms_crtc_mode_valid+0x58/0x80
      [    0.325279]  __drm_helper_update_and_validate+0x31c/0x464
      ...
      
      The ->mode_valid() member in 'struct drm_simple_display_pipe_funcs'
      expects a return type of 'enum drm_mode_status', not 'int'. Correct it
      to fix the CFI failure.
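
      A miniature of the pattern, as a hedged sketch: the names below (mode_status, pipe_funcs, demo_mode_valid, check_mode) are illustrative and are not the DRM API. The callback is declared with exactly the return type that the function pointer slot names, which is what a type-based CFI check compares at the indirect call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status enum and callback table; not the real DRM types. */
enum mode_status { MODE_OK = 0, MODE_BAD = 1 };

struct pipe_funcs {
    enum mode_status (*mode_valid)(int width, int height);
};

/*
 * The callback returns 'enum mode_status', matching the slot above.
 * Whether a C compiler warns about an 'int'-returning function here
 * depends on the enum's underlying type; a CFI scheme that compares
 * function types at indirect calls can reject it either way.
 */
static enum mode_status demo_mode_valid(int width, int height)
{
    return (width > 0 && height > 0) ? MODE_OK : MODE_BAD;
}

/* Indirect call through the table, as a helper layer would do. */
static enum mode_status check_mode(const struct pipe_funcs *funcs,
                                   int width, int height)
{
    assert(funcs != NULL && funcs->mode_valid != NULL);
    return funcs->mode_valid(width, height);
}
```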
      
      Cc: stable@vger.kernel.org
      Fixes: 11e8f5fd ("drm: Add simpledrm driver")
      Link: https://github.com/ClangBuiltLinux/linux/issues/1647
      
      
      Reported-by: Tomasz Paweł Gajc <tpgxyz@gmail.com>
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
      Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220725233629.223223-1-nathan@kernel.org
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nouveau/svm: Fix to migrate all requested pages · 121c8993
      Alistair Popple authored
      
      
      commit 66cee909 upstream.
      
      Users may request that pages from an OpenCL SVM allocation be migrated
      to the GPU with clEnqueueSVMMigrateMem(). In Nouveau this will call into
      nouveau_dmem_migrate_vma() to do the migration. If the total range to be
      migrated exceeds SG_MAX_SINGLE_ALLOC the pages will be migrated in
      chunks of size SG_MAX_SINGLE_ALLOC. However a typo in updating the
      starting address means that only the first chunk will get migrated.
      
      Fix the calculation so that the entire range will get migrated if
      possible.
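
      The chunking described above can be sketched as a plain loop. This is an illustration under assumptions, not the Nouveau code: DEMO_PAGE_SIZE and migrate_in_chunks() are hypothetical, and the function only counts the pages it would hand to a migration step.

```c
#include <assert.h>

/* Hypothetical page size, for illustration only. */
#define DEMO_PAGE_SIZE 4096UL

/*
 * Walk [start, end) in chunks of at most max_pages pages and return
 * the number of pages covered. Addresses are assumed page aligned.
 */
static unsigned long migrate_in_chunks(unsigned long start, unsigned long end,
                                       unsigned long max_pages)
{
    unsigned long covered = 0;
    unsigned long chunk_start = start;

    assert(max_pages > 0);
    while (chunk_start < end) {
        unsigned long chunk_end = chunk_start + max_pages * DEMO_PAGE_SIZE;

        if (chunk_end > end)
            chunk_end = end;
        covered += (chunk_end - chunk_start) / DEMO_PAGE_SIZE;
        /*
         * Advance from the end of the chunk just handled. Deriving the
         * next chunk from the original 'start' instead is the kind of
         * slip that leaves everything after the first chunk untouched.
         */
        chunk_start = chunk_end;
    }
    return covered;
}
```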
      
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Fixes: e3d8b089 ("drm/nouveau/svm: map pages after migration")
      Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Lyude Paul <lyude@redhat.com>
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220720062745.960701-1-apopple@nvidia.com
      
      
      Cc: <stable@vger.kernel.org> # v5.8+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/archrandom: prevent CPACF trng invocations in interrupt context · 8bd9747d
      Harald Freudenberger authored
      
      
      commit 918e75f7 upstream.
      
      This patch slightly reworks the s390 arch_get_random_seed_{int,long}
      implementation: Make sure the CPACF trng instruction is never
      called in any interrupt context. This is done by adding an
      additional condition in_task().
      
      Justification:
      
      There are some constraints to satisfy for the invocation of the
      arch_get_random_seed_{int,long}() functions:
      - They should provide good random data during kernel initialization.
      - They should not be called in interrupt context as the TRNG
        instruction is relatively heavyweight and may, for example,
        cause some network loads to time out and buck.
      
      However, it was not clear what kind of interrupt context is exactly
      encountered during kernel init or network traffic eventually calling
      arch_get_random_seed_long().
      
      After some days of investigation it is clear that the s390
      start_kernel function does not run in any interrupt context, and
      so the trng is called:
      
      Jul 11 18:33:39 t35lp54 kernel:  [<00000001064e90ca>] arch_get_random_seed_long.part.0+0x32/0x70
      Jul 11 18:33:39 t35lp54 kernel:  [<000000010715f246>] random_init+0xf6/0x238
      Jul 11 18:33:39 t35lp54 kernel:  [<000000010712545c>] start_kernel+0x4a4/0x628
      Jul 11 18:33:39 t35lp54 kernel:  [<000000010590402a>] startup_continue+0x2a/0x40
      
      The condition in_task() is true and the CPACF trng provides random data
      during kernel startup.
      
      The network traffic case, however, is more difficult. A typical call
      stack looks like this:
      
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b5600fc>] extract_entropy.constprop.0+0x23c/0x240
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b560136>] crng_reseed+0x36/0xd8
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b5604b8>] crng_make_state+0x78/0x340
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b5607e0>] _get_random_bytes+0x60/0xf8
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b56108a>] get_random_u32+0xda/0x248
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008aefe7a8>] kfence_guarded_alloc+0x48/0x4b8
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008aeff35e>] __kfence_alloc+0x18e/0x1b8
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008aef7f10>] __kmalloc_node_track_caller+0x368/0x4d8
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b611eac>] kmalloc_reserve+0x44/0xa0
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b611f98>] __alloc_skb+0x90/0x178
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b6120dc>] __napi_alloc_skb+0x5c/0x118
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b8f06b4>] qeth_extract_skb+0x13c/0x680
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b8f6526>] qeth_poll+0x256/0x3f8
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b63d76e>] __napi_poll.constprop.0+0x46/0x2f8
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b63dbec>] net_rx_action+0x1cc/0x408
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b937302>] __do_softirq+0x132/0x6b0
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008abf46ce>] __irq_exit_rcu+0x13e/0x170
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008abf531a>] irq_exit_rcu+0x22/0x50
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b922506>] do_io_irq+0xe6/0x198
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b935826>] io_int_handler+0xd6/0x110
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b9358a6>] psw_idle_exit+0x0/0xa
      Jul 06 17:37:07 t35lp54 kernel: ([<000000008ab9c59a>] arch_cpu_idle+0x52/0xe0)
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b933cfe>] default_idle_call+0x6e/0xd0
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008ac59f4e>] do_idle+0xf6/0x1b0
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008ac5a28e>] cpu_startup_entry+0x36/0x40
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008abb0d90>] smp_start_secondary+0x148/0x158
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b935b9e>] restart_int_handler+0x6e/0x90
      
      which confirms that the call is in softirq context. So in_task() covers exactly
      the cases where we want to have CPACF trng called: not in nmi, not in hard irq,
      not in soft irq but in normal task context and during kernel init.
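
      The gating logic can be sketched as follows. This is a toy model under stated assumptions: demo_ctx, demo_in_task() and demo_get_seed_long() are hypothetical stand-ins, the context is passed in explicitly rather than derived from preempt state, and the returned value is a placeholder rather than TRNG output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical execution contexts, passed explicitly for illustration. */
enum demo_ctx { DEMO_TASK, DEMO_SOFTIRQ, DEMO_HARDIRQ, DEMO_NMI };

/* Stand-in for in_task(): true only in normal task context. */
static bool demo_in_task(enum demo_ctx ctx)
{
    return ctx == DEMO_TASK;
}

/*
 * Returns true and writes *v only when the expensive source is present
 * and the caller is in task context; otherwise the caller falls back
 * to its other entropy sources.
 */
static bool demo_get_seed_long(enum demo_ctx ctx, bool trng_available,
                               unsigned long *v)
{
    assert(v != NULL);
    if (!trng_available || !demo_in_task(ctx))
        return false;
    *v = 42UL; /* placeholder, not random: stands in for the TRNG result */
    return true;
}
```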
      
      Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
      Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220713131721.257907-1-freude@linux.ibm.com
      
      
      Fixes: e4f74400 ("s390/archrandom: simplify back to earlier design and initialize earlier")
      [agordeev@linux.ibm.com changed desc, added Fixes and Link, removed -stable]
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • asm-generic: remove a broken and needless ifdef conditional · 71f71150
      Lukas Bulwahn authored
      
      
      commit e2a619ca upstream.
      
      Commit 527701ed ("lib: Add a generic version of devmem_is_allowed()")
      introduces the config symbol GENERIC_LIB_DEVMEM_IS_ALLOWED, but then
      falsely refers to CONFIG_GENERIC_DEVMEM_IS_ALLOWED (note the missing LIB
      in the reference) in ./include/asm-generic/io.h.
      
      Luckily, ./scripts/checkkconfigsymbols.py warns on non-existing configs:
      
      GENERIC_DEVMEM_IS_ALLOWED
      Referencing files: include/asm-generic/io.h
      
      The actual fix, though, is simply not to make this function declaration
      dependent on any kernel config. For architectures that intend to use
      the generic version, the arch's 'select GENERIC_LIB_DEVMEM_IS_ALLOWED' will
      lead to picking the function definition, and for other architectures, this
      function is simply defined elsewhere.
      
      The wrong '#ifndef' on a non-existing config symbol also always had the
      same effect (although more by mistake than by intent). So, there is no
      functional change.
      
      Remove this broken and needless ifdef conditional.
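
      Why the misspelled guard never had any effect can be shown with the preprocessor alone. In this small sketch the DEMO_DECLARATION_VISIBLE macro and demo_declaration_visible() are hypothetical, and the existing symbol is defined by hand instead of by Kconfig.

```c
#include <assert.h>

/* The symbol that exists; defined by hand here to mimic a build that selects it. */
#define CONFIG_GENERIC_LIB_DEVMEM_IS_ALLOWED 1

/*
 * Misspelled guard: CONFIG_GENERIC_DEVMEM_IS_ALLOWED (without LIB) is
 * never defined, so the #ifndef branch is taken in every configuration.
 */
#ifndef CONFIG_GENERIC_DEVMEM_IS_ALLOWED
#define DEMO_DECLARATION_VISIBLE 1
#else
#define DEMO_DECLARATION_VISIBLE 0
#endif

/* Reports which branch the preprocessor selected. */
static int demo_declaration_visible(void)
{
    return DEMO_DECLARATION_VISIBLE;
}
```

      Since the guarded branch is always taken, dropping the conditional leaves the preprocessed result unchanged.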
      
      Fixes: 527701ed ("lib: Add a generic version of devmem_is_allowed()")
      Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte · dc124c84
      Miaohe Lin authored
      commit da9a298f upstream.
      
      When alloc_huge_page() fails, *pagep is set to NULL without calling
      put_page() first, so the hugepage indicated by *pagep is leaked.
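
      The shape of the fix, as a hedged sketch with a toy reference count: demo_page, demo_put_page() and demo_retry_alloc() are hypothetical names and do not mirror the hugetlb code. The point is only that the reference is dropped before the pointer is cleared.

```c
#include <assert.h>
#include <stddef.h>

/* Toy page with a plain reference count; illustration only. */
struct demo_page {
    int refcount;
};

static void demo_put_page(struct demo_page *page)
{
    assert(page != NULL && page->refcount > 0);
    page->refcount--;
}

/*
 * Error path of a hypothetical retry: when the new allocation fails,
 * drop the reference held through *pagep before clearing the pointer.
 * Returns 0 on success and -1 on allocation failure.
 */
static int demo_retry_alloc(struct demo_page **pagep, int alloc_fails)
{
    assert(pagep != NULL && *pagep != NULL);
    if (alloc_fails) {
        demo_put_page(*pagep); /* without this, the page would leak */
        *pagep = NULL;
        return -1;
    }
    return 0;
}
```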
      
      Link: https://lkml.kernel.org/r/20220709092629.54291-1-linmiaohe@huawei.com
      
      
      Fixes: 8cc5fcbb ("mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Acked-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: fix page leak with multiple threads mapping the same page · 2722fb0f
      Josef Bacik authored
      commit 3fe2895c upstream.
      
      We have an application with a lot of threads that use a shared mmap backed
      by tmpfs mounted with -o huge=within_size.  This application started
      leaking loads of huge pages when we upgraded to a recent kernel.
      
      Using the page ref tracepoints and a BPF program written by Tejun Heo we
      were able to determine that these pages would have multiple refcounts from
      the page fault path, but when it came to unmap time we wouldn't drop the
      number of refs we had added from the faults.
      
      I wrote a reproducer that mmap'ed a file backed by tmpfs with -o
      huge=always, and then spawned 20 threads all looping faulting random
      offsets in this map, while using madvise(MADV_DONTNEED) randomly for huge
      page aligned ranges.  This very quickly reproduced the problem.
      
      The problem here is that we check for the case that we have multiple
      threads faulting in a range that was previously unmapped.  One thread maps
      the PMD, the other thread loses the race and then returns 0.  However at
      this point we already have the page, and we are no longer putting this
      page into the process's address space, and so we leak the page.  We
      actually did the correct thing prior to f9ce0be7, however it looks
      like Kirill copied what we do in the anonymous page case.  In the
      anonymous page case we don't yet have a page, so we don't have to drop a
      reference on anything.  Previously we did the correct thing for file based
      faults by returning VM_FAULT_NOPAGE so we correctly drop the reference on
      the page we faulted in.
      
      Fix this by returning VM_FAULT_NOPAGE in the pmd_devmap_trans_unstable()
      case. This makes us drop the ref on the page properly, and now my
      reproducer no longer leaks the huge pages.
      
      [josef@toxicpanda.com: v2]
        Link: https://lkml.kernel.org/r/e90c8f0dbae836632b669c2afc434006a00d4a67.1657721478.git.josef@toxicpanda.com
      Link: https://lkml.kernel.org/r/2b798acfd95c9ab9395fe85e8d5a835e2e10a920.1657051137.git.josef@toxicpanda.com
      
      
      Fixes: f9ce0be7 ("mm: Cleanup faultaround and finish_fault() codepaths")
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Rik van Riel <riel@surriel.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • secretmem: fix unhandled fault in truncate · 70d0ce33
      Mike Rapoport authored
      commit 84ac0130 upstream.
      
      syzkaller reports the following issue:
      
      BUG: unable to handle page fault for address: ffff888021f7e005
      PGD 11401067 P4D 11401067 PUD 11402067 PMD 21f7d063 PTE 800fffffde081060
      Oops: 0002 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 3761 Comm: syz-executor281 Not tainted 5.19.0-rc4-syzkaller-00014-g941e3e791269 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:memset_erms+0x9/0x10 arch/x86/lib/memset_64.S:64
      Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
      RSP: 0018:ffffc9000329fa90 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000ffb
      RDX: 0000000000000ffb RSI: 0000000000000000 RDI: ffff888021f7e005
      RBP: ffffea000087df80 R08: 0000000000000001 R09: ffff888021f7e005
      R10: ffffed10043efdff R11: 0000000000000000 R12: 0000000000000005
      R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000ffb
      FS:  00007fb29d8b2700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff888021f7e005 CR3: 0000000026e7b000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       zero_user_segments include/linux/highmem.h:272 [inline]
       folio_zero_range include/linux/highmem.h:428 [inline]
       truncate_inode_partial_folio+0x76a/0xdf0 mm/truncate.c:237
       truncate_inode_pages_range+0x83b/0x1530 mm/truncate.c:381
       truncate_inode_pages mm/truncate.c:452 [inline]
       truncate_pagecache+0x63/0x90 mm/truncate.c:753
       simple_setattr+0xed/0x110 fs/libfs.c:535
       secretmem_setattr+0xae/0xf0 mm/secretmem.c:170
       notify_change+0xb8c/0x12b0 fs/attr.c:424
       do_truncate+0x13c/0x200 fs/open.c:65
       do_sys_ftruncate+0x536/0x730 fs/open.c:193
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7fb29d900899
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fb29d8b2318 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
      RAX: ffffffffffffffda RBX: 00007fb29d988408 RCX: 00007fb29d900899
      RDX: 00007fb29d900899 RSI: 0000000000000005 RDI: 0000000000000003
      RBP: 00007fb29d988400 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb29d98840c
      R13: 00007ffca01a23bf R14: 00007fb29d8b2400 R15: 0000000000022000
       </TASK>
      Modules linked in:
      CR2: ffff888021f7e005
      ---[ end trace 0000000000000000 ]---
      
      Eric Biggers suggested that this happens when
      secretmem_setattr()->simple_setattr() races with secretmem_fault() so that
      a page that is faulted in by secretmem_fault() (and thus removed from the
      direct map) is zeroed by inode truncation right afterwards.
      
      Use mapping->invalidate_lock to make secretmem_fault() and
      secretmem_setattr() mutually exclusive.
      
      [rppt@linux.ibm.com: v3]
        Link: https://lkml.kernel.org/r/20220714091337.412297-1-rppt@kernel.org
      Link: https://lkml.kernel.org/r/20220707165650.248088-1-rppt@kernel.org
      
      
      Reported-by: <syzbot+9bd2b7adbd34b30b87e4@syzkaller.appspotmail.com>
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Suggested-by: Eric Biggers <ebiggers@kernel.org>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fs: sendfile handles O_NONBLOCK of out_fd · 3ef8040a
      Andrei Vagin authored
      commit bdeb77bc upstream.
      
      sendfile has to return EAGAIN if out_fd is nonblocking and the write into
      it would block.
      
      Here is a small reproducer for the problem:
      
      #define _GNU_SOURCE /* See feature_test_macros(7) */
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <errno.h>
      #include <sys/stat.h>
      #include <sys/types.h>
      #include <sys/sendfile.h>
      
      
      #define FILE_SIZE (1UL << 30)
      int main(int argc, char **argv) {
              int p[2], fd;
      
              if (pipe2(p, O_NONBLOCK))
                      return 1;
      
              fd = open(argv[1], O_RDWR | O_TMPFILE, 0666);
              if (fd < 0)
                      return 1;
              ftruncate(fd, FILE_SIZE);
      
              if (sendfile(p[1], fd, 0, FILE_SIZE) == -1) {
                      fprintf(stderr, "FAIL\n");
              }
              if (sendfile(p[1], fd, 0, FILE_SIZE) != -1 || errno != EAGAIN) {
                      fprintf(stderr, "FAIL\n");
              }
              return 0;
      }
      
      It worked before b964bf53, it is stuck after b964bf53, and it
      works again with this fix.
      
      This regression occurred because do_splice_direct() calls pipe_write
      that handles O_NONBLOCK.  Here is a trace log from the reproducer:
      
       1)               |  __x64_sys_sendfile64() {
       1)               |    do_sendfile() {
       1)               |      __fdget()
       1)               |      rw_verify_area()
       1)               |      __fdget()
       1)               |      rw_verify_area()
       1)               |      do_splice_direct() {
       1)               |        rw_verify_area()
       1)               |        splice_direct_to_actor() {
       1)               |          do_splice_to() {
       1)               |            rw_verify_area()
       1)               |            generic_file_splice_read()
       1) + 74.153 us   |          }
       1)               |          direct_splice_actor() {
       1)               |            iter_file_splice_write() {
       1)               |              __kmalloc()
       1)   0.148 us    |              pipe_lock();
       1)   0.153 us    |              splice_from_pipe_next.part.0();
       1)   0.162 us    |              page_cache_pipe_buf_confirm();
      ... 16 times
       1)   0.159 us    |              page_cache_pipe_buf_confirm();
       1)               |              vfs_iter_write() {
       1)               |                do_iter_write() {
       1)               |                  rw_verify_area()
       1)               |                  do_iter_readv_writev() {
       1)               |                    pipe_write() {
       1)               |                      mutex_lock()
       1)   0.153 us    |                      mutex_unlock();
       1)   1.368 us    |                    }
       1)   1.686 us    |                  }
       1)   5.798 us    |                }
       1)   6.084 us    |              }
       1)   0.174 us    |              kfree();
       1)   0.152 us    |              pipe_unlock();
       1) + 14.461 us   |            }
       1) + 14.783 us   |          }
       1)   0.164 us    |          page_cache_pipe_buf_release();
      ... 16 times
       1)   0.161 us    |          page_cache_pipe_buf_release();
       1)               |          touch_atime()
       1) + 95.854 us   |        }
       1) + 99.784 us   |      }
       1) ! 107.393 us  |    }
       1) ! 107.699 us  |  }
      
      Link: https://lkml.kernel.org/r/20220415005015.525191-1-avagin@gmail.com
      
      
      Fixes: b964bf53 ("teach sendfile(2) to handle send-to-pipe directly")
      Signed-off-by: Andrei Vagin <avagin@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ntfs: fix use-after-free in ntfs_ucsncmp() · 518df26b
      ChenXiaoSong authored
      commit 38c9c22a upstream.
      
      Syzkaller reported a use-after-free bug as follows:
      
      ==================================================================
      BUG: KASAN: use-after-free in ntfs_ucsncmp+0x123/0x130
      Read of size 2 at addr ffff8880751acee8 by task a.out/879
      
      CPU: 7 PID: 879 Comm: a.out Not tainted 5.19.0-rc4-next-20220630-00001-gcc5218c8bd2c-dirty #7
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x1c0/0x2b0
       print_address_description.constprop.0.cold+0xd4/0x484
       print_report.cold+0x55/0x232
       kasan_report+0xbf/0xf0
       ntfs_ucsncmp+0x123/0x130
       ntfs_are_names_equal.cold+0x2b/0x41
       ntfs_attr_find+0x43b/0xb90
       ntfs_attr_lookup+0x16d/0x1e0
       ntfs_read_locked_attr_inode+0x4aa/0x2360
       ntfs_attr_iget+0x1af/0x220
       ntfs_read_locked_inode+0x246c/0x5120
       ntfs_iget+0x132/0x180
       load_system_files+0x1cc6/0x3480
       ntfs_fill_super+0xa66/0x1cf0
       mount_bdev+0x38d/0x460
       legacy_get_tree+0x10d/0x220
       vfs_get_tree+0x93/0x300
       do_new_mount+0x2da/0x6d0
       path_mount+0x496/0x19d0
       __x64_sys_mount+0x284/0x300
       do_syscall_64+0x3b/0xc0
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f3f2118d9ea
      Code: 48 8b 0d a9 f4 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 76 f4 0b 00 f7 d8 64 89 01 48
      RSP: 002b:00007ffc269deac8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3f2118d9ea
      RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffc269dec00
      RBP: 00007ffc269dec80 R08: 00007ffc269deb00 R09: 00007ffc269dec44
      R10: 0000000000000000 R11: 0000000000000202 R12: 000055f81ab1d220
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       </TASK>
      
      The buggy address belongs to the physical page:
      page:0000000085430378 refcount:1 mapcount:1 mapping:0000000000000000 index:0x555c6a81d pfn:0x751ac
      memcg:ffff888101f7e180
      anon flags: 0xfffffc00a0014(uptodate|lru|mappedtodisk|swapbacked|node=0|zone=1|lastcpupid=0x1fffff)
      raw: 000fffffc00a0014 ffffea0001bf2988 ffffea0001de2448 ffff88801712e201
      raw: 0000000555c6a81d 0000000000000000 0000000100000000 ffff888101f7e180
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8880751acd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8880751ace00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff8880751ace80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
                                                                ^
       ffff8880751acf00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8880751acf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      ==================================================================
      
      The reason is that struct ATTR_RECORD->name_offset is 6485, so the end
      address of the name string is out of bounds.

      Fix this by adding a sanity check on the end address of the attribute
      name string.
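
      A sanity check of this kind can be sketched with plain offsets. This is an assumption-laden illustration rather than the NTFS driver code: name_in_bounds() and its parameters are hypothetical, offsets are in bytes, and the name is assumed to be stored as 16-bit code units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if a name of name_length 16-bit units, starting
 * name_offset bytes into an attribute located attr_offset bytes into
 * the record, ends within a record of record_size bytes.
 */
static bool name_in_bounds(size_t record_size, size_t attr_offset,
                           uint16_t name_offset, uint8_t name_length)
{
    size_t room;
    size_t need;

    if (attr_offset > record_size)
        return false;
    /* Bytes left in the record from the start of the attribute. */
    room = record_size - attr_offset;
    /* Bytes from the attribute start to the end of the name string. */
    need = (size_t)name_offset + (size_t)name_length * sizeof(uint16_t);
    return need <= room;
}
```

      With a hypothetical 1024-byte record, a name_offset of 6485 fails this check before any character of the name is read.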
      
      [akpm@linux-foundation.org: coding-style cleanups]
      [chenxiaosong2@huawei.com: cleanup suggested by Hawkins Jiawei]
        Link: https://lkml.kernel.org/r/20220709064511.3304299-1-chenxiaosong2@huawei.com
      Link: https://lkml.kernel.org/r/20220707105329.4020708-1-chenxiaosong2@huawei.com
      
      
      Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
      Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
      Cc: Anton Altaparmakov <anton@tuxera.com>
      Cc: ChenXiaoSong <chenxiaosong2@huawei.com>
      Cc: Yongqiang Liu <liuyongqiang13@huawei.com>
      Cc: Zhang Yi <yi.zhang@huawei.com>
      Cc: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "ocfs2: mount shared volume without ha stack" · 46f6301f
      Junxiao Bi authored
      commit c80af0c2 upstream.
      
      This reverts commit 912f655d.
      
      This commit introduced a regression that can cause mount to hang.  The
      changes in __ocfs2_find_empty_slot mean that any node with a non-zero
      node number can grab the slot that was already taken by node 0, so node 1
      will access the same journal as node 0. When it tries to grab the journal
      cluster lock, it will hang because the lock was already acquired by node 0.
      It's very easy to reproduce this: in one cluster, mount node 0 first, then
      node 1, and you will see the following call trace from node 1.
      
      [13148.735424] INFO: task mount.ocfs2:53045 blocked for more than 122 seconds.
      [13148.739691]       Not tainted 5.15.0-2148.0.4.el8uek.mountracev2.x86_64 #2
      [13148.742560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [13148.745846] task:mount.ocfs2     state:D stack:    0 pid:53045 ppid: 53044 flags:0x00004000
      [13148.749354] Call Trace:
      [13148.750718]  <TASK>
      [13148.752019]  ? usleep_range+0x90/0x89
      [13148.753882]  __schedule+0x210/0x567
      [13148.755684]  schedule+0x44/0xa8
      [13148.757270]  schedule_timeout+0x106/0x13c
      [13148.759273]  ? __prepare_to_swait+0x53/0x78
      [13148.761218]  __wait_for_common+0xae/0x163
      [13148.763144]  __ocfs2_cluster_lock.constprop.0+0x1d6/0x870 [ocfs2]
      [13148.765780]  ? ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
      [13148.768312]  ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
      [13148.770968]  ocfs2_journal_init+0x91/0x340 [ocfs2]
      [13148.773202]  ocfs2_check_volume+0x39/0x461 [ocfs2]
      [13148.775401]  ? iput+0x69/0xba
      [13148.777047]  ocfs2_mount_volume.isra.0.cold+0x40/0x1f5 [ocfs2]
      [13148.779646]  ocfs2_fill_super+0x54b/0x853 [ocfs2]
      [13148.781756]  mount_bdev+0x190/0x1b7
      [13148.783443]  ? ocfs2_remount+0x440/0x440 [ocfs2]
      [13148.785634]  legacy_get_tree+0x27/0x48
      [13148.787466]  vfs_get_tree+0x25/0xd0
      [13148.789270]  do_new_mount+0x18c/0x2d9
      [13148.791046]  __x64_sys_mount+0x10e/0x142
      [13148.792911]  do_syscall_64+0x3b/0x89
      [13148.794667]  entry_SYSCALL_64_after_hwframe+0x170/0x0
      [13148.797051] RIP: 0033:0x7f2309f6e26e
      [13148.798784] RSP: 002b:00007ffdcee7d408 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
      [13148.801974] RAX: ffffffffffffffda RBX: 00007ffdcee7d4a0 RCX: 00007f2309f6e26e
      [13148.804815] RDX: 0000559aa762a8ae RSI: 0000559aa939d340 RDI: 0000559aa93a22b0
      [13148.807719] RBP: 00007ffdcee7d5b0 R08: 0000559aa93a2290 R09: 00007f230a0b4820
      [13148.810659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcee7d420
      [13148.813609] R13: 0000000000000000 R14: 0000559aa939f000 R15: 0000000000000000
      [13148.816564]  </TASK>
      
      To fix it, we could just fix __ocfs2_find_empty_slot.  But the original
      commit introduced the feature to mount ocfs2 locally even if it is
      cluster based, which is very dangerous: it can easily cause serious data
      corruption, as there is no way to stop other nodes mounting the fs and
      corrupting it. Setting up ha or another cluster-aware stack is just the
      cost that we have to take for avoiding corruption; otherwise we have to
      do it in kernel.
      
      Link: https://lkml.kernel.org/r/20220603222801.42488-1-junxiao.bi@oracle.com
      
      
      Fixes: 912f655d ("ocfs2: mount shared volume without ha stack")
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <heming.zhao@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put · f32d5615
      Luiz Augusto von Dentz authored
      
      
      commit d0be8347 upstream.
      
      This fixes the following trace which is caused by hci_rx_work starting up
      *after* the final channel reference has been put() during sock_close() but
      *before* the references to the channel have been destroyed, so instead
      the code now relies on kref_get_unless_zero/l2cap_chan_hold_unless_zero to
      prevent referencing a channel that is about to be destroyed.
      
        refcount_t: increment on 0; use-after-free.
        BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
        Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705
      
        CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S      W
        4.14.234-00003-g1fb6d0bd49a4-dirty #28
        Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150
        Google Inc. MSM sm8150 Flame DVT (DT)
        Workqueue: hci0 hci_rx_work
        Call trace:
         dump_backtrace+0x0/0x378
         show_stack+0x20/0x2c
         dump_stack+0x124/0x148
         print_address_description+0x80/0x2e8
         __kasan_report+0x168/0x188
         kasan_report+0x10/0x18
         __asan_load4+0x84/0x8c
         refcount_dec_and_test+0x20/0xd0
         l2cap_chan_put+0x48/0x12c
         l2cap_recv_frame+0x4770/0x6550
         l2cap_recv_acldata+0x44c/0x7a4
         hci_acldata_packet+0x100/0x188
         hci_rx_work+0x178/0x23c
         process_one_work+0x35c/0x95c
         worker_thread+0x4cc/0x960
         kthread+0x1a8/0x1c4
         ret_from_fork+0x10/0x18
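
      The "hold unless zero" idea behind the fix can be sketched in userspace with C11 atomics. This is not the kernel's kref or L2CAP code: demo_chan, demo_hold_unless_zero() and demo_put() are hypothetical names, and freeing the object is left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy channel with an atomic reference count; illustration only. */
struct demo_chan {
    atomic_int refcount;
};

/*
 * Take a reference only if the count has not already reached zero.
 * A count of zero means teardown has started, so the caller must not
 * touch the object and simply gets 'false'.
 */
static bool demo_hold_unless_zero(struct demo_chan *chan)
{
    int old;

    assert(chan != NULL);
    old = atomic_load(&chan->refcount);
    while (old != 0) {
        /* On failure, 'old' is reloaded with the current count. */
        if (atomic_compare_exchange_weak(&chan->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when this was the final one. */
static bool demo_put(struct demo_chan *chan)
{
    assert(chan != NULL);
    return atomic_fetch_sub(&chan->refcount, 1) == 1;
}
```

      An unconditional increment would turn a count of 0 back into 1 on an object that is already being torn down, which is the "increment on 0" that the refcount_t warning in the trace reports.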
      
      Cc: stable@kernel.org
      Reported-by: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Tested-by: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. Jul 29, 2022