  1. Feb 23, 2022
    • parisc: Fix sglist access in ccio-dma.c · 77567168
      John David Anglin authored
      commit d7da660c upstream.
      
      This patch implements the same bug fix to ccio-dma.c as to sba_iommu.c.
      It ensures that only the allocated entries of the sglist are accessed.
      
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • parisc: Fix data TLB miss in sba_unmap_sg · f8f519d7
      John David Anglin authored
      commit b7d6f44a upstream.
      
      Rolf Eike Beer reported the following bug:
      
      [1274934.746891] Bad Address (null pointer deref?): Code=15 (Data TLB miss fault) at addr 0000004140000018
      [1274934.746891] CPU: 3 PID: 5549 Comm: cmake Not tainted 5.15.4-gentoo-parisc64 #4
      [1274934.746891] Hardware name: 9000/785/C8000
      [1274934.746891]
      [1274934.746891]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
      [1274934.746891] PSW: 00001000000001001111111000001110 Not tainted
      [1274934.746891] r00-03  000000ff0804fe0e 0000000040bc9bc0 00000000406760e4 0000004140000000
      [1274934.746891] r04-07  0000000040b693c0 0000004140000000 000000004a2b08b0 0000000000000001
      [1274934.746891] r08-11  0000000041f98810 0000000000000000 000000004a0a7000 0000000000000001
      [1274934.746891] r12-15  0000000040bddbc0 0000000040c0cbc0 0000000040bddbc0 0000000040bddbc0
      [1274934.746891] r16-19  0000000040bde3c0 0000000040bddbc0 0000000040bde3c0 0000000000000007
      [1274934.746891] r20-23  0000000000000006 000000004a368950 0000000000000000 0000000000000001
      [1274934.746891] r24-27  0000000000001fff 000000000800000e 000000004a1710f0 0000000040b693c0
      [1274934.746891] r28-31  0000000000000001 0000000041f988b0 0000000041f98840 000000004a171118
      [1274934.746891] sr00-03  00000000066e5800 0000000000000000 0000000000000000 00000000066e5800
      [1274934.746891] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [1274934.746891]
      [1274934.746891] IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000406760e8 00000000406760ec
      [1274934.746891]  IIR: 48780030    ISR: 0000000000000000  IOR: 0000004140000018
      [1274934.746891]  CPU:        3   CR30: 00000040e3a9c000 CR31: ffffffffffffffff
      [1274934.746891]  ORIG_R28: 0000000040acdd58
      [1274934.746891]  IAOQ[0]: sba_unmap_sg+0xb0/0x118
      [1274934.746891]  IAOQ[1]: sba_unmap_sg+0xb4/0x118
      [1274934.746891]  RP(r2): sba_unmap_sg+0xac/0x118
      [1274934.746891] Backtrace:
      [1274934.746891]  [<00000000402740cc>] dma_unmap_sg_attrs+0x6c/0x70
      [1274934.746891]  [<000000004074d6bc>] scsi_dma_unmap+0x54/0x60
      [1274934.746891]  [<00000000407a3488>] mptscsih_io_done+0x150/0xd70
      [1274934.746891]  [<0000000040798600>] mpt_interrupt+0x168/0xa68
      [1274934.746891]  [<0000000040255a48>] __handle_irq_event_percpu+0xc8/0x278
      [1274934.746891]  [<0000000040255c34>] handle_irq_event_percpu+0x3c/0xd8
      [1274934.746891]  [<000000004025ecb4>] handle_percpu_irq+0xb4/0xf0
      [1274934.746891]  [<00000000402548e0>] generic_handle_irq+0x50/0x70
      [1274934.746891]  [<000000004019a254>] call_on_stack+0x18/0x24
      [1274934.746891]
      [1274934.746891] Kernel panic - not syncing: Bad Address (null pointer deref?)
      
      The bug is caused by overrunning the sglist and incorrectly testing
      sg_dma_len(sglist) before nents. Normally this doesn't cause a crash,
      but in this case sglist crossed a page boundary. This occurs in the
      following code:
      
      	while (sg_dma_len(sglist) && nents--) {
      
      The fix is simply to test nents first and move the decrement of nents
      into the loop.
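
      The reordering described above can be sketched in user space as
      follows. This is only an illustration of the evaluation order, not
      the driver code: struct sg_entry and count_mapped() are hypothetical
      stand-ins for the scatterlist and the unmap loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a scatterlist entry; not the kernel's struct. */
struct sg_entry {
    unsigned int dma_len;
};

/*
 * Count the mapped entries, visiting at most nents of them.
 * nents is tested first, so the entry one past the end of the
 * list is never read, and the decrement happens inside the loop.
 */
static size_t count_mapped(const struct sg_entry *sg, int nents)
{
    size_t seen = 0;

    while (nents && sg->dma_len) {
        seen++;
        sg++;
        nents--;
    }
    return seen;
}
```

      With the original operand order, a list whose entries all have a
      nonzero length would cause one extra read past the last entry
      before nents is ever consulted.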
      
      Reported-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • parisc: Drop __init from map_pages declaration · 4d569b95
      John David Anglin authored
      commit 9129886b upstream.
      
      With huge kernel pages, we randomly eat a SPECTRE in map_pages(). This
      is fixed by dropping __init from the declaration.
      
      However, map_pages references the __init routine memblock_alloc_try_nid
      via memblock_alloc.  Thus, it needs to be marked with __ref.
      
      memblock_alloc is only called before the kernel text is set to readonly.
      
      The __ref on free_initmem is no longer needed.
      
      Comment regarding map_pages being in the init section is removed.
      
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org # v5.4+
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • serial: parisc: GSC: fix build when IOSAPIC is not set · 8e3f9a09
      Randy Dunlap authored
      commit 6e879367 upstream.
      
      There is a build error when using a kernel .config file from
      'kernel test robot' for a different build problem:
      
      hppa64-linux-ld: drivers/tty/serial/8250/8250_gsc.o: in function `.LC3':
      (.data.rel.ro+0x18): undefined reference to `iosapic_serial_irq'
      
      when:
        CONFIG_GSC=y
        CONFIG_SERIO_GSCPS2=y
        CONFIG_SERIAL_8250_GSC=y
        CONFIG_PCI is not set
          and hence PCI_LBA is not set.
        IOSAPIC depends on PCI_LBA, so IOSAPIC is not set/enabled.
      
      Make the use of iosapic_serial_irq() conditional to fix the build error.
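
      One common way to make such a call conditional is a preprocessor
      guard, sketched below. This is not the actual patch:
      HYPOTHETICAL_HAVE_IOSAPIC stands in for the kernel's CONFIG_IOSAPIC
      symbol, and iosapic_serial_irq_model() and pick_serial_irq() are
      invented for illustration.

```c
#include <assert.h>

/* HYPOTHETICAL_HAVE_IOSAPIC is an assumed stand-in for CONFIG_IOSAPIC. */
#ifdef HYPOTHETICAL_HAVE_IOSAPIC
/* Only declared (and only referenced) when the provider is built in. */
int iosapic_serial_irq_model(int dev_id);
#endif

/* Return the IRQ to use; fall back to the IOSAPIC lookup only if available. */
static int pick_serial_irq(int current_irq, int dev_id)
{
#ifdef HYPOTHETICAL_HAVE_IOSAPIC
    if (!current_irq)
        return iosapic_serial_irq_model(dev_id);
#else
    (void)dev_id; /* unused when the lookup is compiled out */
#endif
    return current_irq;
}
```

      Because the reference is compiled out when the option is unset, the
      linker never sees an undefined symbol.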
      
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: kernel test robot <lkp@intel.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: linux-parisc@vger.kernel.org
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: linux-serial@vger.kernel.org
      Cc: Jiri Slaby <jirislaby@kernel.org>
      Cc: Johan Hovold <johan@kernel.org>
      Suggested-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "svm: Add warning message for AVIC IPI invalid target" · fe383750
      Sean Christopherson authored
      commit dd4589ee upstream.
      
      Remove a WARN on an "AVIC IPI invalid target" exit. The WARN is trivial
      to trigger from the guest, since the exit occurs for any destination
      APIC ID that doesn't exist from the guest's perspective.
      
      Don't bother recording anything in the kernel log; the common tracepoint
      for kvm_avic_incomplete_ipi() is sufficient for debugging.
      
      This reverts commit 37ef0c44.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220204214205.3306634-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • HID:Add support for UGTABLET WP5540 · 126382b5
      Sergio Costas authored
      commit fd5dd6ac upstream.
      
      This patch adds support for the UGTABLET WP5540 digitizer tablet
      devices. Without it, the pen moves the cursor, but neither the
      buttons nor the tap sensor in the tip works.
      
      Signed-off-by: Sergio Costas <rastersoft@gmail.com>
      Link: https://lore.kernel.org/r/63dece1d-91ca-1b1b-d90d-335be66896be@gmail.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: lpfc: Fix mailbox command failure during driver initialization · f100e758
      James Smart authored
      commit efe1dc57 upstream.
      
      Contention for the mailbox interface may occur during driver initialization
      (immediately after a function reset), between mailbox commands initiated
      via ioctl (bsg) and those requested by the driver.
      
      After setting SLI_ACTIVE flag for a port, there is a window in which the
      driver will allow an ioctl to be initiated while the adapter is
      initializing and issuing mailbox commands via polling. The polling logic
      then gets confused.
      
      Correct this by having the thread that sets SLI_ACTIVE detect an active
      mailbox command and allow it to complete before proceeding.
      
      Link: https://lore.kernel.org/r/20210921143008.64212-1-jsmart2021@gmail.com
      Co-developed-by: Nigel Kirkland <nkirkland2304@gmail.com>
      Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • can: isotp: add SF_BROADCAST support for functional addressing · 4578b979
      Oliver Hartkopp authored
      commit 921ca574 upstream.
      
      When CAN_ISOTP_SF_BROADCAST is set in the CAN_ISOTP_OPTS flags, the CAN_ISOTP
      socket is switched into functional addressing mode, where only single frame
      (SF) protocol data units can be sent on the specified CAN interface and the
      given tp.tx_id after bind().
      
      In contrast to normal and extended addressing, this socket does not register a
      CAN-ID for reception, which would be needed for a 1-to-1 ISOTP connection with a
      segmented bi-directional data transfer.
      
      Sending SFs on this socket is therefore a TX-only 'broadcast' operation.
      
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Thomas Wagner <thwa1@web.de>
      Link: https://lore.kernel.org/r/20201206144731.4609-1-socketcan@hartkopp.net
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • can: isotp: prevent race between isotp_bind() and isotp_setsockopt() · 5d42865f
      Norbert Slusarek authored
      commit 2b17c400 upstream.
      
      A race condition was found in isotp_setsockopt() that allows socket
      options to be changed after the socket was bound.
      For the specific case of SF_BROADCAST support, this might lead to a
      use-after-free because can_rx_unregister() is not called.
      
      Checking for the flag under the socket lock in isotp_bind() and taking
      the lock in isotp_setsockopt() fixes the issue.
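
      The locking scheme can be modelled in user space roughly as below.
      This is a sketch under assumptions, not the kernel code: struct
      model_sock, model_setsockopt(), model_bind() and the
      MODEL_SF_BROADCAST value are hypothetical, a pthread mutex stands
      in for the socket lock, and the -EISCONN return for a bound socket
      is a choice made for this model.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define MODEL_SF_BROADCAST 0x1u /* illustrative flag value */

/* Hypothetical socket model; the mutex plays the role of the socket lock. */
struct model_sock {
    pthread_mutex_t lock;
    bool bound;
    unsigned int flags;
};

/* Options may only change while unbound, and only under the lock. */
static int model_setsockopt(struct model_sock *so, unsigned int flags)
{
    int ret = 0;

    pthread_mutex_lock(&so->lock);
    if (so->bound)
        ret = -EISCONN;
    else
        so->flags = flags;
    pthread_mutex_unlock(&so->lock);
    return ret;
}

/*
 * Read the flag and mark the socket bound in one critical section.
 * Returns true when a receive filter would be registered.
 */
static bool model_bind(struct model_sock *so)
{
    bool register_rx;

    pthread_mutex_lock(&so->lock);
    register_rx = !(so->flags & MODEL_SF_BROADCAST);
    so->bound = true;
    pthread_mutex_unlock(&so->lock);
    return register_rx;
}
```

      Since both paths serialize on the same lock, the flag that bind
      acted on cannot change afterwards, so the register and unregister
      decisions stay consistent.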
      
      Fixes: 921ca574 ("can: isotp: add SF_BROADCAST support for functional addressing")
      Link: https://lore.kernel.org/r/trinity-e6ae9efa-9afb-4326-84c0-f3609b9b8168-1620773528307@3c-app-gmx-bs06
      Reported-by: Norbert Slusarek <nslusarek@gmx.net>
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Signed-off-by: Norbert Slusarek <nslusarek@gmx.net>
      Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fs/proc: task_mmu.c: don't read mapcount for migration entry · db3f3636
      Yang Shi authored
      commit 24d7275c upstream.
      
      syzbot reported the following BUG:
      
        kernel BUG at include/linux/page-flags.h:785!
        invalid opcode: 0000 [#1] PREEMPT SMP KASAN
        CPU: 1 PID: 4392 Comm: syz-executor560 Not tainted 5.16.0-rc6-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:PageDoubleMap include/linux/page-flags.h:785 [inline]
        RIP: 0010:__page_mapcount+0x2d2/0x350 mm/util.c:744
        Call Trace:
          page_mapcount include/linux/mm.h:837 [inline]
          smaps_account+0x470/0xb10 fs/proc/task_mmu.c:466
          smaps_pte_entry fs/proc/task_mmu.c:538 [inline]
          smaps_pte_range+0x611/0x1250 fs/proc/task_mmu.c:601
          walk_pmd_range mm/pagewalk.c:128 [inline]
          walk_pud_range mm/pagewalk.c:205 [inline]
          walk_p4d_range mm/pagewalk.c:240 [inline]
          walk_pgd_range mm/pagewalk.c:277 [inline]
          __walk_page_range+0xe23/0x1ea0 mm/pagewalk.c:379
          walk_page_vma+0x277/0x350 mm/pagewalk.c:530
          smap_gather_stats.part.0+0x148/0x260 fs/proc/task_mmu.c:768
          smap_gather_stats fs/proc/task_mmu.c:741 [inline]
          show_smap+0xc6/0x440 fs/proc/task_mmu.c:822
          seq_read_iter+0xbb0/0x1240 fs/seq_file.c:272
          seq_read+0x3e0/0x5b0 fs/seq_file.c:162
          vfs_read+0x1b5/0x600 fs/read_write.c:479
          ksys_read+0x12d/0x250 fs/read_write.c:619
          do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
          entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The reproducer was reading /proc/$PID/smaps while calling MADV_FREE
      at the same time.  MADV_FREE may split THPs if it is called on a
      partial THP.  This may trigger the following race:
      
                 CPU A                         CPU B
                 -----                         -----
        smaps walk:                      MADV_FREE:
        page_mapcount()
          PageCompound()
                                         split_huge_page()
          page = compound_head(page)
          PageDoubleMap(page)
      
      When calling PageDoubleMap() this page is not a tail page of THP anymore
      so the BUG is triggered.
      
      This could be fixed by elevating the refcount of the page before
      reading the mapcount, but that would prevent it from counting migration
      entries, and it seems like overkill because the race can only happen
      when the PMD is split, so all PTE entries of tail pages are actually
      migration entries, and smaps_account() does treat migration entries as
      mapcount == 1, as Kirill pointed out.
      
      Add a new parameter to smaps_account() to indicate that the entry is a
      migration entry, then skip calling page_mapcount().  Don't skip getting
      the mapcount for device private entries since they do track references
      with mapcount.
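
      The idea of the new parameter can be sketched as follows. This is a
      simplified user-space model, not the fs/proc code: struct model_page
      and accounted_mapcount() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page model carrying only the field needed here. */
struct model_page {
    int mapcount;
};

/*
 * Return the mapcount to use for accounting.  For a migration entry the
 * page's mapcount is never read; it is treated as mapped exactly once.
 */
static int accounted_mapcount(const struct model_page *page, bool migration)
{
    if (migration)
        return 1;
    return page->mapcount;
}
```

      The caller decides whether the entry is a migration entry while it
      still holds the page table entry, so the racy read is avoided
      without taking an extra page reference.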
      
      Pagemap has a similar issue, although it was not reported.  Fix it
      as well.
      
      [shy828301@gmail.com: v4]
        Link: https://lkml.kernel.org/r/20220203182641.824731-1-shy828301@gmail.com
      [nathan@kernel.org: avoid unused variable warning in pagemap_pmd_range()]
        Link: https://lkml.kernel.org/r/20220207171049.1102239-1-nathan@kernel.org
      Link: https://lkml.kernel.org/r/20220120202805.3369-1-shy828301@gmail.com
      Fixes: e9b61f19 ("thp: reintroduce split_huge_page()")
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Reported-by: <syzbot+1f52b3a18d5633fa7f82@syzkaller.appspotmail.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fget: clarify and improve __fget_files() implementation · 0849f83e
      Linus Torvalds authored
      commit e386dfc5 upstream.
      
      Commit 054aa8d4 ("fget: check that the fd still exists after getting
      a ref to it") fixed a race with getting a reference to a file just as it
      was being closed.  It was a fairly minimal patch, and I didn't think
      re-checking the file pointer lookup would be a measurable overhead,
      since it was all right there and cached.
      
      But I was wrong, as pointed out by the kernel test robot.
      
      The 'poll2' case of the will-it-scale.per_thread_ops benchmark regressed
      quite noticeably.  Admittedly it seems to be a very artificial test:
      doing "poll()" system calls on regular files in a very tight loop in
      multiple threads.
      
      That means that basically all the time is spent just looking up file
      descriptors without ever doing anything useful with them (not that doing
      'poll()' on a regular file is useful to begin with).  And as a result it
      shows the extra "re-check fd" cost as a sore thumb.
      
      Happily, the regression is fixable by just writing the code to look up
      the fd to be better and clearer.  There's still a cost to verify the
      file pointer, but now it's basically in the noise even for that
      benchmark that does nothing else - and the code is more understandable
      and has better comments too.
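
      The "take a reference, then re-check the lookup" pattern discussed
      here can be sketched with C11 atomics. This is a single-slot toy
      model, not __fget_files(): struct model_file, fd_slot and
      model_fget() are hypothetical, and the RCU protection the kernel
      relies on to keep the object valid during the lookup is not
      modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical file object with only a reference count. */
struct model_file {
    atomic_long count;
};

/* One-entry stand-in for the fd table. */
static _Atomic(struct model_file *) fd_slot;

/* Take a reference unless the count has already dropped to zero. */
static int get_ref_not_zero(struct model_file *f)
{
    long c = atomic_load(&f->count);

    while (c != 0) {
        /* On failure, c is reloaded with the current count. */
        if (atomic_compare_exchange_weak(&f->count, &c, c + 1))
            return 1;
    }
    return 0;
}

/* Look up the slot, take a reference, then verify the slot is unchanged. */
static struct model_file *model_fget(void)
{
    for (;;) {
        struct model_file *f = atomic_load(&fd_slot);

        if (!f)
            return NULL;
        if (!get_ref_not_zero(f))
            continue;               /* object is dying; look again */
        if (atomic_load(&fd_slot) == f)
            return f;               /* reference is valid for this slot */
        atomic_fetch_sub(&f->count, 1); /* slot changed; drop and retry */
    }
}
```

      The re-check after the increment is what closes the race with a
      concurrent close; the commit message is about making that check
      cheap, not about removing it.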
      
      [ Side note: this patch is also a classic case of one that looks very
        messy with the default greedy Myers diff - it's much more legible with
        either the patience or histogram diff algorithm ]
      
      Link: https://lore.kernel.org/lkml/20211210053743.GA36420@xsang-OptiPlex-9020/
      Link: https://lore.kernel.org/lkml/20211213083154.GA20853@linux.intel.com/
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Tested-by: Carel Si <beibei.si@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rcu: Do not report strict GPs for outgoing CPUs · 657991fb
      Paul E. McKenney authored
      commit bfb3aa73 upstream.
      
      An outgoing CPU is marked offline in a stop-machine handler and most
      of that CPU's services stop at that point, including IRQ work queues.
      However, that CPU must take another pass through the scheduler and through
      a number of CPU-hotplug notifiers, many of which contain RCU readers.
      In the past, these readers were not a problem because the outgoing CPU
      has interrupts disabled, so that rcu_read_unlock_special() would not
      be invoked, and thus RCU would never attempt to queue IRQ work on the
      outgoing CPU.
      
      This changed with the advent of the CONFIG_RCU_STRICT_GRACE_PERIOD
      Kconfig option, in which rcu_read_unlock_special() is invoked upon exit
      from almost all RCU read-side critical sections.  Worse yet, because
      interrupts are disabled, rcu_read_unlock_special() cannot immediately
      report a quiescent state and will therefore attempt to defer this
      reporting, for example, by queueing IRQ work.  Which fails with a splat
      because the CPU is already marked as being offline.
      
      But it turns out that there is no need to report this quiescent state
      because rcu_report_dead() will do this job shortly after the outgoing
      CPU makes its final dive into the idle loop.  This commit therefore
      makes rcu_read_unlock_special() refrain from queuing IRQ work onto
      outgoing CPUs.
      
      Fixes: 44bad5b3 ("rcu: Do full report for .need_qs for strict GPs")
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Cc: Jann Horn <jannh@google.com>
      Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: memcg: synchronize objcg lists with a dedicated spinlock · 8c838597
      Roman Gushchin authored
      commit 0764db9b upstream.
      
      Alexander reported a circular lock dependency revealed by the mmap1 ltp
      test:
      
        LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
                WARNING: possible circular locking dependency detected
                5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
                ------------------------------------------------------
                mmap1/202299 is trying to acquire lock:
                00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
                but task is already holding lock:
                00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
                which lock already depends on the new lock.
                the existing dependency chain (in reverse order) is:
                -> #1 (&sighand->siglock){-.-.}-{2:2}:
                       __lock_acquire+0x604/0xbd8
                       lock_acquire.part.0+0xe2/0x238
                       lock_acquire+0xb0/0x200
                       _raw_spin_lock_irqsave+0x6a/0xd8
                       __lock_task_sighand+0x90/0x190
                       cgroup_freeze_task+0x2e/0x90
                       cgroup_migrate_execute+0x11c/0x608
                       cgroup_update_dfl_csses+0x246/0x270
                       cgroup_subtree_control_write+0x238/0x518
                       kernfs_fop_write_iter+0x13e/0x1e0
                       new_sync_write+0x100/0x190
                       vfs_write+0x22c/0x2d8
                       ksys_write+0x6c/0xf8
                       __do_syscall+0x1da/0x208
                       system_call+0x82/0xb0
                -> #0 (css_set_lock){..-.}-{2:2}:
                       check_prev_add+0xe0/0xed8
                       validate_chain+0x736/0xb20
                       __lock_acquire+0x604/0xbd8
                       lock_acquire.part.0+0xe2/0x238
                       lock_acquire+0xb0/0x200
                       _raw_spin_lock_irqsave+0x6a/0xd8
                       obj_cgroup_release+0x4a/0xe0
                       percpu_ref_put_many.constprop.0+0x150/0x168
                       drain_obj_stock+0x94/0xe8
                       refill_obj_stock+0x94/0x278
                       obj_cgroup_charge+0x164/0x1d8
                       kmem_cache_alloc+0xac/0x528
                       __sigqueue_alloc+0x150/0x308
                       __send_signal+0x260/0x550
                       send_signal+0x7e/0x348
                       force_sig_info_to_task+0x104/0x180
                       force_sig_fault+0x48/0x58
                       __do_pgm_check+0x120/0x1f0
                       pgm_check_handler+0x11e/0x180
                other info that might help us debug this:
                 Possible unsafe locking scenario:
                       CPU0                    CPU1
                       ----                    ----
                  lock(&sighand->siglock);
                                               lock(css_set_lock);
                                               lock(&sighand->siglock);
                  lock(css_set_lock);
                 *** DEADLOCK ***
                2 locks held by mmap1/202299:
                 #0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
                 #1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
                stack backtrace:
                CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
                Hardware name: IBM 3906 M04 704 (LPAR)
                Call Trace:
                  dump_stack_lvl+0x76/0x98
                  check_noncircular+0x136/0x158
                  check_prev_add+0xe0/0xed8
                  validate_chain+0x736/0xb20
                  __lock_acquire+0x604/0xbd8
                  lock_acquire.part.0+0xe2/0x238
                  lock_acquire+0xb0/0x200
                  _raw_spin_lock_irqsave+0x6a/0xd8
                  obj_cgroup_release+0x4a/0xe0
                  percpu_ref_put_many.constprop.0+0x150/0x168
                  drain_obj_stock+0x94/0xe8
                  refill_obj_stock+0x94/0x278
                  obj_cgroup_charge+0x164/0x1d8
                  kmem_cache_alloc+0xac/0x528
                  __sigqueue_alloc+0x150/0x308
                  __send_signal+0x260/0x550
                  send_signal+0x7e/0x348
                  force_sig_info_to_task+0x104/0x180
                  force_sig_fault+0x48/0x58
                  __do_pgm_check+0x120/0x1f0
                  pgm_check_handler+0x11e/0x180
                INFO: lockdep is turned off.
      
      In this example, a slab allocation from __send_signal() caused a
      refill and drain of a percpu objcg stock, which resulted in the release
      of another, unrelated objcg.  The objcg release path requires taking
      the css_set_lock, which is used to synchronize objcg lists.
      
      This can create a circular dependency with the sighand lock, which is
      taken with css_set_lock held by the freezer code (to freeze a task).
      
      In general, using css_set_lock to synchronize objcg lists makes any
      slab allocation or deallocation risky while css_set_lock, or any lock
      nested inside it, is held.
      
      To fix the problem and make the code more robust let's stop using
      css_set_lock to synchronize objcg lists and use a new dedicated spinlock
      instead.
      
      Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com
      Fixes: bf4f0599 ("mm: memcg/slab: obj_cgroup API")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
      Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
      Reviewed-by: Waiman Long <longman@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
      Tested-by: Jeremy Linton <jeremy.linton@arm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Ben Skeggs's avatar
  2. Feb 16, 2022