  1. Jan 25, 2021
    • nvme: allow revalidate to set a namespace read-only · d11cd289
      Christoph Hellwig authored
      
      
      Unconditionally call set_disk_ro now that it only updates the hardware
      state.  This allows the Linux devices to be set up read-only properly when
      the controller turns a previously writable namespace read-only.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • rbd: remove the ->set_read_only method · cbf72cce
      Christoph Hellwig authored
      
      
      Now that the hardware read-only state can't be changed by the BLKROSET
      ioctl, the code in this method is not required anymore.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Acked-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: propagate BLKROSET on the whole device to all partitions · 947139bf
      Christoph Hellwig authored
      
      
      Change the policy so that a BLKROSET on the whole device also affects
      partitions.  To quote Martin K. Petersen:
      
      It's very common for database folks to twiddle the read-only state of
      block devices and partitions. I know that our users will find it very
      counter-intuitive that setting /dev/sda read-only won't prevent writes
      to /dev/sda1.
      
      The existing behavior is inconsistent in the sense that doing:
      
        # blockdev --setro /dev/sda
        # echo foo > /dev/sda1
      
      permits writes. But:
      
        # blockdev --setro /dev/sda
        <something triggers revalidate>
        # echo foo > /dev/sda1
      
      doesn't.
      
      And a subsequent:
      
        # blockdev --setrw /dev/sda
        # echo foo > /dev/sda1
      
      doesn't work either since sda1's read-only policy has been inherited
      from the whole-disk device.
      
      You need to do:
      
        # blockdev --rereadpt
      
      after setting the whole-disk device rw to effectuate the same change on
      the partitions, otherwise they are stuck being read-only indefinitely.
      
      However, setting the read-only policy on a partition does *not* require
      the revalidate step. As a matter of fact, doing the revalidate will blow
      away the policy setting you just made.
      
      So the user needs to take different actions depending on whether they
      are trying to read-protect a whole-disk device or a partition. Despite
      using the same ioctl. That is really confusing.
      
      I have lost count how many times our customers have had data clobbered
      because of ambiguity of the existing whole-disk device policy. The
      current behavior violates the principle of least surprise by letting the
      user think they write protected the whole disk when they actually
      didn't.
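
The policy change described above can be pictured with a small model. This is a minimal sketch, not the kernel implementation: `model_disk`, `model_part`, and `model_part_read_only()` are hypothetical names, and the sketch assumes a partition's effective state is derived from the whole-device policy at check time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; these are not the kernel's structures. */
struct model_disk {
    bool user_ro;                   /* policy set via BLKROSET on the whole device */
};

struct model_part {
    const struct model_disk *disk;  /* whole device this partition lives on */
    bool user_ro;                   /* policy set via BLKROSET on this partition */
};

/* A partition refuses writes if either it or its whole device is read-only. */
static bool model_part_read_only(const struct model_part *part)
{
    return part->user_ro || part->disk->user_ro;
}

/* Walks through the blockdev sequence quoted in the commit message. */
static void model_example(void)
{
    struct model_disk sda = { .user_ro = false };
    struct model_part sda1 = { .disk = &sda, .user_ro = false };

    sda.user_ro = true;                     /* blockdev --setro /dev/sda */
    assert(model_part_read_only(&sda1));    /* writes to sda1 are refused */

    sda.user_ro = false;                    /* blockdev --setrw /dev/sda */
    assert(!model_part_read_only(&sda1));   /* sda1 is writable again */
}
```

In this model the whole-device flag is consulted on every check, so setting or clearing it takes effect on all partitions at once, which is the behavior the quoted complaint asks for.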
      
      Suggested-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: add a hard-readonly flag to struct gendisk · 52f019d4
      Christoph Hellwig authored
      Commit 20bd1d02 ("scsi: sd: Keep disk read-only when re-reading
      partition") addressed a long-standing problem with user read-only
      policy being overridden as a result of a device-initiated revalidate.
      The commit has since been reverted due to a regression that left some
      USB devices read-only indefinitely.
      
      To fix the underlying problems with revalidate we need to keep track
      of hardware state and user policy separately.
      
      The gendisk has been updated to reflect the current hardware state set
      by the device driver. This is done to allow returning the device to
      the hardware state once the user clears the BLKROSET flag.
      
      The resulting semantics are as follows:
      
       - If BLKROSET sets a given partition read-only, that partition will
         remain read-only even if the underlying storage stack initiates a
         revalidate. However, the BLKRRPART ioctl will cause the partition
         table to be dropped and any user policy on partitions will be lost.
      
       - If BLKROSET has not been set, both the whole disk device and any
         partitions will reflect the current write-protect state of the
         underlying device.
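
The semantics listed above amount to tracking two independent flags. The following is a minimal sketch under that reading, not the kernel code: `ro_state`, `ro_effective()`, and `ro_revalidate()` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two pieces of state the commit separates. */
struct ro_state {
    bool hw_ro;    /* write-protect state reported by the device driver */
    bool user_ro;  /* policy requested through BLKROSET */
};

/* The device is treated as read-only if either source says so. */
static bool ro_effective(const struct ro_state *s)
{
    return s->hw_ro || s->user_ro;
}

/* A revalidate refreshes only the hardware state; user policy is untouched. */
static void ro_revalidate(struct ro_state *s, bool device_write_protected)
{
    s->hw_ro = device_write_protected;
}

static void ro_example(void)
{
    struct ro_state part = { .hw_ro = false, .user_ro = false };

    part.user_ro = true;            /* BLKROSET: set read-only */
    ro_revalidate(&part, false);    /* device-initiated revalidate */
    assert(ro_effective(&part));    /* user policy survives */

    part.user_ro = false;           /* BLKROSET: clear the flag */
    assert(!ro_effective(&part));   /* back to the hardware state */

    ro_revalidate(&part, true);     /* device becomes write-protected */
    assert(ro_effective(&part));    /* reflected without any BLKROSET */
}
```

Keeping the flags apart is what lets a revalidate update the hardware state without overriding a user's BLKROSET, and lets clearing BLKROSET fall back to whatever the hardware currently reports.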
      
      Based on a patch from Martin K. Petersen <martin.petersen@oracle.com>.
      
      Reported-by: Oleksii Kurochko <olkuroch@cisco.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201221
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: remove the NULL bdev check in bdev_read_only · 6f0d9689
      Christoph Hellwig authored
      
      
      Only a single caller can end up in bdev_read_only with a NULL bdev, so
      move the check to that caller.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • dm: use bdev_read_only to check if a device is read-only · 1e0dcca9
      Christoph Hellwig authored
      
      
      dm-thin and dm-cache also work on partitions, so use the proper
      interface to check if the device is read-only.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Linux 5.11-rc5 · 6ee1d745
      Linus Torvalds authored
      v5.11-rc5
    • Merge tag 'sh-for-5.11' of git://git.libc.org/linux-sh · 228a65d4
      Linus Torvalds authored
      Pull arch/sh updates from Rich Felker:
       "Cleanup and warning fixes"
      
      * tag 'sh-for-5.11' of git://git.libc.org/linux-sh:
        sh/intc: Restore devm_ioremap() alignment
        sh: mach-sh03: remove duplicate include
        arch: sh: remove duplicate include
        sh: Drop ARCH_NR_GPIOS definition
        sh: Remove unused HAVE_COPY_THREAD_TLS macro
        sh: remove CONFIG_IDE from most defconfig
        sh: mm: Convert to DEFINE_SHOW_ATTRIBUTE
        sh: intc: Convert to DEFINE_SHOW_ATTRIBUTE
        arch/sh: hyphenate Non-Uniform in Kconfig prompt
        sh: dma: fix kconfig dependency for G2_DMA
    • Merge tag 'io_uring-5.11-2021-01-24' of git://git.kernel.dk/linux-block · ef7b1a0e
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "Still need a final cancelation fix that isn't quite done done,
        expected in the next day or two. That said, this contains:
      
         - Wakeup fix for IOPOLL requests
      
         - SQPOLL split close op handling fix
      
         - Ensure that any use of io_uring fd itself is marked as inflight
      
         - Short non-regular file read fix (Pavel)
      
         - Fix up bad false positive warning (Pavel)
      
         - SQPOLL fixes (Pavel)
      
         - In-flight removal fix (Pavel)"
      
      * tag 'io_uring-5.11-2021-01-24' of git://git.kernel.dk/linux-block:
        io_uring: account io_uring internal files as REQ_F_INFLIGHT
        io_uring: fix sleeping under spin in __io_clean_op
        io_uring: fix short read retries for non-reg files
        io_uring: fix SQPOLL IORING_OP_CLOSE cancelation state
        io_uring: fix skipping disabling sqo on exec
        io_uring: fix uring_flush in exit_files() warning
        io_uring: fix false positive sqo warning on flush
        io_uring: iopoll requests should also wake task ->in_idle state
    • Merge tag 'block-5.11-2021-01-24' of git://git.kernel.dk/linux-block · a692a610
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - NVMe pull request from Christoph:
            - fix a status code in nvmet (Chaitanya Kulkarni)
            - avoid double completions in nvme-rdma/nvme-tcp (Chao Leng)
            - fix the CMB support to cope with NVMe 1.4 controllers (Klaus Jensen)
            - fix PRINFO handling in the passthrough ioctl (Revanth Rajashekar)
            - fix a double DMA unmap in nvme-pci
      
       - lightnvm error path leak fix (Pan)
      
       - MD pull request from Song:
            - Flush request fix (Xiao)
      
      * tag 'block-5.11-2021-01-24' of git://git.kernel.dk/linux-block:
        lightnvm: fix memory leak when submit fails
        nvme-pci: fix error unwind in nvme_map_data
        nvme-pci: refactor nvme_unmap_data
        md: Set prev_flush_start and flush_bio in an atomic way
        nvmet: set right status on error in id-ns handler
        nvme-pci: allow use of cmb on v1.4 controllers
        nvme-tcp: avoid request double completion for concurrent nvme_tcp_timeout
        nvme-rdma: avoid request double completion for concurrent nvme_rdma_timeout
        nvme: check the PRINFO bit before deciding the host buffer length
    • Merge branch 'akpm' (patches from Andrew) · 51306806
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "18 patches.
      
        Subsystems affected by this patch series: mm (pagealloc, memcg, kasan,
        memory-failure, and highmem), ubsan, proc, and MAINTAINERS"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        MAINTAINERS: add a couple more files to the Clang/LLVM section
        proc_sysctl: fix oops caused by incorrect command parameters
        powerpc/mm/highmem: use __set_pte_at() for kmap_local()
        mips/mm/highmem: use set_pte() for kmap_local()
        mm/highmem: prepare for overriding set_pte_at()
        sparc/mm/highmem: flush cache and TLB
        mm: fix page reference leak in soft_offline_page()
        ubsan: disable unsigned-overflow check for i386
        kasan, mm: fix resetting page_alloc tags for HW_TAGS
        kasan, mm: fix conflicts with init_on_alloc/free
        kasan: fix HW_TAGS boot parameters
        kasan: fix incorrect arguments passing in kasan_add_zero_shadow
        kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow
        mm: fix numa stats for thp migration
        mm: memcg: fix memcg file_dirty numa stat
        mm: memcg/slab: optimize objcg stock draining
        mm: fix initialization of struct page for holes in memory layout
        x86/setup: don't remove E820_TYPE_RAM for pfn 0
    • Merge tag 'char-misc-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · fdbc80bd
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are some small char/misc driver fixes for 5.11-rc5:
      
         - habanalabs driver fixes
      
         - phy driver fixes
      
         - hwtracing driver fixes
      
         - rtsx cardreader driver fix
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'char-misc-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        misc: rtsx: init value of aspm_enabled
        habanalabs: disable FW events on device removal
        habanalabs: fix backward compatibility of idle check
        habanalabs: zero pci counters packet before submit to FW
        intel_th: pci: Add Alder Lake-P support
        stm class: Fix module init return on allocation failure
        habanalabs: prevent soft lockup during unmap
        habanalabs: fix reset process in case of failures
        habanalabs: fix dma_addr passed to dma_mmap_coherent
        phy: mediatek: allow compile-testing the dsi phy
        phy: cpcap-usb: Fix warning for missing regulator_disable
        PHY: Ingenic: fix unconditional build of phy-ingenic-usb
    • Merge tag 'driver-core-5.11-rc5' of... · 443d1129
      Linus Torvalds authored
      Merge tag 'driver-core-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core fixes from Greg KH:
       "Here are some small driver core fixes for 5.11-rc5 that resolve some
        reported problems:
      
         - revert of a -rc1 patch that was causing problems with some machines
      
         - device link device name collision problem fix (busses only have to
           name devices unique to their bus, not unique to all busses)
      
         - kernfs splice bugfixes to resolve firmware loading problems for
           Qualcomm systems.
      
         - other tiny driver core fixes for minor issues reported.
      
        All of these have been in linux-next with no reported problems"
      
      * tag 'driver-core-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        driver core: Fix device link device name collision
        driver core: Extend device_is_dependent()
        kernfs: wire up ->splice_read and ->splice_write
        kernfs: implement ->write_iter
        kernfs: implement ->read_iter
        Revert "driver core: Reorder devices on successful probe"
        Driver core: platform: Add extra error check in devm_platform_get_irqs_affinity()
        drivers core: Free dma_range_map when driver probe failed
    • Merge tag 'staging-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 832bceef
      Linus Torvalds authored
      Pull staging/IIO driver fixes from Greg KH:
       "Here are some IIO driver fixes for 5.11-rc5 to resolve some reported
        problems.
      
        Nothing major, just a few small fixes, all of these have been in
        linux-next for a while and full details are in the shortlog"
      
      * tag 'staging-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        iio: sx9310: Fix semtech,avg-pos-strength setting when > 16
        iio: common: st_sensors: fix possible infinite loop in st_sensors_irq_thread
        iio: ad5504: Fix setting power-down state
        counter:ti-eqep: remove floor
        drivers: iio: temperature: Add delay after the addressed reset command in mlx90632.c
        iio: adc: ti_am335x_adc: remove omitted iio_kfifo_free()
        dt-bindings: iio: accel: bma255: Fix bmc150/bmi055 compatible
        iio: sx9310: Off by one in sx9310_read_thresh()
    • Merge tag 'tty-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 4da81fa2
      Linus Torvalds authored
      Pull tty/serial fixes from Greg KH:
       "Here are three small tty/serial fixes for 5.11-rc5 to resolve reported
        problems:
      
         - two patches to fix up writing to ttys with splice
      
         - mvebu-uart driver fix for reported problem
      
        All of these have been in linux-next with no reported problems"
      
      * tag 'tty-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        tty: fix up hung_up_tty_write() conversion
        tty: implement write_iter
        serial: mvebu-uart: fix tx lost characters at power off
    • Merge tag 'usb-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 8f3bfd21
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB driver fixes for 5.11-rc5.  They resolve:
      
         - xhci issues for some reported problems
      
         - ehci driver issue for one specific device
      
         - USB gadget fixes for some reported problems
      
         - cdns3 driver fixes for issues reported
      
         - MAINTAINERS file update
      
         - thunderbolt minor fix
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'usb-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: bdc: Make bdc pci driver depend on BROKEN
        xhci: tegra: Delay for disabling LFPS detector
        xhci: make sure TRB is fully written before giving it to the controller
        usb: udc: core: Use lock when write to soft_connect
        USB: gadget: dummy-hcd: Fix errors in port-reset handling
        usb: gadget: aspeed: fix stop dma register setting.
        USB: ehci: fix an interrupt calltrace error
        ehci: fix EHCI host controller initialization sequence
        MAINTAINERS: update Peter Chen's email address
        thunderbolt: Drop duplicated 0x prefix from format string
        MAINTAINERS: Update address for Cadence USB3 driver
        usb: cdns3: imx: improve driver .remove API
        usb: cdns3: imx: fix can't create core device the second time issue
        usb: cdns3: imx: fix writing read-only memory issue
    • MAINTAINERS: add a couple more files to the Clang/LLVM section · e82d891a
      Nathan Chancellor authored
      
      
      The K: entry should ensure that Nick and I always get CC'd on patches that
      touch these files but it is better to be explicit rather than implicit.
      
      Link: https://lkml.kernel.org/r/20210114004059.2129921-1-natechancellor@gmail.com
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc_sysctl: fix oops caused by incorrect command parameters · 697edcb0
      Xiaoming Ni authored
      process_sysctl_arg() does not check whether val is empty before
      invoking strlen(val).  If a command line parameter is incorrectly
      configured and val is empty, an oops is triggered.

      For example, if "hung_task_panic=1" is incorrectly written as
      "hung_task_panic", an oops is triggered. The call stack is as follows:
          Kernel command line: .... hung_task_panic
          ......
          Call trace:
          __pi_strlen+0x10/0x98
          parse_args+0x278/0x344
          do_sysctl_args+0x8c/0xfc
          kernel_init+0x5c/0xf4
          ret_from_fork+0x10/0x30
      
      To fix it, check whether "val" is empty when "param" is a sysctl field.
      Error codes are returned in the failure branch, and error logs are
      generated by parse_args().
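
The shape of such a guard can be sketched in plain C. This is an illustrative stand-in rather than the kernel function: `sketch_process_arg()` and its `len_out` parameter are hypothetical, and the sketch assumes that a parameter written without `=` arrives with a NULL `val` and that sysctl fields are recognised by a `sysctl.` prefix.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a boot-parameter callback: "param" is the name,
 * "val" is the text after '=', or NULL when no '=' was given (assumption).
 */
static int sketch_process_arg(const char *param, const char *val, size_t *len_out)
{
    /* Assumption for this sketch: sysctl fields carry a "sysctl." prefix. */
    if (strncmp(param, "sysctl.", 7) != 0)
        return 0;               /* not a sysctl field, nothing to do */

    /* The guard: reject a missing value before it can reach strlen(). */
    if (val == NULL)
        return -EINVAL;

    *len_out = strlen(val);
    return 0;
}

static void sketch_example(void)
{
    size_t len = 0;

    /* Value missing: an error code is returned instead of dereferencing NULL. */
    assert(sketch_process_arg("sysctl.kernel.hung_task_panic", NULL, &len) == -EINVAL);

    /* Value present: processed normally. */
    assert(sketch_process_arg("sysctl.kernel.hung_task_panic", "1", &len) == 0);
    assert(len == 1);
}
```

Returning an error code from the failure branch leaves the reporting to the caller, which matches the commit's note that the error log is produced by parse_args().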
      
      Link: https://lkml.kernel.org/r/20210118133029.28580-1-nixiaoming@huawei.com
      Fixes: 3db978d4 ("kernel/sysctl: support setting sysctl parameters from kernel command line")
      Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: <stable@vger.kernel.org>	[5.8+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • powerpc/mm/highmem: use __set_pte_at() for kmap_local() · 78502582
      Thomas Gleixner authored
      The original PowerPC highmem mapping function used __set_pte_at() to
      denote that the mapping is per CPU.  This got lost with the conversion
      to the generic implementation.
      
      Override the default map function.
      
      Link: https://lkml.kernel.org/r/20210112170411.281464308@linutronix.de
      Fixes: 47da42b2 ("powerpc/mm/highmem: Switch to generic kmap atomic")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Paul Cercueil <paul@crapouillou.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mips/mm/highmem: use set_pte() for kmap_local() · 8c0d5d78
      Thomas Gleixner authored
      set_pte_at() on MIPS invokes update_cache() which might recurse into
      kmap_local().
      
      Use set_pte() like the original MIPS highmem implementation did.
      
      Link: https://lkml.kernel.org/r/20210112170411.187513575@linutronix.de
      Fixes: a4c33e83 ("mips/mm/highmem: Switch to generic kmap atomic")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: Paul Cercueil <paul@crapouillou.net>
      Reported-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/highmem: prepare for overriding set_pte_at() · a1dce7fd
      Thomas Gleixner authored
      
      
      The generic kmap_local() map function uses set_pte_at(), but MIPS requires
      set_pte() and PowerPC wants __set_pte_at().
      
      Provide arch_kmap_local_set_pte() and default it to set_pte_at().
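
The "architecture hook with a generic default" idea can be illustrated outside the kernel. This is a minimal sketch with hypothetical names (`SKETCH_ARCH_SET_PTE`, `generic_set_pte_at()`, `map_one()`) and a deliberately simplified two-argument signature. It mirrors the pattern only, not the real arch_kmap_local_set_pte() definition.

```c
#include <assert.h>

/* Counts how often the generic fallback ran, so the sketch is observable. */
static int generic_calls;

/* Stand-in for the generic operation used when no override exists. */
static void generic_set_pte_at(int *slot, int value)
{
    generic_calls++;
    *slot = value;
}

/*
 * An architecture that needs different behaviour would define
 * SKETCH_ARCH_SET_PTE before this point; otherwise the default applies.
 */
#ifndef SKETCH_ARCH_SET_PTE
#define SKETCH_ARCH_SET_PTE(slot, value) generic_set_pte_at((slot), (value))
#endif

/* Generic code calls only the hook and never names a specific variant. */
static void map_one(int *slot, int value)
{
    SKETCH_ARCH_SET_PTE(slot, value);
}

static void hook_example(void)
{
    int slot = 0;

    map_one(&slot, 7);
    assert(slot == 7);
}
```

Because the generic path goes through the hook, an architecture can swap in its own primitive by defining the macro, without touching the shared mapping code.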
      
      Link: https://lkml.kernel.org/r/20210112170411.056306194@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Cercueil <paul@crapouillou.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sparc/mm/highmem: flush cache and TLB · f99e0237
      Thomas Gleixner authored
      Patch series "mm/highmem: Fix fallout from generic kmap_local
      conversions".
      
      The kmap_local conversion broke sparc, mips and powerpc as it missed
      some of the details in the original implementation.
      
      This patch (of 4):
      
      The recent conversion to the generic kmap_local infrastructure failed to
      assign the proper pre/post map/unmap flush operations for sparc.
      
      Sparc requires a cache flush before map/unmap and a TLB flush afterwards.
      
      Link: https://lkml.kernel.org/r/20210112170136.078559026@linutronix.de
      Link: https://lkml.kernel.org/r/20210112170410.905976187@linutronix.de
      Fixes: 3293efa9 ("sparc/mm/highmem: Switch to generic kmap atomic")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: Andreas Larsson <andreas@gaisler.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Cercueil <paul@crapouillou.net>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix page reference leak in soft_offline_page() · dad4e5b3
      Dan Williams authored
      The conversion to move pfn_to_online_page() internal to
      soft_offline_page() missed that the get_user_pages() reference taken by
      the madvise() path needs to be dropped when pfn_to_online_page() fails.
      
      Note the direct sysfs-path to soft_offline_page() does not perform a
      get_user_pages() lookup.
      
      When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page()
      pfn the kernel hangs at dax-device shutdown due to a leaked reference.
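
The leak pattern described above can be reduced to a toy model. This is a sketch under stated assumptions, not the kernel function: `sketch_page`, `SKETCH_FLAG_COUNT_INCREASED`, and `sketch_soft_offline()` are hypothetical, and the success path is simplified to release the caller's reference as well.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical page with a plain reference counter. */
struct sketch_page {
    int refcount;
    bool online;
};

/* Set by a caller that already took a reference (the madvise-like path). */
#define SKETCH_FLAG_COUNT_INCREASED 1

static int sketch_soft_offline(struct sketch_page *page, int flags)
{
    if (!page->online) {
        /* Failure path: drop the reference the caller took, or it leaks. */
        if (flags & SKETCH_FLAG_COUNT_INCREASED)
            page->refcount--;
        return -EIO;
    }

    /* ... the offlining work would happen here (omitted in this model) ... */
    if (flags & SKETCH_FLAG_COUNT_INCREASED)
        page->refcount--;
    return 0;
}

static void leak_example(void)
{
    struct sketch_page page = { .refcount = 1, .online = false };

    page.refcount++;    /* caller pins the page before the call */
    assert(sketch_soft_offline(&page, SKETCH_FLAG_COUNT_INCREASED) == -EIO);
    assert(page.refcount == 1);    /* nothing left pinned after the failure */
}
```

A caller that did not take a reference, like the sysfs path mentioned above, passes no flag and the counter is left alone.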
      
      Link: https://lkml.kernel.org/r/161058501210.1840162.8108917599181157327.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: feec24a6 ("mm, soft-offline: convert parameter to pfn")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ubsan: disable unsigned-overflow check for i386 · 251b5497
      Arnd Bergmann authored
      Building ubsan kernels even for compile-testing introduced these
      warnings in my randconfig environment:
      
        crypto/blake2b_generic.c:98:13: error: stack frame size of 9636 bytes in function 'blake2b_compress' [-Werror,-Wframe-larger-than=]
        static void blake2b_compress(struct blake2b_state *S,
      
        crypto/sha512_generic.c:151:13: error: stack frame size of 1292 bytes in function 'sha512_generic_block_fn' [-Werror,-Wframe-larger-than=]
        static void sha512_generic_block_fn(struct sha512_state *sst, u8 const *src,
      
        lib/crypto/curve25519-fiat32.c:312:22: error: stack frame size of 2180 bytes in function 'fe_mul_impl' [-Werror,-Wframe-larger-than=]
        static noinline void fe_mul_impl(u32 out[10], const u32 in1[10], const u32 in2[10])
      
        lib/crypto/curve25519-fiat32.c:444:22: error: stack frame size of 1588 bytes in function 'fe_sqr_impl' [-Werror,-Wframe-larger-than=]
        static noinline void fe_sqr_impl(u32 out[10], const u32 in1[10])
      
      Further testing showed that ...
    • kasan, mm: fix resetting page_alloc tags for HW_TAGS · acb35b17
      Andrey Konovalov authored
      A previous commit added resetting KASAN page tags to
      kernel_init_free_pages() to avoid false-positives due to accesses to
      metadata with the hardware tag-based mode.
      
      That commit did reset page tags before the metadata access, but didn't
      restore them after.  As a result, KASAN fails to detect bad accesses
      to page_alloc allocations on some configurations.
      
      Fix this by recovering the tag after the metadata access.
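
The save, reset, access, restore sequence can be shown with a toy model. This is a minimal sketch, not KASAN code: `sketch_page`, `SKETCH_TAG_RESET`, and `sketch_init_free_page()` are hypothetical, and the tag is modelled as an ordinary struct field.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_TAG_RESET 0xffu  /* placeholder "no tag" value for this model */

/* Hypothetical page carrying a tag next to its contents. */
struct sketch_page {
    uint8_t tag;
    unsigned char data[16];
};

static void sketch_init_free_page(struct sketch_page *page)
{
    uint8_t saved = page->tag;      /* remember the tag */

    page->tag = SKETCH_TAG_RESET;   /* reset so the access is not flagged */
    memset(page->data, 0, sizeof(page->data));
    page->tag = saved;              /* restore so later checks still work */
}

static void tag_example(void)
{
    struct sketch_page page = { .tag = 0x3a };

    memset(page.data, 0xaa, sizeof(page.data));
    sketch_init_free_page(&page);
    assert(page.tag == 0x3a);       /* the tag survives the access */
    assert(page.data[0] == 0);      /* and the contents were cleared */
}
```

Leaving out the final restore reproduces the bug in miniature: the page keeps the reset value, so later accesses with a wrong tag would no longer be caught.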
      
      Link: https://lkml.kernel.org/r/02b5bcd692e912c27d484030f666b350ad7e4ae4.1611074450.git.andreyknvl@google.com
      Fixes: aa1ef4d7 ("kasan, mm: reset tags when accessing metadata")
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kasan, mm: fix conflicts with init_on_alloc/free · ce5716c6
      Andrey Konovalov authored
      A few places where SLUB accesses object's data or metadata were missed
      in a previous patch.  This leads to false positives with hardware
      tag-based KASAN when bulk allocations are used with init_on_alloc/free.
      
      Fix the false-positives by resetting pointer tags during these accesses.
      
      (The kasan_reset_tag call is removed from slab_alloc_node, as it's added
       into maybe_wipe_obj_freeptr.)
      
      Link: https://linux-review.googlesource.com/id/I50dd32838a666e173fe06c3c5c766f2c36aae901
      Link: https://lkml.kernel.org/r/093428b5d2ca8b507f4a79f92f9929b35f7fada7.1610731872.git.andreyknvl@google.com
      Fixes: aa1ef4d7 ("kasan, mm: reset tags when accessing metadata")
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kasan: fix HW_TAGS boot parameters · 76bc99e8
      Andrey Konovalov authored
      
      
      The initially proposed KASAN command line parameters are redundant.
      
      This change drops the complex "kasan.mode=off/prod/full" parameter and
      adds a simpler kill switch "kasan=off/on" instead.  The new parameter
      together with the already existing ones provides a cleaner way to
      express the same set of features.
      
      The full set of parameters with this change:
      
        kasan=off/on             - whether KASAN is enabled
        kasan.fault=report/panic - whether to only print a report or also panic
        kasan.stacktrace=off/on  - whether to collect alloc/free stack traces
      
      Default values:
      
        kasan=on
        kasan.fault=report
        kasan.stacktrace=on  (if CONFIG_DEBUG_KERNEL=y)
        kasan.stacktrace=off (otherwise)
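
As an illustration only, the parameter set and defaults listed above can be modelled in a few lines of Python. This is a sketch of the documented behaviour, not the kernel's parsing code: the function name `resolve_kasan_params` and the `debug_kernel` argument (standing in for CONFIG_DEBUG_KERNEL=y) are hypothetical names introduced for the example.

```python
def resolve_kasan_params(cmdline, debug_kernel):
    """Return the effective HW_TAGS KASAN settings for a command line.

    Hypothetical helper: it only mirrors the values and defaults listed in
    the commit message and is not how the kernel parses its parameters.
    """
    allowed = {
        "kasan": ("off", "on"),
        "kasan.fault": ("report", "panic"),
        "kasan.stacktrace": ("off", "on"),
    }
    # Defaults from the commit message; stack trace collection depends on
    # CONFIG_DEBUG_KERNEL, modelled here by the debug_kernel flag.
    params = {
        "kasan": "on",
        "kasan.fault": "report",
        "kasan.stacktrace": "on" if debug_kernel else "off",
    }
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in allowed and value in allowed[key]:
            params[key] = value  # unknown keys and values are ignored
    return params
```

In this model the dropped "kasan.mode=" parameter is simply ignored, so a command line that still carries it falls back to the defaults.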
      
      Link: https://linux-review.googlesource.com/id/Ib3694ed90b1e8ccac6cf77dfd301847af4aba7b8
      Link: https://lkml.kernel.org/r/4e9c4a4bdcadc168317deb2419144582a9be6e61.1610736745.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      76bc99e8
    • Lecopzer Chen's avatar
      kasan: fix incorrect arguments passing in kasan_add_zero_shadow · 5dabd171
      Lecopzer Chen authored
      kasan_remove_zero_shadow() must be passed the original virtual start
      address and size, not the shadow address.
      
      Link: https://lkml.kernel.org/r/20210103063847.5963-1-lecopzer@gmail.com
      Fixes: 0207df4f ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
      Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5dabd171
    • Lecopzer Chen's avatar
      kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow · a11a496e
      Lecopzer Chen authored
      While testing kasan_populate_early_shadow and kasan_remove_zero_shadow,
      if the shadow start and end addresses in kasan_remove_zero_shadow() are
      not aligned to PMD_SIZE, the remaining unaligned PTEs won't be removed.
      
      In the test case for kasan_remove_zero_shadow():
      
          shadow_start: 0xffffffb802000000, shadow end: 0xffffffbfbe000000
      
          3-level page table:
            PUD_SIZE: 0x40000000 PMD_SIZE: 0x200000 PAGE_SIZE: 4K
      
      0xffffffbf80000000 ~ 0xffffffbfbdf80000 will not be removed because in
      kasan_remove_pud_table(), kasan_pmd_table(*pud) is true but the next
      address is 0xffffffbfbdf80000 which is not aligned to PUD_SIZE.
      
      The correct behavior is to fall back to the next level,
      kasan_remove_pmd_table(), but the current control flow always continues
      and skips the unaligned part.
      
      Fix this by correcting the condition that handles the case where next
      and addr are not aligned.
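
The alignment arithmetic behind the test case can be checked with a short sketch. The constants and addresses are the ones quoted in the commit message; the helper names `is_aligned` and `round_down` are illustrative stand-ins, not the kernel macros themselves.

```python
PUD_SIZE = 0x40000000  # 1 GiB, from the 3-level page table in the test case
PMD_SIZE = 0x200000    # 2 MiB


def is_aligned(addr, size):
    """True if addr is a multiple of size (size must be a power of two)."""
    return addr & (size - 1) == 0


def round_down(addr, size):
    """Round addr down to the previous multiple of size (a power of two)."""
    return addr & ~(size - 1)


shadow_start = 0xffffffb802000000
shadow_end = 0xffffffbfbe000000
next_addr = 0xffffffbfbdf80000  # the "next" address quoted in the message

# The quoted next address is aligned to neither PUD_SIZE nor PMD_SIZE, so a
# PUD-level walk that only handles aligned ranges cannot clear it.
next_is_pud_aligned = is_aligned(next_addr, PUD_SIZE)
next_is_pmd_aligned = is_aligned(next_addr, PMD_SIZE)

# Rounding down to PUD_SIZE gives the start of the range that was left behind.
leftover_start = round_down(next_addr, PUD_SIZE)
```

Here `leftover_start` evaluates to 0xffffffbf80000000, the start of the range the message reports as not removed.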
      
      Link: https://lkml.kernel.org/r/20210103135621.83129-1-lecopzer@gmail.com
      Fixes: 0207df4f ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
      Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: YJ Chiang <yj.chiang@mediatek.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a11a496e
    • Linus Torvalds's avatar
      Merge tag 'irq_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e6806137
      Linus Torvalds authored
      Pull irq fixes from Borislav Petkov:
      
       - Fix a kernel panic in mips-cpu due to invalid irq domain hierarchy.
      
       - Fix to not lose IPIs on bcm2836.
      
       - Fix for a bogus marking of ITS devices as shared due to uninitialized
         stack variable.
      
       - Clear a phantom interrupt on qcom-pdc to unblock suspend.
      
       - Small cleanups, warning and build fixes.
      
      * tag 'irq_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Export irq_check_status_bit()
        irqchip/mips-cpu: Set IPI domain parent chip
        irqchip/pruss: Simplify the TI_PRUSS_INTC Kconfig
        irqchip/loongson-liointc: Fix build warnings
        driver core: platform: Add extra error check in devm_platform_get_irqs_affinity()
        irqchip/bcm2836: Fix IPI acknowledgement after conversion to handle_percpu_devid_irq
        irqchip/irq-sl28cpld: Convert comma to semicolon
        genirq/msi: Initialize msi_alloc_info before calling msi_domain_prepare_irqs()
      e6806137
    • Linus Torvalds's avatar
      Merge tag 'objtool_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 32d43270
      Linus Torvalds authored
      Pull objtool fixes from Borislav Petkov:
      
       - Adjust objtool to handle a recent binutils change to not generate
         unused symbols anymore.
      
       - Revert the fail-the-build-on-fatal-errors objtool strategy for now
         due to the ever-increasing matrix of supported toolchains/plugins and
         them causing too many such fatal errors currently.
      
       - Do not add empty symbols to objdump's rbtree to accommodate clang
         removing section symbols.
      
      * tag 'objtool_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        objtool: Don't fail on missing symbol table
        objtool: Don't fail the kernel build on fatal errors
        objtool: Don't add empty symbols to the rbtree
      32d43270
    • Linus Torvalds's avatar
      Merge tag 'sched_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 24c56ee0
      Linus Torvalds authored
      Pull scheduler fixes from Borislav Petkov:
      
       - Correct the marking of kthreads which are supposed to run on a
         specific, single CPU vs such which are affine to only one CPU, mark
         per-cpu workqueue threads as such and make sure that marking
         "survives" CPU hotplug. Fix CPU hotplug issues with such kthreads.
      
       - A fix to not push away tasks on CPUs coming online.
      
       - Have workqueue CPU hotplug code use cpu_possible_mask when breaking
         affinity on CPU offlining so that pending workers can finish on newly
         arrived onlined CPUs too.
      
       - Dump tasks which haven't vacated a CPU which is currently being
         unplugged.
      
       - Register a special scale invariance callback which gets called on
         resume from RAM to read out APERF/MPERF after resume and thus make
         the schedutil scaling governor more precise.
      
      * tag 'sched_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched: Relax the set_cpus_allowed_ptr() semantics
        sched: Fix CPU hotplug / tighten is_per_cpu_kthread()
        sched: Prepare to use balance_push in ttwu()
        workqueue: Restrict affinity change to rescuer
        workqueue: Tag bound workers with KTHREAD_IS_PER_CPU
        kthread: Extract KTHREAD_IS_PER_CPU
        sched: Don't run cpu-online with balance_push() enabled
        workqueue: Use cpu_possible_mask instead of cpu_active_mask to break affinity
        sched/core: Print out straggler tasks in sched_cpu_dying()
        x86: PM: Register syscore_ops for scale invariance
      24c56ee0
    • Linus Torvalds's avatar
      Merge tag 'timers_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 025929f4
      Linus Torvalds authored
      Pull timer fixes from Borislav Petkov:
      
       - Fix an integer overflow in the NTP RTC synchronization which led to
         the latter happening every 2 seconds instead of the intended every 11
         minutes.
      
       - Get rid of now unused get_seconds().
      
      * tag 'timers_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        ntp: Fix RTC synchronization on 32-bit platforms
        timekeeping: Remove unused get_seconds()
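
The NTP overflow mentioned above can be illustrated with plain integer arithmetic. This is a sketch under an assumption the pull message does not spell out: that the 11 minute period is computed in nanoseconds in a 32-bit unsigned type. The helper `wrap_unsigned` is hypothetical and only emulates fixed-width truncation.

```python
NSEC_PER_SEC = 1_000_000_000
SYNC_PERIOD_S = 11 * 60  # intended RTC sync period: 11 minutes


def wrap_unsigned(value, bits):
    """Truncate value to an unsigned integer of the given bit width."""
    return value & ((1 << bits) - 1)


period_ns = SYNC_PERIOD_S * NSEC_PER_SEC

# Assumed 32-bit arithmetic: the product no longer fits and wraps around.
period_32 = wrap_unsigned(period_ns, 32)

# 64-bit arithmetic keeps the full value.
period_64 = wrap_unsigned(period_ns, 64)

wrapped_seconds = period_32 / NSEC_PER_SEC
```

Under that assumption the wrapped period is 2870003712 ns, a little under 3 seconds, which is in line with the "every 2 seconds" behaviour described in the pull message.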
      025929f4
    • Linus Torvalds's avatar
      Merge tag 'x86_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 17b6c49d
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
      
       - Add a new Intel model number for Alder Lake
      
       - Differentiate which aspects of the FPU state get saved/restored when
         the FPU is used in-kernel and fix a boot crash on K7 due to early
         MXCSR access before CR4.OSFXSR is even set.
      
       - A couple of noinstr annotation fixes
      
       - Correct die ID setting on AMD for users of topology information which
         need the correct die ID
      
       - A SEV-ES fix to handle string port IO to/from kernel memory properly
      
      * tag 'x86_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu: Add another Alder Lake CPU to the Intel family
        x86/mmx: Use KFPU_387 for MMX string operations
        x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
        x86/topology: Make __max_die_per_package available unconditionally
        x86: __always_inline __{rd,wr}msr()
        x86/mce: Remove explicit/superfluous tracing
        locking/lockdep: Avoid noinstr warning for DEBUG_LOCKDEP
        locking/lockdep: Cure noinstr fail
        x86/sev: Fix nonistr violation
        x86/entry: Fix noinstr fail
        x86/cpu/amd: Set __max_die_per_package on AMD
        x86/sev-es: Handle string port IO to kernel memory properly
      17b6c49d
    • Linus Torvalds's avatar
      Merge tag 'powerpc-5.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 14c50a66
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Fix a bad interaction between the scv handling and the fallback L1D
         flush, which could lead to user register corruption. Only affects
         people using scv (~no one) on machines with old firmware that are
         missing the L1D flush.
      
       - Two small selftest fixes.
      
      Thanks to Eirik Fuller, Libor Pechacek, Nicholas Piggin, Sandipan Das,
      and Tulio Magno Quites Machado Filho.
      
      * tag 'powerpc-5.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/64s: fix scv entry fallback flush vs interrupt
        selftests/powerpc: Only test lwm/stmw on big endian
        selftests/powerpc: Fix exit status of pkey tests
      14c50a66
    • Linus Torvalds's avatar
      Merge tag 'for-linus-2021-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · c509ce23
      Linus Torvalds authored
      Pull misc fixes from Christian Brauner:
      
       - Jann reported sparse complaints because of a missing __user
         annotation in a helper we added way back when we added
         pidfd_send_signal() to avoid compat syscall handling. Fix it.
      
       - Yanfei replaces a reference in a comment to the _do_fork() helper I
         removed a while ago with a reference to the new kernel_clone()
         replacement
      
       - Alexander Guril added a simple coding style fix
      
      * tag 'for-linus-2021-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        kthread: remove comments about old _do_fork() helper
        Kernel: fork.c: Fix coding style: Do not use {} around single-line statements
        signal: Add missing __user annotation to copy_siginfo_from_user_any
      c509ce23
    • Linus Torvalds's avatar
      Merge tag '5.11-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6 · 4dcd3bcc
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "An important signal handling patch for stable, and two small cleanup
        patches"
      
      * tag '5.11-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: do not fail __smb_send_rqst if non-fatal signals are pending
        fs/cifs: Simplify bool comparison.
        fs/cifs: Assign boolean values to a bool variable
      4dcd3bcc
    • Shakeel Butt's avatar
      mm: fix numa stats for thp migration · 5c447d27
      Shakeel Butt authored
      Currently the kernel is not correctly updating the numa stats for
      NR_FILE_PAGES and NR_SHMEM on THP migration.  Fix that.
      
      For NR_FILE_DIRTY and NR_ZONE_WRITE_PENDING there is currently no need
      to handle THP migration, because the kernel does not yet have write
      support for file THP.  To be more future proof, this patch adds THP
      support for those stats as well.
      
      Link: https://lkml.kernel.org/r/20210108155813.2914586-2-shakeelb@google.com
      Fixes: e71769ae ("mm: enable thp migration for shmem thp")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5c447d27
    • Shakeel Butt's avatar
      mm: memcg: fix memcg file_dirty numa stat · 8a8792f6
      Shakeel Butt authored
      The kernel updates the per-node NR_FILE_DIRTY stats on page migration
      but not the memcg numa stats.
      
      That was not an issue until commit 5f9a4f4a ("mm: memcontrol: add the
      missing numa_stat interface for cgroup v2") recently exposed numa stats
      for the memcg.
      
      So fix the file_dirty per-memcg numa stat.
      
      Link: https://lkml.kernel.org/r/20210108155813.2914586-1-shakeelb@google.com
      Fixes: 5f9a4f4a ("mm: memcontrol: add the missing numa_stat interface for cgroup v2")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8a8792f6
    • Roman Gushchin's avatar
      mm: memcg/slab: optimize objcg stock draining · 3de7d4f2
      Roman Gushchin authored
      Imran Khan reported a 16% regression in hackbench results caused by
      commit f2fe7b09 ("mm: memcg/slab: charge individual slab objects
      instead of pages").  The regression is noticeable when several
      relatively large slab objects, e.g. skb's, are allocated consecutively.
      As soon as the amount of stocked bytes exceeds PAGE_SIZE,
      drain_obj_stock() and __memcg_kmem_uncharge() are called, which leads
      to a number of atomic operations in page_counter_uncharge().
      
      The corresponding call graph is below (provided by Imran Khan):
      
        |__alloc_skb
        |    |
        |    |__kmalloc_reserve.isra.61
        |    |    |
        |    |    |__kmalloc_node_track_caller
        |    |    |    |
        |    |    |    |slab_pre_alloc_hook.constprop.88
        |    |    |     obj_cgroup_charge
        |    |    |    |    |
        |    |    |    |    |__memcg_kmem_charge
        |    |    |    |    |    |
        |    |    |    |    |    |page_counter_try_charge
        |    |    |    |    |
        |    |    |    |    |refill_obj_stock
        |    |    |    |    |    |
        |    |    |    |    |    |drain_obj_stock.isra.68
        |    |    |    |    |    |    |
        |    |    |    |    |    |    |__memcg_kmem_uncharge
        |    |    |    |    |    |    |    |
        |    |    |    |    |    |    |    |page_counter_uncharge
        |    |    |    |    |    |    |    |    |
        |    |    |    |    |    |    |    |    |page_counter_cancel
        |    |    |    |
        |    |    |    |
        |    |    |    |__slab_alloc
        |    |    |    |    |
        |    |    |    |    |___slab_alloc
        |    |    |    |    |
        |    |    |    |slab_post_alloc_hook
      
      Instead of directly uncharging the accounted kernel memory, it's
      possible to refill the generic page-sized per-cpu stock.  This is a
      much faster operation, especially on a default hierarchy.  As a bonus,
      __memcg_kmem_uncharge_page() will also get faster, so the freeing of
      page-sized kernel allocations (e.g. large kmallocs) will become faster.
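
The effect can be pictured with a toy model that counts atomic updates. This is a deliberately simplified sketch, not the memcg implementation: the class names, the one-atomic-update-per-hierarchy-level cost and the `STOCK_BATCH` capacity of 32 pages are all assumptions made for illustration.

```python
STOCK_BATCH = 32  # assumed per-cpu stock capacity in pages (illustrative)


class PageCounterChain:
    """Toy stand-in for a page counter hierarchy; counts atomic updates."""

    def __init__(self, depth):
        self.depth = depth
        self.atomic_ops = 0

    def uncharge(self, nr_pages):
        # Assumption: one atomic update per level of the hierarchy.
        self.atomic_ops += self.depth


class PerCpuStock:
    """Toy per-cpu cache of charged pages; refilling it needs no atomics."""

    def __init__(self, counter):
        self.counter = counter
        self.nr_pages = 0

    def refill(self, nr_pages):
        self.nr_pages += nr_pages
        if self.nr_pages > STOCK_BATCH:
            # Only an overfull stock is flushed back to the shared counters.
            self.counter.uncharge(self.nr_pages)
            self.nr_pages = 0


def drain_events(nr_drains, pages_per_drain, depth, use_stock):
    """Return the atomic updates caused by nr_drains object-stock drains."""
    counter = PageCounterChain(depth)
    stock = PerCpuStock(counter)
    for _ in range(nr_drains):
        if use_stock:
            stock.refill(pages_per_drain)
        else:
            counter.uncharge(pages_per_drain)
    return counter.atomic_ops
```

In this model, 1000 single-page drains on a three-level hierarchy cost 3000 atomic updates when uncharged directly and 90 when routed through the per-cpu stock. The numbers only show the shape of the saving, not measured kernel behaviour.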
      
      A similar change has been done earlier for the socket memory by the
      commit 475d0487 ("mm: memcontrol: use per-cpu stocks for socket
      memory uncharging").
      
      Link: https://lkml.kernel.org/r/20210106042239.2860107-1-guro@fb.com
      Fixes: f2fe7b09 ("mm: memcg/slab: charge individual slab objects instead of pages")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Reported-by: Imran Khan <imran.f.khan@oracle.com>
      Tested-by: Imran Khan <imran.f.khan@oracle.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Michal Koutný <mkoutny@suse.com>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3de7d4f2