  1. Feb 06, 2021
    • x86/debug: Prevent data breakpoints on __per_cpu_offset · c4bed4b9
      Lai Jiangshan authored
      When FSGSBASE is enabled, paranoid_entry() fetches the per-CPU GSBASE value
      via __per_cpu_offset or pcpu_unit_offsets.
      
      When a data breakpoint is set on __per_cpu_offset[cpu] (read-write
      operation), the specific CPU will be stuck in an infinite #DB loop.
      
      RCU will try to send an NMI to that CPU, but that does not work
      either, because NMI handling also relies on paranoid_entry(). The
      hang is therefore undebuggable.
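
The protection can be pictured as a range check made before a data
breakpoint is installed. This is a minimal sketch, not the kernel's code:
within_range(), breakpoint_allowed() and per_cpu_offset_sketch[] are
hypothetical names, and the array only stands in for __per_cpu_offset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's __per_cpu_offset[] table (assumption). */
#define NR_CPUS_SKETCH 4
static unsigned long per_cpu_offset_sketch[NR_CPUS_SKETCH];

/* True if [addr, addr + len) overlaps [base, base + size). */
static bool within_range(uintptr_t addr, size_t len,
                         uintptr_t base, size_t size)
{
    return addr < base + size && base < addr + len;
}

/* Refuse data breakpoints that touch the per-CPU offset table. */
static bool breakpoint_allowed(uintptr_t addr, size_t len)
{
    return !within_range(addr, len,
                         (uintptr_t)per_cpu_offset_sketch,
                         sizeof(per_cpu_offset_sketch));
}
```

A breakpoint that covers any byte of the table is rejected, including one
that only partially overlaps its first or last entry.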
      
      Fixes: eaad9812 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20210204152708.21308-1-jiangshanlai@gmail.com
  2. Feb 05, 2021
    • x86/apic: Add extra serialization for non-serializing MSRs · 25a068b8
      Dave Hansen authored
      Jan Kiszka reported that the x2apic_wrmsr_fence() function uses a plain
      MFENCE while the Intel SDM (10.12.3 MSR Access in x2APIC Mode) calls for
      MFENCE; LFENCE.
      
      Short summary: we have special MSRs that have weaker ordering than all
      the rest. Add fencing consistent with current SDM recommendations.
      
      This is not known to cause any issues in practice, only in theory.
      
      Longer story below:
      
      The reason the kernel uses a different semantic is that the SDM changed
      (roughly in late 2017). The SDM changed because folks at Intel were
      auditing all of the recommended fences in the SDM and realized that the
      x2apic fences were insufficient.
      
      Why was the plain MFENCE judged insufficient?
      
      WRMSR itself is normally a serializing instruction. No fences are needed
      because the instruction itself serializes everything.
      
      But, there are explicit exceptions for this serializing behavior written
      into the WRMSR instruction documentation for two classes of MSRs:
      IA32_TSC_DEADLINE and the X2APIC MSRs.
      
      Back to x2apic: WRMSR is *not* serializing in this specific case.
      But why is MFENCE insufficient? MFENCE makes writes visible, but
      only affects load/store instructions. WRMSR is unfortunately not a
      load/store instruction and is unaffected by MFENCE. This means that a
      non-serializing WRMSR could be reordered by the CPU to execute before
      the writes made visible by the MFENCE have even occurred in the first
      place.
      
      This means that an x2apic IPI could theoretically be triggered before
      there is any (visible) data to process.
      
      Does this affect anything in practice? I honestly don't know. It seems
      quite possible that by the time an interrupt gets to consume the (not
      yet) MFENCE'd data, it has become visible, mostly by accident.
      
      To be safe, add the SDM-recommended fences for all x2apic WRMSRs.
      
      This also leaves open the question of the _other_ weakly-ordered WRMSR:
      MSR_IA32_TSC_DEADLINE. While it has the same ordering architecture as
      the x2APIC MSRs, it seems substantially less likely to be a problem in
      practice. While writes to the in-memory Local Vector Table (LVT) might
      theoretically be reordered with respect to a weakly-ordered WRMSR like
      TSC_DEADLINE, the SDM has this to say:
      
        In x2APIC mode, the WRMSR instruction is used to write to the LVT
        entry. The processor ensures the ordering of this write and any
        subsequent WRMSR to the deadline; no fencing is required.
      
      But, that might still leave xAPIC exposed. The safest thing to do for
      now is to add the extra, recommended LFENCE.
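
The ordering described above can be sketched in user-space C. This is an
illustration under assumptions, not the kernel's helper: the names
wrmsr_fence_sketch() and publish_then_signal() are made up, a plain store
stands in for the x2APIC WRMSR, and non-x86 builds fall back to a generic
full barrier.

```c
/* Sketch of the "MFENCE; LFENCE" sequence the SDM recommends before a
 * weakly-ordered WRMSR.  The helper name is illustrative. */
static inline void wrmsr_fence_sketch(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* MFENCE orders earlier loads and stores; LFENCE then waits for
     * prior instructions to complete before later ones begin. */
    __asm__ __volatile__("mfence; lfence" : : : "memory");
#else
    /* Not x86: fall back to a full barrier (assumption). */
    __sync_synchronize();
#endif
}

static int shared_data;   /* data the interrupt handler would read */
static int ipi_sent;      /* stands in for the x2APIC ICR write */

static void publish_then_signal(int value)
{
    shared_data = value;       /* 1. write the data */
    wrmsr_fence_sketch();      /* 2. order the store before the "WRMSR" */
    ipi_sent = 1;              /* 3. in the kernel this would be WRMSR */
}
```

The point of the pairing is step 2: with MFENCE alone, a non-serializing
WRMSR in step 3 is not a load or store and could start before step 1 is
visible.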
      
       [ bp: Massage commit message, fix typos, drop accidentally added
         newline to tools/arch/x86/include/asm/barrier.h. ]
      
      Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20200305174708.F77040DD@viggo.jf.intel.com
  3. Feb 03, 2021
  4. Feb 02, 2021
  5. Feb 01, 2021
    • x86/debug: Fix DR6 handling · 9ad22e16
      Peter Zijlstra authored
      Tom reported that one of the GDB test-cases failed, and Boris bisected
      it to commit:
      
        d53d9bc0 ("x86/debug: Change thread.debugreg6 to thread.virtual_dr6")
      
      The debugging session led us to commit:
      
        6c0aca28 ("x86: Ignore trap bits on single step exceptions")
      
      It turns out that TF and data breakpoints are both traps and will be
      merged, while instruction breakpoints are faults and will not be merged.
      This means 6c0aca28 is wrong: only TF and instruction breakpoints
      need to be excluded, while TF and data breakpoints can be merged.
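
One way to read the rule above is as a filter on the DR6 status bits. The
sketch below is an interpretation, not the patch itself: the bit values
follow the x86 DR6 layout, while dr6_bits_to_report() and its
insn_bp_mask argument are hypothetical.

```c
#include <stdint.h>

#define DR_TRAP_BITS 0xfu      /* DR6 bits 0-3: breakpoint 0-3 hit */
#define DR_STEP      0x4000u   /* DR6 bit 14: single-step (TF) trap */

/*
 * Hypothetical helper: insn_bp_mask has bit n set when breakpoint slot n
 * is an instruction (execute) breakpoint rather than a data breakpoint.
 */
static uint32_t dr6_bits_to_report(uint32_t dr6, uint32_t insn_bp_mask)
{
    if (dr6 & DR_STEP) {
        /* Instruction breakpoints are faults: never merged with TF. */
        dr6 &= ~(insn_bp_mask & DR_TRAP_BITS);
    }
    /* Data breakpoint bits are traps and stay merged with DR_STEP. */
    return dr6 & (DR_STEP | DR_TRAP_BITS);
}
```

With a single-step trap pending, a data breakpoint bit is kept alongside
DR_STEP, whereas an instruction breakpoint bit is dropped.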
      
       [ bp: Massage commit message. ]
      
      Fixes: d53d9bc0 ("x86/debug: Change thread.debugreg6 to thread.virtual_dr6")
      Fixes: 6c0aca28 ("x86: Ignore trap bits on single step exceptions")
      Reported-by: Tom de Vries <tdevries@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/YBMAbQGACujjfz%2Bi@hirez.programming.kicks-ass.net
      Link: https://lkml.kernel.org/r/20210128211627.GB4348@worktop.programming.kicks-ass.net
  6. Jan 30, 2021
    • x86/build: Disable CET instrumentation in the kernel · 20bf2b37
      Josh Poimboeuf authored
      With retpolines disabled, some configurations of GCC, and specifically
      the GCC versions 9 and 10 in Ubuntu, will add Intel CET instrumentation
      to the kernel by default. That breaks certain tracing scenarios by
      adding a superfluous ENDBR64 instruction before the fentry call, for
      functions which can be called indirectly.
      
      CET instrumentation isn't currently necessary in the kernel, as CET is
      only supported in user space. Disable it unconditionally and move it
      into the x86's Makefile as CET/CFI... enablement should be a per-arch
      decision anyway.
      
       [ bp: Massage and extend commit message. ]
      
      Fixes: 29be86d7 ("kbuild: add -fcf-protection=none when using retpoline flags")
      Reported-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Tested-by: Nikolay Borisov <nborisov@suse.com>
      Cc: <stable@vger.kernel.org>
      Cc: Seth Forshee <seth.forshee@canonical.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Link: https://lkml.kernel.org/r/20210128215219.6kct3h2eiustncws@treble
  7. Jan 25, 2021
    • Linux 5.11-rc5 · 6ee1d745
      Linus Torvalds authored
    • Merge tag 'sh-for-5.11' of git://git.libc.org/linux-sh · 228a65d4
      Linus Torvalds authored
      Pull arch/sh updates from Rich Felker:
       "Cleanup and warning fixes"
      
      * tag 'sh-for-5.11' of git://git.libc.org/linux-sh:
        sh/intc: Restore devm_ioremap() alignment
        sh: mach-sh03: remove duplicate include
        arch: sh: remove duplicate include
        sh: Drop ARCH_NR_GPIOS definition
        sh: Remove unused HAVE_COPY_THREAD_TLS macro
        sh: remove CONFIG_IDE from most defconfig
        sh: mm: Convert to DEFINE_SHOW_ATTRIBUTE
        sh: intc: Convert to DEFINE_SHOW_ATTRIBUTE
        arch/sh: hyphenate Non-Uniform in Kconfig prompt
        sh: dma: fix kconfig dependency for G2_DMA
    • Merge tag 'io_uring-5.11-2021-01-24' of git://git.kernel.dk/linux-block · ef7b1a0e
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "Still need a final cancelation fix that isn't quite done done,
        expected in the next day or two. That said, this contains:
      
         - Wakeup fix for IOPOLL requests
      
         - SQPOLL split close op handling fix
      
         - Ensure that any use of io_uring fd itself is marked as inflight
      
         - Short non-regular file read fix (Pavel)
      
         - Fix up bad false positive warning (Pavel)
      
         - SQPOLL fixes (Pavel)
      
         - In-flight removal fix (Pavel)"
      
      * tag 'io_uring-5.11-2021-01-24' of git://git.kernel.dk/linux-block:
        io_uring: account io_uring internal files as REQ_F_INFLIGHT
        io_uring: fix sleeping under spin in __io_clean_op
        io_uring: fix short read retries for non-reg files
        io_uring: fix SQPOLL IORING_OP_CLOSE cancelation state
        io_uring: fix skipping disabling sqo on exec
        io_uring: fix uring_flush in exit_files() warning
        io_uring: fix false positive sqo warning on flush
        io_uring: iopoll requests should also wake task ->in_idle state
    • Merge tag 'block-5.11-2021-01-24' of git://git.kernel.dk/linux-block · a692a610
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - NVMe pull request from Christoph:
            - fix a status code in nvmet (Chaitanya Kulkarni)
            - avoid double completions in nvme-rdma/nvme-tcp (Chao Leng)
            - fix the CMB support to cope with NVMe 1.4 controllers (Klaus Jensen)
            - fix PRINFO handling in the passthrough ioctl (Revanth Rajashekar)
            - fix a double DMA unmap in nvme-pci
      
       - lightnvm error path leak fix (Pan)
      
       - MD pull request from Song:
            - Flush request fix (Xiao)
      
      * tag 'block-5.11-2021-01-24' of git://git.kernel.dk/linux-block:
        lightnvm: fix memory leak when submit fails
        nvme-pci: fix error unwind in nvme_map_data
        nvme-pci: refactor nvme_unmap_data
        md: Set prev_flush_start and flush_bio in an atomic way
        nvmet: set right status on error in id-ns handler
        nvme-pci: allow use of cmb on v1.4 controllers
        nvme-tcp: avoid request double completion for concurrent nvme_tcp_timeout
        nvme-rdma: avoid request double completion for concurrent nvme_rdma_timeout
        nvme: check the PRINFO bit before deciding the host buffer length
    • Merge branch 'akpm' (patches from Andrew) · 51306806
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "18 patches.
      
        Subsystems affected by this patch series: mm (pagealloc, memcg, kasan,
        memory-failure, and highmem), ubsan, proc, and MAINTAINERS"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        MAINTAINERS: add a couple more files to the Clang/LLVM section
        proc_sysctl: fix oops caused by incorrect command parameters
        powerpc/mm/highmem: use __set_pte_at() for kmap_local()
        mips/mm/highmem: use set_pte() for kmap_local()
        mm/highmem: prepare for overriding set_pte_at()
        sparc/mm/highmem: flush cache and TLB
        mm: fix page reference leak in soft_offline_page()
        ubsan: disable unsigned-overflow check for i386
        kasan, mm: fix resetting page_alloc tags for HW_TAGS
        kasan, mm: fix conflicts with init_on_alloc/free
        kasan: fix HW_TAGS boot parameters
        kasan: fix incorrect arguments passing in kasan_add_zero_shadow
        kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow
        mm: fix numa stats for thp migration
        mm: memcg: fix memcg file_dirty numa stat
        mm: memcg/slab: optimize objcg stock draining
        mm: fix initialization of struct page for holes in memory layout
        x86/setup: don't remove E820_TYPE_RAM for pfn 0
    • Merge tag 'char-misc-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · fdbc80bd
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are some small char/misc driver fixes for 5.11-rc5:
      
         - habanalabs driver fixes
      
         - phy driver fixes
      
         - hwtracing driver fixes
      
         - rtsx cardreader driver fix
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'char-misc-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        misc: rtsx: init value of aspm_enabled
        habanalabs: disable FW events on device removal
        habanalabs: fix backward compatibility of idle check
        habanalabs: zero pci counters packet before submit to FW
        intel_th: pci: Add Alder Lake-P support
        stm class: Fix module init return on allocation failure
        habanalabs: prevent soft lockup during unmap
        habanalabs: fix reset process in case of failures
        habanalabs: fix dma_addr passed to dma_mmap_coherent
        phy: mediatek: allow compile-testing the dsi phy
        phy: cpcap-usb: Fix warning for missing regulator_disable
        PHY: Ingenic: fix unconditional build of phy-ingenic-usb
    • Merge tag 'driver-core-5.11-rc5' of... · 443d1129
      Linus Torvalds authored
      Merge tag 'driver-core-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core fixes from Greg KH:
       "Here are some small driver core fixes for 5.11-rc5 that resolve some
        reported problems:
      
         - revert of a -rc1 patch that was causing problems with some machines
      
         - device link device name collision problem fix (busses only have to
           name devices unique to their bus, not unique to all busses)
      
         - kernfs splice bugfixes to resolve firmware loading problems for
           Qualcomm systems.
      
         - other tiny driver core fixes for minor issues reported.
      
        All of these have been in linux-next with no reported problems"
      
      * tag 'driver-core-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        driver core: Fix device link device name collision
        driver core: Extend device_is_dependent()
        kernfs: wire up ->splice_read and ->splice_write
        kernfs: implement ->write_iter
        kernfs: implement ->read_iter
        Revert "driver core: Reorder devices on successful probe"
        Driver core: platform: Add extra error check in devm_platform_get_irqs_affinity()
        drivers core: Free dma_range_map when driver probe failed
    • Merge tag 'staging-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 832bceef
      Linus Torvalds authored
      Pull staging/IIO driver fixes from Greg KH:
       "Here are some IIO driver fixes for 5.11-rc5 to resolve some reported
        problems.
      
        Nothing major, just a few small fixes, all of these have been in
        linux-next for a while and full details are in the shortlog"
      
      * tag 'staging-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        iio: sx9310: Fix semtech,avg-pos-strength setting when > 16
        iio: common: st_sensors: fix possible infinite loop in st_sensors_irq_thread
        iio: ad5504: Fix setting power-down state
        counter:ti-eqep: remove floor
        drivers: iio: temperature: Add delay after the addressed reset command in mlx90632.c
        iio: adc: ti_am335x_adc: remove omitted iio_kfifo_free()
        dt-bindings: iio: accel: bma255: Fix bmc150/bmi055 compatible
        iio: sx9310: Off by one in sx9310_read_thresh()
    • Merge tag 'tty-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 4da81fa2
      Linus Torvalds authored
      Pull tty/serial fixes from Greg KH:
       "Here are three small tty/serial fixes for 5.11-rc5 to resolve reported
        problems:
      
         - two patches to fix up writing to ttys with splice
      
         - mvebu-uart driver fix for reported problem
      
        All of these have been in linux-next with no reported problems"
      
      * tag 'tty-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        tty: fix up hung_up_tty_write() conversion
        tty: implement write_iter
        serial: mvebu-uart: fix tx lost characters at power off
    • Merge tag 'usb-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 8f3bfd21
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB driver fixes for 5.11-rc5.  They resolve:
      
         - xhci issues for some reported problems
      
         - ehci driver issue for one specific device
      
         - USB gadget fixes for some reported problems
      
         - cdns3 driver fixes for issues reported
      
         - MAINTAINERS file update
      
         - thunderbolt minor fix
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'usb-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: bdc: Make bdc pci driver depend on BROKEN
        xhci: tegra: Delay for disabling LFPS detector
        xhci: make sure TRB is fully written before giving it to the controller
        usb: udc: core: Use lock when write to soft_connect
        USB: gadget: dummy-hcd: Fix errors in port-reset handling
        usb: gadget: aspeed: fix stop dma register setting.
        USB: ehci: fix an interrupt calltrace error
        ehci: fix EHCI host controller initialization sequence
        MAINTAINERS: update Peter Chen's email address
        thunderbolt: Drop duplicated 0x prefix from format string
        MAINTAINERS: Update address for Cadence USB3 driver
        usb: cdns3: imx: improve driver .remove API
        usb: cdns3: imx: fix can't create core device the second time issue
        usb: cdns3: imx: fix writing read-only memory issue
    • MAINTAINERS: add a couple more files to the Clang/LLVM section · e82d891a
      Nathan Chancellor authored
      The K: entry should ensure that Nick and I always get CC'd on patches that
      touch these files but it is better to be explicit rather than implicit.
      
      Link: https://lkml.kernel.org/r/20210114004059.2129921-1-natechancellor@gmail.com
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc_sysctl: fix oops caused by incorrect command parameters · 697edcb0
      Xiaoming Ni authored
      process_sysctl_arg() does not check whether val is empty before
      invoking strlen(val).  If a command line parameter is incorrectly
      configured and val is empty, an oops is triggered.
      
      For example:
        if "hung_task_panic=1" is incorrectly written as "hung_task_panic", an
        oops is triggered. The call stack is as follows:
          Kernel command line: .... hung_task_panic
          ......
          Call trace:
          __pi_strlen+0x10/0x98
          parse_args+0x278/0x344
          do_sysctl_args+0x8c/0xfc
          kernel_init+0x5c/0xf4
          ret_from_fork+0x10/0x30
      
      To fix it, check whether "val" is empty when "param" is a sysctl field.
      Error codes are returned in the failure branch, and error logs are
      generated by parse_args().
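
The shape of the check can be sketched outside the kernel. This is a
simplified stand-in, not process_sysctl_arg() itself: the function name
and the "sysctl." prefix test are assumptions made for the example, and
the return value on success is just the value length.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified stand-in for a "param=val" handler.  A bare "param" with no
 * "=" on the command line arrives here with val == NULL.
 */
static long handle_sysctl_arg_sketch(const char *param, const char *val)
{
    if (strncmp(param, "sysctl.", 7) != 0)
        return 0;              /* not a sysctl field: nothing to do */

    if (!val)
        return -EINVAL;        /* the missing check: no value supplied */

    return (long)strlen(val);  /* safe: val is known to be non-NULL */
}
```

Returning an error code instead of dereferencing val leaves the caller to
log the malformed parameter.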
      
      Link: https://lkml.kernel.org/r/20210118133029.28580-1-nixiaoming@huawei.com
      Fixes: 3db978d4 ("kernel/sysctl: support setting sysctl parameters from kernel command line")
      Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: <stable@vger.kernel.org>	[5.8+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • powerpc/mm/highmem: use __set_pte_at() for kmap_local() · 78502582
      Thomas Gleixner authored
      The original PowerPC highmem mapping function used __set_pte_at() to
      denote that the mapping is per CPU.  This got lost with the conversion
      to the generic implementation.
      
      Override the default map function.
      
      Link: https://lkml.kernel.org/r/20210112170411.281464308@linutronix.de
      Fixes: 47da42b2 ("powerpc/mm/highmem: Switch to generic kmap atomic")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Paul Cercueil <paul@crapouillou.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mips/mm/highmem: use set_pte() for kmap_local() · 8c0d5d78
      Thomas Gleixner authored
      set_pte_at() on MIPS invokes update_cache() which might recurse into
      kmap_local().
      
      Use set_pte() like the original MIPS highmem implementation did.
      
      Link: https://lkml.kernel.org/r/20210112170411.187513575@linutronix.de
      Fixes: a4c33e83 ("mips/mm/highmem: Switch to generic kmap atomic")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: Paul Cercueil <paul@crapouillou.net>
      Reported-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/highmem: prepare for overriding set_pte_at() · a1dce7fd
      Thomas Gleixner authored
      The generic kmap_local() map function uses set_pte_at(), but MIPS requires
      set_pte() and PowerPC wants __set_pte_at().
      
      Provide arch_kmap_local_set_pte() and default it to set_pte_at().
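
The override hook follows the usual "define it first or get the default"
preprocessor pattern. The sketch below mocks the types and set_pte_at()
so it compiles on its own; kmap_local_map_sketch() and the call counter
are hypothetical and only serve to show which setter ran.

```c
/* Mock types and primitives so the sketch compiles outside the kernel. */
typedef unsigned long pte_t;
struct mm_struct { int unused; };

static int generic_setter_calls;  /* counts calls to the default setter */

static void set_pte_at(struct mm_struct *mm, unsigned long vaddr,
                       pte_t *ptep, pte_t ptev)
{
    (void)mm;
    (void)vaddr;
    *ptep = ptev;
    generic_setter_calls++;
}

/*
 * An architecture header would define arch_kmap_local_set_pte before the
 * generic code is compiled.  It is left undefined here, so the default
 * (set_pte_at) applies.
 */
#ifndef arch_kmap_local_set_pte
#define arch_kmap_local_set_pte(mm, vaddr, ptep, ptev) \
    set_pte_at(mm, vaddr, ptep, ptev)
#endif

static void kmap_local_map_sketch(struct mm_struct *mm, unsigned long vaddr,
                                  pte_t *ptep, pte_t ptev)
{
    arch_kmap_local_set_pte(mm, vaddr, ptep, ptev);
}
```

An architecture that needs set_pte() or __set_pte_at() would supply its
own macro, and the generic caller stays unchanged.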
      
      Link: https://lkml.kernel.org/r/20210112170411.056306194@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Cercueil <paul@crapouillou.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sparc/mm/highmem: flush cache and TLB · f99e0237
      Thomas Gleixner authored
      Patch series "mm/highmem: Fix fallout from generic kmap_local
      conversions".
      
      The kmap_local conversion broke sparc, mips and powerpc as it missed
      some of the details in the original implementation.
      
      This patch (of 4):
      
      The recent conversion to the generic kmap_local infrastructure failed to
      assign the proper pre/post map/unmap flush operations for sparc.
      
      Sparc requires a cache flush before map/unmap and a TLB flush afterwards.
      
      Link: https://lkml.kernel.org/r/20210112170136.078559026@linutronix.de
      Link: https://lkml.kernel.org/r/20210112170410.905976187@linutronix.de
      Fixes: 3293efa9 ("sparc/mm/highmem: Switch to generic kmap atomic")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: Andreas Larsson <andreas@gaisler.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Cercueil <paul@crapouillou.net>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix page reference leak in soft_offline_page() · dad4e5b3
      Dan Williams authored
      The conversion to move pfn_to_online_page() internal to
      soft_offline_page() missed that the get_user_pages() reference taken by
      the madvise() path needs to be dropped when pfn_to_online_page() fails.
      
      Note the direct sysfs-path to soft_offline_page() does not perform a
      get_user_pages() lookup.
      
      When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page()
      pfn the kernel hangs at dax-device shutdown due to a leaked reference.
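
The leak and its fix can be modelled with a plain counter. Everything in
this sketch is a mock: the struct, the _sketch functions and the flag
value are assumptions for illustration, and only the early failure path
is modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MF_COUNT_INCREASED_SKETCH 0x1  /* caller already holds a reference */

struct page_sketch {
    int refcount;
    bool online;
};

/* Mock lookup: returns NULL for a page that is valid but not online. */
static struct page_sketch *pfn_to_online_page_sketch(struct page_sketch *p)
{
    return p->online ? p : NULL;
}

static int soft_offline_page_sketch(struct page_sketch *p, int flags)
{
    struct page_sketch *page = pfn_to_online_page_sketch(p);

    if (!page) {
        /* Drop the reference taken by the madvise()-style caller. */
        if (flags & MF_COUNT_INCREASED_SKETCH)
            p->refcount--;
        return -EIO;
    }
    /* ... the actual offlining work would go here ... */
    return 0;
}
```

The sysfs-style caller passes no flag and holds no extra reference, so
nothing is dropped on that path.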
      
      Link: https://lkml.kernel.org/r/161058501210.1840162.8108917599181157327.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: feec24a6 ("mm, soft-offline: convert parameter to pfn")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ubsan: disable unsigned-overflow check for i386 · 251b5497
      Arnd Bergmann authored
      Building ubsan kernels even for compile-testing introduced these
      warnings in my randconfig environment:
      
        crypto/blake2b_generic.c:98:13: error: stack frame size of 9636 bytes in function 'blake2b_compress' [-Werror,-Wframe-larger-than=]
        static void blake2b_compress(struct blake2b_state *S,
      
        crypto/sha512_generic.c:151:13: error: stack frame size of 1292 bytes in function 'sha512_generic_block_fn' [-Werror,-Wframe-larger-than=]
        static void sha512_generic_block_fn(struct sha512_state *sst, u8 const *src,
      
        lib/crypto/curve25519-fiat32.c:312:22: error: stack frame size of 2180 bytes in function 'fe_mul_impl' [-Werror,-Wframe-larger-than=]
        static noinline void fe_mul_impl(u32 out[10], const u32 in1[10], const u32 in2[10])
      
        lib/crypto/curve25519-fiat32.c:444:22: error: stack frame size of 1588 bytes in function 'fe_sqr_impl' [-Werror,-Wframe-larger-than=]
        static noinline void fe_sqr_impl(u32 out[10], const u32 in1[10])
      
      Further testing showed that this is caused by
      -fsanitize=unsigned-integer-overflow, but is isolated to the 32-bit x86
      architecture.
      
      The one in blake2b immediately overflows the 8KB stack area, so it
      is better to ensure this never happens by disabling the option for
      32-bit x86.
      
      Link: https://lkml.kernel.org/r/20210112202922.2454435-1-arnd@kernel.org
      Link: https://lore.kernel.org/lkml/20201230154749.746641-1-arnd@kernel.org/
      Fixes: d0a3ac54 ("ubsan: enable for all*config builds")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Marco Elver <elver@google.com>
      Cc: George Popescu <georgepope@android.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kasan, mm: fix resetting page_alloc tags for HW_TAGS · acb35b17
      Andrey Konovalov authored
      A previous commit added resetting KASAN page tags to
      kernel_init_free_pages() to avoid false-positives due to accesses to
      metadata with the hardware tag-based mode.
      
      That commit did reset page tags before the metadata access, but didn't
      restore them afterwards.  As a result, KASAN fails to detect bad accesses
      to page_alloc allocations on some configurations.
      
      Fix this by recovering the tag after the metadata access.
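
The save/reset/restore sequence can be sketched with a mock page. The
struct, init_page_sketch() and the 0xff match-all tag value are
assumptions for illustration; real tags live in page flags and pointer
bits, not in a struct field.

```c
#include <stdint.h>

#define TAG_KERNEL_SKETCH 0xff  /* match-all tag value (assumption) */

struct tagged_page {
    uint8_t tag;       /* tag assigned to the allocation */
    uint8_t contents;  /* stands in for the page's memory */
};

/* Clear the page contents without tripping the tag check, then restore. */
static void init_page_sketch(struct tagged_page *page)
{
    uint8_t saved = page->tag;        /* remember the allocation's tag */

    page->tag = TAG_KERNEL_SKETCH;    /* reset so the access is not checked */
    page->contents = 0;               /* the access that needed the reset */
    page->tag = saved;                /* the missing step: restore the tag */
}
```

Without the final assignment the page keeps the match-all tag, so later
bad accesses to it would go undetected.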
      
      Link: https://lkml.kernel.org/r/02b5bcd692e912c27d484030f666b350ad7e4ae4.1611074450.git.andreyknvl@google.com
      Fixes: aa1ef4d7 ("kasan, mm: reset tags when accessing metadata")
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kasan, mm: fix conflicts with init_on_alloc/free · ce5716c6
      Andrey Konovalov authored
      A few places where SLUB accesses object's data or metadata were missed
      in a previous patch.  This leads to false positives with hardware
      tag-based KASAN when bulk allocations are used with init_on_alloc/free.
      
      Fix the false-positives by resetting pointer tags during these accesses.
      
      (The kasan_reset_tag call is removed from slab_alloc_node, as it's added
       into maybe_wipe_obj_freeptr.)
      
      Link: https://linux-review.googlesource.com/id/I50dd32838a666e173fe06c3c5c766f2c36aae901
      Link: https://lkml.kernel.org/r/093428b5d2ca8b507f4a79f92f9929b35f7fada7.1610731872.git.andreyknvl@google.com
      Fixes: aa1ef4d7 ("kasan, mm: reset tags when accessing metadata")
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ce5716c6
    • Andrey Konovalov's avatar
      kasan: fix HW_TAGS boot parameters · 76bc99e8
      Andrey Konovalov authored
      
      
      The initially proposed KASAN command line parameters are redundant.
      
      This change drops the complex "kasan.mode=off/prod/full" parameter and
      adds a simpler kill switch "kasan=off/on" instead.  The new parameter
      together with the already existing ones provides a cleaner way to
      express the same set of features.
      
      The full set of parameters with this change:
      
        kasan=off/on             - whether KASAN is enabled
        kasan.fault=report/panic - whether to only print a report or also panic
        kasan.stacktrace=off/on  - whether to collect alloc/free stack traces
      
      Default values:
      
        kasan=on
        kasan.fault=report
        kasan.stacktrace=on  (if CONFIG_DEBUG_KERNEL=y)
        kasan.stacktrace=off (otherwise)
      
      Link: https://linux-review.googlesource.com/id/Ib3694ed90b1e8ccac6cf77dfd301847af4aba7b8
      Link: https://lkml.kernel.org/r/4e9c4a4bdcadc168317deb2419144582a9be6e61.1610736745.git.andreyknvl@google.com
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      76bc99e8
    • Lecopzer Chen's avatar
      kasan: fix incorrect arguments passing in kasan_add_zero_shadow · 5dabd171
      Lecopzer Chen authored
      kasan_remove_zero_shadow() must be called with the original virtual
      start address and size, not the shadow address.
      
      Link: https://lkml.kernel.org/r/20210103063847.5963-1-lecopzer@gmail.com
      Fixes: 0207df4f ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
      Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5dabd171
    • Lecopzer Chen's avatar
      kasan: fix unaligned address is unhandled in kasan_remove_zero_shadow · a11a496e
      Lecopzer Chen authored
      While testing kasan_populate_early_shadow() and
      kasan_remove_zero_shadow(), it turned out that if the shadow start and
      end addresses in kasan_remove_zero_shadow() are not aligned to PMD_SIZE,
      the remaining unaligned PTEs won't be removed.
      
      In the test case for kasan_remove_zero_shadow():
      
          shadow_start: 0xffffffb802000000, shadow end: 0xffffffbfbe000000
      
          3-level page table:
            PUD_SIZE: 0x40000000 PMD_SIZE: 0x200000 PAGE_SIZE: 4K
      
      0xffffffbf80000000 ~ 0xffffffbfbdf80000 will not be removed because in
      kasan_remove_pud_table(), kasan_pmd_table(*pud) is true but the next
      address is 0xffffffbfbdf80000 which is not aligned to PUD_SIZE.
      
      The correct behavior here is to fall back to the next level,
      kasan_remove_pmd_table(), but the current control flow always continues
      and skips the unaligned part.
      
      Fix by correcting the condition for the case where next and addr are
      not both aligned.
      
      Link: https://lkml.kernel.org/r/20210103135621.83129-1-lecopzer@gmail.com
      Fixes: 0207df4f ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
      Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: YJ Chiang <yj.chiang@mediatek.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a11a496e
    • Linus Torvalds's avatar
      Merge tag 'irq_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e6806137
      Linus Torvalds authored
      Pull irq fixes from Borislav Petkov:
      
       - Fix a kernel panic in mips-cpu due to invalid irq domain hierarchy.
      
       - Fix to not lose IPIs on bcm2836.
      
       - Fix for a bogus marking of ITS devices as shared due to an
         uninitialized stack variable.
      
       - Clear a phantom interrupt on qcom-pdc to unblock suspend.
      
       - Small cleanups, warning and build fixes.
      
      * tag 'irq_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Export irq_check_status_bit()
        irqchip/mips-cpu: Set IPI domain parent chip
        irqchip/pruss: Simplify the TI_PRUSS_INTC Kconfig
        irqchip/loongson-liointc: Fix build warnings
        driver core: platform: Add extra error check in devm_platform_get_irqs_affinity()
        irqchip/bcm2836: Fix IPI acknowledgement after conversion to handle_percpu_devid_irq
        irqchip/irq-sl28cpld: Convert comma to semicolon
        genirq/msi: Initialize msi_alloc_info before calling msi_domain_prepare_irqs()
      e6806137
    • Linus Torvalds's avatar
      Merge tag 'objtool_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 32d43270
      Linus Torvalds authored
      Pull objtool fixes from Borislav Petkov:
      
       - Adjust objtool to handle a recent binutils change to not generate
         unused symbols anymore.
      
       - Revert the fail-the-build-on-fatal-errors objtool strategy for now
         due to the ever-increasing matrix of supported toolchains/plugins and
         them causing too many such fatal errors currently.
      
       - Do not add empty symbols to objtool's rbtree to accommodate clang
         removing section symbols.
      
      * tag 'objtool_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        objtool: Don't fail on missing symbol table
        objtool: Don't fail the kernel build on fatal errors
        objtool: Don't add empty symbols to the rbtree
      32d43270
    • Linus Torvalds's avatar
      Merge tag 'sched_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 24c56ee0
      Linus Torvalds authored
      Pull scheduler fixes from Borislav Petkov:
      
       - Correct the marking of kthreads which are supposed to run on a
         specific, single CPU vs. those which merely happen to be affine to
         only one CPU, mark per-cpu workqueue threads as such and make sure
         that marking "survives" CPU hotplug. Fix CPU hotplug issues with
         such kthreads.
      
       - A fix to not push away tasks on CPUs coming online.
      
       - Have workqueue CPU hotplug code use cpu_possible_mask when breaking
         affinity on CPU offlining so that pending workers can finish on
         newly onlined CPUs too.
      
       - Dump tasks which haven't vacated a CPU which is currently being
         unplugged.
      
       - Register a special scale invariance callback which gets called on
         resume from RAM to read out APERF/MPERF after resume and thus make
         the schedutil scaling governor more precise.
      
      * tag 'sched_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched: Relax the set_cpus_allowed_ptr() semantics
        sched: Fix CPU hotplug / tighten is_per_cpu_kthread()
        sched: Prepare to use balance_push in ttwu()
        workqueue: Restrict affinity change to rescuer
        workqueue: Tag bound workers with KTHREAD_IS_PER_CPU
        kthread: Extract KTHREAD_IS_PER_CPU
        sched: Don't run cpu-online with balance_push() enabled
        workqueue: Use cpu_possible_mask instead of cpu_active_mask to break affinity
        sched/core: Print out straggler tasks in sched_cpu_dying()
        x86: PM: Register syscore_ops for scale invariance
      24c56ee0
    • Linus Torvalds's avatar
      Merge tag 'timers_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 025929f4
      Linus Torvalds authored
      Pull timer fixes from Borislav Petkov:
      
       - Fix an integer overflow in the NTP RTC synchronization which led to
         the latter happening every 2 seconds instead of the intended every 11
         minutes.
      
       - Get rid of now unused get_seconds().
      
      * tag 'timers_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        ntp: Fix RTC synchronization on 32-bit platforms
        timekeeping: Remove unused get_seconds()
      025929f4
    • Linus Torvalds's avatar
      Merge tag 'x86_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 17b6c49d
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
      
       - Add a new Intel model number for Alder Lake
      
       - Differentiate which aspects of the FPU state get saved/restored when
         the FPU is used in-kernel and fix a boot crash on K7 due to early
         MXCSR access before CR4.OSFXSR is even set.
      
       - A couple of noinstr annotation fixes
      
       - Correct die ID setting on AMD for users of topology information which
         need the correct die ID
      
       - A SEV-ES fix to handle string port IO to/from kernel memory properly
      
      * tag 'x86_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu: Add another Alder Lake CPU to the Intel family
        x86/mmx: Use KFPU_387 for MMX string operations
        x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
        x86/topology: Make __max_die_per_package available unconditionally
        x86: __always_inline __{rd,wr}msr()
        x86/mce: Remove explicit/superfluous tracing
        locking/lockdep: Avoid noinstr warning for DEBUG_LOCKDEP
        locking/lockdep: Cure noinstr fail
        x86/sev: Fix nonistr violation
        x86/entry: Fix noinstr fail
        x86/cpu/amd: Set __max_die_per_package on AMD
        x86/sev-es: Handle string port IO to kernel memory properly
      17b6c49d
    • Linus Torvalds's avatar
      Merge tag 'powerpc-5.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 14c50a66
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Fix a bad interaction between the scv handling and the fallback L1D
         flush, which could lead to user register corruption. Only affects
         people using scv (~no one) on machines with old firmware that are
         missing the L1D flush.
      
       - Two small selftest fixes.
      
      Thanks to Eirik Fuller, Libor Pechacek, Nicholas Piggin, Sandipan Das,
      and Tulio Magno Quites Machado Filho.
      
      * tag 'powerpc-5.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/64s: fix scv entry fallback flush vs interrupt
        selftests/powerpc: Only test lwm/stmw on big endian
        selftests/powerpc: Fix exit status of pkey tests
      14c50a66
    • Linus Torvalds's avatar
      Merge tag 'for-linus-2021-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · c509ce23
      Linus Torvalds authored
      Pull misc fixes from Christian Brauner:
      
       - Jann reported sparse complaints because of a missing __user
         annotation in a helper we added way back when we added
         pidfd_send_signal() to avoid compat syscall handling. Fix it.
      
       - Yanfei replaces a reference in a comment to the _do_fork() helper I
         removed a while ago with a reference to the new kernel_clone()
         replacement.
      
       - Alexander Guril added a simple coding style fix.
      
      * tag 'for-linus-2021-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        kthread: remove comments about old _do_fork() helper
        Kernel: fork.c: Fix coding style: Do not use {} around single-line statements
        signal: Add missing __user annotation to copy_siginfo_from_user_any
      c509ce23
    • Linus Torvalds's avatar
      Merge tag '5.11-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6 · 4dcd3bcc
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "An important signal handling patch for stable, and two small cleanup
        patches"
      
      * tag '5.11-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: do not fail __smb_send_rqst if non-fatal signals are pending
        fs/cifs: Simplify bool comparison.
        fs/cifs: Assign boolean values to a bool variable
      4dcd3bcc
    • Shakeel Butt's avatar
      mm: fix numa stats for thp migration · 5c447d27
      Shakeel Butt authored
      Currently the kernel is not correctly updating the numa stats for
      NR_FILE_PAGES and NR_SHMEM on THP migration.  Fix that.
      
      For NR_FILE_DIRTY and NR_ZONE_WRITE_PENDING there is currently no need
      to handle THP migration, since the kernel does not yet have write
      support for file THP.  To be more future proof, this patch adds THP
      support for those stats as well.
      
      Link: https://lkml.kernel.org/r/20210108155813.2914586-2-shakeelb@google.com
      Fixes: e71769ae ("mm: enable thp migration for shmem thp")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5c447d27
    • Shakeel Butt's avatar
      mm: memcg: fix memcg file_dirty numa stat · 8a8792f6
      Shakeel Butt authored
      The kernel updates the per-node NR_FILE_DIRTY stats on page migration
      but not the memcg numa stats.
      
      That was not an issue until commit 5f9a4f4a ("mm: memcontrol: add the
      missing numa_stat interface for cgroup v2") recently exposed numa
      stats for the memcg.
      
      So fix the file_dirty per-memcg numa stat.
      
      Link: https://lkml.kernel.org/r/20210108155813.2914586-1-shakeelb@google.com
      Fixes: 5f9a4f4a ("mm: memcontrol: add the missing numa_stat interface for cgroup v2")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8a8792f6
    • Roman Gushchin's avatar
      mm: memcg/slab: optimize objcg stock draining · 3de7d4f2
      Roman Gushchin authored
      Imran Khan reported a 16% regression in hackbench results caused by the
      commit f2fe7b09 ("mm: memcg/slab: charge individual slab objects
      instead of pages").  The regression is noticeable in the case of
      consecutive allocations of several relatively large slab objects, e.g.
      skb's.  As soon as the amount of stocked bytes exceeds PAGE_SIZE,
      drain_obj_stock() and __memcg_kmem_uncharge() are called, and it leads
      to a number of atomic operations in page_counter_uncharge().
      
      The corresponding call graph is below (provided by Imran Khan):
      
        |__alloc_skb
        |    |
        |    |__kmalloc_reserve.isra.61
        |    |    |
        |    |    |__kmalloc_node_track_caller
        |    |    |    |
        |    |    |    |slab_pre_alloc_hook.constprop.88
        |    |    |     obj_cgroup_charge
        |    |    |    |    |
        |    |    |    |    |__memcg_kmem_charge
        |    |    |    |    |    |
        |    |    |    |    |    |page_counter_try_charge
        |    |    |    |    |
        |    |    |    |    |refill_obj_stock
        |    |    |    |    |    |
        |    |    |    |    |    |drain_obj_stock.isra.68
        |    |    |    |    |    |    |
        |    |    |    |    |    |    |__memcg_kmem_uncharge
        |    |    |    |    |    |    |    |
        |    |    |    |    |    |    |    |page_counter_uncharge
        |    |    |    |    |    |    |    |    |
        |    |    |    |    |    |    |    |    |page_counter_cancel
        |    |    |    |
        |    |    |    |
        |    |    |    |__slab_alloc
        |    |    |    |    |
        |    |    |    |    |___slab_alloc
        |    |    |    |    |
        |    |    |    |slab_post_alloc_hook
      
      Instead of directly uncharging the accounted kernel memory, it's
      possible to refill the generic page-sized per-cpu stock instead.  It's a
      much faster operation, especially on a default hierarchy.  As a bonus,
      __memcg_kmem_uncharge_page() will also get faster, so the freeing of
      page-sized kernel allocations (e.g.  large kmallocs) will become faster.
      
      A similar change has been done earlier for the socket memory by the
      commit 475d0487 ("mm: memcontrol: use per-cpu stocks for socket
      memory uncharging").
      
      Link: https://lkml.kernel.org/r/20210106042239.2860107-1-guro@fb.com
      Fixes: f2fe7b09 ("mm: memcg/slab: charge individual slab objects instead of pages")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Reported-by: Imran Khan <imran.f.khan@oracle.com>
      Tested-by: Imran Khan <imran.f.khan@oracle.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Michal Koutný <mkoutny@suse.com>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3de7d4f2