  1. Aug 19, 2021
    • pipe: avoid unnecessary EPOLLET wakeups under normal loads · 3b844826
      Linus Torvalds authored
      I had forgotten just how sensitive hackbench is to extra pipe wakeups,
      and commit 3a34b13a ("pipe: make pipe writes always wake up
      readers") ended up causing a quite noticeable regression on larger
      machines.
      
      Now, hackbench isn't necessarily a hugely meaningful benchmark, and it's
      not clear that this matters in real life all that much, but as Mel
      points out, it's used often enough when comparing kernels and so the
      performance regression shows up like a sore thumb.
      
      It's easy enough to fix at least for the common cases where pipes are
      used purely for data transfer, and you never have any exciting poll
      usage at all.  So set a special 'poll_usage' flag when there is polling
      activity, and make the ugly "EPOLLET has crazy legacy expectations"
      semantics explicit to only that case.
      
      I would love to limit it to just the broken EPOLLET case, but the pipe
      code can't see the difference between epoll and regular select/poll, so
      any non-read/write waiting will trigger the extra wakeup behavior.  That
      is sufficient for at least the hackbench case.
      
      Apart from making the odd extra wakeup cases more explicitly about
      EPOLLET, this also makes the extra wakeup be at the _end_ of the pipe
      write, not at the first write chunk.  That is actually much saner
      semantics (as much as you can call any of the legacy edge-triggered
      expectations for EPOLLET "sane") since it means that you know the wakeup
      will happen once the write is done, rather than possibly in the middle
      of one.
      
      [ For stable people: I'm putting a "Fixes" tag on this, but I leave it
        up to you to decide whether you actually want to backport it or not.
        It likely has no impact outside of synthetic benchmarks  - Linus ]
      
      Link: https://lore.kernel.org/lkml/20210802024945.GA8372@xsang-OptiPlex-9020/

      Fixes: 3a34b13a ("pipe: make pipe writes always wake up readers")
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Tested-by: Sandeep Patil <sspatil@android.com>
      Tested-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
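
      The wakeup rule described in this commit message can be sketched in
      user-space C. This is a simplified model under stated assumptions, not
      the kernel code: struct toy_pipe and the toy_* functions are hypothetical
      names, and a counter stands in for waking a reader. The point it
      illustrates is that a write issues one wakeup at its end, either because
      the pipe was empty or because the pipe has ever been polled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified pipe model; not the kernel's pipe structure. */
struct toy_pipe {
    size_t used;       /* bytes currently buffered */
    bool poll_usage;   /* set once anything has polled the pipe */
    unsigned wakeups;  /* number of reader wakeups issued so far */
};

/* Modelled poll path: remember that polling activity has been seen. */
static void toy_pipe_poll(struct toy_pipe *p)
{
    p->poll_usage = true;
}

/* Write len bytes in chunk-sized pieces and count reader wakeups. */
static void toy_pipe_write(struct toy_pipe *p, size_t len, size_t chunk)
{
    bool was_empty = (p->used == 0);
    size_t done = 0;

    if (chunk == 0)
        chunk = len;   /* avoid a zero-sized step */

    while (done < len) {
        size_t n = (len - done < chunk) ? len - done : chunk;

        p->used += n;  /* no wakeup per chunk */
        done += n;
    }

    /*
     * One wakeup at the end of the write: always when the pipe went from
     * empty to non-empty, and additionally when the pipe has poll users,
     * to satisfy the legacy edge-triggered expectations.
     */
    if (len > 0 && (was_empty || p->poll_usage))
        p->wakeups++;
}
```

      In this model a second write into a non-empty, never-polled pipe issues
      no wakeup, which is the common data-transfer case the commit optimizes.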
  2. Aug 18, 2021
    • Merge tag 'trace-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 614cb275
      Linus Torvalds authored
      Pull tracing fix from Steven Rostedt:
       "Limit the shooting in the foot of tp_printk
      
        The "tp_printk" option redirects the trace event output to printk at
        boot up. This is useful when a machine crashes before boot completes
        and the trace events cannot be retrieved from the in-kernel ring
        buffer. But it can be "dangerous" because trace events can be located
        in high frequency locations such as interrupts and the scheduler, where
        a printk can slow things down so much that it live locks the machine
        (because by the time the printk finishes, the next event is
        triggered). Thus tp_printk must be used with care.
      
        It was discovered that the filter logic to trace events does not apply
        to the tp_printk events. This can cause a surprise and live lock when
        the user expects it to be filtered to limit the amount of events
        printed to the console when in fact it still prints everything"
      
      * tag 'trace-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Apply trace filters on all output channels
  3. Aug 17, 2021
    • Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 794c7931
      Linus Torvalds authored
      Pull crypto fix from Herbert Xu:
       "This contains a fix for a potential boot failure due to a missing
        Kconfig dependency for people upgrading with the DRBG enabled"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: drbg - select SHA512
    • Merge tag 'mtd/fixes-for-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux · a2824f19
      Linus Torvalds authored
      Pull MTD fixes from Miquel Raynal:
       "MTD core fixes:
         - Fix lock hierarchy in deregister_mtd_blktrans
         - Handle flashes without OTP gracefully
         - Break circular locks in register_mtd_blktrans
      
        MTD device fixes:
         - mchp48l640:
            - Fix memory leak on cmd
            - Silence some uninitialized variable warnings
         - blkdevs:
            - Initialize rq.limits.discard_granularity
      
        CFI fixes:
         - Fix crash when erasing/writing AMD cards
      
        Raw NAND fixes:
         - Fix of_get_nand_secure_regions():
            - Add a missing check
            - Avoid an unwanted probe failure when a DT property is missing"
      
      * tag 'mtd/fixes-for-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
        mtd: rawnand: Fix probe failure due to of_get_nand_secure_regions()
        mtd: fix lock hierarchy in deregister_mtd_blktrans
        mtd: devices: mchp48l640: Fix memory leak on cmd
        mtd: cfi_cmdset_0002: fix crash when erasing/writing AMD cards
        mtd: core: handle flashes without OTP gracefully
        mtd: mchp48l640: silence some uninitialized variable warnings
        mtd: break circular locks in register_mtd_blktrans
        mtd: rawnand: Add a check in of_get_nand_secure_regions()
        mtd: mtd_blkdevs: Initialize rq.limits.discard_granularity
    • Merge tag 'trace-v5.14-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · b88bcc7d
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Fixes and clean ups to tracing:
      
         - Fix header alignment when PREEMPT_RT is enabled for osnoise tracer
      
         - Inject "stop" event to see where osnoise stopped the trace
      
         - Define DYNAMIC_FTRACE_WITH_ARGS as some code had an #ifdef for it
      
         - Fix erroneous message for bootconfig cmdline parameter
      
         - Fix crash caused by not found variable in histograms"
      
      * tag 'trace-v5.14-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event name
        init: Suppress wrong warning for bootconfig cmdline parameter
        tracing: define needed config DYNAMIC_FTRACE_WITH_ARGS
        trace/osnoise: Print a stop tracing message
        trace/timerlat: Add a header with PREEMPT_RT additional fields
        trace/osnoise: Add a header with PREEMPT_RT additional fields
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 02a37154
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "Two nested virtualization fixes for AMD processors"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: nSVM: always intercept VMLOAD/VMSAVE when nested (CVE-2021-3656)
        KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl (CVE-2021-3653)
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 94e95d58
      Linus Torvalds authored
      Pull virtio fixes from Michael Tsirkin:
       "Fixes in virtio, vhost, and vdpa drivers"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        vdpa/mlx5: Fix queue type selection logic
        vdpa/mlx5: Avoid destroying MR on empty iotlb
        tools/virtio: fix build
        virtio_ring: pull in spinlock header
        vringh: pull in spinlock header
        virtio-blk: Add validation for block size in config space
        vringh: Use wiov->used to check for read/write desc order
        virtio_vdpa: reject invalid vq indices
        vdpa: Add documentation for vdpa_alloc_device() macro
        vDPA/ifcvf: Fix return value check for vdpa_alloc_device()
        vp_vdpa: Fix return value check for vdpa_alloc_device()
        vdpa_sim: Fix return value check for vdpa_alloc_device()
        vhost: Fix the calculation in vhost_overflow()
        vhost-vdpa: Fix integer overflow in vhost_vdpa_process_iotlb_update()
        virtio_pci: Support surprise removal of virtio pci device
        virtio: Protect vqs list access
        virtio: Keep vring_del_virtqueue() mirror of VQ create
        virtio: Improve vq->broken access to avoid any compiler optimization
  4. Aug 16, 2021
    • tracing: Apply trace filters on all output channels · 6c34df6f
      Pingfan Liu authored
      The event filters are not applied to all of the output, which results in
      a flood of printk messages when using tp_printk. Unfold
      event_trigger_unlock_commit_regs() into trace_event_buffer_commit() so
      that the filters are applied to every output.
      
      Link: https://lkml.kernel.org/r/20210814034538.8428-1-kernelfans@gmail.com

      Cc: stable@vger.kernel.org
      Fixes: 0daa2302 ("tracing: Add tp_printk cmdline to have tracepoints go to printk()")
      Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
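
      The idea of filtering before any output channel is reached can be
      sketched as follows. This is a minimal user-space illustration, not the
      tracing code itself: the toy_* types and functions are hypothetical, and
      counters stand in for the ring buffer and the console.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event and filter types, for illustration only. */
struct toy_event {
    int pid;
};

typedef bool (*toy_filter_fn)(const struct toy_event *ev);

struct toy_sinks {
    bool tp_printk;         /* also mirror events to the console */
    unsigned ring_count;    /* events stored in the ring buffer */
    unsigned printk_count;  /* events printed to the console */
};

/*
 * Commit an event. The filter runs once, up front, so that every output
 * channel sees the same keep/discard decision.
 */
static void toy_commit(struct toy_sinks *s, const struct toy_event *ev,
                       toy_filter_fn filter)
{
    if (filter && !filter(ev))
        return;             /* discarded for all channels */

    s->ring_count++;
    if (s->tp_printk)
        s->printk_count++;
}

/* Example filter: keep only events from pid 1. */
static bool toy_only_pid_1(const struct toy_event *ev)
{
    return ev->pid == 1;
}
```

      With this ordering, an event rejected by the filter never reaches the
      console path, so a restrictive filter also limits the printk volume.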
    • KVM: nSVM: always intercept VMLOAD/VMSAVE when nested (CVE-2021-3656) · c7dfa400
      Maxim Levitsky authored

      If L1 disables VMLOAD/VMSAVE intercepts, and doesn't enable
      Virtual VMLOAD/VMSAVE (currently not supported for the nested hypervisor),
      then VMLOAD/VMSAVE must operate on the L1 physical memory, which is only
      possible by making L0 intercept these instructions.
      
      Failure to do so allowed the nested guest to run VMLOAD/VMSAVE unintercepted,
      and thus read/write portions of the host physical memory.
      
      Fixes: 89c8a498 ("KVM: SVM: Enable Virtual VMLOAD VMSAVE feature")
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
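
      The rule "always intercept when nested" amounts to forcing two intercept
      bits regardless of what L1 asked for. The sketch below shows only that
      bit logic; the TOY_* bit positions and the function name are
      hypothetical and do not match the real SVM intercept encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; the real SVM intercept layout differs. */
#define TOY_INTERCEPT_VMLOAD (1u << 0)
#define TOY_INTERCEPT_VMSAVE (1u << 1)

/*
 * Compute the intercepts L0 programs while running L2, starting from the
 * set L1 requested. When nested, VMLOAD/VMSAVE are forced on so that L0
 * can emulate them against L1 physical memory.
 */
static uint32_t toy_nested_intercepts(uint32_t l1_requested, bool nested)
{
    uint32_t effective = l1_requested;

    if (nested)
        effective |= TOY_INTERCEPT_VMLOAD | TOY_INTERCEPT_VMSAVE;

    return effective;
}
```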
    • KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl (CVE-2021-3653) · 0f923e07
      Maxim Levitsky authored

      * Invert the mask of bits that we pick from L2 in
        nested_vmcb02_prepare_control
      
      * Invert and explicitly use VIRQ related bits bitmask in svm_clear_vintr
      
      This fixes a security issue that allowed a malicious L1 to run L2 with
      AVIC enabled, which allowed the L2 to exploit the uninitialized and enabled
      AVIC to read/write the host physical memory at some offsets.
      
      Fixes: 3d6368ef ("KVM: SVM: Add VMRUN handler")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
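
      Inverting a mask in this sense means switching from "copy everything
      except a few host-owned bits" to "copy only an explicit list of
      guest-controllable bits". The following sketch shows the allow-list
      form; the TOY_* bit layout and function name are hypothetical and are
      not the real int_ctl encoding.

```c
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define TOY_V_IRQ        (1u << 0)
#define TOY_V_INTR_MASK  (1u << 1)
#define TOY_AVIC_ENABLE  (1u << 31)

/* Allow-list: the only bits the guest-supplied value may control. */
#define TOY_GUEST_BITS   (TOY_V_IRQ | TOY_V_INTR_MASK)

/*
 * Merge host and guest-supplied control bits. Anything outside the
 * allow-list, such as the AVIC enable bit, is taken from the host value
 * and can never be switched on by the guest.
 */
static uint32_t toy_merge_int_ctl(uint32_t host_bits, uint32_t guest_bits)
{
    return (host_bits & ~TOY_GUEST_BITS) | (guest_bits & TOY_GUEST_BITS);
}
```

      With a deny-list, a bit that nobody thought to exclude leaks through;
      with an allow-list, an unknown bit is dropped by default.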
    • Linux 5.14-rc6 · 7c60610d
      Linus Torvalds authored
    • Merge tag 'powerpc-5.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · ecf93431
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Fix crashes coming out of nap on 32-bit Book3s (eg. powerbooks).
      
       - Fix critical and debug interrupts on BookE, seen as crashes when
         using ptrace.
      
       - Fix an oops when running an SMP kernel on a UP system.
      
       - Update pseries LPAR security flavor after partition migration.
      
       - Fix an oops when using kprobes on BookE.
      
       - Fix oops on 32-bit pmac by not calling do_IRQ() from
         timer_interrupt().
      
       - Fix softlockups on CPU hotplug into a CPU-less node with xive (P9).
      
      Thanks to Cédric Le Goater, Christophe Leroy, Finn Thain, Geetika
      Moolchandani, Laurent Dufour, Laurent Vivier, Nicholas Piggin, Pu Lehui,
      Radu Rendec, Srikar Dronamraju, and Stan Johnson.
      
      * tag 'powerpc-5.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/xive: Do not skip CPU-less nodes when creating the IPIs
        powerpc/interrupt: Do not call single_step_exception() from other exceptions
        powerpc/interrupt: Fix OOPS by not calling do_IRQ() from timer_interrupt()
        powerpc/kprobes: Fix kprobe Oops happens in booke
        powerpc/pseries: Fix update of LPAR security flavor after LPM
        powerpc/smp: Fix OOPS in topology_init()
        powerpc/32: Fix critical and debug interrupts on BOOKE
        powerpc/32s: Fix napping restore in data storage interrupt (DSI)
    • Merge tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c4f14eac
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "A set of fixes for PCI/MSI and x86 interrupt startup:
      
         - Mask all MSI-X entries when enabling MSI-X otherwise stale unmasked
           entries stay around e.g. when a crashkernel is booted.
      
         - Enforce masking of a MSI-X table entry when updating it, which is
           mandatory according to the specification
      
         - Ensure that writes to MSI[-X] tables are flushed.
      
         - Prevent invalid bits being set in the MSI mask register
      
         - Properly serialize modifications to the mask cache and the mask
           register for multi-MSI.
      
         - Cure the violation of the affinity setting rules on X86 during
           interrupt startup which can cause lost and stale interrupts. Move
           the initial affinity setting ahead of actually enabling the
           interrupt.
      
         - Ensure that MSI interrupts are completely torn down before freeing
           them in the error handling case.
      
         - Prevent an array out of bounds access in the irq timings code"
      
      * tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        driver core: Add missing kernel doc for device::msi_lock
        genirq/msi: Ensure deactivation on teardown
        genirq/timings: Prevent potential array overflow in __irq_timings_store()
        x86/msi: Force affinity setup before startup
        x86/ioapic: Force affinity setup before startup
        genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP
        PCI/MSI: Protect msi_desc::masked for multi-MSI
        PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown()
        PCI/MSI: Correct misleading comments
        PCI/MSI: Do not set invalid bits in MSI mask
        PCI/MSI: Enforce MSI[X] entry updates to be visible
        PCI/MSI: Enforce that MSI-X table entry is masked for update
        PCI/MSI: Mask all unused MSI-X entries
        PCI/MSI: Enable and mask MSI-X early
    • Merge tag 'locking_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 839da253
      Linus Torvalds authored
      Pull locking fix from Borislav Petkov:
      
       - Fix a CONFIG symbol's spelling
      
      * tag 'locking_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/rtmutex: Use the correct rtmutex debugging config option
    • Merge tag 'efi_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 12aef8ac
      Linus Torvalds authored
      Pull EFI fixes from Borislav Petkov:
       "A batch of fixes for the arm64 stub image loader:
      
         - fix a logic bug that can make the random page allocator fail
           spuriously
      
         - force reallocation of the Image when it overlaps with firmware
           reserved memory regions
      
         - fix an oversight that defeated an optimization introduced earlier
           where images loaded at a suitable offset are never moved if booting
           without randomization
      
         - complain about images that were not loaded at the right offset by
           the firmware image loader"
      
      * tag 'efi_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/libstub: arm64: Double check image alignment at entry
        efi/libstub: arm64: Warn when efi_random_alloc() fails
        efi/libstub: arm64: Relax 2M alignment again for relocatable kernels
        efi/libstub: arm64: Force Image reallocation if BSS was not reserved
        arm64: efi: kaslr: Fix occasional random alloc (and boot) failure
    • Merge tag 'x86_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b045b8cc
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
       "Two fixes:
      
         - An objdump checker fix to ignore parenthesized strings in the
           objdump version
      
         - Fix resctrl default monitoring groups reporting when new subgroups
           get created"
      
      * tag 'x86_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/resctrl: Fix default monitoring groups reporting
        x86/tools: Fix objdump version check again
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 3e763ec7
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "ARM:
      
         - Plug race between enabling MTE and creating vcpus
      
         - Fix off-by-one bug when checking whether an address range is RAM
      
        x86:
      
         - Fixes for the new MMU, especially a memory leak on hosts with <39
           physical address bits
      
         - Remove bogus EFER.NX checks on 32-bit non-PAE hosts
      
         - WAITPKG fix"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock
        KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEs
        KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs
        KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PF
        kvm: vmx: Sync all matching EPTPs when injecting nested EPT fault
        KVM: x86: remove dead initialization
        KVM: x86: Allow guest to set EFER.NX=1 on non-PAE 32-bit kernels
        KVM: VMX: Use current VMCS to query WAITPKG support for MSR emulation
        KVM: arm64: Fix race when enabling KVM_ARM_CAP_MTE
        KVM: arm64: Fix off-by-one in range_is_memory
  5. Aug 15, 2021
  6. Aug 14, 2021
    • Merge branch 'akpm' (patches from Andrew) · dfa377c3
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "7 patches.
      
        Subsystems affected by this patch series: mm (kasan, mm/slub,
        mm/madvise, and memcg), and lib"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        lib: use PFN_PHYS() in devmem_is_allowed()
        mm/memcg: fix incorrect flushing of lruvec data in obj_stock
        mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE)
        mm: slub: fix slub_debug disabling for list of slabs
        slub: fix kmalloc_pagealloc_invalid_free unit test
        kasan, slub: reset tag when printing address
        kasan, kmemleak: reset tags when scanning block
    • Merge tag '5.14-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 27b2eaa1
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Four CIFS/SMB3 Fixes, all for stable, two relating to deferred close,
        and one for the 'modefromsid' mount option (when 'idsfromsid' not
        specified)"
      
      * tag '5.14-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Call close synchronously during unlink/rename/lease break.
        cifs: Handle race conditions during rename
        cifs: use the correct max-length for dentry_path_raw()
        cifs: create sd context must be a multiple of 8
    • Merge tag 'linux-kselftest-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest · a83ed225
      Linus Torvalds authored
      Pull Kselftest fix from Shuah Khan:
       "A single patch to sgx test to fix Q1 and Q2 calculation"
      
      * tag 'linux-kselftest-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests/sgx: Fix Q1 and Q2 calculation in sigstruct.c
    • lib: use PFN_PHYS() in devmem_is_allowed() · 854f3264
      Liang Wang authored
      The physical address may exceed 32 bits on 32-bit systems with more than
      32 bits of physical address.  Use PFN_PHYS() in devmem_is_allowed(), or
      the physical address may overflow and be truncated.
      
      We found this bug with the devmem tool: when CONFIG_STRICT_DEVMEM is
      enabled on ARM with ARM_LPAE and devmem is used to map a high address
      that is not in the iomem address range, an unexpected error indicating
      no permission is returned.
      
      This bug was initially introduced from v2.6.37, and the function was
      moved to lib in v5.11.
      
      Link: https://lkml.kernel.org/r/20210731025057.78825-1-wangliang101@huawei.com

      Fixes: 087aaffc ("ARM: implement CONFIG_STRICT_DEVMEM by disabling access to RAM via /dev/mem")
      Fixes: 527701ed ("lib: Add a generic version of devmem_is_allowed()")
      Signed-off-by: Liang Wang <wangliang101@huawei.com>
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Liang Wang <wangliang101@huawei.com>
      Cc: Xiaoming Ni <nixiaoming@huawei.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: <stable@vger.kernel.org>	[2.6.37+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
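
      The truncation described in this commit can be reproduced in user space.
      The sketch below assumes a 4 KiB page size (TOY_PAGE_SHIFT of 12) and
      models a 32-bit unsigned long with uint32_t; the toy_* names are
      hypothetical. The widening variant mirrors what PFN_PHYS() does, which
      is to cast to the physical address type before shifting.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/*
 * Shift a page frame number in 32-bit arithmetic, as happens when the
 * PFN is a 32-bit unsigned long: bits above bit 31 are lost.
 */
static uint32_t toy_pfn_shift_narrow(uint32_t pfn)
{
    return pfn << TOY_PAGE_SHIFT;
}

/* Widen to 64 bits first, then shift: the full address survives. */
static uint64_t toy_pfn_phys(uint32_t pfn)
{
    return (uint64_t)pfn << TOY_PAGE_SHIFT;
}
```

      For the page frame number 0x100000, which corresponds to the 4 GiB
      boundary, the narrow shift yields 0 while the widened shift yields
      0x100000000.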
    • mm/memcg: fix incorrect flushing of lruvec data in obj_stock · 7fa0dacb
      Waiman Long authored
      When mod_objcg_state() is called with a pgdat that is different from
      that in the obj_stock, the old lruvec data cached in obj_stock are
      flushed out.  Unfortunately, they were flushed to the new pgdat and so
      the data go to the wrong node.  This will screw up the slab data
      reported in /sys/devices/system/node/node*/meminfo.
      
      Fix that by flushing the data to the cached pgdat instead.
      
      Link: https://lkml.kernel.org/r/20210802143834.30578-1-longman@redhat.com

      Fixes: 68ac5b3c ("mm/memcg: cache vmstat data in percpu memcg_stock_pcp")
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Roman Gushchin <guro@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Chris Down <chris@chrisdown.name>
      Cc: Yafang Shao <laoar.shao@gmail.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Waiman Long <longman@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
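
      The bug class here is a small write-back cache that flushes its pending
      value to the wrong owner. A minimal sketch, assuming a single cached
      node and a single counter per node; struct toy_stock, toy_node_stat and
      toy_mod_state are hypothetical names, not the memcg data structures.

```c
#define TOY_NODES 2

/* Hypothetical cache of a pending statistics delta for one node. */
struct toy_stock {
    int cached_node;   /* node the pending delta belongs to, -1 if none */
    long pending;      /* delta not yet flushed to the node counter */
};

/* Per-node counters that user space would eventually read. */
static long toy_node_stat[TOY_NODES];

/* Account delta for node, flushing the cache when the node changes. */
static void toy_mod_state(struct toy_stock *st, int node, long delta)
{
    if (st->cached_node != node) {
        /*
         * Flush the old delta to the node it was cached for. Flushing
         * it to the new node here would be the bug described above.
         */
        if (st->cached_node >= 0)
            toy_node_stat[st->cached_node] += st->pending;

        st->cached_node = node;
        st->pending = 0;
    }

    st->pending += delta;
}
```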
    • mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE) · eb2faa51
      David Hildenbrand authored
      Doing some extended tests and polishing the man page update for
      MADV_POPULATE_(READ|WRITE), I realized that we end up converting also
      SIGBUS (via -EFAULT) to -EINVAL, making it look like yet another
      madvise() user error.
      
      We want to report as -EINVAL only problematic mappings and permission
      problems that the user could have known about.
      
      Let's not convert -EFAULT arising due to SIGBUS (or SIGSEGV) to -EINVAL,
      but instead indicate -EFAULT to user space.  While we could also convert
      it to -ENOMEM, using -EFAULT looks more helpful when user space might
      want to troubleshoot what's going wrong: MADV_POPULATE_(READ|WRITE) is
      not part of a final Linux release and we can still adjust the behavior.
      
      Link: https://lkml.kernel.org/r/20210726154932.102880-1-david@redhat.com

      Fixes: 4ca9b385 ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Cc: Ram Pai <linuxram@us.ibm.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
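
      The resulting error policy can be written down as a small mapping. This
      is a sketch of the policy as stated in the message, not of the madvise
      code: the enum and the function are hypothetical, and only the two
      error classes the message discusses are modelled.

```c
#include <errno.h>

/* Hypothetical classification of why populating a range failed. */
enum toy_populate_failure {
    TOY_POPULATE_OK,
    TOY_POPULATE_BAD_MAPPING,    /* mapping that cannot be populated */
    TOY_POPULATE_NO_PERMISSION,  /* missing access permission */
    TOY_POPULATE_SIGBUS,         /* fault that would raise SIGBUS */
    TOY_POPULATE_SIGSEGV,        /* fault that would raise SIGSEGV */
};

/* Translate a failure class into the value reported to user space. */
static int toy_populate_errno(enum toy_populate_failure f)
{
    switch (f) {
    case TOY_POPULATE_OK:
        return 0;
    case TOY_POPULATE_BAD_MAPPING:
    case TOY_POPULATE_NO_PERMISSION:
        return -EINVAL;   /* the caller could have known */
    case TOY_POPULATE_SIGBUS:
    case TOY_POPULATE_SIGSEGV:
        return -EFAULT;   /* no longer folded into -EINVAL */
    }

    return -EINVAL;       /* unknown class: treat as a usage error */
}
```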
    • mm: slub: fix slub_debug disabling for list of slabs · a7f1d485
      Vlastimil Babka authored
      Vijayanand Jitta reports:
      
        Consider the scenario where CONFIG_SLUB_DEBUG_ON is set and we would
        want to disable slub_debug for a few slabs. Using the boot parameter
        with slub_debug=-,slab_name syntax doesn't work as expected, i.e. it
        does not disable debugging only for the specified list of slabs.
        Instead it disables debugging for all slabs, which is wrong.
      
      This patch fixes it by delaying the moment when the global slub_debug
      flags variable is updated.  In case a "slub_debug=-,slab_name" has been
      passed, the global flags remain as initialized (depending on
      CONFIG_SLUB_DEBUG_ON enabled or disabled) and are not simply reset to 0.
      
      Link: https://lkml.kernel.org/r/8a3d992a-473a-467b-28a0-4ad2ff60ab82@suse.cz

      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Vijayanand Jitta <vjitta@codeaurora.org>
      Reviewed-by: Vijayanand Jitta <vjitta@codeaurora.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vinayak Menon <vinmenon@codeaurora.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
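
      The intended behavior, that a per-slab option block must leave the
      global flags alone, can be sketched with a toy parser. It handles only a
      single "flags[,slab,...]" block and a single flag value; the function
      name and TOY_DEBUG_DEFAULT are hypothetical, and the real option syntax
      is richer than this.

```c
#include <string.h>

#define TOY_DEBUG_DEFAULT 0x1u  /* hypothetical "debugging enabled" value */

/*
 * Parse one "flags[,slab,...]" block and return the new global flags.
 * A leading '-' means "no debug flags". If the block names specific
 * slabs, the flags apply only to them and the global flags are returned
 * unchanged instead of being reset.
 */
static unsigned toy_parse_slub_debug(const char *arg, unsigned global_flags)
{
    unsigned flags = (arg[0] == '-') ? 0u : TOY_DEBUG_DEFAULT;

    if (strchr(arg, ',') != NULL)
        return global_flags;   /* per-slab block: global flags untouched */

    return flags;              /* no slab list: flags apply globally */
}
```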
    • slub: fix kmalloc_pagealloc_invalid_free unit test · 1ed7ce57
      Shakeel Butt authored
      The unit test kmalloc_pagealloc_invalid_free makes sure that for the
      higher order slub allocation which goes to page allocator, the free is
      called with the correct address i.e.  the virtual address of the head
      page.
      
      Commit f227f0fa ("slub: fix unreclaimable slab stat for bulk free")
      unified the free code paths for page allocator based slub allocations
      but instead of using the address passed by the caller, it extracted the
      address from the page, which made the unit test
      kmalloc_pagealloc_invalid_free moot.  Fix this by using the address
      passed by the caller.
      
      Should we fix this? I think yes, because developers expect KASAN to catch
      this type of programming bug.
      
      Link: https://lkml.kernel.org/r/20210802180819.1110165-1-shakeelb@google.com

      Fixes: f227f0fa ("slub: fix unreclaimable slab stat for bulk free")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Reported-by: Nathan Chancellor <nathan@kernel.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Acked-by: Roman Gushchin <guro@fb.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kasan, slub: reset tag when printing address · 340caf17
      Kuan-Ying Lee authored
      The address still includes the tags when it is printed.  With hardware
      tag-based kasan enabled, we will get a false positive KASAN issue when
      we access metadata.
      
      Reset the tag before we access the metadata.
      
      Link: https://lkml.kernel.org/r/20210804090957.12393-3-Kuan-Ying.Lee@mediatek.com

      Fixes: aa1ef4d7 ("kasan, mm: reset tags when accessing metadata")
      Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chinwen Chang <chinwen.chang@mediatek.com>
      Cc: Nicholas Tang <nicholas.tang@mediatek.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      340caf17
    • Kuan-Ying Lee's avatar
      kasan, kmemleak: reset tags when scanning block · 6c7a00b8
      Kuan-Ying Lee authored
      Patch series "kasan, slub: reset tag when printing address", v3.
      
      With hardware tag-based KASAN enabled, we reset the tag when we access
      metadata to avoid false alarms.
      
      This patch (of 2):
      
      Kmemleak needs to scan kernel memory to check for memory leaks.  With
      hardware tag-based KASAN enabled, when it scans an invalid slab and
      dereferences it, the issue shown below occurs.
      
      Because hardware tag-based KASAN doesn't use compiler instrumentation,
      we cannot use kasan_disable_current() to ignore tag checks.
      
      Based on the below report, there are 11 0xf7 granules, which amounts to
      176 bytes, and the object is allocated from the kmalloc-256 cache.  So
      when kmemleak accesses the last 256-176 bytes, it causes faults, as those
      are marked with KASAN_KMALLOC_REDZONE == KASAN_TAG_INVALID == 0xfe.
      
      Thus, we reset tags before accessing metadata to avoid false positives.
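
      The granule arithmetic above can be checked with a short sketch. It
      assumes a 16-byte tag granule (inferred from 176 bytes / 11 granules)
      and copies the sixteen tag values from the report row that starts at
      ffff0000c0074e00; the variable names are made up for illustration.

```python
# Illustrative sketch only: re-derives the numbers quoted in the commit message.
GRANULE_SIZE = 16   # assumption: bytes covered by one memory tag
OBJECT_SIZE = 256   # kmalloc-256 cache object size, from the report

# Tag row for the object at ffff0000c0074e00, copied from the KASAN report.
row = "f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 fe fe fe fe fe".split()

valid_granules = sum(1 for tag in row if tag == "f7")
redzone_granules = sum(1 for tag in row if tag == "fe")

valid_bytes = valid_granules * GRANULE_SIZE
redzone_bytes = redzone_granules * GRANULE_SIZE

# The faulting access is 176 bytes into the object; find the tag of its granule.
fault_offset = 176
fault_tag = row[fault_offset // GRANULE_SIZE]

print(valid_granules, valid_bytes)   # granules and bytes tagged 0xf7
print(OBJECT_SIZE - valid_bytes)     # bytes left in the redzone
print(fault_tag)                     # memory tag at the faulting offset
```

      The access at offset 176 falls in the first granule tagged fe, which is
      why the report shows pointer tag f7 against memory tag fe.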
      
        BUG: KASAN: out-of-bounds in scan_block+0x58/0x170
        Read at addr f7ff0000c0074eb0 by task kmemleak/138
        Pointer tag: [f7], memory tag: [fe]
      
        CPU: 7 PID: 138 Comm: kmemleak Not tainted 5.14.0-rc2-00001-g8cae8cd89f05-dirty #134
        Hardware name: linux,dummy-virt (DT)
        Call trace:
         dump_backtrace+0x0/0x1b0
         show_stack+0x1c/0x30
         dump_stack_lvl+0x68/0x84
         print_address_description+0x7c/0x2b4
         kasan_report+0x138/0x38c
         __do_kernel_fault+0x190/0x1c4
         do_tag_check_fault+0x78/0x90
         do_mem_abort+0x44/0xb4
         el1_abort+0x40/0x60
         el1h_64_sync_handler+0xb4/0xd0
         el1h_64_sync+0x78/0x7c
         scan_block+0x58/0x170
         scan_gray_list+0xdc/0x1a0
         kmemleak_scan+0x2ac/0x560
         kmemleak_scan_thread+0xb0/0xe0
         kthread+0x154/0x160
         ret_from_fork+0x10/0x18
      
        Allocated by task 0:
         kasan_save_stack+0x2c/0x60
         __kasan_kmalloc+0xec/0x104
         __kmalloc+0x224/0x3c4
         __register_sysctl_paths+0x200/0x290
         register_sysctl_table+0x2c/0x40
         sysctl_init+0x20/0x34
         proc_sys_init+0x3c/0x48
         proc_root_init+0x80/0x9c
         start_kernel+0x648/0x6a4
         __primary_switched+0xc0/0xc8
      
        Freed by task 0:
         kasan_save_stack+0x2c/0x60
         kasan_set_track+0x2c/0x40
         kasan_set_free_info+0x44/0x54
         ____kasan_slab_free.constprop.0+0x150/0x1b0
         __kasan_slab_free+0x14/0x20
         slab_free_freelist_hook+0xa4/0x1fc
         kfree+0x1e8/0x30c
         put_fs_context+0x124/0x220
         vfs_kern_mount.part.0+0x60/0xd4
         kern_mount+0x24/0x4c
         bdev_cache_init+0x70/0x9c
         vfs_caches_init+0xdc/0xf4
         start_kernel+0x638/0x6a4
         __primary_switched+0xc0/0xc8
      
        The buggy address belongs to the object at ffff0000c0074e00
         which belongs to the cache kmalloc-256 of size 256
        The buggy address is located 176 bytes inside of
         256-byte region [ffff0000c0074e00, ffff0000c0074f00)
        The buggy address belongs to the page:
        page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x100074
        head:(____ptrval____) order:2 compound_mapcount:0 compound_pincount:0
        flags: 0xbfffc0000010200(slab|head|node=0|zone=2|lastcpupid=0xffff|kasantag=0x0)
        raw: 0bfffc0000010200 0000000000000000 dead000000000122 f5ff0000c0002300
        raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
        page dumped because: kasan: bad access detected
      
        Memory state around the buggy address:
         ffff0000c0074c00: f0 f0 f0 f0 f0 f0 f0 f0 f0 fe fe fe fe fe fe fe
         ffff0000c0074d00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
        >ffff0000c0074e00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 fe fe fe fe fe
                                                            ^
         ffff0000c0074f00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
         ffff0000c0075000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ==================================================================
        Disabling lock debugging due to kernel taint
        kmemleak: 181 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      
      Link: https://lkml.kernel.org/r/20210804090957.12393-1-Kuan-Ying.Lee@mediatek.com
      Link: https://lkml.kernel.org/r/20210804090957.12393-2-Kuan-Ying.Lee@mediatek.com
      
      
      Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Nicholas Tang <nicholas.tang@mediatek.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Chinwen Chang <chinwen.chang@mediatek.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6c7a00b8
    • Linus Torvalds's avatar
      Merge tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block · 020efdad
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A few fixes for block that should go into 5.14:
      
         - Revert the mq-deadline cgroup addition. More work is needed on this
           front, let's revert it for now and get it right before having it in
           a released kernel (Tejun)
      
         - blk-iocost lockdep fix (Ming)
      
         - nbd double completion fix (Xie)
      
         - Fix for non-idling when clearing the shared tag flag (Yu)"
      
      * tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block:
        nbd: Aovid double completion of a request
        blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED
        Revert "block/mq-deadline: Add cgroup support"
        blk-iocost: fix lockdep warning on blkcg->lock
      020efdad
    • Linus Torvalds's avatar
      Merge tag 'io_uring-5.14-2021-08-13' of git://git.kernel.dk/linux-block · 42995cee
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "A bit bigger than the previous weeks, but mostly just a few stable
        bound fixes. In detail:
      
         - Followup fixes to patches from last week for io-wq, turns out they
           weren't complete (Hao)
      
         - Two lockdep reported fixes out of the RT camp (me)
      
         - Sync the io_uring-cp example with liburing, as a few bug fixes
           never made it to the kernel carried version (me)
      
         - SQPOLL related TIF_NOTIFY_SIGNAL fix (Nadav)
      
         - Use WRITE_ONCE() when writing sq flags (Nadav)
      
         - io_rsrc_put_work() deadlock fix (Pavel)"
      
      * tag 'io_uring-5.14-2021-08-13' of git://git.kernel.dk/linux-block:
        tools/io_uring/io_uring-cp: sync with liburing example
        io_uring: fix ctx-exit io_rsrc_put_work() deadlock
        io_uring: drop ctx->uring_lock before flushing work item
        io-wq: fix IO_WORKER_F_FIXED issue in create_io_worker()
        io-wq: fix bug of creating io-wokers unconditionally
        io_uring: rsrc ref lock needs to be IRQ safe
        io_uring: Use WRITE_ONCE() when writing to sq_flags
        io_uring: clear TIF_NOTIFY_SIGNAL when running task work
      42995cee
    • Linus Torvalds's avatar
      Merge tag 'pinctrl-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 462938cd
      Linus Torvalds authored
      Pull pin control fixes from Linus Walleij:
       "An assortment of pin control fixes of varying importance. The most
        important ones, affecting Intel and AMD laptops, turned up in the
        last few days, so it's time to push this to your tree.
      
         - Fix the Kconfig dependency for Qualcomm SM8350 pin controller
      
         - Fix pin biasing fallback behaviour on the Mediatek pin controller
      
         - Fix the GPIO numbering scheme for Intel Tiger Lake-H to correspond
           to the products that are now actually out on the market
      
         - Fix an out-of-bounds access bug in the pin control function
           itemization in the Sunxi driver
      
         - Fix clock disabling for the RISC-V K210 pin controller on the
           error path
      
         - Fix a system shutdown bug affecting AMD Ryzen-based laptops: the
           system would not suspend but just bounce back up"
      
      * tag 'pinctrl-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: amd: Fix an issue with shutdown when system set to s0ix
        pinctrl: k210: Fix k210_fpioa_probe()
        pinctrl: sunxi: Don't underestimate number of functions
        pinctrl: tigerlake: Fix GPIO mapping for newer version of software
        pinctrl: mediatek: Fix fallback behavior for bias_set_combo
        pinctrl: qcom: fix GPIOLIB dependencies
      462938cd
  7. Aug 13, 2021