Skip to content
  1. Sep 12, 2020
  2. Sep 04, 2020
  3. Aug 22, 2020
    • Will Deacon's avatar
      KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set · b5331379
      Will Deacon authored
      When an MMU notifier call results in unmapping a range that spans multiple
      PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary,
      since this avoids running into RCU stalls during VM teardown. Unfortunately,
      if the VM is destroyed as a result of OOM, then blocking is not permitted
      and the call to the scheduler triggers the following BUG():
      
       | BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394
       | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
       | INFO: lockdep is turned off.
       | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
       | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
       | Call trace:
       |  dump_backtrace+0x0/0x284
       |  show_stack+0x1c/0x28
       |  dump_stack+0xf0/0x1a4
       |  ___might_sleep+0x2bc/0x2cc
       |  unmap_stage2_range+0x160/0x1ac
       |  kvm_unmap_hva_range+0x1a0/0x1c8
       |  kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
       |  __mmu_notifier_invalidate_range_start+0x218/0x31c
       |  mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
       |  __oom_reap_task_mm+0x128/0x268
       |  oom_reap_task+0xac/0x298
       |  oom_reaper+0x178/0x17c
       |  kthread+0x1e4/0x1fc
       |  ret_from_fork+0x10/0x30
      
      Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we
      only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier
      flags.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 8b3405e3
      
       ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      Message-Id: <20200811102725.7121-3-will@kernel.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b5331379
    • Will Deacon's avatar
      KVM: Pass MMU notifier range flags to kvm_unmap_hva_range() · fdfe7cbd
      Will Deacon authored
      
      
      The 'flags' field of 'struct mmu_notifier_range' is used to indicate
      whether invalidate_range_{start,end}() are permitted to block. In the
      case of kvm_mmu_notifier_invalidate_range_start(), this field is not
      forwarded on to the architecture-specific implementation of
      kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
      whether or not to block.
      
      Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
      architectures are aware as to whether or not they are permitted to block.
      
      Cc: <stable@vger.kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      Message-Id: <20200811102725.7121-2-will@kernel.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fdfe7cbd
  4. Aug 21, 2020
  5. Aug 18, 2020
    • Jim Mattson's avatar
      kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode · cb957adb
      Jim Mattson authored
      See the SDM, volume 3, section 4.4.1:
      
      If PAE paging would be in use following an execution of MOV to CR0 or
      MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of
      CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, CR4.PSE, or CR4.SMEP; then
      the PDPTEs are loaded from the address in CR3.
      
      Fixes: b9baba86
      
       ("KVM, pkeys: expose CPUID/CR4 to guest")
      Cc: Huaitong Han <huaitong.han@intel.com>
      Signed-off-by: default avatarJim Mattson <jmattson@google.com>
      Reviewed-by: default avatarPeter Shier <pshier@google.com>
      Reviewed-by: default avatarOliver Upton <oupton@google.com>
      Message-Id: <20200817181655.3716509-1-jmattson@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      cb957adb
    • Jim Mattson's avatar
      kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode · 427890af
      Jim Mattson authored
      See the SDM, volume 3, section 4.4.1:
      
      If PAE paging would be in use following an execution of MOV to CR0 or
      MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of
      CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, CR4.PSE, or CR4.SMEP; then
      the PDPTEs are loaded from the address in CR3.
      
      Fixes: 0be0226f
      
       ("KVM: MMU: fix SMAP virtualization")
      Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: default avatarJim Mattson <jmattson@google.com>
      Reviewed-by: default avatarPeter Shier <pshier@google.com>
      Reviewed-by: default avatarOliver Upton <oupton@google.com>
      Message-Id: <20200817181655.3716509-2-jmattson@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      427890af
    • Paolo Bonzini's avatar
      KVM: x86: fix access code passed to gva_to_gpa · 19cf4b7e
      Paolo Bonzini authored
      
      
      The PK bit of the error code is computed dynamically in permission_fault
      and therefore need not be passed to gva_to_gpa: only the access bits
      (fetch, user, write) need to be passed down.
      
      Not doing so causes a splat in the pku test:
      
         WARNING: CPU: 25 PID: 5465 at arch/x86/kvm/mmu.h:197 paging64_walk_addr_generic+0x594/0x750 [kvm]
         Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0014.D62.2001092233 01/09/2020
         RIP: 0010:paging64_walk_addr_generic+0x594/0x750 [kvm]
         Code: <0f> 0b e9 db fe ff ff 44 8b 43 04 4c 89 6c 24 30 8b 13 41 39 d0 89
         RSP: 0018:ff53778fc623fb60 EFLAGS: 00010202
         RAX: 0000000000000001 RBX: ff53778fc623fbf0 RCX: 0000000000000007
         RDX: 0000000000000001 RSI: 0000000000000002 RDI: ff4501efba818000
         RBP: 0000000000000020 R08: 0000000000000005 R09: 00000000004000e7
         R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000007
         R13: ff4501efba818388 R14: 10000000004000e7 R15: 0000000000000000
         FS:  00007f2dcf31a700(0000) GS:ff4501f1c8040000(0000) knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         CR2: 0000000000000000 CR3: 0000001dea475005 CR4: 0000000000763ee0
         DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
         DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
         PKRU: 55555554
         Call Trace:
          paging64_gva_to_gpa+0x3f/0xb0 [kvm]
          kvm_fixup_and_inject_pf_error+0x48/0xa0 [kvm]
          handle_exception_nmi+0x4fc/0x5b0 [kvm_intel]
          kvm_arch_vcpu_ioctl_run+0x911/0x1c10 [kvm]
          kvm_vcpu_ioctl+0x23e/0x5d0 [kvm]
          ksys_ioctl+0x92/0xb0
          __x64_sys_ioctl+0x16/0x20
          do_syscall_64+0x3e/0xb0
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
         ---[ end trace d17eb998aee991da ]---
      
      Reported-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Fixes: 89786147
      
       ("KVM: x86: Add helper functions for illegal GPA checking and page fault injection")
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      19cf4b7e
    • Yang Weijiang's avatar
      selftests: kvm: Use a shorter encoding to clear RAX · 98b0bf02
      Yang Weijiang authored
      
      
      If debug_regs.c is built with newer binutils, the resulting binary is "optimized"
      by the assembler:
      
      asm volatile("ss_start: "
                   "xor %%rax,%%rax\n\t"
                   "cpuid\n\t"
                   "movl $0x1a0,%%ecx\n\t"
                   "rdmsr\n\t"
                   : : : "rax", "ecx");
      
      is translated to :
      
        000000000040194e <ss_start>:
        40194e:       31 c0                   xor    %eax,%eax     <----- rax->eax?
        401950:       0f a2                   cpuid
        401952:       b9 a0 01 00 00          mov    $0x1a0,%ecx
        401957:       0f 32                   rdmsr
      
      As you can see rax is replaced with eax in target binary code.
      This causes a difference is the length of xor instruction (2 Byte vs 3 Byte),
      and makes the hard-coded instruction length check fail:
      
              /* Instruction lengths starting at ss_start */
              int ss_size[4] = {
                      3,              /* xor */   <-------- 2 or 3?
                      2,              /* cpuid */
                      5,              /* mov */
                      2,              /* rdmsr */
              };
      
      Encode the shorter version directly and, while at it, fix the "clobbers"
      of the asm.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarYang Weijiang <weijiang.yang@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      98b0bf02
  6. Aug 17, 2020
  7. Aug 16, 2020
    • Linus Torvalds's avatar
      Merge tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block · 4b6c093e
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A few fixes on the block side of things:
      
         - Discard granularity fix (Coly)
      
         - rnbd cleanups (Guoqing)
      
         - md error handling fix (Dan)
      
         - md sysfs fix (Junxiao)
      
         - Fix flush request accounting, which caused an IO slowdown for some
           configurations (Ming)
      
         - Properly propagate loop flag for partition scanning (Lennart)"
      
      * tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block:
        block: fix double account of flush request's driver tag
        loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE
        rnbd: no need to set bi_end_io in rnbd_bio_map_kern
        rnbd: remove rnbd_dev_submit_io
        md-cluster: Fix potential error pointer dereference in resize_bitmaps()
        block: check queue's limits.discard_granularity in __blkdev_issue_discard()
        md: get sysfs entry after redundancy attr group create
      4b6c093e
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-5.9-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · d84835b1
      Linus Torvalds authored
      Pull RISC-V fix from Palmer Dabbelt:
       "I collected a single fix during the merge window: we managed to break
        the early trap setup on !MMU, this fixes it"
      
      * tag 'riscv-for-linus-5.9-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: Setup exception vector for nommu platform
      d84835b1
    • Linus Torvalds's avatar
      Merge tag 'sh-for-5.9' of git://git.libc.org/linux-sh · 5bbec3cf
      Linus Torvalds authored
      Pull arch/sh updates from Rich Felker:
       "Cleanup, SECCOMP_FILTER support, message printing fixes, and other
        changes to arch/sh"
      
      * tag 'sh-for-5.9' of git://git.libc.org/linux-sh: (34 commits)
        sh: landisk: Add missing initialization of sh_io_port_base
        sh: bring syscall_set_return_value in line with other architectures
        sh: Add SECCOMP_FILTER
        sh: Rearrange blocks in entry-common.S
        sh: switch to copy_thread_tls()
        sh: use the generic dma coherent remap allocator
        sh: don't allow non-coherent DMA for NOMMU
        dma-mapping: consolidate the NO_DMA definition in kernel/dma/Kconfig
        sh: unexport register_trapped_io and match_trapped_io_handler
        sh: don't include <asm/io_trapped.h> in <asm/io.h>
        sh: move the ioremap implementation out of line
        sh: move ioremap_fixed details out of <asm/io.h>
        sh: remove __KERNEL__ ifdefs from non-UAPI headers
        sh: sort the selects for SUPERH alphabetically
        sh: remove -Werror from Makefiles
        sh: Replace HTTP links with HTTPS ones
        arch/sh/configs: remove obsolete CONFIG_SOC_CAMERA*
        sh: stacktrace: Remove stacktrace_ops.stack()
        sh: machvec: Modernize printing of kernel messages
        sh: pci: Modernize printing of kernel messages
        ...
      5bbec3cf
    • Jens Axboe's avatar
      io_uring: short circuit -EAGAIN for blocking read attempt · f91daf56
      Jens Axboe authored
      One case was missed in the short IO retry handling, and that's hitting
      -EAGAIN on a blocking attempt read (eg from io-wq context). This is a
      problem on sockets that are marked as non-blocking when created, they
      don't carry any REQ_F_NOWAIT information to help us terminate them
      instead of perpetually retrying.
      
      Fixes: 227c0c96
      
       ("io_uring: internally retry short reads")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f91daf56
    • Jens Axboe's avatar
      io_uring: sanitize double poll handling · d4e7cd36
      Jens Axboe authored
      
      
      There's a bit of confusion on the matching pairs of poll vs double poll,
      depending on if the request is a pure poll (IORING_OP_POLL_ADD) or
      poll driven retry.
      
      Add io_poll_get_double() that returns the double poll waitqueue, if any,
      and io_poll_get_single() that returns the original poll waitqueue. With
      that, remove the argument to io_poll_remove_double().
      
      Finally ensure that wait->private is cleared once the double poll handler
      has run, so that remove knows it's already been seen.
      
      Cc: stable@vger.kernel.org # v5.8
      Reported-by: default avatar <syzbot+7f617d4a9369028b8a2c@syzkaller.appspotmail.com>
      Fixes: 18bceab101ad ("io_uring: allow POLL_ADD with double poll_wait() users")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      d4e7cd36
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux · 713eee84
      Linus Torvalds authored
      Pull more perf tools updates from Arnaldo Carvalho de Melo:
       "Fixes:
         - Fixes for 'perf bench numa'.
      
         - Always memset source before memcpy in 'perf bench mem'.
      
         - Quote CC and CXX for their arguments to fix build in environments
           using those variables to pass more than just the compiler names.
      
         - Fix module symbol processing, addressing regression detected via
           "perf test".
      
         - Allow multiple probes in record+script_probe_vfs_getname.sh 'perf
           test' entry.
      
        Improvements:
         - Add script to autogenerate socket family name id->string table from
           copy of kernel header, used so far in 'perf trace'.
      
         - 'perf ftrace' improvements to provide similar options for this
           utility so that one can go from 'perf record', 'perf trace', etc to
           'perf ftrace' just by changing the name of the subcommand.
      
         - Prefer new "sched:sched_waking" trace event when it exists in 'perf
           sched' post processing.
      
         - Update POWER9 metrics to utilize other metrics.
      
         - Fall back to querying debuginfod if debuginfo not found locally.
      
        Miscellaneous:
         - Sync various kvm headers with kernel sources"
      
      * tag 'perf-tools-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (40 commits)
        perf ftrace: Make option description initials all capital letters
        perf build-ids: Fall back to debuginfod query if debuginfo not found
        perf bench numa: Remove dead code in parse_nodes_opt()
        perf stat: Update POWER9 metrics to utilize other metrics
        perf ftrace: Add change log
        perf: ftrace: Add set_tracing_options() to set all trace options
        perf ftrace: Add option --tid to filter by thread id
        perf ftrace: Add option -D/--delay to delay tracing
        perf: ftrace: Allow set graph depth by '--graph-opts'
        perf ftrace: Add support for trace option tracing_thresh
        perf ftrace: Add option 'verbose' to show more info for graph tracer
        perf ftrace: Add support for tracing option 'irq-info'
        perf ftrace: Add support for trace option funcgraph-irqs
        perf ftrace: Add support for trace option sleep-time
        perf ftrace: Add support for tracing option 'func_stack_trace'
        perf tools: Add general function to parse sublevel options
        perf ftrace: Add option '--inherit' to trace children processes
        perf ftrace: Show trace column header
        perf ftrace: Add option '-m/--buffer-size' to set per-cpu buffer size
        perf ftrace: Factor out function write_tracing_file_int()
        ...
      713eee84
    • Linus Torvalds's avatar
      Merge tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 50f6c7db
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Misc fixes and small updates all around the place:
      
         - Fix mitigation state sysfs output
      
         - Fix an FPU xstate/sxave code assumption bug triggered by
           Architectural LBR support
      
         - Fix Lightning Mountain SoC TSC frequency enumeration bug
      
         - Fix kexec debug output
      
         - Fix kexec memory range assumption bug
      
         - Fix a boundary condition in the crash kernel code
      
         - Optimize porgatory.ro generation a bit
      
         - Enable ACRN guests to use X2APIC mode
      
         - Reduce a __text_poke() IRQs-off critical section for the benefit of
           PREEMPT_RT"
      
      * tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/alternatives: Acquire pte lock with interrupts enabled
        x86/bugs/multihit: Fix mitigation reporting when VMX is not in use
        x86/fpu/xstate: Fix an xstate size check warning with architectural LBRs
        x86/purgatory: Don't generate debug info for purgatory.ro
        x86/tsr: Fix tsc frequency enumeration bug on Lightning Mountain SoC
        kexec_file: Correctly output debugging information for the PT_LOAD ELF header
        kexec: Improve & fix crash_exclude_mem_range() to handle overlapping ranges
        x86/crash: Correct the address boundary of function parameters
        x86/acrn: Remove redundant chars from ACRN signature
        x86/acrn: Allow ACRN guest to use X2APIC mode
      50f6c7db
    • Linus Torvalds's avatar
      Merge tag 'sched-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1195d58f
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "Two fixes: fix a new tracepoint's output value, and fix the formatting
        of show-state syslog printouts"
      
      * tag 'sched-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/debug: Fix the alignment of the show-state debug output
        sched: Fix use of count for nr_running tracepoint
      1195d58f
    • Linus Torvalds's avatar
      Merge tag 'perf-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7f5faaaa
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Misc fixes, an expansion of perf syscall access to CAP_PERFMON
        privileged tools, plus a RAPL HW-enablement for Intel SPR platforms"
      
      * tag 'perf-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/rapl: Add support for Intel SPR platform
        perf/x86/rapl: Support multiple RAPL unit quirks
        perf/x86/rapl: Fix missing psys sysfs attributes
        hw_breakpoint: Remove unused __register_perf_hw_breakpoint() declaration
        kprobes: Remove show_registers() function prototype
        perf/core: Take over CAP_SYS_PTRACE creds to CAP_PERFMON capability
      7f5faaaa
    • Linus Torvalds's avatar
      Merge tag 'locking-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · eb1319af
      Linus Torvalds authored
      Pull locking fixlets from Ingo Molnar:
       "A documentation fix and a 'fallthrough' macro update"
      
      * tag 'locking-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        futex: Convert to use the preferred 'fallthrough' macro
        Documentation/locking/locktypes: Fix a typo
      eb1319af