  1. Oct 20, 2018
    • powerpc/mm: Fix WARN_ON with THP NUMA migration · dd0e144a
      Aneesh Kumar K.V authored
      WARNING: CPU: 12 PID: 4322 at /arch/powerpc/mm/pgtable-book3s64.c:76 set_pmd_at+0x4c/0x2b0
       Modules linked in:
       CPU: 12 PID: 4322 Comm: qemu-system-ppc Tainted: G        W         4.19.0-rc3-00758-g8f0c636b0542 #36
       NIP:  c0000000000872fc LR: c000000000484eec CTR: 0000000000000000
       REGS: c000003fba876fe0 TRAP: 0700   Tainted: G        W          (4.19.0-rc3-00758-g8f0c636b0542)
       MSR:  900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 24282884  XER: 00000000
       CFAR: c000000000484ee8 IRQMASK: 0
       GPR00: c000000000484eec c000003fba877268 c000000001f0ec00 c000003fbd229f80
       GPR04: 00007c8fe8e00000 c000003f864c5a38 860300853e0000c0 0000000000000080
       GPR08: 0000000080000000 0000000000000001 0401000000000080 0000000000000001
       GPR12: 0000000000002000 c000003fffff5400 c000003fce292000 00007c9024570000
       GPR16: 0000000000000000 0000000000ffffff 0000000000000001 c000000001885950
       GPR20: 0000000000000000 001ffffc0004807c 0000000000000008 c000000001f49d05
       GPR24: 00007c8fe8e00000 c0000000020f2468 ffffffffffffffff c000003fcd33b090
       GPR28: 00007c8fe8e00000 c000003fbd229f80 c000003f864c5a38 860300853e0000c0
       NIP [c0000000000872fc] set_pmd_at+0x4c/0x2b0
       LR [c000000000484eec] do_huge_pmd_numa_page+0xb1c/0xc20
       Call Trace:
       [c000003fba877268] [c00000000045931c] mpol_misplaced+0x1bc/0x230 (unreliable)
       [c000003fba8772c8] [c000000000484eec] do_huge_pmd_numa_page+0xb1c/0xc20
       [c000003fba877398] [c00000000040d344] __handle_mm_fault+0x5e4/0x2300
       [c000003fba8774d8] [c00000000040f400] handle_mm_fault+0x3a0/0x420
       [c000003fba877528] [c0000000003ff6f4] __get_user_pages+0x2e4/0x560
       [c000003fba877628] [c000000000400314] get_user_pages_unlocked+0x104/0x2a0
       [c000003fba8776c8] [c000000000118f44] __gfn_to_pfn_memslot+0x284/0x6a0
       [c000003fba877748] [c0000000001463a0] kvmppc_book3s_radix_page_fault+0x360/0x12d0
       [c000003fba877838] [c000000000142228] kvmppc_book3s_hv_page_fault+0x48/0x1300
       [c000003fba877988] [c00000000013dc08] kvmppc_vcpu_run_hv+0x1808/0x1b50
       [c000003fba877af8] [c000000000126b44] kvmppc_vcpu_run+0x34/0x50
       [c000003fba877b18] [c000000000123268] kvm_arch_vcpu_ioctl_run+0x288/0x2d0
       [c000003fba877b98] [c00000000011253c] kvm_vcpu_ioctl+0x1fc/0x8c0
       [c000003fba877d08] [c0000000004e9b24] do_vfs_ioctl+0xa44/0xae0
       [c000003fba877db8] [c0000000004e9c44] ksys_ioctl+0x84/0xf0
       [c000003fba877e08] [c0000000004e9cd8] sys_ioctl+0x28/0x80
      
      We removed the pte_protnone check earlier with the understanding that we
      mark the pte invalid before the set_pte/set_pmd usage. But the huge pmd
      autonuma code still uses set_pmd_at() directly. This is OK because a
      protnone pte won't have a translation cached in the TLB.
      
      Fixes: da7ad366 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • selftests/powerpc: Fix out-of-tree build errors · d8a2fe29
      Michael Ellerman authored
      
      
      Some of our Makefiles don't do the right thing when building the
      selftests with O=, fix them up.
      
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/time: no steal_time when CONFIG_PPC_SPLPAR is not selected · 51eeef9e
      Christophe Leroy authored
      
      
      If CONFIG_PPC_SPLPAR is not selected, steal_time will always
      be zero, so accounting it is pointless.
      
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/time: Only set CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC64 · abcff86d
      Christophe Leroy authored
      
      
      Scaled cputime is only meaningful when the processor has
      SPURR and/or PURR, which means only on PPC64.
      
      Removing it on PPC32 significantly reduces the size of
      vtime_account_system() and vtime_account_idle() on an 8xx:
      
      Before:
      00000000 l     F .text	000000a8 vtime_delta
      00000280 g     F .text	0000010c vtime_account_system
      0000038c g     F .text	00000048 vtime_account_idle
      
      After:
      (vtime_delta gets inlined inside the two functions)
      000001d8 g     F .text	000000a0 vtime_account_system
      00000278 g     F .text	00000038 vtime_account_idle
      
      In terms of performance, we also get an approximately 7% improvement on
      task switch. The following small benchmark app is run with perf stat:
      
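      #define _GNU_SOURCE	/* for pthread_yield() */
      #include <pthread.h>
      #include <stdlib.h>

      /* Yield the CPU as many times as the count passed in arg. */
      void *thread(void *arg)
      {
      	int i;
      
      	for (i = 0; i < atoi((char*)arg); i++)
      		pthread_yield();

      	return NULL;
      }
      
      int main(int argc, char **argv)
      {
      	pthread_t th1, th2;
      
      	/* Two threads that keep yielding to each other force task switches. */
      	pthread_create(&th1, NULL, thread, argv[1]);
      	pthread_create(&th2, NULL, thread, argv[1]);
      	pthread_join(th1, NULL);
      	pthread_join(th2, NULL);
      
      	return 0;
      }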
      
      Before the patch:
      
       Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs):
      
             8228.476465      task-clock (msec)         #    0.954 CPUs utilized            ( +-  0.23% )
                  200004      context-switches          #    0.024 M/sec                    ( +-  0.00% )
      
      After the patch:
      
       Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs):
      
             7649.070444      task-clock (msec)         #    0.955 CPUs utilized            ( +-  0.27% )
                  200004      context-switches          #    0.026 M/sec                    ( +-  0.00% )
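
The "approximately 7%" figure follows from the two task-clock values above. A minimal sketch of that arithmetic; the function name improvement_percent is made up for this illustration and is not part of the patch:

```c
#include <assert.h>

/* Relative reduction, in percent, when going from `before` to `after`. */
double improvement_percent(double before, double after)
{
	assert(before > 0.0);	/* avoid dividing by zero */
	return (before - after) / before * 100.0;
}
```

Called with the measurements above, improvement_percent(8228.476465, 7649.070444) comes to just over 7.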
      
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/time: isolate scaled cputime accounting in dedicated functions. · b38a181c
      Christophe Leroy authored
      
      
      Scaled cputime is only meaningful when the processor has
      SPURR and/or PURR, which means only on PPC64.
      
      In preparation for the following patch, which will remove
      CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC32, this patch moves
      all scaled cputime accounting logic into dedicated functions.
      
      This patch doesn't change any functionality. It's only code
      reorganisation.
      
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/kgdb: add kgdb_arch_set/remove_breakpoint() · fb978ca2
      Christophe Leroy authored
      The generic implementation fails to remove breakpoints after init
      when CONFIG_STRICT_KERNEL_RWX is selected:
      
      [   13.251285] KGDB: BP remove failed: c001c338
      [   13.259587] kgdbts: ERROR PUT: end of test buffer on 'do_fork_test' line 8 expected OK got $E14#aa
      [   13.268969] KGDB: re-enter exception: ALL breakpoints killed
      [   13.275099] CPU: 0 PID: 1 Comm: init Not tainted 4.18.0-g82bbb913ffd8 #860
      [   13.282836] Call Trace:
      [   13.285313] [c60e1ba0] [c0080ef0] kgdb_handle_exception+0x6f4/0x720 (unreliable)
      [   13.292618] [c60e1c30] [c000e97c] kgdb_handle_breakpoint+0x3c/0x98
      [   13.298709] [c60e1c40] [c000af54] program_check_exception+0x104/0x700
      [   13.305083] [c60e1c60] [c000e45c] ret_from_except_full+0x0/0x4
      [   13.310845] [c60e1d20] [c02a22ac] run_simple_test+0x2b4/0x2d4
      [   13.316532] [c60e1d30] [c0081698] put_packet+0xb8/0x158
      [   13.321694] [c60e1d60] [c00820b4] gdb_serial_stub+0x230/0xc4c
      [   13.327374] [c60e1dc0] [c0080af8] kgdb_handle_exception+0x2fc/0x720
      [   13.333573] [c60e1e50] [c000e928] kgdb_singlestep+0xb4/0xcc
      [   13.339068] [c60e1e70] [c000ae1c] single_step_exception+0x90/0xac
      [   13.345100] [c60e1e80] [c000e45c] ret_from_except_full+0x0/0x4
      [   13.350865] [c60e1f40] [c000e11c] ret_from_syscall+0x0/0x38
      [   13.356346] Kernel panic - not syncing: Recursive entry to debugger
      
      This patch creates powerpc-specific versions of
      kgdb_arch_set_breakpoint() and kgdb_arch_remove_breakpoint()
      using patch_instruction().
      
      Fixes: 1e0fc9d1 ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/sysdev/ipic: check primary_ipic NULL pointer before using it · 6beb3381
      Christophe Leroy authored
      
      
      ipic_get_mcp_status() is used by targets implementing an NMI
      watchdog in a target-specific machine check handler in order
      to know whether a machine check results from a watchdog
      NMI reset.
      
      In case of very early machine check, primary_ipic pointer
      might not have been set yet, so ipic_get_mcp_status() needs
      to check it for nullity before using it.
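
The guard described above can be sketched in user space as follows. This is only an illustration under stated assumptions: the struct layout and its status field are hypothetical stand-ins, and the real function reads a hardware register rather than a struct member.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct ipic. */
struct ipic {
	uint32_t status;	/* models the value the real code reads from hardware */
};

/* Stays NULL until the interrupt controller has been initialised. */
static struct ipic *primary_ipic;

/* Report "no status" (0) instead of dereferencing a NULL pointer. */
uint32_t ipic_get_mcp_status(void)
{
	return primary_ipic ? primary_ipic->status : 0;
}
```

With this shape, a machine check handler that runs before initialisation sees a status of 0 and can treat the event as not caused by the watchdog.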
      
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: fix always true/false warning in slice.c · 37e9c674
      Christophe Leroy authored
      
      
      This patch fixes the following warnings (obtained with make W=1).
      
      arch/powerpc/mm/slice.c: In function 'slice_range_to_mask':
      arch/powerpc/mm/slice.c:73:12: error: comparison is always true due to limited range of data type [-Werror=type-limits]
        if (start < SLICE_LOW_TOP) {
                  ^
      arch/powerpc/mm/slice.c:81:20: error: comparison is always false due to limited range of data type [-Werror=type-limits]
        if ((start + len) > SLICE_LOW_TOP) {
                          ^
      arch/powerpc/mm/slice.c: In function 'slice_mask_for_free':
      arch/powerpc/mm/slice.c:136:17: error: comparison is always true due to limited range of data type [-Werror=type-limits]
        if (high_limit <= SLICE_LOW_TOP)
                       ^
      arch/powerpc/mm/slice.c: In function 'slice_check_range_fits':
      arch/powerpc/mm/slice.c:185:12: error: comparison is always true due to limited range of data type [-Werror=type-limits]
        if (start < SLICE_LOW_TOP) {
                  ^
      arch/powerpc/mm/slice.c:195:39: error: comparison is always false due to limited range of data type [-Werror=type-limits]
        if (SLICE_NUM_HIGH && ((start + len) > SLICE_LOW_TOP)) {
                                             ^
      arch/powerpc/mm/slice.c: In function 'slice_scan_available':
      arch/powerpc/mm/slice.c:306:11: error: comparison is always true due to limited range of data type [-Werror=type-limits]
        if (addr < SLICE_LOW_TOP) {
                 ^
      arch/powerpc/mm/slice.c: In function 'get_slice_psize':
      arch/powerpc/mm/slice.c:709:11: error: comparison is always true due to limited range of data type [-Werror=type-limits]
        if (addr < SLICE_LOW_TOP) {
                 ^
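
Warnings of this kind typically appear when the variable's type is 32 bits wide and the limit it is compared against needs more than 32 bits. A hedged sketch of that situation; the value given to SLICE_LOW_TOP and the helper name addr_is_low are assumptions for illustration, not taken from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit of 4GB, which does not fit in a 32-bit type. */
#define SLICE_LOW_TOP 0x100000000ULL

/*
 * Every 32-bit address is below the assumed SLICE_LOW_TOP, which is why
 * the compiler calls the direct comparison "always true". Widening the
 * address first makes the comparison explicit as a 64-bit one.
 */
bool addr_is_low(uint32_t addr)
{
	uint64_t tmp = (uint64_t)addr;

	return tmp < SLICE_LOW_TOP;
}
```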
      
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: fix missing prototypes in slice.c · aa5456ab
      Christophe Leroy authored
      
      
      This patch fixes the following warnings (obtained with make W=1).
      
      arch/powerpc/mm/slice.c: At top level:
      arch/powerpc/mm/slice.c:682:15: error: no previous prototype for 'arch_get_unmapped_area' [-Werror=missing-prototypes]
       unsigned long arch_get_unmapped_area(struct file *filp,
                     ^
      arch/powerpc/mm/slice.c:692:15: error: no previous prototype for 'arch_get_unmapped_area_topdown' [-Werror=missing-prototypes]
       unsigned long arch_get_unmapped_area_topdown(struct file *filp,
                     ^
      
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Trace tlbia instruction · 8114c36e
      Christophe Leroy authored
      
      
      Add a trace point for tlbia (Translation Lookaside Buffer Invalidate
      All) instruction.
      
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Add missing tracepoint for tlbie · cf4a6085
      Christophe Leroy authored
      Commit 0428491c ("powerpc/mm: Trace tlbie(l) instructions")
      added tracepoints for tlbie calls, but _tlbil_va() was forgotten.
      
      Fixes: 0428491c ("powerpc/mm: Trace tlbie(l) instructions")
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/book3s64: fix dump_linuxpagetables "present" flag · 3ff38e18
      Christophe Leroy authored
      Since commit bd0dbb73 ("powerpc/mm/books3s: Add new pte bit to
      mark pte temporarily invalid."), _PAGE_PRESENT doesn't mean exactly
      that a page is present. A page is also considered present when
      _PAGE_INVALID is set.
      
      This patch changes the meaning of "present" and adds a status "valid"
      associated to the _PAGE_PRESENT flag.
      
      Fixes: bd0dbb73 ("powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid.")
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: Export raw per-CPU VPA data via debugfs · c6c26fb5
      Aravinda Prasad authored
      
      
      This patch exports the raw per-CPU VPA data via debugfs.
      A per-CPU file is created which exports the VPA data of
      that CPU to help debug some of the VPA related issues or
      to analyze the per-CPU VPA related statistics.
      
      v3: Removed offline CPU check.
      
      v2: Included offline CPU check and other review comments.
      
      Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • selftests/powerpc: Add test to verify rfi flush across a system call · d2bf7932
      Naveen N. Rao authored
      
      
      This adds a test to verify proper functioning of the rfi flush
      capability implemented to mitigate Meltdown. The test works by
      measuring the number of L1d cache misses encountered while loading
      data from memory. Across a system call, since the L1d cache is flushed
      when rfi_flush is enabled, the number of cache misses is expected to
      be relative to the number of cachelines corresponding to the data
      being loaded.
      
      The current system setting is reflected via powerpc/rfi_flush under
      debugfs (assumed to be /sys/kernel/debug/). This test verifies the
      expected result with rfi_flush enabled as well as when it is disabled.
      
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      [mpe: Add SPDX tags, clang format, skip if the debugfs is missing, use
       __u64 and SANE_USERSPACE_TYPES to avoid printf() build errors.]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • selftests/powerpc: Move UCONTEXT_NIA() into utils.h · db384851
      Naveen N. Rao authored
      
      
      ... so that it can be used by others.
      
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc64/module elfv1: Set opd addresses after module relocation · 59fe7eaf
      Naveen N. Rao authored
      module_frob_arch_sections() is called before the module is moved to its
      final location. The function descriptor section addresses we are setting
      here are thus invalid. Fix this by processing the opd section during
      module_finalize().
      
      Fixes: 5633e85b ("powerpc64: Add .opd based function descriptor dereference")
      Cc: stable@vger.kernel.org # v4.16
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Add support for function error injection · 7cd01b08
      Naveen N. Rao authored
      
      
      We implement regs_set_return_value() and override_function_with_return()
      for this purpose.
      
      On powerpc, a return from a function (blr) just branches to the location
      contained in the link register. So, we can just update pt_regs rather
      than redirecting execution to a dummy function that returns.
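
The idea can be sketched in user space as below. The pt_regs struct here is a simplified model with only the fields the sketch needs (the real structure has more), so treat this as an illustration of the approach rather than the patch itself.

```c
/* Simplified model of pt_regs; the real struct has more fields. */
struct pt_regs {
	unsigned long gpr[32];	/* general purpose registers; r3 carries the return value */
	unsigned long nip;	/* next instruction pointer */
	unsigned long link;	/* link register: the address blr branches to */
};

/* Place the injected return value in r3. */
void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
{
	regs->gpr[3] = rc;
}

/* Emulate blr: resume at the caller instead of running the function body. */
void override_function_with_return(struct pt_regs *regs)
{
	regs->nip = regs->link;
}
```

Because the register state is captured on function entry, the link register still holds the caller's return address, so copying it into nip is enough to skip the function.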
      
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. Oct 19, 2018
    • powerpc/time: Fix clockevent_decrementer initalisation for PR KVM · b4d16ab5
      Michael Ellerman authored
      In the recent commit 8b78fdb0 ("powerpc/time: Use
      clockevents_register_device(), fixing an issue with large
      decrementer") we changed the way we initialise the decrementer
      clockevent(s).
      
      We no longer initialise the mult & shift values of
      decrementer_clockevent itself.
      
      This has the effect of breaking PR KVM, because it uses those values
      in kvmppc_emulate_dec(). The symptom is guest kernels spin forever
      mid-way through boot.
      
      For now fix it by assigning back to decrementer_clockevent the mult
      and shift values.
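
For context, a clockevent's mult and shift encode a fixed-point ratio between nanoseconds and device ticks (ticks = (ns * mult) >> shift), so code that converts with them gets a meaningless result if they were never set. A minimal user-space sketch of the conversion under that convention; the helper names and the 512 MHz frequency are illustrative assumptions, not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* mult such that ticks = (ns * mult) >> shift for a device running at freq_hz. */
uint64_t calc_mult(uint64_t freq_hz, unsigned int shift)
{
	return (freq_hz << shift) / NSEC_PER_SEC;
}

/* Inverse conversion: device ticks to nanoseconds. */
uint64_t ticks_to_ns(uint64_t ticks, uint64_t mult, unsigned int shift)
{
	assert(mult != 0);	/* an uninitialised mult would divide by zero */
	return (ticks << shift) / mult;
}
```

With a 512 MHz device and a shift of 32, one second's worth of ticks (512000000) converts back to 1000000000 ns.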
      
      Fixes: 8b78fdb0 ("powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/aout: Fix struct user definition to use user_pt_regs · 6ce7bff0
      Michael Ellerman authored
      I'm pretty sure this is dead code, it's only used by the a.out core
      dump code, and we don't support a.out. We should remove it.
      
      But while it's in the tree it should be using the ABI version of
      pt_regs which is called user_pt_regs in the kernel, because the whole
      struct is written to the core dump and so its size shouldn't change.
      
      Note this isn't a uapi header so we don't need an ifdef.
      
      Fixes: 002af939 ("powerpc: Split user/kernel definitions of struct pt_regs")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/uapi: Fix sigcontext definition to use user_pt_regs · 22a3d03d
      Michael Ellerman authored
      My recent patch to split pt_regs between user and kernel missed
      the usage in struct sigcontext.
      
      Because this is a user visible struct it should be using the user
      visible definition, which when we're building for the kernel is called
      struct user_pt_regs.
      
      As far as I can see this hasn't actually caused a bug (yet), because
      we don't use sizeof() on sigcontext->regs anywhere. But we should
      still fix it to avoid confusion and future bugs.
      
      Fixes: 002af939 ("powerpc: Split user/kernel definitions of struct pt_regs")
      Reported-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. Oct 18, 2018
  4. Oct 14, 2018
    • powerpc/mm: Increase the max addressable memory to 2PB · 4ffe713b
      Aneesh Kumar K.V authored
      
      
      Currently we limit the max addressable memory to 128TB. This patch increases the
      limit to 2PB. We can have devices like nvdimm which add memory above the 512TB
      limit.
      
      We still don't support regular system RAM above 512TB. One of the challenges with
      that is the percpu allocator, which allocates per-node memory and uses the max
      distance between them as the percpu offsets. This means that with a large gap in
      the address space (system RAM above 1PB) we will run out of vmalloc space to map
      the percpu allocation.
      
      In order to support addressable memory above 512TB, the kernel should be able to
      linear map this range. To do that with hash translation we now add 4 contexts
      to the kernel linear map region. Our per-context addressable range is 512TB. We
      still keep the VMALLOC and VMEMMAP regions at the old size. The SLB miss handlers
      are updated to validate these limits.
      
      We also limit this update to SPARSEMEM_VMEMMAP and SPARSEMEM_EXTREME.
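
The 2PB figure follows from the numbers in the message: 4 contexts of 512TB each. A small sketch of that arithmetic; the macro and function names are made up for this illustration and do not come from the patch:

```c
#include <stdint.h>

#define TB (1ULL << 40)
#define PB (1ULL << 50)

#define CONTEXT_RANGE		(512 * TB)	/* per-context addressable range */
#define LINEAR_MAP_CONTEXTS	4		/* contexts in the linear map region */

/* Total range covered by the kernel linear map under these assumptions. */
uint64_t linear_map_span(void)
{
	return LINEAR_MAP_CONTEXTS * CONTEXT_RANGE;
}
```

Here 4 × 512TB = 2048TB, which is 2PB.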
      
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>