  Jan 06, 2022
    • Merge branches 'for-next/misc', 'for-next/cache-ops-dzp',... · 945409a6
      Catalin Marinas authored
      Merge branches 'for-next/misc', 'for-next/cache-ops-dzp', 'for-next/stacktrace', 'for-next/xor-neon', 'for-next/kasan', 'for-next/armv8_7-fp', 'for-next/atomics', 'for-next/bti', 'for-next/sve', 'for-next/kselftest' and 'for-next/kcsan', remote-tracking branch 'arm64/for-next/perf' into for-next/core
      
      * arm64/for-next/perf: (32 commits)
        arm64: perf: Don't register user access sysctl handler multiple times
        drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
        perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
        arm64: perf: Support new DT compatibles
        arm64: perf: Simplify registration boilerplate
        arm64: perf: Support Denver and Carmel PMUs
        drivers/perf: hisi: Add driver for HiSilicon PCIe PMU
        docs: perf: Add description for HiSilicon PCIe PMU driver
        dt-bindings: perf: Add YAML schemas for Marvell CN10K LLC-TAD pmu bindings
        drivers: perf: Add LLC-TAD perf counter support
        perf/smmuv3: Synthesize IIDR from CoreSight ID registers
        perf/smmuv3: Add devicetree support
        dt-bindings: Add Arm SMMUv3 PMCG binding
        perf/arm-cmn: Add debugfs topology info
        perf/arm-cmn: Add CI-700 Support
        dt-bindings: perf: arm-cmn: Add CI-700
        perf/arm-cmn: Support new IP features
        perf/arm-cmn: Demarcate CMN-600 specifics
        perf/arm-cmn: Move group validation data off-stack
        perf/arm-cmn: Optimise DTC counter accesses
        ...
      
      * for-next/misc:
        : Miscellaneous patches
        arm64: Use correct method to calculate nomap region boundaries
        arm64: Drop outdated links in comments
        arm64: errata: Fix exec handling in erratum 1418040 workaround
        arm64: Unhash early pointer print plus improve comment
        asm-generic: introduce io_stop_wc() and add implementation for ARM64
        arm64: remove __dma_*_area() aliases
        docs/arm64: delete a space from tagged-address-abi
        arm64/fp: Add comments documenting the usage of state restore functions
        arm64: mm: Use asid feature macro for cheanup
        arm64: mm: Rename asid2idx() to ctxid2asid()
        arm64: kexec: reduce calls to page_address()
        arm64: extable: remove unused ex_handler_t definition
        arm64: entry: Use SDEI event constants
        arm64: Simplify checking for populated DT
        arm64/kvm: Fix bitrotted comment for SVE handling in handle_exit.c
      
      * for-next/cache-ops-dzp:
        : Avoid DC instructions when DCZID_EL0.DZP == 1
        arm64: mte: DC {GVA,GZVA} shouldn't be used when DCZID_EL0.DZP == 1
        arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1
      
      * for-next/stacktrace:
        : Unify the arm64 unwind code
        arm64: Make some stacktrace functions private
        arm64: Make dump_backtrace() use arch_stack_walk()
        arm64: Make profile_pc() use arch_stack_walk()
        arm64: Make return_address() use arch_stack_walk()
        arm64: Make __get_wchan() use arch_stack_walk()
        arm64: Make perf_callchain_kernel() use arch_stack_walk()
        arm64: Mark __switch_to() as __sched
        arm64: Add comment for stack_info::kr_cur
        arch: Make ARCH_STACKWALK independent of STACKTRACE
      
      * for-next/xor-neon:
        : Use SHA3 instructions to speed up XOR
        arm64/xor: use EOR3 instructions when available
      
      * for-next/kasan:
        : Log potential KASAN shadow aliases
        arm64: mm: log potential KASAN shadow alias
        arm64: mm: use die_kernel_fault() in do_mem_abort()
      
      * for-next/armv8_7-fp:
        : Add HWCAPS for ARMv8.7 FEAT_AFP and FEAT_RPRES
        arm64: cpufeature: add HWCAP for FEAT_RPRES
        arm64: add ID_AA64ISAR2_EL1 sys register
        arm64: cpufeature: add HWCAP for FEAT_AFP
      
      * for-next/atomics:
        : arm64 atomics clean-ups and codegen improvements
        arm64: atomics: lse: define RETURN ops in terms of FETCH ops
        arm64: atomics: lse: improve constraints for simple ops
        arm64: atomics: lse: define ANDs in terms of ANDNOTs
        arm64: atomics lse: define SUBs in terms of ADDs
        arm64: atomics: format whitespace consistently
      
      * for-next/bti:
        : BTI clean-ups
        arm64: Ensure that the 'bti' macro is defined where linkage.h is included
        arm64: Use BTI C directly and unconditionally
        arm64: Unconditionally override SYM_FUNC macros
        arm64: Add macro version of the BTI instruction
        arm64: ftrace: add missing BTIs
        arm64: kexec: use __pa_symbol(empty_zero_page)
        arm64: update PAC description for kernel
      
      * for-next/sve:
        : SVE code clean-ups and refactoring in preparation for the Scalable Matrix Extension
        arm64/sve: Minor clarification of ABI documentation
        arm64/sve: Generalise vector length configuration prctl() for SME
        arm64/sve: Make sysctl interface for SVE reusable by SME
      
      * for-next/kselftest:
        : arm64 kselftest additions
        kselftest/arm64: Add pidbench for floating point syscall cases
        kselftest/arm64: Add a test program to exercise the syscall ABI
        kselftest/arm64: Allow signal tests to trigger from a function
        kselftest/arm64: Parameterise ptrace vector length information
      
      * for-next/kcsan:
        : Enable KCSAN for arm64
        arm64: Enable KCSAN
  Dec 14, 2021
    • Merge branch 'for-next/perf-user-counter-access' into for-next/perf · 8bd09b41
      Will Deacon authored
      * for-next/perf-user-counter-access:
        Documentation: arm64: Document PMU counters access from userspace
        arm64: perf: Enable PMU counter userspace access for perf event
        arm64: perf: Add userspace counter access disable switch
        perf: Add a counter for number of user access events in context
        x86: perf: Move RDPMC event flag to a common definition
    • Merge branch 'for-next/perf-smmu' into for-next/perf · 1879a61f
      Will Deacon authored
      * for-next/perf-smmu:
        perf/smmuv3: Synthesize IIDR from CoreSight ID registers
        perf/smmuv3: Add devicetree support
        dt-bindings: Add Arm SMMUv3 PMCG binding
    • Merge branch 'for-next/perf-hisi' into for-next/perf · 8330904f
      Will Deacon authored
      * for-next/perf-hisi:
        drivers/perf: hisi: Add driver for HiSilicon PCIe PMU
        docs: perf: Add description for HiSilicon PCIe PMU driver
    • Merge branch 'for-next/perf-cn10k' into for-next/perf · e73bc4fd
      Will Deacon authored
      * for-next/perf-cn10k:
        dt-bindings: perf: Add YAML schemas for Marvell CN10K LLC-TAD pmu bindings
        drivers: perf: Add LLC-TAD perf counter support
    • Merge branch 'for-next/perf-cmn' into for-next/perf · fc369f92
      Will Deacon authored
      * for-next/perf-cmn:
        perf/arm-cmn: Add debugfs topology info
        perf/arm-cmn: Add CI-700 Support
        dt-bindings: perf: arm-cmn: Add CI-700
        perf/arm-cmn: Support new IP features
        perf/arm-cmn: Demarcate CMN-600 specifics
        perf/arm-cmn: Move group validation data off-stack
        perf/arm-cmn: Optimise DTC counter accesses
        perf/arm-cmn: Optimise DTM counter reads
        perf/arm-cmn: Refactor DTM handling
        perf/arm-cmn: Streamline node iteration
        perf/arm-cmn: Refactor node ID handling
        perf/arm-cmn: Drop compile-test restriction
        perf/arm-cmn: Account for NUMA affinity
        perf/arm-cmn: Fix CPU hotplug unregistration
    • arm64: atomics: lse: define RETURN ops in terms of FETCH ops · 053f58ba
      Mark Rutland authored
      
      
       The FEAT_LSE atomic instructions include LD* instructions which return
       the original value of a memory location and can be used to directly
       implement FETCH operations. Each RETURN op is implemented as a copy of
      the corresponding FETCH op with a trailing instruction to generate the
      new value of the memory location. We only directly implement
      *_fetch_add*(), for which we have a trailing `add` instruction.
      
      As the compiler has no visibility of the `add`, this leads to less than
      optimal code generation when consuming the result.
      
      For example, the compiler cannot constant-fold the addition into later
      operations, and currently GCC 11.1.0 will compile:
      
             return __lse_atomic_sub_return(1, v) == 0;
      
      As:
      
      	mov     w1, #0xffffffff
      	ldaddal w1, w2, [x0]
      	add     w1, w1, w2
      	cmp     w1, #0x0
      	cset    w0, eq  // eq = none
      	ret
      
      This patch improves this by replacing the `add` with C addition after
      the inline assembly block, e.g.
      
      	ret += i;
      
      This allows the compiler to manipulate `i`. This permits the compiler to
      merge the `add` and `cmp` for the above, e.g.
      
      	mov     w1, #0xffffffff
      	ldaddal w1, w1, [x0]
      	cmp     w1, #0x1
      	cset    w0, eq  // eq = none
      	ret
      
      With this change the assembly for each RETURN op is identical to the
      corresponding FETCH op (including barriers and clobbers) so I've removed
      the inline assembly and rewritten each RETURN op in terms of the
      corresponding FETCH op, e.g.
      
       | static inline int __lse_atomic_add_return(int i, atomic_t *v)
       | {
       |       return __lse_atomic_fetch_add(i, v) + i;
       | }
      
      The new construction does not adversely affect the common case, and
      before and after this patch GCC 11.1.0 can compile:
      
      	__lse_atomic_add_return(i, v)
      
      As:
      
      	ldaddal w0, w2, [x1]
      	add     w0, w0, w2
      
      ... while having the freedom to do better elsewhere.
      
      This is intended as an optimization and cleanup.
      There should be no functional change as a result of this patch.
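
       As a rough illustration of the pattern, a minimal standalone sketch,
       assuming a hypothetical my_atomic_t wrapper and a toolchain that
       accepts the LSE instructions (e.g. -march=armv8.1-a); the names and
       simplified asm are illustrative, not the kernel's macro-generated code:

       	typedef struct { int counter; } my_atomic_t;

       	static inline int my_fetch_add(int i, my_atomic_t *v)
       	{
       		int old;

       		/* LDADDAL writes the old value of the memory location to 'old'. */
       		asm volatile("ldaddal %w[i], %w[old], %[v]"
       			     : [old] "=r" (old), [v] "+Q" (v->counter)
       			     : [i] "r" (i)
       			     : "memory");
       		return old;
       	}

       	static inline int my_add_return(int i, my_atomic_t *v)
       	{
       		/* RETURN op = FETCH op plus a C-level addition the compiler can see. */
       		return my_fetch_add(i, v) + i;
       	}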
      
       Signed-off-by: Mark Rutland <mark.rutland@arm.com>
       Cc: Boqun Feng <boqun.feng@gmail.com>
       Cc: Peter Zijlstra <peterz@infradead.org>
       Cc: Will Deacon <will@kernel.org>
       Acked-by: Will Deacon <will@kernel.org>
       Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
       Link: https://lore.kernel.org/r/20211210151410.2782645-6-mark.rutland@arm.com
       Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: atomics: lse: improve constraints for simple ops · 8a578a75
      Mark Rutland authored
      
      
      We have overly conservative assembly constraints for the basic FEAT_LSE
      atomic instructions, and using more accurate and permissive constraints
      will allow for better code generation.
      
       The FEAT_LSE basic atomic instructions come in two forms:
      
      	LD{op}{order}{size} <Rs>, <Rt>, [<Rn>]
      	ST{op}{order}{size} <Rs>, [<Rn>]
      
      The ST* forms are aliases of the LD* forms where:
      
      	ST{op}{order}{size} <Rs>, [<Rn>]
      Is:
      	LD{op}{order}{size} <Rs>, XZR, [<Rn>]
      
      For either form, both <Rs> and <Rn> are read but not written back to,
      and <Rt> is written with the original value of the memory location.
      Where (<Rt> == <Rs>) or (<Rt> == <Rn>), <Rt> is written *after* the
      other register value(s) are consumed. There are no UNPREDICTABLE or
      CONSTRAINED UNPREDICTABLE behaviours when any pair of <Rs>, <Rt>, or
      <Rn> are the same register.
      
      Our current inline assembly always uses <Rs> == <Rt>, treating this
      register as both an input and an output (using a '+r' constraint). This
      forces the compiler to do some unnecessary register shuffling and/or
      redundant value generation.
      
      For example, the compiler cannot reuse the <Rs> value, and currently GCC
      11.1.0 will compile:
      
      	__lse_atomic_add(1, a);
      	__lse_atomic_add(1, b);
      	__lse_atomic_add(1, c);
      
      As:
      
      	mov     w3, #0x1
      	mov     w4, w3
      	stadd   w4, [x0]
      	mov     w0, w3
      	stadd   w0, [x1]
      	stadd   w3, [x2]
      
      We can improve this with more accurate constraints, separating <Rs> and
      <Rt>, where <Rs> is an input-only register ('r'), and <Rt> is an
      output-only value ('=r'). As <Rt> is written back after <Rs> is
      consumed, it does not need to be earlyclobber ('=&r'), leaving the
      compiler free to use the same register for both <Rs> and <Rt> where this
      is desirable.
      
      At the same time, the redundant 'r' constraint for `v` is removed, as
      the `+Q` constraint is sufficient.
      
      With this change, the above example becomes:
      
      	mov     w3, #0x1
      	stadd   w3, [x0]
      	stadd   w3, [x1]
      	stadd   w3, [x2]
      
      I've made this change for the non-value-returning and FETCH ops. The
      RETURN ops have a multi-instruction sequence for which we cannot use the
       same constraints, and a subsequent patch will rewrite the RETURN ops in
      terms of the FETCH ops, relying on the ability for the compiler to reuse
      the <Rs> value.
      
      This is intended as an optimization.
      There should be no functional change as a result of this patch.
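
       As a rough before/after sketch of the constraint change (illustrative
       only; the operand names and simplified asm are not the kernel's exact
       macro-generated code, and an LSE-capable toolchain is assumed):

       	/* Old: 'i' is read-write ('+r'), so the compiler must assume its
       	 * register is clobbered, and the extra "r" (p) input is redundant
       	 * alongside '+Q'. */
       	static inline void my_atomic_add_old(int i, int *p)
       	{
       		asm volatile("stadd %w[i], %[v]"
       			     : [i] "+r" (i), [v] "+Q" (*p)
       			     : "r" (p));
       	}

       	/* New: 'i' is input-only ('r') and the redundant input is dropped,
       	 * so the register holding 'i' can be reused across calls. */
       	static inline void my_atomic_add_new(int i, int *p)
       	{
       		asm volatile("stadd %w[i], %[v]"
       			     : [v] "+Q" (*p)
       			     : [i] "r" (i));
       	}

       	/* For the FETCH forms, <Rs> ('r', input-only) and <Rt> ('=r', written
       	 * only after <Rs> is consumed, so no earlyclobber) become separate
       	 * operands, leaving the compiler free to pick the same register. */
       	static inline int my_atomic_fetch_add_new(int i, int *p)
       	{
       		int old;

       		asm volatile("ldadd %w[i], %w[old], %[v]"
       			     : [old] "=r" (old), [v] "+Q" (*p)
       			     : [i] "r" (i));
       		return old;
       	}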
      
       Signed-off-by: Mark Rutland <mark.rutland@arm.com>
       Cc: Boqun Feng <boqun.feng@gmail.com>
       Cc: Peter Zijlstra <peterz@infradead.org>
       Cc: Will Deacon <will@kernel.org>
       Acked-by: Will Deacon <will@kernel.org>
       Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
       Link: https://lore.kernel.org/r/20211210151410.2782645-5-mark.rutland@arm.com
       Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • arm64: atomics: lse: define ANDs in terms of ANDNOTs · 5e9e43c9
      Mark Rutland authored
      
      
      The FEAT_LSE atomic instructions include atomic bit-clear instructions
      (`ldclr*` and `stclr*`) which can be used to directly implement ANDNOT
      operations. Each AND op is implemented as a copy of the corresponding
      ANDNOT op with a leading `mvn` instruction to apply a bitwise NOT to the
      `i` argument.
      
      As the compiler has no visibility of the `mvn`, this leads to less than
      optimal code generation when generating `i` into a register. For
      example, __lse_atomic_fetch_and(0xf, v) can be compiled to:
      
      	mov     w1, #0xf
      	mvn     w1, w1
      	ldclral w1, w1, [x2]
      
      This patch improves this by replacing the `mvn` with NOT in C before the
      inline assembly block, e.g.
      
      	i = ~i;
      
      This allows the compiler to generate `i` into a register more optimally,
      e.g.
      
      	mov     w1, #0xfffffff0
      	ldclral w1, w1, [x2]
      
      With this change the assembly for each AND op is identical to the
      corresponding ANDNOT op (including barriers and clobbers), so I've
      removed the inline assembly and rewritten each AND op in terms of the
      corresponding ANDNOT op, e.g.
      
      | static inline void __lse_atomic_and(int i, atomic_t *v)
      | {
      | 	return __lse_atomic_andnot(~i, v);
      | }
      
      This is intended as an optimization and cleanup.
      There should be no functional change as a result of this patch.
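
       As a rough standalone sketch of the pattern (illustrative only; the
       names and simplified asm are not the kernel's exact code, and an
       LSE-capable toolchain is assumed):

       	static inline void my_atomic_andnot(int i, int *p)
       	{
       		/* STCLR atomically clears the bits that are set in 'i'. */
       		asm volatile("stclr %w[i], %[v]"
       			     : [v] "+Q" (*p)
       			     : [i] "r" (i));
       	}

       	static inline void my_atomic_and(int i, int *p)
       	{
       		/* AND with 'i' clears the bits *not* set in 'i'; doing the NOT
       		 * in C lets the compiler fold it, e.g. 0xf becomes an immediate
       		 * 0xfffffff0 rather than a mov + mvn pair. */
       		my_atomic_andnot(~i, p);
       	}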
      
       Signed-off-by: Mark Rutland <mark.rutland@arm.com>
       Cc: Boqun Feng <boqun.feng@gmail.com>
       Cc: Peter Zijlstra <peterz@infradead.org>
       Cc: Will Deacon <will@kernel.org>
       Acked-by: Will Deacon <will@kernel.org>
       Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
       Link: https://lore.kernel.org/r/20211210151410.2782645-4-mark.rutland@arm.com
       Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>