!11950 arm64/perf: Enable branch stack sampling
Merge Pull Request from: @hejunhao3
```text
========== Perf Branch Stack Sampling Support (arm64 platforms) ===========
Currently the arm64 platform does not support perf branch stack sampling.
Hence any event requesting branch stack records, i.e. with
PERF_SAMPLE_BRANCH_STACK set in event->attr.sample_type, is rejected in
armpmu_event_init().
static int armpmu_event_init(struct perf_event *event)
{
	........
	/* does not support taken branch sampling */
	if (has_branch_stack(event))
		return -EOPNOTSUPP;
	........
}
$perf record -j any,u,k ls
Error:
cycles:P: PMU Hardware or event type doesn't support branch stack sampling.
-------------------- CONFIG_ARM64_BRBE and FEAT_BRBE ----------------------
After this series, perf branch stack sampling feature gets enabled on arm64
platforms where FEAT_BRBE HW feature is supported, and CONFIG_ARM64_BRBE is
also selected during build. Let's go through all possible scenarios here.
1. Feature not built (!CONFIG_ARM64_BRBE):
   Falls back to the current behaviour, i.e. the event gets rejected.
2. Feature built but HW not supported (CONFIG_ARM64_BRBE && !FEAT_BRBE):
   Falls back to the current behaviour, i.e. the event gets rejected.
3. Feature built and HW supported (CONFIG_ARM64_BRBE && FEAT_BRBE):
   The platform supports branch stack sampling requests. A simple example
   follows.
$perf record -j any_call,u,k,save_type ls
[Please refer to the perf-record man page for all possible branch filter options]
$perf report
-------------------------- Snip ----------------------
# Overhead  Command  Source Shared Object  Source Symbol                  Target Symbol                  Basic Block Cycles
# ........  .......  ....................  .............................  .............................  ..................
#
     3.52%  ls       [kernel.kallsyms]     [k] sched_clock_noinstr        [k] arch_counter_get_cntpct    16
     3.52%  ls       [kernel.kallsyms]     [k] sched_clock                [k] sched_clock_noinstr        9
     1.85%  ls       [kernel.kallsyms]     [k] sched_clock_cpu            [k] sched_clock                5
     1.80%  ls       [kernel.kallsyms]     [k] irqtime_account_irq        [k] sched_clock_cpu            20
     1.58%  ls       [kernel.kallsyms]     [k] gic_handle_irq             [k] generic_handle_domain_irq  19
     1.58%  ls       [kernel.kallsyms]     [k] call_on_irq_stack          [k] gic_handle_irq             9
     1.58%  ls       [kernel.kallsyms]     [k] do_interrupt_handler       [k] call_on_irq_stack          23
     1.58%  ls       [kernel.kallsyms]     [k] generic_handle_domain_irq  [k] __irq_resolve_mapping      6
     1.58%  ls       [kernel.kallsyms]     [k] __irq_resolve_mapping      [k] __rcu_read_lock            10
-------------------------- Snip ----------------------
$perf report -D | grep cycles
-------------------------- Snip ----------------------
..... 1: ffff800080dd3334 -> ffff800080dd759c 39 cycles P 0 IND_CALL
..... 2: ffff800080ffaea0 -> ffff800080ffb688 16 cycles P 0 IND_CALL
..... 3: ffff800080139918 -> ffff800080ffae64 9 cycles P 0 CALL
..... 4: ffff800080dd3324 -> ffff8000801398f8 7 cycles P 0 CALL
..... 5: ffff8000800f8548 -> ffff800080dd330c 21 cycles P 0 IND_CALL
..... 6: ffff8000800f864c -> ffff8000800f84ec 6 cycles P 0 CALL
..... 7: ffff8000800f86dc -> ffff8000800f8638 11 cycles P 0 CALL
..... 8: ffff8000800f86d4 -> ffff800081008630 16 cycles P 0 CALL
-------------------------- Snip ----------------------
perf script and other tooling can also be applied to the captured perf.data.
Similarly, branch stack sampling records can be collected directly via the
perf_event_open() system call after setting 'struct perf_event_attr' as
required.
event->attr.sample_type |= PERF_SAMPLE_BRANCH_STACK
event->attr.branch_sample_type |= PERF_SAMPLE_BRANCH_<FILTER_1> |
                                  PERF_SAMPLE_BRANCH_<FILTER_2> |
                                  PERF_SAMPLE_BRANCH_<FILTER_3> |
                                  ...............................
However, not all branch filters are necessarily supported on the platform.
----------------------- BRBE Branch Filters Support -----------------------
- The following branch filters are supported on arm64.
  PERF_SAMPLE_BRANCH_USER        /* Branch privilege filters */
  PERF_SAMPLE_BRANCH_KERNEL
  PERF_SAMPLE_BRANCH_HV
  PERF_SAMPLE_BRANCH_ANY         /* Branch type filters */
  PERF_SAMPLE_BRANCH_ANY_CALL
  PERF_SAMPLE_BRANCH_ANY_RETURN
  PERF_SAMPLE_BRANCH_IND_CALL
  PERF_SAMPLE_BRANCH_COND
  PERF_SAMPLE_BRANCH_IND_JUMP
  PERF_SAMPLE_BRANCH_CALL
  PERF_SAMPLE_BRANCH_NO_FLAGS    /* Branch record flags */
  PERF_SAMPLE_BRANCH_NO_CYCLES
  PERF_SAMPLE_BRANCH_TYPE_SAVE
  PERF_SAMPLE_BRANCH_HW_INDEX
  PERF_SAMPLE_BRANCH_PRIV_SAVE
- The following branch filters are not supported on arm64.
  PERF_SAMPLE_BRANCH_ABORT_TX
  PERF_SAMPLE_BRANCH_IN_TX
  PERF_SAMPLE_BRANCH_NO_TX
  PERF_SAMPLE_BRANCH_CALL_STACK
Events requesting any of the unsupported branch filters above are rejected.
--------------------------- Virtualisation support ------------------------
- No guest support
-------------------------------- Testing ---------------------------------
- Cross compiled for both arm64 and arm32 platforms
- Passes all branch tests with 'perf test branch' on arm64
```
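For readers who want to try the perf_event_open() route described in the cover letter, the following is a minimal userspace sketch in C. It is not code from this series. The helper names (`brbe_filters_ok`, `brbe_init_attr`, `brbe_open`), the choice of a CPU cycles event and the sample period are assumptions made for illustration. The unsupported-filter mask simply mirrors the list given above, and the kernel remains the authority on what it actually accepts.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Filters the cover letter lists as not supported on arm64. */
#define BRBE_UNSUPPORTED_FILTERS                                   \
	((uint64_t)PERF_SAMPLE_BRANCH_ABORT_TX |                   \
	 (uint64_t)PERF_SAMPLE_BRANCH_IN_TX |                      \
	 (uint64_t)PERF_SAMPLE_BRANCH_NO_TX |                      \
	 (uint64_t)PERF_SAMPLE_BRANCH_CALL_STACK)

/*
 * Hypothetical helper: returns nonzero when the requested mask contains
 * none of the filters listed as unsupported. This is only a userspace
 * pre-check; the kernel performs its own validation.
 */
static inline int brbe_filters_ok(uint64_t branch_sample_type)
{
	return (branch_sample_type & BRBE_UNSUPPORTED_FILTERS) == 0;
}

/*
 * Hypothetical helper: prepare an attr that samples CPU cycles and asks
 * for branch stack records with the given PERF_SAMPLE_BRANCH_* filters.
 */
static inline void brbe_init_attr(struct perf_event_attr *attr,
				  uint64_t filters)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_period = 100000;	/* arbitrary value for illustration */
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;
	attr->branch_sample_type = filters;
	attr->disabled = 1;		/* enable later via ioctl() */
}

/*
 * glibc provides no perf_event_open() wrapper, so go through syscall().
 * Returns a file descriptor, or -1 with errno set on failure.
 */
static inline int brbe_open(struct perf_event_attr *attr, pid_t pid, int cpu)
{
	return (int)syscall(SYS_perf_event_open, attr, pid, cpu, -1, 0UL);
}
```

A caller would typically check `brbe_filters_ok()` first, then call `brbe_init_attr()` with a mask such as `PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_USER | PERF_SAMPLE_BRANCH_KERNEL | PERF_SAMPLE_BRANCH_TYPE_SAVE` (intended to approximate `-j any_call,u,k,save_type`) and pass the result to `brbe_open(&attr, 0, -1)`. On a kernel built without CONFIG_ARM64_BRBE, or on hardware without FEAT_BRBE, the open is expected to fail as described above.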
Link: https://gitee.com/openeuler/kernel/pulls/11950
Reviewed-by: Xu Kuohai <xukuohai@huawei.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Liu Chao <liuchao173@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Signed-off-by: Zhang Peng <zhangpeng362@huawei.com>