perf cs-etm: Split Coresight decode by aux records
Populate the auxtrace queues using AUX records rather than whole auxtrace buffers so that the decoder is reset between each aux record. This is similar to the auxtrace_queues__process_index() -> auxtrace_queues__add_indexed_event() flow where perf_session__peek_event() is used to read AUXTRACE events out of random positions in the file based on the auxtrace index. But now we loop over all PERF_RECORD_AUX events instead of AUXTRACE buffers. For each PERF_RECORD_AUX event, we find the corresponding AUXTRACE buffer using the index, and add a fragment of that buffer to the auxtrace queues. No other changes to decoding were made, apart from populating the auxtrace queues. The result of decoding is identical to before, except in cases where decoding failed completely, due to not resetting the decoder. The reason for this change is because AUX records are emitted any time tracing is disabled, for example when the process is scheduled out. Because ETM was disabled and enabled again, the decoder also needs to be reset to force the search for a sync packet. Otherwise there would be fatal decoding errors. Testing ======= Testing was done with the following script, to diff the decoding results between the patched and un-patched versions of perf: #!/bin/bash set -ex $1 script -i $3 $4 > split.script $2 script -i $3 $4 > default.script diff split.script default.script | head -n 20 And it was run like this, with various itrace options depending on the quantity of synthesised events: compare.sh ./perf-patched ./perf-default perf-per-cpu-2-threads.data --itrace=i100000ns No changes in output were observed in the following scenarios: * Simple per-cpu perf record -e cs_etm/@tmc_etr0/u top * Per-thread, single thread perf record -e cs_etm/@tmc_etr0/u --per-thread ./threads_C * Per-thread multiple threads (but only one thread collected data): perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597 * Per-thread multiple threads (both threads collected data): perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597 * Per-cpu explicit threads: perf record -e cs_etm/@tmc_etr0/u --pid 853,854 * System-wide (per-cpu): perf record -e cs_etm/@tmc_etr0/u -a * No data collected (no aux buffers) Can happen with any command when run for a short period * Containing truncated records Can happen with any command * Containing aux records with 0 size Can happen with any command * Snapshot mode (various files with and without buffer wrap) perf record -e cs_etm/@tmc_etr0/u -a --snapshot Some differences were observed in the following scenario: * Snapshot mode (with duplicate buffers) perf record -e cs_etm/@tmc_etr0/u -a --snapshot Fewer samples are generated in snapshot mode if duplicate buffers were gathered because buffers with the same offset are now only added once. This gives different, but more correct results and no duplicate data is decoded any more. Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Al Grant <al.grant@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Branislav Rankov <branislav.rankov@arm.com> Cc: Denis Nikitin <denik@chromium.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20210624164303.28632-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Please register or sign in to comment