hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9FGRE

--------------------------------

Add long jump support to fentry, so that dynamically allocated trampolines
such as the bpf trampoline can be called from fentry directly. The addresses
of these trampolines may be outside the range that a single bl instruction
can reach.

The scheme used here is basically the same as commit b2ad54e1 ("bpf, arm64:
Implement bpf_arch_text_poke() for arm64").

1. At compile time, we use -fpatchable-function-entry=7,5 to insert 5 NOPs
   before function entry and 2 NOPs after function entry:

        NOP
        NOP
        NOP
        NOP
        NOP
   func:
        BTI C   // if BTI
        NOP
        NOP

   Five NOPs are inserted before the function entry because 2 NOPs are
   patched to the LDR and BR instructions, 2 NOPs are used to store the
   destination jump address, and 1 NOP is used to adjust alignment so that
   the destination jump address is stored in 8-byte aligned memory, which
   is required by atomic store and load.

2. When there is no trampoline attached, the callsite is patched to:

        NOP     // extra NOP if func is 8-byte aligned
   literal:
        .quad ftrace_dummy_tramp
        NOP     // extra NOP if func is NOT 8-byte aligned
   literal_call:
        LDR X16, literal
        BR X16
   func:
        BTI C   // if BTI
        MOV X9, LR
        NOP

3. When a long jump trampoline is attached, the callsite is patched to:

        NOP     // extra NOP if func is 8-byte aligned
   literal:
        .quad <long-jump-trampoline>
        NOP     // extra NOP if func is NOT 8-byte aligned
   literal_call:
        LDR X16, literal
        BR X16
   func:
        BTI C   // if BTI
        MOV X9, LR
        BL literal_call

4. When a short jump trampoline is attached, the callsite is patched to:

        NOP     // extra NOP if func is 8-byte aligned
   literal:
        .quad ftrace_dummy_tramp
        NOP     // extra NOP if func is NOT 8-byte aligned
   literal_call:
        LDR X16, literal
        BR X16
   func:
        BTI C   // if BTI
        MOV X9, LR
        BL <short-jump-trampoline>

Note that there is always a valid jump address in literal, either the custom
trampoline address or the dummy trampoline address, which ensures that we
never jump from the callsite to an unknown place.

Also note that only the patching of the callsite is guaranteed to be atomic
and safe. Whether the custom trampoline can be freed must be checked by the
trampoline user. For example, bpf uses a refcount and task-based RCU to
ensure that a bpf trampoline can be freed safely.

In my environment, before this patch, 2 NOPs are inserted at function entry
and the generated vmlinux size is 463,649,280 bytes. After this patch, the
vmlinux size is 465,069,368 bytes, an increase of 1,420,088 bytes, about
0.3%. In vmlinux, there are 14,376 8-byte aligned functions and 41,847
unaligned functions. For each aligned function, one of the five NOPs before
the function entry is unnecessary, wasting 57,504 bytes (14,376 x 4) in total.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
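
As an illustration of the layout described above, the following is a minimal sketch of how the literal and literal_call addresses could be derived from a function's entry address. It assumes 4-byte AArch64 instructions and the 5-NOP pre-entry area from the message. The helper names (literal_addr, literal_call_addr, wasted_bytes) are hypothetical and are not taken from the patch itself.

```c
#include <assert.h>
#include <stdint.h>

#define INSN_SIZE 4u /* every AArch64 instruction is 4 bytes */

/*
 * Hypothetical helper: address of "LDR X16, literal".
 * LDR and BR occupy the two instruction slots right before func.
 */
static uint64_t literal_call_addr(uint64_t func)
{
	assert(func % INSN_SIZE == 0);
	return func - 2 * INSN_SIZE;
}

/*
 * Hypothetical helper: address of the 8-byte literal.
 * The pre-entry area starts at func - 20. Rounding that address up to
 * an 8-byte boundary gives func - 16 for an 8-byte aligned func (the
 * extra NOP comes first) and func - 20 otherwise (the extra NOP sits
 * between the literal and literal_call).
 */
static uint64_t literal_addr(uint64_t func)
{
	uint64_t first_nop;

	assert(func % INSN_SIZE == 0);
	first_nop = func - 5 * INSN_SIZE;
	return (first_nop + 7) & ~(uint64_t)7;
}

/* One 4-byte NOP is unnecessary for each 8-byte aligned function. */
static uint64_t wasted_bytes(uint64_t aligned_funcs)
{
	return aligned_funcs * INSN_SIZE;
}
```

In both cases the literal ends before literal_call begins, so the 8-byte address never overlaps the LDR/BR pair. Calling wasted_bytes(14376) reproduces the 57,504 bytes quoted in the message.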