powerpc/64s/radix: Improve TLB flushing for page table freeing
Unmaps that free page tables always flush the entire PID, which is sub-optimal. Provide TLB range flushing with an additional PWC flush that can be use for va range invalidations with PWC flush. Time to munmap N pages of memory including last level page table teardown (after mmap, touch), local invalidate: N 1 2 4 8 16 32 64 vanilla 3.2us 3.3us 3.4us 3.6us 4.1us 5.2us 7.2us patched 1.4us 1.5us 1.7us 1.9us 2.6us 3.7us 6.2us Global invalidate: N 1 2 4 8 16 32 64 vanilla 2.2us 2.3us 2.4us 2.6us 3.2us 4.1us 6.2us patched 2.1us 2.5us 3.4us 5.2us 8.7us 15.7us 6.2us Local invalidates get much better across the board. Global ones have the same issue where multiple tlbies for va flush do get slower than the single tlbie to invalidate the PID. None of this test captures the TLB benefits of avoiding killing everything. Global gets worse, but it is brought in to line with global invalidate for munmap()s that do not free page tables. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Please register or sign in to comment