powerpc/pseries: Prevent free CPU ids being reused on another node
When a CPU is hot added, the CPU ids are taken from the available mask from the lower possible set. If that set of values was previously used for a CPU attached to a different node, it appears to an application as if these CPUs have migrated from one node to another node which is not expected. To prevent this, it is needed to record the CPU ids used for each node and to not reuse them on another node. However, to prevent CPU hot plug to fail, in the case the CPU ids is starved on a node, the capability to reuse other nodes’ free CPU ids is kept. A warning is displayed in such a case to warn the user. A new CPU bit mask (node_recorded_ids_map) is introduced for each possible node. It is populated with the CPU onlined at boot time, and then when a CPU is hot plugged to a node. The bits in that mask remain when the CPU is hot unplugged, to remind this CPU ids have been used for this node. If no id set was found, a retry is made without removing the ids used on the other nodes to try reusing them. This is the way ids have been allocated prior to this patch. The effect of this patch can be seen by removing and adding CPUs using the Qemu monitor. In the following case, the first CPU from the node 2 is removed, then the first one from the node 1 is removed too. Later, the first CPU of the node 2 is added back. Without that patch, the kernel will number these CPUs using the first CPU ids available which are the ones freed when removing the second CPU of the node 0. This leads to the CPU ids 16-23 to move from the node 1 to the node 2. With the patch applied, the CPU ids 32-39 are used since they are the lowest free ones which have not been used on another node. At boot time: [root@vm40 ~]# numactl -H | grep cpus node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 Vanilla kernel, after the CPU hot unplug/plug operations: [root@vm40 ~]# numactl -H | grep cpus node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 node 1 cpus: 24 25 26 27 28 29 30 31 node 2 cpus: 16 17 18 19 20 21 22 23 40 41 42 43 44 45 46 47 Patched kernel, after the CPU hot unplug/plug operations: [root@vm40 ~]# numactl -H | grep cpus node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 node 1 cpus: 24 25 26 27 28 29 30 31 node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210429174908.16613-1-ldufour@linux.ibm.com
Please register or sign in to comment