Unverified commit e053d517, authored by openeuler-ci-bot, committed by Gitee

!5854 [OLK-6.6] Make Cluster Scheduling Configurable

Merge Pull Request from: @liujie-248683921 
 
The cluster scheduling domain was introduced in 5.16 to help even out load
between clusters. Within a last-level cache domain there can be multiple
clusters, each with its own resources and multiple CPUs. With cluster
scheduling, contention on cluster resources (e.g. L2 cache) can be
reduced for better performance.

These patches make cluster scheduling configurable at run time and
boot time. When the system is moderately loaded, it is worthwhile to do
the extra load balancing between clusters to reduce contention on
cluster resources (e.g. L2 cache). If the system is fully utilized,
load balancing among clusters is unlikely to reduce contention on
cluster resources, because every cluster is already fully busy.

On a Jacobsville system with 24 Atom cores, where the 4 Atom cores in each
cluster share an L2, we ran the mcf benchmark from a very low load of 1
benchmark copy up to 24 benchmark copies on the 24-CPU system. Throughput
is boosted at medium load, but there is little improvement from cluster
scheduling when the system is fully loaded.

     Improvement over baseline kernel for mcf_r
     copies         run time        base rate
     1              -0.1%           -0.2%
     6              25.1%           25.1%
     12             18.8%           19.0%
     24             0.3%            0.3%
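
One way to read the table: 24 CPUs with 4 CPUs per L2 gives 6 clusters,
so 6 copies can each get an L2 to themselves, while 24 copies fill every
cluster no matter how they are placed. The sketch below is only an
illustrative model of that arithmetic, not scheduler code. The helper
names and the "packed" placement (fill one cluster before using the
next, a worst case for a cluster-unaware scheduler) are assumptions
introduced here and do not come from the patches.

```c
#define NR_CPUS          24	/* Jacobsville system described above */
#define CPUS_PER_CLUSTER 4	/* 4 Atom cores share one L2 */
#define NR_CLUSTERS      (NR_CPUS / CPUS_PER_CLUSTER)

/*
 * Tasks sharing the busiest L2 when tasks are spread evenly across
 * clusters (the placement cluster scheduling aims for).
 */
static int max_per_cluster_spread(int tasks)
{
	return (tasks + NR_CLUSTERS - 1) / NR_CLUSTERS;	/* ceil(tasks / 6) */
}

/*
 * Tasks sharing the busiest L2 in the hypothetical worst case where
 * one cluster is filled completely before the next one is used.
 */
static int max_per_cluster_packed(int tasks)
{
	return tasks < CPUS_PER_CLUSTER ? tasks : CPUS_PER_CLUSTER;
}
```

In this model, 1 and 24 copies give the same busiest-cluster occupancy
for both placements (1 and 4 tasks respectively), while 6 copies give 1
task per L2 when spread against up to 4 when packed, and 12 copies give
2 against 4. That matches the shape of the measurements: the gain is at
medium load.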

If the system is expected to operate close to full utilization, the
system administrator can turn off the cluster feature to reduce
scheduler overhead from load balancing at the cluster level.
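
A minimal userspace sketch of toggling the run-time knob follows. The
series declares the variable sysctl_sched_cluster and a handler, but
this message does not give the procfs location, so the path
/proc/sys/kernel/sched_cluster used here is an assumption. The helper
names are hypothetical as well. Writing the real knob would normally
require root.

```c
#include <stdio.h>

/* ASSUMPTION: procfs path of the knob; not stated in the commit message. */
#define SCHED_CLUSTER_PATH "/proc/sys/kernel/sched_cluster"

/* Read the knob at 'path'. Returns 0 or 1 on success, -1 on any error. */
static int sched_cluster_read(const char *path)
{
	FILE *f = fopen(path, "r");
	int val;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return (val == 0 || val == 1) ? val : -1;
}

/* Write 0 (disable) or 1 (enable) to 'path'. Returns 0 on success, -1 on error. */
static int sched_cluster_write(const char *path, int enable)
{
	FILE *f;
	int ok;

	if (enable != 0 && enable != 1)
		return -1;	/* only 0 and 1 are documented values */
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", enable) > 0;
	if (fclose(f) != 0)	/* a rejected write may only surface on close */
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, sched_cluster_write(SCHED_CLUSTER_PATH, 0) would request
that cluster scheduling be turned off on a kernel that exposes the knob
at that path, and sched_cluster_read(SCHED_CLUSTER_PATH) would report
the current setting.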

Cluster scheduling is disabled by default for x86 hybrid CPUs in the
last patch of this series. For such asymmetric systems, the system
should rely strictly on CPU priority to determine the order
of task scheduling.

https://gitee.com/openeuler/kernel/issues/I9F5WO 
 
Link: https://gitee.com/openeuler/kernel/pulls/5854

 

Reviewed-by: Zucheng Zheng <zhengzucheng@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
parents d69283f1 35c0a227
+4 −0
@@ -5768,6 +5768,10 @@

 	sched_verbose	[KNL] Enables verbose scheduler debug messages.

+	sched_cluster=  Enable or disable cluster scheduling.
+			0 -- disable.
+			1 -- enable.
+
 	schedstats=	[KNL,X86] Enable or disable scheduled statistics.
 			Allowed values are enable and disable. This feature
 			incurs a small amount of overhead in the scheduler
+8 −0
@@ -60,6 +60,7 @@
 #include <linux/stackprotector.h>
 #include <linux/cpuhotplug.h>
 #include <linux/mc146818rtc.h>
+#include <linux/cpuset.h>

 #include <asm/acpi.h>
 #include <asm/cacheinfo.h>
@@ -144,6 +145,13 @@ int arch_update_cpu_topology(void)
 	return retval;
 }

+void arch_rebuild_cpu_topology(void)
+{
+	x86_topology_update = true;
+	rebuild_sched_domains();
+	x86_topology_update = false;
+}
+
 static unsigned int smpboot_warm_reset_vector_count;

 static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip)
+9 −4
@@ -260,16 +260,21 @@ int topology_update_cpu_topology(void)
 	return update_topology;
 }

+void __weak arch_rebuild_cpu_topology(void)
+{
+	update_topology = 1;
+	rebuild_sched_domains();
+	pr_debug("sched_domain hierarchy rebuilt, flags updated\n");
+	update_topology = 0;
+}
+
 /*
  * Updating the sched_domains can't be done directly from cpufreq callbacks
  * due to locking, so queue the work for later.
  */
 static void update_topology_flags_workfn(struct work_struct *work)
 {
-	update_topology = 1;
-	rebuild_sched_domains();
-	pr_debug("sched_domain hierarchy rebuilt, flags updated\n");
-	update_topology = 0;
+	arch_rebuild_cpu_topology();
 }

 static u32 *raw_capacity;
+6 −0
@@ -29,4 +29,10 @@ extern int sysctl_numa_balancing_mode;
 #define sysctl_numa_balancing_mode	0
 #endif

+#ifdef CONFIG_SCHED_CLUSTER
+extern unsigned int sysctl_sched_cluster;
+int sched_cluster_handler(struct ctl_table *table, int write,
+			  void *buffer, size_t *lenp, loff_t *ppos);
+#endif
+
 #endif /* _LINUX_SCHED_SYSCTL_H */
+1 −0
@@ -188,6 +188,7 @@ typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
 typedef int (*sched_domain_flags_f)(void);

 #define SDTL_OVERLAP	0x01
+#define SDTL_SKIP	0x02

 struct sd_data {
 	struct sched_domain *__percpu *sd;