Unverified commit bad6d571, authored by openeuler-ci-bot, committed by Gitee

!11432 v2 mm/block: add bdi sysfs knobs

Merge Pull Request from: @ci-robot 
 
PR sync from: Yifan Qiao <qiaoyifan4@huawei.com>
https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/E74UXPOBLHDTSIHSMW7Q2Y4QYEH7GN55/ 
At Meta, network block devices (nbd) are used to implement remote block
storage. In testing and in production it has been observed that
these network block devices can consume a huge portion of the dirty
writeback cache and writeback can take a considerable time.

To be able to give stricter limits, I'm proposing the following changes:

Introduce strictlimit knob
Currently the max_ratio knob exists to limit the dirty_memory. However
this knob only applies once (dirty_ratio + dirty_background_ratio) / 2
has been reached.
With the BDI_CAP_STRICTLIMIT flag, the max_ratio can be applied without
reaching that limit. This change exposes that knob.

This knob can also be useful for NFS, fuse filesystems and USB devices.
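
To make the threshold above concrete, here is a minimal userspace sketch of when max_ratio is consulted. It is not the kernel code: the helper names freerun_ceiling() and max_ratio_applies() are hypothetical, the thresholds are page counts assumed to be derived from dirty_ratio and dirty_background_ratio, and the strict-limit branch is simplified to "always checked".

```c
#include <assert.h>

/*
 * Illustrative userspace model, not the kernel implementation.
 * All thresholds are counted in pages.
 */

/* Midpoint of the dirty and background thresholds (hypothetical helper). */
static unsigned long freerun_ceiling(unsigned long thresh,
				     unsigned long bg_thresh)
{
	assert(bg_thresh <= thresh);
	return (thresh + bg_thresh) / 2;
}

/*
 * Returns 1 when the per-device max_ratio would be consulted for the
 * given number of globally dirty pages, 0 otherwise.
 */
static int max_ratio_applies(unsigned long dirty, unsigned long thresh,
			     unsigned long bg_thresh, int strict_limit)
{
	if (strict_limit)
		return 1;	/* simplified: checked regardless of global state */
	return dirty > freerun_ceiling(thresh, bg_thresh);
}
```

With thresh = 2000 and bg_thresh = 1000 pages, a system with 1200 dirty pages is below the midpoint of 1500, so max_ratio is only consulted when strict_limit is set.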

Use part of 1000000 for internal calculations
The max_ratio is based on percentage. With current machine sizes a
single percentage step can be very large (1% of 256GB of main memory is
already about 2.5GB). This change uses part of 1000000 instead of
percentages for the internal calculations.
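
The difference in granularity can be checked with a few lines of userspace C. This is only a sketch of the arithmetic: limit_bytes() and PARTS_PER_MILLION are names made up for the illustration, and the 256GB total follows the example in the text rather than the real size of the write-back cache.

```c
#include <assert.h>
#include <stdint.h>

#define PARTS_PER_MILLION 1000000ULL	/* assumed name for the new scale */

/*
 * Bytes covered by `parts` millionths of `total_bytes`.
 * The product fits in 64 bits for totals up to roughly 18 TB.
 */
static uint64_t limit_bytes(uint64_t total_bytes, uint64_t parts)
{
	assert(parts <= PARTS_PER_MILLION);
	return total_bytes * parts / PARTS_PER_MILLION;
}
```

For a total of 256000000000 bytes, limit_bytes(total, 10000) (that is, 1%) gives 2560000000 bytes, while limit_bytes(total, 1) gives 256000 bytes, so the smallest possible step shrinks from about 2.5GB to about 256KB.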

Introduce two new sysfs knobs: min_bytes and max_bytes.
Currently all calculations are based on ratio, but for a user it is often
more convenient to specify a limit in bytes. The new knobs will not
store byte values; instead they will translate the byte value to a
corresponding ratio. As the internal values are now part of 1000000, the
ratio is closer to the specified value. However, the value should be seen
more as an approximation, as it can fluctuate over time.
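
The byte-to-ratio translation and its rounding can be sketched in userspace as follows. This is an approximation of the idea only: the function names, the fixed 4 KiB page size and the explicit dirty_thresh_pages argument are assumptions made for the example, whereas the patch reads the current dirty threshold from the kernel at the time of each call.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  4096ULL	/* assumption: 4 KiB pages */
#define SKETCH_FULL_SCALE 1000000ULL	/* 100 * BDI_RATIO_SCALE in the patch */

/* Convert a byte limit to a ratio in parts per million of the threshold. */
static uint64_t ratio_from_bytes(uint64_t bytes, uint64_t dirty_thresh_pages)
{
	uint64_t pages = bytes / SKETCH_PAGE_SIZE;

	assert(dirty_thresh_pages > 0);
	return pages * SKETCH_FULL_SCALE / dirty_thresh_pages;
}

/* Convert a stored ratio back to bytes for the current threshold. */
static uint64_t bytes_from_ratio(uint64_t ratio, uint64_t dirty_thresh_pages)
{
	return dirty_thresh_pages * SKETCH_PAGE_SIZE * ratio / SKETCH_FULL_SCALE;
}
```

Storing 104857600 bytes (100 MiB) against a threshold of 3000000 pages yields a ratio of 8533, which reads back as 104853504 bytes. If the threshold later changes, the same stored ratio maps to a different byte value, which is why the knob should be read as an approximation.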

Introduce two new sysfs knobs: min_ratio_fine and max_ratio_fine.
The granularity for the existing sysfs bdi knobs min_ratio and max_ratio
is based on percentage values. The new sysfs bdi knobs min_ratio_fine
and max_ratio_fine allow the ratio to be specified as part of 1 million.
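
The relation between the percentage knobs and the _fine knobs is a fixed scale factor. A small userspace sketch, with SKETCH_RATIO_SCALE standing in for BDI_RATIO_SCALE from the patch and both helper names invented for the example:

```c
#include <assert.h>

#define SKETCH_RATIO_SCALE 10000u	/* mirrors BDI_RATIO_SCALE in the patch */

/* Value stored internally when a percentage is written to min/max_ratio. */
static unsigned int fine_from_percent(unsigned int percent)
{
	assert(percent <= 100);
	return percent * SKETCH_RATIO_SCALE;
}

/* Value shown by min/max_ratio for a given internal (fine) value. */
static unsigned int percent_from_fine(unsigned int fine)
{
	assert(fine <= 100 * SKETCH_RATIO_SCALE);
	return fine / SKETCH_RATIO_SCALE;	/* integer division truncates */
}
```

One consequence of the truncating division is that a value of 2500 written through max_ratio_fine (0.25%) would read back as 0 through max_ratio, so the _fine knob is the one to read when sub-percent limits are in use.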

Chen Wandun (1):
  mm: rework calculation of bdi_min_ratio in bdi_set_min_ratio

Jingbo Xu (2):
  mm: fix arithmetic for max_prop_frac when setting max_ratio
  mm: fix arithmetic for bdi min_ratio

Stefan Roesch (20):
  mm: add bdi_set_strict_limit() function
  mm: add knob /sys/class/bdi/<bdi>/strict_limit
  mm: document /sys/class/bdi/<bdi>/strict_limit knob
  mm: use part per 1000000 for bdi ratios
  mm: add bdi_get_max_bytes() function
  mm: split off __bdi_set_max_ratio() function
  mm: add bdi_set_max_bytes() function
  mm: add knob /sys/class/bdi/<bdi>/max_bytes
  mm: document /sys/class/bdi/<bdi>/max_bytes knob
  mm: add bdi_get_min_bytes() function
  mm: split off __bdi_set_min_ratio() function
  mm: add bdi_set_min_bytes() function
  mm: add /sys/class/bdi/<bdi>/min_bytes knob
  mm: document /sys/class/bdi/<bdi>/min_bytes knob
  mm: add bdi_set_max_ratio_no_scale() function
  mm: add /sys/class/bdi/<bdi>/max_ratio_fine knob
  mm: document /sys/class/bdi/<bdi>/max_ratio_fine knob
  mm: add bdi_set_min_ratio_no_scale() function
  mm: add /sys/class/bdi/<bdi>/min_ratio_fine knob
  mm: document /sys/class/bdi/<bdi>/min_ratio_fine knob


-- 
2.39.2
 
https://gitee.com/src-openeuler/kernel/issues/IAN96I 
 
Link:https://gitee.com/openeuler/kernel/pulls/11432

 

Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
parents 9a3c16d6 17ef6b20
+48 −0
@@ -39,6 +39,17 @@ min_ratio (read-write)
	percentage of the write-back cache to a particular device.
	For example, this is useful for providing a minimum QoS.

min_ratio_fine (read-write)

	Under normal circumstances each device is given a part of the
	total write-back cache that relates to its current average
	writeout speed in relation to the other devices.

	The 'min_ratio_fine' parameter allows assigning a minimum reserve
	of the write-back cache to a particular device. The value is
	expressed as part of 1 million. For example, this is useful for
	providing a minimum QoS.

max_ratio (read-write)

	Allows limiting a particular device to use not more than the
@@ -48,6 +59,43 @@ max_ratio (read-write)
	mount that is prone to get stuck, or a FUSE mount which cannot
	be trusted to play fair.

max_ratio_fine (read-write)

	Allows limiting a particular device to use not more than the
	given value of the write-back cache.  The value is given as part
	of 1 million. This is useful in situations where we want to avoid
	one device taking all or most of the write-back cache.  For example
	in case of an NFS mount that is prone to get stuck, or a FUSE mount
	which cannot be trusted to play fair.

min_bytes (read-write)

	Under normal circumstances each device is given a part of the
	total write-back cache that relates to its current average
	writeout speed in relation to the other devices.

	The 'min_bytes' parameter allows assigning a minimum reserve
	of the write-back cache to a particular device, expressed in
	bytes. For example, this is useful for providing a minimum QoS.

max_bytes (read-write)

	Allows limiting a particular device to use not more than the
	given 'max_bytes' of the write-back cache.  This is useful in
	situations where we want to avoid one device taking all or
	most of the write-back cache.  For example in case of an NFS
	mount that is prone to get stuck, a FUSE mount which cannot be
	trusted to play fair, or a nbd device.

strict_limit (read-write)

	Forces per-BDI checks for the share of a given device in the write-back
	cache even before the global background dirty limit is reached. This
	is useful in situations where the global limit is much higher than
	affordable for a given relatively slow (or untrusted) device. Turning
	strict_limit on has no visible effect if max_ratio is equal to 100%.

stable_pages_required (read-only)

	If set, the backing device requires that all pages comprising a write
+10 −0
@@ -104,8 +104,18 @@ static inline unsigned long wb_stat_error(void)
#endif
}

/* BDI ratio is expressed as part per 1000000 for finer granularity. */
#define BDI_RATIO_SCALE 10000

u64 bdi_get_min_bytes(struct backing_dev_info *bdi);
u64 bdi_get_max_bytes(struct backing_dev_info *bdi);
int bdi_set_min_ratio(struct backing_dev_info *bdi, unsigned int min_ratio);
int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio);
int bdi_set_min_ratio_no_scale(struct backing_dev_info *bdi, unsigned int min_ratio);
int bdi_set_max_ratio_no_scale(struct backing_dev_info *bdi, unsigned int max_ratio);
int bdi_set_min_bytes(struct backing_dev_info *bdi, u64 min_bytes);
int bdi_set_max_bytes(struct backing_dev_info *bdi, u64 max_bytes);
int bdi_set_strict_limit(struct backing_dev_info *bdi, unsigned int strict_limit);

/*
 * Flags in backing_dev_info::capability
+130 −3
@@ -176,7 +176,26 @@ static ssize_t min_ratio_store(struct device *dev,

	return ret;
}
BDI_SHOW(min_ratio, bdi->min_ratio)
BDI_SHOW(min_ratio, bdi->min_ratio / BDI_RATIO_SCALE)

static ssize_t min_ratio_fine_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t count)
{
	struct backing_dev_info *bdi = dev_get_drvdata(dev);
	unsigned int ratio;
	ssize_t ret;

	ret = kstrtouint(buf, 10, &ratio);
	if (ret < 0)
		return ret;

	ret = bdi_set_min_ratio_no_scale(bdi, ratio);
	if (!ret)
		ret = count;

	return ret;
}
BDI_SHOW(min_ratio_fine, bdi->min_ratio)

static ssize_t max_ratio_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t count)
@@ -195,7 +214,82 @@ static ssize_t max_ratio_store(struct device *dev,

	return ret;
}
BDI_SHOW(max_ratio, bdi->max_ratio)
BDI_SHOW(max_ratio, bdi->max_ratio / BDI_RATIO_SCALE)

static ssize_t max_ratio_fine_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t count)
{
	struct backing_dev_info *bdi = dev_get_drvdata(dev);
	unsigned int ratio;
	ssize_t ret;

	ret = kstrtouint(buf, 10, &ratio);
	if (ret < 0)
		return ret;

	ret = bdi_set_max_ratio_no_scale(bdi, ratio);
	if (!ret)
		ret = count;

	return ret;
}
BDI_SHOW(max_ratio_fine, bdi->max_ratio)

static ssize_t min_bytes_show(struct device *dev,
			      struct device_attribute *attr,
			      char *buf)
{
	struct backing_dev_info *bdi = dev_get_drvdata(dev);

	return sysfs_emit(buf, "%llu\n", bdi_get_min_bytes(bdi));
}

static ssize_t min_bytes_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t count)
{
	struct backing_dev_info *bdi = dev_get_drvdata(dev);
	u64 bytes;
	ssize_t ret;

	ret = kstrtoull(buf, 10, &bytes);
	if (ret < 0)
		return ret;

	ret = bdi_set_min_bytes(bdi, bytes);
	if (!ret)
		ret = count;

	return ret;
}
static DEVICE_ATTR_RW(min_bytes);

static ssize_t max_bytes_show(struct device *dev,
			      struct device_attribute *attr,
			      char *buf)
{
	struct backing_dev_info *bdi = dev_get_drvdata(dev);

	return sysfs_emit(buf, "%llu\n", bdi_get_max_bytes(bdi));
}

static ssize_t max_bytes_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t count)
{
	struct backing_dev_info *bdi = dev_get_drvdata(dev);
	u64 bytes;
	ssize_t ret;

	ret = kstrtoull(buf, 10, &bytes);
	if (ret < 0)
		return ret;

	ret = bdi_set_max_bytes(bdi, bytes);
	if (!ret)
		ret = count;

	return ret;
}
static DEVICE_ATTR_RW(max_bytes);

static ssize_t stable_pages_required_show(struct device *dev,
					  struct device_attribute *attr,
@@ -207,11 +301,44 @@ static ssize_t stable_pages_required_show(struct device *dev,
}
static DEVICE_ATTR_RO(stable_pages_required);

static ssize_t strict_limit_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t count)
{
	struct backing_dev_info *bdi = dev_get_drvdata(dev);
	unsigned int strict_limit;
	ssize_t ret;

	ret = kstrtouint(buf, 10, &strict_limit);
	if (ret < 0)
		return ret;

	ret = bdi_set_strict_limit(bdi, strict_limit);
	if (!ret)
		ret = count;

	return ret;
}

static ssize_t strict_limit_show(struct device *dev,
		struct device_attribute *attr, char *buf)
{
	struct backing_dev_info *bdi = dev_get_drvdata(dev);

	return sysfs_emit(buf, "%d\n",
			!!(bdi->capabilities & BDI_CAP_STRICTLIMIT));
}
static DEVICE_ATTR_RW(strict_limit);

static struct attribute *bdi_dev_attrs[] = {
	&dev_attr_read_ahead_kb.attr,
	&dev_attr_min_ratio.attr,
	&dev_attr_min_ratio_fine.attr,
	&dev_attr_max_ratio.attr,
	&dev_attr_max_ratio_fine.attr,
	&dev_attr_min_bytes.attr,
	&dev_attr_max_bytes.attr,
	&dev_attr_stable_pages_required.attr,
	&dev_attr_strict_limit.attr,
	NULL,
};
ATTRIBUTE_GROUPS(bdi_dev);
@@ -780,7 +907,7 @@ static int bdi_init(struct backing_dev_info *bdi)

	kref_init(&bdi->refcnt);
	bdi->min_ratio = 0;
	bdi->max_ratio = 100;
	bdi->max_ratio = 100 * BDI_RATIO_SCALE;
	bdi->max_prop_frac = FPROP_FRAC_BASE;
	INIT_LIST_HEAD(&bdi->bdi_list);
	INIT_LIST_HEAD(&bdi->wb_list);
+133 −14
@@ -13,6 +13,7 @@
 */

#include <linux/kernel.h>
#include <linux/math64.h>
#include <linux/export.h>
#include <linux/spinlock.h>
#include <linux/fs.h>
@@ -198,7 +199,7 @@ static void wb_min_max_ratio(struct bdi_writeback *wb,
			min *= this_bw;
			min = div64_ul(min, tot_bw);
		}
		if (max < 100) {
		if (max < 100 * BDI_RATIO_SCALE) {
			max *= this_bw;
			max = div64_ul(max, tot_bw);
		}
@@ -683,32 +684,76 @@ void wb_domain_exit(struct wb_domain *dom)
 */
static unsigned int bdi_min_ratio;

int bdi_set_min_ratio(struct backing_dev_info *bdi, unsigned int min_ratio)
static int bdi_check_pages_limit(unsigned long pages)
{
	unsigned long max_dirty_pages = global_dirtyable_memory();

	if (pages > max_dirty_pages)
		return -EINVAL;

	return 0;
}

static unsigned long bdi_ratio_from_pages(unsigned long pages)
{
	unsigned long background_thresh;
	unsigned long dirty_thresh;
	unsigned long ratio;

	global_dirty_limits(&background_thresh, &dirty_thresh);
	ratio = div64_u64(pages * 100ULL * BDI_RATIO_SCALE, dirty_thresh);

	return ratio;
}

static u64 bdi_get_bytes(unsigned int ratio)
{
	unsigned long background_thresh;
	unsigned long dirty_thresh;
	u64 bytes;

	global_dirty_limits(&background_thresh, &dirty_thresh);
	bytes = (dirty_thresh * PAGE_SIZE * ratio) / BDI_RATIO_SCALE / 100;

	return bytes;
}

static int __bdi_set_min_ratio(struct backing_dev_info *bdi, unsigned int min_ratio)
{
	unsigned int delta;
	int ret = 0;

	if (min_ratio > 100 * BDI_RATIO_SCALE)
		return -EINVAL;

	spin_lock_bh(&bdi_lock);
	if (min_ratio > bdi->max_ratio) {
		ret = -EINVAL;
	} else {
		min_ratio -= bdi->min_ratio;
		if (bdi_min_ratio + min_ratio < 100) {
			bdi_min_ratio += min_ratio;
			bdi->min_ratio += min_ratio;
		if (min_ratio < bdi->min_ratio) {
			delta = bdi->min_ratio - min_ratio;
			bdi_min_ratio -= delta;
			bdi->min_ratio = min_ratio;
		} else {
			delta = min_ratio - bdi->min_ratio;
			if (bdi_min_ratio + delta < 100 * BDI_RATIO_SCALE) {
				bdi_min_ratio += delta;
				bdi->min_ratio = min_ratio;
			} else {
				ret = -EINVAL;
			}
		}
	}
	spin_unlock_bh(&bdi_lock);

	return ret;
}

int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned max_ratio)
static int __bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio)
{
	int ret = 0;

	if (max_ratio > 100)
	if (max_ratio > 100 * BDI_RATIO_SCALE)
		return -EINVAL;

	spin_lock_bh(&bdi_lock);
@@ -716,14 +761,88 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned max_ratio)
		ret = -EINVAL;
	} else {
		bdi->max_ratio = max_ratio;
		bdi->max_prop_frac = (FPROP_FRAC_BASE * max_ratio) / 100;
		bdi->max_prop_frac = (FPROP_FRAC_BASE * max_ratio) /
						(100 * BDI_RATIO_SCALE);
	}
	spin_unlock_bh(&bdi_lock);

	return ret;
}

int bdi_set_min_ratio_no_scale(struct backing_dev_info *bdi, unsigned int min_ratio)
{
	return __bdi_set_min_ratio(bdi, min_ratio);
}

int bdi_set_max_ratio_no_scale(struct backing_dev_info *bdi, unsigned int max_ratio)
{
	return __bdi_set_max_ratio(bdi, max_ratio);
}

int bdi_set_min_ratio(struct backing_dev_info *bdi, unsigned int min_ratio)
{
	return __bdi_set_min_ratio(bdi, min_ratio * BDI_RATIO_SCALE);
}

int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio)
{
	return __bdi_set_max_ratio(bdi, max_ratio * BDI_RATIO_SCALE);
}
EXPORT_SYMBOL(bdi_set_max_ratio);

u64 bdi_get_min_bytes(struct backing_dev_info *bdi)
{
	return bdi_get_bytes(bdi->min_ratio);
}

int bdi_set_min_bytes(struct backing_dev_info *bdi, u64 min_bytes)
{
	int ret;
	unsigned long pages = min_bytes >> PAGE_SHIFT;
	unsigned long min_ratio;

	ret = bdi_check_pages_limit(pages);
	if (ret)
		return ret;

	min_ratio = bdi_ratio_from_pages(pages);
	return __bdi_set_min_ratio(bdi, min_ratio);
}

u64 bdi_get_max_bytes(struct backing_dev_info *bdi)
{
	return bdi_get_bytes(bdi->max_ratio);
}

int bdi_set_max_bytes(struct backing_dev_info *bdi, u64 max_bytes)
{
	int ret;
	unsigned long pages = max_bytes >> PAGE_SHIFT;
	unsigned long max_ratio;

	ret = bdi_check_pages_limit(pages);
	if (ret)
		return ret;

	max_ratio = bdi_ratio_from_pages(pages);
	return __bdi_set_max_ratio(bdi, max_ratio);
}

int bdi_set_strict_limit(struct backing_dev_info *bdi, unsigned int strict_limit)
{
	if (strict_limit > 1)
		return -EINVAL;

	spin_lock_bh(&bdi_lock);
	if (strict_limit)
		bdi->capabilities |= BDI_CAP_STRICTLIMIT;
	else
		bdi->capabilities &= ~BDI_CAP_STRICTLIMIT;
	spin_unlock_bh(&bdi_lock);

	return 0;
}

static unsigned long dirty_freerun_ceiling(unsigned long thresh,
					   unsigned long bg_thresh)
{
@@ -786,15 +905,15 @@ static unsigned long __wb_calc_thresh(struct dirty_throttle_control *dtc)
	fprop_fraction_percpu(&dom->completions, dtc->wb_completions,
			      &numerator, &denominator);

	wb_thresh = (thresh * (100 - bdi_min_ratio)) / 100;
	wb_thresh = (thresh * (100 * BDI_RATIO_SCALE - bdi_min_ratio)) / (100 * BDI_RATIO_SCALE);
	wb_thresh *= numerator;
	wb_thresh = div64_ul(wb_thresh, denominator);

	wb_min_max_ratio(dtc->wb, &wb_min_ratio, &wb_max_ratio);

	wb_thresh += (thresh * wb_min_ratio) / 100;
	if (wb_thresh > (thresh * wb_max_ratio) / 100)
		wb_thresh = thresh * wb_max_ratio / 100;
	wb_thresh += (thresh * wb_min_ratio) / (100 * BDI_RATIO_SCALE);
	if (wb_thresh > (thresh * wb_max_ratio) / (100 * BDI_RATIO_SCALE))
		wb_thresh = thresh * wb_max_ratio / (100 * BDI_RATIO_SCALE);

	return wb_thresh;
}