Commit 55b98837 authored by Paolo Abeni

Merge branch 'vsock-update-tools-and-error-handling'

Arseniy Krasnov says:

====================
vsock: update tools and error handling

Patchset consists of two parts:

1) Kernel patch
One patch from Bobby Eshleman. I took a single patch from Bobby:
https://lore.kernel.org/lkml/d81818b868216c774613dd03641fcfe63cc55a45.1660362668.git.bobby.eshleman@bytedance.com/
and used only the af_vsock.c part, as the VMCI and Hyper-V parts were
rejected.

I used it because handling of big messages for SOCK_SEQPACKET was
broken: ENOMEM was returned instead of EMSGSIZE. In any case, the
current logic, which always replaces any error code returned by the
transport with ENOMEM, looks strange to me (for example, EMSGSIZE was
changed to ENOMEM).
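The effect of the af_vsock.c change can be sketched in user-space C.
This is only an illustration, not the kernel code: `fake_enqueue`,
`send_old` and `send_new` are hypothetical names, and the 64-byte limit
is an arbitrary assumption standing in for a transport's maximum
message size.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed limit, for illustration only. */
#define FAKE_MAX_MSG 64

/* Hypothetical stand-in for a transport enqueue callback: rejects
 * oversized messages with -EMSGSIZE, else returns the bytes accepted.
 */
long fake_enqueue(size_t len)
{
	if (len > FAKE_MAX_MSG)
		return -EMSGSIZE;
	return (long)len;
}

/* Old behaviour: every transport error is reported as -ENOMEM. */
long send_old(size_t len)
{
	long written = fake_enqueue(len);

	if (written < 0)
		return -ENOMEM;
	return written;
}

/* New behaviour: the transport's own error code reaches the caller. */
long send_new(size_t len)
{
	long written = fake_enqueue(len);

	if (written < 0)
		return written;
	return written;
}
```

With these stand-ins, an oversized send yields -ENOMEM from `send_old`
and -EMSGSIZE from `send_new`, which is the distinction the
SOCK_SEQPACKET test relies on.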

2) Tool patches
Since there is work on several significant updates for vsock
(virtio/vsock especially): skbuff, DGRAM, zerocopy rx/tx, I think that
this patchset will be useful.

This patchset updates the vsock tests and tools a little. First of all,
it updates the test suite: two new tests are added. The first is a
reworked message bounds test, which is now more complex. Instead of
sending 1-byte messages with one MSG_EOR bit, it sends messages of
random length (one half of the messages are smaller than the page size,
the other half are bigger) with a random number of MSG_EOR bits set. The
receiver also doesn't know the total number of messages. Message bounds
are checked by calculating a hash sum of the message lengths. The second
test is for SOCK_SEQPACKET: it tries to send a message longer than
allowed. I think both tests will be useful for DGRAM support as well.
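The idea of checking message bounds through a hash of message lengths
can be sketched as follows. This is a minimal sketch, not necessarily
the function used by the test suite: `bounds_hash_update` and
`bounds_hash` are hypothetical names, and the djb2-style mixing
(hash * 33 + value) is an assumption chosen for brevity. Both peers
would compute the hash over what they sent or received and compare the
results.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fold one message's length and MSG_EOR state into a running hash. */
unsigned long bounds_hash_update(unsigned long hash, size_t len, bool eor)
{
	hash = hash * 33 + (unsigned long)len;
	hash = hash * 33 + (eor ? 1UL : 0UL);
	return hash;
}

/* Hash a whole sequence of (length, eor) pairs. */
unsigned long bounds_hash(const size_t *lens, const bool *eors, size_t n)
{
	unsigned long hash = 5381;	/* conventional djb2 seed */
	size_t i;

	assert(lens && eors);
	for (i = 0; i < n; i++)
		hash = bounds_hash_update(hash, lens[i], eors[i]);
	return hash;
}
```

If the receiver sees two messages merged into one, or an MSG_EOR flag
lost, its sequence of pairs differs from the sender's and the hashes are
expected to mismatch, without the receiver needing to know the message
count in advance.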

The third thing this patchset adds is a small utility to test vsock
performance for both rx and tx. I think this util could be as useful as
'iperf'/'uperf', because:
1) It is small compared to 'iperf' or 'uperf', so it is very easy to
   add a new mode or feature to it (especially a vsock-specific one).
2) It allows setting the SO_RCVLOWAT and SO_VM_SOCKETS_BUFFER_SIZE
   options. Overall throughput depends on both parameters.
3) It is located in the kernel source tree, so it can be updated by
   the same patchset that changes the related kernel functionality in
   vsock.
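For reference, the two socket options named in point 2 can be set as
sketched below. This is a sketch under assumptions, not the code of
'vsock_perf': `vsock_tune` is a hypothetical helper name, and `fd` is
assumed to be an already-created AF_VSOCK socket.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Apply both tunables to an AF_VSOCK socket.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt()).
 */
int vsock_tune(int fd, unsigned long long buf_size, int rcvlowat)
{
	assert(rcvlowat > 0);

	/* Buffer size is a vsock-level option taking a 64-bit value. */
	if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE,
		       &buf_size, sizeof(buf_size)))
		return -1;

	/* SO_RCVLOWAT is a generic socket-level option taking an int. */
	if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
		       &rcvlowat, sizeof(rcvlowat)))
		return -1;

	return 0;
}
```

The kernel also exposes SO_VM_SOCKETS_BUFFER_MAX_SIZE, which may need
to be raised first when a large buffer size is requested.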

I used this util very often to check the performance of my rx zerocopy
support (this tool has rx zerocopy support, but not in this patchset).

Here is a comparison of the outputs of three utils: 'iperf', 'uperf'
and 'vsock_perf'. In all three cases the sender was on the guest side.
The rx and tx buffers were always 64 KB (because by default 'uperf'
uses 8K).

iperf:

   [ ID] Interval           Transfer     Bitrate
   [  5]   0.00-10.00  sec  12.8 GBytes  11.0 Gbits/sec sender
   [  5]   0.00-10.00  sec  12.8 GBytes  11.0 Gbits/sec receiver

uperf:

   Total     16.27GB /  11.36(s) =    12.30Gb/s       23455op/s

vsock_perf:

   tx performance: 12.301529 Gbits/s
   rx performance: 12.288011 Gbits/s

Results are almost the same in all three cases.

The patchset was rebased and tested on the skbuff v9 patch from Bobby Eshleman:
https://lore.kernel.org/netdev/20230107002937.899605-1-bobby.eshleman@bytedance.com/

====================

Link: https://lore.kernel.org/r/67cd2d0a-1c58-baac-7b39-b8d4ea44f719@sberdevices.ru


Signed-off-by: Paolo Abeni <pabeni@redhat.com>
parents d4f12a82 8abbffd2
+2 −1
@@ -1861,8 +1861,9 @@ static int vsock_connectible_sendmsg(struct socket *sock, struct msghdr *msg,
			written = transport->stream_enqueue(vsk,
					msg, len - total_written);
		}

		if (written < 0) {
-			err = -ENOMEM;
+			err = written;
			goto out_err;
		}

+2 −1
# SPDX-License-Identifier: GPL-2.0-only
-all: test
+all: test vsock_perf
test: vsock_test vsock_diag_test
vsock_test: vsock_test.o timeout.o control.o util.o
vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
+vsock_perf: vsock_perf.o

CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -D_GNU_SOURCE
.PHONY: all test clean
+34 −0
@@ -35,3 +35,37 @@ Invoke test binaries in both directions as follows:
                       --control-port=$GUEST_IP \
                       --control-port=1234 \
                       --peer-cid=3

vsock_perf utility
-------------------
'vsock_perf' is a simple tool to measure vsock performance. It works in
sender/receiver modes: the sender connects to the peer at the specified
port and starts data transmission to the receiver. After data processing
is done, it prints several metrics (see below).

Usage:
# run as sender
# connect to CID 2, port 1234, send 1G of data, tx buf size is 1M
./vsock_perf --sender 2 --port 1234 --bytes 1G --buf-size 1M

Output:
tx performance: A Gbits/s

Output explanation:
A is calculated as "number of bits to send" / "time in tx loop"

# run as receiver
# listen on port 1234, rx buf size is 1M, socket buf size is 1G, SO_RCVLOWAT is 64K
./vsock_perf --port 1234 --buf-size 1M --vsk-size 1G --rcvlowat 64K

Output:
rx performance: A Gbits/s
total in 'read()': B sec
POLLIN wakeups: C
average in 'read()': D ns

Output explanation:
A is calculated as "number of received bits" / "time in rx loop".
B is the time spent in the 'read()' system call (excluding 'poll()').
C is the number of 'poll()' wakeups with the POLLIN bit set.
D is B / C, i.e. the average amount of time spent in a single 'read()'.
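
The A and D formulas above can be written out as a short sketch. The
helper names `gbits_per_sec` and `avg_read_ns` are hypothetical, and the
sketch assumes both times are tracked in nanoseconds; it is not taken
from the 'vsock_perf' source.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000.0

/* A: "number of bits" / "time in loop", scaled to Gbits/s. */
double gbits_per_sec(uint64_t bytes, uint64_t loop_ns)
{
	assert(loop_ns > 0);
	return ((double)bytes * 8.0 / 1e9) /
	       ((double)loop_ns / NSEC_PER_SEC);
}

/* D: total time in read() divided by the number of POLLIN wakeups.
 * Returns 0 when there were no wakeups, to avoid dividing by zero.
 */
uint64_t avg_read_ns(uint64_t read_ns_total, uint64_t pollin_wakeups)
{
	return pollin_wakeups ? read_ns_total / pollin_wakeups : 0;
}
```

For example, 10^9 bytes moved in a loop lasting 10^9 ns corresponds to
8 Gbits/s.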
+28 −0
@@ -141,6 +141,34 @@ void control_writeln(const char *str)
	timeout_end();
}

void control_writeulong(unsigned long value)
{
	char str[32];

	if (snprintf(str, sizeof(str), "%lu", value) >= sizeof(str)) {
		perror("snprintf");
		exit(EXIT_FAILURE);
	}

	control_writeln(str);
}

unsigned long control_readulong(void)
{
	unsigned long value;
	char *str;

	str = control_readln();

	if (!str)
		exit(EXIT_FAILURE);

	value = strtoul(str, NULL, 10);
	free(str);

	return value;
}

/* Return the next line from the control socket (without the trailing newline).
 *
 * The program terminates if a timeout occurs.
+2 −0
@@ -9,7 +9,9 @@ void control_init(const char *control_host, const char *control_port,
void control_cleanup(void);
void control_writeln(const char *str);
char *control_readln(void);
unsigned long control_readulong(void);
void control_expectln(const char *str);
bool control_cmpln(char *line, const char *str, bool fail);
void control_writeulong(unsigned long value);

#endif /* CONTROL_H */