Commit b534dc46, authored by Willem de Bruijn, committed by Jakub Kicinski

net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP



Add an option to initialize SOF_TIMESTAMPING_OPT_ID for TCP sockets
from write_seq instead of snd_una.

This should have been the behavior from the start. Because processes
may now exist that rely on the established behavior, do not change
behavior of the existing option, but add the right behavior with a new
flag. It is encouraged to always set SOF_TIMESTAMPING_OPT_ID_TCP on
stream sockets along with the existing SOF_TIMESTAMPING_OPT_ID.
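
As an illustration of that recommendation, here is a minimal user-space sketch. The helper names and the EXAMPLE_OPT_ID_TCP macro are hypothetical; the macro only mirrors the value (1 << 16) from this commit's UAPI change so that the sketch also builds against older headers. The fallback relies on the mask check in sock_set_timestamping(), which rejects unknown bits with EINVAL. The choice of software TX timestamps is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/* Hypothetical local name; the value is taken from this commit's enum. */
#define EXAMPLE_OPT_ID_TCP (1U << 16)

/* so_timestamping flags are passed as an int, so bit 16 needs 32 bits. */
static_assert(sizeof(unsigned int) * 8 >= 32, "flag bit 16 needs a 32-bit int");

/* Build a flag word: software TX timestamps tagged with an ID. */
static unsigned int example_tx_id_flags(int with_opt_id_tcp)
{
	unsigned int flags = SOF_TIMESTAMPING_TX_SOFTWARE |
			     SOF_TIMESTAMPING_SOFTWARE |
			     SOF_TIMESTAMPING_OPT_ID;

	if (with_opt_id_tcp)
		flags |= EXAMPLE_OPT_ID_TCP;
	return flags;
}

/*
 * Returns 1 if OPT_ID_TCP was accepted, 0 if only the existing option
 * could be applied, and -1 on error (errno as set by setsockopt()).
 */
static int example_enable_tx_ids(int fd)
{
	unsigned int flags = example_tx_id_flags(1);

	if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
		       &flags, sizeof(flags)) == 0)
		return 1;
	if (errno != EINVAL)
		return -1;

	/* Kernels without this commit reject the unknown bit with EINVAL. */
	flags = example_tx_id_flags(0);
	if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
		       &flags, sizeof(flags)) == 0)
		return 0;
	return -1;
}
```

A return value of 0 means the counter was initialized from snd_una, so the caller cannot assume that IDs start at zero if data was already in flight.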

Intuitively the contract is that the counter is zero after the
setsockopt, so that the next write N results in a notification for
the last byte N - 1.

On idle sockets snd_una == write_seq and this holds for both. But on
sockets with data in transmission, snd_una records the unacked offset
in the stream. This depends on the ACK response from the peer. A
process cannot learn this in a race-free manner (ioctl SIOCOUTQ is one
racy approach).

write_seq records the offset at the last byte written by the process.
This is a better starting point. It matches the intuitive contract in
all circumstances, unaffected by external behavior.
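
The effect of the two starting points can be sketched with a small model. This is not kernel code: the struct and function names are hypothetical, and the model assumes the reported ID is the sequence number of the last byte of a write minus the base recorded at setsockopt time, in u32 arithmetic that wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the two counters discussed above. */
struct example_tcp_state {
	uint32_t snd_una;   /* first byte not yet acknowledged by the peer */
	uint32_t write_seq; /* end of the data written by the process */
};

/* Base recorded when SOF_TIMESTAMPING_OPT_ID is enabled. */
static uint32_t example_id_base(const struct example_tcp_state *tp,
				int opt_id_tcp)
{
	return opt_id_tcp ? tp->write_seq : tp->snd_una;
}

/*
 * ID reported for the last byte of the next write of n bytes, assuming
 * the option is enabled immediately before that write.
 */
static uint32_t example_first_write_id(const struct example_tcp_state *tp,
				       int opt_id_tcp, uint32_t n)
{
	uint32_t last_byte_seq;

	assert(n > 0);
	last_byte_seq = tp->write_seq + n - 1;
	return last_byte_seq - example_id_base(tp, opt_id_tcp);
}
```

In this model, an idle socket (snd_una == write_seq) yields N - 1 either way. With 500 unacknowledged bytes outstanding, a 100-byte write yields 99 with the new flag and 599 without it.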

The new timestamp flag necessitates increasing sk_tsflags to 32 bits.
Move the field in struct sock to avoid growing the socket (for some
common CONFIG variants). The UAPI interface so_timestamping.flags is
already int, so 32 bits wide.
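
To see why a 16-bit field no longer suffices, consider this small sketch. The macro and function names are hypothetical; the bit positions and the mask construction follow the UAPI enum in this commit.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_BIND_PHC   (1U << 15) /* highest flag before this commit */
#define EXAMPLE_OPT_ID_TCP (1U << 16) /* flag added by this commit */

/* The new flag lies above the range of a 16-bit field. */
static_assert(EXAMPLE_OPT_ID_TCP > UINT16_MAX, "new flag exceeds u16");

/* Same construction as SOF_TIMESTAMPING_MASK: all bits up to 'last'. */
static uint32_t example_mask(uint32_t last)
{
	return (last - 1) | last;
}

/* Models storing user-supplied flags in the old 16-bit field. */
static uint16_t example_store_u16(uint32_t flags)
{
	return (uint16_t)flags; /* bit 16 and above are silently dropped */
}
```

With SOF_TIMESTAMPING_BIND_PHC as the last flag the mask is 0xFFFF, which fits in 16 bits. With the new flag it becomes 0x1FFFF, and a u16 store would lose exactly the bit this commit adds.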

Reported-by: Sotirios Delimanolis <sotodel@meta.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20221207143701.29861-1-willemdebruijn.kernel@gmail.com


Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parent ecd6df3c
+31 −1
@@ -179,7 +179,8 @@ SOF_TIMESTAMPING_OPT_ID:
  identifier and returns that along with the timestamp. The identifier
  is derived from a per-socket u32 counter (that wraps). For datagram
  sockets, the counter increments with each sent packet. For stream
-  sockets, it increments with every byte.
+  sockets, it increments with every byte. For stream sockets, also set
+  SOF_TIMESTAMPING_OPT_ID_TCP, see the section below.

  The counter starts at zero. It is initialized the first time that
  the socket option is enabled. It is reset each time the option is
@@ -192,6 +193,35 @@ SOF_TIMESTAMPING_OPT_ID:
  among all possibly concurrently outstanding timestamp requests for
  that socket.

+SOF_TIMESTAMPING_OPT_ID_TCP:
+  Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
+  timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
+  counter increments for stream sockets, but its starting point is
+  not entirely trivial. This option fixes that.
+
+  For stream sockets, if SOF_TIMESTAMPING_OPT_ID is set, this should
+  always be set too. On datagram sockets the option has no effect.
+
+  A reasonable expectation is that the counter is reset to zero with
+  the system call, so that a subsequent write() of N bytes generates
+  a timestamp with counter N-1. SOF_TIMESTAMPING_OPT_ID_TCP
+  implements this behavior under all conditions.
+
+  SOF_TIMESTAMPING_OPT_ID without modifier often reports the same,
+  especially when the socket option is set when no data is in
+  transmission. If data is being transmitted, it may be off by the
+  length of the output queue (SIOCOUTQ).
+
+  The difference is due to being based on snd_una versus write_seq.
+  snd_una is the offset in the stream acknowledged by the peer. This
+  depends on factors outside of process control, such as network RTT.
+  write_seq is the last byte written by the process. This offset is
+  not affected by external inputs.
+
+  The difference is subtle and unlikely to be noticed when configured
+  at initial socket creation, when no data is queued or sent. But
+  SOF_TIMESTAMPING_OPT_ID_TCP behavior is more robust regardless of
+  when the socket option is set.

SOF_TIMESTAMPING_OPT_CMSG:
  Support recv() cmsg for all timestamped packets. Control messages
+3 −3
@@ -503,10 +503,10 @@ struct sock {
#if BITS_PER_LONG==32
	seqlock_t		sk_stamp_seq;
#endif
-	u16			sk_tsflags;
-	u8			sk_shutdown;
	atomic_t		sk_tskey;
	atomic_t		sk_zckey;
+	u32			sk_tsflags;
+	u8			sk_shutdown;

	u8			sk_clockid;
	u8			sk_txtime_deadline_mode : 1,
@@ -1899,7 +1899,7 @@ static inline void sock_replace_proto(struct sock *sk, struct proto *proto)
struct sockcm_cookie {
	u64 transmit_time;
	u32 mark;
-	u16 tsflags;
+	u32 tsflags;
};

static inline void sockcm_init(struct sockcm_cookie *sockc,
+2 −1
@@ -31,8 +31,9 @@ enum {
	SOF_TIMESTAMPING_OPT_PKTINFO = (1<<13),
	SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14),
	SOF_TIMESTAMPING_BIND_PHC = (1 << 15),
+	SOF_TIMESTAMPING_OPT_ID_TCP = (1 << 16),

-	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_BIND_PHC,
+	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_ID_TCP,
	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
				 SOF_TIMESTAMPING_LAST
};
+8 −1
@@ -901,12 +901,19 @@ int sock_set_timestamping(struct sock *sk, int optname,
	if (val & ~SOF_TIMESTAMPING_MASK)
		return -EINVAL;

+	if (val & SOF_TIMESTAMPING_OPT_ID_TCP &&
+	    !(val & SOF_TIMESTAMPING_OPT_ID))
+		return -EINVAL;
+
	if (val & SOF_TIMESTAMPING_OPT_ID &&
	    !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
		if (sk_is_tcp(sk)) {
			if ((1 << sk->sk_state) &
			    (TCPF_CLOSE | TCPF_LISTEN))
				return -EINVAL;
-			atomic_set(&sk->sk_tskey, tcp_sk(sk)->snd_una);
+			if (val & SOF_TIMESTAMPING_OPT_ID_TCP)
+				atomic_set(&sk->sk_tskey, tcp_sk(sk)->write_seq);
+			else
+				atomic_set(&sk->sk_tskey, tcp_sk(sk)->snd_una);
		} else {
			atomic_set(&sk->sk_tskey, 0);
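
The checks in this hunk can be modeled in user space as follows. This is a simplified sketch with hypothetical names, not the kernel function. It omits the TCP_CLOSE/TCP_LISTEN state check and the datagram branch, and it assumes that the accepted flags are stored in the socket afterwards, a step the hunk above does not show.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define EXAMPLE_OPT_ID     (1U << 7)
#define EXAMPLE_OPT_ID_TCP (1U << 16)
#define EXAMPLE_MASK       ((EXAMPLE_OPT_ID_TCP - 1) | EXAMPLE_OPT_ID_TCP)

static_assert(EXAMPLE_MASK == 0x1FFFFu, "mask covers bits 0 through 16");

/* Hypothetical stand-in for the fields of a TCP socket used here. */
struct example_sock {
	uint32_t tsflags;   /* models sk_tsflags */
	uint32_t tskey;     /* models sk_tskey */
	uint32_t snd_una;
	uint32_t write_seq;
};

static int example_set_timestamping(struct example_sock *sk, uint32_t val)
{
	if (val & ~EXAMPLE_MASK)
		return -EINVAL;

	/* The modifier is only valid together with the option it modifies. */
	if ((val & EXAMPLE_OPT_ID_TCP) && !(val & EXAMPLE_OPT_ID))
		return -EINVAL;

	/* The base is recorded only when OPT_ID goes from off to on. */
	if ((val & EXAMPLE_OPT_ID) && !(sk->tsflags & EXAMPLE_OPT_ID))
		sk->tskey = (val & EXAMPLE_OPT_ID_TCP) ? sk->write_seq
						       : sk->snd_una;

	sk->tsflags = val; /* assumption: not shown in the hunk above */
	return 0;
}
```

In this model, setting the same flags a second time leaves the base untouched, since the counter is only initialized when the option goes from disabled to enabled.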
+1 −0
@@ -417,6 +417,7 @@ const char sof_timestamping_names[][ETH_GSTRING_LEN] = {
	[const_ilog2(SOF_TIMESTAMPING_OPT_PKTINFO)]  = "option-pktinfo",
	[const_ilog2(SOF_TIMESTAMPING_OPT_TX_SWHW)]  = "option-tx-swhw",
	[const_ilog2(SOF_TIMESTAMPING_BIND_PHC)]     = "bind-phc",
+	[const_ilog2(SOF_TIMESTAMPING_OPT_ID_TCP)]   = "option-id-tcp",
};
static_assert(ARRAY_SIZE(sof_timestamping_names) == __SOF_TIMESTAMPING_CNT);