Commit 98123866 authored by Benjamin Coddington's avatar Benjamin Coddington Committed by Jakub Kicinski
Browse files

Treewide: Stop corrupting socket's task_frag



Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
GFP_NOIO flag on sk_allocation which the networking system uses to decide
when it is safe to use current->task_frag.  The results of this are
unexpected corruption in task_frag when SUNRPC is involved in memory
reclaim.

The corruption can be seen in crashes, but the root cause is often
difficult to ascertain as a crashing machine's stack trace will have no
evidence of being near NFS or SUNRPC code.  I believe this problem to
be much more pervasive than reports to the community may indicate.

Fix this by having kernel users of sockets that may corrupt task_frag due
to reclaim set sk_use_task_frag = false.  Preemptively correcting this
situation for users that still set sk_allocation allows them to convert to
memalloc_nofs_save/restore without the same unexpected corruptions that are
sure to follow, unlikely to show up in testing, and difficult to bisect.

CC: Philipp Reisner <philipp.reisner@linbit.com>
CC: Lars Ellenberg <lars.ellenberg@linbit.com>
CC: "Christoph Böhmwalder" <christoph.boehmwalder@linbit.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Josef Bacik <josef@toxicpanda.com>
CC: Keith Busch <kbusch@kernel.org>
CC: Christoph Hellwig <hch@lst.de>
CC: Sagi Grimberg <sagi@grimberg.me>
CC: Lee Duncan <lduncan@suse.com>
CC: Chris Leech <cleech@redhat.com>
CC: Mike Christie <michael.christie@oracle.com>
CC: "James E.J. Bottomley" <jejb@linux.ibm.com>
CC: "Martin K. Petersen" <martin.petersen@oracle.com>
CC: Valentina Manea <valentina.manea.m@gmail.com>
CC: Shuah Khan <shuah@kernel.org>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: David Howells <dhowells@redhat.com>
CC: Marc Dionne <marc.dionne@auristor.com>
CC: Steve French <sfrench@samba.org>
CC: Christine Caulfield <ccaulfie@redhat.com>
CC: David Teigland <teigland@redhat.com>
CC: Mark Fasheh <mark@fasheh.com>
CC: Joel Becker <jlbec@evilplan.org>
CC: Joseph Qi <joseph.qi@linux.alibaba.com>
CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Latchesar Ionkov <lucho@ionkov.net>
CC: Dominique Martinet <asmadeus@codewreck.org>
CC: Ilya Dryomov <idryomov@gmail.com>
CC: Xiubo Li <xiubli@redhat.com>
CC: Chuck Lever <chuck.lever@oracle.com>
CC: Jeff Layton <jlayton@kernel.org>
CC: Trond Myklebust <trond.myklebust@hammerspace.com>
CC: Anna Schumaker <anna@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>

Suggested-by: default avatarGuillaume Nault <gnault@redhat.com>
Signed-off-by: default avatarBenjamin Coddington <bcodding@redhat.com>
Reviewed-by: default avatarGuillaume Nault <gnault@redhat.com>
Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
parent fb87bd47
Loading
Loading
Loading
Loading
+3 −0
Original line number Diff line number Diff line
@@ -1030,6 +1030,9 @@ static int conn_connect(struct drbd_connection *connection)
	sock.socket->sk->sk_allocation = GFP_NOIO;
	msock.socket->sk->sk_allocation = GFP_NOIO;

	sock.socket->sk->sk_use_task_frag = false;
	msock.socket->sk->sk_use_task_frag = false;

	sock.socket->sk->sk_priority = TC_PRIO_INTERACTIVE_BULK;
	msock.socket->sk->sk_priority = TC_PRIO_INTERACTIVE;

+1 −0
Original line number Diff line number Diff line
@@ -512,6 +512,7 @@ static int sock_xmit(struct nbd_device *nbd, int index, int send,
	noreclaim_flag = memalloc_noreclaim_save();
	do {
		sock->sk->sk_allocation = GFP_NOIO | __GFP_MEMALLOC;
		sock->sk->sk_use_task_frag = false;
		msg.msg_name = NULL;
		msg.msg_namelen = 0;
		msg.msg_control = NULL;
+1 −0
Original line number Diff line number Diff line
@@ -1537,6 +1537,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid)
	queue->sock->sk->sk_rcvtimeo = 10 * HZ;

	queue->sock->sk->sk_allocation = GFP_ATOMIC;
	queue->sock->sk->sk_use_task_frag = false;
	nvme_tcp_set_queue_io_cpu(queue);
	queue->request = NULL;
	queue->data_remaining = 0;
+1 −0
Original line number Diff line number Diff line
@@ -738,6 +738,7 @@ iscsi_sw_tcp_conn_bind(struct iscsi_cls_session *cls_session,
	sk->sk_reuse = SK_CAN_REUSE;
	sk->sk_sndtimeo = 15 * HZ; /* FIXME: make it configurable */
	sk->sk_allocation = GFP_ATOMIC;
	sk->sk_use_task_frag = false;
	sk_set_memalloc(sk);
	sock_no_linger(sk);

+1 −0
Original line number Diff line number Diff line
@@ -315,6 +315,7 @@ int usbip_recv(struct socket *sock, void *buf, int size)

	do {
		sock->sk->sk_allocation = GFP_NOIO;
		sock->sk->sk_use_task_frag = false;

		result = sock_recvmsg(sock, &msg, MSG_WAITALL);
		if (result <= 0)
Loading