Commit f1260b3a authored by Andrii Nakryiko's avatar Andrii Nakryiko Committed by Pu Lehui
Browse files

bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic

stable inclusion
from stable-v6.6.80
commit fc01ba097319ae2033f118da3075bd783f4a08e9
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBSW0M
CVE: CVE-2025-21853

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fc01ba097319



--------------------------------

[ Upstream commit 98671a0fd1f14e4a518ee06b19037c20014900eb ]

For all BPF maps we ensure that VM_MAYWRITE is cleared when
memory-mapping BPF map contents as initially read-only VMA. This is
because in some cases BPF verifier relies on the underlying data to not
be modified afterwards by user space, so once something is mapped
read-only, it shouldn't be re-mmap'ed as read-write.

As such, it's not necessary to check VM_MAYWRITE in bpf_map_mmap() and
map->ops->map_mmap() callbacks: VM_WRITE should be consistently set for
read-write mappings, and if VM_WRITE is not set, there is no way for
user space to upgrade read-only mapping to read-write one.

This patch cleans up this VM_WRITE vs VM_MAYWRITE handling within
bpf_map_mmap(), which is an entry point for any BPF map mmap()-ing
logic. We also drop unnecessary sanitization of VM_MAYWRITE in BPF
ringbuf's map_mmap() callback implementation, as it is already performed
by common code in bpf_map_mmap().

Note, though, that in bpf_map_mmap_{open,close}() callbacks we can't
drop VM_MAYWRITE use, because it's possible (and is outside of
subsystem's control) to have initially read-write memory mapping, which
is subsequently dropped to read-only by user space through mprotect().
In such case, from BPF verifier POV it's read-write data throughout the
lifetime of BPF map, and is counted as "active writer".

But its VMAs will start out as VM_WRITE|VM_MAYWRITE, then mprotect() can
change it to just VM_MAYWRITE (and no VM_WRITE), so when its finally
munmap()'ed and bpf_map_mmap_close() is called, vm_flags will be just
VM_MAYWRITE, but we still need to decrement active writer count with
bpf_map_write_active_dec() as it's still considered to be a read-write
mapping by the rest of BPF subsystem.

Similar reasoning applies to bpf_map_mmap_open(), which is called
whenever mmap(), munmap(), and/or mprotect() forces mm subsystem to
split original VMA into multiple discontiguous VMAs.

Memory-mapping handling is a bit tricky, yes.

Cc: Jann Horn <jannh@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250129012246.1515826-1-andrii@kernel.org


Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
Stable-dep-of: bc27c52eea18 ("bpf: avoid holding freeze_mutex during mmap operation")
Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
Signed-off-by: default avatarPu Lehui <pulehui@huawei.com>
parent 1579902a
Loading
Loading
Loading
Loading
+0 −4
Original line number Diff line number Diff line
@@ -268,8 +268,6 @@ static int ringbuf_map_mmap_kern(struct bpf_map *map, struct vm_area_struct *vma
		/* allow writable mapping for the consumer_pos only */
		if (vma->vm_pgoff != 0 || vma->vm_end - vma->vm_start != PAGE_SIZE)
			return -EPERM;
	} else {
		vm_flags_clear(vma, VM_MAYWRITE);
	}
	/* remap_vmalloc_range() checks size and offset constraints */
	return remap_vmalloc_range(vma, rb_map->rb,
@@ -289,8 +287,6 @@ static int ringbuf_map_mmap_user(struct bpf_map *map, struct vm_area_struct *vma
			 * position, and the ring buffer data itself.
			 */
			return -EPERM;
	} else {
		vm_flags_clear(vma, VM_MAYWRITE);
	}
	/* remap_vmalloc_range() checks size and offset constraints */
	return remap_vmalloc_range(vma, rb_map->rb, vma->vm_pgoff + RINGBUF_PGOFF);
+8 −2
Original line number Diff line number Diff line
@@ -913,15 +913,21 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
	vma->vm_ops = &bpf_map_default_vmops;
	vma->vm_private_data = map;
	vm_flags_clear(vma, VM_MAYEXEC);
	/* If mapping is read-only, then disallow potentially re-mapping with
	 * PROT_WRITE by dropping VM_MAYWRITE flag. This VM_MAYWRITE clearing
	 * means that as far as BPF map's memory-mapped VMAs are concerned,
	 * VM_WRITE and VM_MAYWRITE and equivalent, if one of them is set,
	 * both should be set, so we can forget about VM_MAYWRITE and always
	 * check just VM_WRITE
	 */
	if (!(vma->vm_flags & VM_WRITE))
		/* disallow re-mapping with PROT_WRITE */
		vm_flags_clear(vma, VM_MAYWRITE);

	err = map->ops->map_mmap(map, vma);
	if (err)
		goto out;

	if (vma->vm_flags & VM_MAYWRITE)
	if (vma->vm_flags & VM_WRITE)
		bpf_map_write_active_inc(map);
out:
	mutex_unlock(&map->freeze_mutex);