  1. Mar 05, 2022
    • mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls · 0708a0af
      Daniel Borkmann authored
      
      
      syzkaller was recently triggering an oversized kvmalloc() warning via
      xdp_umem_create().
      
      The triggered warning was added back in 7661809d ("mm: don't allow
      oversized kvmalloc() calls"). The warning for huge kvmalloc sizes was
      added in reaction to a security bug in which the size exceeded
      UINT_MAX, but not every code path was prepared to handle unsigned
      long sizes.
      
      Anyway, the AF_XDP related call trace from this syzkaller report was:
      
        kvmalloc include/linux/mm.h:806 [inline]
        kvmalloc_array include/linux/mm.h:824 [inline]
        kvcalloc include/linux/mm.h:829 [inline]
        xdp_umem_pin_pages net/xdp/xdp_umem.c:102 [inline]
        xdp_umem_reg net/xdp/xdp_umem.c:219 [inline]
        xdp_umem_create+0x6a5/0xf00 net/xdp/xdp_umem.c:252
        xsk_setsockopt+0x604/0x790 net/xdp/xsk.c:1068
        __sys_setsockopt+0x1fd/0x4e0 net/socket.c:2176
        __do_sys_setsockopt net/socket.c:2187 [inline]
        __se_sys_setsockopt net/socket.c:2184 [inline]
        __x64_sys_setsockopt+0xb5/0x150 net/socket.c:2184
        do_syscall_x64 arch/x86/entry/common.c:50 [inline]
        do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Björn mentioned that requests for >2GB allocations can still be valid:
      
        The structure that is being allocated is the page-pinning accounting.
        AF_XDP has an internal limit of U32_MAX pages, which is *a lot*, but
        still fewer than what memcg allows (PAGE_COUNTER_MAX is a LONG_MAX/
        PAGE_SIZE on 64 bit systems). [...]
      
        I could just change from U32_MAX to INT_MAX, but as I stated earlier
        that has a hacky feeling to it. [...] From my perspective, the code
        isn't broken, with the memcg limits in consideration. [...]
      
      Linus says:
      
        [...] Pretty much every time this has come up, the kernel warning has
        shown that yes, the code was broken and there really wasn't a reason
        for doing allocations that big.
      
        Of course, some people would be perfectly fine with the allocation
        failing, they just don't want the warning. I didn't want __GFP_NOWARN
        to shut it up originally because I wanted people to see all those
        cases, but these days I think we can just say "yeah, people can shut
        it up explicitly by saying 'go ahead and fail this allocation, don't
        warn about it'".
      
        So enough time has passed that by now I'd certainly be ok with [it].
      
      Thus allow call-sites to silence such userspace-triggered splats if the
      allocation requests have __GFP_NOWARN. For xdp_umem_pin_pages()'s call
      to kvcalloc() this is already the case, so nothing else is needed there.
      
      Fixes: 7661809d ("mm: don't allow oversized kvmalloc() calls")
      Reported-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
      Cc: Björn Töpel <bjorn@kernel.org>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      Link: https://lore.kernel.org/bpf/CAJ+HfNhyfsT5cS_U9EC213ducHs9k9zNxX9+abqC0kTrPbQ0gg@mail.gmail.com
      Link: https://lore.kernel.org/bpf/20211201202905.b9892171e3f5b9a60f9da251@linux-foundation.org
      
      
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
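
The behaviour described in the commit above can be modelled in userspace. The following is a minimal sketch, not the kernel implementation: `sketch_kvmalloc`, `SKETCH_GFP_NOWARN` and `sketch_warn_count` are hypothetical names, the INT_MAX threshold is an assumption taken from the ">2GB" wording of the message, and a counter stands in for the kernel warning.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's __GFP_NOWARN bit; the value is illustrative. */
#define SKETCH_GFP_NOWARN 0x1u

/* Counts how often the "oversized allocation" warning would have fired. */
static unsigned int sketch_warn_count;

/*
 * Hypothetical model of the oversized check: a request above INT_MAX
 * always fails, but it is only reported when the caller did not pass
 * the NOWARN flag.
 */
static void *sketch_kvmalloc(size_t size, unsigned int flags)
{
	if (size > (size_t)INT_MAX) {
		if (!(flags & SKETCH_GFP_NOWARN))
			sketch_warn_count++;
		return NULL;
	}
	return malloc(size);
}

/* Small walk-through; returns 0 when every expectation holds. */
int sketch_kvmalloc_demo(void)
{
	size_t huge = (size_t)INT_MAX + 1;
	void *small;

	sketch_warn_count = 0;
	assert(sketch_kvmalloc(huge, SKETCH_GFP_NOWARN) == NULL);
	assert(sketch_warn_count == 0); /* silenced by the flag */
	assert(sketch_kvmalloc(huge, 0) == NULL);
	assert(sketch_warn_count == 1); /* reported without the flag */

	small = sketch_kvmalloc(64, 0);
	assert(small != NULL);          /* ordinary sizes are unaffected */
	free(small);
	return 0;
}
```

In both oversized cases the allocation fails; the flag only decides whether the failure is reported. Unlike this counter, a real kernel warning would typically be emitted once rather than on every call.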
  2. Mar 04, 2022
  3. Mar 03, 2022
  4. Mar 02, 2022
    • erofs: fix ztailpacking on > 4GiB filesystems · 22ba5e99
      Gao Xiang authored
      z_idataoff here is an absolute physical offset, so it should use
      erofs_off_t (at least 64 bits). Otherwise, it gets truncated and
      causes decompression failures.
      
      Link: https://lore.kernel.org/r/20220222033118.20540-1-hsiangkao@linux.alibaba.com
      
      
      Fixes: ab92184f ("erofs: add on-disk compressed tail-packing inline support")
      Reviewed-by: Yue Hu <huyue2@yulong.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
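
The truncation described in the erofs commit above can be illustrated outside the kernel. This is a minimal sketch assuming the offset was previously held in a 32-bit variable; `sketch_off_t`, `store_narrow` and `store_wide` are hypothetical names and none of this is erofs code.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative 64-bit offset type, playing the role of erofs_off_t. */
typedef uint64_t sketch_off_t;

/* Holds the offset in a 32-bit variable: the upper 32 bits are lost. */
static sketch_off_t store_narrow(sketch_off_t off)
{
	uint32_t narrow = (uint32_t)off;

	return narrow;
}

/* Holds the offset in a 64-bit variable: the value is preserved. */
static sketch_off_t store_wide(sketch_off_t off)
{
	sketch_off_t wide = off;

	return wide;
}

/* Small walk-through; returns 0 when every expectation holds. */
int sketch_offset_demo(void)
{
	/* An absolute offset 4096 bytes past the 4 GiB boundary. */
	sketch_off_t off = ((sketch_off_t)4 << 30) + 4096;

	assert(store_narrow(off) == 4096); /* now points at the wrong data */
	assert(store_wide(off) == off);
	return 0;
}
```

Offsets below 4 GiB survive the narrow variable unchanged, which is consistent with the problem only appearing on filesystems larger than 4 GiB.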
    • batman-adv: Don't expect inter-netns unique iflink indices · 6c1f41af
      Sven Eckelmann authored
      
      
      The ifindex doesn't have to be unique for multiple network namespaces on
      the same machine.
      
        $ ip netns add test1
        $ ip -net test1 link add dummy1 type dummy
        $ ip netns add test2
        $ ip -net test2 link add dummy2 type dummy
      
        $ ip -net test1 link show dev dummy1
        6: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
            link/ether 96:81:55:1e:dd:85 brd ff:ff:ff:ff:ff:ff
        $ ip -net test2 link show dev dummy2
        6: dummy2: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
            link/ether 5a:3c:af:35:07:c3 brd ff:ff:ff:ff:ff:ff
      
      But the batman-adv code to walk through the various layers of virtual
      interfaces uses this assumption because dev_get_iflink handles it
      internally and doesn't return the actual netns of the iflink. And
      dev_get_iflink only documents the situation where ifindex == iflink for
      physical devices.
      
      But only checking for dev->netdev_ops->ndo_get_iflink is also not an option
      because ipoib_get_iflink implements it and sometimes returns an
      iflink != ifindex and sometimes an iflink == ifindex. The caller must
      therefore itself check both the netns and iflink + ifindex for
      equality. Only when they are equal was a "physical" interface detected,
      which should stop the traversal. On the other hand, vxcan_get_iflink can
      also return 0 when there is currently no valid peer. In this case, it
      is still necessary to stop.
      
      Fixes: b7eddd0b ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface")
      Fixes: 5ed4a460 ("batman-adv: additional checks for virtual interfaces on top of WiFi")
      Reported-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
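
The stop condition described in the commit above can be sketched as a small predicate. This is an illustration under stated assumptions, not the batman-adv code: `struct sketch_net`, `struct sketch_dev`, the `link_netns` field and `sketch_is_bottom` are all hypothetical, and how the kernel obtains the namespace of the lower device is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a network namespace. */
struct sketch_net {
	int id;
};

/* Hypothetical stand-in for a network device. */
struct sketch_dev {
	const struct sketch_net *netns;      /* namespace of this device */
	const struct sketch_net *link_netns; /* namespace of the lower device */
	int ifindex;
	int iflink;                          /* 0 models "no valid peer" */
};

/* Returns true when walking down the interface stack should stop here. */
static bool sketch_is_bottom(const struct sketch_dev *dev)
{
	/* No valid peer: nothing further to follow. */
	if (dev->iflink == 0)
		return true;

	/* "Physical" only if index AND namespace both match. */
	return dev->iflink == dev->ifindex && dev->link_netns == dev->netns;
}

/* Small walk-through; returns 0 when every expectation holds. */
int sketch_netns_demo(void)
{
	struct sketch_net test1 = { 1 }, test2 = { 2 };
	struct sketch_dev same = { &test1, &test1, 6, 6 };
	struct sketch_dev other = { &test1, &test2, 6, 6 };
	struct sketch_dev nopeer = { &test1, &test2, 6, 0 };

	assert(sketch_is_bottom(&same));
	assert(!sketch_is_bottom(&other)); /* same index, different netns */
	assert(sketch_is_bottom(&nopeer));
	return 0;
}
```

The `other` case mirrors the dummy1/dummy2 example from the commit message: both devices carry index 6, yet they live in different namespaces, so comparing the indices alone would wrongly end the traversal.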
    • batman-adv: Request iflink once in batadv_get_real_netdevice · 6116ba09
      Sven Eckelmann authored
      
      
      There is no need to call dev_get_iflink multiple times for the same
      net_device in batadv_get_real_netdevice. And since some of the
      ndo_get_iflink callbacks are dynamic (for example via RCUs like in
      vxcan_get_iflink), it could easily happen that the returned values are not
      stable. The pre-checks before __dev_get_by_index are then of course bogus.
      
      Fixes: 5ed4a460 ("batman-adv: additional checks for virtual interfaces on top of WiFi")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
    • batman-adv: Request iflink once in batadv-on-batadv check · 690bb6fb
      Sven Eckelmann authored
      
      
      There is no need to call dev_get_iflink multiple times for the same
      net_device in batadv_is_on_batman_iface. And since some of the
      .ndo_get_iflink callbacks are dynamic (for example via RCUs like in
      vxcan_get_iflink), it could easily happen that the returned values are not
      stable. The pre-checks before __dev_get_by_index are then of course bogus.
      
      Fixes: b7eddd0b ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
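
The two "Request iflink once" commits above share one idea: query a possibly unstable callback a single time and base every later check on that snapshot. A minimal sketch follows, assuming a hypothetical `struct sketch_dev` with a `get_iflink` callback; `sketch_lower_ifindex` and `sketch_unstable_iflink` are illustrative names, not batman-adv functions.

```c
#include <assert.h>

/* Hypothetical device whose iflink callback may change between calls. */
struct sketch_dev {
	int ifindex;
	int (*get_iflink)(const struct sketch_dev *dev);
};

/* Number of times the unstable callback below has been invoked. */
static int sketch_calls;

/* Returns 7 on the first call and 0 afterwards, mimicking a peer that vanishes. */
static int sketch_unstable_iflink(const struct sketch_dev *dev)
{
	(void)dev;
	return (sketch_calls++ == 0) ? 7 : 0;
}

/*
 * Returns the index of the lower device to look up, or -1 when there is
 * none or the device is its own lower device. The callback is invoked
 * exactly once, so the checks and the returned value agree.
 */
static int sketch_lower_ifindex(const struct sketch_dev *dev)
{
	int iflink = dev->get_iflink(dev); /* single snapshot */

	if (iflink == 0 || iflink == dev->ifindex)
		return -1;
	return iflink;
}

/* Small walk-through; returns 0 when every expectation holds. */
int sketch_iflink_demo(void)
{
	struct sketch_dev dev = { 6, sketch_unstable_iflink };

	sketch_calls = 0;
	assert(sketch_lower_ifindex(&dev) == 7);
	assert(sketch_calls == 1); /* checked and used the same value */
	return 0;
}
```

Had the function called `get_iflink` once for the checks and again for the lookup, the second call here would have returned 0 even though the checks had passed on 7, which is the kind of inconsistency the commit messages describe as making the pre-checks bogus.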