Skip to content
  1. Jul 01, 2020
    • Li Heng's avatar
      net: cxgb4: fix return error value in t4_prep_fw · 8a259e6b
      Li Heng authored
      
      
      t4_prep_fw goto bye tag with positive return value when something
      bad happened and which can not free resource in adap_init0.
      so fix it to return negative value.
      
      Fixes: 16e47624 ("cxgb4: Add new scheme to update T4/T5 firmware")
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarLi Heng <liheng40@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8a259e6b
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · e708e2bd
      David S. Miller authored
      
      
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2020-06-30
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 28 non-merge commits during the last 9 day(s) which contain
      a total of 35 files changed, 486 insertions(+), 232 deletions(-).
      
      The main changes are:
      
      1) Fix an incorrect verifier branch elimination for PTR_TO_BTF_ID pointer
         types, from Yonghong Song.
      
      2) Fix UAPI for sockmap and flow_dissector progs that were ignoring various
         arguments passed to BPF_PROG_{ATTACH,DETACH}, from Lorenz Bauer & Jakub Sitnicki.
      
      3) Fix broken AF_XDP DMA hacks that are poking into dma-direct and swiotlb
         internals and integrate it properly into DMA core, from Christoph Hellwig.
      
      4) Fix RCU splat from recent changes to avoid skipping ingress policy when
         kTLS is enabled, from John Fastabend.
      
      5) Fix BPF ringbuf map to enforce size to be the power of 2 in order for its
         position masking to work, from Andrii Nakryiko.
      
      6) Fix regression from CAP_BPF work to re-allow CAP_SYS_ADMIN for loading
         of network programs, from Maciej Żenczykowski.
      
      7) Fix libbpf section name prefix for devmap progs, from Jesper Dangaard Brouer.
      
      8) Fix formatting in UAPI documentation for BPF helpers, from Quentin Monnet.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e708e2bd
    • Yonghong Song's avatar
      bpf: Add tests for PTR_TO_BTF_ID vs. null comparison · d923021c
      Yonghong Song authored
      
      
      Add two tests for PTR_TO_BTF_ID vs. null ptr comparison,
      one for PTR_TO_BTF_ID in the ctx structure and the
      other for PTR_TO_BTF_ID after one level pointer chasing.
      In both cases, the test ensures condition is not
      removed.
      
      For example, for this test
       struct bpf_fentry_test_t {
           struct bpf_fentry_test_t *a;
       };
       int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
       {
           if (arg == 0)
               test7_result = 1;
           return 0;
       }
      Before the previous verifier change, we have xlated codes:
        int test7(long long unsigned int * ctx):
        ; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
           0: (79) r1 = *(u64 *)(r1 +0)
        ; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
           1: (b4) w0 = 0
           2: (95) exit
      After the previous verifier change, we have:
        int test7(long long unsigned int * ctx):
        ; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
           0: (79) r1 = *(u64 *)(r1 +0)
        ; if (arg == 0)
           1: (55) if r1 != 0x0 goto pc+4
        ; test7_result = 1;
           2: (18) r1 = map[id:6][0]+48
           4: (b7) r2 = 1
           5: (7b) *(u64 *)(r1 +0) = r2
        ; int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
           6: (b4) w0 = 0
           7: (95) exit
      
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200630171241.2523875-1-yhs@fb.com
      d923021c
    • Yonghong Song's avatar
      bpf: Fix an incorrect branch elimination by verifier · 01c66c48
      Yonghong Song authored
      Wenbo reported an issue in [1] where a checking of null
      pointer is evaluated as always false. In this particular
      case, the program type is tp_btf and the pointer to
      compare is a PTR_TO_BTF_ID.
      
      The current verifier considers PTR_TO_BTF_ID always
      reprents a non-null pointer, hence all PTR_TO_BTF_ID compares
      to 0 will be evaluated as always not-equal, which resulted
      in the branch elimination.
      
      For example,
       struct bpf_fentry_test_t {
           struct bpf_fentry_test_t *a;
       };
       int BPF_PROG(test7, struct bpf_fentry_test_t *arg)
       {
           if (arg == 0)
               test7_result = 1;
           return 0;
       }
       int BPF_PROG(test8, struct bpf_fentry_test_t *arg)
       {
           if (arg->a == 0)
               test8_result = 1;
           return 0;
       }
      
      In above bpf programs, both branch arg == 0 and arg->a == 0
      are removed. This may not be what developer expected.
      
      The bug is introduced by Commit cac616db ("bpf: Verifier
      track null pointer branch_taken with JNE and JEQ"),
      where PTR_TO_BTF_ID is considered to be non-null when evaluting
      pointer vs. scalar comparison. This may be added
      considering we have PTR_TO_BTF_ID_OR_NULL in the verifier
      as well.
      
      PTR_TO_BTF_ID_OR_NULL is added to explicitly requires
      a non-NULL testing in selective cases. The current generic
      pointer tracing framework in verifier always
      assigns PTR_TO_BTF_ID so users does not need to
      check NULL pointer at every pointer level like a->b->c->d.
      
      We may not want to assign every PTR_TO_BTF_ID as
      PTR_TO_BTF_ID_OR_NULL as this will require a null test
      before pointer dereference which may cause inconvenience
      for developers. But we could avoid branch elimination
      to preserve original code intention.
      
      This patch simply removed PTR_TO_BTD_ID from reg_type_not_null()
      in verifier, which prevented the above branches from being eliminated.
      
       [1]: https://lore.kernel.org/bpf/79dbb7c0-449d-83eb-5f4f-7af0cc269168@fb.com/T/
      
      
      
      Fixes: cac616db ("bpf: Verifier track null pointer branch_taken with JNE and JEQ")
      Reported-by: default avatarWenbo Zhang <ethercflow@gmail.com>
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200630171240.2523722-1-yhs@fb.com
      01c66c48
    • David S. Miller's avatar
      Merge branch 'net-ipa-three-bug-fixes' · 0433c93d
      David S. Miller authored
      
      
      Alex Elder says:
      
      ====================
      net: ipa: three bug fixes
      
      This series contains three bug fixes for the Qualcomm IPA driver.
      In practice these bugs are unlikke.y to be harmful, but they do
      represent incorrect code.
      
      Version 2 adds "Fixes" tags to two of the patches and fixes a typo
      in one (found by checkpatch.pl).
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0433c93d
    • Alex Elder's avatar
      net: ipa: introduce ipa_cmd_tag_process() · 6cb63ea6
      Alex Elder authored
      
      
      Create a new function ipa_cmd_tag_process() that simply allocates a
      transaction, adds a tag process command to it to clear the hardware
      pipeline, and commits the transaction.
      
      Call it in from ipa_endpoint_suspend(), after suspending the modem
      endpoints but before suspending the AP command TX and AP LAN RX
      endpoints (which are used by the tag sequence).
      
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6cb63ea6
    • Alex Elder's avatar
      net: ipa: no checksum offload for SDM845 LAN RX · 41af5436
      Alex Elder authored
      
      
      The AP LAN RX endpoint should not have download checksum offload
      enabled.
      
      The receive handler does properly accommodate the trailer that's
      added by the hardware, but we ignore it.
      
      Fixes: 1ed7d0c0 ("soc: qcom: ipa: configuration data")
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      41af5436
    • Alex Elder's avatar
      net: ipa: always check for stopped channel · 5468cbcd
      Alex Elder authored
      
      
      In gsi_channel_stop(), there's a check to see if the channel might
      have entered STOPPED state since a previous call, which might have
      timed out before stopping completed.
      
      That check actually belongs in gsi_channel_stop_command(), which is
      called repeatedly by gsi_channel_stop() for RX channels.
      
      Fixes: 650d1603 ("soc: qcom: ipa: the generic software interface")
      Signed-off-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5468cbcd
    • Russell King's avatar
      net: mvneta: fix use of state->speed · f2ca673d
      Russell King authored
      
      
      When support for short preambles was added, it incorrectly keyed its
      decision off state->speed instead of state->interface.  state->speed
      is not guaranteed to be correct for in-band modes, which can lead to
      short preambles being unexpectedly disabled.
      
      Fix this by keying off the interface mode, which is the only way that
      mvneta can operate at 2.5Gbps.
      
      Fixes: da58a931 ("net: mvneta: Add support for 2500Mbps SGMII")
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f2ca673d
    • David S. Miller's avatar
      Merge branch 'support-AF_PACKET-for-layer-3-devices' · b9fcf0a0
      David S. Miller authored
      
      
      Jason A. Donenfeld says:
      
      ====================
      support AF_PACKET for layer 3 devices
      
      Hans reported that packets injected by a correct-looking and trivial
      libpcap-based program were not being accepted by wireguard. In
      investigating that, I noticed that a few devices weren't properly
      handling AF_PACKET-injected packets, and so this series introduces a bit
      of shared infrastructure to support that.
      
      The basic problem begins with socket(AF_PACKET, SOCK_RAW,
      htons(ETH_P_ALL)) sockets. When sendto is called, AF_PACKET examines the
      headers of the packet with this logic:
      
      static void packet_parse_headers(struct sk_buff *skb, struct socket *sock)
      {
          if ((!skb->protocol || skb->protocol == htons(ETH_P_ALL)) &&
              sock->type == SOCK_RAW) {
              skb_reset_mac_header(skb);
              skb->protocol = dev_parse_header_protocol(skb);
          }
      
          skb_probe_transport_header(skb);
      }
      
      The middle condition there triggers, and we jump to
      dev_parse_header_protocol. Note that this is the only caller of
      dev_parse_header_protocol in the kernel, and I assume it was designed
      for this purpose:
      
      static inline __be16 dev_parse_header_protocol(const struct sk_buff *skb)
      {
          const struct net_device *dev = skb->dev;
      
          if (!dev->header_ops || !dev->header_ops->parse_protocol)
              return 0;
          return dev->header_ops->parse_protocol(skb);
      }
      
      Since AF_PACKET already knows which netdev the packet is going to, the
      dev_parse_header_protocol function can see if that netdev has a way it
      prefers to figure out the protocol from the header. This, again, is the
      only use of parse_protocol in the kernel. At the moment, it's only used
      with ethernet devices, via eth_header_parse_protocol. This makes sense,
      as mostly people are used to AF_PACKET-injecting ethernet frames rather
      than layer 3 frames. But with nothing in place for layer 3 netdevs, this
      function winds up returning 0, and skb->protocol then is set to 0, and
      then by the time it hits the netdev's ndo_start_xmit, the driver doesn't
      know what to do with it.
      
      This is a problem because drivers very much rely on skb->protocol being
      correct, and routinely reject packets where it's incorrect. That's why
      having this parsing happen for injected packets is quite important. In
      wireguard, ipip, and ipip6, for example, packets from AF_PACKET are just
      dropped entirely. For tun devices, it's sort of uglier, with the tun
      "packet information" header being passed to userspace containing a bogus
      protocol value. Some userspace programs are ill-equipped to deal with
      that. (But of course, that doesn't happen with tap devices, which
      benefit from the similar shared infrastructure for layer 2 netdevs,
      further motiviating this patchset for layer 3 netdevs.)
      
      This patchset addresses the issue by first adding a layer 3 header parse
      function, much akin to the existing one for layer 2 packets, and then
      adds a shared header_ops structure that, also much akin to the existing
      one for layer 2 packets. Then it wires it up to a few immediate places
      that stuck out as requiring it, and does a bit of cleanup.
      
      This patchset seems like it's fixing real bugs, so it might be
      appropriate for stable. But they're also very old bugs, so if you'd
      rather not backport to stable, that'd make sense to me too.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b9fcf0a0
    • Jason A. Donenfeld's avatar
      net: xfrmi: implement header_ops->parse_protocol for AF_PACKET · 8f9a1fa4
      Jason A. Donenfeld authored
      
      
      The xfrm interface uses skb->protocol to determine packet type, and
      bails out if it's not set. For AF_PACKET injection, we need to support
      its call chain of:
      
          packet_sendmsg -> packet_snd -> packet_parse_headers ->
            dev_parse_header_protocol -> parse_protocol
      
      Without a valid parse_protocol, this returns zero, and xfrmi rejects the
      skb. So, this wires up the ip_tunnel handler for layer 3 packets for
      that case.
      
      Reported-by: default avatarWillem de Bruijn <willemdebruijn.kernel@gmail.com>
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8f9a1fa4
    • Jason A. Donenfeld's avatar
      net: sit: implement header_ops->parse_protocol for AF_PACKET · 75ea1f47
      Jason A. Donenfeld authored
      
      
      Sit uses skb->protocol to determine packet type, and bails out if it's
      not set. For AF_PACKET injection, we need to support its call chain of:
      
          packet_sendmsg -> packet_snd -> packet_parse_headers ->
            dev_parse_header_protocol -> parse_protocol
      
      Without a valid parse_protocol, this returns zero, and sit rejects the
      skb. So, this wires up the ip_tunnel handler for layer 3 packets for
      that case.
      
      Reported-by: default avatarWillem de Bruijn <willemdebruijn.kernel@gmail.com>
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      75ea1f47
    • Jason A. Donenfeld's avatar
      net: vti: implement header_ops->parse_protocol for AF_PACKET · ab59d2b6
      Jason A. Donenfeld authored
      
      
      Vti uses skb->protocol to determine packet type, and bails out if it's
      not set. For AF_PACKET injection, we need to support its call chain of:
      
          packet_sendmsg -> packet_snd -> packet_parse_headers ->
            dev_parse_header_protocol -> parse_protocol
      
      Without a valid parse_protocol, this returns zero, and vti rejects the
      skb. So, this wires up the ip_tunnel handler for layer 3 packets for
      that case.
      
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ab59d2b6
    • Jason A. Donenfeld's avatar
      tun: implement header_ops->parse_protocol for AF_PACKET · b9815eb1
      Jason A. Donenfeld authored
      
      
      The tun driver passes up skb->protocol to userspace in the form of PI headers.
      For AF_PACKET injection, we need to support its call chain of:
      
          packet_sendmsg -> packet_snd -> packet_parse_headers ->
            dev_parse_header_protocol -> parse_protocol
      
      Without a valid parse_protocol, this returns zero, and the tun driver
      then gives userspace bogus values that it can't deal with.
      
      Note that this isn't the case with tap, because tap already benefits
      from the shared infrastructure for ethernet headers. But with tun,
      there's nothing.
      
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b9815eb1
    • Jason A. Donenfeld's avatar
      wireguard: queueing: make use of ip_tunnel_parse_protocol · 1a574074
      Jason A. Donenfeld authored
      
      
      Now that wg_examine_packet_protocol has been added for general
      consumption as ip_tunnel_parse_protocol, it's possible to remove
      wg_examine_packet_protocol and simply use the new
      ip_tunnel_parse_protocol function directly.
      
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1a574074
    • Jason A. Donenfeld's avatar
      wireguard: implement header_ops->parse_protocol for AF_PACKET · 01a4967c
      Jason A. Donenfeld authored
      
      
      WireGuard uses skb->protocol to determine packet type, and bails out if
      it's not set or set to something it's not expecting. For AF_PACKET
      injection, we need to support its call chain of:
      
          packet_sendmsg -> packet_snd -> packet_parse_headers ->
            dev_parse_header_protocol -> parse_protocol
      
      Without a valid parse_protocol, this returns zero, and wireguard then
      rejects the skb. So, this wires up the ip_tunnel handler for layer 3
      packets for that case.
      
      Reported-by: default avatarHans Wippel <ndev@hwipl.net>
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      01a4967c
    • Jason A. Donenfeld's avatar
      net: ipip: implement header_ops->parse_protocol for AF_PACKET · e53ac932
      Jason A. Donenfeld authored
      
      
      Ipip uses skb->protocol to determine packet type, and bails out if it's
      not set. For AF_PACKET injection, we need to support its call chain of:
      
          packet_sendmsg -> packet_snd -> packet_parse_headers ->
            dev_parse_header_protocol -> parse_protocol
      
      Without a valid parse_protocol, this returns zero, and ipip rejects the
      skb. So, this wires up the ip_tunnel handler for layer 3 packets for
      that case.
      
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e53ac932
    • Jason A. Donenfeld's avatar
      net: ip_tunnel: add header_ops for layer 3 devices · 2606aff9
      Jason A. Donenfeld authored
      
      
      Some devices that take straight up layer 3 packets benefit from having a
      shared header_ops so that AF_PACKET sockets can inject packets that are
      recognized. This shared infrastructure will be used by other drivers
      that currently can't inject packets using AF_PACKET. It also exposes the
      parser function, as it is useful in standalone form too.
      
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2606aff9
    • Jakub Sitnicki's avatar
      bpf, netns: Fix use-after-free in pernet pre_exit callback · 2576f870
      Jakub Sitnicki authored
      
      
      Iterating over BPF links attached to network namespace in pre_exit hook is
      not safe, even if there is just one. Once link gets auto-detached, that is
      its back-pointer to net object is set to NULL, the link can be released and
      freed without waiting on netns_bpf_mutex, effectively causing the list
      element we are operating on to be freed.
      
      This leads to use-after-free when trying to access the next element on the
      list, as reported by KASAN. Bug can be triggered by destroying a network
      namespace, while also releasing a link attached to this network namespace.
      
      | ==================================================================
      | BUG: KASAN: use-after-free in netns_bpf_pernet_pre_exit+0xd9/0x130
      | Read of size 8 at addr ffff888119e0d778 by task kworker/u8:2/177
      |
      | CPU: 3 PID: 177 Comm: kworker/u8:2 Not tainted 5.8.0-rc1-00197-ga0c04c9d1008-dirty #776
      | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
      | Workqueue: netns cleanup_net
      | Call Trace:
      |  dump_stack+0x9e/0xe0
      |  print_address_description.constprop.0+0x3a/0x60
      |  ? netns_bpf_pernet_pre_exit+0xd9/0x130
      |  kasan_report.cold+0x1f/0x40
      |  ? netns_bpf_pernet_pre_exit+0xd9/0x130
      |  netns_bpf_pernet_pre_exit+0xd9/0x130
      |  cleanup_net+0x30b/0x5b0
      |  ? unregister_pernet_device+0x50/0x50
      |  ? rcu_read_lock_bh_held+0xb0/0xb0
      |  ? _raw_spin_unlock_irq+0x24/0x50
      |  process_one_work+0x4d1/0xa10
      |  ? lock_release+0x3e0/0x3e0
      |  ? pwq_dec_nr_in_flight+0x110/0x110
      |  ? rwlock_bug.part.0+0x60/0x60
      |  worker_thread+0x7a/0x5c0
      |  ? process_one_work+0xa10/0xa10
      |  kthread+0x1e3/0x240
      |  ? kthread_create_on_node+0xd0/0xd0
      |  ret_from_fork+0x1f/0x30
      |
      | Allocated by task 280:
      |  save_stack+0x1b/0x40
      |  __kasan_kmalloc.constprop.0+0xc2/0xd0
      |  netns_bpf_link_create+0xfe/0x650
      |  __do_sys_bpf+0x153a/0x2a50
      |  do_syscall_64+0x59/0x300
      |  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      |
      | Freed by task 198:
      |  save_stack+0x1b/0x40
      |  __kasan_slab_free+0x12f/0x180
      |  kfree+0xed/0x350
      |  process_one_work+0x4d1/0xa10
      |  worker_thread+0x7a/0x5c0
      |  kthread+0x1e3/0x240
      |  ret_from_fork+0x1f/0x30
      |
      | The buggy address belongs to the object at ffff888119e0d700
      |  which belongs to the cache kmalloc-192 of size 192
      | The buggy address is located 120 bytes inside of
      |  192-byte region [ffff888119e0d700, ffff888119e0d7c0)
      | The buggy address belongs to the page:
      | page:ffffea0004678340 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
      | flags: 0x2fffe0000000200(slab)
      | raw: 02fffe0000000200 ffffea00045ba8c0 0000000600000006 ffff88811a80ea80
      | raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      | page dumped because: kasan: bad access detected
      |
      | Memory state around the buggy address:
      |  ffff888119e0d600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      |  ffff888119e0d680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      | >ffff888119e0d700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      |                                                                 ^
      |  ffff888119e0d780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      |  ffff888119e0d800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      | ==================================================================
      
      Remove the "fast-path" for releasing a link that got auto-detached by a
      dying network namespace to fix it. This way as long as link is on the list
      and netns_bpf mutex is held, we have a guarantee that link memory can be
      accessed.
      
      An alternative way to fix this issue would be to safely iterate over the
      list of links and ensure there is no access to link object after detaching
      it. But, at the moment, optimizing synchronization overhead on link release
      without a workload in mind seems like an overkill.
      
      Fixes: ab53cad9 ("bpf, netns: Keep a list of attached bpf_link's")
      Signed-off-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200630164541.1329993-1-jakub@cloudflare.com
      2576f870
    • Alexei Starovoitov's avatar
      Merge branch 'fix-sockmap-flow_dissector-uapi' · 084af57c
      Alexei Starovoitov authored
      Lorenz Bauer says:
      
      ====================
      Both sockmap and flow_dissector ingnore various arguments passed to
      BPF_PROG_ATTACH and BPF_PROG_DETACH. We can fix the attach case by
      checking that the unused arguments are zero. I considered requiring
      target_fd to be -1 instead of 0, but this leads to a lot of churn
      in selftests. There is also precedent in that bpf_iter already
      expects 0 for a similar field. I think that we can come up with a
      work around for fd 0 should we need to in the future.
      
      The detach case is more problematic: both cgroups and lirc2 verify
      that attach_bpf_fd matches the currently attached program. This
      way you need access to the program fd to be able to remove it.
      Neither sockmap nor flow_dissector do this. flow_dissector even
      has a check for CAP_NET_ADMIN because of this. The patch set
      addresses this by implementing the desired behaviour.
      
      There is a possibility for user space breakage: any callers that
      don't provide the correct fd will fail with ENOENT. For sockmap
      the risk is low: even the selftests assume that sockmap works
      the way I described. For flow_dissector the story is less
      straightforward, and the selftests use a variety of arguments.
      
      I've includes fixes tags for the oldest commits that allow an easy
      backport, however the behaviour dates back to when sockmap and
      flow_dissector were introduced. What is the best way to handle these?
      
      This set is based on top of Jakub's work "bpf, netns: Prepare
      for multi-prog attachment" available at
      https://lore.kernel.org/bpf/87k0zwmhtb.fsf@cloudflare.com/T/
      
      
      
      Since v1:
      - Adjust selftests
      - Implement detach behaviour
      ====================
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      084af57c
    • Lorenz Bauer's avatar
      selftests: bpf: Pass program to bpf_prog_detach in flow_dissector · 1a1ad3c2
      Lorenz Bauer authored
      
      
      Calling bpf_prog_detach is incorrect, since it takes target_fd as
      its argument. The intention here is to pass it as attach_bpf_fd,
      so use bpf_prog_detach2 and pass zero for target_fd.
      
      Fixes: 06716e04 ("selftests/bpf: Extend test_flow_dissector to cover link creation")
      Signed-off-by: default avatarLorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200629095630.7933-7-lmb@cloudflare.com
      1a1ad3c2
    • Lorenz Bauer's avatar
      selftests: bpf: Pass program and target_fd in flow_dissector_reattach · 0434296c
      Lorenz Bauer authored
      
      
      Pass 0 as target_fd when attaching and detaching flow dissector.
      Additionally, pass the expected program when detaching.
      
      Fixes: 1f043f87 ("selftests/bpf: Add tests for attaching bpf_link to netns")
      Signed-off-by: default avatarLorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200629095630.7933-6-lmb@cloudflare.com
      0434296c
    • Lorenz Bauer's avatar
      bpf: sockmap: Require attach_bpf_fd when detaching a program · bb0de313
      Lorenz Bauer authored
      
      
      The sockmap code currently ignores the value of attach_bpf_fd when
      detaching a program. This is contrary to the usual behaviour of
      checking that attach_bpf_fd represents the currently attached
      program.
      
      Ensure that attach_bpf_fd is indeed the currently attached
      program. It turns out that all sockmap selftests already do this,
      which indicates that this is unlikely to cause breakage.
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: default avatarLorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200629095630.7933-5-lmb@cloudflare.com
      bb0de313
    • Lorenz Bauer's avatar
      bpf: sockmap: Check value of unused args to BPF_PROG_ATTACH · 9b2b0971
      Lorenz Bauer authored
      
      
      Using BPF_PROG_ATTACH on a sockmap program currently understands no
      flags or replace_bpf_fd, but accepts any value. Return EINVAL instead.
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: default avatarLorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200629095630.7933-4-lmb@cloudflare.com
      9b2b0971
    • Lorenz Bauer's avatar
      bpf: flow_dissector: Check value of unused flags to BPF_PROG_DETACH · 4ac2add6
      Lorenz Bauer authored
      
      
      Using BPF_PROG_DETACH on a flow dissector program supports neither
      attach_flags nor attach_bpf_fd. Yet no value is enforced for them.
      
      Enforce that attach_flags are zero, and require the current program
      to be passed via attach_bpf_fd. This allows us to remove the check
      for CAP_SYS_ADMIN, since userspace can now no longer remove
      arbitrary flow dissector programs.
      
      Fixes: b27f7bb5 ("flow_dissector: Move out netns_bpf prog callbacks")
      Signed-off-by: default avatarLorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200629095630.7933-3-lmb@cloudflare.com
      4ac2add6
    • Lorenz Bauer's avatar
      bpf: flow_dissector: Check value of unused flags to BPF_PROG_ATTACH · 1b514239
      Lorenz Bauer authored
      
      
      Using BPF_PROG_ATTACH on a flow dissector program supports neither
      target_fd, attach_flags or replace_bpf_fd but accepts any value.
      
      Enforce that all of them are zero. This is fine for replace_bpf_fd
      since its presence is indicated by BPF_F_REPLACE. It's more
      problematic for target_fd, since zero is a valid fd. Should we
      want to use the flag later on we'd have to add an exception for
      fd 0. The alternative is to force a value like -1. This requires
      more changes to tests. There is also precedent for using 0,
      since bpf_iter uses this for target_fd as well.
      
      Fixes: b27f7bb5 ("flow_dissector: Move out netns_bpf prog callbacks")
      Signed-off-by: default avatarLorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200629095630.7933-2-lmb@cloudflare.com
      1b514239
    • Alexei Starovoitov's avatar
      Merge branch 'bpf-multi-prog-prep' · 951f38cf
      Alexei Starovoitov authored
      Jakub Sitnicki says:
      
      ====================
      This patch set prepares ground for link-based multi-prog attachment for
      future netns attach types, with BPF_SK_LOOKUP attach type in mind [0].
      
      Two changes are needed in order to attach and run a series of BPF programs:
      
        1) an bpf_prog_array of programs to run (patch #2), and
        2) a list of attached links to keep track of attachments (patch #3).
      
      Nothing changes for BPF flow_dissector. Just as before only one program can
      be attached to netns.
      
      In v3 I've simplified patch #2 that introduces bpf_prog_array to take
      advantage of the fact that it will hold at most one program for now.
      
      In particular, I'm no longer using bpf_prog_array_copy. It turned out to be
      less suitable for link operations than I thought as it fails to append the
      same BPF program.
      
      bpf_prog_array_replace_item is also gone, because we know we always want to
      replace the first element in prog_array.
      
      Naturally the code that handles bpf_prog_array will need change once
      more when there is a program type that allows multi-prog attachment. But I
      feel it will be better to do it gradually and present it together with
      tests that actually exercise multi-prog code paths.
      
      [0] https://lore.kernel.org/bpf/20200511185218.1422406-1-jakub@cloudflare.com/
      
      
      
      v2 -> v3:
      - Don't check if run_array is null in link update callback. (Martin)
      - Allow updating the link with the same BPF program. (Andrii)
      - Add patch #4 with a test for the above case.
      - Kill bpf_prog_array_replace_item. Access the run_array directly.
      - Switch from bpf_prog_array_copy() to bpf_prog_array_alloc(1, ...).
      - Replace rcu_deref_protected & RCU_INIT_POINTER with rcu_replace_pointer.
      - Drop Andrii's Ack from patch #2. Code changed.
      
      v1 -> v2:
      
      - Show with a (void) cast that bpf_prog_array_replace_item() return value
        is ignored on purpose. (Andrii)
      - Explain why bpf-cgroup cannot replace programs in bpf_prog_array based
        on bpf_prog pointer comparison in patch #2 description. (Andrii)
      ====================
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      951f38cf
    • Jakub Sitnicki's avatar
      selftests/bpf: Test updating flow_dissector link with same program · 6ebb85c8
      Jakub Sitnicki authored
      
      
      This case, while not particularly useful, is worth covering because we
      expect the operation to succeed as opposed when re-attaching the same
      program directly with PROG_ATTACH.
      
      While at it, update the tests summary that fell out of sync when tests
      extended to cover links.
      
      Signed-off-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20200625141357.910330-5-jakub@cloudflare.com
      6ebb85c8
    • Jakub Sitnicki's avatar
      bpf, netns: Keep a list of attached bpf_link's · ab53cad9
      Jakub Sitnicki authored
      
      
      To support multi-prog link-based attachments for new netns attach types, we
      need to keep track of more than one bpf_link per attach type. Hence,
      convert net->bpf.links into a list, that currently can be either empty or
      have just one item.
      
      Instead of reusing bpf_prog_list from bpf-cgroup, we link together
      bpf_netns_link's themselves. This makes list management simpler as we don't
      have to allocate, initialize, and later release list elements. We can do
      this because multi-prog attachment will be available only for bpf_link, and
      we don't need to build a list of programs attached directly and indirectly
      via links.
      
      No functional changes intended.
      
      Signed-off-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20200625141357.910330-4-jakub@cloudflare.com
      ab53cad9
    • Jakub Sitnicki's avatar
      bpf, netns: Keep attached programs in bpf_prog_array · 695c1214
      Jakub Sitnicki authored
      
      
      Prepare for having multi-prog attachments for new netns attach types by
      storing programs to run in a bpf_prog_array, which is well suited for
      iterating over programs and running them in sequence.
      
      After this change bpf(PROG_QUERY) may block to allocate memory in
      bpf_prog_array_copy_to_user() for collected program IDs. This forces a
      change in how we protect access to the attached program in the query
      callback. Because bpf_prog_array_copy_to_user() can sleep, we switch from
      an RCU read lock to holding a mutex that serializes updaters.
      
      Because we allow only one BPF flow_dissector program to be attached to
      netns at all times, the bpf_prog_array pointed by net->bpf.run_array is
      always either detached (null) or one element long.
      
      No functional changes intended.
      
      Signed-off-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200625141357.910330-3-jakub@cloudflare.com
      695c1214
    • Jakub Sitnicki's avatar
      flow_dissector: Pull BPF program assignment up to bpf-netns · 3b701699
      Jakub Sitnicki authored
      
      
      Prepare for using bpf_prog_array to store attached programs by moving out
      code that updates the attached program out of flow dissector.
      
      Managing bpf_prog_array is more involved than updating a single bpf_prog
      pointer. This will let us do it all from one place, bpf/net_namespace.c, in
      the subsequent patch.
      
      No functional change intended.
      
      Signed-off-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20200625141357.910330-2-jakub@cloudflare.com
      3b701699
  2. Jun 30, 2020
  3. Jun 29, 2020
    • Eric Dumazet's avatar
      llc: make sure applications use ARPHRD_ETHER · a9b11101
      Eric Dumazet authored
      
      
      syzbot was to trigger a bug by tricking AF_LLC with
      non sensible addr->sllc_arphrd
      
      It seems clear LLC requires an Ethernet device.
      
      Back in commit abf9d537 ("llc: add support for SO_BINDTODEVICE")
      Octavian Purdila added possibility for application to use a zero
      value for sllc_arphrd, convert it to ARPHRD_ETHER to not cause
      regressions on existing applications.
      
      BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:199 [inline]
      BUG: KASAN: use-after-free in list_empty include/linux/list.h:268 [inline]
      BUG: KASAN: use-after-free in waitqueue_active include/linux/wait.h:126 [inline]
      BUG: KASAN: use-after-free in wq_has_sleeper include/linux/wait.h:160 [inline]
      BUG: KASAN: use-after-free in skwq_has_sleeper include/net/sock.h:2092 [inline]
      BUG: KASAN: use-after-free in sock_def_write_space+0x642/0x670 net/core/sock.c:2813
      Read of size 8 at addr ffff88801e0b4078 by task ksoftirqd/3/27
      
      CPU: 3 PID: 27 Comm: ksoftirqd/3 Not tainted 5.5.0-rc1-syzkaller #0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x197/0x210 lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
       __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506
       kasan_report+0x12/0x20 mm/kasan/common.c:639
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
       __read_once_size include/linux/compiler.h:199 [inline]
       list_empty include/linux/list.h:268 [inline]
       waitqueue_active include/linux/wait.h:126 [inline]
       wq_has_sleeper include/linux/wait.h:160 [inline]
       skwq_has_sleeper include/net/sock.h:2092 [inline]
       sock_def_write_space+0x642/0x670 net/core/sock.c:2813
       sock_wfree+0x1e1/0x260 net/core/sock.c:1958
       skb_release_head_state+0xeb/0x260 net/core/skbuff.c:652
       skb_release_all+0x16/0x60 net/core/skbuff.c:663
       __kfree_skb net/core/skbuff.c:679 [inline]
       consume_skb net/core/skbuff.c:838 [inline]
       consume_skb+0xfb/0x410 net/core/skbuff.c:832
       __dev_kfree_skb_any+0xa4/0xd0 net/core/dev.c:2967
       dev_kfree_skb_any include/linux/netdevice.h:3650 [inline]
       e1000_unmap_and_free_tx_resource.isra.0+0x21b/0x3a0 drivers/net/ethernet/intel/e1000/e1000_main.c:1963
       e1000_clean_tx_irq drivers/net/ethernet/intel/e1000/e1000_main.c:3854 [inline]
       e1000_clean+0x4cc/0x1d10 drivers/net/ethernet/intel/e1000/e1000_main.c:3796
       napi_poll net/core/dev.c:6532 [inline]
       net_rx_action+0x508/0x1120 net/core/dev.c:6600
       __do_softirq+0x262/0x98c kernel/softirq.c:292
       run_ksoftirqd kernel/softirq.c:603 [inline]
       run_ksoftirqd+0x8e/0x110 kernel/softirq.c:595
       smpboot_thread_fn+0x6a3/0xa40 kernel/smpboot.c:165
       kthread+0x361/0x430 kernel/kthread.c:255
       ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
      
      Allocated by task 8247:
       save_stack+0x23/0x90 mm/kasan/common.c:72
       set_track mm/kasan/common.c:80 [inline]
       __kasan_kmalloc mm/kasan/common.c:513 [inline]
       __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486
       kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:521
       slab_post_alloc_hook mm/slab.h:584 [inline]
       slab_alloc mm/slab.c:3320 [inline]
       kmem_cache_alloc+0x121/0x710 mm/slab.c:3484
       sock_alloc_inode+0x1c/0x1d0 net/socket.c:240
       alloc_inode+0x68/0x1e0 fs/inode.c:230
       new_inode_pseudo+0x19/0xf0 fs/inode.c:919
       sock_alloc+0x41/0x270 net/socket.c:560
       __sock_create+0xc2/0x730 net/socket.c:1384
       sock_create net/socket.c:1471 [inline]
       __sys_socket+0x103/0x220 net/socket.c:1513
       __do_sys_socket net/socket.c:1522 [inline]
       __se_sys_socket net/socket.c:1520 [inline]
       __ia32_sys_socket+0x73/0xb0 net/socket.c:1520
       do_syscall_32_irqs_on arch/x86/entry/common.c:337 [inline]
       do_fast_syscall_32+0x27b/0xe16 arch/x86/entry/common.c:408
       entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
      
      Freed by task 17:
       save_stack+0x23/0x90 mm/kasan/common.c:72
       set_track mm/kasan/common.c:80 [inline]
       kasan_set_free_info mm/kasan/common.c:335 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/common.c:474
       kasan_slab_free+0xe/0x10 mm/kasan/common.c:483
       __cache_free mm/slab.c:3426 [inline]
       kmem_cache_free+0x86/0x320 mm/slab.c:3694
       sock_free_inode+0x20/0x30 net/socket.c:261
       i_callback+0x44/0x80 fs/inode.c:219
       __rcu_reclaim kernel/rcu/rcu.h:222 [inline]
       rcu_do_batch kernel/rcu/tree.c:2183 [inline]
       rcu_core+0x570/0x1540 kernel/rcu/tree.c:2408
       rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2417
       __do_softirq+0x262/0x98c kernel/softirq.c:292
      
      The buggy address belongs to the object at ffff88801e0b4000
       which belongs to the cache sock_inode_cache of size 1152
      The buggy address is located 120 bytes inside of
       1152-byte region [ffff88801e0b4000, ffff88801e0b4480)
      The buggy address belongs to the page:
      page:ffffea0000782d00 refcount:1 mapcount:0 mapping:ffff88807aa59c40 index:0xffff88801e0b4ffd
      raw: 00fffe0000000200 ffffea00008e6c88 ffffea0000782d48 ffff88807aa59c40
      raw: ffff88801e0b4ffd ffff88801e0b4000 0000000100000003 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88801e0b3f00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
       ffff88801e0b3f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff88801e0b4000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                                      ^
       ffff88801e0b4080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88801e0b4100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: abf9d537 ("llc: add support for SO_BINDTODEVICE")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a9b11101
    • Cong Wang's avatar
      net: explain the lockdep annotations for dev_uc_unsync() · e8280338
      Cong Wang authored
      
      
      The lockdep annotations for dev_uc_unsync() and dev_mc_unsync()
      are not easy to understand, so add some comments to explain
      why they are correct.
      
      Similar for the rest netif_addr_lock_bh() cases, they don't
      need nested version.
      
      Cc: Taehee Yoo <ap420073@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e8280338