  1. May 16, 2019
  2. May 15, 2019
    • Alexei Starovoitov's avatar
      Merge branch 'lru-map-fix' · 5db17c96
      Alexei Starovoitov authored
      
      
      Daniel Borkmann says:
      
      ====================
      This set fixes LRU map eviction in combination with map lookups out
      of system call side from user space. Main patch is the second one and
      test cases are adapted and added in the last one. Thanks!
      ====================
      
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      5db17c96
    • Daniel Borkmann's avatar
      bpf: test ref bit from data path and add new tests for syscall path · d2baab62
      Daniel Borkmann authored
      
      
      The test_lru_map is relying on marking the LRU map entry via regular
      BPF map lookup from system call side. This is basically for simplicity
      reasons. Given we fixed marking entries in that case, the test needs
      to be fixed as well. Here we add a small drop-in replacement to retain
      existing behavior for the tests by marking out of the BPF program and
      transferring the retrieved value out via temporary map. This also adds
      new test cases to track the new behavior where two elements are marked,
      one via system call side and one via program side, where the next update
      then evicts the key looked up only from system call side.
      
        # ./test_lru_map
        nr_cpus:8
      
        test_lru_sanity0 (map_type:9 map_flags:0x0): Pass
        test_lru_sanity1 (map_type:9 map_flags:0x0): Pass
        test_lru_sanity2 (map_type:9 map_flags:0x0): Pass
        test_lru_sanity3 (map_type:9 map_flags:0x0): Pass
        test_lru_sanity4 (map_type:9 map_flags:0x0): Pass
        test_lru_sanity5 (map_type:9 map_flags:0x0): Pass
        test_lru_sanity7 (map_type:9 map_flags:0x0): Pass
        test_lru_sanity8 (map_type:9 map_flags:0x0): Pass
      
        test_lru_sanity0 (map_type:10 map_flags:0x0): Pass
        test_lru_sanity1 (map_type:10 map_flags:0x0): Pass
        test_lru_sanity2 (map_type:10 map_flags:0x0): Pass
        test_lru_sanity3 (map_type:10 map_flags:0x0): Pass
        test_lru_sanity4 (map_type:10 map_flags:0x0): Pass
        test_lru_sanity5 (map_type:10 map_flags:0x0): Pass
        test_lru_sanity7 (map_type:10 map_flags:0x0): Pass
        test_lru_sanity8 (map_type:10 map_flags:0x0): Pass
      
        test_lru_sanity0 (map_type:9 map_flags:0x2): Pass
        test_lru_sanity4 (map_type:9 map_flags:0x2): Pass
        test_lru_sanity6 (map_type:9 map_flags:0x2): Pass
        test_lru_sanity7 (map_type:9 map_flags:0x2): Pass
        test_lru_sanity8 (map_type:9 map_flags:0x2): Pass
      
        test_lru_sanity0 (map_type:10 map_flags:0x2): Pass
        test_lru_sanity4 (map_type:10 map_flags:0x2): Pass
        test_lru_sanity6 (map_type:10 map_flags:0x2): Pass
        test_lru_sanity7 (map_type:10 map_flags:0x2): Pass
        test_lru_sanity8 (map_type:10 map_flags:0x2): Pass
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      d2baab62
    • Daniel Borkmann's avatar
      bpf, lru: avoid messing with eviction heuristics upon syscall lookup · 50b045a8
      Daniel Borkmann authored
      One of the biggest issues we face right now with picking LRU map over
      regular hash table is that a map walk out of user space, for example,
      to just dump the existing entries or to remove certain ones, will
      completely mess up LRU eviction heuristics and wrong entries such
      as just created ones will get evicted instead. The reason for this
      is that we mark an entry as "in use" via bpf_lru_node_set_ref() from
      system call lookup side as well. Thus upon walk, all entries are
      being marked, so the information about which entries are actually
      least recently used is "lost".
      
      In case of Cilium where it can be used (besides others) as a BPF
      based connection tracker, this current behavior causes disruption
      upon control plane changes that need to walk the map from user space
      to evict certain entries. Discussion result from bpfconf [0] was that
      we should simply just remove marking from system call side as no
      good use case could be found where it's actually needed there.
      Therefore this patch removes marking for regular LRU and per-CPU
      flavor. If there ever should be a need in future, the behavior could
      be selected via map creation flag, but due to mentioned reason we
      avoid this here.
      
        [0] http://vger.kernel.org/bpfconf.html
      
      Fixes: 29ba732a ("bpf: Add BPF_MAP_TYPE_LRU_HASH")
      Fixes: 8f844938 ("bpf: Add BPF_MAP_TYPE_LRU_PERCPU_HASH")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      50b045a8
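The effect described in the commit above can be illustrated with a toy model. This is a minimal sketch, assuming a simple second-chance (clock) scheme with a fixed number of slots; it is not the kernel's LRU implementation, and every `toy_*` name is hypothetical. The only point it shows is that a lookup which leaves the reference mark alone does not protect an entry from eviction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_LRU_SLOTS 4   /* hypothetical capacity, chosen for the example */

struct toy_node {
	int  key;
	bool used;        /* slot currently holds an entry */
	bool ref;         /* "recently used" mark */
};

struct toy_lru {
	struct toy_node slot[TOY_LRU_SLOTS];
	size_t hand;      /* next eviction candidate */
};

static struct toy_node *toy_find(struct toy_lru *lru, int key)
{
	for (size_t i = 0; i < TOY_LRU_SLOTS; i++)
		if (lru->slot[i].used && lru->slot[i].key == key)
			return &lru->slot[i];
	return NULL;
}

/* Data-path style lookup: marks the entry as recently used. */
bool toy_lookup_prog(struct toy_lru *lru, int key)
{
	struct toy_node *n = toy_find(lru, key);

	if (n)
		n->ref = true;
	return n != NULL;
}

/* Syscall-side style lookup: leaves the mark untouched. */
bool toy_lookup_sys(struct toy_lru *lru, int key)
{
	return toy_find(lru, key) != NULL;
}

/* Insert a key; returns the evicted key, or -1 if nothing was evicted. */
int toy_update(struct toy_lru *lru, int key)
{
	assert(key >= 0);             /* -1 is reserved as "no eviction" */
	if (toy_find(lru, key))
		return -1;            /* already present */

	for (size_t i = 0; i < TOY_LRU_SLOTS; i++) {
		if (!lru->slot[i].used) {
			lru->slot[i] = (struct toy_node){ .key = key, .used = true };
			return -1;
		}
	}

	/* Sweep: marked entries get a second chance, the first unmarked one goes. */
	for (;;) {
		struct toy_node *n = &lru->slot[lru->hand];
		int victim;

		lru->hand = (lru->hand + 1) % TOY_LRU_SLOTS;
		if (n->ref) {
			n->ref = false;
			continue;
		}
		victim = n->key;
		n->key = key;
		return victim;
	}
}
```

With four entries inserted, looking up key 1 through `toy_lookup_prog()` and key 2 through `toy_lookup_sys()` leaves only key 1 marked, so the next `toy_update()` evicts key 2. A full walk through `toy_lookup_sys()` would likewise leave all marks as they were.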
    • Daniel Borkmann's avatar
      bpf: add map_lookup_elem_sys_only for lookups from syscall side · c6110222
      Daniel Borkmann authored
      
      
      Add a callback map_lookup_elem_sys_only() that map implementations
      could use over map_lookup_elem() from system call side in case the
      map implementation needs to handle the latter differently than from
      the BPF data path. If map_lookup_elem_sys_only() is set, this will
      be preferred pick for map lookups out of user space. This hook is
      used in a follow-up fix for LRU map, but once development window
      opens, we can convert other map types from map_lookup_elem() (here,
      the one called upon BPF_MAP_LOOKUP_ELEM cmd is meant) over to use
      the callback to simplify and clean up the latter.
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      c6110222
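The callback preference described in the commit above could look roughly like the following. This is a user-space sketch only: the callback names `map_lookup_elem` and `map_lookup_elem_sys_only` come from the commit message, while the `toy_*` structures, the simplified signatures, and the mark counter are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct toy_map;

/* Hypothetical, reduced ops table; signatures are simplified. */
struct toy_map_ops {
	void *(*map_lookup_elem)(struct toy_map *map, void *key);
	/* Optional: preferred for lookups coming from the syscall side. */
	void *(*map_lookup_elem_sys_only)(struct toy_map *map, void *key);
};

struct toy_map {
	const struct toy_map_ops *ops;
	int value;   /* single stored value, enough for the example */
	int marks;   /* counts side effects of the data-path lookup */
};

/* Data-path style lookup with a side effect. */
static void *toy_lookup_marking(struct toy_map *map, void *key)
{
	(void)key;
	map->marks++;
	return &map->value;
}

/* Syscall-only lookup without the side effect. */
static void *toy_lookup_quiet(struct toy_map *map, void *key)
{
	(void)key;
	return &map->value;
}

const struct toy_map_ops toy_plain_ops = {
	.map_lookup_elem = toy_lookup_marking,
};

const struct toy_map_ops toy_lru_ops = {
	.map_lookup_elem          = toy_lookup_marking,
	.map_lookup_elem_sys_only = toy_lookup_quiet,
};

/* Lookup on behalf of user space: prefer the sys-only hook when it is set. */
void *toy_syscall_lookup(struct toy_map *map, void *key)
{
	assert(map && map->ops && map->ops->map_lookup_elem);
	if (map->ops->map_lookup_elem_sys_only)
		return map->ops->map_lookup_elem_sys_only(map, key);
	return map->ops->map_lookup_elem(map, key);
}
```

A map type that leaves the optional hook unset keeps its existing behavior, which is why the hook can be introduced first and adopted by individual map types later.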
  3. May 14, 2019
    • Gary Lin's avatar
      tools/bpf: Sync kernel btf.h header · 2474c628
      Gary Lin authored
      
      
      For the fix of BTF_INT_OFFSET().
      
      Signed-off-by: Gary Lin <glin@suse.com>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      2474c628
    • Gary Lin's avatar
      bpf: btf: fix the brackets of BTF_INT_OFFSET() · 948dc8c9
      Gary Lin authored
      'VAL' should be protected by the brackets.
      
      v2:
      * Squash the fix for Documentation/bpf/btf.rst
      
      Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
      Signed-off-by: Gary Lin <glin@suse.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      948dc8c9
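Why the brackets around 'VAL' matter can be shown with a pair of stand-in macros. This is a sketch under assumptions: the `TOY_*` macros are hypothetical, and the mask and shift values are assumed for illustration rather than quoted from the kernel header. The unsafe variant goes wrong as soon as the argument is an expression containing an operator that binds less tightly than `&`.

```c
#include <assert.h>
#include <stdint.h>

/* Parameter used without its own parentheses (the kind of bug being fixed). */
#define TOY_INT_OFFSET_UNSAFE(VAL)  (((VAL  & 0x00ff0000)) >> 16)

/* Parameter protected by parentheses. */
#define TOY_INT_OFFSET(VAL)         (((VAL) & 0x00ff0000) >> 16)

/* Expands to (hi | (lo & 0x00ff0000)) >> 16: the mask misses 'hi'. */
uint32_t toy_offset_unsafe(uint32_t hi, uint32_t lo)
{
	return TOY_INT_OFFSET_UNSAFE(hi | lo);
}

/* Expands to ((hi | lo) & 0x00ff0000) >> 16: the mask covers the whole value. */
uint32_t toy_offset(uint32_t hi, uint32_t lo)
{
	return TOY_INT_OFFSET(hi | lo);
}
```

For `hi = 0x01000000` and `lo = 0x00080000` the protected macro yields 8, while the unprotected one lets the upper byte leak through and yields 0x108. A compiler may warn about the unsafe expansion, which is the intended lesson here.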
    • John Fastabend's avatar
      bpf: sockmap fix msg->sg.size account on ingress skb · cabede8b
      John Fastabend authored
      When converting a skb to msg->sg we forget to set the size after the
      latest ktls/tls code conversion. This path can be reached by doing
      a redir into ingress path from BPF skb sock recv hook. Then trying to
      read the size fails.
      
      Fix this by setting the size.
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      cabede8b
    • John Fastabend's avatar
      bpf: sockmap remove duplicate queue free · c42253cc
      John Fastabend authored
      In tcp bpf remove we free the cork list and purge the ingress msg
      list. However we do this before the ref count reaches zero so it
      could be possible some other access is in progress. In this case
      (tcp close and/or tcp_unhash) we happen to also hold the sock
      lock so no path exists, but let's fix it; otherwise it is extremely
      fragile and breaks the reference counting rules. Also we already
      check the cork list and ingress msg queue and free them once the
      ref count reaches zero, so it's wasteful to check twice.
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      c42253cc
    • John Fastabend's avatar
      bpf: sockmap, only stop/flush strp if it was enabled at some point · 01489436
      John Fastabend authored
      If we try to call strp_done on a parser that has never been
      initialized, because the sockmap user is only using TX side for
      example we get the following error.
      
        [  883.422081] WARNING: CPU: 1 PID: 208 at kernel/workqueue.c:3030 __flush_work+0x1ca/0x1e0
        ...
        [  883.422095] Workqueue: events sk_psock_destroy_deferred
        [  883.422097] RIP: 0010:__flush_work+0x1ca/0x1e0
      
      This had been wrapped in an 'if (psock->parser.enabled)' check, which
      was broken: strp_done() was never actually being called, because
      the strp_stop() we do earlier in the tear down logic sets
      parser.enabled to false. This could result in a use after free
      if work was still in the queue and was resolved by the patch here,
      1d79895a ("sk_msg: Always cancel strp work before freeing the
      psock"). However, calling strp_stop(), done by the patch marked in
      the fixes tag, is only useful if we never initialized a strp parser
      program and never initialized the strp to start with, because if
      we had initialized a stream parser, strp_stop() would have been called
      by sk_psock_drop() earlier in the tear down process. By forcing the
      strp to stop we get past the WARNING in strp_done that checks
      the stopped flag, but calling cancel_work_sync on work that has never
      been initialized is also wrong and generates the warning above.
      
      To fix this, check if the parser program exists. If the program exists
      then the strp work has been initialized and must be sync'd and
      cancelled before free'ing any structures. If no program exists we
      never initialized the stream parser in the first place, so skip the
      sync/cancel logic implemented by strp_done.
      
      Finally, remove the strp_done: it's not needed, and in the case where
      we are using the stream parser it has already been called.
      
      Fixes: e8e34377 ("bpf: Stop the psock parser before canceling its work")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      01489436
    • Stanislav Fomichev's avatar
      bpf: mark bpf_event_notify and bpf_event_init as static · 390e99cf
      Stanislav Fomichev authored
      Both of them are not declared in the headers and not used outside
      of bpf_trace.c file.
      
      Fixes: a38d1107 ("bpf: support raw tracepoints in modules")
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      390e99cf
    • Eric Dumazet's avatar
      bpf: devmap: fix use-after-free Read in __dev_map_entry_free · 2baae354
      Eric Dumazet authored
      synchronize_rcu() is fine when the rcu callbacks only need
      to free memory (kfree_rcu() or direct kfree() call rcu call backs)
      
      __dev_map_entry_free() is a bit more complex, so we need to make
      sure that all queued __dev_map_entry_free() callbacks have completed.
      
      syzbot report:
      
      BUG: KASAN: use-after-free in dev_map_flush_old kernel/bpf/devmap.c:365 [inline]
      BUG: KASAN: use-after-free in __dev_map_entry_free+0x2a8/0x300 kernel/bpf/devmap.c:379
      Read of size 8 at addr ffff8801b8da38c8 by task ksoftirqd/1/18
      
      CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.17.0+ #39
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x1b9/0x294 lib/dump_stack.c:113
        print_address_description+0x6c/0x20b mm/kasan/report.c:256
        kasan_report_error mm/kasan/report.c:354 [inline]
        kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
        __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
        dev_map_flush_old kernel/bpf/devmap.c:365 [inline]
        __dev_map_entry_free+0x2a8/0x300 kernel/bpf/devmap.c:379
        __rcu_reclaim kernel/rcu/rcu.h:178 [inline]
        rcu_do_batch kernel/rcu/tree.c:2558 [inline]
        invoke_rcu_callbacks kernel/rcu/tree.c:2818 [inline]
        __rcu_process_callbacks kernel/rcu/tree.c:2785 [inline]
        rcu_process_callbacks+0xe9d/0x1760 kernel/rcu/tree.c:2802
        __do_softirq+0x2e0/0xaf5 kernel/softirq.c:284
        run_ksoftirqd+0x86/0x100 kernel/softirq.c:645
        smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164
        kthread+0x345/0x410 kernel/kthread.c:240
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
      
      Allocated by task 6675:
        save_stack+0x43/0xd0 mm/kasan/kasan.c:448
        set_track mm/kasan/kasan.c:460 [inline]
        kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
        kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
        kmalloc include/linux/slab.h:513 [inline]
        kzalloc include/linux/slab.h:706 [inline]
        dev_map_alloc+0x208/0x7f0 kernel/bpf/devmap.c:102
        find_and_alloc_map kernel/bpf/syscall.c:129 [inline]
        map_create+0x393/0x1010 kernel/bpf/syscall.c:453
        __do_sys_bpf kernel/bpf/syscall.c:2351 [inline]
        __se_sys_bpf kernel/bpf/syscall.c:2328 [inline]
        __x64_sys_bpf+0x303/0x510 kernel/bpf/syscall.c:2328
        do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 26:
        save_stack+0x43/0xd0 mm/kasan/kasan.c:448
        set_track mm/kasan/kasan.c:460 [inline]
        __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
        kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
        __cache_free mm/slab.c:3498 [inline]
        kfree+0xd9/0x260 mm/slab.c:3813
        dev_map_free+0x4fa/0x670 kernel/bpf/devmap.c:191
        bpf_map_free_deferred+0xba/0xf0 kernel/bpf/syscall.c:262
        process_one_work+0xc64/0x1b70 kernel/workqueue.c:2153
        worker_thread+0x181/0x13a0 kernel/workqueue.c:2296
        kthread+0x345/0x410 kernel/kthread.c:240
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
      
      The buggy address belongs to the object at ffff8801b8da37c0
        which belongs to the cache kmalloc-512 of size 512
      The buggy address is located 264 bytes inside of
        512-byte region [ffff8801b8da37c0, ffff8801b8da39c0)
      The buggy address belongs to the page:
      page:ffffea0006e368c0 count:1 mapcount:0 mapping:ffff8801da800940 index:0xffff8801b8da3540
      flags: 0x2fffc0000000100(slab)
      raw: 02fffc0000000100 ffffea0007217b88 ffffea0006e30cc8 ffff8801da800940
      raw: ffff8801b8da3540 ffff8801b8da3040 0000000100000004 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
        ffff8801b8da3780: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
        ffff8801b8da3800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      > ffff8801b8da3880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                     ^
        ffff8801b8da3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ffff8801b8da3980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      
      Fixes: 546ac1ff ("bpf: add devmap, a map for storing net device references")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: <syzbot+457d3e2ffbcf31aee5c0@syzkaller.appspotmail.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      2baae354
    • Corentin Labbe's avatar
      net: ethernet: stmmac: dwmac-sun8i: enable support of unicast filtering · d4c26eb6
      Corentin Labbe authored
      When adding more MAC addresses to a dwmac-sun8i interface, the device goes
      directly into promiscuous mode.
      This is due to the missing IFF_UNICAST_FLT flag.
      
      Since the hardware supports unicast filtering, let's add IFF_UNICAST_FLT.
      
      Fixes: 9f93ac8d ("net-next: stmmac: Add dwmac-sun8i")
      Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d4c26eb6
    • Grygorii Strashko's avatar
      net: ethernet: ti: netcp_ethss: fix build · a8577e13
      Grygorii Strashko authored
      Fix the reported build failure:
      ERROR: "cpsw_ale_flush_multicast" [drivers/net/ethernet/ti/keystone_netcp_ethss.ko] undefined!
      ERROR: "cpsw_ale_create" [drivers/net/ethernet/ti/keystone_netcp_ethss.ko] undefined!
      ERROR: "cpsw_ale_add_vlan" [drivers/net/ethernet/ti/keystone_netcp_ethss.ko] undefined!
      
      Fixes: 16f54164 ("net: ethernet: ti: cpsw: drop CONFIG_TI_CPSW_ALE config option")
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a8577e13
    • Eric Dumazet's avatar
      flow_dissector: disable preemption around BPF calls · b1c17a9a
      Eric Dumazet authored
      Various things in eBPF really require us to disable preemption
      before running an eBPF program.
      
      syzbot reported :
      
      BUG: assuming atomic context at net/core/flow_dissector.c:737
      in_atomic(): 0, irqs_disabled(): 0, pid: 24710, name: syz-executor.3
      2 locks held by syz-executor.3/24710:
       #0: 00000000e81a4bf1 (&tfile->napi_mutex){+.+.}, at: tun_get_user+0x168e/0x3ff0 drivers/net/tun.c:1850
       #1: 00000000254afebd (rcu_read_lock){....}, at: __skb_flow_dissect+0x1e1/0x4bb0 net/core/flow_dissector.c:822
      CPU: 1 PID: 24710 Comm: syz-executor.3 Not tainted 5.1.0+ #6
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       __cant_sleep kernel/sched/core.c:6165 [inline]
       __cant_sleep.cold+0xa3/0xbb kernel/sched/core.c:6142
       bpf_flow_dissect+0xfe/0x390 net/core/flow_dissector.c:737
       __skb_flow_dissect+0x362/0x4bb0 net/core/flow_dissector.c:853
       skb_flow_dissect_flow_keys_basic include/linux/skbuff.h:1322 [inline]
       skb_probe_transport_header include/linux/skbuff.h:2500 [inline]
       skb_probe_transport_header include/linux/skbuff.h:2493 [inline]
       tun_get_user+0x2cfe/0x3ff0 drivers/net/tun.c:1940
       tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037
       call_write_iter include/linux/fs.h:1872 [inline]
       do_iter_readv_writev+0x5fd/0x900 fs/read_write.c:693
       do_iter_write fs/read_write.c:970 [inline]
       do_iter_write+0x184/0x610 fs/read_write.c:951
       vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
       do_writev+0x15b/0x330 fs/read_write.c:1058
       __do_sys_writev fs/read_write.c:1131 [inline]
       __se_sys_writev fs/read_write.c:1128 [inline]
       __x64_sys_writev+0x75/0xb0 fs/read_write.c:1128
       do_syscall_64+0x103/0x670 arch/x86/entry/common.c:298
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: d58e468b ("flow_dissector: implements flow dissector BPF hook")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Petar Penkov <ppenkov@google.com>
      Cc: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b1c17a9a
    • Jarod Wilson's avatar
      bonding: fix arp_validate toggling in active-backup mode · a9b8a2b3
      Jarod Wilson authored
      There's currently a problem with toggling arp_validate on and off with an
      active-backup bond. At the moment, you can start up a bond, like so:
      
      modprobe bonding mode=1 arp_interval=100 arp_validate=0 arp_ip_target=192.168.1.1
      ip link set bond0 down
      echo "ens4f0" > /sys/class/net/bond0/bonding/slaves
      echo "ens4f1" > /sys/class/net/bond0/bonding/slaves
      ip link set bond0 up
      ip addr add 192.168.1.2/24 dev bond0
      
      Pings to 192.168.1.1 work just fine. Now turn on arp_validate:
      
      echo 1 > /sys/class/net/bond0/bonding/arp_validate
      
      Pings to 192.168.1.1 continue to work just fine. Now when you go to turn
      arp_validate off again, the link falls flat on its face:
      
      echo 0 > /sys/class/net/bond0/bonding/arp_validate
      dmesg
      ...
      [133191.911987] bond0: Setting arp_validate to none (0)
      [133194.257793] bond0: bond_should_notify_peers: slave ens4f0
      [133194.258031] bond0: link status definitely down for interface ens4f0, disabling it
      [133194.259000] bond0: making interface ens4f1 the new active one
      [133197.330130] bond0: link status definitely down for interface ens4f1, disabling it
      [133197.331191] bond0: now running without any active interface!
      
      The problem lies in bond_options.c, where passing in arp_validate=0
      results in bond->recv_probe getting set to NULL. This flies directly in
      the face of commit 3fe68df9, which says we need to set recv_probe =
      bond_arp_rcv, even if we're not using arp_validate. Said commit fixed
      this in bond_option_arp_interval_set, but missed that we can get to that
      same state in bond_option_arp_validate_set as well.
      
      One solution would be to universally set recv_probe = bond_arp_rcv here
      as well, but I don't think bond_option_arp_validate_set has any business
      touching recv_probe at all, and that should be left to the arp_interval
      code, so we can just make things much tidier here.
      
      Fixes: 3fe68df9 ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a9b8a2b3
    • Jerome Brunet's avatar
      net: meson: fixup g12a glue ephy id · 0ecfc7e1
      Jerome Brunet authored
      The phy id chosen by Amlogic is incorrectly set in the mdio mux and
      does not match the phy driver.
      
      It was not detected before because DT forces the use of the correct
      driver for the internal PHY.
      
      Fixes: 70904251 ("net: phy: add amlogic g12a mdio mux support")
      Reported-by: Qi Duan <qi.duan@amlogic.com>
      Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0ecfc7e1
    • Kunihiko Hayashi's avatar
      net: phy: realtek: Replace phy functions with non-locked version in rtl8211e_config_init() · dffe7d2e
      Kunihiko Hayashi authored
      After calling phy_select_page() and until calling phy_restore_page(),
      the mutex 'mdio_lock' is already locked, so the driver should use the
      non-locked versions of the phy functions. Otherwise there will be a
      deadlock on 'mdio_lock'.
      
      This replaces the phy functions called from rtl8211e_config_init() to
      avoid the deadlock issue.
      
      Fixes: f81dadbc ("net: phy: realtek: Add rtl8211e rx/tx delays config")
      Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dffe7d2e
    • Thomas Bogendoerfer's avatar
      net: seeq: fix crash caused by not set dev.parent · 5afcd14c
      Thomas Bogendoerfer authored
      The old MIPS implementation of dma_cache_sync() didn't use the dev argument,
      but commit c9eb6172 ("dma-mapping: turn dma_cache_sync into a
      dma_map_ops method") changed that, so we now need to set dev.parent.
      
      Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5afcd14c
  4. May 13, 2019
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 3ebb41bf
      David S. Miller authored
      
      
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Postpone chain policy update to drop after transaction is complete,
         from Florian Westphal.
      
      2) Add entry to flowtable after confirmation to fix UDP flows with
         packets going in one single direction.
      
      3) Reference count leak in dst object, from Taehee Yoo.
      
      4) Check for TTL field in flowtable datapath, from Taehee Yoo.
      
      5) Fix h323 conntrack helper due to incorrect boundary check,
         from Jakub Jankowski.
      
      6) Fix incorrect rcu dereference when fetching basechain stats,
         from Florian Westphal.
      
      7) Missing error check when adding new entries to flowtable,
         from Taehee Yoo.
      
      8) Use version field in nfnetlink message to honor the nfgen_family
         field, from Kristian Evensen.
      
      9) Remove incorrect configuration check for CONFIG_NF_CONNTRACK_IPV6,
         from Subash Abhinov Kasiviswanathan.
      
      10) Prevent dying entries from being added to the flowtable,
          from Taehee Yoo.
      
      11) Don't hit WARN_ON() with malformed blob in ebtables with
          trailing data after last rule, reported by syzbot, patch
          from Florian Westphal.
      
      12) Remove NFT_CT_TIMEOUT enumeration, never used in the kernel
          code.
      
      13) Fix incorrect definition for NFT_LOGLEVEL_MAX, from Florian
          Westphal.
      
      This batch comes with a conflict that can be fixed with this patch:
      
      diff --cc include/uapi/linux/netfilter/nf_tables.h
      index 7bdb234f3d8c,f0cf7b0f4f35..505393c6e959
      --- a/include/uapi/linux/netfilter/nf_tables.h
      +++ b/include/uapi/linux/netfilter/nf_tables.h
      @@@ -966,6 -966,8 +966,7 @@@ enum nft_socket_keys
         * @NFT_CT_DST_IP: conntrack layer 3 protocol destination (IPv4 address)
         * @NFT_CT_SRC_IP6: conntrack layer 3 protocol source (IPv6 address)
         * @NFT_CT_DST_IP6: conntrack layer 3 protocol destination (IPv6 address)
       - * @NFT_CT_TIMEOUT: connection tracking timeout policy assigned to conntrack
      +  * @NFT_CT_ID: conntrack id
         */
        enum nft_ct_keys {
        	NFT_CT_STATE,
      @@@ -991,6 -993,8 +992,7 @@@
        	NFT_CT_DST_IP,
        	NFT_CT_SRC_IP6,
        	NFT_CT_DST_IP6,
       -	NFT_CT_TIMEOUT,
      + 	NFT_CT_ID,
        	__NFT_CT_MAX
        };
        #define NFT_CT_MAX		(__NFT_CT_MAX - 1)
      
      That replaces the unused NFT_CT_TIMEOUT definition by NFT_CT_ID. If you prefer,
      I can also solve this conflict here, just let me know.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3ebb41bf
    • Petr Štetiar's avatar
      of_net: Fix missing of_find_device_by_node ref count drop · 3ee9ae74
      Petr Štetiar authored
      of_find_device_by_node takes a reference to the embedded struct device
      which needs to be dropped after use.
      
      Fixes: d01f449c ("of_net: add NVMEM support to of_get_mac_address")
      Reported-by: kbuild test robot <lkp@intel.com>
      Reported-by: Julia Lawall <julia.lawall@lip6.fr>
      Signed-off-by: Petr Štetiar <ynezz@true.cz>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3ee9ae74
    • Maxime Chevallier's avatar
      net: mvpp2: cls: Add missing NETIF_F_NTUPLE flag · da86f59f
      Maxime Chevallier authored
      Now that the mvpp2 driver supports classification offloading, we must
      add the NETIF_F_NTUPLE to the features list.
      
      Since the current code doesn't allow disabling the feature, we don't set
      the flag in dev->hw_features.
      
      Fixes: 90b509b3 ("net: mvpp2: cls: Add Classification offload support")
      Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      da86f59f
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 69dda13f
      David S. Miller authored
      
      
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2019-05-13
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix out of bounds backwards jumps due to a bug in dead code
         removal, from Daniel.
      
      2) Fix libbpf users by detecting unsupported BTF kernel features
         and sanitize them before load, from Andrii.
      
      3) Fix undefined behavior in narrow load handling of context
         fields, from Krzesimir.
      
      4) Various BPF uapi header doc/man page fixes, from Quentin.
      
      5) Misc .gitignore fixups to exclude built files, from Kelsey.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      69dda13f
    • Krzesimir Nowak's avatar
      bpf: fix undefined behavior in narrow load handling · e2f7fc0a
      Krzesimir Nowak authored
      Commit 31fd8581 ("bpf: permits narrower load from bpf program
      context fields") made the verifier add AND instructions to clear the
      unwanted bits with a mask when doing a narrow load. The mask is
      computed with
      
        (1 << size * 8) - 1
      
      where "size" is the size of the narrow load. When doing a 4 byte load
      of an 8 byte field the verifier shifts the literal 1 by 32 places to
      the left. This results in an overflow of a signed integer, which is
      undefined behavior. Typically, the computed mask was zero, so the
      result of the narrow load ended up being zero too.
      
      Cast the literal to long long to avoid overflows. Note that narrow
      load of the 4 byte fields does not have the undefined behavior,
      because the load size can only be either 1 or 2 bytes, so shifting 1
      by 8 or 16 places will not overflow it. And reading 4 bytes would not
      be a narrow load of a 4 bytes field.
      
      Fixes: 31fd8581 ("bpf: permits narrower load from bpf program context fields")
      Reviewed-by: Alban Crequy <alban@kinvolk.io>
      Reviewed-by: Iago López Galeiras <iago@kinvolk.io>
      Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
      Cc: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      e2f7fc0a
    • Andrii Nakryiko's avatar
      libbpf: detect supported kernel BTF features and sanitize BTF · d7c4b398
      Andrii Nakryiko authored
      
      
      Depending on the versions of libbpf, Clang, and the kernel in use, it's
      possible to have valid BPF object files with valid BTF information that
      still won't load successfully, because Clang emits newer BTF features
      (e.g., BTF_KIND_FUNC, .BTF.ext's line_info/func_info, BTF_KIND_DATASEC,
      etc.) that are not yet supported by older kernels.
      
      This patch adds detection of the BTF features the kernel supports and
      sanitizes the BPF object's BTF by substituting unsupported kinds with
      supported ones that have a compatible layout:
        - BTF_KIND_FUNC -> BTF_KIND_TYPEDEF
        - BTF_KIND_FUNC_PROTO -> BTF_KIND_ENUM
        - BTF_KIND_VAR -> BTF_KIND_INT
        - BTF_KIND_DATASEC -> BTF_KIND_STRUCT
      
      Replacement is done in such a way as to preserve as much information as
      possible (names, sizes, etc.) without violating the kernel's validation
      rules.
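      
      The substitution table above can be sketched as a small mapping
      function. This is only an illustration, not libbpf's implementation:
      the ex_ names are hypothetical, the kind numbers are redefined locally
      so the sketch stands alone, and grouping the four kinds under two
      feature flags (has_func, has_datasec) is an assumption made here.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the BTF kind numbers, prefixed to avoid clashing with the UAPI header. */
enum ex_btf_kind {
	EX_BTF_KIND_INT        = 1,
	EX_BTF_KIND_STRUCT     = 4,
	EX_BTF_KIND_ENUM       = 6,
	EX_BTF_KIND_TYPEDEF    = 8,
	EX_BTF_KIND_FUNC       = 12,
	EX_BTF_KIND_FUNC_PROTO = 13,
	EX_BTF_KIND_VAR        = 14,
	EX_BTF_KIND_DATASEC    = 15,
};

/*
 * Return the kind to emit for `kind`, given which features the running
 * kernel was detected to support (the two flags are hypothetical).
 */
static int ex_sanitize_kind(int kind, bool has_func, bool has_datasec)
{
	assert(kind >= 0);	/* kinds are small non-negative integers */

	if (!has_func) {
		if (kind == EX_BTF_KIND_FUNC)
			return EX_BTF_KIND_TYPEDEF;
		if (kind == EX_BTF_KIND_FUNC_PROTO)
			return EX_BTF_KIND_ENUM;
	}
	if (!has_datasec) {
		if (kind == EX_BTF_KIND_VAR)
			return EX_BTF_KIND_INT;
		if (kind == EX_BTF_KIND_DATASEC)
			return EX_BTF_KIND_STRUCT;
	}
	return kind;	/* supported as-is, nothing to substitute */
}
```

      A real sanitizer also has to rewrite the fields that follow each type
      header so the substituted kind passes validation; the sketch covers
      only the choice of replacement kind.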
      
      v2->v3:
        - remove duplicate #defines from libbpf_util.h
      
      v1->v2:
        - add internal libbpf_internal.h w/ common stuff
        - switch SK storage BTF to use new libbpf__probe_raw_btf()
      
      Reported-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      d7c4b398
    • Kelsey Skunberg's avatar
      selftests: bpf: Add files generated after build to .gitignore · ff1f28c0
      Kelsey Skunberg authored
      
      
      The following files are generated after building /selftests/bpf/ and
      should be added to .gitignore:
      
      	- libbpf.pc
      	- libbpf.so.*
      
      Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      ff1f28c0
    • Daniel Borkmann's avatar
      Merge branch 'bpf-uapi-doc-fixes' · 6b1d90b7
      Daniel Borkmann authored
      
      
      Quentin Monnet says:
      
      ====================
      Another round of fixes for the doc in the BPF UAPI header, which can be
      turned into a manual page. First patch is the most important, as it fixes
      parsing for the bpf_strtoul() helper doc. Following patches are formatting
      fixes (nitpicks, mostly). The last one updates the copy of the header,
      located under tools/.
      ====================
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      6b1d90b7
    • Quentin Monnet's avatar
      tools: bpf: synchronise BPF UAPI header with tools · c1fe1e70
      Quentin Monnet authored
      
      
      Synchronise the bpf.h header under tools to pick up the fixes and
      additions recently made to the documentation for the BPF helpers.
      
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      c1fe1e70
    • Quentin Monnet's avatar
      bpf: fix minor issues in documentation for BPF helpers. · 80867c5e
      Quentin Monnet authored
      
      
      This commit brings many minor fixes to the documentation for BPF helper
      functions. Mostly, this is limited to formatting fixes and improvements.
      In particular, fix broken formatting for bpf_skb_adjust_room().
      
      Besides formatting, replace the mention of "bpf_fullsock()" (that is not
      associated with any function or type exposed to the user) in the
      description of bpf_sk_storage_get() by "full socket".
      
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      80867c5e
    • Quentin Monnet's avatar
      bpf: fix recurring typo in documentation for BPF helpers · 32e7dc28
      Quentin Monnet authored
      
      
      "Underlaying packet buffer" should be an "underlying" one, in the
      warning about invalidated data and data_end pointers. Through
      copy-and-paste, the typo occurred no fewer than 19 times in the
      documentation. Let's fix it.
      
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      32e7dc28
    • Quentin Monnet's avatar
      bpf: fix script for generating man page on BPF helpers · 748c7c82
      Quentin Monnet authored
      
      
      The script broke on parsing function prototype for bpf_strtoul(). This
      is because the last argument for the function is a pointer to an
      "unsigned long". The current version of the script only accepts "const"
      and "struct", but not "unsigned", at the beginning of argument types
      made of several words.
      
      One solution could be to add "unsigned" to the list, but the issue could
      come up again in the future (what about "long int"?). It turns out we do
      not need to have such restrictions on the words: so let's simply accept
      any series of words instead.
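      
      To make the difference concrete, the sketch below compares a
      restrictive argument pattern with a permissive one using POSIX extended
      regular expressions. The script itself is not written in C, and both
      patterns are simplified stand-ins invented for this example, not the
      script's actual expressions; ex_matches() is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Restrictive: only an optional "const " and/or "struct " before the type word. */
#define EX_OLD_ARG "^(const )?(struct )?[A-Za-z0-9_]+( \\**[A-Za-z0-9_]+)?$"

/* Permissive: any series of words before the last type word. */
#define EX_NEW_ARG "^([A-Za-z0-9_]+ )*[A-Za-z0-9_]+( \\**[A-Za-z0-9_]+)?$"

/* Return true if `text` matches the POSIX extended regex `pattern`. */
static bool ex_matches(const char *pattern, const char *text)
{
	regex_t re;
	bool ok;

	assert(pattern != NULL && text != NULL);

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;	/* treat an invalid pattern as "no match" */

	ok = regexec(&re, text, 0, NULL, 0) == 0;
	regfree(&re);
	return ok;
}
```

      Under these stand-in patterns, "unsigned long *res" is rejected by the
      restrictive form and accepted by the permissive one, while arguments
      such as "struct bpf_map *map" match both.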
      
      Reported-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      748c7c82
    • Daniel Borkmann's avatar
      bpf: add various test cases for backward jumps · 98583812
      Daniel Borkmann authored
      
      
      Add a couple of tests to make sure branch(/call) offset adjustments
      are correctly performed.
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      98583812
    • Hariprasad Kelam's avatar
      net: dccp : proto: remove Unneeded variable "err" · 3285a9aa
      Hariprasad Kelam authored
      
      
      Fix the issue below, reported by coccicheck:
      
      net/dccp/proto.c:266:5-8: Unneeded variable: "err". Return "0" on line 310
      
      Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3285a9aa
    • David S. Miller's avatar
      Merge branch 'dsa-Fix-a-bug-and-avoid-dangerous-usage-patterns' · 08b0dec4
      David S. Miller authored
      
      
      Vladimir Oltean says:
      
      ====================
      Fix a bug and avoid dangerous usage patterns around DSA_SKB_CB
      
      Making DSA use the sk_buff control block was my idea during the
      'Traffic-support-for-SJA1105-DSA-driver' patchset, and I had also
      introduced a series of macro helpers that turned out to not be so
      helpful:
      
      1. DSA_SKB_ZERO() zeroizes the 48-byte skb->cb area, but due to the high
         performance impact in the hotpath it was only intended to be called
         from the timestamping path. Not zeroizing elsewhere turned out to
         expose a read of an uninitialized member field of DSA_SKB_CB. So
         remove this macro and, in the future, initialize by hand whatever
         needs initialization.
      2. DSA_SKB_CLONE() contains a flaw in its body definition (originally
         put there to silence checkpatch.pl) and is unusable at this point
         (it would only cause NULL pointer dereferences when used). So
         remove it.
      3. For DSA_SKB_COPY() the same performance considerations apply as above,
         so it's best to prune this function before it reaches a stable
         kernel and potentially gains users.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      08b0dec4
    • Vladimir Oltean's avatar
      net: dsa: Remove the now unused DSA_SKB_CB_COPY() macro · 1c9b1420
      Vladimir Oltean authored
      It's best to not expose this, due to the performance hit it may cause
      when calling it.
      
      Fixes: b68b0dd0 ("net: dsa: Keep private info in the skb->cb")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1c9b1420
    • Vladimir Oltean's avatar
      net: dsa: Remove dangerous DSA_SKB_CLONE() macro · 506f0e09
      Vladimir Oltean authored
      This does not cause any bug now because it has no users, but its body
      contains two pointer definitions within a code block:
      
      		struct sk_buff *clone = _clone;	\
      		struct sk_buff *skb = _skb;	\
      
      When calling the macro as DSA_SKB_CLONE(clone, skb), these variables
      would shadow the arguments that the macro was called with, so each
      initializer would read the very variable it declares instead of the
      caller's value (undefined behavior, by the way, but GCC nicely puts
      NULL pointers instead).
      
      So simply remove this broken macro and leave users to simply call
      "DSA_SKB_CB(skb)->clone = clone" by hand when needed.
      
      There is one functional difference when doing what I just suggested
      above: the control block won't be transferred from the original skb into
      the clone. Since there's no foreseen need for the control block in the
      clone at the moment, this is ok.
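      
      The shadowing problem described above can be avoided in general by
      giving macro-local variables names a caller is unlikely to use, or by
      using an inline function. The sketch below shows both; struct ex_skb,
      EX_SET_CLONE() and ex_set_clone() are hypothetical stand-ins for
      illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb that remembers its clone. */
struct ex_skb {
	struct ex_skb *clone;
};

/*
 * Broken shape, shown only as a comment: if the macro body declares
 *     struct ex_skb *clone = _clone;
 * and is invoked with an argument that is itself named `clone`, the
 * preprocessor produces
 *     struct ex_skb *clone = clone;
 * so the new local is initialized from itself, not from the caller's value.
 */

/* Macro form with locals whose names will not collide with typical arguments. */
#define EX_SET_CLONE(_clone, _skb)			\
	do {						\
		struct ex_skb *ex_c_ = (_clone);	\
		struct ex_skb *ex_s_ = (_skb);		\
		ex_s_->clone = ex_c_;			\
	} while (0)

/* Function form: parameters are ordinary variables, so nothing can be shadowed. */
static inline void ex_set_clone(struct ex_skb *skb, struct ex_skb *clone)
{
	assert(skb != NULL);
	skb->clone = clone;
}
```

      Both forms remain correct when the caller's variables happen to be
      named clone and skb, which is exactly the case that broke the original
      macro.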
      
      Fixes: b68b0dd0 ("net: dsa: Keep private info in the skb->cb")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      506f0e09
    • Vladimir Oltean's avatar
      net: dsa: Initialize DSA_SKB_CB(skb)->deferred_xmit variable · 87671375
      Vladimir Oltean authored
      The sk_buff control block can have any contents on xmit put there by the
      stack, so initialization is mandatory, since we are checking its value
      after the actual DSA xmit (the tagger may have changed it).
      
      The DSA_SKB_ZERO() macro could have been used for this purpose, but:
      - Zeroizing a 48-byte memory region in the hotpath is best avoided.
      - It would have triggered a warning with newer compilers since
        __dsa_skb_cb contains a structure within a structure, and the {0}
        initializer was incorrect for that purpose.
      
      So simply remove the DSA_SKB_ZERO() macro and initialize the
      deferred_xmit variable by hand (which should be done for all further
      dsa_skb_cb variables which need initialization - currently none - to
      avoid the performance penalty).
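      
      The trade-off described above, clearing the whole 48-byte control block
      versus setting a single field, can be sketched as follows. The types
      and function names are hypothetical stand-ins, not the kernel's; a
      union is used here only so the example is well-defined in plain C.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define EX_CB_SIZE 48	/* size of the control block, per the commit message */

/* Hypothetical stand-in for the DSA view of the control block. */
struct ex_dsa_skb_cb {
	bool deferred_xmit;
};

/* Hypothetical stand-in for struct sk_buff, reduced to its control block. */
struct ex_skb {
	union {
		unsigned char raw[EX_CB_SIZE];
		struct ex_dsa_skb_cb dsa;
	} cb;
};

/* The avoided approach: clear all 48 bytes on every transmit. */
static void ex_cb_zero(struct ex_skb *skb)
{
	assert(skb != NULL);
	memset(skb->cb.raw, 0, sizeof(skb->cb.raw));
}

/* The approach the commit describes: set only the field that is read back later. */
static void ex_cb_init(struct ex_skb *skb)
{
	assert(skb != NULL);
	skb->cb.dsa.deferred_xmit = false;
}
```

      ex_cb_init() leaves the other bytes of the control block untouched, so
      any further field that is read before being written would need its own
      explicit initialization.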
      
      Fixes: 97a69a0d ("net: dsa: Add support for deferred xmit")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      87671375