  1. Mar 31, 2022
  2. Mar 30, 2022
  3. Mar 29, 2022
    • net: lan966x: fix kernel oops on ioctl when I/F is down · ad7da1ce
      Michael Walle authored
      ioctls handled by phy_mii_ioctl() will cause a kernel oops when the
      interface is down. Fix it by making sure there is a PHY attached.
      
      Fixes: 735fec99 ("net: lan966x: Implement SIOCSHWTSTAMP and SIOCGHWTSTAMP")
      Signed-off-by: Michael Walle <michael@walle.cc>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20220328220350.3118969-1-michael@walle.cc
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      ad7da1ce
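The guard described in the lan966x fix above can be sketched in user space as follows. This is a minimal model, not the driver code: `mock_phy`, `mock_netdev`, `mock_phy_mii_ioctl()` and `mock_ioctl()` are hypothetical stand-ins, and both the assumption that the PHY pointer is NULL while the interface is down and the choice of `-ENODEV` as the error are illustrative.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's PHY and net device structures. */
struct mock_phy {
    int last_cmd;
};

struct mock_netdev {
    struct mock_phy *phydev; /* assumed NULL while the interface is down */
};

/* Stand-in for phy_mii_ioctl(): it dereferences the PHY unconditionally. */
static int mock_phy_mii_ioctl(struct mock_phy *phy, int cmd)
{
    phy->last_cmd = cmd;
    return 0;
}

/* ioctl handler with the guard: bail out when no PHY is attached. */
int mock_ioctl(struct mock_netdev *dev, int cmd)
{
    if (!dev->phydev)
        return -ENODEV; /* error code chosen for illustration */

    return mock_phy_mii_ioctl(dev->phydev, cmd);
}
```

Without the NULL check, calling `mock_ioctl()` on a device whose `phydev` is NULL would dereference a NULL pointer, which is the user-space analogue of the oops.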
    • Merge branch 'fix-uaf-bugs-caused-by-ax25_release' · 807ca64e
      Paolo Abeni authored
      Duoming Zhou says:
      
      ====================
      Fix UAF bugs caused by ax25_release()
      
      The first patch fixes UAF bugs in ax25_send_control, and
      the second patch fixes UAF bugs in ax25 timers.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1648472006.git.duoming@zju.edu.cn
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      807ca64e
    • ax25: Fix UAF bugs in ax25 timers · 82e31755
      Duoming Zhou authored
      
      
      There are race conditions that may lead to UAF bugs in
      ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(),
      ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we call
      ax25_release() to deallocate ax25_dev.
      
      One of the UAF bugs caused by ax25_release() is shown below:
      
            (Thread 1)                    |      (Thread 2)
      ax25_dev_device_up() //(1)          |
      ...                                 | ax25_kill_by_device()
      ax25_bind()          //(2)          |
      ax25_connect()                      | ...
       ax25_std_establish_data_link()     |
        ax25_start_t1timer()              | ax25_dev_device_down() //(3)
         mod_timer(&ax25->t1timer,..)     |
                                          | ax25_release()
         (wait a time)                    |  ...
                                          |  ax25_dev_put(ax25_dev) //(4)FREE
         ax25_t1timer_expiry()            |
          ax25->ax25_dev->values[..] //USE|  ...
           ...                            |
      
      We increase the refcount of ax25_dev at positions (1) and (2), and
      decrease the refcount of ax25_dev at positions (3) and (4).
      The ax25_dev is freed at position (4) and then used in
      ax25_t1timer_expiry().
      
      The failure log is shown below:
      ==============================================================
      
      [  106.116942] BUG: KASAN: use-after-free in ax25_t1timer_expiry+0x1c/0x60
      [  106.116942] Read of size 8 at addr ffff88800bda9028 by task swapper/0/0
      [  106.116942] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-06123-g0905eec574
      [  106.116942] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-14
      [  106.116942] Call Trace:
      ...
      [  106.116942]  ax25_t1timer_expiry+0x1c/0x60
      [  106.116942]  call_timer_fn+0x122/0x3d0
      [  106.116942]  __run_timers.part.0+0x3f6/0x520
      [  106.116942]  run_timer_softirq+0x4f/0xb0
      [  106.116942]  __do_softirq+0x1c2/0x651
      ...
      
      This patch adds del_timer_sync() in ax25_release(), which ensures
      that all timers have stopped before we deallocate ax25_dev.
      
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      82e31755
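The ordering problem in the ax25 timer commit above can be illustrated with a small single-threaded model. It is only a sketch: all `mock_*` names are hypothetical, a boolean flag stands in for the actual free, and `mock_del_timer_sync()` merely imitates the effect of del_timer_sync() (the callback can no longer run afterwards).

```c
#include <stdbool.h>

/* Hypothetical single-threaded model; none of these names are kernel APIs. */
struct mock_dev {
    bool freed;
};

struct mock_timer {
    bool pending;
    struct mock_dev *dev;
};

/* Imitates del_timer_sync(): afterwards the callback can no longer run. */
void mock_del_timer_sync(struct mock_timer *t)
{
    t->pending = false;
}

/* Models the release path; cancel_first selects the patched ordering. */
void mock_release(struct mock_timer *t, bool cancel_first)
{
    if (cancel_first)
        mock_del_timer_sync(t);
    t->dev->freed = true; /* stands in for the final ax25_dev_put() */
}

/* Models timer expiry: 0 = did not run, 1 = ran safely, -1 = used a freed device. */
int mock_timer_expire(struct mock_timer *t)
{
    if (!t->pending)
        return 0;
    t->pending = false;
    return t->dev->freed ? -1 : 1;
}
```

With `cancel_first` set to false, a pending timer that expires after `mock_release()` reports -1, the modelled use-after-free. With `cancel_first` set to true, the timer never runs.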
    • ax25: fix UAF bug in ax25_send_control() · 5352a761
      Duoming Zhou authored
      There are UAF bugs in ax25_send_control(), when we call ax25_release()
      to deallocate ax25_dev. The possible race condition is shown below:
      
            (Thread 1)              |     (Thread 2)
      ax25_dev_device_up() //(1)    |
                                    | ax25_kill_by_device()
      ax25_bind()          //(2)    |
      ax25_connect()                | ...
       ax25->state = AX25_STATE_1   |
       ...                          | ax25_dev_device_down() //(3)
      
            (Thread 3)
      ax25_release()                |
       ax25_dev_put()  //(4) FREE   |
       case AX25_STATE_1:           |
        ax25_send_control()         |
         alloc_skb()       //USE    |
      
      The refcount of ax25_dev increases at positions (1) and (2), and
      decreases at positions (3) and (4). The ax25_dev is freed
      before the dereference sites in ax25_send_control().
      
      The following is part of the report:
      
      [  102.297448] BUG: KASAN: use-after-free in ax25_send_control+0x33/0x210
      [  102.297448] Read of size 8 at addr ffff888009e6e408 by task ax25_close/602
      [  102.297448] Call Trace:
      [  102.303751]  ax25_send_control+0x33/0x210
      [  102.303751]  ax25_release+0x356/0x450
      [  102.305431]  __sock_release+0x6d/0x120
      [  102.305431]  sock_close+0xf/0x20
      [  102.305431]  __fput+0x11f/0x420
      [  102.305431]  task_work_run+0x86/0xd0
      [  102.307130]  get_signal+0x1075/0x1220
      [  102.308253]  arch_do_signal_or_restart+0x1df/0xc00
      [  102.308253]  exit_to_user_mode_prepare+0x150/0x1e0
      [  102.308253]  syscall_exit_to_user_mode+0x19/0x50
      [  102.308253]  do_syscall_64+0x48/0x90
      [  102.308253]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  102.308253] RIP: 0033:0x405ae7
      
      This patch defers freeing ax25_dev and net_device until after
      all corresponding dereference sites in ax25_release() to avoid the UAF.
      
      Fixes: 9fd75b66 ("ax25: Fix refcount leaks caused by ax25_cb_del()")
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      5352a761
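The fix in the ax25_send_control() commit above amounts to dropping the last reference only after the final dereference. A minimal refcount model, assuming hypothetical `mock_*` names and a flag in place of the real free, might look like this:

```c
#include <stdbool.h>

/* Hypothetical refcounted device; "freed" stands in for the real kfree(). */
struct mock_ax25_dev {
    int refcnt;
    bool freed;
};

void mock_dev_put(struct mock_ax25_dev *dev)
{
    if (--dev->refcnt == 0)
        dev->freed = true;
}

/*
 * Models the release path. Returns true if the device was used after it
 * had been freed. put_first selects the pre-patch ordering.
 */
bool mock_release(struct mock_ax25_dev *dev, bool put_first)
{
    bool used_after_free;

    if (put_first)
        mock_dev_put(dev);

    /* Stands in for ax25_send_control(), which dereferences the device. */
    used_after_free = dev->freed;

    if (!put_first)
        mock_dev_put(dev); /* deferred put: after the last dereference */

    return used_after_free;
}
```

Both orderings end with the device freed. Only the deferred put guarantees that the dereference happens while a reference is still held.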
    • openvswitch: Fixed nd target mask field in the flow dump. · f19c4445
      Martin Varghese authored
      The IPv6 nd target mask was not getting populated in the flow dump.
      
      In the function __ovs_nla_put_key(), the icmp code mask field was checked
      instead of the icmp code key field to classify the flow as neighbour discovery.
      
      ufid:bdfbe3e5-60c2-43b0-a5ff-dfcac1c37328, recirc_id(0),dp_hash(0/0),
      skb_priority(0/0),in_port(ovs-nm1),skb_mark(0/0),ct_state(0/0),
      ct_zone(0/0),ct_mark(0/0),ct_label(0/0),
      eth(src=00:00:00:00:00:00/00:00:00:00:00:00,
      dst=00:00:00:00:00:00/00:00:00:00:00:00),
      eth_type(0x86dd),
      ipv6(src=::/::,dst=::/::,label=0/0,proto=58,tclass=0/0,hlimit=0/0,frag=no),
      icmpv6(type=135,code=0),
      nd(target=2001::2/::,
      sll=00:00:00:00:00:00/00:00:00:00:00:00,
      tll=00:00:00:00:00:00/00:00:00:00:00:00),
      packets:10, bytes:860, used:0.504s, dp:ovs, actions:ovs-nm2
      
      Fixes: e6445719 (openvswitch: Restructure datapath.c and flow.c)
      Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
      Link: https://lore.kernel.org/r/20220328054148.3057-1-martinvarghesenokia@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      f19c4445
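The key-versus-mask mix-up in the openvswitch commit above can be sketched with a reduced model. Everything here is hypothetical: `mock_flow_key` is not the kernel's flow key, and the classifying field is modelled as the ICMPv6 type (135 for neighbour solicitation, 136 for neighbour advertisement), which is an assumption; the commit message refers to it as the icmp code field.

```c
#include <stdbool.h>
#include <stdint.h>

/* ICMPv6 types for neighbour solicitation and advertisement (RFC 4861). */
#define MOCK_ND_SOLICIT 135
#define MOCK_ND_ADVERT  136

/* Hypothetical, heavily reduced stand-in for a flow key or its mask. */
struct mock_flow_key {
    uint16_t icmp_type;
};

static bool mock_is_nd(uint16_t icmp_type)
{
    return icmp_type == MOCK_ND_SOLICIT || icmp_type == MOCK_ND_ADVERT;
}

/*
 * Pre-fix behaviour: classify using the structure being dumped ("output"),
 * which is the mask when the mask is being dumped.
 */
bool mock_emit_nd_old(const struct mock_flow_key *key,
                      const struct mock_flow_key *output)
{
    (void)key;
    return mock_is_nd(output->icmp_type);
}

/* Fixed behaviour: always classify using the key. */
bool mock_emit_nd_fixed(const struct mock_flow_key *key,
                        const struct mock_flow_key *output)
{
    (void)output;
    return mock_is_nd(key->icmp_type);
}
```

With an exact-match mask of 0xffff, the old check compares 0xffff against 135 and 136, so the nd attributes are skipped for the mask and the target mask is dumped as `::`.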
    • selftests/bpf: Fix clang compilation errors · ccaff3d5
      Yonghong Song authored
      An upstream LLVM patch ([1]) added a warning for code like
        void test() {
          int j = 0;
          for (int i = 0; i < 1000; i++)
                  j++;
          return;
        }
      
      This triggered several errors in the selftests/bpf build since
      the -Werror compilation flag is used.
        ...
        test_lpm_map.c:212:15: error: variable 'n_matches' set but not used [-Werror,-Wunused-but-set-variable]
              size_t i, j, n_matches, n_matches_after_delete, n_nodes, n_lookups;
                           ^
        test_lpm_map.c:212:26: error: variable 'n_matches_after_delete' set but not used [-Werror,-Wunused-but-set-variable]
              size_t i, j, n_matches, n_matches_after_delete, n_nodes, n_lookups;
                                      ^
        ...
        prog_tests/get_stack_raw_tp.c:32:15: error: variable 'cnt' set but not used [-Werror,-Wunused-but-set-variable]
              static __u64 cnt;
                           ^
        ...
      
        For test_lpm_map.c, 'n_matches'/'n_matches_after_delete' are changed to be volatile
        in order to silence the warning. I didn't remove these two declarations since
        they are referenced in commented-out code which might be used by people in certain
        cases. For get_stack_raw_tp.c, the variable 'cnt' is removed.
      
        [1] https://reviews.llvm.org/D122271
      
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220325200304.2915588-1-yhs@fb.com
      ccaff3d5
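The volatile workaround described in the selftests/bpf commit above can be sketched as follows. The function `mock_scan()` and its parameters are hypothetical and only mirror the shape of the problem: a counter that is incremented but never read.

```c
#include <stddef.h>

/*
 * Hypothetical example: n_matches is only ever incremented. Declared as a
 * plain size_t it is "set but not used"; the commit above silences the
 * warning by declaring such counters volatile.
 */
size_t mock_scan(const int *values, size_t n, int wanted)
{
    volatile size_t n_matches = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (values[i] == wanted)
            n_matches++;
    }

    /* n_matches is deliberately never read; debugging code could print it. */
    return i;
}
```

Removing the variable, as was done for `cnt` in get_stack_raw_tp.c, is the cleaner option when nothing else refers to it.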
    • Merge branch 'xsk: another round of fixes' · 9e928831
      Alexei Starovoitov authored
      
      
      Maciej Fijalkowski says:
      
      ====================
      
      Hello,
      
      yet another round of fixes for XSK from Magnus and me.
      
      Magnus addresses the fact that xp_alloc() can return NULL, so this needs
      to be handled to avoid clearing entries in the SW ring on the driver side.
      Then he addresses the off-by-one problem in the Tx descriptor cleaning
      routine of the ice ZC driver.
      
      From my side, I am adding protection to the ZC Rx processing loop so that
      cleaning of descriptors does not go over already processed entries.
      Then I also fix an issue with assigning the XSK pool to Tx queues.
      
      This is directed to bpf tree.
      
      Thanks!
      
      Maciej Fijalkowski (2):
        ice: xsk: stop Rx processing when ntc catches ntu
        ice: xsk: fix indexing in ice_tx_xsk_pool()
      ====================
      
      Acked-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      9e928831
    • ice: xsk: Fix indexing in ice_tx_xsk_pool() · 1ac2524d
      Maciej Fijalkowski authored
      The ice driver always tries to create the XDP rings array with
      num_possible_cpus() entries, regardless of the user's queue count
      setting, which can be changed via ethtool -L, for example.
      
      Currently, ice_tx_xsk_pool() calculates the qid by subtracting the
      count of XDP queues from ring->q_index, but ring->q_index is set to
      'i + vsi->alloc_txq'.
      
      When the user runs ethtool -L $IFACE combined 1, alloc_txq is 1, but
      vsi->num_xdp_txq is still num_possible_cpus(). ice_tx_xsk_pool() then
      performs an out-of-bounds access and, as a result, the ring does not
      get an xsk_pool pointer assigned. Each ice_xsk_wakeup() call then
      fails with an error, and it is not possible to get into NAPI and do
      the processing on the driver side.
      
      Fix this by subtracting vsi->alloc_txq instead of vsi->num_xdp_txq from
      ring->q_index in ice_tx_xsk_pool(), so that the calculation matches the
      way ring->q_index is set.
      
      Fixes: 22bf877e ("ice: introduce XDP_TX fallback path")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220328142123.170157-5-maciej.fijalkowski@intel.com
      1ac2524d
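The index arithmetic from the ice_tx_xsk_pool() commit above can be worked through with concrete numbers. This is a sketch under stated assumptions: `mock_vsi` and the helper names are hypothetical, the queue id is assumed to be a 16-bit unsigned value, and 8 possible CPUs are assumed for the example.

```c
#include <stdint.h>

/* Hypothetical reduction of the VSI fields named in the commit message. */
struct mock_vsi {
    uint16_t alloc_txq;   /* Tx queues after "ethtool -L ... combined N" */
    uint16_t num_xdp_txq; /* stays at num_possible_cpus() */
};

/* XDP ring i is given q_index = i + alloc_txq, as stated in the commit. */
uint16_t mock_xdp_q_index(const struct mock_vsi *vsi, uint16_t i)
{
    return (uint16_t)(i + vsi->alloc_txq);
}

/* Pre-fix calculation: subtracts the number of XDP queues. */
uint16_t mock_qid_old(const struct mock_vsi *vsi, uint16_t q_index)
{
    return (uint16_t)(q_index - vsi->num_xdp_txq);
}

/* Fixed calculation: undoes exactly the offset applied to q_index. */
uint16_t mock_qid_fixed(const struct mock_vsi *vsi, uint16_t q_index)
{
    return (uint16_t)(q_index - vsi->alloc_txq);
}
```

With `alloc_txq = 1` and `num_xdp_txq = 8`, XDP ring 0 has `q_index = 1`. The old calculation yields 1 - 8, which wraps to 65529 in 16 bits and indexes far out of bounds, while the fixed calculation yields 0.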
    • ice: xsk: Stop Rx processing when ntc catches ntu · 0ec17130
      Maciej Fijalkowski authored
      ntc can catch up with ntu when budget values are big and re-filling of
      descriptors breaks down, as we do not clear the entry that ntu points
      to at the end of ice_alloc_rx_bufs_zc(). So if ntc is at ntu, it might
      be the case that status_error0 has an old, uncleared value, and ntc
      would run past ntu, processing stale descriptors and producing bogus
      results.
      
      Break Rx loop when ntc == ntu to avoid broken behavior.
      
      Fixes: 3876ff52 ("ice: xsk: Handle SW XDP ring wrap and bump tail more often")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220328142123.170157-4-maciej.fijalkowski@intel.com
      0ec17130
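The loop guard from the Rx commit above can be modelled with a small ring. This is a sketch, not the driver's Rx path: `mock_rx_ring`, `MOCK_RING_SIZE` and `mock_clean_rx()` are hypothetical, and the `dd` array stands in for the descriptor-done status that is left uncleared after processing.

```c
#include <stdbool.h>
#include <stdint.h>

#define MOCK_RING_SIZE 8u

/* Hypothetical Rx ring: dd[] holds stale "descriptor done" flags. */
struct mock_rx_ring {
    bool dd[MOCK_RING_SIZE];
    uint32_t ntc; /* next to clean */
    uint32_t ntu; /* next to use (refill position) */
};

/* Returns the number of descriptors processed within the budget. */
uint32_t mock_clean_rx(struct mock_rx_ring *ring, uint32_t budget,
                       bool stop_at_ntu)
{
    uint32_t done = 0;

    while (done < budget) {
        /* The guard: never walk past what was handed to hardware. */
        if (stop_at_ntu && ring->ntc == ring->ntu)
            break;

        /* Stale flags are never cleared, so this alone is not enough. */
        if (!ring->dd[ring->ntc])
            break;

        ring->ntc = (ring->ntc + 1) % MOCK_RING_SIZE;
        done++;
    }

    return done;
}
```

With every `dd` flag stale and set, `ntc = 2`, `ntu = 5` and a budget of 64, the guarded loop stops after 3 descriptors, while the unguarded loop keeps wrapping around the ring until the budget is exhausted.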
    • ice: xsk: Eliminate unnecessary loop iteration · 30d19d57
      Magnus Karlsson authored
      The NIC Tx ring completion routine cleans entries from the ring in
      batches. However, it processes one more batch than it is supposed
      to. Note that this does not matter from a functionality point of view
      since it will not find a set DD bit for the next batch and just exit
      the loop. But from a performance perspective, it is faster to
      terminate the loop earlier and not issue an expensive read over PCIe to
      get the DD bit.
      
      Fixes: 126cdfe1 ("ice: xsk: Improve AF_XDP ZC Tx and use batching API")
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220328142123.170157-3-maciej.fijalkowski@intel.com
      30d19d57
    • xsk: Do not write NULL in SW ring at allocation failure · a95a4d9b
      Magnus Karlsson authored
      For the case when xp_alloc_batch() is used but the batched allocation
      cannot be used, there is a slow path that uses the non-batched
      xp_alloc(). When it fails to allocate an entry, it returns NULL. The
      current code wrote this NULL into the entry of the provided results
      array (pointer to the driver SW ring usually) and returned. This might
      not be what the driver expects, so to make things simpler, only write
      successfully allocated xdp_buffs into the SW ring. The driver might
      have information in there that is still important after an allocation
      failure.
      
      Note that at this point in time, there are no drivers using
      xp_alloc_batch() that could trigger this slow path. But one might get
      added.
      
      Fixes: 47e4075d ("xsk: Batched buffer allocation for the pool")
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220328142123.170157-2-maciej.fijalkowski@intel.com
      a95a4d9b
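The behaviour described in the xsk commit above (only successful allocations are written into the results array) can be sketched like this. The pool, buffer type and function names are hypothetical, and the loop over several entries is a generalisation for illustration, not a copy of the xp_alloc_batch() slow path.

```c
#include <stddef.h>

/* Hypothetical buffer and pool types, unrelated to the real xsk pool. */
struct mock_buf {
    int id;
};

struct mock_pool {
    struct mock_buf *free_bufs;
    size_t n_free;
};

/* Non-batched allocation: returns NULL when the pool is empty. */
struct mock_buf *mock_alloc(struct mock_pool *pool)
{
    if (pool->n_free == 0)
        return NULL;
    return &pool->free_bufs[--pool->n_free];
}

/*
 * Slow path: fill out[] one entry at a time and return how many entries
 * were written. On failure the current slot is left untouched, so whatever
 * the caller stored there survives.
 */
size_t mock_alloc_slow(struct mock_pool *pool, struct mock_buf **out,
                       size_t max)
{
    size_t i;

    for (i = 0; i < max; i++) {
        struct mock_buf *buf = mock_alloc(pool);

        if (!buf)
            break; /* do not write NULL into out[i] */
        out[i] = buf;
    }

    return i;
}
```

A caller that keeps state in unused slots of `out[]` can rely on the return value alone to know which entries were replaced.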
    • Merge branch 'kprobes: rethook: x86: Replace kretprobe trampoline with rethook' · 7df482e6
      Alexei Starovoitov authored
      Masami Hiramatsu says:
      
      ====================
      Here is the 3rd version of the series for generic kretprobe and
      kretprobe on x86, replacing the kretprobe trampoline with rethook.
      The previous version is here[1].
      
      [1] https://lore.kernel.org/all/164821817332.2373735.12048266953420821089.stgit@devnote2/T/#u
      
      This version fixes typos and build issues for bpf-next and the
      CONFIG_RETHOOK=y error. I also added temporary mitigation lines for the
      ANNOTATE_NOENDBR macro issue on the bpf-next tree [2/4].
      
      This will be removed after the kernel IBT series is merged.
      
      Background:
      
      This rethook came from Jiri's request for multiple kprobe support in bpf[2].
      He tried to solve an issue where starting bpf with multiple kprobes
      takes a long time because bpf-kprobe waits for an RCU grace period
      to sync RCU events.
      
      Jiri wanted to attach a single bpf handler to multiple kprobes, and
      he tried to introduce a multiple-probe interface to kprobes. So I asked
      him to use ftrace and a kretprobe-like hook if it is only for
      function entry and exit, instead of adding an ad-hoc interface
      to kprobes.
      For this purpose, I introduced fprobe (a kprobe-like interface for
      ftrace) with rethook (a generic return hook feature for the
      fprobe exit handler)[3].
      
      [2] https://lore.kernel.org/all/20220104080943.113249-1-jolsa@kernel.org/T/#u
      [3] https://lore.kernel.org/all/164191321766.806991.7930388561276940676.stgit@devnote2/T/#u
      
      
      
      The rethook is basically the same as the kretprobe trampoline. I just
      decoupled it from kprobes. Eventually, all the arch-dependent kretprobe
      trampolines will be replaced with the rethook trampoline, rather than
      cloned, and those architectures will set HAVE_RETHOOK=y.
      Once I have ported rethook to every arch that supports kretprobe, the
      legacy kretprobe-specific code (which is for CONFIG_KRETPROBE_ON_RETHOOK=n)
      will be removed.
      ====================
      
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      7df482e6