Skip to content
  1. Feb 12, 2022
    • Yonghong Song's avatar
      bpf: Fix a bpf_timer initialization issue · 5eaed6ee
      Yonghong Song authored
      The patch in [1] intends to fix a bpf_timer related issue,
      but the fix caused existing 'timer' selftest to fail with
      hang or some random errors. After some debug, I found
      an issue with check_and_init_map_value() in the hashtab.c.
      More specifically, in hashtab.c, we have code
        l_new = bpf_map_kmalloc_node(&htab->map, ...)
        check_and_init_map_value(&htab->map, l_new...)
      Note that bpf_map_kmalloc_node() does not do initialization
      so l_new contains random value.
      
      The function check_and_init_map_value() intends to zero the
      bpf_spin_lock and bpf_timer if they exist in the map.
      But I found bpf_spin_lock is zero'ed but bpf_timer is not zero'ed.
      With [1], later copy_map_value() skips copying of
      bpf_spin_lock and bpf_timer. The non-zero bpf_timer caused
      random failures for 'timer' selftest.
      Without [1], for both bpf_spin_lock and bpf_timer case,
      bpf_timer will be zero'ed, so 'timer' self test is okay.
      
      For check_and_init_map_value(), why bpf_spin_lock is zero'ed
      properly while bpf_timer not. In bpf uapi header, we have
        struct bpf_spin_lock {
              __u32   val;
        };
        struct bpf_timer {
              __u64 :64;
              __u64 :64;
        } __attribute__((aligned(8)));
      
      The initialization code:
        *(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
            (struct bpf_spin_lock){};
        *(struct bpf_timer *)(dst + map->timer_off) =
            (struct bpf_timer){};
      It appears the compiler has no obligation to initialize anonymous fields.
      For example, let us use clang with bpf target as below:
        $ cat t.c
        struct bpf_timer {
              unsigned long long :64;
        };
        struct bpf_timer2 {
              unsigned long long a;
        };
      
        void test(struct bpf_timer *t) {
          *t = (struct bpf_timer){};
        }
        void test2(struct bpf_timer2 *t) {
          *t = (struct bpf_timer2){};
        }
        $ clang -target bpf -O2 -c -g t.c
        $ llvm-objdump -d t.o
         ...
         0000000000000000 <test>:
             0:       95 00 00 00 00 00 00 00 exit
         0000000000000008 <test2>:
             1:       b7 02 00 00 00 00 00 00 r2 = 0
             2:       7b 21 00 00 00 00 00 00 *(u64 *)(r1 + 0) = r2
             3:       95 00 00 00 00 00 00 00 exit
      
      gcc11.2 does not have the above issue. But from
        INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x
        Programming languages — C
        http://www.open-std.org/Jtc1/sc22/wg14/www/docs/n1547.pdf
        page 157:
        Except where explicitly stated otherwise, for the purposes of
        this subclause unnamed members of objects of structure and union
        type do not participate in initialization. Unnamed members of
        structure objects have indeterminate value even after initialization.
      
      To fix the problem, let use memset for bpf_timer case in
      check_and_init_map_value(). For consistency, memset is also
      used for bpf_spin_lock case.
      
        [1] https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com/
      
      Fixes: 68134668
      
       ("bpf: Add map side support for bpf timers.")
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220211194953.3142152-1-yhs@fb.com
      5eaed6ee
    • Yonghong Song's avatar
      bpf: Emit bpf_timer in vmlinux BTF · 3bd916ee
      Yonghong Song authored
      
      
      Currently the following code in check_and_init_map_value()
        *(struct bpf_timer *)(dst + map->timer_off) =
            (struct bpf_timer){};
      can help generate bpf_timer definition in vmlinuxBTF.
      But the code above may not zero the whole structure
      due to anonymour members and that code will be replaced
      by memset in the subsequent patch and
      bpf_timer definition will disappear from vmlinuxBTF.
      Let us emit the type explicitly so bpf program can continue
      to use it from vmlinux.h.
      
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220211194948.3141529-1-yhs@fb.com
      3bd916ee
    • Alexei Starovoitov's avatar
      Merge branch 'Fix for crash due to overwrite in copy_map_value' · acc3c473
      Alexei Starovoitov authored
      Kumar Kartikeya says:
      
      ====================
      
      A fix for an oversight in copy_map_value that leads to kernel crash.
      
      Also, a question for BPF developers:
      It seems in arraymap.c, we always do check_and_free_timer_in_array after we do
      copy_map_value in map_update_elem callback, but the same is not done for
      hashtab.c. Is there a specific reason for this difference in behavior, or did I
      miss that it happens for hashtab.c as well?
      
      Changlog:
      ---------
      v1 -> v2:
      v1: https://lore.kernel.org/bpf/20220209051113.870717-1-memxor@gmail.com
      
      
      
       * Fix build error for selftests patch due to missing SYS_PREFIX in bpf tree
      ====================
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      acc3c473
    • Kumar Kartikeya Dwivedi's avatar
      selftests/bpf: Add test for bpf_timer overwriting crash · a7e75016
      Kumar Kartikeya Dwivedi authored
      
      
      Add a test that validates that timer value is not overwritten when doing
      a copy_map_value call in the kernel. Without the prior fix, this test
      triggers a crash.
      
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220209070324.1093182-3-memxor@gmail.com
      a7e75016
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Fix crash due to incorrect copy_map_value · a8abb0c3
      Kumar Kartikeya Dwivedi authored
      When both bpf_spin_lock and bpf_timer are present in a BPF map value,
      copy_map_value needs to skirt both objects when copying a value into and
      out of the map. However, the current code does not set both s_off and
      t_off in copy_map_value, which leads to a crash when e.g. bpf_spin_lock
      is placed in map value with bpf_timer, as bpf_map_update_elem call will
      be able to overwrite the other timer object.
      
      When the issue is not fixed, an overwriting can produce the following
      splat:
      
      [root@(none) bpf]# ./test_progs -t timer_crash
      [   15.930339] bpf_testmod: loading out-of-tree module taints kernel.
      [   16.037849] ==================================================================
      [   16.038458] BUG: KASAN: user-memory-access in __pv_queued_spin_lock_slowpath+0x32b/0x520
      [   16.038944] Write of size 8 at addr 0000000000043ec0 by task test_progs/325
      [   16.039399]
      [   16.039514] CPU: 0 PID: 325 Comm: test_progs Tainted: G           OE     5.16.0+ #278
      [   16.039983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
      [   16.040485] Call Trace:
      [   16.040645]  <TASK>
      [   16.040805]  dump_stack_lvl+0x59/0x73
      [   16.041069]  ? __pv_queued_spin_lock_slowpath+0x32b/0x520
      [   16.041427]  kasan_report.cold+0x116/0x11b
      [   16.041673]  ? __pv_queued_spin_lock_slowpath+0x32b/0x520
      [   16.042040]  __pv_queued_spin_lock_slowpath+0x32b/0x520
      [   16.042328]  ? memcpy+0x39/0x60
      [   16.042552]  ? pv_hash+0xd0/0xd0
      [   16.042785]  ? lockdep_hardirqs_off+0x95/0xd0
      [   16.043079]  __bpf_spin_lock_irqsave+0xdf/0xf0
      [   16.043366]  ? bpf_get_current_comm+0x50/0x50
      [   16.043608]  ? jhash+0x11a/0x270
      [   16.043848]  bpf_timer_cancel+0x34/0xe0
      [   16.044119]  bpf_prog_c4ea1c0f7449940d_sys_enter+0x7c/0x81
      [   16.044500]  bpf_trampoline_6442477838_0+0x36/0x1000
      [   16.044836]  __x64_sys_nanosleep+0x5/0x140
      [   16.045119]  do_syscall_64+0x59/0x80
      [   16.045377]  ? lock_is_held_type+0xe4/0x140
      [   16.045670]  ? irqentry_exit_to_user_mode+0xa/0x40
      [   16.046001]  ? mark_held_locks+0x24/0x90
      [   16.046287]  ? asm_exc_page_fault+0x1e/0x30
      [   16.046569]  ? asm_exc_page_fault+0x8/0x30
      [   16.046851]  ? lockdep_hardirqs_on+0x7e/0x100
      [   16.047137]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [   16.047405] RIP: 0033:0x7f9e4831718d
      [   16.047602] Code: b4 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b3 6c 0c 00 f7 d8 64 89 01 48
      [   16.048764] RSP: 002b:00007fff488086b8 EFLAGS: 00000206 ORIG_RAX: 0000000000000023
      [   16.049275] RAX: ffffffffffffffda RBX: 00007f9e48683740 RCX: 00007f9e4831718d
      [   16.049747] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007fff488086d0
      [   16.050225] RBP: 00007fff488086f0 R08: 00007fff488085d7 R09: 00007f9e4cb594a0
      [   16.050648] R10: 0000000000000000 R11: 0000000000000206 R12: 00007f9e484cde30
      [   16.051124] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      [   16.051608]  </TASK>
      [   16.051762] ==================================================================
      
      Fixes: 68134668
      
       ("bpf: Add map side support for bpf timers.")
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com
      a8abb0c3
  2. Feb 11, 2022
  3. Feb 10, 2022
  4. Feb 09, 2022