  1. Dec 07, 2021
    • sit: add net device refcount tracking to ip_tunnel · c0fd407a
      Eric Dumazet authored
      
      
      Note that other ip_tunnel users do not seem to hold a reference
      on tunnel->dev. This probably needs some investigation.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c0fd407a
    • net: dst: add net device refcount tracking to dst_entry · 9038c320
      Eric Dumazet authored
      
      
      We want to track all dev_hold()/dev_put() to ease leak hunting.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9038c320
    • drop_monitor: add net device refcount tracker · 4dbd24f6
      Eric Dumazet authored
      
      
      We want to track all dev_hold()/dev_put() to ease leak hunting.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4dbd24f6
    • net: add net device refcount tracker to ethtool_phys_id() · 5ae21950
      Eric Dumazet authored
      
      
      This helper might hold a netdev reference for a long time,
      so let's add reference tracking.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5ae21950
    • net: add net device refcount tracker to struct netdev_queue · 0b688f24
      Eric Dumazet authored
      
      
      This will help debug pesky netdev reference leaks.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0b688f24
    • net: add net device refcount tracker to struct netdev_rx_queue · 80e8921b
      Eric Dumazet authored
      
      
      This helps debug net device refcount leaks.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      80e8921b
    • net: add net device refcount tracker infrastructure · 4d92b95f
      Eric Dumazet authored
      
      
      Net devices are refcounted. Over the years we have had numerous bugs
      caused by imbalanced dev_hold() and dev_put() calls.
      
      The general idea is to be able to precisely pair each decrement with
      a corresponding prior increment. Both share a cookie, basically
      a pointer to private data storing stack traces.
      
      This patch adds dev_hold_track() and dev_put_track().
      
      To use these helpers, each data structure owning a refcount
      should also use a "netdevice_tracker" to pair the hold and put.
      
      netdevice_tracker dev_tracker;
      ...
      dev_hold_track(dev, &dev_tracker, GFP_ATOMIC);
      ...
      dev_put_track(dev, &dev_tracker);
      
      Whenever a leak happens, the device dismantle phase prints precise
      stack traces of the points where dev_hold_track() was called.
      
      We will also get a stack trace if too many dev_put_track() calls are
      attempted for the same netdevice_tracker.
      
      This is guarded by the CONFIG_NET_DEV_REFCNT_TRACKER option.
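
The hold/put pairing described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `toy_dev`, `toy_dev_tracker`, `toy_dev_hold_track()` and `toy_dev_put_track()` are hypothetical names invented for illustration, not the kernel helpers, and a boolean flag stands in for the stored stack traces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a refcounted net device. */
struct toy_dev {
    const char *name;
    int refcnt;
};

/* Hypothetical cookie owned by whoever takes the reference. */
struct toy_dev_tracker {
    const struct toy_dev *owner; /* device this cookie currently pins */
    bool held;                   /* true between a hold and its put */
};

/* Take a reference and remember it in the caller's cookie. */
static void toy_dev_hold_track(struct toy_dev *dev, struct toy_dev_tracker *trk)
{
    assert(!trk->held); /* one cookie tracks one reference at a time */
    dev->refcnt++;
    trk->owner = dev;
    trk->held = true;
}

/*
 * Drop the reference paired with this cookie.
 * Returns 0 on success, -1 if the cookie holds no reference on dev
 * (the "too many puts" case), in which case refcnt is left untouched.
 */
static int toy_dev_put_track(struct toy_dev *dev, struct toy_dev_tracker *trk)
{
    if (!trk->held || trk->owner != dev) {
        fprintf(stderr, "%s: unbalanced put\n", dev->name);
        return -1;
    }
    trk->held = false;
    trk->owner = NULL;
    dev->refcnt--;
    return 0;
}
```

In this model a second put through the same cookie is rejected instead of silently underflowing the count, which mirrors the behaviour the commit message describes for an extra dev_put_track().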
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4d92b95f
    • lib: add tests for reference tracker · 914a7b50
      Eric Dumazet authored
      
      
      This module uses the reference tracker and deliberately triggers two issues:
      
      1) Double free of a tracker
      
      2) Leak of two trackers, one being allocated from softirq context.
      
      "modprobe test_ref_tracker" would emit the following traces.
      (Use scripts/decode_stacktrace.sh if necessary)
      
      [  171.648681] reference already released.
      [  171.653213] allocated in:
      [  171.656523]  alloctest_ref_tracker_alloc2+0x1c/0x20 [test_ref_tracker]
      [  171.656526]  init_module+0x86/0x1000 [test_ref_tracker]
      [  171.656528]  do_one_initcall+0x9c/0x220
      [  171.656532]  do_init_module+0x60/0x240
      [  171.656536]  load_module+0x32b5/0x3610
      [  171.656538]  __do_sys_init_module+0x148/0x1a0
      [  171.656540]  __x64_sys_init_module+0x1d/0x20
      [  171.656542]  do_syscall_64+0x4a/0xb0
      [  171.656546]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  171.656549] freed in:
      [  171.659520]  alloctest_ref_tracker_free+0x13/0x20 [test_ref_tracker]
      [  171.659522]  init_module+0xec/0x1000 [test_ref_tracker]
      [  171.659523]  do_one_initcall+0x9c/0x220
      [  171.659525]  do_init_module+0x60/0x240
      [  171.659527]  load_module+0x32b5/0x3610
      [  171.659529]  __do_sys_init_module+0x148/0x1a0
      [  171.659532]  __x64_sys_init_module+0x1d/0x20
      [  171.659534]  do_syscall_64+0x4a/0xb0
      [  171.659536]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  171.659575] ------------[ cut here ]------------
      [  171.659576] WARNING: CPU: 5 PID: 13016 at lib/ref_tracker.c:112 ref_tracker_free+0x224/0x270
      [  171.659581] Modules linked in: test_ref_tracker(+)
      [  171.659591] CPU: 5 PID: 13016 Comm: modprobe Tainted: G S                5.16.0-smp-DEV #290
      [  171.659595] RIP: 0010:ref_tracker_free+0x224/0x270
      [  171.659599] Code: 5e 41 5f 5d c3 48 c7 c7 04 9c 74 a6 31 c0 e8 62 ee 67 00 83 7b 14 00 75 1a 83 7b 18 00 75 30 4c 89 ff 4c 89 f6 e8 9c 00 69 00 <0f> 0b bb ea ff ff ff eb ae 48 c7 c7 3a 0a 77 a6 31 c0 e8 34 ee 67
      [  171.659601] RSP: 0018:ffff89058ba0bbd0 EFLAGS: 00010286
      [  171.659603] RAX: 0000000000000029 RBX: ffff890586b19780 RCX: 08895bff57c7d100
      [  171.659604] RDX: c0000000ffff7fff RSI: 0000000000000282 RDI: ffffffffc0407000
      [  171.659606] RBP: ffff89058ba0bc88 R08: 0000000000000000 R09: ffffffffa6f342e0
      [  171.659607] R10: 00000000ffff7fff R11: 0000000000000000 R12: 000000008f000000
      [  171.659608] R13: 0000000000000014 R14: 0000000000000282 R15: ffffffffc0407000
      [  171.659609] FS:  00007f97ea29d740(0000) GS:ffff8923ff940000(0000) knlGS:0000000000000000
      [  171.659611] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  171.659613] CR2: 00007f97ea299000 CR3: 0000000186b4a004 CR4: 00000000001706e0
      [  171.659614] Call Trace:
      [  171.659615]  <TASK>
      [  171.659631]  ? alloctest_ref_tracker_free+0x13/0x20 [test_ref_tracker]
      [  171.659633]  ? init_module+0x105/0x1000 [test_ref_tracker]
      [  171.659636]  ? do_one_initcall+0x9c/0x220
      [  171.659638]  ? do_init_module+0x60/0x240
      [  171.659641]  ? load_module+0x32b5/0x3610
      [  171.659644]  ? __do_sys_init_module+0x148/0x1a0
      [  171.659646]  ? __x64_sys_init_module+0x1d/0x20
      [  171.659649]  ? do_syscall_64+0x4a/0xb0
      [  171.659652]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  171.659656]  ? 0xffffffffc040a000
      [  171.659658]  alloctest_ref_tracker_free+0x13/0x20 [test_ref_tracker]
      [  171.659660]  init_module+0x105/0x1000 [test_ref_tracker]
      [  171.659663]  do_one_initcall+0x9c/0x220
      [  171.659666]  do_init_module+0x60/0x240
      [  171.659669]  load_module+0x32b5/0x3610
      [  171.659672]  __do_sys_init_module+0x148/0x1a0
      [  171.659676]  __x64_sys_init_module+0x1d/0x20
      [  171.659678]  do_syscall_64+0x4a/0xb0
      [  171.659694]  ? exc_page_fault+0x6e/0x140
      [  171.659696]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  171.659698] RIP: 0033:0x7f97ea3dbe7a
      [  171.659700] Code: 48 8b 0d 61 8d 06 00 f7 d8 64 89 01 48 83 c8 ff c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2e 8d 06 00 f7 d8 64 89 01 48
      [  171.659701] RSP: 002b:00007ffea67ce608 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
      [  171.659703] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f97ea3dbe7a
      [  171.659704] RDX: 00000000013a0ba0 RSI: 0000000000002808 RDI: 00007f97ea299000
      [  171.659705] RBP: 00007ffea67ce670 R08: 0000000000000003 R09: 0000000000000000
      [  171.659706] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000013a1048
      [  171.659707] R13: 00000000013a0ba0 R14: 0000000001399930 R15: 00000000013a1030
      [  171.659709]  </TASK>
      [  171.659710] ---[ end trace f5dbd6afa41e60a9 ]---
      [  171.659712] leaked reference.
      [  171.663393]  alloctest_ref_tracker_alloc0+0x1c/0x20 [test_ref_tracker]
      [  171.663395]  test_ref_tracker_timer_func+0x9/0x20 [test_ref_tracker]
      [  171.663397]  call_timer_fn+0x31/0x140
      [  171.663401]  expire_timers+0x46/0x110
      [  171.663403]  __run_timers+0x16f/0x1b0
      [  171.663404]  run_timer_softirq+0x1d/0x40
      [  171.663406]  __do_softirq+0x148/0x2d3
      [  171.663408] leaked reference.
      [  171.667101]  alloctest_ref_tracker_alloc1+0x1c/0x20 [test_ref_tracker]
      [  171.667103]  init_module+0x81/0x1000 [test_ref_tracker]
      [  171.667104]  do_one_initcall+0x9c/0x220
      [  171.667106]  do_init_module+0x60/0x240
      [  171.667108]  load_module+0x32b5/0x3610
      [  171.667111]  __do_sys_init_module+0x148/0x1a0
      [  171.667113]  __x64_sys_init_module+0x1d/0x20
      [  171.667115]  do_syscall_64+0x4a/0xb0
      [  171.667117]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  171.667131] ------------[ cut here ]------------
      [  171.667132] WARNING: CPU: 5 PID: 13016 at lib/ref_tracker.c:30 ref_tracker_dir_exit+0x104/0x130
      [  171.667136] Modules linked in: test_ref_tracker(+)
      [  171.667144] CPU: 5 PID: 13016 Comm: modprobe Tainted: G S      W         5.16.0-smp-DEV #290
      [  171.667147] RIP: 0010:ref_tracker_dir_exit+0x104/0x130
      [  171.667150] Code: 01 00 00 00 00 ad de 48 89 03 4c 89 63 08 48 89 df e8 20 a0 d5 ff 4c 89 f3 4d 39 ee 75 a8 4c 89 ff 48 8b 75 d0 e8 7c 05 69 00 <0f> 0b eb 0c 4c 89 ff 48 8b 75 d0 e8 6c 05 69 00 41 8b 47 08 83 f8
      [  171.667151] RSP: 0018:ffff89058ba0bc68 EFLAGS: 00010286
      [  171.667154] RAX: 08895bff57c7d100 RBX: ffffffffc0407010 RCX: 000000000000003b
      [  171.667156] RDX: 000000000000003c RSI: 0000000000000282 RDI: ffffffffc0407000
      [  171.667157] RBP: ffff89058ba0bc98 R08: 0000000000000000 R09: ffffffffa6f342e0
      [  171.667159] R10: 00000000ffff7fff R11: 0000000000000000 R12: dead000000000122
      [  171.667160] R13: ffffffffc0407010 R14: ffffffffc0407010 R15: ffffffffc0407000
      [  171.667162] FS:  00007f97ea29d740(0000) GS:ffff8923ff940000(0000) knlGS:0000000000000000
      [  171.667164] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  171.667166] CR2: 00007f97ea299000 CR3: 0000000186b4a004 CR4: 00000000001706e0
      [  171.667169] Call Trace:
      [  171.667170]  <TASK>
      [  171.667171]  ? 0xffffffffc040a000
      [  171.667173]  init_module+0x126/0x1000 [test_ref_tracker]
      [  171.667175]  do_one_initcall+0x9c/0x220
      [  171.667179]  do_init_module+0x60/0x240
      [  171.667182]  load_module+0x32b5/0x3610
      [  171.667186]  __do_sys_init_module+0x148/0x1a0
      [  171.667189]  __x64_sys_init_module+0x1d/0x20
      [  171.667192]  do_syscall_64+0x4a/0xb0
      [  171.667194]  ? exc_page_fault+0x6e/0x140
      [  171.667196]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  171.667199] RIP: 0033:0x7f97ea3dbe7a
      [  171.667200] Code: 48 8b 0d 61 8d 06 00 f7 d8 64 89 01 48 83 c8 ff c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2e 8d 06 00 f7 d8 64 89 01 48
      [  171.667201] RSP: 002b:00007ffea67ce608 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
      [  171.667203] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f97ea3dbe7a
      [  171.667204] RDX: 00000000013a0ba0 RSI: 0000000000002808 RDI: 00007f97ea299000
      [  171.667205] RBP: 00007ffea67ce670 R08: 0000000000000003 R09: 0000000000000000
      [  171.667206] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000013a1048
      [  171.667207] R13: 00000000013a0ba0 R14: 0000000001399930 R15: 00000000013a1030
      [  171.667209]  </TASK>
      [  171.667210] ---[ end trace f5dbd6afa41e60aa ]---
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      914a7b50
    • lib: add reference counting tracking infrastructure · 4e66934e
      Eric Dumazet authored
      
      
      It can be hard to track where references are taken and released.
      
      In networking, we have annoying issues at device or netns dismantle,
      and we have had various proposals to ease root-causing them.
      
      This patch adds new infrastructure pairing refcount increases
      and decreases. This will make code self-documenting, because programmers
      will have to associate increments with decrements.
      
      This is controlled by CONFIG_REF_TRACKER, which can be selected
      by users of this feature.
      
      This adds both CPU and memory costs, and thus should probably be
      used with care.
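
The idea of pairing each increment with its decrement can be sketched in plain userspace C. This is a minimal model, not the code in lib/ref_tracker.c: all `toy_*` names are hypothetical, a string stands in for the saved stack trace, and no locking is attempted.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-reference record; "where" replaces a stack trace. */
struct toy_tracker {
    struct toy_tracker *next;
    const char *where;
    int released;
};

/* Hypothetical directory owning every tracker taken on one object. */
struct toy_dir {
    struct toy_tracker *head;
};

/* Record a new reference and return its cookie (NULL on allocation failure). */
static struct toy_tracker *toy_alloc(struct toy_dir *dir, const char *where)
{
    struct toy_tracker *t;

    assert(dir != NULL);
    t = calloc(1, sizeof(*t));
    if (!t)
        return NULL;
    t->where = where;
    t->next = dir->head;
    dir->head = t;
    return t;
}

/* Release a reference: 0 on success, -1 if it was already released. */
static int toy_free(struct toy_tracker *t)
{
    if (!t)
        return -1;
    if (t->released) {
        fprintf(stderr, "reference already released (taken in %s)\n", t->where);
        return -1;
    }
    t->released = 1;
    return 0;
}

/* Dismantle: report every reference never released, return the leak count. */
static int toy_dir_exit(struct toy_dir *dir)
{
    struct toy_tracker *t = dir->head;
    int leaks = 0;

    while (t) {
        struct toy_tracker *next = t->next;

        if (!t->released) {
            fprintf(stderr, "leaked reference (taken in %s)\n", t->where);
            leaks++;
        }
        free(t);
        t = next;
    }
    dir->head = NULL;
    return leaks;
}
```

Keeping released records on the list until dismantle is what lets this model detect a double release; it also makes the memory cost mentioned above visible, since every reference carries its own record.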
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4e66934e
  2. Dec 04, 2021
  3. Dec 03, 2021