Skip to content
  1. Feb 05, 2019
    • Ursula Braun's avatar
      net/smc: fix sender_free computation · b8649efa
      Ursula Braun authored
      In some scenarios a separate consumer cursor update is necessary.
      The decision is made in smc_tx_consumer_cursor_update(). The
      sender_free computation could be wrong:
      
      The rx confirmed cursor is always smaller than or equal to the
      rx producer cursor. The parameters in the smc_curs_diff() call
      have to be exchanged, otherwise sender_free might even be negative.
      
      And if more data arrives local_rx_ctrl.prod might be updated, enabling
      a cursor difference between local_rx_ctrl.prod and rx confirmed cursor
      larger than the RMB size. This case is not covered by smc_curs_diff().
      Thus function smc_curs_diff_large() is introduced here.
      
      If a recvmsg() is processed in parallel, local_tx_ctrl.cons might
      change during smc_cdc_msg_send. Make sure rx_curs_confirmed is updated
      with the actually sent local_tx_ctrl.cons value.
      
      Fixes: e82f2e31
      
       ("net/smc: optimize consumer cursor updates")
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b8649efa
    • Ursula Braun's avatar
      net/smc: preallocated memory for rdma work requests · ad6f317f
      Ursula Braun authored
      
      
      The work requests for rdma writes are built in local variables within
      function smc_tx_rdma_write(). This violates the rule that the work
      request storage has to stay till the work request is confirmed by
      a completion queue response.
      This patch introduces preallocated memory for these work requests.
      The storage is allocated, once a link (and thus a queue pair) is
      established.
      
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ad6f317f
    • Sebastian Andrzej Siewior's avatar
      net: dp83640: expire old TX-skb · 53bc8d2a
      Sebastian Andrzej Siewior authored
      During sendmsg() a cloned skb is saved via dp83640_txtstamp() in
      ->tx_queue. After the NIC sends this packet, the PHY will reply with a
      timestamp for that TX packet. If the cable is pulled at the right time I
      don't see that packet. It might gets flushed as part of queue shutdown
      on NIC's side.
      Once the link is up again then after the next sendmsg() we enqueue
      another skb in dp83640_txtstamp() and have two on the list. Then the PHY
      will send a reply and decode_txts() attaches it to the first skb on the
      list.
      No crash occurs since refcounting works but we are one packet behind.
      linuxptp/ptp4l usually closes the socket and opens a new one (in such a
      timeout case) so those "stale" replies never get there. However it does
      not resume normal operation anymore.
      
      Purge old skbs in decode_txts().
      
      Fixes: cb646e2b
      
       ("ptp: Added a clock driver for the National Semiconductor PHYTER.")
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: default avatarKurt Kanzenbach <kurt@linutronix.de>
      Acked-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      53bc8d2a
  2. Feb 04, 2019
  3. Feb 03, 2019
  4. Feb 02, 2019
  5. Feb 01, 2019
    • Johannes Berg's avatar
      cfg80211: call disconnect_wk when AP stops · e005bd7d
      Johannes Berg authored
      Since we now prevent regulatory restore during STA disconnect
      if concurrent AP interfaces are active, we need to reschedule
      this check when the AP state changes. This fixes never doing
      a restore when an AP is the last interface to stop. Or to put
      it another way: we need to re-check after anything we check
      here changes.
      
      Cc: stable@vger.kernel.org
      Fixes: 113f3aaa
      
       ("cfg80211: Prevent regulatory restore during STA disconnect in concurrent interfaces")
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      e005bd7d
    • Felix Fietkau's avatar
      mac80211: ensure that mgmt tx skbs have tailroom for encryption · 9d0f50b8
      Felix Fietkau authored
      
      
      Some drivers use IEEE80211_KEY_FLAG_SW_MGMT_TX to indicate that management
      frames need to be software encrypted. Since normal data packets are still
      encrypted by the hardware, crypto_tx_tailroom_needed_cnt gets decremented
      after key upload to hw. This can lead to passing skbs to ccmp_encrypt_skb,
      which don't have the necessary tailroom for software encryption.
      
      Change the code to add tailroom for encrypted management packets, even if
      crypto_tx_tailroom_needed_cnt is 0.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarFelix Fietkau <nbd@nbd.name>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      9d0f50b8
    • Daniel Borkmann's avatar
      Merge branch 'bpf-lockdep-fixes' · f01c2803
      Daniel Borkmann authored
      Alexei Starovoitov says:
      
      ====================
      v1->v2:
      - reworded 2nd patch. It's a real dead lock. Not a false positive
      - dropped the lockdep fix for up_read_non_owner in bpf_get_stackid
      
      In addition to preempt_disable patch for socket filters
      https://patchwork.ozlabs.org/patch/1032437/
      
      
      First patch fixes lockdep false positive in percpu_freelist
      Second patch fixes potential deadlock in bpf_prog_register
      Third patch fixes another potential deadlock in stackmap access
      from tracing bpf prog and from syscall.
      ====================
      
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      f01c2803
    • Martin KaFai Lau's avatar
      bpf: Fix syscall's stackmap lookup potential deadlock · 7c4cd051
      Martin KaFai Lau authored
      The map_lookup_elem used to not acquiring spinlock
      in order to optimize the reader.
      
      It was true until commit 557c0c6e ("bpf: convert stackmap to pre-allocation")
      The syscall's map_lookup_elem(stackmap) calls bpf_stackmap_copy().
      bpf_stackmap_copy() may find the elem no longer needed after the copy is done.
      If that is the case, pcpu_freelist_push() saves this elem for reuse later.
      This push requires a spinlock.
      
      If a tracing bpf_prog got run in the middle of the syscall's
      map_lookup_elem(stackmap) and this tracing bpf_prog is calling
      bpf_get_stackid(stackmap) which also requires the same pcpu_freelist's
      spinlock, it may end up with a dead lock situation as reported by
      Eric Dumazet in https://patchwork.ozlabs.org/patch/1030266/
      
      The situation is the same as the syscall's map_update_elem() which
      needs to acquire the pcpu_freelist's spinlock and could race
      with tracing bpf_prog.  Hence, this patch fixes it by protecting
      bpf_stackmap_copy() with this_cpu_inc(bpf_prog_active)
      to prevent tracing bpf_prog from running.
      
      A later syscall's map_lookup_elem commit f1a2e44a ("bpf: add queue and stack maps")
      also acquires a spinlock and races with tracing bpf_prog similarly.
      Hence, this patch is forward looking and protects the majority
      of the map lookups.  bpf_map_offload_lookup_elem() is the exception
      since it is for network bpf_prog only (i.e. never called by tracing
      bpf_prog).
      
      Fixes: 557c0c6e
      
       ("bpf: convert stackmap to pre-allocation")
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      7c4cd051
    • Alexei Starovoitov's avatar
      bpf: fix potential deadlock in bpf_prog_register · e16ec340
      Alexei Starovoitov authored
      Lockdep found a potential deadlock between cpu_hotplug_lock, bpf_event_mutex, and cpuctx_mutex:
      [   13.007000] WARNING: possible circular locking dependency detected
      [   13.007587] 5.0.0-rc3-00018-g2fa53f892422-dirty #477 Not tainted
      [   13.008124] ------------------------------------------------------
      [   13.008624] test_progs/246 is trying to acquire lock:
      [   13.009030] 0000000094160d1d (tracepoints_mutex){+.+.}, at: tracepoint_probe_register_prio+0x2d/0x300
      [   13.009770]
      [   13.009770] but task is already holding lock:
      [   13.010239] 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
      [   13.010877]
      [   13.010877] which lock already depends on the new lock.
      [   13.010877]
      [   13.011532]
      [   13.011532] the existing dependency chain (in reverse order) is:
      [   13.012129]
      [   13.012129] -> #4 (bpf_event_mutex){+.+.}:
      [   13.012582]        perf_event_query_prog_array+0x9b/0x130
      [   13.013016]        _perf_ioctl+0x3aa/0x830
      [   13.013354]        perf_ioctl+0x2e/0x50
      [   13.013668]        do_vfs_ioctl+0x8f/0x6a0
      [   13.014003]        ksys_ioctl+0x70/0x80
      [   13.014320]        __x64_sys_ioctl+0x16/0x20
      [   13.014668]        do_syscall_64+0x4a/0x180
      [   13.015007]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.015469]
      [   13.015469] -> #3 (&cpuctx_mutex){+.+.}:
      [   13.015910]        perf_event_init_cpu+0x5a/0x90
      [   13.016291]        perf_event_init+0x1b2/0x1de
      [   13.016654]        start_kernel+0x2b8/0x42a
      [   13.016995]        secondary_startup_64+0xa4/0xb0
      [   13.017382]
      [   13.017382] -> #2 (pmus_lock){+.+.}:
      [   13.017794]        perf_event_init_cpu+0x21/0x90
      [   13.018172]        cpuhp_invoke_callback+0xb3/0x960
      [   13.018573]        _cpu_up+0xa7/0x140
      [   13.018871]        do_cpu_up+0xa4/0xc0
      [   13.019178]        smp_init+0xcd/0xd2
      [   13.019483]        kernel_init_freeable+0x123/0x24f
      [   13.019878]        kernel_init+0xa/0x110
      [   13.020201]        ret_from_fork+0x24/0x30
      [   13.020541]
      [   13.020541] -> #1 (cpu_hotplug_lock.rw_sem){++++}:
      [   13.021051]        static_key_slow_inc+0xe/0x20
      [   13.021424]        tracepoint_probe_register_prio+0x28c/0x300
      [   13.021891]        perf_trace_event_init+0x11f/0x250
      [   13.022297]        perf_trace_init+0x6b/0xa0
      [   13.022644]        perf_tp_event_init+0x25/0x40
      [   13.023011]        perf_try_init_event+0x6b/0x90
      [   13.023386]        perf_event_alloc+0x9a8/0xc40
      [   13.023754]        __do_sys_perf_event_open+0x1dd/0xd30
      [   13.024173]        do_syscall_64+0x4a/0x180
      [   13.024519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.024968]
      [   13.024968] -> #0 (tracepoints_mutex){+.+.}:
      [   13.025434]        __mutex_lock+0x86/0x970
      [   13.025764]        tracepoint_probe_register_prio+0x2d/0x300
      [   13.026215]        bpf_probe_register+0x40/0x60
      [   13.026584]        bpf_raw_tracepoint_open.isra.34+0xa4/0x130
      [   13.027042]        __do_sys_bpf+0x94f/0x1a90
      [   13.027389]        do_syscall_64+0x4a/0x180
      [   13.027727]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.028171]
      [   13.028171] other info that might help us debug this:
      [   13.028171]
      [   13.028807] Chain exists of:
      [   13.028807]   tracepoints_mutex --> &cpuctx_mutex --> bpf_event_mutex
      [   13.028807]
      [   13.029666]  Possible unsafe locking scenario:
      [   13.029666]
      [   13.030140]        CPU0                    CPU1
      [   13.030510]        ----                    ----
      [   13.030875]   lock(bpf_event_mutex);
      [   13.031166]                                lock(&cpuctx_mutex);
      [   13.031645]                                lock(bpf_event_mutex);
      [   13.032135]   lock(tracepoints_mutex);
      [   13.032441]
      [   13.032441]  *** DEADLOCK ***
      [   13.032441]
      [   13.032911] 1 lock held by test_progs/246:
      [   13.033239]  #0: 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
      [   13.033909]
      [   13.033909] stack backtrace:
      [   13.034258] CPU: 1 PID: 246 Comm: test_progs Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #477
      [   13.034964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      [   13.035657] Call Trace:
      [   13.035859]  dump_stack+0x5f/0x8b
      [   13.036130]  print_circular_bug.isra.37+0x1ce/0x1db
      [   13.036526]  __lock_acquire+0x1158/0x1350
      [   13.036852]  ? lock_acquire+0x98/0x190
      [   13.037154]  lock_acquire+0x98/0x190
      [   13.037447]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.037876]  __mutex_lock+0x86/0x970
      [   13.038167]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.038600]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.039028]  ? __mutex_lock+0x86/0x970
      [   13.039337]  ? __mutex_lock+0x24a/0x970
      [   13.039649]  ? bpf_probe_register+0x1d/0x60
      [   13.039992]  ? __bpf_trace_sched_wake_idle_without_ipi+0x10/0x10
      [   13.040478]  ? tracepoint_probe_register_prio+0x2d/0x300
      [   13.040906]  tracepoint_probe_register_prio+0x2d/0x300
      [   13.041325]  bpf_probe_register+0x40/0x60
      [   13.041649]  bpf_raw_tracepoint_open.isra.34+0xa4/0x130
      [   13.042068]  ? __might_fault+0x3e/0x90
      [   13.042374]  __do_sys_bpf+0x94f/0x1a90
      [   13.042678]  do_syscall_64+0x4a/0x180
      [   13.042975]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   13.043382] RIP: 0033:0x7f23b10a07f9
      [   13.045155] RSP: 002b:00007ffdef42fdd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
      [   13.045759] RAX: ffffffffffffffda RBX: 00007ffdef42ff70 RCX: 00007f23b10a07f9
      [   13.046326] RDX: 0000000000000070 RSI: 00007ffdef42fe10 RDI: 0000000000000011
      [   13.046893] RBP: 00007ffdef42fdf0 R08: 0000000000000038 R09: 00007ffdef42fe10
      [   13.047462] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
      [   13.048029] R13: 0000000000000016 R14: 00007f23b1db4690 R15: 0000000000000000
      
      Since tracepoints_mutex will be taken in tracepoint_probe_register/unregister()
      there is no need to take bpf_event_mutex too.
      bpf_event_mutex is protecting modifications to prog array used in kprobe/perf bpf progs.
      bpf_raw_tracepoints don't need to take this mutex.
      
      Fixes: c4f6699d
      
       ("bpf: introduce BPF_RAW_TRACEPOINT")
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      e16ec340
    • Alexei Starovoitov's avatar
      bpf: fix lockdep false positive in percpu_freelist · a89fac57
      Alexei Starovoitov authored
      Lockdep warns about false positive:
      [   12.492084] 00000000e6b28347 (&head->lock){+...}, at: pcpu_freelist_push+0x2a/0x40
      [   12.492696] but this lock was taken by another, HARDIRQ-safe lock in the past:
      [   12.493275]  (&rq->lock){-.-.}
      [   12.493276]
      [   12.493276]
      [   12.493276] and interrupts could create inverse lock ordering between them.
      [   12.493276]
      [   12.494435]
      [   12.494435] other info that might help us debug this:
      [   12.494979]  Possible interrupt unsafe locking scenario:
      [   12.494979]
      [   12.495518]        CPU0                    CPU1
      [   12.495879]        ----                    ----
      [   12.496243]   lock(&head->lock);
      [   12.496502]                                local_irq_disable();
      [   12.496969]                                lock(&rq->lock);
      [   12.497431]                                lock(&head->lock);
      [   12.497890]   <Interrupt>
      [   12.498104]     lock(&rq->lock);
      [   12.498368]
      [   12.498368]  *** DEADLOCK ***
      [   12.498368]
      [   12.498837] 1 lock held by dd/276:
      [   12.499110]  #0: 00000000c58cb2ee (rcu_read_lock){....}, at: trace_call_bpf+0x5e/0x240
      [   12.499747]
      [   12.499747] the shortest dependencies between 2nd lock and 1st lock:
      [   12.500389]  -> (&rq->lock){-.-.} {
      [   12.500669]     IN-HARDIRQ-W at:
      [   12.500934]                       _raw_spin_lock+0x2f/0x40
      [   12.501373]                       scheduler_tick+0x4c/0xf0
      [   12.501812]                       update_process_times+0x40/0x50
      [   12.502294]                       tick_periodic+0x27/0xb0
      [   12.502723]                       tick_handle_periodic+0x1f/0x60
      [   12.503203]                       timer_interrupt+0x11/0x20
      [   12.503651]                       __handle_irq_event_percpu+0x43/0x2c0
      [   12.504167]                       handle_irq_event_percpu+0x20/0x50
      [   12.504674]                       handle_irq_event+0x37/0x60
      [   12.505139]                       handle_level_irq+0xa7/0x120
      [   12.505601]                       handle_irq+0xa1/0x150
      [   12.506018]                       do_IRQ+0x77/0x140
      [   12.506411]                       ret_from_intr+0x0/0x1d
      [   12.506834]                       _raw_spin_unlock_irqrestore+0x53/0x60
      [   12.507362]                       __setup_irq+0x481/0x730
      [   12.507789]                       setup_irq+0x49/0x80
      [   12.508195]                       hpet_time_init+0x21/0x32
      [   12.508644]                       x86_late_time_init+0xb/0x16
      [   12.509106]                       start_kernel+0x390/0x42a
      [   12.509554]                       secondary_startup_64+0xa4/0xb0
      [   12.510034]     IN-SOFTIRQ-W at:
      [   12.510305]                       _raw_spin_lock+0x2f/0x40
      [   12.510772]                       try_to_wake_up+0x1c7/0x4e0
      [   12.511220]                       swake_up_locked+0x20/0x40
      [   12.511657]                       swake_up_one+0x1a/0x30
      [   12.512070]                       rcu_process_callbacks+0xc5/0x650
      [   12.512553]                       __do_softirq+0xe6/0x47b
      [   12.512978]                       irq_exit+0xc3/0xd0
      [   12.513372]                       smp_apic_timer_interrupt+0xa9/0x250
      [   12.513876]                       apic_timer_interrupt+0xf/0x20
      [   12.514343]                       default_idle+0x1c/0x170
      [   12.514765]                       do_idle+0x199/0x240
      [   12.515159]                       cpu_startup_entry+0x19/0x20
      [   12.515614]                       start_kernel+0x422/0x42a
      [   12.516045]                       secondary_startup_64+0xa4/0xb0
      [   12.516521]     INITIAL USE at:
      [   12.516774]                      _raw_spin_lock_irqsave+0x38/0x50
      [   12.517258]                      rq_attach_root+0x16/0xd0
      [   12.517685]                      sched_init+0x2f2/0x3eb
      [   12.518096]                      start_kernel+0x1fb/0x42a
      [   12.518525]                      secondary_startup_64+0xa4/0xb0
      [   12.518986]   }
      [   12.519132]   ... key      at: [<ffffffff82b7bc28>] __key.71384+0x0/0x8
      [   12.519649]   ... acquired at:
      [   12.519892]    pcpu_freelist_pop+0x7b/0xd0
      [   12.520221]    bpf_get_stackid+0x1d2/0x4d0
      [   12.520563]    ___bpf_prog_run+0x8b4/0x11a0
      [   12.520887]
      [   12.521008] -> (&head->lock){+...} {
      [   12.521292]    HARDIRQ-ON-W at:
      [   12.521539]                     _raw_spin_lock+0x2f/0x40
      [   12.521950]                     pcpu_freelist_push+0x2a/0x40
      [   12.522396]                     bpf_get_stackid+0x494/0x4d0
      [   12.522828]                     ___bpf_prog_run+0x8b4/0x11a0
      [   12.523296]    INITIAL USE at:
      [   12.523537]                    _raw_spin_lock+0x2f/0x40
      [   12.523944]                    pcpu_freelist_populate+0xc0/0x120
      [   12.524417]                    htab_map_alloc+0x405/0x500
      [   12.524835]                    __do_sys_bpf+0x1a3/0x1a90
      [   12.525253]                    do_syscall_64+0x4a/0x180
      [   12.525659]                    entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   12.526167]  }
      [   12.526311]  ... key      at: [<ffffffff838f7668>] __key.13130+0x0/0x8
      [   12.526812]  ... acquired at:
      [   12.527047]    __lock_acquire+0x521/0x1350
      [   12.527371]    lock_acquire+0x98/0x190
      [   12.527680]    _raw_spin_lock+0x2f/0x40
      [   12.527994]    pcpu_freelist_push+0x2a/0x40
      [   12.528325]    bpf_get_stackid+0x494/0x4d0
      [   12.528645]    ___bpf_prog_run+0x8b4/0x11a0
      [   12.528970]
      [   12.529092]
      [   12.529092] stack backtrace:
      [   12.529444] CPU: 0 PID: 276 Comm: dd Not tainted 5.0.0-rc3-00018-g2fa53f892422 #475
      [   12.530043] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      [   12.530750] Call Trace:
      [   12.530948]  dump_stack+0x5f/0x8b
      [   12.531248]  check_usage_backwards+0x10c/0x120
      [   12.531598]  ? ___bpf_prog_run+0x8b4/0x11a0
      [   12.531935]  ? mark_lock+0x382/0x560
      [   12.532229]  mark_lock+0x382/0x560
      [   12.532496]  ? print_shortest_lock_dependencies+0x180/0x180
      [   12.532928]  __lock_acquire+0x521/0x1350
      [   12.533271]  ? find_get_entry+0x17f/0x2e0
      [   12.533586]  ? find_get_entry+0x19c/0x2e0
      [   12.533902]  ? lock_acquire+0x98/0x190
      [   12.534196]  lock_acquire+0x98/0x190
      [   12.534482]  ? pcpu_freelist_push+0x2a/0x40
      [   12.534810]  _raw_spin_lock+0x2f/0x40
      [   12.535099]  ? pcpu_freelist_push+0x2a/0x40
      [   12.535432]  pcpu_freelist_push+0x2a/0x40
      [   12.535750]  bpf_get_stackid+0x494/0x4d0
      [   12.536062]  ___bpf_prog_run+0x8b4/0x11a0
      
      It has been explained that is a false positive here:
      https://lkml.org/lkml/2018/7/25/756
      Recap:
      - stackmap uses pcpu_freelist
      - The lock in pcpu_freelist is a percpu lock
      - stackmap is only used by tracing bpf_prog
      - A tracing bpf_prog cannot be run if another bpf_prog
        has already been running (ensured by the percpu bpf_prog_active counter).
      
      Eric pointed out that this lockdep splats stops other
      legit lockdep splats in selftests/bpf/test_progs.c.
      
      Fix this by calling local_irq_save/restore for stackmap.
      
      Another false positive had also been worked around by calling
      local_irq_save in commit 89ad2fa3 ("bpf: fix lockdep splat").
      That commit added unnecessary irq_save/restore to fast path of
      bpf hash map. irqs are already disabled at that point, since htab
      is holding per bucket spin_lock with irqsave.
      
      Let's reduce overhead for htab by introducing __pcpu_freelist_push/pop
      function w/o irqsave and convert pcpu_freelist_push/pop to irqsave
      to be used elsewhere (right now only in stackmap).
      It stops lockdep false positive in stackmap with a bit of acceptable overhead.
      
      Fixes: 557c0c6e
      
       ("bpf: convert stackmap to pre-allocation")
      Reported-by: default avatarNaresh Kamboju <naresh.kamboju@linaro.org>
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      a89fac57
    • Alexei Starovoitov's avatar
      bpf: run bpf programs with preemption disabled · 6cab5e90
      Alexei Starovoitov authored
      
      
      Disabled preemption is necessary for proper access to per-cpu maps
      from BPF programs.
      
      But the sender side of socket filters didn't have preemption disabled:
      unix_dgram_sendmsg->sk_filter->sk_filter_trim_cap->bpf_prog_run_save_cb->BPF_PROG_RUN
      
      and a combination of af_packet with tun device didn't disable either:
      tpacket_snd->packet_direct_xmit->packet_pick_tx_queue->ndo_select_queue->
        tun_select_queue->tun_ebpf_select_queue->bpf_prog_run_clear_cb->BPF_PROG_RUN
      
      Disable preemption before executing BPF programs (both classic and extended).
      
      Reported-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      6cab5e90
    • Martynas Pumputis's avatar
      bpf, selftests: fix handling of sparse CPU allocations · 1bb54c40
      Martynas Pumputis authored
      
      
      Previously, bpf_num_possible_cpus() had a bug when calculating a
      number of possible CPUs in the case of sparse CPU allocations, as
      it was considering only the first range or element of
      /sys/devices/system/cpu/possible.
      
      E.g. in the case of "0,2-3" (CPU 1 is not available), the function
      returned 1 instead of 3.
      
      This patch fixes the function by making it parse all CPU ranges and
      elements.
      
      Signed-off-by: default avatarMartynas Pumputis <m@lambda.lt>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      1bb54c40
    • Michael Chan's avatar
      bnxt_en: Disable interrupts when allocating CP rings or NQs. · 5e66e35a
      Michael Chan authored
      When calling firmware to allocate a CP ring or NQ, an interrupt associated
      with that ring may be generated immediately before the doorbell is even
      setup after the firmware call returns.  When servicing the interrupt, the
      driver may crash when trying to access the doorbell.
      
      Fix it by disabling interrupt on that vector until the doorbell is
      set up.
      
      Fixes: 697197e5
      
       ("bnxt_en: Re-structure doorbells.")
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5e66e35a
    • David S. Miller's avatar
      Merge branch 'ieee802154-for-davem-2019-01-31' of... · da0e5171
      David S. Miller authored
      Merge branch 'ieee802154-for-davem-2019-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
      
      
      
      Stefan Schmidt says:
      
      ====================
      pull-request: ieee802154 for net 2019-01-31
      
      An update from ieee802154 for your *net* tree.
      
      I waited a while to see if anything else comes up, but it seems this time
      we only have one fixup patch for the -rc rounds.
      Colin fixed some indentation in the mcr20a drivers. That's about it.
      
      If there are any problems with taking these two before the final 5.0 let
      me know.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      da0e5171
    • Eric Dumazet's avatar
      rds: fix refcount bug in rds_sock_addref · 6fa19f56
      Eric Dumazet authored
      syzbot was able to catch a bug in rds [1]
      
      The issue here is that the socket might be found in a hash table
      but that its refcount has already be set to 0 by another cpu.
      
      We need to use refcount_inc_not_zero() to be safe here.
      
      [1]
      
      refcount_t: increment on 0; use-after-free.
      WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked lib/refcount.c:153 [inline]
      WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked+0x61/0x70 lib/refcount.c:151
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 1 PID: 23129 Comm: syz-executor3 Not tainted 5.0.0-rc4+ #53
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
       panic+0x2cb/0x65c kernel/panic.c:214
       __warn.cold+0x20/0x48 kernel/panic.c:571
       report_bug+0x263/0x2b0 lib/bug.c:186
       fixup_bug arch/x86/kernel/traps.c:178 [inline]
       fixup_bug arch/x86/kernel/traps.c:173 [inline]
       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
       do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
       invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
      RIP: 0010:refcount_inc_checked lib/refcount.c:153 [inline]
      RIP: 0010:refcount_inc_checked+0x61/0x70 lib/refcount.c:151
      Code: 1d 51 63 c8 06 31 ff 89 de e8 eb 1b f2 fd 84 db 75 dd e8 a2 1a f2 fd 48 c7 c7 60 9f 81 88 c6 05 31 63 c8 06 01 e8 af 65 bb fd <0f> 0b eb c1 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 49
      RSP: 0018:ffff8880a0cbf1e8 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc90006113000
      RDX: 000000000001047d RSI: ffffffff81685776 RDI: 0000000000000005
      RBP: ffff8880a0cbf1f8 R08: ffff888097c9e100 R09: ffffed1015ce5021
      R10: ffffed1015ce5020 R11: ffff8880ae728107 R12: ffff8880723c20c0
      R13: ffff8880723c24b0 R14: dffffc0000000000 R15: ffffed1014197e64
       sock_hold include/net/sock.h:647 [inline]
       rds_sock_addref+0x19/0x20 net/rds/af_rds.c:675
       rds_find_bound+0x97c/0x1080 net/rds/bind.c:82
       rds_recv_incoming+0x3be/0x1430 net/rds/recv.c:362
       rds_loop_xmit+0xf3/0x2a0 net/rds/loop.c:96
       rds_send_xmit+0x1355/0x2a10 net/rds/send.c:355
       rds_sendmsg+0x323c/0x44e0 net/rds/send.c:1368
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg+0xdd/0x130 net/socket.c:631
       __sys_sendto+0x387/0x5f0 net/socket.c:1788
       __do_sys_sendto net/socket.c:1800 [inline]
       __se_sys_sendto net/socket.c:1796 [inline]
       __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1796
       do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x458089
      Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fc266df8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 0000000000458089
      RDX: 0000000000000000 RSI: 00000000204b3fff RDI: 0000000000000005
      RBP: 000000000073bf00 R08: 00000000202b4000 R09: 0000000000000010
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc266df96d4
      R13: 00000000004c56e4 R14: 00000000004d94a8 R15: 00000000ffffffff
      
      Fixes: cc4dfb7f
      
       ("rds: fix two RCU related problems")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Cc: rds-devel@oss.oracle.com
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6fa19f56
    • Bart Van Assche's avatar
      lib/test_rhashtable: Make test_insert_dup() allocate its hash table dynamically · fc42a689
      Bart Van Assche authored
      
      
      The test_insert_dup() function from lib/test_rhashtable.c passes a
      pointer to a stack object to rhltable_init(). Allocate the hash table
      dynamically to avoid that the following is reported with object
      debugging enabled:
      
      ODEBUG: object (ptrval) is on stack (ptrval), but NOT annotated.
      WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:368 __debug_object_init+0x312/0x480
      Modules linked in:
      EIP: __debug_object_init+0x312/0x480
      Call Trace:
       ? debug_object_init+0x1a/0x20
       ? __init_work+0x16/0x30
       ? rhashtable_init+0x1e1/0x460
       ? sched_clock_cpu+0x57/0xe0
       ? rhltable_init+0xb/0x20
       ? test_insert_dup+0x32/0x20f
       ? trace_hardirqs_on+0x38/0xf0
       ? ida_dump+0x10/0x10
       ? jhash+0x130/0x130
       ? my_hashfn+0x30/0x30
       ? test_rht_init+0x6aa/0xab4
       ? ida_dump+0x10/0x10
       ? test_rhltable+0xc5c/0xc5c
       ? do_one_initcall+0x67/0x28e
       ? trace_hardirqs_off+0x22/0xe0
       ? restore_all_kernel+0xf/0x70
       ? trace_hardirqs_on_thunk+0xc/0x10
       ? restore_all_kernel+0xf/0x70
       ? kernel_init_freeable+0x142/0x213
       ? rest_init+0x230/0x230
       ? kernel_init+0x10/0x110
       ? schedule_tail_wrapper+0x9/0xc
       ? ret_from_fork+0x19/0x24
      
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fc42a689
    • Jacob Wen's avatar
      l2tp: copy 4 more bytes to linear part if necessary · 91c52470
      Jacob Wen authored
      The size of L2TPv2 header with all optional fields is 14 bytes.
      l2tp_udp_recv_core only moves 10 bytes to the linear part of a
      skb. This may lead to l2tp_recv_common read data outside of a skb.
      
      This patch make sure that there is at least 14 bytes in the linear
      part of a skb to meet the maximum need of l2tp_udp_recv_core and
      l2tp_recv_common. The minimum size of both PPP HDLC-like frame and
      Ethernet frame is larger than 14 bytes, so we are safe to do so.
      
      Also remove L2TP_HDR_SIZE_NOSEQ, it is unused now.
      
      Fixes: fd558d18
      
       ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
      Suggested-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarJacob Wen <jian.w.wen@oracle.com>
      Acked-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      91c52470