Skip to content
  1. Mar 23, 2018
    • Stephen Hemminger's avatar
      hv_netvsc: use RCU to fix concurrent rx and queue changes · 02400fce
      Stephen Hemminger authored
      The receive processing may continue to happen while the
      internal network device state is in RCU grace period.
      The internal RNDIS structure is associated with the
      internal netvsc_device structure; both have the same
      RCU lifetime.
      
      Defer freeing all associated parts until after grace
      period.
      
      Fixes: 0cf73780
      
       ("hv_netvsc: netvsc_teardown_gpadl() split")
      Signed-off-by: default avatarStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      02400fce
    • Stephen Hemminger's avatar
      hv_netvsc: disable NAPI before channel close · 8348e046
      Stephen Hemminger authored
      This makes sure that no CPU is still process packets when
      the channel is closed.
      
      Fixes: 76bb5db5
      
       ("netvsc: fix use after free on module removal")
      Signed-off-by: default avatarStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8348e046
    • David Ahern's avatar
      net/ipv6: Handle onlink flag with multipath routes · 68e2ffde
      David Ahern authored
      For multipath routes the ONLINK flag can be specified per nexthop in
      rtnh_flags or globally in rtm_flags. Update ip6_route_multipath_add
      to consider the ONLINK setting coming from rtnh_flags. Each loop over
      nexthops the config for the sibling route is initialized to the global
      config and then per nexthop settings overlayed. The flag is 'or'ed into
      fib6_config to handle the ONLINK flag coming from either rtm_flags or
      rtnh_flags.
      
      Fixes: fc1e64e1
      
       ("net/ipv6: Add support for onlink flag")
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      68e2ffde
    • Guillaume Nault's avatar
      ppp: avoid loop in xmit recursion detection code · 6d066734
      Guillaume Nault authored
      We already detect situations where a PPP channel sends packets back to
      its upper PPP device. While this is enough to avoid deadlocking on xmit
      locks, this doesn't prevent packets from looping between the channel
      and the unit.
      
      The problem is that ppp_start_xmit() enqueues packets in ppp->file.xq
      before checking for xmit recursion. Therefore, __ppp_xmit_process()
      might dequeue a packet from ppp->file.xq and send it on the channel
      which, in turn, loops it back on the unit. Then ppp_start_xmit()
      queues the packet back to ppp->file.xq and __ppp_xmit_process() picks
      it up and sends it again through the channel. Therefore, the packet
      will loop between __ppp_xmit_process() and ppp_start_xmit() until some
      other part of the xmit path drops it.
      
      For L2TP, we rapidly fill the skb's headroom and pppol2tp_xmit() drops
      the packet after a few iterations. But PPTP reallocates the headroom
      if necessary, letting the loop run and exhaust the machine resources
      (as reported in https://bugzilla.kernel.org/show_bug.cgi?id=199109
      
      ).
      
      Fix this by letting __ppp_xmit_process() enqueue the skb to
      ppp->file.xq, so that we can check for recursion before adding it to
      the queue. Now ppp_xmit_process() can drop the packet when recursion is
      detected.
      
      __ppp_channel_push() is a bit special. It calls __ppp_xmit_process()
      without having any actual packet to send. This is used by
      ppp_output_wakeup() to re-enable transmission on the parent unit (for
      implementations like ppp_async.c, where the .start_xmit() function
      might not consume the skb, leaving it in ppp->xmit_pending and
      disabling transmission).
      Therefore, __ppp_xmit_process() needs to handle the case where skb is
      NULL, dequeuing as many packets as possible from ppp->file.xq.
      
      Reported-by: default avatarxu heng <xuheng333@zoho.com>
      Fixes: 55454a56
      
       ("ppp: avoid dealock on recursive xmit")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6d066734
    • David Lebrun's avatar
      ipv6: sr: fix NULL pointer dereference when setting encap source address · 8936ef76
      David Lebrun authored
      When using seg6 in encap mode, we call ipv6_dev_get_saddr() to set the
      source address of the outer IPv6 header, in case none was specified.
      Using skb->dev can lead to BUG() when it is in an inconsistent state.
      This patch uses the net_device attached to the skb's dst instead.
      
      [940807.667429] BUG: unable to handle kernel NULL pointer dereference at 000000000000047c
      [940807.762427] IP: ipv6_dev_get_saddr+0x8b/0x1d0
      [940807.815725] PGD 0 P4D 0
      [940807.847173] Oops: 0000 [#1] SMP PTI
      [940807.890073] Modules linked in:
      [940807.927765] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W        4.16.0-rc1-seg6bpf+ #2
      [940808.028988] Hardware name: HP ProLiant DL120 G6/ProLiant DL120 G6, BIOS O26    09/06/2010
      [940808.128128] RIP: 0010:ipv6_dev_get_saddr+0x8b/0x1d0
      [940808.187667] RSP: 0018:ffff88043fd836b0 EFLAGS: 00010206
      [940808.251366] RAX: 0000000000000005 RBX: ffff88042cb1c860 RCX: 00000000000000fe
      [940808.338025] RDX: 00000000000002c0 RSI: ffff88042cb1c860 RDI: 0000000000004500
      [940808.424683] RBP: ffff88043fd83740 R08: 0000000000000000 R09: ffffffffffffffff
      [940808.511342] R10: 0000000000000040 R11: 0000000000000000 R12: ffff88042cb1c850
      [940808.598012] R13: ffffffff8208e380 R14: ffff88042ac8da00 R15: 0000000000000002
      [940808.684675] FS:  0000000000000000(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000
      [940808.783036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [940808.852975] CR2: 000000000000047c CR3: 00000004255fe000 CR4: 00000000000006e0
      [940808.939634] Call Trace:
      [940808.970041]  <IRQ>
      [940808.995250]  ? ip6t_do_table+0x265/0x640
      [940809.043341]  seg6_do_srh_encap+0x28f/0x300
      [940809.093516]  ? seg6_do_srh+0x1a0/0x210
      [940809.139528]  seg6_do_srh+0x1a0/0x210
      [940809.183462]  seg6_output+0x28/0x1e0
      [940809.226358]  lwtunnel_output+0x3f/0x70
      [940809.272370]  ip6_xmit+0x2b8/0x530
      [940809.313185]  ? ac6_proc_exit+0x20/0x20
      [940809.359197]  inet6_csk_xmit+0x7d/0xc0
      [940809.404173]  tcp_transmit_skb+0x548/0x9a0
      [940809.453304]  __tcp_retransmit_skb+0x1a8/0x7a0
      [940809.506603]  ? ip6_default_advmss+0x40/0x40
      [940809.557824]  ? tcp_current_mss+0x24/0x90
      [940809.605925]  tcp_retransmit_skb+0xd/0x80
      [940809.654016]  tcp_xmit_retransmit_queue.part.17+0xf9/0x210
      [940809.719797]  tcp_ack+0xa47/0x1110
      [940809.760612]  tcp_rcv_established+0x13c/0x570
      [940809.812865]  tcp_v6_do_rcv+0x151/0x3d0
      [940809.858879]  tcp_v6_rcv+0xa5c/0xb10
      [940809.901770]  ? seg6_output+0xdd/0x1e0
      [940809.946745]  ip6_input_finish+0xbb/0x460
      [940809.994837]  ip6_input+0x74/0x80
      [940810.034612]  ? ip6_rcv_finish+0xb0/0xb0
      [940810.081663]  ipv6_rcv+0x31c/0x4c0
      ...
      
      Fixes: 6c8702c6
      
       ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Reported-by: default avatarTom Herbert <tom@quantonium.net>
      Signed-off-by: default avatarDavid Lebrun <dlebrun@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8936ef76
    • David Lebrun's avatar
      ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state · 191f86ca
      David Lebrun authored
      The seg6_build_state() function is called with RCU read lock held,
      so we cannot use GFP_KERNEL. This patch uses GFP_ATOMIC instead.
      
      [   92.770271] =============================
      [   92.770628] WARNING: suspicious RCU usage
      [   92.770921] 4.16.0-rc4+ #12 Not tainted
      [   92.771277] -----------------------------
      [   92.771585] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section!
      [   92.772279]
      [   92.772279] other info that might help us debug this:
      [   92.772279]
      [   92.773067]
      [   92.773067] rcu_scheduler_active = 2, debug_locks = 1
      [   92.773514] 2 locks held by ip/2413:
      [   92.773765]  #0:  (rtnl_mutex){+.+.}, at: [<00000000e5461720>] rtnetlink_rcv_msg+0x441/0x4d0
      [   92.774377]  #1:  (rcu_read_lock){....}, at: [<00000000df4f161e>] lwtunnel_build_state+0x59/0x210
      [   92.775065]
      [   92.775065] stack backtrace:
      [   92.775371] CPU: 0 PID: 2413 Comm: ip Not tainted 4.16.0-rc4+ #12
      [   92.775791] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
      [   92.776608] Call Trace:
      [   92.776852]  dump_stack+0x7d/0xbc
      [   92.777130]  __schedule+0x133/0xf00
      [   92.777393]  ? unwind_get_return_address_ptr+0x50/0x50
      [   92.777783]  ? __sched_text_start+0x8/0x8
      [   92.778073]  ? rcu_is_watching+0x19/0x30
      [   92.778383]  ? kernel_text_address+0x49/0x60
      [   92.778800]  ? __kernel_text_address+0x9/0x30
      [   92.779241]  ? unwind_get_return_address+0x29/0x40
      [   92.779727]  ? pcpu_alloc+0x102/0x8f0
      [   92.780101]  _cond_resched+0x23/0x50
      [   92.780459]  __mutex_lock+0xbd/0xad0
      [   92.780818]  ? pcpu_alloc+0x102/0x8f0
      [   92.781194]  ? seg6_build_state+0x11d/0x240
      [   92.781611]  ? save_stack+0x9b/0xb0
      [   92.781965]  ? __ww_mutex_wakeup_for_backoff+0xf0/0xf0
      [   92.782480]  ? seg6_build_state+0x11d/0x240
      [   92.782925]  ? lwtunnel_build_state+0x1bd/0x210
      [   92.783393]  ? ip6_route_info_create+0x687/0x1640
      [   92.783846]  ? ip6_route_add+0x74/0x110
      [   92.784236]  ? inet6_rtm_newroute+0x8a/0xd0
      
      Fixes: 6c8702c6
      
       ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Signed-off-by: default avatarDavid Lebrun <dlebrun@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      191f86ca
    • David S. Miller's avatar
      Merge branch 'aquantia-fixes' · 9b894cdd
      David S. Miller authored
      
      
      Igor Russkikh says:
      
      ====================
      Aquantia atlantic hot fixes 03-2018
      
      This is a set of atlantic driver hot fixes for various areas:
      
      Some issues with hardware reset covered,
      Fixed napi_poll flood happening on some traffic conditions,
      Allow system to change MAC address on live device,
      Add pci shutdown handler.
      
      patch v2:
      - reverse christmas tree
      - remove driver private parameter, replacing it with define.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9b894cdd
    • Igor Russkikh's avatar
      c89bf1cd
    • Igor Russkikh's avatar
      net: aquantia: Implement pci shutdown callback · 90869ddf
      Igor Russkikh authored
      
      
      We should close link and all NIC operations during shutdown.
      On some systems graceful reboot never closes NIC interface on its own,
      but only indicates pci device shutdown. Without explicit handler, NIC
      rx rings continued to transfer DMA data into prepared buffers while CPU
      rebooted already. That caused memory corruptions on soft reboot.
      
      Signed-off-by: default avatarIgor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      90869ddf
    • Igor Russkikh's avatar
      net: aquantia: Allow live mac address changes · 3e9a5451
      Igor Russkikh authored
      
      
      There is nothing prevents us from changing MAC on the running interface.
      Allow this with ndev priv flag.
      
      Signed-off-by: default avatarIgor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3e9a5451
    • Igor Russkikh's avatar
      net: aquantia: Add tx clean budget and valid budget handling logic · b647d398
      Igor Russkikh authored
      
      
      We should report to napi full budget only when we have more job to do.
      Before this fix, on any tx queue cleanup we forced napi to do poll again.
      Thats a waste of cpu resources and caused storming with napi polls when
      there was at least one tx on each interrupt.
      
      With this fix we report full budget only when there is more job on TX
      to do. Or, as before, when rx budget was fully consumed.
      
      Signed-off-by: default avatarIgor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b647d398
    • Igor Russkikh's avatar
      net: aquantia: Change inefficient wait loop on fw data reads · 47203b34
      Igor Russkikh authored
      
      
      B1 hardware changes behavior of mailbox interface, it has busy bit
      always raised. Data ready condition should be detected by increment
      of address register.
      
      Old code has empty `for` loop, and that caused cpu overloads on B1
      hardware. aq_nic_service_timer_cb consumed ~100ms because of that.
      
      Signed-off-by: default avatarIgor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      47203b34
    • Igor Russkikh's avatar
      net: aquantia: Fix a regression with reset on old firmware · d0f0fb25
      Igor Russkikh authored
      
      
      FW 1.5.58 and below needs a fixed delay even after 0x18 register
      is filled. Otherwise, setting MPI_INIT state too fast causes
      traffic hang.
      
      Signed-off-by: default avatarIgor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d0f0fb25
    • Igor Russkikh's avatar
      net: aquantia: Fix hardware reset when SPI may rarely hangup · 1bf9a752
      Igor Russkikh authored
      
      
      Under some circumstances (notably using thunderbolt interface) SPI
      on chip reset may be in active transaction.
      Here we forcibly cleanup SPI to prevent possible hangups.
      
      Signed-off-by: default avatarIgor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1bf9a752
  2. Mar 22, 2018
    • David S. Miller's avatar
      Merge branch 's390-qeth-fixes' · 1959031e
      David S. Miller authored
      
      
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2018-03-20
      
      Please apply one final set of qeth patches for 4.16.
      All of these fix long-standing bugs, so please queue them up for -stable
      as well.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1959031e
    • Julian Wiedmann's avatar
      s390/qeth: on channel error, reject further cmd requests · a6c3d939
      Julian Wiedmann authored
      
      
      When the IRQ handler determines that one of the cmd IO channels has
      failed and schedules recovery, block any further cmd requests from
      being submitted. The request would inevitably stall, and prevent the
      recovery from making progress until the request times out.
      
      This sort of error was observed after Live Guest Relocation, where
      the pending IO on the READ channel intentionally gets terminated to
      kick-start recovery. Simultaneously the guest executed SIOCETHTOOL,
      triggering qeth to issue a QUERY CARD INFO command. The command
      then stalled in the inoperabel WRITE channel.
      
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a6c3d939
    • Julian Wiedmann's avatar
      s390/qeth: lock read device while queueing next buffer · 17bf8c9b
      Julian Wiedmann authored
      
      
      For calling ccw_device_start(), issue_next_read() needs to hold the
      device's ccwlock.
      This is satisfied for the IRQ handler path (where qeth_irq() gets called
      under the ccwlock), but we need explicit locking for the initial call by
      the MPC initialization.
      
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      17bf8c9b
    • Julian Wiedmann's avatar
      s390/qeth: when thread completes, wake up all waiters · 1063e432
      Julian Wiedmann authored
      
      
      qeth_wait_for_threads() is potentially called by multiple users, make
      sure to notify all of them after qeth_clear_thread_running_bit()
      adjusted the thread_running_mask. With no timeout, callers would
      otherwise stall.
      
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1063e432
    • Julian Wiedmann's avatar
      s390/qeth: free netdevice when removing a card · 6be68739
      Julian Wiedmann authored
      On removal, a qeth card's netdevice is currently not properly freed
      because the call chain looks as follows:
      
      qeth_core_remove_device(card)
      	lx_remove_device(card)
      		unregister_netdev(card->dev)
      		card->dev = NULL			!!!
      	qeth_core_free_card(card)
      		if (card->dev)				!!!
      			free_netdev(card->dev)
      
      Fix it by free'ing the netdev straight after unregistering. This also
      fixes the sysfs-driven layer switch case (qeth_dev_layer2_store()),
      where the need to free the current netdevice was not considered at all.
      
      Note that free_netdev() takes care of the netif_napi_del() for us too.
      
      Fixes: 4a71df50
      
       ("qeth: new qeth device driver")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Reviewed-by: default avatarUrsula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6be68739
    • David S. Miller's avatar
      Merge branch 'net-phy-Add-general-dummy-stubs-for-MMD-register-access' · 3d21ac6f
      David S. Miller authored
      Kevin Hao says:
      
      ====================
      net: phy: Add general dummy stubs for MMD register access
      
      v2:
      As suggested by Andrew:
        - Add general dummy stubs
        - Also use that for the micrel phy
      
      This patch series fix the Ethernet broken on the mpc8315erdb board introduced
      by commit b6b5e8a6
      
       ("gianfar: Disable EEE autoneg by default").
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3d21ac6f
    • Kevin Hao's avatar
      net: phy: micrel: Use the general dummy stubs for MMD register access · c846a2b7
      Kevin Hao authored
      
      
      The new general dummy stubs for MMD register access were introduced.
      Use that for the codes reuse.
      
      Signed-off-by: default avatarKevin Hao <haokexin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c846a2b7
    • Kevin Hao's avatar
      net: phy: realtek: Use the dummy stubs for MMD register access for rtl8211b · 0231b1a0
      Kevin Hao authored
      The Ethernet on mpc8315erdb is broken since commit b6b5e8a6
      ("gianfar: Disable EEE autoneg by default"). The reason is that
      even though the rtl8211b doesn't support the MMD extended registers
      access, it does return some random values if we trying to access
      the MMD register via indirect method. This makes it seem that the
      EEE is supported by this phy device. And the subsequent writing to
      the MMD registers does cause the phy malfunction. So use the dummy
      stubs for the MMD register access to fix this issue.
      
      Fixes: b6b5e8a6
      
       ("gianfar: Disable EEE autoneg by default")
      Signed-off-by: default avatarKevin Hao <haokexin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0231b1a0
    • Kevin Hao's avatar
      net: phy: Add general dummy stubs for MMD register access · 5df7af85
      Kevin Hao authored
      For some phy devices, even though they don't support the MMD extended
      register access, it does have some side effect if we are trying to
      read/write the MMD registers via indirect method. So introduce general
      dummy stubs for MMD register access which these devices can use to avoid
      such side effect.
      
      Fixes: b6b5e8a6
      
       ("gianfar: Disable EEE autoneg by default")
      Signed-off-by: default avatarKevin Hao <haokexin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5df7af85
    • David S. Miller's avatar
      Merge tag 'batadv-net-for-davem-20180319' of git://git.open-mesh.org/linux-merge · ee54a9f9
      David S. Miller authored
      
      
      Simon Wunderlich says:
      
      ====================
      Here are some batman-adv bugfixes:
      
       - fix possible IPv6 packet loss when multicast extension is used, by Linus Luessing
      
       - fix SKB handling issues for TTVN and DAT, by Matthias Schiffer (two patches)
      
       - fix include for eventpoll, by Sven Eckelmann
      
       - fix skb checksum for ttvn reroutes, by Sven Eckelmann
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ee54a9f9
    • David S. Miller's avatar
      Merge branch 'net-sched-action-idr-leak' · ba9a1908
      David S. Miller authored
      
      
      Davide Caratti says:
      
      ====================
      fix idr leak in actions
      
      This series fixes situations where a temporary failure to install a TC
      action results in the permanent impossibility to reuse the configured
      value of 'index'.
      
      Thanks to Cong Wang for the initial review.
      
      v2: fix build error in act_ipt.c, reported by kbuild test robot
      ====================
      
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ba9a1908
    • Davide Caratti's avatar
      net/sched: fix idr leak in the error path of tcf_skbmod_init() · f29cdfbe
      Davide Caratti authored
      tcf_skbmod_init() can fail after the idr has been successfully reserved.
      When this happens, every subsequent attempt to configure skbmod rules
      using the same idr value will systematically fail with -ENOSPC, unless
      the first attempt was done using the 'replace' keyword:
      
       # tc action add action skbmod swap mac index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action add action skbmod swap mac index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc action add action skbmod swap mac index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in tcf_skbmod_init(), ensuring that tcf_idr_release() is called
      on the error path when the idr has been reserved, but not yet inserted.
      Also, don't test 'ovr' in the error path, to avoid a 'replace' failure
      implicitly become a 'delete' that leaks refcount in act_skbmod module:
      
       # rmmod act_skbmod; modprobe act_skbmod
       # tc action add action skbmod swap mac index 100
       # tc action add action skbmod swap mac continue index 100
       RTNETLINK answers: File exists
       We have an error talking to the kernel
       # tc action replace action skbmod swap mac continue index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action list action skbmod
       #
       # rmmod  act_skbmod
       rmmod: ERROR: Module act_skbmod is in use
      
      Fixes: 65a206c0
      
       ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f29cdfbe
    • Davide Caratti's avatar
      net/sched: fix idr leak in the error path of tcf_vlan_init() · d7f20015
      Davide Caratti authored
      tcf_vlan_init() can fail after the idr has been successfully reserved.
      When this happens, every subsequent attempt to configure vlan rules using
      the same idr value will systematically fail with -ENOSPC, unless the first
      attempt was done using the 'replace' keyword.
      
       # tc action add action vlan pop index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action add action vlan pop index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc action add action vlan pop index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in tcf_vlan_init(), ensuring that tcf_idr_release() is called on
      the error path when the idr has been reserved, but not yet inserted. Also,
      don't test 'ovr' in the error path, to avoid a 'replace' failure implicitly
      become a 'delete' that leaks refcount in act_vlan module:
      
       # rmmod act_vlan; modprobe act_vlan
       # tc action add action vlan push id 5 index 100
       # tc action replace action vlan push id 7 index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action list action vlan
       #
       # rmmod act_vlan
       rmmod: ERROR: Module act_vlan is in use
      
      Fixes: 4c5b9d96 ("act_vlan: VLAN action rewrite to use RCU lock/unlock and update")
      Fixes: 65a206c0
      
       ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d7f20015
    • Davide Caratti's avatar
      net/sched: fix idr leak in the error path of __tcf_ipt_init() · 1e46ef17
      Davide Caratti authored
      __tcf_ipt_init() can fail after the idr has been successfully reserved.
      When this happens, subsequent attempts to configure xt/ipt rules using
      the same idr value systematically fail with -ENOSPC:
      
       # tc action add action xt -j LOG --log-prefix test1 index 100
       tablename: mangle hook: NF_IP_POST_ROUTING
               target:  LOG level warning prefix "test1" index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       Command "(null)" is unknown, try "tc actions help".
       # tc action add action xt -j LOG --log-prefix test1 index 100
       tablename: mangle hook: NF_IP_POST_ROUTING
               target:  LOG level warning prefix "test1" index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       Command "(null)" is unknown, try "tc actions help".
       # tc action add action xt -j LOG --log-prefix test1 index 100
       tablename: mangle hook: NF_IP_POST_ROUTING
               target:  LOG level warning prefix "test1" index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in the error path of __tcf_ipt_init(), calling tcf_idr_release()
      in place of tcf_idr_cleanup(). Since tcf_ipt_release() can now be called
      when tcfi_t is NULL, we also need to protect calls to ipt_destroy_target()
      to avoid NULL pointer dereference.
      
      Fixes: 65a206c0
      
       ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1e46ef17
    • Davide Caratti's avatar
      net/sched: fix idr leak in the error path of tcp_pedit_init() · 94fa3f92
      Davide Caratti authored
      tcf_pedit_init() can fail to allocate 'keys' after the idr has been
      successfully reserved. When this happens, subsequent attempts to configure
      a pedit rule using the same idr value systematically fail with -ENOSPC:
      
       # tc action add action pedit munge ip ttl set 63 index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action add action pedit munge ip ttl set 63 index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc action add action pedit munge ip ttl set 63 index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in the error path of tcf_act_pedit_init(), calling
      tcf_idr_release() in place of tcf_idr_cleanup().
      
      Fixes: 65a206c0
      
       ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      94fa3f92
    • Davide Caratti's avatar
      net/sched: fix idr leak in the error path of tcf_act_police_init() · 5bf7f818
      Davide Caratti authored
      tcf_act_police_init() can fail after the idr has been successfully
      reserved (e.g., qdisc_get_rtab() may return NULL). When this happens,
      subsequent attempts to configure a police rule using the same idr value
      systematiclly fail with -ENOSPC:
      
       # tc action add action police rate 1000 burst 1000 drop index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action add action police rate 1000 burst 1000 drop index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc action add action police rate 1000 burst 1000 drop index 100
       RTNETLINK answers: No space left on device
       ...
      
      Fix this in the error path of tcf_act_police_init(), calling
      tcf_idr_release() in place of tcf_idr_cleanup().
      
      Fixes: 65a206c0
      
       ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5bf7f818
    • Davide Caratti's avatar
      net/sched: fix idr leak in the error path of tcf_simp_init() · 60e10b3a
      Davide Caratti authored
      if the kernel fails to duplicate 'sdata', creation of a new action fails
      with -ENOMEM. However, subsequent attempts to install the same action
      using the same value of 'index' systematically fail with -ENOSPC, and
      that value of 'index' will no more be usable by act_simple, until rmmod /
      insmod of act_simple.ko is done:
      
       # tc actions add action simple sdata hello index 100
       # tc actions list action simple
      
              action order 0: Simple <hello>
               index 100 ref 1 bind 0
       # tc actions flush action simple
       # tc actions add action simple sdata hello index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc actions flush action simple
       # tc actions add action simple sdata hello index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc actions add action simple sdata hello index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in the error path of tcf_simp_init(), calling tcf_idr_release()
      in place of tcf_idr_cleanup().
      
      Fixes: 65a206c0
      
       ("net/sched: Change act_api and act_xxx modules to use IDR")
      Suggested-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      60e10b3a
    • Davide Caratti's avatar
      net/sched: fix idr leak on the error path of tcf_bpf_init() · bbc09e78
      Davide Caratti authored
      when the following command sequence is entered
      
       # tc action add action bpf bytecode '4,40 0 0 12,31 0 1 2048,6 0 0 262144,6 0 0 0' index 100
       RTNETLINK answers: Invalid argument
       We have an error talking to the kernel
       # tc action add action bpf bytecode '4,40 0 0 12,21 0 1 2048,6 0 0 262144,6 0 0 0' index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
      
      act_bpf correctly refuses to install the first TC rule, because 31 is not
      a valid instruction. However, it refuses to install the second TC rule,
      even if the BPF code is correct. Furthermore, it's no more possible to
      install any other rule having the same value of 'index' until act_bpf
      module is unloaded/inserted again. After the idr has been reserved, call
      tcf_idr_release() instead of tcf_idr_cleanup(), to fix this issue.
      
      Fixes: 65a206c0
      
       ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bbc09e78
    • Colin Ian King's avatar
      qede: fix spelling mistake: "registeration" -> "registration" · 3f2176dd
      Colin Ian King authored
      
      
      Trivial fix to spelling mistakes in DP_ERR error message text and
      comments
      
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3f2176dd
    • Colin Ian King's avatar
      bnx2x: fix spelling mistake: "registeration" -> "registration" · 924613d3
      Colin Ian King authored
      
      
      Trivial fix to spelling mistake in BNX2X_ERR error message text
      
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Acked-by: default avatarSudarsana Kalluru <Sudarsana.Kalluru@cavium.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      924613d3
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 3d27484e
      David S. Miller authored
      
      
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-03-21
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Follow-up fix to the fault injection framework to prevent jump
         optimization on the kprobe by installing a dummy post-handler,
         from Masami.
      
      2) Drop bpf_perf_prog_read_value helper from tracepoint type programs
         which was mistakenly added there and would otherwise crash due to
         wrong input context, from Yonghong.
      
      3) Fix a crash in BPF fs when compiled with clang. Code appears to
         be fine just that clang tries to overly aggressive optimize in
         non C conform ways, therefore fix the kernel's Makefile to
         generally prevent such issues, from Daniel.
      
      4) Skip unnecessary capability checks in bpf syscall, which is otherwise
         triggering unnecessary security hooks on capability checking and
         causing false alarms on unprivileged processes trying to access
         CAP_SYS_ADMIN restricted infra, from Chenbo.
      
      5) Fix the test_bpf.ko module when CONFIG_BPF_JIT_ALWAYS_ON is set
         with regards to a test case that is really just supposed to fail
         on x8_64 JIT but not others, from Thadeu.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3d27484e
  3. Mar 21, 2018
    • Daniel Borkmann's avatar
      kbuild: disable clang's default use of -fmerge-all-constants · 87e0d4f0
      Daniel Borkmann authored
      Prasad reported that he has seen crashes in BPF subsystem with netd
      on Android with arm64 in the form of (note, the taint is unrelated):
      
        [ 4134.721483] Unable to handle kernel paging request at virtual address 800000001
        [ 4134.820925] Mem abort info:
        [ 4134.901283]   Exception class = DABT (current EL), IL = 32 bits
        [ 4135.016736]   SET = 0, FnV = 0
        [ 4135.119820]   EA = 0, S1PTW = 0
        [ 4135.201431] Data abort info:
        [ 4135.301388]   ISV = 0, ISS = 0x00000021
        [ 4135.359599]   CM = 0, WnR = 0
        [ 4135.470873] user pgtable: 4k pages, 39-bit VAs, pgd = ffffffe39b946000
        [ 4135.499757] [0000000800000001] *pgd=0000000000000000, *pud=0000000000000000
        [ 4135.660725] Internal error: Oops: 96000021 [#1] PREEMPT SMP
        [ 4135.674610] Modules linked in:
        [ 4135.682883] CPU: 5 PID: 1260 Comm: netd Tainted: G S      W       4.14.19+ #1
        [ 4135.716188] task: ffffffe39f4aa380 task.stack: ffffff801d4e0000
        [ 4135.731599] PC is at bpf_prog_add+0x20/0x68
        [ 4135.741746] LR is at bpf_prog_inc+0x20/0x2c
        [ 4135.751788] pc : [<ffffff94ab7ad584>] lr : [<ffffff94ab7ad638>] pstate: 60400145
        [ 4135.769062] sp : ffffff801d4e3ce0
        [...]
        [ 4136.258315] Process netd (pid: 1260, stack limit = 0xffffff801d4e0000)
        [ 4136.273746] Call trace:
        [...]
        [ 4136.442494] 3ca0: ffffff94ab7ad584 0000000060400145 ffffffe3a01bf8f8 0000000000000006
        [ 4136.460936] 3cc0: 0000008000000000 ffffff94ab844204 ffffff801d4e3cf0 ffffff94ab7ad584
        [ 4136.479241] [<ffffff94ab7ad584>] bpf_prog_add+0x20/0x68
        [ 4136.491767] [<ffffff94ab7ad638>] bpf_prog_inc+0x20/0x2c
        [ 4136.504536] [<ffffff94ab7b5d08>] bpf_obj_get_user+0x204/0x22c
        [ 4136.518746] [<ffffff94ab7ade68>] SyS_bpf+0x5a8/0x1a88
      
      Android's netd was basically pinning the uid cookie BPF map in BPF
      fs (/sys/fs/bpf/traffic_cookie_uid_map) and later on retrieving it
      again resulting in above panic. Issue is that the map was wrongly
      identified as a prog! Above kernel was compiled with clang 4.0,
      and it turns out that clang decided to merge the bpf_prog_iops and
      bpf_map_iops into a single memory location, such that the two i_ops
      could then not be distinguished anymore.
      
      Reason for this miscompilation is that clang has the more aggressive
      -fmerge-all-constants enabled by default. In fact, clang source code
      has a comment about it in lib/AST/ExprConstant.cpp on why it is okay
      to do so:
      
        Pointers with different bases cannot represent the same object.
        (Note that clang defaults to -fmerge-all-constants, which can
        lead to inconsistent results for comparisons involving the address
        of a constant; this generally doesn't matter in practice.)
      
      The issue never appeared with gcc however, since gcc does not enable
      -fmerge-all-constants by default and even *explicitly* states in
      it's option description that using this flag results in non-conforming
      behavior, quote from man gcc:
      
        Languages like C or C++ require each variable, including multiple
        instances of the same variable in recursive calls, to have distinct
        locations, so using this option results in non-conforming behavior.
      
      There are also various clang bug reports open on that matter [1],
      where clang developers acknowledge the non-conforming behavior,
      and refer to disabling it with -fno-merge-all-constants. But even
      if this gets fixed in clang today, there are already users out there
      that triggered this. Thus, fix this issue by explicitly adding
      -fno-merge-all-constants to the kernel's Makefile to generically
      disable this optimization, since potentially other places in the
      kernel could subtly break as well.
      
      Note, there is also a flag called -fmerge-constants (not supported
      by clang), which is more conservative and only applies to strings
      and it's enabled in gcc's -O/-O2/-O3/-Os optimization levels. In
      gcc's code, the two flags -fmerge-{all-,}constants share the same
      variable internally, so when disabling it via -fno-merge-all-constants,
      then we really don't merge any const data (e.g. strings), and text
      size increases with gcc (14,927,214 -> 14,942,646 for vmlinux.o).
      
        $ gcc -fverbose-asm -O2 foo.c -S -o foo.S
          -> foo.S lists -fmerge-constants under options enabled
        $ gcc -fverbose-asm -O2 -fno-merge-all-constants foo.c -S -o foo.S
          -> foo.S doesn't list -fmerge-constants under options enabled
        $ gcc -fverbose-asm -O2 -fno-merge-all-constants -fmerge-constants foo.c -S -o foo.S
          -> foo.S lists -fmerge-constants under options enabled
      
      Thus, as a workaround we need to set both -fno-merge-all-constants
      *and* -fmerge-constants in the Makefile in order for text size to
      stay as is.
      
        [1] https://bugs.llvm.org/show_bug.cgi?id=18538
      
      
      
      Reported-by: default avatarPrasad Sodagudi <psodagud@codeaurora.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Chenbo Feng <fengc@google.com>
      Cc: Richard Smith <richard-llvm@metafoo.co.uk>
      Cc: Chandler Carruth <chandlerc@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Tested-by: default avatarPrasad Sodagudi <psodagud@codeaurora.org>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      87e0d4f0
    • Chenbo Feng's avatar
      bpf: skip unnecessary capability check · 0fa4fe85
      Chenbo Feng authored
      
      
      The current check statement in BPF syscall will do a capability check
      for CAP_SYS_ADMIN before checking sysctl_unprivileged_bpf_disabled. This
      code path will trigger unnecessary security hooks on capability checking
      and cause false alarms on unprivileged process trying to get CAP_SYS_ADMIN
      access. This can be resolved by simply switch the order of the statement
      and CAP_SYS_ADMIN is not required anyway if unprivileged bpf syscall is
      allowed.
      
      Signed-off-by: default avatarChenbo Feng <fengc@google.com>
      Acked-by: default avatarLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      0fa4fe85
    • Yonghong Song's avatar
      trace/bpf: remove helper bpf_perf_prog_read_value from tracepoint type programs · f005afed
      Yonghong Song authored
      Commit 4bebdc7a ("bpf: add helper bpf_perf_prog_read_value")
      added helper bpf_perf_prog_read_value so that perf_event type program
      can read event counter and enabled/running time.
      This commit, however, introduced a bug which allows this helper
      for tracepoint type programs. This is incorrect as bpf_perf_prog_read_value
      needs to access perf_event through its bpf_perf_event_data_kern type context,
      which is not available for tracepoint type program.
      
      This patch fixed the issue by separating bpf_func_proto between tracepoint
      and perf_event type programs and removed bpf_perf_prog_read_value
      from tracepoint func prototype.
      
      Fixes: 4bebdc7a
      
       ("bpf: add helper bpf_perf_prog_read_value")
      Reported-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      f005afed
    • Thadeu Lima de Souza Cascardo's avatar
      test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other arches · 52fda36d
      Thadeu Lima de Souza Cascardo authored
      Function bpf_fill_maxinsns11 is designed to not be able to be JITed on
      x86_64. So, it fails when CONFIG_BPF_JIT_ALWAYS_ON=y, and
      commit 09584b40 ("bpf: fix selftests/bpf test_kmod.sh failure when
      CONFIG_BPF_JIT_ALWAYS_ON=y") makes sure that failure is detected on that
      case.
      
      However, it does not fail on other architectures, which have a different
      JIT compiler design. So, test_bpf has started to fail to load on those.
      
      After this fix, test_bpf loads fine on both x86_64 and ppc64el.
      
      Fixes: 09584b40
      
       ("bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y")
      Signed-off-by: default avatarThadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Reviewed-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      52fda36d
    • Stefano Brivio's avatar
      ipv6: old_dport should be a __be16 in __ip6_datagram_connect() · 5f2fb802
      Stefano Brivio authored
      Fixes: 2f987a76
      
       ("net: ipv6: keep sk status consistent after datagram connect failure")
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5f2fb802