Skip to content
  1. Nov 28, 2017
  2. Nov 27, 2017
  3. Nov 26, 2017
  4. Nov 25, 2017
    • Roman Kapl's avatar
      net: sched: crash on blocks with goto chain action · a60b3f51
      Roman Kapl authored
      tcf_block_put_ext has assumed that all filters (and thus their goto
      actions) are destroyed in RCU callback and thus can not race with our
      list iteration. However, that is not true during netns cleanup (see
      tcf_exts_get_net comment).
      
      Prevent the user after free by holding all chains (except 0, that one is
      already held). foreach_safe is not enough in this case.
      
      To reproduce, run the following in a netns and then delete the ns:
          ip link add dtest type dummy
          tc qdisc add dev dtest ingress
          tc filter add dev dtest chain 1 parent ffff: handle 1 prio 1 flower action goto chain 2
      
      Fixes: 822e86d9
      
       ("net_sched: remove tcf_block_put_deferred()")
      Signed-off-by: default avatarRoman Kapl <code@rkapl.cz>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a60b3f51
    • Mika Westerberg's avatar
      net: thunderbolt: Stop using zero to mean no valid DMA mapping · 540c1115
      Mika Westerberg authored
      Commit 86dabda4
      
       ("net: thunderbolt: Clear finished Tx frame bus
      address in tbnet_tx_callback()") fixed a DMA-API violation where the
      driver called dma_unmap_page() in tbnet_free_buffers() for a bus address
      that might already be unmapped. The fix was to zero out the bus address
      of a frame in tbnet_tx_callback().
      
      However, as pointed out by David Miller, zero might well be valid
      mapping (at least in theory) so it is not good idea to use it here.
      
      It turns out that we don't need the whole map/unmap dance for Tx buffers
      at all. Instead we can map the buffers when they are initially allocated
      and unmap them when the interface is brought down. In between we just
      DMA sync the buffers for the CPU or device as needed.
      
      Signed-off-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      540c1115
    • Sunil Goutham's avatar
      net: thunderx: Fix TCP/UDP checksum offload for IPv6 pkts · fa6d7cb5
      Sunil Goutham authored
      Don't offload IP header checksum to NIC.
      
      This fixes a previous patch which enabled checksum offloading
      for both IPv4 and IPv6 packets.  So L3 checksum offload was
      getting enabled for IPv6 pkts.  And HW is dropping these pkts
      as it assumes the pkt is IPv4 when IP csum offload is set
      in the SQ descriptor.
      
      Fixes:  3a9024f5
      
       ("net: thunderx: Enable TSO and checksum offloads for ipv6")
      Signed-off-by: default avatarSunil Goutham <sgoutham@cavium.com>
      Signed-off-by: default avatarAleksey Makarov <aleksey.makarov@auriga.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fa6d7cb5
    • Johannes Berg's avatar
      cfg80211: select CRYPTO_SHA256 if needed · 01a95b21
      Johannes Berg authored
      
      
      When regulatory database certificates are built-in, they're
      currently using the SHA256 digest algorithm, so add that to
      the build in that case.
      
      Also add a note that for custom certificates, one may need
      to add the right algorithms.
      
      Reported-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Tested-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      01a95b21
    • Zhu Yanjun's avatar
      forcedeth: replace pci_unmap_page with dma_unmap_page · ca43a0c7
      Zhu Yanjun authored
      
      
      The function pci_unmap_page is obsolete. So it is replaced with
      the function dma_unmap_page.
      
      CC: Srinivas Eeda <srinivas.eeda@oracle.com>
      CC: Joe Jin <joe.jin@oracle.com>
      CC: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarZhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ca43a0c7
    • David S. Miller's avatar
      Merge tag 'rxrpc-fixes-20171124' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 5f109b94
      David S. Miller authored
      
      
      David Howells says:
      
      ====================
      rxrpc: Fixes and improvements
      
      Here's a set of patches that fix and improve some stuff in the AF_RXRPC
      protocol:
      
      The patches are:
      
       (1) Unlock mutex returned by rxrpc_accept_call().
      
       (2) Don't set connection upgrade by default.
      
       (3) Differentiate the call->user_mutex used by the kernel from that used
           by userspace calling sendmsg() to avoid lockdep warnings.
      
       (4) Delay terminal ACK transmission to a work queue so that it can be
           replaced by the next call if there is one.
      
       (5) Split the call parameters from the connection parameters so that more
           call-specific parameters can be passed through.
      
       (6) Fix the call timeouts to work the same as for other RxRPC/AFS
           implementations.
      
       (7) Don't transmit DELAY ACKs immediately, but instead delay them slightly
           so that can be discarded or can represent more packets.
      
       (8) Use RTT to calculate certain protocol timeouts.
      
       (9) Add a timeout to detect lost ACK/DATA packets.
      
      (10) Add a keepalive function so that we ping the peer if we haven't
           transmitted for a short while, thereby keeping intervening firewall
           routes open.
      
      (11) Make service endpoints expire like they're supposed to so that the UDP
           port can be reused.
      
      (12) Fix connection expiry timers to make cleanup happen in a more timely
           fashion.
      ====================
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5f109b94
  5. Nov 24, 2017
    • David Howells's avatar
      rxrpc: Fix conn expiry timers · 3d18cbb7
      David Howells authored
      
      
      Fix the rxrpc connection expiry timers so that connections for closed
      AF_RXRPC sockets get deleted in a more timely fashion, freeing up the
      transport UDP port much more quickly.
      
       (1) Replace the delayed work items with work items plus timers so that
           timer_reduce() can be used to shorten them and so that the timer
           doesn't requeue the work item if the net namespace is dead.
      
       (2) Don't use queue_delayed_work() as that won't alter the timeout if the
           timer is already running.
      
       (3) Don't rearm the timers if the network namespace is dead.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      3d18cbb7
    • David Howells's avatar
      rxrpc: Fix service endpoint expiry · f859ab61
      David Howells authored
      
      
      RxRPC service endpoints expire like they're supposed to by the following
      means:
      
       (1) Mark dead rxrpc_net structs (with ->live) rather than twiddling the
           global service conn timeout, otherwise the first rxrpc_net struct to
           die will cause connections on all others to expire immediately from
           then on.
      
       (2) Mark local service endpoints for which the socket has been closed
           (->service_closed) so that the expiration timeout can be much
           shortened for service and client connections going through that
           endpoint.
      
       (3) rxrpc_put_service_conn() needs to schedule the reaper when the usage
           count reaches 1, not 0, as idle conns have a 1 count.
      
       (4) The accumulator for the earliest time we might want to schedule for
           should be initialised to jiffies + MAX_JIFFY_OFFSET, not ULONG_MAX as
           the comparison functions use signed arithmetic.
      
       (5) Simplify the expiration handling, adding the expiration value to the
           idle timestamp each time rather than keeping track of the time in the
           past before which the idle timestamp must go to be expired.  This is
           much easier to read.
      
       (6) Ignore the timeouts if the net namespace is dead.
      
       (7) Restart the service reaper work item rather the client reaper.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      f859ab61
    • David Howells's avatar
      rxrpc: Add keepalive for a call · 415f44e4
      David Howells authored
      
      
      We need to transmit a packet every so often to act as a keepalive for the
      peer (which has a timeout from the last time it received a packet) and also
      to prevent any intervening firewalls from closing the route.
      
      Do this by resetting a timer every time we transmit a packet.  If the timer
      ever expires, we transmit a PING ACK packet and thereby also elicit a PING
      RESPONSE ACK from the other side - which prevents our last-rx timeout from
      expiring.
      
      The timer is set to 1/6 of the last-rx timeout so that we can detect the
      other side going away if it misses 6 replies in a row.
      
      This is particularly necessary for servers where the processing of the
      service function may take a significant amount of time.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      415f44e4
    • David Howells's avatar
      rxrpc: Add a timeout for detecting lost ACKs/lost DATA · bd1fdf8c
      David Howells authored
      
      
      Add an extra timeout that is set/updated when we send a DATA packet that
      has the request-ack flag set.  This allows us to detect if we don't get an
      ACK in response to the latest flagged packet.
      
      The ACK packet is adjudged to have been lost if it doesn't turn up within
      2*RTT of the transmission.
      
      If the timeout occurs, we schedule the sending of a PING ACK to find out
      the state of the other side.  If a new DATA packet is ready to go sooner,
      we cancel the sending of the ping and set the request-ack flag on that
      instead.
      
      If we get back a PING-RESPONSE ACK that indicates a lower tx_top than what
      we had at the time of the ping transmission, we adjudge all the DATA
      packets sent between the response tx_top and the ping-time tx_top to have
      been lost and retransmit immediately.
      
      Rather than sending a PING ACK, we could just pick a DATA packet and
      speculatively retransmit that with request-ack set.  It should result in
      either a REQUESTED ACK or a DUPLICATE ACK which we can then use in lieu the
      a PING-RESPONSE ACK mentioned above.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      bd1fdf8c
    • David Howells's avatar
      rxrpc: Express protocol timeouts in terms of RTT · beb8e5e4
      David Howells authored
      
      
      Express protocol timeouts for data retransmission and deferred ack
      generation in terms on RTT rather than specified timeouts once we have
      sufficient RTT samples.
      
      For the moment, this requires just one RTT sample to be able to use this
      for ack deferral and two for data retransmission.
      
      The data retransmission timeout is set at RTT*1.5 and the ACK deferral
      timeout is set at RTT.
      
      Note that the calculated timeout is limited to a minimum of 4ns to make
      sure it doesn't happen too quickly.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      beb8e5e4
    • David Howells's avatar
      rxrpc: Don't transmit DELAY ACKs immediately on proposal · 8637abaa
      David Howells authored
      
      
      Don't transmit a DELAY ACK immediately on proposal when the Rx window is
      rotated, but rather defer it to the work function.  This means that we have
      a chance to queue/consume more received packets before we actually send the
      DELAY ACK, or even cancel it entirely, thereby reducing the number of
      packets transmitted.
      
      We do, however, want to continue sending other types of packet immediately,
      particularly REQUESTED ACKs, as they may be used for RTT calculation by the
      other side.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      8637abaa
    • David Howells's avatar
      rxrpc: Fix call timeouts · a158bdd3
      David Howells authored
      
      
      Fix the rxrpc call expiration timeouts and make them settable from
      userspace.  By analogy with other rx implementations, there should be three
      timeouts:
      
       (1) "Normal timeout"
      
           This is set for all calls and is triggered if we haven't received any
           packets from the peer in a while.  It is measured from the last time
           we received any packet on that call.  This is not reset by any
           connection packets (such as CHALLENGE/RESPONSE packets).
      
           If a service operation takes a long time, the server should generate
           PING ACKs at a duration that's substantially less than the normal
           timeout so is to keep both sides alive.  This is set at 1/6 of normal
           timeout.
      
       (2) "Idle timeout"
      
           This is set only for a service call and is triggered if we stop
           receiving the DATA packets that comprise the request data.  It is
           measured from the last time we received a DATA packet.
      
       (3) "Hard timeout"
      
           This can be set for a call and specified the maximum lifetime of that
           call.  It should not be specified by default.  Some operations (such
           as volume transfer) take a long time.
      
      Allow userspace to set/change the timeouts on a call with sendmsg, using a
      control message:
      
      	RXRPC_SET_CALL_TIMEOUTS
      
      The data to the message is a number of 32-bit words, not all of which need
      be given:
      
      	u32 hard_timeout;	/* sec from first packet */
      	u32 idle_timeout;	/* msec from packet Rx */
      	u32 normal_timeout;	/* msec from data Rx */
      
      This can be set in combination with any other sendmsg() that affects a
      call.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      a158bdd3
    • David Howells's avatar
      rxrpc: Split the call params from the operation params · 48124178
      David Howells authored
      
      
      When rxrpc_sendmsg() parses the control message buffer, it places the
      parameters extracted into a structure, but lumps together call parameters
      (such as user call ID) with operation parameters (such as whether to send
      data, send an abort or accept a call).
      
      Split the call parameters out into their own structure, a copy of which is
      then embedded in the operation parameters struct.
      
      The call parameters struct is then passed down into the places that need it
      instead of passing the individual parameters.  This allows for extra call
      parameters to be added.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      48124178
    • David Howells's avatar
      rxrpc: Delay terminal ACK transmission on a client call · 3136ef49
      David Howells authored
      
      
      Delay terminal ACK transmission on a client call by deferring it to the
      connection processor.  This allows it to be skipped if we can send the next
      call instead, the first DATA packet of which will implicitly ack this call.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      3136ef49
    • David Howells's avatar
      rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls · 9faaff59
      David Howells authored
      
      
      Provide a different lockdep key for rxrpc_call::user_mutex when the call is
      made on a kernel socket, such as by the AFS filesystem.
      
      The problem is that lockdep registers a false positive between userspace
      calling the sendmsg syscall on a user socket where call->user_mutex is held
      whilst userspace memory is accessed whereas the AFS filesystem may perform
      operations with mmap_sem held by the caller.
      
      In such a case, the following warning is produced.
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.14.0-fscache+ #243 Tainted: G            E
      ------------------------------------------------------
      modpost/16701 is trying to acquire lock:
       (&vnode->io_lock){+.+.}, at: [<ffffffffa000fc40>] afs_begin_vnode_operation+0x33/0x77 [kafs]
      
      but task is already holding lock:
       (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #3 (&mm->mmap_sem){++++}:
             __might_fault+0x61/0x89
             _copy_from_iter_full+0x40/0x1fa
             rxrpc_send_data+0x8dc/0xff3
             rxrpc_do_sendmsg+0x62f/0x6a1
             rxrpc_sendmsg+0x166/0x1b7
             sock_sendmsg+0x2d/0x39
             ___sys_sendmsg+0x1ad/0x22b
             __sys_sendmsg+0x41/0x62
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      -> #2 (&call->user_mutex){+.+.}:
             __mutex_lock+0x86/0x7d2
             rxrpc_new_client_call+0x378/0x80e
             rxrpc_kernel_begin_call+0xf3/0x154
             afs_make_call+0x195/0x454 [kafs]
             afs_vl_get_capabilities+0x193/0x198 [kafs]
             afs_vl_lookup_vldb+0x5f/0x151 [kafs]
             afs_create_volume+0x2e/0x2f4 [kafs]
             afs_mount+0x56a/0x8d7 [kafs]
             mount_fs+0x6a/0x109
             vfs_kern_mount+0x67/0x135
             do_mount+0x90b/0xb57
             SyS_mount+0x72/0x98
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      -> #1 (k-sk_lock-AF_RXRPC){+.+.}:
             lock_sock_nested+0x74/0x8a
             rxrpc_kernel_begin_call+0x8a/0x154
             afs_make_call+0x195/0x454 [kafs]
             afs_fs_get_capabilities+0x17a/0x17f [kafs]
             afs_probe_fileserver+0xf7/0x2f0 [kafs]
             afs_select_fileserver+0x83f/0x903 [kafs]
             afs_fetch_status+0x89/0x11d [kafs]
             afs_iget+0x16f/0x4f8 [kafs]
             afs_mount+0x6c6/0x8d7 [kafs]
             mount_fs+0x6a/0x109
             vfs_kern_mount+0x67/0x135
             do_mount+0x90b/0xb57
             SyS_mount+0x72/0x98
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      -> #0 (&vnode->io_lock){+.+.}:
             lock_acquire+0x174/0x19f
             __mutex_lock+0x86/0x7d2
             afs_begin_vnode_operation+0x33/0x77 [kafs]
             afs_fetch_data+0x80/0x12a [kafs]
             afs_readpages+0x314/0x405 [kafs]
             __do_page_cache_readahead+0x203/0x2ba
             filemap_fault+0x179/0x54d
             __do_fault+0x17/0x60
             __handle_mm_fault+0x6d7/0x95c
             handle_mm_fault+0x24e/0x2a3
             __do_page_fault+0x301/0x486
             do_page_fault+0x236/0x259
             page_fault+0x22/0x30
             __clear_user+0x3d/0x60
             padzero+0x1c/0x2b
             load_elf_binary+0x785/0xdc7
             search_binary_handler+0x81/0x1ff
             do_execveat_common.isra.14+0x600/0x888
             do_execve+0x1f/0x21
             SyS_execve+0x28/0x2f
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      other info that might help us debug this:
      
      Chain exists of:
        &vnode->io_lock --> &call->user_mutex --> &mm->mmap_sem
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&mm->mmap_sem);
                                     lock(&call->user_mutex);
                                     lock(&mm->mmap_sem);
        lock(&vnode->io_lock);
      
       *** DEADLOCK ***
      
      1 lock held by modpost/16701:
       #0:  (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486
      
      stack backtrace:
      CPU: 0 PID: 16701 Comm: modpost Tainted: G            E   4.14.0-fscache+ #243
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      Call Trace:
       dump_stack+0x67/0x8e
       print_circular_bug+0x341/0x34f
       check_prev_add+0x11f/0x5d4
       ? add_lock_to_list.isra.12+0x8b/0x8b
       ? add_lock_to_list.isra.12+0x8b/0x8b
       ? __lock_acquire+0xf77/0x10b4
       __lock_acquire+0xf77/0x10b4
       lock_acquire+0x174/0x19f
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       __mutex_lock+0x86/0x7d2
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       afs_begin_vnode_operation+0x33/0x77 [kafs]
       afs_fetch_data+0x80/0x12a [kafs]
       afs_readpages+0x314/0x405 [kafs]
       __do_page_cache_readahead+0x203/0x2ba
       ? filemap_fault+0x179/0x54d
       filemap_fault+0x179/0x54d
       __do_fault+0x17/0x60
       __handle_mm_fault+0x6d7/0x95c
       handle_mm_fault+0x24e/0x2a3
       __do_page_fault+0x301/0x486
       do_page_fault+0x236/0x259
       page_fault+0x22/0x30
      RIP: 0010:__clear_user+0x3d/0x60
      RSP: 0018:ffff880071e93da0 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: 000000000000011c RCX: 000000000000011c
      RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000060f720
      RBP: 000000000060f720 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000001 R11: ffff8800b5459b68 R12: ffff8800ce150e00
      R13: 000000000060f720 R14: 00000000006127a8 R15: 0000000000000000
       padzero+0x1c/0x2b
       load_elf_binary+0x785/0xdc7
       search_binary_handler+0x81/0x1ff
       do_execveat_common.isra.14+0x600/0x888
       do_execve+0x1f/0x21
       SyS_execve+0x28/0x2f
       do_syscall_64+0x89/0x1be
       entry_SYSCALL64_slow_path+0x25/0x25
      RIP: 0033:0x7fdb6009ee07
      RSP: 002b:00007fff566d9728 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
      RAX: ffffffffffffffda RBX: 000055ba57280900 RCX: 00007fdb6009ee07
      RDX: 000055ba5727f270 RSI: 000055ba5727cac0 RDI: 000055ba57280900
      RBP: 000055ba57280900 R08: 00007fff566d9700 R09: 0000000000000000
      R10: 000055ba5727cac0 R11: 0000000000000246 R12: 0000000000000000
      R13: 000055ba5727cac0 R14: 000055ba5727f270 R15: 0000000000000000
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      9faaff59