- Feb 01, 2016
-
-
Lorenzo Pieralisi authored
commit de818bd4 upstream. The function graph tracer adds instrumentation that is required to trace both entry and exit of a function. In particular the function graph tracer updates the "return address" of a function in order to insert a trace callback on function exit. Kernel power management functions like cpu_suspend() are called upon power down entry with functions called "finishers" that are in turn called to trigger the power down sequence but they may not return to the kernel through the normal return path. When the core resumes from low-power it returns to the cpu_suspend() function through the cpu_resume path, which leaves the trace stack frame set-up by the function tracer in an incosistent state upon return to the kernel when tracing is enabled. This patch fixes the issue by pausing/resuming the function graph tracer on the thread executing cpu_suspend() (ie the function call that subsequently triggers the "suspend finishers"), so that the function graph tracer state is kept consistent across functions that enter power down states and never return by effectively disabling graph tracer while they are executing. Fixes: 819e50e2 ("arm64: Add ftrace support") Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reported-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Zi Shen Lim authored
commit 14e589ff upstream. Turns out in the case of modulo by zero in a BPF program: A = A % X; (X == 0) the expected behavior is to terminate with return value 0. The bug in JIT is exposed by a new test case [1]. [1] https://lkml.org/lkml/2015/11/4/499 Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com> Reported-by: Yang Shi <yang.shi@linaro.org> Reported-by: Xi Wang <xi.wang@gmail.com> CC: Alexei Starovoitov <ast@plumgrid.com> Fixes: e54bcde3 ("arm64: eBPF JIT compiler") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Zi Shen Lim authored
commit 251599e1 upstream. In the case of division by zero in a BPF program: A = A / X; (X == 0) the expected behavior is to terminate with return value 0. This is confirmed by the test case introduced in commit 86bf1721 ("test_bpf: add tests checking that JIT/interpreter sets A and X to 0."). Reported-by: Yang Shi <yang.shi@linaro.org> Tested-by: Yang Shi <yang.shi@linaro.org> CC: Xi Wang <xi.wang@gmail.com> CC: Alexei Starovoitov <ast@plumgrid.com> CC: linux-arm-kernel@lists.infradead.org CC: linux-kernel@vger.kernel.org Fixes: e54bcde3 ("arm64: eBPF JIT compiler") Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Li Bin authored
commit 2ee8a74f upstream. By now, the recordmcount only records the function that in following sections: .text/.ref.text/.sched.text/.spinlock.text/.irqentry.text/ .kprobes.text/.text.unlikely For the function that not in these sections, the call mcount will be in place and not be replaced when kernel boot up. And it will bring performance overhead, such as do_mem_abort (in .exception.text section). This patch make the call mcount to nop for this case in recordmcount. Link: http://lkml.kernel.org/r/1446019445-14421-1-git-send-email-huawei.libin@huawei.com Link: http://lkml.kernel.org/r/1446193864-24593-4-git-send-email-huawei.libin@huawei.com Cc: <lkp@intel.com> Cc: <catalin.marinas@arm.com> Cc: <takahiro.akashi@linaro.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Li Bin <huawei.libin@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ulrich Weigand authored
commit a61674bd upstream. GCC 6 will include changes to generated code with -mcmodel=large, which is used to build kernel modules on powerpc64le. This was necessary because the large model is supposed to allow arbitrary sizes and locations of the code and data sections, but the ELFv2 global entry point prolog still made the unconditional assumption that the TOC associated with any particular function can be found within 2 GB of the function entry point: func: addis r2,r12,(.TOC.-func)@ha addi r2,r2,(.TOC.-func)@l .localentry func, .-func To remove this assumption, GCC will now generate instead this global entry point prolog sequence when using -mcmodel=large: .quad .TOC.-func func: .reloc ., R_PPC64_ENTRY ld r2, -8(r12) add r2, r2, r12 .localentry func, .-func The new .reloc triggers an optimization in the linker that will replace this new prolog with the original code (see above) if the linker determines that the distance between .TOC. and func is in range after all. Since this new relocation is now present in module object files, the kernel module loader is required to handle them too. This patch adds support for the new relocation and implements the same optimization done by the GNU linker. Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ulrich Weigand authored
commit 2e50c4be upstream. If a text section starts out with a data blob before the first function start label, disassembly parsing doing in recordmcount.pl gets confused on powerpc, leading to creation of corrupted module objects. This was not a problem so far since the compiler would never create such text sections. However, this has changed with a recent change in GCC 6 to support distances of > 2GB between a function and its assoicated TOC in the ELFv2 ABI, exposing this problem. There is already code in recordmcount.pl to handle such data blobs on the sparc64 platform. This patch uses the same method to handle those on powerpc as well. Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Boqun Feng authored
commit 81d7a329 upstream. According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_ versions all need to be fully ordered, however they are now just RELEASE+ACQUIRE, which are not fully ordered. So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in __{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit b97021f8 ("powerpc: Fix atomic_xxx_return barrier semantics") This patch depends on patch "powerpc: Make value-returning atomics fully ordered" for PPC_ATOMIC_ENTRY_BARRIER definition. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Boqun Feng authored
commit 49e9cf3f upstream. According to memory-barriers.txt: > Any atomic operation that modifies some state in memory and returns > information about the state (old or new) implies an SMP-conditional > general memory barrier (smp_mb()) on each side of the actual > operation ... Which mean these operations should be fully ordered. However on PPC, PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation, which is currently "lwsync" if SMP=y. The leading "lwsync" can not guarantee fully ordered atomics, according to Paul Mckenney: https://lkml.org/lkml/2015/10/14/970 To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee the fully-ordered semantics. This also makes futex atomics fully ordered, which can avoid possible memory ordering problems if userspace code relies on futex system call for fully ordered semantics. Fixes: b97021f8 ("powerpc: Fix atomic_xxx_return barrier semantics") Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stewart Smith authored
commit 98da62b7 upstream. When running on newer OPAL firmware that supports sending extra OPAL_MSG types, we would print a warning on *every* message received. This could be a problem for kernels that don't support OPAL_MSG_OCC on machines that are running real close to thermal limits and the OCC is throttling the chip. For a kernel that is paying attention to the message queue, we could get these notifications quite often. Conceivably, future message types could also come fairly often, and printing that we didn't understand them 10,000 times provides no further information than printing them once. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Neuling authored
commit 7f821fc9 upstream. Currently we can hit a scenario where we'll tm_reclaim() twice. This results in a TM bad thing exception because the second reclaim occurs when not in suspend mode. The scenario in which this can happen is the following. We attempt to deliver a signal to userspace. To do this we need obtain the stack pointer to write the signal context. To get this stack pointer we must tm_reclaim() in case we need to use the checkpointed stack pointer (see get_tm_stackpointer()). Normally we'd then return directly to userspace to deliver the signal without going through __switch_to(). Unfortunatley, if at this point we get an error (such as a bad userspace stack pointer), we need to exit the process. The exit will result in a __switch_to(). __switch_to() will attempt to save the process state which results in another tm_reclaim(). This tm_reclaim() now causes a TM Bad Thing exception as this state has already been saved and the processor is no longer in TM suspend mode. Whee! This patch checks the state of the MSR to ensure we are TM suspended before we attempt the tm_reclaim(). If we've already saved the state away, we should no longer be in TM suspend mode. This has the additional advantage of checking for a potential TM Bad Thing exception. Found using syscall fuzzer. Fixes: fb09692e ("powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes") Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Neuling authored
commit d2b9d2a5 upstream. Currently we allow both the MSR T and S bits to be set by userspace on a signal return. Unfortunately this is a reserved configuration and will cause a TM Bad Thing exception if attempted (via rfid). This patch checks for this case in both the 32 and 64 bit signals code. If both T and S are set, we mark the context as invalid. Found using a syscall fuzzer. Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context") Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Streetman authored
[ Upstream commit a8a572a6 ] Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops templates; their dst_entries counters will never be used. Move the xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init and dst_entries_destroy for each net namespace. The ipv4 and ipv6 xfrms each create dst_ops template, and perform dst_entries_init on the templates. The template values are copied to each net namespace's xfrm.xfrm*_dst_ops. The problem there is the dst_ops pcpuc_entries field is a percpu counter and cannot be used correctly by simply copying it to another object. The result of this is a very subtle bug; changes to the dst entries counter from one net namespace may sometimes get applied to a different net namespace dst entries counter. This is because of how the percpu counter works; it has a main count field as well as a pointer to the percpu variables. Each net namespace maintains its own main count variable, but all point to one set of percpu variables. When any net namespace happens to change one of the percpu variables to outside its small batch range, its count is moved to the net namespace's main count variable. So with multiple net namespaces operating concurrently, the dst_ops entries counter can stray from the actual value that it should be; if counts are consistently moved from one net namespace to another (which my testing showed is likely), then one net namespace winds up with a negative dst_ops count while another winds up with a continually increasing count, eventually reaching its gc_thresh limit, which causes all new traffic on the net namespace to fail with -ENOBUFS. Signed-off-by: Dan Streetman <dan.streetman@canonical.com> Signed-off-by: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Joe Jin authored
[ Upstream commit ca88ea12 ] Sometimes xennet_create_queues() may failed to created all requested queues, we need to update num_queues to real created to avoid NULL pointer dereference. Signed-off-by: Joe Jin <joe.jin@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David S. Miller <davem@davemloft.net> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Wei Liu authored
[ Upstream commit 32a84405 ] Originally that parameter was always reset to num_online_cpus during module initialisation, which renders it useless. The fix is to only set max_queues to num_online_cpus when user has not provided a value. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Tested-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Wei Liu authored
[ Upstream commit 4c82ac3c ] Originally that parameter was always reset to num_online_cpus during module initialisation, which renders it useless. The fix is to only set max_queues to num_online_cpus when user has not provided a value. Reported-by: Johnny Strom <johnny.strom@linuxsolutions.fi> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Karl Heiss authored
[ Upstream commit 635682a1 ] A case can occur when sctp_accept() is called by the user during a heartbeat timeout event after the 4-way handshake. Since sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the bh_sock_lock in sctp_generate_heartbeat_event() will be taken with the listening socket but released with the new association socket. The result is a deadlock on any future attempts to take the listening socket lock. Note that this race can occur with other SCTP timeouts that take the bh_lock_sock() in the event sctp_accept() is called. BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0] ... RIP: 0010:[<ffffffff8152d48e>] [<ffffffff8152d48e>] _spin_lock+0x1e/0x30 RSP: 0018:ffff880028323b20 EFLAGS: 00000206 RAX: 0000000000000002 RBX: ffff880028323b20 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff880028323be0 RDI: ffff8804632c4b48 RBP: ffffffff8100bb93 R08: 0000000000000000 R09: 0000000000000000 R10: ffff880610662280 R11: 0000000000000100 R12: ffff880028323aa0 R13: ffff8804383c3880 R14: ffff880028323a90 R15: ffffffff81534225 FS: 0000000000000000(0000) GS:ffff880028320000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000000006df528 CR3: 0000000001a85000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffff880616b70000, task ffff880616b6cab0) Stack: ffff880028323c40 ffffffffa01c2582 ffff880614cfb020 0000000000000000 <d> 0100000000000000 00000014383a6c44 ffff8804383c3880 ffff880614e93c00 <d> ffff880614e93c00 0000000000000000 ffff8804632c4b00 ffff8804383c38b8 Call Trace: <IRQ> [<ffffffffa01c2582>] ? sctp_rcv+0x492/0xa10 [sctp] [<ffffffff8148c559>] ? nf_iterate+0x69/0xb0 [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff8148c716>] ? nf_hook_slow+0x76/0x120 [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0 [<ffffffff8149757d>] ? ip_local_deliver_finish+0xdd/0x2d0 [<ffffffff81497808>] ? ip_local_deliver+0x98/0xa0 [<ffffffff81496ccd>] ? ip_rcv_finish+0x12d/0x440 [<ffffffff81497255>] ? ip_rcv+0x275/0x350 [<ffffffff8145cfeb>] ? __netif_receive_skb+0x4ab/0x750 ... With lockdep debugging: ===================================== [ BUG: bad unlock balance detected! ] ------------------------------------- CslRx/12087 is trying to release lock (slock-AF_INET) at: [<ffffffffa01bcae0>] sctp_generate_timeout_event+0x40/0xe0 [sctp] but there are no more locks to release! other info that might help us debug this: 2 locks held by CslRx/12087: #0: (&asoc->timers[i]){+.-...}, at: [<ffffffff8108ce1f>] run_timer_softirq+0x16f/0x3e0 #1: (slock-AF_INET){+.-...}, at: [<ffffffffa01bcac3>] sctp_generate_timeout_event+0x23/0xe0 [sctp] Ensure the socket taken is also the same one that is released by saving a copy of the socket before entering the timeout event critical section. Signed-off-by: Karl Heiss <kheiss@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ido Schimmel authored
[ Upstream commit 60a6531b ] We can't be within an RCU read-side critical section when deleting VLANs, as underlying drivers might sleep during the hardware operation. Therefore, replace the RCU critical section with a mutex. This is consistent with team_vlan_rx_add_vid. Fixes: 3d249d4c ("net: introduce ethernet teaming device") Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sven Eckelmann authored
[ Upstream commit 42eff6a6 ] It is not allowed to free the memory of an object which is part of a list which is protected by rcu-read-side-critical sections without making sure that no other context is accessing the object anymore. This usually happens by removing the references to this object and then waiting until the rcu grace period is over and no one (allowedly) accesses it anymore. But the _now functions ignore this completely. They free the object directly even when a different context still tries to access it. This has to be avoided and thus these functions must be removed and all functions have to use batadv_orig_node_free_ref. Fixes: 72822225 ("batman-adv: Fix rcu_barrier() miss due to double call_rcu() in TT code") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sven Eckelmann authored
[ Upstream commit b4d922cf ] It is not allowed to free the memory of an object which is part of a list which is protected by rcu-read-side-critical sections without making sure that no other context is accessing the object anymore. This usually happens by removing the references to this object and then waiting until the rcu grace period is over and no one (allowedly) accesses it anymore. But the _now functions ignore this completely. They free the object directly even when a different context still tries to access it. This has to be avoided and thus these functions must be removed and all functions have to use batadv_hardif_free_ref. Fixes: 89652331 ("batman-adv: split tq information in neigh_node struct") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sven Eckelmann authored
[ Upstream commit ae3e1e36 ] It is not allowed to free the memory of an object which is part of a list which is protected by rcu-read-side-critical sections without making sure that no other context is accessing the object anymore. This usually happens by removing the references to this object and then waiting until the rcu grace period is over and no one (allowedly) accesses it anymore. But the _now functions ignore this completely. They free the object directly even when a different context still tries to access it. This has to be avoided and thus these functions must be removed and all functions have to use batadv_neigh_ifinfo_free_ref. Fixes: 89652331 ("batman-adv: split tq information in neigh_node struct") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sven Eckelmann authored
[ Upstream commit 2baa753c ] It is not allowed to free the memory of an object which is part of a list which is protected by rcu-read-side-critical sections without making sure that no other context is accessing the object anymore. This usually happens by removing the references to this object and then waiting until the rcu grace period is over and no one (allowedly) accesses it anymore. But the _now functions ignore this completely. They free the object directly even when a different context still tries to access it. This has to be avoided and thus these functions must be removed and all functions have to use batadv_neigh_node_free_ref. Fixes: 89652331 ("batman-adv: split tq information in neigh_node struct") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sven Eckelmann authored
[ Upstream commit deed9660 ] It is not allowed to free the memory of an object which is part of a list which is protected by rcu-read-side-critical sections without making sure that no other context is accessing the object anymore. This usually happens by removing the references to this object and then waiting until the rcu grace period is over and no one (allowedly) accesses it anymore. But the _now functions ignore this completely. They free the object directly even when a different context still tries to access it. This has to be avoided and thus these functions must be removed and all functions have to use batadv_orig_ifinfo_free_ref. Fixes: 7351a482 ("batman-adv: split out router from orig_node") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sven Eckelmann authored
[ Upstream commit 44e8e7e9 ] The batadv_nc_node_free_ref function uses call_rcu to delay the free of the batadv_nc_node object until no (already started) rcu_read_lock is enabled anymore. This makes sure that no context is still trying to access the object which should be removed. But batadv_nc_node also contains a reference to orig_node which must be removed. The reference drop of orig_node was done in the call_rcu function batadv_nc_node_free_rcu but should actually be done in the batadv_nc_node_release function to avoid nested call_rcus. This is important because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will not detect the inner call_rcu as relevant for its execution. Otherwise this barrier will most likely be inserted in the queue before the callback of the first call_rcu was executed. The caller of rcu_barrier will therefore continue to run before the inner call_rcu callback finished. Fixes: d56b1705 ("batman-adv: network coding - detect coding nodes and remove these after timeout") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sven Eckelmann authored
[ Upstream commit 63b39927 ] The batadv_claim_free_ref function uses call_rcu to delay the free of the batadv_bla_claim object until no (already started) rcu_read_lock is enabled anymore. This makes sure that no context is still trying to access the object which should be removed. But batadv_bla_claim also contains a reference to backbone_gw which must be removed. The reference drop of backbone_gw was done in the call_rcu function batadv_claim_free_rcu but should actually be done in the batadv_claim_release function to avoid nested call_rcus. This is important because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will not detect the inner call_rcu as relevant for its execution. Otherwise this barrier will most likely be inserted in the queue before the callback of the first call_rcu was executed. The caller of rcu_barrier will therefore continue to run before the inner call_rcu callback finished. Fixes: 23721387 ("batman-adv: add basic bridge loop avoidance code") Signed-off-by: Sven Eckelmann <sven@narfation.org> Acked-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ben Hutchings authored
[ Upstream commit 4ab42d78 ] Currently slhc_init() treats out-of-range values of rslots and tslots as equivalent to 0, except that if tslots is too large it will dereference a null pointer (CVE-2015-7799). Add a range-check at the top of the function and make it return an ERR_PTR() on error instead of NULL. Change the callers accordingly. Compile-tested only. Reported-by: 郭永刚 <guoyonggang@360.cn> References: http://article.gmane.org/gmane.comp.security.oss.general/17908 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ben Hutchings authored
[ Upstream commit 0baa57d8 ] Compile-tested only. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Raanan Avargil authored
[ Upstream commit 8695a144 ] I’m using the compilation flag -Werror=old-style-declaration, which requires that the “inline” word would come at the beginning of the code line. $ make drivers/net/ethernet/intel/e1000e/e1000e.ko ... include/net/inet_timewait_sock.h:116:1: error: ‘inline’ is not at beginning of declaration [-Werror=old-style-declaration] static void inline inet_twsk_schedule(struct inet_timewait_sock *tw, int timeo) include/net/inet_timewait_sock.h:121:1: error: ‘inline’ is not at beginning of declaration [-Werror=old-style-declaration] static void inline inet_twsk_reschedule(struct inet_timewait_sock *tw, int timeo) Fixes: ed2e9239 ("tcp/dccp: fix timewait races in timer handling") Signed-off-by: Raanan Avargil <raanan.avargil@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit ed2e9239 ] When creating a timewait socket, we need to arm the timer before allowing other cpus to find it. The signal allowing cpus to find the socket is setting tw_refcnt to non zero value. As we set tw_refcnt in __inet_twsk_hashdance(), we therefore need to call inet_twsk_schedule() first. This also means we need to remove tw_refcnt changes from inet_twsk_schedule() and let the caller handle it. Note that because we use mod_timer_pinned(), we have the guarantee the timer wont expire before we set tw_refcnt as we run in BH context. To make things more readable I introduced inet_twsk_reschedule() helper. When rearming the timer, we can use mod_timer_pending() to make sure we do not rearm a canceled timer. Note: This bug can possibly trigger if packets of a flow can hit multiple cpus. This does not normally happen, unless flow steering is broken somehow. This explains this bug was spotted ~5 months after its introduction. A similar fix is needed for SYN_RECV sockets in reqsk_queue_hash_req(), but will be provided in a separate patch for proper tracking. Fixes: 789f558c ("tcp/dccp: get rid of central timewait timer") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Ying Cai <ycai@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nikolay Aleksandrov authored
[ Upstream commit c6894dec ] After promisc mode management was introduced a bridge device could do dev_set_promiscuity from its ndo_change_rx_flags() callback which in turn can be called after the bridge's addr_list_lock has been taken (e.g. by dev_uc_add). This causes a false positive lockdep splat because the port interfaces' addr_list_lock is taken when br_manage_promisc() runs after the bridge's addr list lock was already taken. To remove the false positive introduce a custom bridge addr_list_lock class and set it on bridge init. A simple way to reproduce this is with the following: $ brctl addbr br0 $ ip l add l br0 br0.100 type vlan id 100 $ ip l set br0 up $ ip l set br0.100 up $ echo 1 > /sys/class/net/br0/bridge/vlan_filtering $ brctl addif br0 eth0 Splat: [ 43.684325] ============================================= [ 43.684485] [ INFO: possible recursive locking detected ] [ 43.684636] 4.4.0-rc8+ #54 Not tainted [ 43.684755] --------------------------------------------- [ 43.684906] brctl/1187 is trying to acquire lock: [ 43.685047] (_xmit_ETHER){+.....}, at: [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40 [ 43.685460] but task is already holding lock: [ 43.685618] (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80 [ 43.686015] other info that might help us debug this: [ 43.686316] Possible unsafe locking scenario: [ 43.686743] CPU0 [ 43.686967] ---- [ 43.687197] lock(_xmit_ETHER); [ 43.687544] lock(_xmit_ETHER); [ 43.687886] *** DEADLOCK *** [ 43.688438] May be due to missing lock nesting notation [ 43.688882] 2 locks held by brctl/1187: [ 43.689134] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81510317>] rtnl_lock+0x17/0x20 [ 43.689852] #1: (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80 [ 43.690575] stack backtrace: [ 43.690970] CPU: 0 PID: 1187 Comm: brctl Not tainted 4.4.0-rc8+ #54 [ 43.691270] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 [ 43.691770] ffffffff826a25c0 ffff8800369fb8e0 ffffffff81360ceb ffffffff826a25c0 [ 43.692425] ffff8800369fb9b8 ffffffff810d0466 ffff8800369fb968 ffffffff81537139 [ 43.693071] ffff88003a08c880 0000000000000000 00000000ffffffff 0000000002080020 [ 43.693709] Call Trace: [ 43.693931] [<ffffffff81360ceb>] dump_stack+0x4b/0x70 [ 43.694199] [<ffffffff810d0466>] __lock_acquire+0x1e46/0x1e90 [ 43.694483] [<ffffffff81537139>] ? netlink_broadcast_filtered+0x139/0x3e0 [ 43.694789] [<ffffffff8153b5da>] ? nlmsg_notify+0x5a/0xc0 [ 43.695064] [<ffffffff810d10f5>] lock_acquire+0xe5/0x1f0 [ 43.695340] [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40 [ 43.695623] [<ffffffff815edea5>] _raw_spin_lock_bh+0x45/0x80 [ 43.695901] [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40 [ 43.696180] [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40 [ 43.696460] [<ffffffff8150189c>] dev_set_promiscuity+0x3c/0x50 [ 43.696750] [<ffffffffa0586845>] br_port_set_promisc+0x25/0x50 [bridge] [ 43.697052] [<ffffffffa05869aa>] br_manage_promisc+0x8a/0xe0 [bridge] [ 43.697348] [<ffffffffa05826ee>] br_dev_change_rx_flags+0x1e/0x20 [bridge] [ 43.697655] [<ffffffff81501532>] __dev_set_promiscuity+0x132/0x1f0 [ 43.697943] [<ffffffff81501672>] __dev_set_rx_mode+0x82/0x90 [ 43.698223] [<ffffffff815072de>] dev_uc_add+0x5e/0x80 [ 43.698498] [<ffffffffa05b3c62>] vlan_device_event+0x542/0x650 [8021q] [ 43.698798] [<ffffffff8109886d>] notifier_call_chain+0x5d/0x80 [ 43.699083] [<ffffffff810988b6>] raw_notifier_call_chain+0x16/0x20 [ 43.699374] [<ffffffff814f456e>] call_netdevice_notifiers_info+0x6e/0x80 [ 43.699678] [<ffffffff814f4596>] call_netdevice_notifiers+0x16/0x20 [ 43.699973] [<ffffffffa05872be>] br_add_if+0x47e/0x4c0 [bridge] [ 43.700259] [<ffffffffa058801e>] add_del_if+0x6e/0x80 [bridge] [ 43.700548] [<ffffffffa0588b5f>] br_dev_ioctl+0xaf/0xc0 [bridge] [ 43.700836] [<ffffffff8151a7ac>] dev_ifsioc+0x30c/0x3c0 [ 43.701106] [<ffffffff8151aac9>] dev_ioctl+0xf9/0x6f0 [ 43.701379] [<ffffffff81254345>] ? mntput_no_expire+0x5/0x450 [ 43.701665] [<ffffffff812543ee>] ? mntput_no_expire+0xae/0x450 [ 43.701947] [<ffffffff814d7b02>] sock_do_ioctl+0x42/0x50 [ 43.702219] [<ffffffff814d8175>] sock_ioctl+0x1e5/0x290 [ 43.702500] [<ffffffff81242d0b>] do_vfs_ioctl+0x2cb/0x5c0 [ 43.702771] [<ffffffff81243079>] SyS_ioctl+0x79/0x90 [ 43.703033] [<ffffffff815eebb6>] entry_SYSCALL_64_fastpath+0x16/0x7a CC: Vlad Yasevich <vyasevic@redhat.com> CC: Stephen Hemminger <stephen@networkplumber.org> CC: Bridge list <bridge@lists.linux-foundation.org> CC: Andy Gospodarek <gospo@cumulusnetworks.com> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Fixes: 2796d0c6 ("bridge: Automatically manage port promiscuous mode.") Reported-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 34ae6a1a ] When a tunnel decapsulates the outer header, it has to comply with RFC 6080 and eventually propagate CE mark into inner header. It turns out IP6_ECN_set_ce() does not correctly update skb->csum for CHECKSUM_COMPLETE packets, triggering infamous "hw csum failure" messages and stack traces. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rabin Vincent authored
[ Upstream commit 229394e8 ] On ARM64, a BUG() is triggered in the eBPF JIT if a filter with a constant shift that can't be encoded in the immediate field of the UBFM/SBFM instructions is passed to the JIT. Since these shifts amounts, which are negative or >= regsize, are invalid, reject them in the eBPF verifier and the classic BPF filter checker, for all architectures. Signed-off-by: Rabin Vincent <rabin@rab.in> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 7aaed57c ] Ivaylo Dimitrov reported a regression caused by commit 7866a621 ("dev: add per net_device packet type chains"). skb->dev becomes NULL and we crash in __netif_receive_skb_core(). Before above commit, different kind of bugs or corruptions could happen without major crash. But the root cause is that phonet_rcv() can queue skb without checking if skb is shared or not. Many thanks to Ivaylo Dimitrov for his help, diagnosis and tests. Reported-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> Tested-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Remi Denis-Courmont <courmisch@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Karl Heiss authored
[ Upstream commit 03d84a5f ] Commit 1f718f0f ("bonding: populate neighbour's private on enslave") undoes the fix provided by commit c2edacf8 ("bonding / ipv6: no addrconf for slaves separately from master") by effectively setting the slave flag after the slave has been opened. If the slave comes up quickly enough, it will go through the IPv6 addrconf before the slave flag has been set and will get a link local IPv6 address. In order to ensure that addrconf knows to ignore the slave devices on state change, set IFF_SLAVE before dev_open() during bonding enslavement. Fixes: 1f718f0f ("bonding: populate neighbour's private on enslave") Signed-off-by: Karl Heiss <kheiss@gmail.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konstantin Khlebnikov authored
[ Upstream commit 9207f9d4 ] Skb_gso_segment() uses skb control block during segmentation. This patch adds 32-bytes room for previous control block which will be copied into all resulting segments. This patch fixes kernel crash during fragmenting forwarded packets. Fragmentation requires valid IP CB in skb for clearing ip options. Also patch removes custom save/restore in ovs code, now it's redundant. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michal Kubeček authored
[ Upstream commit 40ba3302 ] Commit acf8dd0a ("udp: only allow UFO for packets from SOCK_DGRAM sockets") disallows UFO for packets sent from raw sockets. We need to do the same also for SOCK_DGRAM sockets with SO_NO_CHECK options, even if for a bit different reason: while such socket would override the CHECKSUM_PARTIAL set by ip_ufo_append_data(), gso_size is still set and bad offloading flags warning is triggered in __skb_gso_segment(). In the IPv6 case, SO_NO_CHECK option is ignored but we need to disallow UFO for packets sent by sockets with UDP_NO_CHECK6_TX option. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Tested-by: Shannon Nelson <shannon.nelson@intel.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Neal Cardwell authored
[ Upstream commit 83d15e70 ] For tcp_yeah, use an ssthresh floor of 2, the same floor used by Reno and CUBIC, per RFC 5681 (equation 4). tcp_yeah_ssthresh() was sometimes returning a 0 or negative ssthresh value if the intended reduction is as big or bigger than the current cwnd. Congestion control modules should never return a zero or negative ssthresh. A zero ssthresh generally results in a zero cwnd, causing the connection to stall. A negative ssthresh value will be interpreted as a u32 and will set a target cwnd for PRR near 4 billion. Oleksandr Natalenko reported that a system using tcp_yeah with ECN could see a warning about a prior_cwnd of 0 in tcp_cwnd_reduction(). Testing verified that this was due to tcp_yeah_ssthresh() misbehaving in this way. Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 3e4006f0 ] When first SYNACK is sent, we already hold rcu_read_lock(), but this is not true if a SYNACK is retransmitted, as a timer (soft) interrupt does not hold rcu_read_lock() Fixes: 45f6fad8 ("ipv6: add complete rcu protection around np->opt") Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sasha Levin authored
[ Upstream commit 320f1a4a ] proc_dostring() needs an initialized destination string, while the one provided in proc_sctp_do_hmac_alg() contains stack garbage. Thus, writing to cookie_hmac_alg would strlen() that garbage and end up accessing invalid memory. Fixes: 3c68198e ("sctp: Make hmac algorithm selection for cookie generation dynamic") Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nicolas Dichtel authored
[ Upstream commit 07b9b37c ] When a vxlan interface is created, the driver checks that there is not another vxlan interface with the same properties. To do this, it checks the existing vxlan udp socket. Since commit 1c51a915, the creation of the vxlan socket is done only when the interface is set up, thus it breaks that test. Example: $ ip l a vxlan10 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0 $ ip l a vxlan11 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0 $ ip -br l | grep vxlan vxlan10 DOWN f2:55:1c:6a:fb:00 <BROADCAST,MULTICAST> vxlan11 DOWN 7a:cb:b9:38:59:0d <BROADCAST,MULTICAST> Instead of checking sockets, let's loop over the vxlan iface list. Fixes: 1c51a915 ("vxlan: fix race caused by dropping rtnl_unlock") Reported-by: Thomas Faivre <thomas.faivre@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Francesco Ruggeri authored
[ Upstream commit 07a5d384 ] dst_release should not access dst->flags after decrementing __refcnt to 0. The dst_entry may be in dst_busy_list and dst_gc_task may dst_destroy it before dst_release gets a chance to access dst->flags. Fixes: d69bbf88 ("net: fix a race in dst_release()") Fixes: 27b75c95 ("net: avoid RCU for NOCACHE dst") Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-