Commits · 3895bcd96a9e8e1d3103e77f293dd83c7b52ba8e · Mirrors / github.com / raspberrypi_linux

Feb 17, 2021

x86/split_lock: Enable the split lock feature on another Alder Lake CPU · 3895bcd9

Fenghua Yu authored Feb 01, 2021

[ Upstream commit 8acf4178

 ]

Add Alder Lake mobile processor to CPU list to enumerate and enable the
split lock feature.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20210201190007.4031869-1-fenghua.yu@intel.com


Signed-off-by: Sasha Levin <sashal@kernel.org>

3895bcd9

scsi: lpfc: Fix EEH encountering oops with NVMe traffic · 020680e3

James Smart authored Jan 27, 2021

[ Upstream commit 8c65830a ]

In testing, in a configuration with Redfish and native NVMe multipath when
an EEH is injected, a kernel oops is being encountered:

(unreliable)
lpfc_nvme_ls_req+0x328/0x720 [lpfc]
__nvme_fc_send_ls_req.constprop.13+0x1d8/0x3d0 [nvme_fc]
nvme_fc_create_association+0x224/0xd10 [nvme_fc]
nvme_fc_reset_ctrl_work+0x110/0x154 [nvme_fc]
process_one_work+0x304/0x5d

the NBMe transport is issuing a Disconnect LS request, which the driver
receives and tries to post but the work queue used by the driver is already
being torn down by the eeh.

Fix by validating the validity of the work queue before proceeding with the
LS transmit.

Link: https://lore.kernel.org/r/20210127221601.84878-1-jsmart2021@gmail.com


Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

020680e3

ovl: skip getxattr of security labels · 116826d6

Amir Goldstein authored Dec 19, 2020

[ Upstream commit 03fedf93

 ]

When inode has no listxattr op of its own (e.g. squashfs) vfs_listxattr
calls the LSM inode_listsecurity hooks to list the xattrs that LSMs will
intercept in inode_getxattr hooks.

When selinux LSM is installed but not initialized, it will list the
security.selinux xattr in inode_listsecurity, but will not intercept it
in inode_getxattr.  This results in -ENODATA for a getxattr call for an
xattr returned by listxattr.

This situation was manifested as overlayfs failure to copy up lower
files from squashfs when selinux is built-in but not initialized,
because ovl_copy_xattr() iterates the lower inode xattrs by
vfs_listxattr() and vfs_getxattr().

ovl_copy_xattr() skips copy up of security labels that are indentified by
inode_copy_up_xattr LSM hooks, but it does that after vfs_getxattr().
Since we are not going to copy them, skip vfs_getxattr() of the security
labels.

Reported-by: Michael Labriola <michael.d.labriola@gmail.com>
Tested-by: Michael Labriola <michael.d.labriola@gmail.com>
Link: https://lore.kernel.org/linux-unionfs/2nv9d47zt7.fsf@aldarion.sourceruckus.org/


Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

116826d6

cap: fix conversions on getxattr · 02dee03d

Miklos Szeredi authored Jan 28, 2021

[ Upstream commit f2b00be4

 ]

If a capability is stored on disk in v2 format cap_inode_getsecurity() will
currently return in v2 format unconditionally.

This is wrong: v2 cap should be equivalent to a v3 cap with zero rootid,
and so the same conversions performed on it.

If the rootid cannot be mapped, v3 is returned unconverted.  Fix this so
that both v2 and v3 return -EOVERFLOW if the rootid (or the owner of the fs
user namespace in case of v2) cannot be mapped into the current user
namespace.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

02dee03d

ovl: perform vfs_getxattr() with mounter creds · cbb9404a

Miklos Szeredi authored Jan 28, 2021

[ Upstream commit 554677b9

 ]

The vfs_getxattr() in ovl_xattr_set() is used to check whether an xattr
exist on a lower layer file that is to be removed.  If the xattr does not
exist, then no need to copy up the file.

This call of vfs_getxattr() wasn't wrapped in credential override, and this
is probably okay.  But for consitency wrap this instance as well.

Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

cbb9404a

arm64: dts: rockchip: Disable display for NanoPi R2S · f66fa5ec

Robin Murphy authored Jan 20, 2021

[ Upstream commit 74532de4

 ]

NanoPi R2S is headless, so rightly does not enable any of the display
interface hardware, which currently provokes an obnoxious error in the
boot log from the fake DRM device failing to find anything to bind to.
It probably isn't *too* hard to obviate the fake device shenanigans
entirely with a bit of driver reshuffling, but for now let's just
disable it here to shut up the spurious error.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/c4553dfad1ad6792c4f22454c135ff55de77e2d6.1611186099.git.robin.murphy@arm.com


Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

f66fa5ec

platform/x86: hp-wmi: Disable tablet-mode reporting by default · 2a2e9114

Hans de Goede authored Jan 20, 2021

[ Upstream commit 67fbe02a ]

Recently userspace has started making more use of SW_TABLET_MODE
(when an input-dev reports this).

Specifically recent GNOME3 versions will:

1.  When SW_TABLET_MODE is reported and is reporting 0:
1.1 Disable accelerometer-based screen auto-rotation
1.2 Disable automatically showing the on-screen keyboard when a
    text-input field is focussed

2.  When SW_TABLET_MODE is reported and is reporting 1:
2.1 Ignore input-events from the builtin keyboard and touchpad
    (this is for 360° hinges style 2-in-1s where the keyboard and
     touchpads are accessible on the back of the tablet when folded
     into tablet-mode)

This means that claiming to support SW_TABLET_MODE when it does not
actually work / reports correct values has bad side-effects.

The check in the hp-wmi code which is used to decide if the input-dev
should claim SW_TABLET_MODE support, only checks if the
HPWMI_HARDWARE_QUERY is supported. It does *not* check if the hardware
actually is capable of reporting SW_TABLET_MODE.

This leads to the hp-wmi input-dev claiming SW_TABLET_MODE support,
while in reality it will always report 0 as SW_TABLET_MODE value.
This has been seen on a "HP ENVY x360 Convertible 15-cp0xxx" and
this likely is the case on a whole lot of other HP models.

This problem causes both auto-rotation and on-screen keyboard
support to not work on affected x360 models.

There is no easy fix for this, but since userspace expects
SW_TABLET_MODE reporting to be reliable when advertised it is
better to not claim/report SW_TABLET_MODE support at all, then
to claim to support it while it does not work.

To avoid the mentioned problems, add a new enable_tablet_mode_sw
module-parameter which defaults to false.

Note I've made this an int using the standard -1=auto, 0=off, 1=on
triplett, with the hope that in the future we can come up with a
better way to detect SW_TABLET_MODE support. ATM the default
auto option just does the same as off.

BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1918255


Cc: Stefan Brüns <stefan.bruens@rwth-aachen.de>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Mark Gross <mgross@linux.intel.com>
Link: https://lore.kernel.org/r/20210120124941.73409-1-hdegoede@redhat.com


Signed-off-by: Sasha Levin <sashal@kernel.org>

2a2e9114

arm64: dts: rockchip: remove interrupt-names property from rk3399 vdec node · d33b28e0

Johan Jonker authored Jan 17, 2021

[ Upstream commit 94a5400f

 ]

A test with the command below gives this error:
/arch/arm64/boot/dts/rockchip/rk3399-evb.dt.yaml: video-codec@ff660000:
'interrupt-names' does not match any of the regexes: 'pinctrl-[0-9]+'

The rkvdec driver gets it irq with help of the platform_get_irq()
function, so remove the interrupt-names property from the rk3399
vdec node.

make ARCH=arm64 dtbs_check
DT_SCHEMA_FILES=Documentation/devicetree/bindings/
media/rockchip,vdec.yaml

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Link: https://lore.kernel.org/r/20210117181653.24886-1-jbx6244@gmail.com


Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

d33b28e0

ARM: OMAP2+: Fix suspcious RCU usage splats for omap_enter_idle_coupled · 697091f9

Tony Lindgren authored Jan 10, 2021

[ Upstream commit 06862d78 ]

We get suspcious RCU usage splats with cpuidle in several places in
omap_enter_idle_coupled() with the kernel debug options enabled:

RCU used illegally from extended quiescent state!
...
(_raw_spin_lock_irqsave)
(omap_enter_idle_coupled+0x17c/0x2d8)
(omap_enter_idle_coupled)
(cpuidle_enter_state)
(cpuidle_enter_state_coupled)
(cpuidle_enter)

Let's use RCU_NONIDLE to suppress these splats. Things got changed around
with commit 1098582a ("sched,idle,rcu: Push rcu_idle deeper into the
idle path") that started triggering these warnings.

For the tick_broadcast related calls, ideally we'd just switch over to
using CPUIDLE_FLAG_TIMER_STOP for omap_enter_idle_coupled() to have the
generic cpuidle code handle the tick_broadcast related calls for us and
then just drop the tick_broadcast calls here.

But we're currently missing the call in the common cpuidle code for
tick_broadcast_enable() that CPU1 hotplug needs as described in earlier
commit 50d6b3cf

 ("ARM: OMAP2+: fix lack of timer interrupts on CPU1
after hotplug").

Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

697091f9

arm64: dts: qcom: sdm845: Reserve LPASS clocks in gcc · 6c152ac1

Bjorn Andersson authored Dec 21, 2020

[ Upstream commit 93f2a115

 ]

The GCC_LPASS_Q6_AXI_CLK and GCC_LPASS_SWAY_CLK clocks may not be
touched on a typical UEFI based SDM845 device, but when the kernel is
built with CONFIG_SDM_LPASSCC_845 this happens, unless they are marked
as protected-clocks in the DT.

This was done for the MTP and the Pocophone, but not for DB845c and the
Lenovo Yoga C630 - causing these to fail to boot if the LPASS clock
controller is enabled (which it typically isn't).

Tested-by: Vinod Koul <vkoul@kernel.org> #on db845c
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20201222001103.3112306-1-bjorn.andersson@linaro.org


Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

6c152ac1

arm64: dts: rockchip: Fix PCIe DT properties on rk3399 · 413a2353

Marc Zyngier authored Aug 15, 2020

[ Upstream commit 43f20b1c

 ]

It recently became apparent that the lack of a 'device_type = "pci"'
in the PCIe root complex node for rk3399 is a violation of the PCI
binding, as documented in IEEE Std 1275-1994. Changes to the kernel's
parsing of the DT made such violation fatal, as drivers cannot
probe the controller anymore.

Add the missing property makes the PCIe node compliant. While we
are at it, drop the pointless linux,pci-domain property, which only
makes sense when there are multiple host bridges.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200815125112.462652-3-maz@kernel.org


Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

413a2353

soc: ti: omap-prm: Fix boot time errors for rst_map_012 bits 0 and 1 · 8e25e1ee

Tony Lindgren authored Dec 08, 2020

[ Upstream commit 7078a5ba

 ]

We have rst_map_012 used for various accelerators like dsp, ipu and iva.
For these use cases, we have rstctrl bit 2 control the subsystem module
reset, and have and bits 0 and 1 control the accelerator specific
features.

If the bootloader, or kexec boot, has left any accelerator specific
reset bits deasserted, deasserting bit 2 reset will potentially enable
an accelerator with unconfigured MMU and no firmware. And we may get
spammed with a lot by warnings on boot with "Data Access in User mode
during Functional access", or depending on the accelerator, the system
can also just hang.

This issue can be quite easily reproduced by setting a rst_map_012 type
rstctrl register to 0 or 4 in the bootloader, and booting the system.

Let's just assert all reset bits for rst_map_012 type resets. So far
it looks like the other rstctrl types don't need this. If it turns out
that the other type rstctrl bits also need reset on init, we need to
add an instance specific reset mask for the bits to avoid resetting
unwanted bits.

Reported-by: Carl Philipp Klemm <philipp@uvos.xyz>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Santosh Shilimkar <ssantosh@kernel.org>
Cc: Suman Anna <s-anna@ti.com>
Cc: Tero Kristo <t-kristo@ti.com>
Tested-by: Carl Philipp Klemm <philipp@uvos.xyz>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

8e25e1ee

tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha · 8c5864d2

Seth Forshee authored Feb 09, 2021

commit ad69c389 upstream.

As with s390, alpha is a 64-bit architecture with a 32-bit ino_t.  With
CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and
display "inode64" in the mount options, whereas passing "inode64" in the
mount options will fail.  This leads to erroneous behaviours such as
this:

  # mkdir mnt
  # mount -t tmpfs nodev mnt
  # mount -o remount,rw mnt
  mount: /home/ubuntu/mnt: mount point not mounted or bad option.

Prevent CONFIG_TMPFS_INODE64 from being selected on alpha.

Link: https://lkml.kernel.org/r/20210208215726.608197-1-seth.forshee@canonical.com
Fixes: ea3271f7

 ("tmpfs: support 64-bit inums per-sb")
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: <stable@vger.kernel.org>	[5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

8c5864d2

tmpfs: disallow CONFIG_TMPFS_INODE64 on s390 · b03a0d5c

Seth Forshee authored Feb 09, 2021

commit b85a7a8b upstream.

Currently there is an assumption in tmpfs that 64-bit architectures also
have a 64-bit ino_t.  This is not true on s390 which has a 32-bit ino_t.
With CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers
and display "inode64" in the mount options, but passing the "inode64"
mount option will fail.  This leads to the following behavior:

  # mkdir mnt
  # mount -t tmpfs nodev mnt
  # mount -o remount,rw mnt
  mount: /home/ubuntu/mnt: mount point not mounted or bad option.

As mount sees "inode64" in the mount options and thus passes it in the
options for the remount.

So prevent CONFIG_TMPFS_INODE64 from being selected on s390.

Link: https://lkml.kernel.org/r/20210205230620.518245-1-seth.forshee@canonical.com
Fixes: ea3271f7

 ("tmpfs: support 64-bit inums per-sb")
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: <stable@vger.kernel.org>	[5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

b03a0d5c

dmaengine: move channel device_node deletion to driver · 285b5759

Dave Jiang authored Jan 18, 2021

commit e5944431 upstream.

Channel device_node deletion is managed by the device driver rather than
the dmaengine core. The deletion was accidentally introduced when making
channel unregister dynamic. It causes xilinx_dma module to crash on unload
as reported by Radhey. Remove chan->device_node delete in dmaengine and
also fix up idxd driver.

[   42.142705] Internal error: Oops: 96000044 [#1] SMP
[   42.147566] Modules linked in: xilinx_dma(-) clk_xlnx_clock_wizard uio_pdrv_genirq
[   42.155139] CPU: 1 PID: 2075 Comm: rmmod Not tainted 5.10.1-00026-g3a2e6dd7a05-dirty #192
[   42.163302] Hardware name: Enclustra XU5 SOM (DT)
[   42.167992] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
[   42.173996] pc : xilinx_dma_chan_remove+0x74/0xa0 [xilinx_dma]
[   42.179815] lr : xilinx_dma_chan_remove+0x70/0xa0 [xilinx_dma]
[   42.185636] sp : ffffffc01112bca0
[   42.188935] x29: ffffffc01112bca0 x28: ffffff80402ea640

xilinx_dma_chan_remove+0x74/0xa0:
__list_del at ./include/linux/list.h:112 (inlined by)
__list_del_entry at./include/linux/list.h:135 (inlined by)
list_del at ./include/linux/list.h:146 (inlined by)
xilinx_dma_chan_remove at drivers/dma/xilinx/xilinx_dma.c:2546

Fixes: e81274cd

 ("dmaengine: add support to dynamic register/unregister of channels")
Reported-by: Radhey Shyam Pandey <radheys@xilinx.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Link: https://lore.kernel.org/r/161099092469.2495902.5064826526660062342.stgit@djiang5-desk3.ch.intel.com


Signed-off-by: Vinod Koul <vkoul@kernel.org>
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

285b5759

drm/dp_mst: Don't report ports connected if nothing is attached to them · deae1e63

Imre Deak authored Feb 01, 2021

commit 873e5bb9 upstream.

Reporting a port as connected if nothing is attached to them leads to
any i2c transactions on this port trying to use an uninitialized i2c
adapter, fix this.

Let's account for this case even if branch devices have no good reason
to report a port as plugged with their peer device type set to 'none'.

Fixes: db1a0795 ("drm/dp_mst: Handle SST-only branch device case")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/2987
References: https://gitlab.freedesktop.org/drm/intel/-/issues/1963


Cc: Wayne Lin <Wayne.Lin@amd.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: <stable@vger.kernel.org> # v5.5+
Cc: intel-gfx@lists.freedesktop.org
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Reported-by: Thiago Macieira <gitlab@gitlab.freedesktop.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210201120145.350258-1-imre.deak@intel.com


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

deae1e63

drm/i915/tgl+: Make sure TypeC FIA is powered up when initializing it · 5a36371f

Imre Deak authored Feb 08, 2021

commit 2f51312b

 upstream.

The TypeC FIA can be powered down if the TC-COLD power state is allowed,
so block the TC-COLD state when initializing the FIA.

Note that this isn't needed on ICL where the FIA is never modular and
which has no generic way to block TC-COLD (except for platforms with a
legacy TypeC port and on those too only via these legacy ports, not via
a DP-alt/TBT port).

Cc: <stable@vger.kernel.org> # v5.10+
Cc: José Roberto de Souza <jose.souza@intel.com>
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3027


Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210208154303.6839-1-imre.deak@intel.com


Reviewed-by: Jos� Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit f48993e5

)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5a36371f

Revert "drm/amd/display: Update NV1x SR latency values" · e11345ed

Alex Deucher authored Feb 03, 2021

commit cf050f96 upstream.

This reverts commit 4a3dea89.

This causes blank screens for some users.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1482


Cc: Alvin Lee <alvin.lee2@amd.com>
Cc: Jun Lei <Jun.Lei@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e11345ed

cgroup: fix psi monitor for root cgroup · e72a6580

Odin Ugedal authored Jan 16, 2021

commit 385aac15 upstream.

Fix NULL pointer dereference when adding new psi monitor to the root
cgroup. PSI files for root cgroup was introduced in df5ba5be

 by using
system wide psi struct when reading, but file write/monitor was not
properly fixed. Since the PSI config for the root cgroup isn't
initialized, the current implementation tries to lock a NULL ptr,
resulting in a crash.

Can be triggered by running this as root:
$ tee /sys/fs/cgroup/cpu.pressure <<< "some 10000 1000000"

Signed-off-by: Odin Ugedal <odin@uged.al>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Dan Schatzberg <dschatzberg@fb.com>
Fixes: df5ba5be

 ("kernel/sched/psi.c: expose pressure metrics on root cgroup")
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: stable@vger.kernel.org # 5.2+
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e72a6580

arm/xen: Don't probe xenbus as part of an early initcall · 89b0c20d

Julien Grall authored Feb 10, 2021

commit c4295ab0 upstream.

After Commit 3499ba81 ("xen: Fix event channel callback via
INTX/GSI"), xenbus_probe() will be called too early on Arm. This will
recent to a guest hang during boot.

If the hang wasn't there, we would have ended up to call
xenbus_probe() twice (the second time is in xenbus_probe_initcall()).

We don't need to initialize xenbus_probe() early for Arm guest.
Therefore, the call in xen_guest_init() is now removed.

After this change, there is no more external caller for xenbus_probe().
So the function is turned to a static one. Interestingly there were two
prototypes for it.

Cc: stable@vger.kernel.org
Fixes: 3499ba81

 ("xen: Fix event channel callback via INTX/GSI")
Reported-by: Ian Jackson <iwj@xenproject.org>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Link: https://lore.kernel.org/r/20210210170654.5377-1-julien@xen.org


Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

89b0c20d

drm/i915: Fix overlay frontbuffer tracking · bef1f148

Ville Syrjälä authored Feb 09, 2021

commit 5feba0e9 upstream.

We don't have a persistent fb holding a reference to the frontbuffer
object, so every time we do the get+put we throw the frontbuffer object
immediately away. And so the next time around we get a pristine
frontbuffer object with bits==0 even for the old vma. This confuses
the frontbuffer tracking code which understandably expects the old
frontbuffer to have the overlay's bit set.

Fix this by hanging on to the frontbuffer reference until the next
flip. And just to make this a bit more clear let's track the frontbuffer
explicitly instead of just grabbing it via the old vma.

Cc: stable@vger.kernel.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1136


Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210209021918.16234-2-ville.syrjala@linux.intel.com
Fixes: 8e7cb179

 ("drm/i915: Extract intel_frontbuffer active tracking")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit 553c23bd

)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bef1f148

tracing: Check length before giving out the filter buffer · 7c93d8cf

Steven Rostedt (VMware) authored Feb 10, 2021

commit b220c049 upstream.

When filters are used by trace events, a page is allocated on each CPU and
used to copy the trace event fields to this page before writing to the ring
buffer. The reason to use the filter and not write directly into the ring
buffer is because a filter may discard the event and there's more overhead
on discarding from the ring buffer than the extra copy.

The problem here is that there is no check against the size being allocated
when using this page. If an event asks for more than a page size while being
filtered, it will get only a page, leading to the caller writing more that
what was allocated.

Check the length of the request, and if it is more than PAGE_SIZE minus the
header default back to allocating from the ring buffer directly. The ring
buffer may reject the event if its too big anyway, but it wont overflow.

Link: https://lore.kernel.org/ath10k/1612839593-2308-1-git-send-email-wgong@codeaurora.org/

Cc: stable@vger.kernel.org
Fixes: 0fc1b09f

 ("tracing: Use temp buffer when filtering events")
Reported-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

7c93d8cf

tracing: Do not count ftrace events in top level enable output · a38c1ee1

Steven Rostedt (VMware) authored Feb 05, 2021

commit 256cfdd6 upstream.

The file /sys/kernel/tracing/events/enable is used to enable all events by
echoing in "1", or disabling all events when echoing in "0". To know if all
events are enabled, disabled, or some are enabled but not all of them,
cating the file should show either "1" (all enabled), "0" (all disabled), or
"X" (some enabled but not all of them). This works the same as the "enable"
files in the individule system directories (like tracing/events/sched/enable).

But when all events are enabled, the top level "enable" file shows "X". The
reason is that its checking the "ftrace" events, which are special events
that only exist for their format files. These include the format for the
function tracer events, that are enabled when the function tracer is
enabled, but not by the "enable" file. The check includes these events,
which will always be disabled, and even though all true events are enabled,
the top level "enable" file will show "X" instead of "1".

To fix this, have the check test the event's flags to see if it has the
"IGNORE_ENABLE" flag set, and if so, not test it.

Cc: stable@vger.kernel.org
Fixes: 553552ce

("tracing: Combine event filter_active and enable into single flags field")
Reported-by: "Yordan Karadzhov (VMware)" <y.karadz@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a38c1ee1

gpio: ep93xx: Fix single irqchip with multi gpiochips · d9b7ea4c

Nikita Shubin authored Feb 09, 2021

commit 28dc10eb upstream.

Fixes the following warnings which results in interrupts disabled on
port B/F:

gpio gpiochip1: (B): detected irqchip that is shared with multiple gpiochips: please fix the driver.
gpio gpiochip5: (F): detected irqchip that is shared with multiple gpiochips: please fix the driver.

- added separate irqchip for each interrupt capable gpiochip
- provided unique names for each irqchip

Fixes: d2b09196

 ("gpio: ep93xx: Pass irqchip when adding gpiochip")
Cc: <stable@vger.kernel.org>
Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me>
Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

d9b7ea4c

gpio: ep93xx: fix BUG_ON port F usage · 10538b86

Nikita Shubin authored Feb 09, 2021

commit 8b81a7ab upstream.

Two index spaces and ep93xx_gpio_port are confusing.

Instead add a separate struct to store necessary data and remove
ep93xx_gpio_port.

- add struct to store IRQ related data for each IRQ capable chip
- replace offset array with defined offsets
- add IRQ registers offset for each IRQ capable chip into
  ep93xx_gpio_banks

------------[ cut here ]------------
kernel BUG at drivers/gpio/gpio-ep93xx.c:64!
---[ end trace 3f6544e133e9f5ae ]---

Fixes: fd935fc4

 ("gpio: ep93xx: Do not pingpong irq numbers")
Cc: <stable@vger.kernel.org>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

10538b86

gpio: mxs: GPIO_MXS should not default to y unconditionally · e072d454

Geert Uytterhoeven authored Feb 08, 2021

commit 97c6e28d upstream.

Merely enabling CONFIG_COMPILE_TEST should not enable additional code.
To fix this, restrict the automatic enabling of GPIO_MXS to ARCH_MXS,
and ask the user in case of compile-testing.

Fixes: 6876ca31

 ("gpio: mxs: add COMPILE_TEST support for GPIO_MXS")
Cc: <stable@vger.kernel.org>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e072d454

Revert "dts: phy: add GPIO number and active state used for phy reset" · 039e0f62

Palmer Dabbelt authored Feb 04, 2021

commit 3da3cc1b upstream.

VSC8541 phys need a special reset sequence, which the driver doesn't
currentlny support.  As a result enabling the reset via GPIO essentially
guarnteees that the device won't work correctly.  We've been relying on
bootloaders to reset the device for years, with this revert we'll go
back to doing so until we can sort out how to get the reset sequence
into the kernel.

This reverts commit a0fa9d72.

Fixes: a0fa9d72

 ("dts: phy: add GPIO number and active state used for phy reset")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

039e0f62

objtool: Fix seg fault with Clang non-section symbols · 2b02985b

Josh Poimboeuf authored Dec 14, 2020

commit 44f6a7c0 upstream.

The Clang assembler likes to strip section symbols, which means objtool
can't reference some text code by its section.  This confuses objtool
greatly, causing it to seg fault.

The fix is similar to what was done before, for ORC reloc generation:

  e81e0724

 ("objtool: Support Clang non-section symbols in ORC generation")

Factor out that code into a common helper and use it for static call
reloc generation as well.

Reported-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://github.com/ClangBuiltLinux/linux/issues/1207
Link: https://lkml.kernel.org/r/ba6b6c0f0dd5acbba66e403955a967d9fdd1726a.1607983452.git.jpoimboe@redhat.com


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2b02985b

Feb 13, 2021

Linux 5.10.16 · de53befa

Greg Kroah-Hartman authored Feb 13, 2021



Tested-by: Jason Self <jason@bluehome.net>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Pavel Machek (CIP) <pavel@denx.de>
Tested-by: Ross Schmidt <ross.schm.dev@gmail.com>
Link: https://lore.kernel.org/r/20210211150152.885701259@linuxfoundation.org


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

de53befa

squashfs: add more sanity checks in xattr id lookup · bddcce15

Phillip Lougher authored Feb 09, 2021

commit 506220d2 upstream.

Sysbot has reported a warning where a kmalloc() attempt exceeds the
maximum limit.  This has been identified as corruption of the xattr_ids
count when reading the xattr id lookup table.

This patch adds a number of additional sanity checks to detect this
corruption and others.

1. It checks for a corrupted xattr index read from the inode.  This could
   be because the metadata block is uncompressed, or because the
   "compression" bit has been corrupted (turning a compressed block
   into an uncompressed block).  This would cause an out of bounds read.

2. It checks against corruption of the xattr_ids count.  This can either
   lead to the above kmalloc failure, or a smaller than expected
   table to be read.

3. It checks the contents of the index table for corruption.

[phillip@squashfs.org.uk: fix checkpatch issue]
  Link: https://lkml.kernel.org/r/270245655.754655.1612770082682@webmail.123-reg.co.uk

Link: https://lkml.kernel.org/r/20210204130249.4495-5-phillip@squashfs.org.uk


Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by:  <syzbot+2ccea6339d368360800d@syzkaller.appspotmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bddcce15

squashfs: add more sanity checks in inode lookup · 5e22b39b

Phillip Lougher authored Feb 09, 2021

commit eabac19e upstream.

Sysbot has reported an "slab-out-of-bounds read" error which has been
identified as being caused by a corrupted "ino_num" value read from the
inode.  This could be because the metadata block is uncompressed, or
because the "compression" bit has been corrupted (turning a compressed
block into an uncompressed block).

This patch adds additional sanity checks to detect this, and the
following corruption.

1. It checks against corruption of the inodes count.  This can either
   lead to a larger table to be read, or a smaller than expected
   table to be read.

   In the case of a too large inodes count, this would often have been
   trapped by the existing sanity checks, but this patch introduces
   a more exact check, which can identify too small values.

2. It checks the contents of the index table for corruption.

[phillip@squashfs.org.uk: fix checkpatch issue]
  Link: https://lkml.kernel.org/r/527909353.754618.1612769948607@webmail.123-reg.co.uk

Link: https://lkml.kernel.org/r/20210204130249.4495-4-phillip@squashfs.org.uk


Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by:  <syzbot+04419e3ff19d2970ea28@syzkaller.appspotmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5e22b39b

squashfs: add more sanity checks in id lookup · 6634147f

Phillip Lougher authored Feb 09, 2021

commit f37aa4c7 upstream.

Sysbot has reported a number of "slab-out-of-bounds reads" and
"use-after-free read" errors which has been identified as being caused
by a corrupted index value read from the inode.  This could be because
the metadata block is uncompressed, or because the "compression" bit has
been corrupted (turning a compressed block into an uncompressed block).

This patch adds additional sanity checks to detect this, and the
following corruption.

1. It checks against corruption of the ids count.  This can either
   lead to a larger table to be read, or a smaller than expected
   table to be read.

   In the case of a too large ids count, this would often have been
   trapped by the existing sanity checks, but this patch introduces
   a more exact check, which can identify too small values.

2. It checks the contents of the index table for corruption.

Link: https://lkml.kernel.org/r/20210204130249.4495-3-phillip@squashfs.org.uk


Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by:  <syzbot+b06d57ba83f604522af2@syzkaller.appspotmail.com>
Reported-by:  <syzbot+c021ba012da41ee9807c@syzkaller.appspotmail.com>
Reported-by:  <syzbot+5024636e8b5fd19f0f19@syzkaller.appspotmail.com>
Reported-by:  <syzbot+bcbc661df46657d0fa4f@syzkaller.appspotmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6634147f

squashfs: avoid out of bounds writes in decompressors · ff3a75bd

Phillip Lougher authored Feb 09, 2021

commit e812cbbb upstream.

Patch series "Squashfs: fix BIO migration regression and add sanity checks".

Patch [1/4] fixes a regression introduced by the "migrate from
ll_rw_block usage to BIO" patch, which has produced a number of
Sysbot/Syzkaller reports.

Patches [2/4], [3/4], and [4/4] fix a number of filesystem corruption
issues which have produced Sysbot reports in the id, inode and xattr
lookup code.

Each patch has been tested against the Sysbot reproducers using the
given kernel configuration.  They have the appropriate "Reported-by:"
lines added.

Additionally, all of the reproducer filesystems are indirectly fixed by
patch [4/4] due to the fact they all have xattr corruption which is now
detected there.

Additional testing with other configurations and architectures (32bit,
big endian), and normal filesystems has also been done to trap any
inadvertent regressions caused by the additional sanity checks.

This patch (of 4):

This is a regression introduced by the patch "migrate from ll_rw_block
usage to BIO".

Sysbot/Syskaller has reported a number of "out of bounds writes" and
"unable to handle kernel paging request in squashfs_decompress" errors
which have been identified as a regression introduced by the above
patch.

Specifically, the patch removed the following sanity check

        if (length < 0 || length > output->length ||
		(index + length) > msblk->bytes_used)

This check did two things:

1. It ensured any reads were not beyond the end of the filesystem

2. It ensured that the "length" field read from the filesystem
   was within the expected maximum length.  Without this any
   corrupted values can over-run allocated buffers.

Link: https://lkml.kernel.org/r/20210204130249.4495-1-phillip@squashfs.org.uk
Link: https://lkml.kernel.org/r/20210204130249.4495-2-phillip@squashfs.org.uk
Fixes: 93e72b3c

 ("squashfs: migrate from ll_rw_block usage to BIO")
Reported-by:  <syzbot+6fba78f99b9afd4b5634@syzkaller.appspotmail.com>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Philippe Liard <pliard@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ff3a75bd

Revert "mm: memcontrol: avoid workload stalls when lowering memory.high" · dd0a41bc

Johannes Weiner authored Feb 09, 2021

commit e82553c1 upstream.

This reverts commit 536d3bf2, as it can
cause writers to memory.high to get stuck in the kernel forever,
performing page reclaim and consuming excessive amounts of CPU cycles.

Before the patch, a write to memory.high would first put the new limit
in place for the workload, and then reclaim the requested delta.  After
the patch, the kernel tries to reclaim the delta before putting the new
limit into place, in order to not overwhelm the workload with a sudden,
large excess over the limit.  However, if reclaim is actively racing
with new allocations from the uncurbed workload, it can keep the write()
working inside the kernel indefinitely.

This is causing problems in Facebook production.  A privileged
system-level daemon that adjusts memory.high for various workloads
running on a host can get unexpectedly stuck in the kernel and
essentially turn into a sort of involuntary kswapd for one of the
workloads.  We've observed that daemon busy-spin in a write() for
minutes at a time, neglecting its other duties on the system, and
expending privileged system resources on behalf of a workload.

To remedy this, we have first considered changing the reclaim logic to
break out after a couple of loops - whether the workload has converged
to the new limit or not - and bound the write() call this way.  However,
the root cause that inspired the sequence change in the first place has
been fixed through other means, and so a revert back to the proven
limit-setting sequence, also used by memory.max, is preferable.

The sequence was changed to avoid extreme latencies in the workload when
the limit was lowered: the sudden, large excess created by the limit
lowering would erroneously trigger the penalty sleeping code that is
meant to throttle excessive growth from below.  Allocating threads could
end up sleeping long after the write() had already reclaimed the delta
for which they were being punished.

However, erroneous throttling also caused problems in other scenarios at
around the same time.  This resulted in commit b3ff9291 ("mm, memcg:
reclaim more aggressively before high allocator throttling"), included
in the same release as the offending commit.  When allocating threads
now encounter large excess caused by a racing write() to memory.high,
instead of entering punitive sleeps, they will simply be tasked with
helping reclaim down the excess, and will be held no longer than it
takes to accomplish that.  This is in line with regular limit
enforcement - i.e.  if the workload allocates up against or over an
otherwise unchanged limit from below.

With the patch breaking userspace, and the root cause addressed by other
means already, revert it again.

Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org
Fixes: 536d3bf2

 ("mm: memcontrol: avoid workload stalls when lowering memory.high")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tejun Heo <tj@kernel.org>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: <stable@vger.kernel.org>	[5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dd0a41bc

nilfs2: make splice write available again · 237ee288

Joachim Henke authored Feb 09, 2021

commit a35d8f01 upstream.

Since 5.10, splice() or sendfile() to NILFS2 return EINVAL.  This was
caused by commit 36e2c742 ("fs: don't allow splice read/write
without explicit ops").

This patch initializes the splice_write field in file_operations, like
most file systems do, to restore the functionality.

Link: https://lkml.kernel.org/r/1612784101-14353-1-git-send-email-konishi.ryusuke@gmail.com


Signed-off-by: Joachim Henke <joachim.henke@t-systems.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>	[5.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

237ee288

drm/i915: Skip vswing programming for TBT · 4e78c338

Ville Syrjälä authored Jan 28, 2021

commit eaf5bfe3

 upstream.

In thunderbolt mode the PHY is owned by the thunderbolt controller.
We are not supposed to touch it. So skip the vswing programming
as well (we already skipped the other steps not applicable to TBT).

Touching this stuff could supposedly interfere with the PHY
programming done by the thunderbolt controller.

Cc: stable@vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210128155948.13678-1-ville.syrjala@linux.intel.com


Reviewed-by: Imre Deak <imre.deak@intel.com>
(cherry picked from commit f8c6b615

)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

4e78c338

drm/i915: Fix ICL MG PHY vswing handling · 43f39b85

Ville Syrjälä authored Dec 07, 2020

commit a2a5f562 upstream.

The MH PHY vswing table does have all the entries these days. Get
rid of the old hacks in the code which claim otherwise.

This hack was totally bogus anyway. The correct way to handle the
lack of those two entries would have been to declare our max
vswing and pre-emph to both be level 2.

Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Clinton Taylor <clinton.a.taylor@intel.com>
Fixes: 9f7ffa29

 ("drm/i915/tc/icl: Update TC vswing tables")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201207203512.1718-1-ville.syrjala@linux.intel.com


Reviewed-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit 5ec34647

)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

43f39b85

bpf: Fix verifier jsgt branch analysis on max bound · 67afdc7d

Daniel Borkmann authored Feb 05, 2021

commit ee114dd6 upstream.

Fix incorrect is_branch{32,64}_taken() analysis for the jsgt case. The return
code for both will tell the caller whether a given conditional jump is taken
or not, e.g. 1 means branch will be taken [for the involved registers] and the
goto target will be executed, 0 means branch will not be taken and instead we
fall-through to the next insn, and last but not least a -1 denotes that it is
not known at verification time whether a branch will be taken or not. Now while
the jsgt has the branch-taken case correct with reg->s32_min_value > sval, the
branch-not-taken case is off-by-one when testing for reg->s32_max_value < sval
since the branch will also be taken for reg->s32_max_value == sval. The jgt
branch analysis, for example, gets this right.

Fixes: 3f50f132 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Fixes: 4f7b3e82

 ("bpf: improve verifier branch analysis")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

67afdc7d

bpf: Fix 32 bit src register truncation on div/mod · 1d16cc21

Daniel Borkmann authored Feb 09, 2021

commit e88b2c6e upstream.

While reviewing a different fix, John and I noticed an oddity in one of the
BPF program dumps that stood out, for example:

  # bpftool p d x i 13
   0: (b7) r0 = 808464450
   1: (b4) w4 = 808464432
   2: (bc) w0 = w0
   3: (15) if r0 == 0x0 goto pc+1
   4: (9c) w4 %= w0
  [...]

In line 2 we noticed that the mov32 would 32 bit truncate the original src
register for the div/mod operation. While for the two operations the dst
register is typically marked unknown e.g. from adjust_scalar_min_max_vals()
the src register is not, and thus verifier keeps tracking original bounds,
simplified:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (b7) r0 = -1
  1: R0_w=invP-1 R1=ctx(id=0,off=0,imm=0) R10=fp0
  1: (b7) r1 = -1
  2: R0_w=invP-1 R1_w=invP-1 R10=fp0
  2: (3c) w0 /= w1
  3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP-1 R10=fp0
  3: (77) r1 >>= 32
  4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP4294967295 R10=fp0
  4: (bf) r0 = r1
  5: R0_w=invP4294967295 R1_w=invP4294967295 R10=fp0
  5: (95) exit
  processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

Runtime result of r0 at exit is 0 instead of expected -1. Remove the
verifier mov32 src rewrite in div/mod and replace it with a jmp32 test
instead. After the fix, we result in the following code generation when
having dividend r1 and divisor r6:

  div, 64 bit:                             div, 32 bit:

   0: (b7) r6 = 8                           0: (b7) r6 = 8
   1: (b7) r1 = 8                           1: (b7) r1 = 8
   2: (55) if r6 != 0x0 goto pc+2           2: (56) if w6 != 0x0 goto pc+2
   3: (ac) w1 ^= w1                         3: (ac) w1 ^= w1
   4: (05) goto pc+1                        4: (05) goto pc+1
   5: (3f) r1 /= r6                         5: (3c) w1 /= w6
   6: (b7) r0 = 0                           6: (b7) r0 = 0
   7: (95) exit                             7: (95) exit

  mod, 64 bit:                             mod, 32 bit:

   0: (b7) r6 = 8                           0: (b7) r6 = 8
   1: (b7) r1 = 8                           1: (b7) r1 = 8
   2: (15) if r6 == 0x0 goto pc+1           2: (16) if w6 == 0x0 goto pc+1
   3: (9f) r1 %= r6                         3: (9c) w1 %= w6
   4: (b7) r0 = 0                           4: (b7) r0 = 0
   5: (95) exit                             5: (95) exit

x86 in particular can throw a 'divide error' exception for div
instruction not only for divisor being zero, but also for the case
when the quotient is too large for the designated register. For the
edx:eax and rdx:rax dividend pair it is not an issue in x86 BPF JIT
since we always zero edx (rdx). Hence really the only protection
needed is against divisor being zero.

Fixes: 68fda450

 ("bpf: fix 32-bit divide by zero")
Co-developed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

1d16cc21

bpf: Fix verifier jmp32 pruning decision logic · 569033c0

Daniel Borkmann authored Feb 05, 2021

commit fd675184 upstream.

Anatoly has been fuzzing with kBdysch harness and reported a hang in
one of the outcomes:

func#0 @0
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (b7) r0 = 808464450
1: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R10=fp0
1: (b4) w4 = 808464432
2: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP808464432 R10=fp0
2: (9c) w4 %= w0
3: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
3: (66) if w4 s> 0x30303030 goto pc+0
R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
4: (7f) r0 >>= r0
5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
5: (9c) w4 %= w0
6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
6: (66) if w0 s> 0x3030 goto pc+0
R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
7: (d6) if w0 s<= 0x303030 goto pc+1
9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
9: (95) exit
propagating r0

from 6 to 7: safe
4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0
4: (7f) r0 >>= r0
5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0
5: (9c) w4 %= w0
6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
6: (66) if w0 s> 0x3030 goto pc+0
R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
propagating r0
7: safe
propagating r0

from 6 to 7: safe
processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1

The underlying program was xlated as follows:

# bpftool p d x i 10
0: (b7) r0 = 808464450
1: (b4) w4 = 808464432
2: (bc) w0 = w0
3: (15) if r0 == 0x0 goto pc+1
4: (9c) w4 %= w0
5: (66) if w4 s> 0x30303030 goto pc+0
6: (7f) r0 >>= r0
7: (bc) w0 = w0
8: (15) if r0 == 0x0 goto pc+1
9: (9c) w4 %= w0
10: (66) if w0 s> 0x3030 goto pc+0
11: (d6) if w0 s<= 0x303030 goto pc+1
12: (05) goto pc-1
13: (95) exit

The verifier rewrote original instructions it recognized as dead code with
'goto pc-1', but reality differs from verifier simulation in that we are
actually able to trigger a hang due to hitting the 'goto pc-1' instructions.

Taking a closer look at the verifier analysis, the reason is that it misjudges
its pruning decision at the first 'from 6 to 7: safe' occasion. What happens
is that while both old/cur registers are marked as precise, they get misjudged
for the jmp32 case as range_within() yields true, meaning that the prior
verification path with a wider register bound could be verified successfully
and therefore the current path with a narrower register bound is deemed safe
as well whereas in reality it's not. R0 old/cur path's bounds compare as
follows:

old: smin_value=0x8000000000000000,smax_value=0x7fffffffffffffff,umin_value=0x0,umax_value=0xffffffffffffffff,var_off=(0x0; 0xffffffffffffffff)
cur: smin_value=0x8000000000000000,smax_value=0x7fffffff7fffffff,umin_value=0x0,umax_value=0xffffffff7fffffff,var_off=(0x0; 0xffffffff7fffffff)

old: s32_min_value=0x80000000,s32_max_value=0x00003030,u32_min_value=0x00000000,u32_max_value=0xffffffff
cur: s32_min_value=0x00003031,s32_max_value=0x7fffffff,u32_min_value=0x00003031,u32_max_value=0x7fffffff

The 64 bit bounds generally look okay and while the information that got
propagated from 32 to 64 bit looks correct as well, it's not precise enough
for judging a conditional jmp32. Given the latter only operates on subregisters
we also need to take these into account as well for a range_within() probe
in order to be able to prune paths. Extending the range_within() constraint
to both bounds will be able to tell us that the old signed 32 bit bounds are
not wider than the cur signed 32 bit bounds.

With the fix in place, the program will now verify the 'goto' branch case as
it should have been:

[...]
6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
6: (66) if w0 s> 0x3030 goto pc+0
R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
7: (d6) if w0 s<= 0x303030 goto pc+1
9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
9: (95) exit

7: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=12337,u32_min_value=12337,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
7: (d6) if w0 s<= 0x303030 goto pc+1
R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
8: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
8: (30) r0 = *(u8 *)skb[808464432]
BPF_LD_[ABS|IND] uses reserved fields
processed 11 insns (limit 1000000) max_states_per_insn 1 total_states 1 peak_states 1 mark_read 1

The bug is quite subtle in the sense that when verifier would determine that
a given branch is dead code, it would (here: wrongly) remove these instructions
from the program and hard-wire the taken branch for privileged programs instead
of the 'goto pc-1' rewrites which will cause hard to debug problems.

Fixes: 3f50f132

("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

569033c0