- Jun 18, 2020
-
-
notro authored
Add the bare minimum needed to boot BCM2708 from a Device Tree. Signed-off-by: Noralf Tronnes <notro@tronnes.org> BCM2708: DT: change 'axi' nodename to 'soc' Change DT node named 'axi' to 'soc' so it matches ARCH_BCM2835. The VC4 bootloader fills in certain properties in the 'axi' subtree, but since this is part of an upstreaming effort, the name is changed. Signed-off-by: Noralf Tronnes <notro@tronnes.org> BCM2708_DT: Correct length of the peripheral space Use dts-dirs feature for overlays. The kernel makefiles have a dts-dirs target that is for vendor subdirectories. Using this fixes the install_dtbs target, which previously did not install the overlays. BCM270X_DT: configure I2S DMA channels Signed-off-by: Matthias Reichl <hias@horus.com> BCM270X_DT: switch to bcm2835-i2s I2S soundcard drivers with proper devicetree support (i.e. not linking to the cpu_dai/platform via name but to cpu/platform via of_node) will work out of the box without any modifications. When the kernel is compiled without devicetree support the platform code will instantiate the bcm2708-i2s driver and I2S soundcard drivers will link to it via name, as before. Signed-off-by: Matthias Reichl <hias@horus.com> SDIO-overlay: add poll_once-boolean parameter Add paramter to toggle sdio-device-polling done every second or once at boot-time. Signed-off-by: Patrick Boettcher <patrick.boettcher@posteo.de> BCM270X_DT: Make mmc overlay compatible with current firmware The original DT overlay logic followed a merge-then-patch procedure, i.e. parameters are applied to the loaded overlay before the overlay is merged into the base DTB. This sequence has been changed to patch-then-merge, in order to support parameterised node names, and to protect against bad overlays. As a result, overrides (parameters) must only target labels in the overlay, but the overlay can obviously target nodes in the base DTB. mmc-overlay.dts (that switches back to the original mmc sdcard driver) is the only overlay violating that rule, and this patch fixes it. bcm270x_dt: Use the sdhost MMC controller by default The "mmc" overlay reverts to using the other controller. squash: Add cprman to dt BCM270X_DT: Use clk_core for I2C interfaces BCM270X_DT: Use bcm283x.dtsi, bcm2835.dtsi and bcm2836.dtsi The mainline Device Tree files are quite close to downstream now. Let's use bcm283x.dtsi, bcm2835.dtsi and bcm2836.dtsi as base files for our dts files. Mainline dts files are based on these files: bcm2835-rpi.dtsi bcm2835.dtsi bcm2836.dtsi bcm283x.dtsi Current downstream are based on these: bcm2708.dtsi bcm2709.dtsi bcm2710.dtsi bcm2708_common.dtsi This patch introduces this dependency: bcm2708.dtsi bcm2709.dtsi bcm2708-rpi.dtsi bcm270x.dtsi bcm2835.dtsi bcm2836.dtsi bcm283x.dtsi And: bcm2710.dtsi bcm2708-rpi.dtsi bcm270x.dtsi bcm283x.dtsi bcm270x.dtsi contains the downstream bcm283x.dtsi diff. bcm2708-rpi.dtsi is the downstream version of bcm2835-rpi.dtsi. Other changes: - The led node has moved from /soc/leds to /leds. This is not a problem since the label is used to reference it. - The clk_osc reg property changes from 6 to 3. - The gpu nodes has their interrupt property set in the base file. - the clocks label does not point to the /clocks node anymore, but points to the cprman node. This is not a problem since the overlays that use the clock node refer to it directly: target-path = "/clocks"; - some nodes now have 2 labels since mainline and downstream differs in this respect: cprman/clocks, spi0/spi, gpu/vc4. - some nodes doesn't have an explicit status = "okay" since they're not disabled in the base file: watchdog and random. - gpiomem doesn't need an explicit status = "okay". - bcm2708-rpi-cm.dts got the hpd-gpios property from bcm2708_common.dtsi, it's now set directly in that file. - bcm2709-rpi-2-b.dts has the timer node moved from /soc/timer to /timer. - Removed clock-frequency property on the bcm{2709,2710}.dtsi timer nodes. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> BCM270X_DT: Use raspberrypi-power to turn on USB power Use the raspberrypi-power driver to turn on USB power. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> BCM270X_DT: Add a .dtbo target, use for overlays Change the filenames and extensions to keep the pre-DDT style of overlay (<name>-overlay.dtb) distinct from new ones that use a different style of local fixups (<name>.dtbo), and to match other platforms. The RPi firmware uses the DDTK trailer atom to choose which type of overlay to use for each kernel. Signed-off-by: Phil Elwell <phil@raspberrypi.org> BCM270X_DT: Don't generate "linux,phandle" props The EPAPR standard says to use "phandle" properties to store phandles, rather than the deprecated "linux,phandle" version. By default, dtc generates both, but adding "-H epapr" causes it to only generate "phandle"s, saving some space and clutter. Signed-off-by: Phil Elwell <phil@raspberrypi.org> BCM270X_DT: Add overlay for enc28j60 on SPI2 Works on SPI2 for compute module BCM270X_DT: Add midi-uart0 overlay MIDI requires 31.25kbaud, a baudrate unsupported by Linux. The midi-uart0 overlay configures uart0 (ttyAMA0) to use a fake clock so that requesting 38.4kbaud actually gets 31.25kbaud. Signed-off-by: Phil Elwell <phil@raspberrypi.org> BCM270X_DT: Add i2c-sensor overlay The i2c-sensor overlay is a container for various pressure and temperature sensors, currently bmp085 and bmp280. The standalone bmp085_i2c-sensor overlay is now deprecated. Signed-off-by: Phil Elwell <phil@raspberrypi.org> BCM270X_DT: overlays/*-overlay.dtb -> overlays/*.dtbo (#1752) We now create overlays as .dtbo files. build: support for .dtbo files for dtb overlays Kernel 4.4.6+ on RaspberryPi support .dtbo files for overlays, instead of .dtb. Patch the kernel, which has faulty rules to generate .dtbo the way yocto does Signed-off-by: Herve Jourdain <herve.jourdain@neuf.fr> Signed-off-by: Khem Raj <raj.khem@gmail.com> BCM270X: Drop position requirement for CMA in VC4 overlay. No longer necessary since 2aefcd57 , and will probably let peeople that want to choose a larger CMA allocation (particularly on pi0/1). Signed-off-by: Eric Anholt <eric@anholt.net> BCM270X_DT: RPi Device Tree tidy Use the upstream sdhost node, add thermal-zones, and factor out some common elements. Signed-off-by: Phil Elwell <phil@raspberrypi.org> kbuild: Silence unhelpful DTC warnings Signed-off-by: Phil Elwell <phil@raspberrypi.org> BCM270X_DT: DT build rules no longer arch-specific Signed-off-by: Phil Elwell <phil@raspberrypi.org>
-
popcornmix authored
Signed-off-by: popcornmix <popcornmix@gmail.com> usb: dwc: fix lockdep false positive Signed-off-by: Kari Suvanto <karis79@gmail.com> usb: dwc: fix inconsistent lock state Signed-off-by: Kari Suvanto <karis79@gmail.com> Add FIQ patch to dwc_otg driver. Enable with dwc_otg.fiq_fix_enable=1. Should give about 10% more ARM performance. Thanks to Gordon and Costas Avoid dynamic memory allocation for channel lock in USB driver. Thanks ddv2005. Add NAK holdoff scheme. Enabled by default, disable with dwc_otg.nak_holdoff_enable=0. Thanks gsh Make sure we wait for the reset to finish dwc_otg: fix bug in dwc_otg_hcd.c resulting in silent kernel memory corruption, escalating to OOPS under high USB load. dwc_otg: Fix unsafe access of QTD during URB enqueue In dwc_otg_hcd_urb_enqueue during qtd creation, it was possible that the transaction could complete almost immediately after the qtd was assigned to a host channel during URB enqueue, which meant the qtd pointer was no longer valid having been completed and removed. Usually, this resulted in an OOPS during URB submission. By predetermining whether transactions need to be queued or not, this unsafe pointer access is avoided. This bug was only evident on the Pi model A where a device was attached that had no periodic endpoints (e.g. USB pendrive or some wlan devices). dwc_otg: Fix incorrect URB allocation error handling If the memory allocation for a dwc_otg_urb failed, the kernel would OOPS because for some reason a member of the *unallocated* struct was set to zero. Error handling changed to fail correctly. dwc_otg: fix potential use-after-free case in interrupt handler If a transaction had previously aborted, certain interrupts are enabled to track error counts and reset where necessary. On IN endpoints the host generates an ACK interrupt near-simultaneously with completion of transfer. In the case where this transfer had previously had an error, this results in a use-after-free on the QTD memory space with a 1-byte length being overwritten to 0x00. dwc_otg: add handling of SPLIT transaction data toggle errors Previously a data toggle error on packets from a USB1.1 device behind a TT would result in the Pi locking up as the driver never handled the associated interrupt. Patch adds basic retry mechanism and interrupt acknowledgement to cater for either a chance toggle error or for devices that have a broken initial toggle state (FT8U232/FT232BM). dwc_otg: implement tasklet for returning URBs to usbcore hcd layer The dwc_otg driver interrupt handler for transfer completion will spend a very long time with interrupts disabled when a URB is completed - this is because usb_hcd_giveback_urb is called from within the handler which for a USB device driver with complicated processing (e.g. webcam) will take an exorbitant amount of time to complete. This results in missed completion interrupts for other USB packets which lead to them being dropped due to microframe overruns. This patch splits returning the URB to the usb hcd layer into a high-priority tasklet. This will have most benefit for isochronous IN transfers but will also have incidental benefit where multiple periodic devices are active at once. dwc_otg: fix NAK holdoff and allow on split transactions only This corrects a bug where if a single active non-periodic endpoint had at least one transaction in its qh, on frnum == MAX_FRNUM the qh would get skipped and never get queued again. This would result in a silent device until error detection (automatic or otherwise) would either reset the device or flush and requeue the URBs. Additionally the NAK holdoff was enabled for all transactions - this would potentially stall a HS endpoint for 1ms if a previous error state enabled this interrupt and the next response was a NAK. Fix so that only split transactions get held off. dwc_otg: Call usb_hcd_unlink_urb_from_ep with lock held in completion handler usb_hcd_unlink_urb_from_ep must be called with the HCD lock held. Calling it asynchronously in the tasklet was not safe (regression in c4564d4a). This change unlinks it from the endpoint prior to queueing it for handling in the tasklet, and also adds a check to ensure the urb is OK to be unlinked before doing so. NULL pointer dereference kernel oopses had been observed in usb_hcd_giveback_urb when a USB device was unplugged/replugged during data transfer. This effect was reproduced using automated USB port power control, hundreds of replug events were performed during active transfers to confirm that the problem was eliminated. USB fix using a FIQ to implement split transactions This commit adds a FIQ implementaion that schedules the split transactions using a FIQ so we don't get held off by the interrupt latency of Linux dwc_otg: fix device attributes and avoid kernel warnings on boot dcw_otg: avoid logging function that can cause panics See: https://github.com/raspberrypi/firmware/issues/21 Thanks to cleverca22 for fix dwc_otg: mask correct interrupts after transaction error recovery The dwc_otg driver will unmask certain interrupts on a transaction that previously halted in the error state in order to reset the QTD error count. The various fine-grained interrupt handlers do not consider that other interrupts besides themselves were unmasked. By disabling the two other interrupts only ever enabled in DMA mode for this purpose, we can avoid unnecessary function calls in the IRQ handler. This will also prevent an unneccesary FIQ interrupt from being generated if the FIQ is enabled. dwc_otg: fiq: prevent FIQ thrash and incorrect state passing to IRQ In the case of a transaction to a device that had previously aborted due to an error, several interrupts are enabled to reset the error count when a device responds. This has the side-effect of making the FIQ thrash because the hardware will generate multiple instances of a NAK on an IN bulk/interrupt endpoint and multiple instances of ACK on an OUT bulk/interrupt endpoint. Make the FIQ mask and clear the associated interrupts. Additionally, on non-split transactions make sure that only unmasked interrupts are cleared. This caused a hard-to-trigger but serious race condition when you had the combination of an endpoint awaiting error recovery and a transaction completed on an endpoint - due to the sequencing and timing of interrupts generated by the dwc_otg core, it was possible to confuse the IRQ handler. Fix function tracing dwc_otg: whitespace cleanup in dwc_otg_urb_enqueue dwc_otg: prevent OOPSes during device disconnects The dwc_otg_urb_enqueue function is thread-unsafe. In particular the access of urb->hcpriv, usb_hcd_link_urb_to_ep, dwc_otg_urb->qtd and friends does not occur within a critical section and so if a device was unplugged during activity there was a high chance that the usbcore hub_thread would try to disable the endpoint with partially- formed entries in the URB queue. This would result in BUG() or null pointer dereferences. Fix so that access of urb->hcpriv, enqueuing to the hardware and adding to usbcore endpoint URB lists is contained within a single critical section. dwc_otg: prevent BUG() in TT allocation if hub address is > 16 A fixed-size array is used to track TT allocation. This was previously set to 16 which caused a crash because dwc_otg_hcd_allocate_port would read past the end of the array. This was hit if a hub was plugged in which enumerated as addr > 16, due to previous device resets or unplugs. Also add #ifdef FIQ_DEBUG around hcd->hub_port_alloc[], which grows to a large size if 128 hub addresses are supported. This field is for debug only for tracking which frame an allocate happened in. dwc_otg: make channel halts with unknown state less damaging If the IRQ received a channel halt interrupt through the FIQ with no other bits set, the IRQ would not release the host channel and never complete the URB. Add catchall handling to treat as a transaction error and retry. dwc_otg: fiq_split: use TTs with more granularity This fixes certain issues with split transaction scheduling. - Isochronous multi-packet OUT transactions now hog the TT until they are completed - this prevents hubs aborting transactions if they get a periodic start-split out-of-order - Don't perform TT allocation on non-periodic endpoints - this allows simultaneous use of the TT's bulk/control and periodic transaction buffers This commit will mainly affect USB audio playback. dwc_otg: fix potential sleep while atomic during urb enqueue Fixes a regression introduced with eb1b482a. Kmalloc called from dwc_otg_hcd_qtd_add / dwc_otg_hcd_qtd_create did not always have the GPF_ATOMIC flag set. Force this flag when inside the larger critical section. dwc_otg: make fiq_split_enable imply fiq_fix_enable Failing to set up the FIQ correctly would result in "IRQ 32: nobody cared" errors in dmesg. dwc_otg: prevent crashes on host port disconnects Fix several issues resulting in crashes or inconsistent state if a Model A root port was disconnected. - Clean up queue heads properly in kill_urbs_in_qh_list by removing the empty QHs from the schedule lists - Set the halt status properly to prevent IRQ handlers from using freed memory - Add fiq_split related cleanup for saved registers - Make microframe scheduling reclaim host channels if active during a disconnect - Abort URBs with -ESHUTDOWN status response, informing device drivers so they respond in a more correct fashion and don't try to resubmit URBs - Prevent IRQ handlers from attempting to handle channel interrupts if the associated URB was dequeued (and the driver state was cleared) dwc_otg: prevent leaking URBs during enqueue A dwc_otg_urb would get leaked if the HCD enqueue function failed for any reason. Free the URB at the appropriate points. dwc_otg: Enable NAK holdoff for control split transactions Certain low-speed devices take a very long time to complete a data or status stage of a control transaction, producing NAK responses until they complete internal processing - the USB2.0 spec limit is up to 500mS. This causes the same type of interrupt storm as seen with USB-serial dongles prior to c8edb238. In certain circumstances, usually while booting, this interrupt storm could cause SD card timeouts. dwc_otg: Fix for occasional lockup on boot when doing a USB reset dwc_otg: Don't issue traffic to LS devices in FS mode Issuing low-speed packets when the root port is in full-speed mode causes the root port to stop responding. Explicitly fail when enqueuing URBs to a LS endpoint on a FS bus. Fix ARM architecture issue with local_irq_restore() If local_fiq_enable() is called before a local_irq_restore(flags) where the flags variable has the F bit set, the FIQ will be erroneously disabled. Fixup arch_local_irq_restore to avoid trampling the F bit in CPSR. Also fix some of the hacks previously implemented for previous dwc_otg incarnations. dwc_otg: fiq_fsm: Base commit for driver rewrite This commit removes the previous FIQ fixes entirely and adds fiq_fsm. This rewrite features much more complete support for split transactions and takes into account several OTG hardware bugs. High-speed isochronous transactions are also capable of being performed by fiq_fsm. All driver options have been removed and replaced with: - dwc_otg.fiq_enable (bool) - dwc_otg.fiq_fsm_enable (bool) - dwc_otg.fiq_fsm_mask (bitmask) - dwc_otg.nak_holdoff (unsigned int) Defaults are specified such that fiq_fsm behaves similarly to the previously implemented FIQ fixes. fiq_fsm: Push error recovery into the FIQ when fiq_fsm is used If the transfer associated with a QTD failed due to a bus error, the HCD would retry the transfer up to 3 times (implementing the USB2.0 three-strikes retry in software). Due to the masking mechanism used by fiq_fsm, it is only possible to pass a single interrupt through to the HCD per-transfer. In this instance host channels would fall off the radar because the error reset would function, but the subsequent channel halt would be lost. Push the error count reset into the FIQ handler. fiq_fsm: Implement timeout mechanism For full-speed endpoints with a large packet size, interrupt latency runs the risk of the FIQ starting a transaction too late in a full-speed frame. If the device is still transmitting data when EOF2 for the downstream frame occurs, the hub will disable the port. This change is not reflected in the hub status endpoint and the device becomes unresponsive. Prevent high-bandwidth transactions from being started too late in a frame. The mechanism is not guaranteed: a combination of bit stuffing and hub latency may still result in a device overrunning. fiq_fsm: fix bounce buffer utilisation for Isochronous OUT Multi-packet isochronous OUT transactions were subject to a few bounday bugs. Fix them. Audio playback is now much more robust: however, an issue stands with devices that have adaptive sinks - ALSA plays samples too fast. dwc_otg: Return full-speed frame numbers in HS mode The frame counter increments on every *microframe* in high-speed mode. Most device drivers expect this number to be in full-speed frames - this caused considerable confusion to e.g. snd_usb_audio which uses the frame counter to estimate the number of samples played. fiq_fsm: save PID on completion of interrupt OUT transfers Also add edge case handling for interrupt transports. Note that for periodic split IN, data toggles are unimplemented in the OTG host hardware - it unconditionally accepts any PID. fiq_fsm: add missing case for fiq_fsm_tt_in_use() Certain combinations of bitrate and endpoint activity could result in a periodic transaction erroneously getting started while the previous Isochronous OUT was still active. fiq_fsm: clear hcintmsk for aborted transactions Prevents the FIQ from erroneously handling interrupts on a timed out channel. fiq_fsm: enable by default fiq_fsm: fix dequeues for non-periodic split transactions If a dequeue happened between the SSPLIT and CSPLIT phases of the transaction, the HCD would never receive an interrupt. fiq_fsm: Disable by default fiq_fsm: Handle HC babble errors The HCTSIZ transfer size field raises a babble interrupt if the counter wraps. Handle the resulting interrupt in this case. dwc_otg: fix interrupt registration for fiq_enable=0 Additionally make the module parameter conditional for wherever hcd->fiq_state is touched. fiq_fsm: Enable by default dwc_otg: Fix various issues with root port and transaction errors Process the host port interrupts correctly (and don't trample them). Root port hotplug now functional again. Fix a few thinkos with the transaction error passthrough for fiq_fsm. fiq_fsm: Implement hack for Split Interrupt transactions Hubs aren't too picky about which endpoint we send Control type split transactions to. By treating Interrupt transfers as Control, it is possible to use the non-periodic queue in the OTG core as well as the non-periodic FIFOs in the hub itself. This massively reduces the microframe exclusivity/contention that periodic split transactions otherwise have to enforce. It goes without saying that this is a fairly egregious USB specification violation, but it works. Original idea by Hans Petter Selasky @ FreeBSD.org. dwc_otg: FIQ support on SMP. Set up FIQ stack and handler on Core 0 only. dwc_otg: introduce fiq_fsm_spin(un|)lock() SMP safety for the FIQ relies on register read-modify write cycles being completed in the correct order. Several places in the DWC code modify registers also touched by the FIQ. Protect these by a bare-bones lock mechanism. This also makes it possible to run the FIQ and IRQ handlers on different cores. fiq_fsm: fix build on bcm2708 and bcm2709 platforms dwc_otg: put some barriers back where they should be for UP bcm2709/dwc_otg: Setup FIQ on core 1 if >1 core active dwc_otg: fixup read-modify-write in critical paths Be more careful about read-modify-write on registers that the FIQ also touches. Guard fiq_fsm_spin_lock with fiq_enable check fiq_fsm: Falling out of the state machine isn't fatal This edge case can be hit if the port is disabled while the FIQ is in the middle of a transaction. Make the effects less severe. Also get rid of the useless return value. squash: dwc_otg: Allow to build without SMP usb: core: make overcurrent messages more prominent Hub overcurrent messages are more serious than "debug". Increase loglevel. usb: dwc_otg: Don't use dma_to_virt() Commit 6ce0d200 changes dma_to_virt() which breaks this driver. Open code the old dma_to_virt() implementation to work around this. Limit the use of __bus_to_virt() to cases where transfer_buffer_length is set and transfer_buffer is not set. This is done to increase the chance that this driver will also work on ARCH_BCM2835. transfer_buffer should not be NULL if the length is set, but the comment in the code indicates that there are situations where this might happen. drivers/usb/isp1760/isp1760-hcd.c also has a similar comment pointing to a possible: 'usb storage / SCSI bug'. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> dwc_otg: Fix crash when fiq_enable=0 dwc_otg: fiq_fsm: Make high-speed isochronous strided transfers work properly Certain low-bandwidth high-speed USB devices (specialist audio devices, compressed-frame webcams) have packet intervals > 1 microframe. Stride these transfers in the FIQ by using the start-of-frame interrupt to restart the channel at the right time. dwc_otg: Force host mode to fix incorrect compute module boards dwc_otg: Add ARCH_BCM2835 support Signed-off-by: Noralf Trønnes <noralf@tronnes.org> dwc_otg: Simplify FIQ irq number code Dropping ATAGS means we can simplify the FIQ irq number code. Also add error checking on the returned irq number. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> dwc_otg: Remove duplicate gadget probe/unregister function dwc_otg: Properly set the HFIR Douglas Anderson reported: According to the most up to date version of the dwc2 databook, the FRINT field of the HFIR register should be programmed to: * 125 us * (PHY clock freq for HS) - 1 * 1000 us * (PHY clock freq for FS/LS) - 1 This is opposed to older versions of the doc that claimed it should be: * 125 us * (PHY clock freq for HS) * 1000 us * (PHY clock freq for FS/LS) and reported lower timing jitter on a USB analyser dcw_otg: trim xfer length when buffer larger than allocated size is received dwc_otg: Don't free qh align buffers in atomic context dwc_otg: Enable the hack for Split Interrupt transactions by default dwc_otg.fiq_fsm_mask=0xF has long been a suggestion for users with audio stutters or other USB bandwidth issues. So far we are aware of many success stories but no failure caused by this setting. Make it a default to learn more. See: https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=70437 Signed-off-by: popcornmix <popcornmix@gmail.com> dwc_otg: Use kzalloc when suitable dwc_otg: Pass struct device to dma_alloc*() This makes it possible to get the bus address from Device Tree. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> dwc_otg: fix summarize urb->actual_length for isochronous transfers Kernel does not copy input data of ISO transfers to userspace if actual_length is set only in ISO transfers and not summarized in urb->actual_length. Fixes raspberrypi/linux#903 fiq_fsm: Use correct states when starting isoc OUT transfers In fiq_fsm_start_next_periodic() if an isochronous OUT transfer was selected, no regard was given as to whether this was a single-packet transfer or a multi-packet staged transfer. For single-packet transfers, this had the effect of repeatedly sending OUT packets with bogus data and lengths. Eventually if the channel was repeatedly enabled enough times, this would lock up the OTG core and no further bus transfers would happen. Set the FSM state up properly if we select a single-packet transfer. Fixes https://github.com/raspberrypi/linux/issues/1842 dwc_otg: make nak_holdoff work as intended with empty queues If URBs reading from non-periodic split endpoints were dequeued and the last transfer from the endpoint was a NAK handshake, the resulting qh->nak_frame value was stale which would result in unnecessarily long polling intervals for the first subsequent transfer with a fresh URB. Fixup qh->nak_frame in dwc_otg_hcd_urb_dequeue and also guard against a case where a single URB is submitted to the endpoint, a NAK was received on the transfer immediately prior to receiving data and the device subsequently resubmits another URB past the qh->nak_frame interval. Fixes https://github.com/raspberrypi/linux/issues/1709 dwc_otg: fix split transaction data toggle handling around dequeues See https://github.com/raspberrypi/linux/issues/1709 Fix several issues regarding endpoint state when URBs are dequeued - If the HCD is disconnected, flush FIQ-enabled channels properly - Save the data toggle state for bulk endpoints if the last transfer from an endpoint where URBs were dequeued returned a data packet - Reset hc->start_pkt_count properly in assign_and_init_hc() dwc_otg: fix several potential crash sources On root port disconnect events, the host driver state is cleared and in-progress host channels are forcibly stopped. This doesn't play well with the FIQ running in the background, so: - Guard the disconnect callback with both the host spinlock and FIQ spinlock - Move qtd dereference in dwc_otg_handle_hc_fsm() after the early-out so we don't dereference a qtd that has gone away - Turn catch-all BUG()s in dwc_otg_handle_hc_fsm() into warnings. dwc_otg: delete hcd->channel_lock The lock serves no purpose as it is only held while the HCD spinlock is already being held. dwc_otg: remove unnecessary dma-mode channel halts on disconnect interrupt Host channels are already halted in kill_urbs_in_qh_list() with the subsequent interrupt processing behaving as if the URB was dequeued via HCD callback. There's no need to clobber the host channel registers a second time as this exposes races between the driver and host channel resulting in hcd->free_hc_list becoming corrupted. dwcotg: Allow to build without FIQ on ARM64 Signed-off-by: popcornmix <popcornmix@gmail.com> dwc_otg: make periodic scheduling behave properly for FS buses If the root port is in full-speed mode, transfer times at 12mbit/s would be calculated but matched against high-speed quotas. Reinitialise hcd->frame_usecs[i] on each port enable event so that full-speed bandwidth can be tracked sensibly. Also, don't bother using the FIQ for transfers when in full-speed mode - at the slower bus speed, interrupt frequency is reduced by an order of magnitude. Related issue: https://github.com/raspberrypi/linux/issues/2020 dwc_otg: fiq_fsm: Make isochronous compatibility checks work properly Get rid of the spammy printk and local pointer mangling. Also, there is a nominal benefit for using fiq_fsm for isochronous transfers in FS mode (~1.1k IRQs per second vs 2.1k IRQs per second) so remove the root port speed check. dwc_otg: add module parameter int_ep_interval_min Add a module parameter (defaulting to ignored) that clamps the polling rate of high-speed Interrupt endpoints to a minimum microframe interval. The parameter is modifiable at runtime as it is used when activating new endpoints (such as on device connect). dwc_otg: fiq_fsm: Add non-periodic TT exclusivity constraints Certain hub types do not discriminate between pipe direction (IN or OUT) when considering non-periodic transfers. Therefore these hubs get confused if multiple transfers are issued in different directions with the same device address and endpoint number. Constrain queuing non-periodic split transactions so they are performed serially in such cases. Related: https://github.com/raspberrypi/linux/issues/2024 dwc_otg: Fixup change to DRIVER_ATTR interface dwc_otg: Fix compilation warnings Signed-off-by: Phil Elwell <phil@raspberrypi.org> USB_DWCOTG: Disable building dwc_otg as a module (#2265) When dwc_otg is built as a module, build will fail with the following error: ERROR: "DWC_TASK_HI_SCHEDULE" [drivers/usb/host/dwc_otg/dwc_otg.ko] undefined! scripts/Makefile.modpost:91: recipe for target '__modpost' failed make[1]: *** [__modpost] Error 1 Makefile:1199: recipe for target 'modules' failed make: *** [modules] Error 2 Even if the error is solved by including the missing DWC_TASK_HI_SCHEDULE function, the kernel will panic when loading dwc_otg. As a workaround, simply prevent user from building dwc_otg as a module as the current kernel does not support it. See: https://github.com/raspberrypi/linux/issues/2258 Signed-off-by: Malik Olivier Boussejra <malik@boussejra.com> dwc_otg: New timer API dwc_otg: Fix removed ACCESS_ONCE->READ_ONCE dwc_otg: don't unconditionally force host mode in dwc_otg_cil_init() Add the ability to disable force_host_mode for those that want to use dwc_otg in both device and host modes. dwc_otg: Fix a regression when dequeueing isochronous transfers In 282bed95 (dwc_otg: make nak_holdoff work as intended with empty queues) the dequeue mechanism was changed to leave FIQ-enabled transfers to run to completion - to avoid leaving hub TT buffers with stale packets lying around. This broke FIQ-accelerated isochronous transfers, as this then meant that dozens of transfers were performed after the dequeue function returned. Restore the state machine fence for isochronous transfers. fiq_fsm: rewind DMA pointer for OUT transactions that fail (#2288) See: https://github.com/raspberrypi/linux/issues/2140 dwc_otg: add smp_mb() to prevent driver state corruption on boot Occasional crashes have been seen where the FIQ code dereferences invalid/random pointers immediately after being set up, leading to panic on boot. The crash occurs as the FIQ code races against hcd_init_fiq() and the hcd_init_fiq() code races against the outstanding memory stores from dwc_otg_hcd_init(). Use explicit barriers after touching driver state. usb: dwc_otg: fix memory corruption in dwc_otg driver [Upstream commit 51b1b649 ] The move from the staging tree to the main tree exposed a longstanding memory corruption bug in the dwc2 driver. The reordering of the driver initialization caused the dwc2 driver to corrupt the initialization data of the sdhci driver on the Raspberry Pi platform, which made the bug show up. The error is in calling to_usb_device(hsotg->dev), since ->dev is not a member of struct usb_device. The easiest fix is to just remove the offending code, since it is not really needed. Thanks to Stephen Warren for tracking down the cause of this. Reported-by: Andre Heider <a.heider@gmail.com> Tested-by: Stephen Warren <swarren@wwwdotorg.org> Signed-off-by: Paul Zimmerman <paulz@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [lukas: port from upstream dwc2 to out-of-tree dwc_otg driver] Signed-off-by: Lukas Wunner <lukas@wunner.de> usb: dwb_otg: Fix unreachable switch statement warning This warning appears with GCC 7.3.0 from toolchains.bootlin.com: ../drivers/usb/host/dwc_otg/dwc_otg_fiq_fsm.c: In function ‘fiq_fsm_update_hs_isoc’: ../drivers/usb/host/dwc_otg/dwc_otg_fiq_fsm.c:595:61: warning: statement will never be executed [-Wswitch-unreachable] st->hctsiz_copy.b.xfersize = nrpackets * st->hcchar_copy.b.mps; ~~~~~~~~~~~~~~~~~^~~~ Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> dwc_otg: fiq_fsm: fix incorrect DMA register offset calculation Rationalise the offset and update all call sites. Fixes https://github.com/raspberrypi/linux/issues/2408 dwc_otg: fix bug with port_addr assignment for single-TT hubs See https://github.com/raspberrypi/linux/issues/2734 The "Hub Port" field in the split transaction packet was always set to 1 for single-TT hubs. The majority of single-TT hub products apparently ignore this field and broadcast to all downstream enabled ports, which masked the issue. A subset of hub devices apparently need the port number to be exact or split transactions will fail. usb: dwc_otg: Clean up build warnings on 64bit kernels No functional changes. Almost all are changes to logging lines. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org> usb: dwc_otg: Use dma allocation for mphi dummy_send buffer The FIQ driver used a kzalloc'ed buffer for dummy_send, passing a kernel virtual address to the hardware block. The buffer is only ever used for a dummy read, so it should be harmless, but there is the chance that it will cause exceptions. Use a dma allocation so that we have a genuine bus address, and read from that. Free the allocation when done for good measure. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org> dwc_otg: only do_split when we actually need to do a split The previous test would fail if the root port was in fullspeed mode and there was a hub between the FS device and the root port. While the transfer worked, the schedule mangling performed for high-speed split transfers would break leading to an 8ms polling interval. dwc_otg: fix locking around dequeueing and killing URBs kill_urbs_in_qh_list() is practically only ever called with the fiq lock already held, so don't spinlock twice in the case where we need to cancel an isochronous transfer. Also fix up a case where the global interrupt register could be read with the fiq lock not held. Fixes the deadlock seen in https://github.com/raspberrypi/linux/issues/2907 ARM64/DWC_OTG: Port dwc_otg driver to ARM64 In ARM64, the FIQ mechanism used by this driver is not current implemented. As a workaround, reqular IRQ is used instead of FIQ. In a separate change, the IRQ-CPU mapping is round robined on ARM64 to increase concurrency and allow multiple interrupts to be serviced at a time. This reduces the need for FIQ. Tests Run: This mechanism is most likely to break when multiple USB devices are attached at the same time. So the system was tested under stress. Devices: 1. USB Speakers playing back a FLAC audio through VLC at 96KHz.(Higher then typically, but supported on my speakers). 2. sftp transferring large files through the buildin ethernet connection which is connected through USB. 3. Keyboard and mouse attached and being used. Although I do occasionally hear some glitches, the music seems to play quite well. Signed-off-by: Michael Zoran <mzoran@crowfest.net> usb: dwc_otg: Clean up interrupt claiming code The FIQ/IRQ interrupt number identification code is scattered through the dwc_otg driver. Rationalise it, simplifying the code and solving an existing issue. See: https://github.com/raspberrypi/linux/issues/2612 Signed-off-by: Phil Elwell <phil@raspberrypi.org> dwc_otg: Choose appropriate IRQ handover strategy 2711 has no MPHI peripheral, but the ARM Control block can fake interrupts. Use the size of the DTB "mphi" reg block to determine which is required. Signed-off-by: Phil Elwell <phil@raspberrypi.org> usb: host: dwc_otg: fix compiling in separate directory The dwc_otg Makefile does not respect the O=path argument correctly: include paths in CFLAGS are given relatively to object path, not source path. Compiling in a separate directory yields #include errors. Signed-off-by: Marek Behún <marek.behun@nic.cz> dwc_otg: use align_buf for small IN control transfers (#3150) The hardware will do a 4-byte write to memory on any IN packet received that is between 1 and 3 bytes long. This tramples memory in the uvcvideo driver, as it uses a sequence of 1- and 2-byte control transfers to query the min/max/range/step of each individual camera control and gives us buffers that are offsets into a struct. Catch small control transfers in the data phase and use the align_buf to bounce the correct number of bytes into the URB's buffer. In general, short packets on non-control endpoints should be OK as URBs should have enough buffer space for a wMaxPacket size transfer. See: https://github.com/raspberrypi/linux/issues/3148 Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org> dwc_otg: Declare DMA capability with HCD_DMA flag Following [1], USB controllers have to declare DMA capabilities in order for them to be used by adding the HCD_DMA flag to their hc_driver struct. [1] 7b81cb6b ("usb: add a HCD_DMA flag instead of guestimating DMA capabilities") Signed-off-by: Phil Elwell <phil@raspberrypi.org> dwc_otg: checking the urb->transfer_buffer too early (#3332) After enable the HIGHMEM and VMSPLIT_3G, the dwc_otg driver doesn't work well on Pi2/3 boards with 1G physical ram. Users experience the failure when copying a file of 600M size to the USB stick. And at the same time, the dmesg shows: usb 1-1.1.2: reset high-speed USB device number 8 using dwc_otg sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK blk_update_request: I/O error, dev sda, sector 3024048 op 0x1:(WRITE) flags 0x4000 phys_seg 15 prio class 0 When this happens, the sg_buf sent to the driver is located in the highmem region, the usb_sg_init() in the core/message.c will leave transfer_buffer to NULL if the sg_buf is in highmem, but in the dwc_otg driver, it returns -EINVAL unconditionally if transfer_buffer is NULL. The driver can handle the situation of buffer to be NULL, if it is in DMA mode, it will convert an address from transfer_dma. But if the conversion fails or it is in the PIO mode, we should check buffer and return -EINVAL if it is NULL. BugLink: https://bugs.launchpad.net/bugs/1852510 Signed-off-by: Hui Wang <hui.wang@canonical.com> dwc_otg: constrain endpoint max packet and transfer size on split IN The hcd would unconditionally set the transfer length to the endpoint packet size for non-isoc IN transfers. If the remaining buffer length was less than the length of returned data, random memory would get scribbled over, with bad effects if it crossed a page boundary. Force a babble error if this happens by limiting the max transfer size to the available buffer space. DMA will stop writing to memory on a babble condition. The hardware expects xfersize to be an integer multiple of maxpacket size, so override hcchar.b.mps as well. Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org> dwc_otg: fiq_fsm: pause when cancelling split transactions Non-periodic splits will DMA to/from the driver-provided transfer_buffer, which may be freed immediately after the dequeue call returns. Block until we know the transfer is complete. A similar delay is needed when cleaning up disconnects, as the FIQ could have started a periodic transfer in the previous microframe to the one that triggered a disconnect. Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org> dwc_otg: fiq_fsm: add a barrier on entry into FIQ handler(s) On BCM2835, there is no hardware guarantee that multiple outstanding reads to different peripherals will complete in-order. The FIQ code uses peripheral reads without barriers for performance, so in the case where a read to a slow peripheral was issued immediately prior to FIQ entry, the first peripheral read that the FIQ did could end up with wrong read data returned. Add dsb(sy) on entry so that all outstanding reads are retired. The FIQ only issues reads to the dwc_otg core, so per-read barriers in the handler itself are not required. On BCM2836 and BCM2837 the barrier is not strictly required due to differences in how the peripheral bus is implemented, but having arch-specific handlers that introduce different latencies is risky. Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org>
-
popcornmix authored
Signed-off-by: popcornmix <popcornmix@gmail.com> Signed-off-by: Noralf Trønnes <noralf@tronnes.org> bcm2709: Drop platform smp and timer init code irq-bcm2836 handles this through these functions: bcm2835_init_local_timer_frequency() bcm2836_arm_irqchip_smp_init() Signed-off-by: Noralf Trønnes <noralf@tronnes.org> bcm270x: Use watchdog for reboot/poweroff The watchdog driver already has support for reboot/poweroff. Make use of this and remove the code from the platform files. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> board_bcm2835: Remove coherent dma pool increase - API has gone
-
Claggy3 authored
Christopher Alexander Tobias Schulze - May 2, 2015, 11:57 a.m. This patch fixes a problem with VFP state save and restore related to exception handling (panic with message "BUG: unsupported FP instruction in kernel mode") present on VFP11 floating point units (as used with ARM1176JZF-S CPUs, e.g. on first generation Raspberry Pi boards). This patch was developed and discussed on https://github.com/raspberrypi/linux/issues/859 A precondition to see the crashes is that floating point exception traps are enabled. In this case, the VFP11 might determine that a FPU operation needs to trap at a point in time when it is not possible to signal this to the ARM11 core any more. The VFP11 will then set the FPEXC.EX bit and store the trapped opcode in FPINST. (In some cases, a second opcode might have been accepted by the VFP11 before the exception was detected and could be reported to the ARM11 - in this case, the VFP11 also sets FPEXC.FP2V and stores the second opcode in FPINST2.) If FPEXC.EX is set, the VFP11 will "bounce" the next FPU opcode issued by the ARM11 CPU, which will be seen by the ARM11 as an undefined opcode trap. The VFP support code examines the FPEXC.EX and FPEXC.FP2V bits to decide what actions to take, i.e., whether to emulate the opcodes found in FPINST and FPINST2, and whether to retry the bounced instruction. If a user space application has left the VFP11 in this "pending trap" state, the next FPU opcode issued to the VFP11 might actually be the VSTMIA operation vfp_save_state() uses to store the FPU registers to memory (in our test cases, when building the signal stack frame). In this case, the kernel crashes as described above. This patch fixes the problem by making sure that vfp_save_state() is always entered with FPEXC.EX cleared. (The current value of FPEXC has already been saved, so this does not corrupt the context. Clearing FPEXC.EX has no effects on FPINST or FPINST2. Also note that many callers already modify FPEXC by setting FPEXC.EN before invoking vfp_save_state().) This patch also addresses a second problem related to FPEXC.EX: After returning from signal handling, the kernel reloads the VFP context from the user mode stack. However, the current code explicitly clears both FPEXC.EX and FPEXC.FP2V during reload. As VFP11 requires these bits to be preserved, this patch disables clearing them for VFP implementations belonging to architecture 1. There should be no negative side effects: the user can set both bits by executing FPU opcodes anyway, and while user code may now place arbitrary values into FPINST and FPINST2 (e.g., non-VFP ARM opcodes) the VFP support code knows which instructions can be emulated, and rejects other opcodes with "unhandled bounce" messages, so there should be no security impact from allowing reloading FPEXC.EX and FPEXC.FP2V. Signed-off-by: Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net>
-
popcornmix authored
-
Noralf Trønnes authored
Add a duplicate irq range with an offset on the hwirq's so the driver can detect that enable_fiq() is used. Tested with downstream dwc_otg USB controller driver. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Stephen Warren <swarren@wwwdotorg.org>
-
Dan Pasanen authored
* Re-expose some dmi APIs for use in VCSM
-
- Jun 17, 2020
-
-
Marc Zyngier authored
commit ef3e40a7 upstream. When using the PtrAuth feature in a guest, we need to save the host's keys before allowing the guest to program them. For that, we dump them in a per-CPU data structure (the so called host context). But both call sites that do this are in preemptible context, which may end up in disaster should the vcpu thread get preempted before reentering the guest. Instead, save the keys eagerly on each vcpu_load(). This has an increased overhead, but is at least safe. Cc: stable@vger.kernel.org Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marc Zyngier authored
commit 0370964d upstream. On a VHE system, the EL1 state is left in the CPU most of the time, and only syncronized back to memory when vcpu_put() is called (most of the time on preemption). Which means that when injecting an exception, we'd better have a way to either: (1) write directly to the EL1 sysregs (2) synchronize the state back to memory, and do the changes there For an AArch64, we already do (1), so we are safe. Unfortunately, doing the same thing for AArch32 would be pretty invasive. Instead, we can easily implement (2) by calling the put/load architectural backends, and keep preemption disabled. We can then reload the state back into EL1. Cc: stable@vger.kernel.org Reported-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ludovic Desroches authored
commit a1af7f36 upstream. Remove non-removable and mmc-ddr-1_8v properties from the sdmmc0 node which come probably from an unchecked copy/paste. Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com> Fixes:42ed5355 "ARM: dts: at91: introduce the sama5d2 ptc ek board" Cc: stable@vger.kernel.org # 4.19 and later Link: https://lore.kernel.org/r/20200401221504.41196-1-ludovic.desroches@microchip.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marc Zyngier authored
commit 3204be41 upstream. AArch32 CP1x registers are overlayed on their AArch64 counterparts in the vcpu struct. This leads to an interesting problem as they are stored in their CPU-local format, and thus a CP1x register doesn't "hit" the lower 32bit portion of the AArch64 register on a BE host. To workaround this unfortunate situation, introduce a bias trick in the vcpu_cp1x() accessors which picks the correct half of the 64bit register. Cc: stable@vger.kernel.org Reported-by: James Morse <james.morse@arm.com> Tested-by: James Morse <james.morse@arm.com> Acked-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
James Morse authored
commit 7c582bf4 upstream. aarch32 has pairs of registers to access the high and low parts of 64bit registers. KVM has a union of 64bit sys_regs[] and 32bit copro[]. The 32bit accessors read the high or low part of the 64bit sys_reg[] value through the union. Both sys_reg_descs[] and cp15_regs[] list access_csselr() as the accessor for CSSELR{,_EL1}. access_csselr() is only aware of the 64bit sys_regs[], and expects r->reg to be 'CSSELR_EL1' in the enum, index 2 of the 64bit array. cp15_regs[] uses the 32bit copro[] alias of sys_regs[]. Here CSSELR is c0_CSSELR which is the same location in sys_reg[]. r->reg is 'c0_CSSELR', index 4 in the 32bit array. access_csselr() uses the 32bit r->reg value to access the 64bit array, so reads and write the wrong value. sys_regs[4], is ACTLR_EL1, which is subsequently save/restored when we enter the guest. ACTLR_EL1 is supposed to be read-only for the guest. This register only affects execution at EL1, and the host's value is restored before we return to host EL1. Convert the 32bit register index back to the 64bit version. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200529150656.7339-2-james.morse@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Xing Li authored
commit 5816c76d upstream. If a CPU support more than 32bit vmbits (which is true for 64bit CPUs), VPN2_MASK set to fixed 0xffffe000 will lead to a wrong EntryHi in some functions such as _kvm_mips_host_tlb_inv(). The cpu_vmbits definition of 32bit CPU in cpu-features.h is 31, so we still use the old definition. Cc: Stable <stable@vger.kernel.org> Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Signed-off-by: Xing Li <lixing@loongson.cn> [Huacai: Improve commit messages] Signed-off-by: Huacai Chen <chenhc@lemote.com> Message-Id: <1590220602-3547-3-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Xing Li authored
commit fe2b73db upstream. The code in decode_config4() of arch/mips/kernel/cpu-probe.c asid_mask = MIPS_ENTRYHI_ASID; if (config4 & MIPS_CONF4_AE) asid_mask |= MIPS_ENTRYHI_ASIDX; set_cpu_asid_mask(c, asid_mask); set asid_mask to cpuinfo->asid_mask. So in order to support variable ASID_MASK, KVM_ENTRYHI_ASID should also be changed to cpu_asid_mask(&boot_cpu_data). Cc: Stable <stable@vger.kernel.org> #4.9+ Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Signed-off-by: Xing Li <lixing@loongson.cn> [Huacai: Change current_cpu_data to boot_cpu_data for optimization] Signed-off-by: Huacai Chen <chenhc@lemote.com> Message-Id: <1590220602-3547-2-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sean Christopherson authored
commit 2ebac8bb upstream. Consult only the basic exit reason, i.e. bits 15:0 of vmcs.EXIT_REASON, when determining whether a nested VM-Exit should be reflected into L1 or handled by KVM in L0. For better or worse, the switch statement in nested_vmx_exit_reflected() currently defaults to "true", i.e. reflects any nested VM-Exit without dedicated logic. Because the case statements only contain the basic exit reason, any VM-Exit with modifier bits set will be reflected to L1, even if KVM intended to handle it in L0. Practically speaking, this only affects EXIT_REASON_MCE_DURING_VMENTRY, i.e. a #MC that occurs on nested VM-Enter would be incorrectly routed to L1, as "failed VM-Entry" is the only modifier that KVM can currently encounter. The SMM modifiers will never be generated as KVM doesn't support/employ a SMI Transfer Monitor. Ditto for "exit from enclave", as KVM doesn't yet support virtualizing SGX, i.e. it's impossible to enter an enclave in a KVM guest (L1 or L2). Fixes: 644d711a ("KVM: nVMX: Deciding if L0 or L1 should handle an L2 exit") Cc: Jim Mattson <jmattson@google.com> Cc: Xiaoyao Li <xiaoyao.li@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200227174430.26371-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paolo Bonzini authored
commit 6c0238c4 upstream. Restoring the ASID from the hsave area on VMEXIT is wrong, because its value depends on the handling of TLB flushes. Just skipping the field in copy_vmcb_control_area will do. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paolo Bonzini authored
commit a3535be7 upstream. Async page faults have to be trapped in the host (L1 in this case), since the APF reason was passed from L0 to L1 and stored in the L1 APF data page. This was completely reversed: the page faults were passed to the guest, a L2 hypervisor. Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sean Christopherson authored
commit 5c911bef upstream. Skip the Indirect Branch Prediction Barrier that is triggered on a VMCS switch when running with spectre_v2_user=on/auto if the switch is between two VMCSes in the same guest, i.e. between vmcs01 and vmcs02. The IBPB is intended to prevent one guest from attacking another, which is unnecessary in the nested case as it's the same guest from KVM's perspective. This all but eliminates the overhead observed for nested VMX transitions when running with CONFIG_RETPOLINE=y and spectre_v2_user=on/auto, which can be significant, e.g. roughly 3x on current systems. Reported-by: Alexander Graf <graf@amazon.com> Cc: KarimAllah Raslan <karahmed@amazon.de> Cc: stable@vger.kernel.org Fixes: 15d45071 ("KVM/x86: Add IBPB support") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200501163117.4655-1-sean.j.christopherson@intel.com> [Invert direction of bool argument. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tony Luck authored
commit 17fae129 upstream. An interesting thing happened when a guest Linux instance took a machine check. The VMM unmapped the bad page from guest physical space and passed the machine check to the guest. Linux took all the normal actions to offline the page from the process that was using it. But then guest Linux crashed because it said there was a second machine check inside the kernel with this stack trace: do_memory_failure set_mce_nospec set_memory_uc _set_memory_uc change_page_attr_set_clr cpa_flush clflush_cache_range_opt This was odd, because a CLFLUSH instruction shouldn't raise a machine check (it isn't consuming the data). Further investigation showed that the VMM had passed in another machine check because is appeared that the guest was accessing the bad page. Fix is to check the scope of the poison by checking the MCi_MISC register. If the entire page is affected, then unmap the page. If only part of the page is affected, then mark the page as uncacheable. This assumes that VMMs will do the logical thing and pass in the "whole page scope" via the MCi_MISC register (since they unmapped the entire page). [ bp: Adjust to x86/entry changes. ] Fixes: 284ce401 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()") Reported-by: Jue Wang <juew@google.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jue Wang <juew@google.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200520163546.GA7977@agluck-desk2.amr.corp.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nick Desaulniers authored
commit a194c33f upstream. Will reported a UBSAN warning: UBSAN: null-ptr-deref in arch/arm64/kernel/smp.c:596:6 member access within null pointer of type 'struct acpi_madt_generic_interrupt' CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-rc6-00124-g96bc42ff0a82 #1 Call trace: dump_backtrace+0x0/0x384 show_stack+0x28/0x38 dump_stack+0xec/0x174 handle_null_ptr_deref+0x134/0x174 __ubsan_handle_type_mismatch_v1+0x84/0xa4 acpi_parse_gic_cpu_interface+0x60/0xe8 acpi_parse_entries_array+0x288/0x498 acpi_table_parse_entries_array+0x178/0x1b4 acpi_table_parse_madt+0xa4/0x110 acpi_parse_and_init_cpus+0x38/0x100 smp_init_cpus+0x74/0x258 setup_arch+0x350/0x3ec start_kernel+0x98/0x6f4 This is from the use of the ACPI_OFFSET in arch/arm64/include/asm/acpi.h. Replace its use with offsetof from include/linux/stddef.h which should implement the same logic using __builtin_offsetof, so that UBSAN wont warn. Reported-by: Will Deacon <will@kernel.org> Suggested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Jeremy Linton <jeremy.linton@arm.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/20200521100952.GA5360@willie-the-truck/ Link: https://lore.kernel.org/r/20200608203818.189423-1-ndesaulniers@google.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christophe Leroy authored
commit b00ff6d8 upstream. In order to properly display information regardless of the page size, it is necessary to take into account real page size. Fixes: cabe8138 ("powerpc: dump as a single line areas mapping a single physical page.") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a53b2a0ffd042a8d85464bf90d55bc5b970e00a1.1589866984.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eiichi Tsukata authored
commit e649b3f0 upstream. Commit b1394e74 ("KVM: x86: fix APIC page invalidation") tried to fix inappropriate APIC page invalidation by re-introducing arch specific kvm_arch_mmu_notifier_invalidate_range() and calling it from kvm_mmu_notifier_invalidate_range_start. However, the patch left a possible race where the VMCS APIC address cache is updated *before* it is unmapped: (Invalidator) kvm_mmu_notifier_invalidate_range_start() (Invalidator) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD) (KVM VCPU) vcpu_enter_guest() (KVM VCPU) kvm_vcpu_reload_apic_access_page() (Invalidator) actually unmap page Because of the above race, there can be a mismatch between the host physical address stored in the APIC_ACCESS_PAGE VMCS field and the host physical address stored in the EPT entry for the APIC GPA (0xfee0000). When this happens, the processor will not trap APIC accesses, and will instead show the raw contents of the APIC-access page. Because Windows OS periodically checks for unexpected modifications to the LAPIC register, this will show up as a BSOD crash with BugCheck CRITICAL_STRUCTURE_CORRUPTION (109) we are currently seeing in https://bugzilla.redhat.com/show_bug.cgi?id=1751017. The root cause of the issue is that kvm_arch_mmu_notifier_invalidate_range() cannot guarantee that no additional references are taken to the pages in the range before kvm_mmu_notifier_invalidate_range_end(). Fortunately, this case is supported by the MMU notifier API, as documented in include/linux/mmu_notifier.h: * If the subsystem * can't guarantee that no additional references are taken to * the pages in the range, it has to implement the * invalidate_range() notifier to remove any references taken * after invalidate_range_start(). The fix therefore is to reload the APIC-access page field in the VMCS from kvm_mmu_notifier_invalidate_range() instead of ..._range_start(). Cc: stable@vger.kernel.org Fixes: b1394e74 ("KVM: x86: fix APIC page invalidation") Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=197951 Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com> Message-Id: <20200606042627.61070-1-eiichi.tsukata@nutanix.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Felipe Franciosi authored
commit 384dea1c upstream. When userspace configures KVM_GUESTDBG_SINGLESTEP, KVM will manage the presence of X86_EFLAGS_TF via kvm_set/get_rflags on vcpus. The actual rflag bit is therefore hidden from callers. That includes init_emulate_ctxt() which uses the value returned from kvm_get_flags() to set ctxt->tf. As a result, x86_emulate_instruction() will skip a single step, leaving singlestep_rip stale and not returning to userspace. This resolves the issue by observing the vcpu guest_debug configuration alongside ctxt->tf in x86_emulate_instruction(), performing the single step if set. Cc: stable@vger.kernel.org Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Message-Id: <20200519081048.8204-1-felipe@nutanix.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sean Christopherson authored
commit 6129ed87 upstream. Set the mmio_value to '0' instead of simply clearing the present bit to squash a benign warning in kvm_mmu_set_mmio_spte_mask() that complains about the mmio_value overlapping the lower GFN mask on systems with 52 bits of PA space. Opportunistically clean up the code and comments. Cc: stable@vger.kernel.org Fixes: d43e2675 ("KVM: x86: only do L1TF workaround on affected processors") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200527084909.23492-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paolo Bonzini authored
commit df2a69af upstream. The migration functionality was left incomplete in commit 5ef8acbd ("KVM: nVMX: Emulate MTF when performing instruction emulation", 2020-02-23), fix it. Fixes: 5ef8acbd ("KVM: nVMX: Emulate MTF when performing instruction emulation") Cc: stable@vger.kernel.org Reviewed-by: Oliver Upton <oupton@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kan Liang authored
commit 0813c405 upstream. The mask in the extra_regs for Intel Tremont need to be extended to allow more defined bits. "Outstanding Requests" (bit 63) is only available on MSR_OFFCORE_RSP0; Fixes: 6daeb873 ("perf/x86/intel: Add Tremont core PMU support") Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200501125442.7030-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hill Ma authored
commit 140fd4ac upstream. On MacBook6,1 reboot would hang unless parameter reboot=pci is added. Make it automatic. Signed-off-by: Hill Ma <maahiuzeon@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200425200641.GA1554@cslab.localdomain Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Anthony Steinhauser authored
commit 4d8df8cb upstream. Currently, it is possible to enable indirect branch speculation even after it was force-disabled using the PR_SPEC_FORCE_DISABLE option. Moreover, the PR_GET_SPECULATION_CTRL command gives afterwards an incorrect result (force-disabled when it is in fact enabled). This also is inconsistent vs. STIBP and the documention which cleary states that PR_SPEC_FORCE_DISABLE cannot be undone. Fix this by actually enforcing force-disabled indirect branch speculation. PR_SPEC_ENABLE called after PR_SPEC_FORCE_DISABLE now fails with -EPERM as described in the documentation. Fixes: 9137bb27 ("x86/speculation: Add prctl() control for indirect branch speculation") Signed-off-by: Anthony Steinhauser <asteinhauser@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Anthony Steinhauser authored
commit 21998a35 upstream. When STIBP is unavailable or enhanced IBRS is available, Linux force-disables the IBPB mitigation of Spectre-BTB even when simultaneous multithreading is disabled. While attempts to enable IBPB using prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, ...) fail with EPERM, the seccomp syscall (or its prctl(PR_SET_SECCOMP, ...) equivalent) which are used e.g. by Chromium or OpenSSH succeed with no errors but the application remains silently vulnerable to cross-process Spectre v2 attacks (classical BTB poisoning). At the same time the SYSFS reporting (/sys/devices/system/cpu/vulnerabilities/spectre_v2) displays that IBPB is conditionally enabled when in fact it is unconditionally disabled. STIBP is useful only when SMT is enabled. When SMT is disabled and STIBP is unavailable, it makes no sense to force-disable also IBPB, because IBPB protects against cross-process Spectre-BTB attacks regardless of the SMT state. At the same time since missing STIBP was only observed on AMD CPUs, AMD does not recommend using STIBP, but recommends using IBPB, so disabling IBPB because of missing STIBP goes directly against AMD's advice: https://developer.amd.com/wp-content/resources/Architecture_Guidelines_Update_Indirect_Branch_Control.pdf Similarly, enhanced IBRS is designed to protect cross-core BTB poisoning and BTB-poisoning attacks from user space against kernel (and BTB-poisoning attacks from guest against hypervisor), it is not designed to prevent cross-process (or cross-VM) BTB poisoning between processes (or VMs) running on the same core. Therefore, even with enhanced IBRS it is necessary to flush the BTB during context-switches, so there is no reason to force disable IBPB when enhanced IBRS is available. Enable the prctl control of IBPB even when STIBP is unavailable or enhanced IBRS is available. Fixes: 7cc765a6 ("x86/speculation: Enable prctl mode for spectre_v2_user") Signed-off-by: Anthony Steinhauser <asteinhauser@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Anthony Steinhauser authored
commit dbbe2ad0 upstream. On context switch the change of TIF_SSBD and TIF_SPEC_IB are evaluated to adjust the mitigations accordingly. This is optimized to avoid the expensive MSR write if not needed. This optimization is buggy and allows an attacker to shutdown the SSBD protection of a victim process. The update logic reads the cached base value for the speculation control MSR which has neither the SSBD nor the STIBP bit set. It then OR's the SSBD bit only when TIF_SSBD is different and requests the MSR update. That means if TIF_SSBD of the previous and next task are the same, then the base value is not updated, even if TIF_SSBD is set. The MSR write is not requested. Subsequently if the TIF_STIBP bit differs then the STIBP bit is updated in the base value and the MSR is written with a wrong SSBD value. This was introduced when the per task/process conditional STIPB switching was added on top of the existing SSBD switching. It is exploitable if the attacker creates a process which enforces SSBD and has the contrary value of STIBP than the victim process (i.e. if the victim process enforces STIBP, the attacker process must not enforce it; if the victim process does not enforce STIBP, the attacker process must enforce it) and schedule it on the same core as the victim process. If the victim runs after the attacker the victim becomes vulnerable to Spectre V4. To fix this, update the MSR value independent of the TIF_SSBD difference and dependent on the SSBD mitigation method available. This ensures that a subsequent STIPB initiated MSR write has the correct state of SSBD. [ tglx: Handle X86_FEATURE_VIRT_SSBD & X86_FEATURE_VIRT_SSBD correctly and massaged changelog ] Fixes: 5bfbe3ad ("x86/speculation: Prepare for per task indirect branch speculation control") Signed-off-by: Anthony Steinhauser <asteinhauser@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Xiaochun Lee authored
commit 1574051e upstream. The Intel C620 Platform Controller Hub has MROM functions that have non-PCI registers (undocumented in the public spec) where BAR 0 is supposed to be, which results in messages like this: pci 0000:00:11.0: [Firmware Bug]: reg 0x30: invalid BAR (can't size) Mark these MROM functions as having non-compliant BARs so we don't try to probe any of them. There are no other BARs on these devices. See the Intel C620 Series Chipset Platform Controller Hub Datasheet, May 2019, Document Number 336067-007US, sec 2.1, 35.5, 35.6. [bhelgaas: commit log, add 0xa26d] Link: https://lore.kernel.org/r/1589513467-17070-1-git-send-email-lixiaochun.2888@163.com Signed-off-by: Xiaochun Lee <lixc17@lenovo.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steven Price authored
commit 1494e0c3 upstream. Patch series "Fix W+X debug feature on x86" Jan alerted me[1] that the W+X detection debug feature was broken in x86 by my change[2] to switch x86 to use the generic ptdump infrastructure. Fundamentally the approach of trying to move the calculation of effective permissions into note_page() was broken because note_page() is only called for 'leaf' entries and the effective permissions are passed down via the internal nodes of the page tree. The solution I've taken here is to create a new (optional) callback which is called for all nodes of the page tree and therefore can calculate the effective permissions. Secondly on some configurations (32 bit with PAE) "unsigned long" is not large enough to store the table entries. The fix here is simple - let's just use a u64. [1] https://lore.kernel.org/lkml/d573dc7e-e742-84de-473d-f971142fa319@suse.com/ [2] 2ae27137 ("x86: mm: convert dump_pagetables to use walk_page_range") This patch (of 2): By switching the x86 page table dump code to use the generic code the effective permissions are no longer calculated correctly because the note_page() function is only called for *leaf* entries. To calculate the actual effective permissions it is necessary to observe the full hierarchy of the page tree. Introduce a new callback for ptdump which is called for every entry and can therefore update the prot_levels array correctly. note_page() can then simply access the appropriate element in the array. [steven.price@arm.com: make the assignment conditional on val != 0] Link: http://lkml.kernel.org/r/430c8ab4-e7cd-6933-dde6-087fac6db872@arm.com Fixes: 2ae27137 ("x86: mm: convert dump_pagetables to use walk_page_range") Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Qian Cai <cai@lca.pw> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200521152308.33096-1-steven.price@arm.com Link: http://lkml.kernel.org/r/20200521152308.33096-2-steven.price@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Bob Haarman authored
commit d8ad6d39 upstream. 'jiffies' and 'jiffies_64' are meant to alias (two different symbols that share the same address). Most architectures make the symbols alias to the same address via a linker script assignment in their arch/<arch>/kernel/vmlinux.lds.S: jiffies = jiffies_64; which is effectively a definition of jiffies. jiffies and jiffies_64 are both forward declared for all architectures in include/linux/jiffies.h. jiffies_64 is defined in kernel/time/timer.c. x86_64 was peculiar in that it wasn't doing the above linker script assignment, but rather was: 1. defining jiffies in arch/x86/kernel/time.c instead via the linker script. 2. overriding the symbol jiffies_64 from kernel/time/timer.c in arch/x86/kernel/vmlinux.lds.s via 'jiffies_64 = jiffies;'. As Fangrui notes: In LLD, symbol assignments in linker scripts override definitions in object files. GNU ld appears to have the same behavior. It would probably make sense for LLD to error "duplicate symbol" but GNU ld is unlikely to adopt for compatibility reasons. This results in an ODR violation (UB), which seems to have survived thus far. Where it becomes harmful is when; 1. -fno-semantic-interposition is used: As Fangrui notes: Clang after LLVM commit 5b22bcc2b70d ("[X86][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local") defaults to -fno-semantic-interposition similar semantics which help -fpic/-fPIC code avoid GOT/PLT when the referenced symbol is defined within the same translation unit. Unlike GCC -fno-semantic-interposition, Clang emits such relocations referencing local symbols for non-pic code as well. This causes references to jiffies to refer to '.Ljiffies$local' when jiffies is defined in the same translation unit. Likewise, references to jiffies_64 become references to '.Ljiffies_64$local' in translation units that define jiffies_64. Because these differ from the names used in the linker script, they will not be rewritten to alias one another. 2. Full LTO Full LTO effectively treats all source files as one translation unit, causing these local references to be produced everywhere. When the linker processes the linker script, there are no longer any references to jiffies_64' anywhere to replace with 'jiffies'. And thus '.Ljiffies$local' and '.Ljiffies_64$local' no longer alias at all. In the process of porting patches enabling Full LTO from arm64 to x86_64, spooky bugs have been observed where the kernel appeared to boot, but init doesn't get scheduled. Avoid the ODR violation by matching other architectures and define jiffies only by linker script. For -fno-semantic-interposition + Full LTO, there is no longer a global definition of jiffies for the compiler to produce a local symbol which the linker script won't ensure aliases to jiffies_64. Fixes: 40747ffa ("asmlinkage: Make jiffies visible") Reported-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: Alistair Delva <adelva@google.com> Debugged-by: Nick Desaulniers <ndesaulniers@google.com> Debugged-by: Sami Tolvanen <samitolvanen@google.com> Suggested-by: Fangrui Song <maskray@google.com> Signed-off-by: Bob Haarman <inglorion@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # build+boot on Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Link: https://github.com/ClangBuiltLinux/linux/issues/852 Link: https://lkml.kernel.org/r/20200602193100.229287-1-inglorion@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paolo Bonzini authored
[ Upstream commit d43e2675 ] KVM stores the gfn in MMIO SPTEs as a caching optimization. These are split in two parts, as in "[high 11111 low]", to thwart any attempt to use these bits in an L1TF attack. This works as long as there are 5 free bits between MAXPHYADDR and bit 50 (inclusive), leaving bit 51 free so that the MMIO access triggers a reserved-bit-set page fault. The bit positions however were computed wrongly for AMD processors that have encryption support. In this case, x86_phys_bits is reduced (for example from 48 to 43, to account for the C bit at position 47 and four bits used internally to store the SEV ASID and other stuff) while x86_cache_bits in would remain set to 48, and _all_ bits between the reduced MAXPHYADDR and bit 51 are set. Then low_phys_bits would also cover some of the bits that are set in the shadow_mmio_value, terribly confusing the gfn caching mechanism. To fix this, avoid splitting gfns as long as the processor does not have the L1TF bug (which includes all AMD processors). When there is no splitting, low_phys_bits can be set to the reduced MAXPHYADDR removing the overlap. This fixes "npt=0" operation on EPYC processors. Thanks to Maxim Levitsky for bisecting this bug. Cc: stable@vger.kernel.org Fixes: 52918ed5 ("KVM: SVM: Override default MMIO mask if memory encryption is enabled") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Kim Phillips authored
[ Upstream commit e2abfc04 ] Commit 21b5ee59 ("x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF") mistakenly added erratum #1054 as an OS Visible Workaround (OSVW) ID 0. Erratum #1054 is not OSVW ID 0 [1], so make it a legacy erratum. There would never have been a false positive on older hardware that has OSVW bit 0 set, since the IRPERF feature was not available. However, save a couple of RDMSR executions per thread, on modern system configurations that correctly set non-zero values in their OSVW_ID_Length MSRs. [1] Revision Guide for AMD Family 17h Models 00h-0Fh Processors. The revision guide is available from the bugzilla link below. Fixes: 21b5ee59 ("x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200417143356.26054-1-kim.phillips@amd.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Petr Tesarik authored
[ Upstream commit e1750a3d ] After disabling a function, the original handle is logged instead of the disabled handle. Link: https://lkml.kernel.org/r/20200522183922.5253-1-ptesarik@suse.com Fixes: 17cdec96 ("s390/pci: Recover handle in clp_set_pci_fn()") Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Petr Tesarik <ptesarik@suse.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Cédric Le Goater authored
[ Upstream commit a101950f ] Commit 1ca3dec2 ("powerpc/xive: Prevent page fault issues in the machine crash handler") fixed an issue in the FW assisted dump of machines using hash MMU and the XIVE interrupt mode under the POWER hypervisor. It forced the mapping of the ESB page of interrupts being mapped in the Linux IRQ number space to make sure the 'crash kexec' sequence worked during such an event. But it didn't handle the un-mapping. This mapping is now blocking the removal of a passthrough IO adapter under the POWER hypervisor because it expects the guest OS to have cleared all page table entries related to the adapter. If some are still present, the RTAS call which isolates the PCI slot returns error 9001 "valid outstanding translations". Remove these mapping in the IRQ data cleanup routine. Under KVM, this cleanup is not required because the ESB pages for the adapter interrupts are un-mapped from the guest by the hypervisor in the KVM XIVE native device. This is now redundant but it's harmless. Fixes: 1ca3dec2 ("powerpc/xive: Prevent page fault issues in the machine crash handler") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429075122.1216388-2-clg@kaod.org Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Guo Ren authored
[ Upstream commit e0bbb538 ] Current implementation could destory a4 & a5 when strace, so we need to get them from pt_regs by SAVE_ALL. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Fredrik Strupe authored
[ Upstream commit 3866f217 ] call_undef_hook() in traps.c applies the same instr_mask for both 16-bit and 32-bit thumb instructions. If instr_mask then is only 16 bits wide (0xffff as opposed to 0xffffffff), the first half-word of 32-bit thumb instructions will be masked out. This makes the function match 32-bit thumb instructions where the second half-word is equal to instr_val, regardless of the first half-word. The result in this case is that all undefined 32-bit thumb instructions with the second half-word equal to 0xde01 (udf #1) work as breakpoints and will raise a SIGTRAP instead of a SIGILL, instead of just the one intended 16-bit instruction. An example of such an instruction is 0xeaa0de01, which is unallocated according to Arm ARM and should raise a SIGILL, but instead raises a SIGTRAP. This patch fixes the issue by setting all the bits in instr_mask, which will still match the intended 16-bit thumb instruction (where the upper half is always 0), but not any 32-bit thumb instructions. Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Fredrik Strupe <fredrik@strupe.net> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Fangrui Song authored
commit 90ceddcb upstream. Simplify gen_btf logic to make it work with llvm-objcopy. The existing 'file format' and 'architecture' parsing logic is brittle and does not work with llvm-objcopy/llvm-objdump. 'file format' output of llvm-objdump>=11 will match GNU objdump, but 'architecture' (bfdarch) may not. .BTF in .tmp_vmlinux.btf is non-SHF_ALLOC. Add the SHF_ALLOC flag because it is part of vmlinux image used for introspection. C code can reference the section via linker script defined __start_BTF and __stop_BTF. This fixes a small problem that previous .BTF had the SHF_WRITE flag (objcopy -I binary -O elf* synthesized .data). Additionally, `objcopy -I binary` synthesized symbols _binary__btf_vmlinux_bin_start and _binary__btf_vmlinux_bin_stop (not used elsewhere) are replaced with more commonplace __start_BTF and __stop_BTF. Add 2>/dev/null because GNU objcopy (but not llvm-objcopy) warns "empty loadable segment detected at vaddr=0xffffffff81000000, is this intentional?" We use a dd command to change the e_type field in the ELF header from ET_EXEC to ET_REL so that lld will accept .btf.vmlinux.bin.o. Accepting ET_EXEC as an input file is an extremely rare GNU ld feature that lld does not intend to support, because this is error-prone. The output section description .BTF in include/asm-generic/vmlinux.lds.h avoids potential subtle orphan section placement issues and suppresses --orphan-handling=warn warnings. Fixes: df786c9b ("bpf: Force .BTF section start to zero when dumping from vmlinux") Fixes: cb0cc635 ("powerpc: Include .BTF section") Reported-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Fangrui Song <maskray@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Stanislav Fomichev <sdf@google.com> Tested-by: Andrii Nakryiko <andriin@fb.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Link: https://github.com/ClangBuiltLinux/linux/issues/871 Link: https://lore.kernel.org/bpf/20200318222746.173648-1-maskray@google.com Signed-off-by: Maria Teguiani <teguiani@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-