Commits · eb5c3f1c86470fc1a57ab28cce15c12e4d6cdf8b · 方亚芬 / linux

Nov 06, 2017

powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint · eb5c3f1c

Cyril Bur authored Nov 02, 2017

Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

tm_reclaim() has optimisations to not always save the FP/Altivec
registers to the checkpointed save area. This was originally done
because the caller might have information that the checkpointed
registers aren't valid due to lazy save and restore. We've also been a
little vague as to how tm_reclaim() leaves the FP/Altivec state since it
doesn't necessarily always save it to the thread struct. This has lead
to an (incorrect) assumption that it leaves the checkpointed state on
the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not
always reload the checkpointed FP/Altivec registers from the thread
struct before the trecheckpoint. It is therefore quite unclear where it
expects to get the state from. This didn't help with the assumption
made about tm_reclaim().

These optimisations sit in what is by definition a slow path. If a
process has to go through a reclaim/recheckpoint then its transaction
will be doomed on returning to userspace. This mean that the process
will be unable to complete its transaction and be forced to its failure
handler. This is already an out if line case for userspace. Furthermore,
the cost of copying 64 times 128 bits from registers isn't very long[0]
(at all) on modern processors. As such it appears these optimisations
have only served to increase code complexity and are unlikely to have
had a measurable performance impact.

Our transactional memory handling has been riddled with bugs. A cause
of this has been difficulty in following the code flow, code complexity
has not been our friend here. It makes sense to remove these
optimisations in favour of a (hopefully) more stable implementation.

This patch does mean that some times the assembly will needlessly save
'junk' registers which will subsequently get overwritten with the
correct value by the C code which calls the assembly function. This
small inefficiency is far outweighed by the reduction in complexity for
general TM code, context switching paths, and transactional facility
unavailable exception handler.

0: I tried to measure it once for other work and found that it was
hiding in the noise of everything else I was working with. I find it
exceedingly likely this will be the case here.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

eb5c3f1c

powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable exception · 91381b9c

Cyril Bur authored Nov 02, 2017

Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

tm_reclaim() has optimisations to not always save the FP/Altivec
registers to the checkpointed save area. This was originally done
because the caller might have information that the checkpointed
registers aren't valid due to lazy save and restore. We've also been a
little vague as to how tm_reclaim() leaves the FP/Altivec state since it
doesn't necessarily always save it to the thread struct. This has lead
to an (incorrect) assumption that it leaves the checkpointed state on
the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not
always reload the checkpointed FP/Altivec registers from the thread
struct before the trecheckpoint. It is therefore quite unclear where it
expects to get the state from. This didn't help with the assumption
made about tm_reclaim().

This patch is a minimal fix for ease of backporting. A more correct fix
which removes the msr parameter to tm_reclaim() and tm_recheckpoint()
altogether has been upstreamed to apply on top of this patch.

Fixes: dc310669

 ("powerpc: tm: Always use fp_state and vr_state to
store live registers")

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

91381b9c

powerpc: Don't enable FP/Altivec if not checkpointed · a7771176

Cyril Bur authored Nov 02, 2017

Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

Lazy save and restore of FP/Altivec cannot be done if a process is
transactional. If a facility was enabled it must remain enabled whenever
a thread is transactional.

Commit dc16b553 ("powerpc: Always restore FPU/VEC/VSX if hardware
transactional memory in use") ensures that the facilities are always
enabled if a thread is transactional. A bug in the introduced code may
cause it to inadvertently enable a facility that was (and should remain)
disabled. The problem with this extraneous enablement is that the
registers for the erroneously enabled facility have not been correctly
recheckpointed - the recheckpointing code assumed the facility would
remain disabled.

Further compounding the issue, the transactional {fp,altivec,vsx}
unavailable code has been incorrectly using the MSR to enable
facilities. The presence of the {FP,VEC,VSX} bit in the regs->msr simply
means if the registers are live on the CPU, not if the kernel should
load them before returning to userspace. This has worked due to the bug
mentioned above.

This causes transactional threads which return to their failure handler
to observe incorrect checkpointed registers. Perhaps an example will
help illustrate the problem:

A userspace process is running and uses both FP and Altivec registers.
This process then continues to run for some time without touching
either sets of registers. The kernel subsequently disables the
facilities as part of lazy save and restore. The userspace process then
performs a tbegin and the CPU checkpoints 'junk' FP and Altivec
registers. The process then performs a floating point instruction
triggering a fp unavailable exception in the kernel.

The kernel then loads the FP registers - and only the FP registers.
Since the thread is transactional it must perform a reclaim and
recheckpoint to ensure both the checkpointed registers and the
transactional registers are correct. It then (correctly) enables
MSR[FP] for the process. Later (on exception exist) the kernel also
(inadvertently) enables MSR[VEC]. The process is then returned to
userspace.

Since the act of loading the FP registers doomed the transaction we know
CPU will fail the transaction, restore its checkpointed registers, and
return the process to its failure handler. The problem is that we're
now running with Altivec enabled and the 'junk' checkpointed registers
are restored. The kernel had only recheckpointed FP.

This patch solves this by only activating FP/Altivec if userspace was
using them when it entered the kernel and not simply if the process is
transactional.

Fixes: dc16b553

 ("powerpc: Always restore FPU/VEC/VSX if hardware
transactional memory in use")

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

a7771176

mtd: powernv_flash: Use opal_async_wait_response_interruptible() · 6f469b67

Cyril Bur authored Nov 03, 2017



The OPAL calls performed in this driver shouldn't be using
opal_async_wait_response() as this performs a wait_event() which, on
long running OPAL calls could result in hung task warnings. wait_event()
prevents timely signal delivery which is also undesirable.

This patch also attempts to quieten down the use of dev_err() when
errors haven't actually occurred and also to return better information up
the stack rather than always -EIO.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

6f469b67

powerpc/powernv: Add OPAL_BUSY to opal_error_code() · 77adbd22

Cyril Bur authored Nov 03, 2017



Also export opal_error_code() so that it can be used in modules

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

77adbd22

powerpc/opal: Add opal_async_wait_response_interruptible() to opal-async · 9aab2449

Cyril Bur authored Nov 03, 2017



This patch adds an _interruptible version of opal_async_wait_response().
This is useful when a long running OPAL call is performed on behalf of
a userspace thread, for example, the opal_flash_{read,write,erase}
functions performed by the powernv-flash MTD driver.

It is foreseeable that these functions would take upwards of two
minutes causing the wait_event() to block long enough to cause hung
task warnings. Furthermore, wait_event_interruptible() is preferable
as otherwise there is no way for signals to stop the process which is
going to be confusing in userspace.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

9aab2449

powernv/opal-sensor: remove not needed lock · 95e1bc1d

Stewart Smith authored Nov 03, 2017



Parallel sensor reads could run out of async tokens due to
opal_get_sensor_data grabbing tokens but then doing the sensor
read behind a mutex, essentially serializing the (possibly
asynchronous and relatively slow) sensor read.

It turns out that the mutex isn't needed at all, not only
should the OPAL interface allow concurrent reads, the implementation
is certainly safe for that, and if any sensor we were reading
from somewhere isn't, doing the mutual exclusion in the kernel
is the wrong place to do it, OPAL should be doing it for the kernel.

So, remove the mutex.

Additionally, we shouldn't be printing out an error when we don't
get a token as the only way this should happen is if we've been
interrupted in down_interruptible() on the semaphore.

Reported-by: Robert Lippert <rlippert@google.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

95e1bc1d

powerpc/opal: Rework the opal-async interface · 86cd6d98

Cyril Bur authored Nov 03, 2017



Future work will add an opal_async_wait_response_interruptible()
which will call wait_event_interruptible(). This work requires extra
token state to be tracked as wait_event_interruptible() can return and
the caller could release the token before OPAL responds.

Currently token state is tracked with two bitfields which are 64 bits
big but may not need to be as OPAL informs Linux how many async tokens
there are. It also uses an array indexed by token to store response
messages for each token.

The bitfields make it difficult to add more state and also provide a
hard maximum as to how many tokens there can be - it is possible that
OPAL will inform Linux that there are more than 64 tokens.

Rather than add a bitfield to track the extra state, rework the
internals slightly.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
[mpe: Fix __opal_async_get_token() when no tokens are free]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

86cd6d98

powerpc/opal: Make __opal_async_{get, release}_token() static · 59cf9a1c

Cyril Bur authored Nov 03, 2017



There are no callers of both __opal_async_get_token() and
__opal_async_release_token().

This patch also removes the possibility of "emergency through
synchronous call to __opal_async_get_token()" as such it makes more
sense to initialise opal_sync_sem for the maximum number of async
tokens.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

59cf9a1c

mtd: powernv_flash: Don't return -ERESTARTSYS on interrupted token acquisition · efe69414

Cyril Bur authored Nov 03, 2017



Because the MTD core might split up a read() or write() from userspace
into several calls to the driver, we may fail to get a token but already
have done some work, best to return -EINTR back to userspace and have
them decide what to do.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

efe69414

mtd: powernv_flash: Remove pointless goto in driver init · e32ec15a

Cyril Bur authored Nov 03, 2017



powernv_flash_probe() has pointless goto statements which jump to the
end of the function to simply return a variable. Rather than checking
for error and going to the label, just return the error as soon as it is
detected.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

e32ec15a

mtd: powernv_flash: Don't treat OPAL_SUCCESS as an error · 25ee52e6

Cyril Bur authored Nov 03, 2017



While this driver expects to interact asynchronously, OPAL is well
within its rights to return OPAL_SUCCESS to indicate that the operation
completed without the need for a callback. We shouldn't treat
OPAL_SUCCESS as an error rather we should wrap up and return promptly to
the caller.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

25ee52e6

mtd: powernv_flash: Use WARN_ON_ONCE() rather than BUG_ON() · 44e2aa2b

Cyril Bur authored Nov 03, 2017



BUG_ON() should be reserved in situations where we can not longer
guarantee the integrity of the system. In the case where
powernv_flash_async_op() receives an impossible op, we can still
guarantee the integrity of the system.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

44e2aa2b

powerpc/opal: Fix EBUSY bug in acquiring tokens · 71e24d77

William A. Kennington III authored Sep 22, 2017

The current code checks the completion map to look for the first token
that is complete. In some cases, a completion can come in but the
token can still be on lease to the caller processing the completion.
If this completed but unreleased token is the first token found in the
bitmap by another tasks trying to acquire a token, then the
__test_and_set_bit call will fail since the token will still be on
lease. The acquisition will then fail with an EBUSY.

This patch reorganizes the acquisition code to look at the
opal_async_token_map for an unleased token. If the token has no lease
it must have no outstanding completions so we should never see an
EBUSY, unless we have leased out too many tokens. Since
opal_async_get_token_inrerruptible is protected by a semaphore, we
will practically never see EBUSY anymore.

Fixes: 8d724823

 ("powerpc/powernv: Infrastructure to support OPAL async completion")
Signed-off-by: William A. Kennington III <wak@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

71e24d77

powerpc/eeh: Stop using do_gettimeofday() · edfd17ff

Arnd Bergmann authored Nov 04, 2017



This interface is inefficient and deprecated because of the y2038
overflow.

ktime_get_seconds() is an appropriate replacement here, since it
has sufficient granularity but is more efficient and uses monotonic
time.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

edfd17ff

cxl: Rework the implementation of cxl_stop_trace_psl9() · cbb55eeb

Vaibhav Jain authored Oct 11, 2017



Presently the PSL9 specific cxl_stop_trace_psl9() only stops the RX0
traces on the CXL adapter when a PSL error irq is triggered. The patch
updates the function to stop all the traces arrays and move them to
the FIN state. The implementation issues the mmio to TRACECFG register
to stop the trace array iff it already not in FIN state. This prevents
the issue of trace data being reset in case of multiple stop mmio
issued for a single trace array.

Also the patch does some refactoring of existing cxl_stop_trace_psl9()
and cxl_stop_trace_psl8() functions by moving them to 'pci.c' from
'debugfs.c' file and marking them as static.

Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cbb55eeb

bpf: take advantage of stack_depth tracking in powerpc JIT · ac0761eb

Sandipan Das authored Sep 02, 2017



Take advantage of stack_depth tracking, originally introduced for
x64, in powerpc JIT as well. Round up allocated stack by 16 bytes
to make sure it stays aligned for functions called from JITed bpf
program.

Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

ac0761eb

powerpc/tm: Don't check for WARN in TM Bad Thing handling · 632f0574

Michael Ellerman authored Oct 12, 2017



Currently when we take a TM Bad Thing program check exception, we
search the bug table to see if the program check was generated by a
WARN/WARN_ON etc.

That makes no sense, the WARN macros use trap instructions, which
should never generate a TM Bad Thing exception. If they ever did that
would be a bug and we should oops.

We do have some hand-coded bugs in tm.S, using EMIT_BUG_ENTRY, but
those are all BUGs not WARNs, and they all use trap instructions
anyway. Almost certainly this check was incorrectly copied from the
REASON_TRAP handling in the same function.

Remove it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

632f0574

powerpc/mm: Add a CONFIG option to choose if radix is used by default · 1fd6c022

Michael Ellerman authored Oct 24, 2017



Currently if the hardware supports the radix MMU we will use
it, *unless* "disable_radix" is passed on the kernel command line.

However some users would like the reverse semantics. ie. The kernel
uses the hash MMU by default, unless radix is explicitly requested on
the command line.

So add a CONFIG option to choose whether we use radix by default or
not, and expand the disable_radix command line option to allow
"disable_radix=no" which *enables* radix.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

1fd6c022

powerpc/64s: Replace CONFIG_PPC_STD_MMU_64 with CONFIG_PPC_BOOK3S_64 · 4e003747

Michael Ellerman authored Oct 19, 2017

CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU
on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU
found in "server" processors, from IBM mainly.

Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's
annoying to have two symbols that always have the same value, it's not
quite annoying enough to bother removing one.

However with the arrival of Power9, we now have the situation where
CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the
Radix MMU - *not* the "standard" MMU. So it is now actively confusing
to use it, because it implies that code is disabled or inactive when
the Radix MMU is in use, however that is not necessarily true.

So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor
formatting updates of some of the affected lines.

This will be a pain for backports, but c'est la vie.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

4e003747

powerpc/64: Free up CPU_FTR_ICSWX · c1807e3f

Michael Ellerman authored Oct 19, 2017

The last user of CPU_FTR_ICSWX was removed in commit
6ff4d3e9

 ("powerpc: Remove old unused icswx based coprocessor
support"), so free the bit up for future use.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

c1807e3f

powerpc/mm/hash: Add pr_fmt() to hash_utils64.c · 7f142661

Aneesh Kumar K.V authored Oct 16, 2017

Make the printks look a bit nicer by adding a prefix.

Radix config now do
radix-mmu: Page sizes from device-tree:
radix-mmu: Page size shift = 12 AP=0x0
radix-mmu: Page size shift = 16 AP=0x5
radix-mmu: Page size shift = 21 AP=0x1
radix-mmu: Page size shift = 30 AP=0x2

This patch update hash config to do similar dmesg output. With the patch we have

hash-mmu: Page sizes from device-tree:
hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
hash-mmu: base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
hash-mmu: base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1
hash-mmu: base_shift=16: shift=24, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=8
hash-mmu: base_shift=20: shift=20, sllp=0x0111, avpnm=0x00000000, tlbiel=0, penc=2
hash-mmu: base_shift=24: shift=24, sllp=0x0100, avpnm=0x00000001, tlbiel=0, penc=0
hash-mmu: base_shift=34: shift=34, sllp=0x0120, avpnm=0x000007ff, tlbiel=0, penc=3

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

7f142661

powerpc/ipic: Fix status get and status clear · 6b148a7c

Christophe Leroy authored Oct 18, 2017



IPIC Status is provided by register IPIC_SERSR and not by IPIC_SERMR
which is the mask register.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

6b148a7c

powerpc/powernv: Reserve a hole which appears after enabling IOV · d6f934fd

Alexey Kardashevskiy authored Sep 27, 2017



In order to make generic IOV code work, the physical function IOV BAR
should start from offset of the first VF. Since M64 segments share
PE number space across PHB, and some PEs may be in use at the time
when IOV is enabled, the existing code shifts the IOV BAR to the index
of the first PE/VF. This creates a hole in IOMEM space which can be
potentially taken by some other device.

This reserves a temporary hole on a parent and releases it when IOV is
disabled; the temporary resources are stored in pci_dn to avoid
kmalloc/free.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

d6f934fd

powerpc/pseries/vio: Dispose of virq mapping on vdevice unregister · b8f89fea

Tyrel Datwyler authored Sep 28, 2017

When a vdevice is DLPAR removed from the system the vio subsystem
doesn't bother unmapping the virq from the irq_domain. As a result we
have a virq mapped to a hardware irq that is no longer valid for the
irq_domain. A side effect is that we are left with /proc/irq/<irq#>
affinity entries, and attempts to modify the smp_affinity of the irq
will fail.

In the following observed example the kernel log is spammed by
ics_rtas_set_affinity errors after the removal of a VSCSI adapter.
This is a result of irqbalance trying to adjust the affinity every 10
seconds.

  rpadlpar_io: slot U8408.E8E.10A7ACV-V5-C25 removed
  ics_rtas_set_affinity: ibm,set-xive irq=655385 returns -3
  ics_rtas_set_affinity: ibm,set-xive irq=655385 returns -3

This patch fixes the issue by calling irq_dispose_mapping() on the
virq of the viodev on unregister.

Fixes: f2ab6219

 ("powerpc/pseries: Add PFO support to the VIO bus")
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

b8f89fea

powerpc/64s/radix: Fix process table entry cache invalidation · 30b49ec7

Nicholas Piggin authored Oct 24, 2017



According to the architecture, the process table entry cache must be
flushed with tlbie RIC=2.

Currently the process table entry is set to invalid right before the
PID is returned to the allocator, with no invalidation. This works on
existing implementations that are known to not cache the process table
entry for any except the current PIDR.

It is architecturally correct and cleaner to invalidate with RIC=2
after clearing the process table entry and before the PID is returned
to the allocator. This can be done in arch_exit_mmap that runs before
the final flush, and to ensure the final flush (fullmm) is always a
RIC=2 variant.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

30b49ec7

powerpc/64s/radix: Improve preempt handling in TLB code · dffe8449

Nicholas Piggin authored Oct 24, 2017



Preempt should be consistently disabled for mm_is_thread_local tests,
so bring the rest of these under preempt_disable().

Preempt does not need to be disabled for the mm->context.id tests,
which allows simplification and removal of gotos.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

dffe8449

powerpc/powernv: Use FIXUP_ENDIAN_HV in OPAL return · 63c9d8a4

Nicholas Piggin authored Oct 23, 2017



Close the recoverability gap for OPAL calls by using FIXUP_ENDIAN_HV
in the return path.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

63c9d8a4

powerpc/book3s: Add an HV variant of FIXUP_ENDIAN that is recoverable · 8ca9c08d

Nicholas Piggin authored Oct 23, 2017



Add an HV variant of FIXUP_ENDIAN which uses HSRR[01] and does not
clear MSR[RI], which improves recoverability.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

8ca9c08d

powerpc/book3s: Use label for FIXUP_ENDIAN macro branch · f848ea7f

Nicholas Piggin authored Oct 23, 2017



Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

f848ea7f

powerpc/64: Fix latency tracing for lazy irq replay · ff967900

Nicholas Piggin authored Oct 21, 2017



When returning from an exception to a soft-enabled context, pending
IRQs are replayed but IRQ tracing is not reset, so a number of them
can get chained together into the same IRQ-disabled trace.

Fix this by having __check_irq_replay re-set IRQ trace. This is
conceptually where we respond to the next interrupt, so it fits the
semantics of the IRQ tracer.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

ff967900

KVM: PPC: Book3S HV: Handle host system reset in guest mode · 6de6638b

Nicholas Piggin authored Nov 05, 2017

If the host takes a system reset interrupt while a guest is running,
the CPU must exit the guest before processing the host exception
handler.

After this patch, taking a sysrq+x with a CPU running in a guest
gives a trace like this:

   cpu 0x27: Vector: 100 (System Reset) at [c000000fdf5776f0]
       pc: c008000010158b80: kvmppc_run_core+0x16b8/0x1ad0 [kvm_hv]
       lr: c008000010158b80: kvmppc_run_core+0x16b8/0x1ad0 [kvm_hv]
       sp: c000000fdf577850
      msr: 9000000002803033
     current = 0xc000000fdf4b1e00
     paca    = 0xc00000000fd4d680	 softe: 3	 irq_happened: 0x01
       pid   = 6608, comm = qemu-system-ppc
   Linux version 4.14.0-rc7-01489-g47e1893a404a-dirty #26 SMP
   [c000000fdf577a00] c008000010159dd4 kvmppc_vcpu_run_hv+0x3dc/0x12d0 [kvm_hv]
   [c000000fdf577b30] c0080000100a537c kvmppc_vcpu_run+0x44/0x60 [kvm]
   [c000000fdf577b60] c0080000100a1ae0 kvm_arch_vcpu_ioctl_run+0x118/0x310 [kvm]
   [c000000fdf577c00] c008000010093e98 kvm_vcpu_ioctl+0x530/0x7c0 [kvm]
   [c000000fdf577d50] c000000000357bf8 do_vfs_ioctl+0xd8/0x8c0
   [c000000fdf577df0] c000000000358448 SyS_ioctl+0x68/0x100
   [c000000fdf577e30] c00000000000b220 system_call+0x58/0x6c
   --- Exception: c01 (System Call) at 00007fff76868df0
   SP (7fff7069baf0) is in userspace

Fixes: e36d0a2e

 ("powerpc/powernv: Implement NMI IPI with OPAL_SIGNAL_SYSTEM_RESET")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

6de6638b

Oct 22, 2017

powerpc/pseries: Cleanup error handling in iommu_pseries_alloc_group() · 4dd9eab3

Markus Elfring authored Oct 18, 2017



Although kfree(NULL) is legal, it's a bit lazy to rely on that to
implement the error handling. So do it the normal Linux way using
labels for each failure path.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
[mpe: Squash a few patches and rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

4dd9eab3

powerpc-opal: Fix a typo in a comment line of two file headers · c28237f1

Markus Elfring authored Oct 17, 2017



Fix a word in these descriptions.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

c28237f1

powerpc/axonram: Drop unnecessary variable initialisation · a7cd4586

Markus Elfring authored Sep 05, 2017



The local variable "rc" will eventually be set only to an error code.
Thus omit the explicit initialisation at the beginning.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

a7cd4586

powerpc: dts: acadia: DT fix s/#interrupts-parent/#interrupt-parent/ · 3d2d4339

Geert Uytterhoeven authored Jun 02, 2017



Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

3d2d4339

powerpc/perf/hv-24x7: Fix incorrect comparison in memord · 05c14c03

Michael Ellerman authored Oct 09, 2017



In the hv-24x7 code there is a function memord() which tries to
implement a sort function return -1, 0, 1. However one of the
conditions is incorrect, such that it can never be true, because we
will have already returned.

I don't believe there is a bug in practice though, because the
comparisons are an optimisation prior to calling memcmp().

Fix it by swapping the second comparision, so it can be true.

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

05c14c03

powerpc: Disable the fast-endian switch syscall by default · 727f1361

Michael Ellerman authored Oct 09, 2017

Back in 2008 we added support for "fast little-endian switch" in the
syscall path. This added a special case syscall number 0x1ebe, which
is caught very early in the system call exception and switches endian
with as little overhead as possible. See commit 745a14cc


("[POWERPC] Add fast little-endian switch system call") for full
details.

Although it is fast, it's also completely non standard. The "syscall
number" is out of the range of normal syscalls, it can't be traced or
audited, and it's a bit of a wart. To the best of our knowledge it was
only used by one program, now long since discontinued.

So in an effort to shake out any current users, put it behind a config
option, and make it default n. If anyone *is* using it they can
quickly reinstate it with a rebuild, and we can flip it to default y.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

727f1361

powerpc/64s: Move the two FAST_ENDIAN macros next to each other · 5c2511bf
Michael Ellerman authored Oct 09, 2017
```
So we can #ifdef them in the next patch.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
```
5c2511bf

powerpc/xmon: Add kstack base to paca dump · 90d64737

Michael Ellerman authored Oct 09, 2017



When dumping the paca in xmon we currently show kstack. Although it's
not hard it's a bit fiddly to work out what the bounds of the kernel
stack should be based on the kstack value.

To make life easier and "kstack_base" which is the base (lowest
address) of the kernel stack, eg:

 kstack               = 0xc0000000f1a7be30      (0x258)
 kstack_base          = 0xc0000000f1a78000

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

90d64737