Skip to content
Commit e1b1eb01 authored by Russ Anderson's avatar Russ Anderson Committed by Tony Luck
Browse files

[IA64] Fix race when multiple cpus go through MCA



Additional testing uncovered a situation where the MCA recovery code could
hang due to a race condition.

According to the SAL spec, SAL sends a rendezvous interrupt to all but the first
CPU that goes into MCA.  This includes other CPUs that go into MCA at the same
time.  Those other CPUs will go into the linux MCA handler (rather than the
slave loop) with the rendezvous interrupt pending.  When all the CPUs have
completed MCA processing and the last monarch completes, freeing all the CPUs,
the CPUs with the pended rendezvous interrupt then go into the
ia64_mca_rendez_int_handler().  In ia64_mca_rendez_int_handler() the CPUs
get marked as rendezvoused, but then leave the handler (due to no MCA).
That leaves the CPUs marked as rendezvoused _before_ the next MCA event.

When the next MCA hits, the monarch will mistakenly believe that all the CPUs
are rendezvoused when they are not, opening up a window where a CPU can get
stuck in the slave loop.

This patch avoids leaving CPUs marked as rendezvoused when they are not.

Signed-off-by: default avatarRuss Anderson <rja@sgi.com>
Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
parent 2bc5c282
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment