  1. Nov 09, 2023
      x86/acpi: Ignore invalid x2APIC entries · ec9aedb2
      Zhang Rui authored
      Currently, the kernel enumerates the possible CPUs by parsing both ACPI
      MADT Local APIC entries and x2APIC entries. So CPUs with "valid" APIC IDs,
      even if they have duplicated APIC IDs in Local APIC and x2APIC, are always
      enumerated.
      
      Below is what the ACPI MADT Local APIC and x2APIC entries describe on an
      Ivy Bridge-EP system:
      
      [02Ch 0044   1]                Subtable Type : 00 [Processor Local APIC]
      [02Fh 0047   1]                Local Apic ID : 00
      ...
      [164h 0356   1]                Subtable Type : 00 [Processor Local APIC]
      [167h 0359   1]                Local Apic ID : 39
      [16Ch 0364   1]                Subtable Type : 00 [Processor Local APIC]
      [16Fh 0367   1]                Local Apic ID : FF
      ...
      [3ECh 1004   1]                Subtable Type : 09 [Processor Local x2APIC]
      [3F0h 1008   4]                Processor x2Apic ID : 00000000
      ...
      [B5Ch 2908   1]                Subtable Type : 09 [Processor Local x2APIC]
      [B60h 2912   4]                Processor x2Apic ID : 00000077
      
      As a result, the kernel shows "smpboot: Allowing 168 CPUs, 120 hotplug CPUs".
      This wastes a significant amount of memory on per-CPU data.
      It also breaks https://lore.kernel.org/all/87edm36qqb.ffs@tglx/,
      because __max_logical_packages is over-estimated by the APIC IDs in
      the x2APIC entries.
      
      According to https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#processor-local-x2apic-structure:
      
        "[Compatibility note] On some legacy OSes, Logical processors with APIC
         ID values less than 255 (whether in XAPIC or X2APIC mode) must use the
         Processor Local APIC structure to convey their APIC information to OSPM,
         and those processors must be declared in the DSDT using the Processor()
         keyword. Logical processors with APIC ID values 255 and greater must use
         the Processor Local x2APIC structure and be declared using the Device()
         keyword."
      
      Therefore prevent the registration of x2APIC entries with an APIC ID less
      than 255 if the local APIC table enumerates valid APIC IDs.
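
      The rule above can be sketched as a small filter. This is only an
      illustration, not the code from the patch: the helper name
      x2apic_entry_allowed() and the has_lapic_cpus flag are assumptions made
      for the example, and the flag is assumed to be set once the Local APIC
      pass has registered at least one valid APIC ID.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed flag: true once parsing the MADT Local APIC entries
 * registered at least one valid APIC ID.
 */
static bool has_lapic_cpus;

/*
 * Hypothetical helper: decide whether an x2APIC entry should be
 * registered as a possible CPU.
 */
static bool x2apic_entry_allowed(uint32_t apic_id, bool enabled)
{
	/* Disabled entries are never registered in this sketch. */
	if (!enabled)
		return false;

	/* 0xFFFFFFFF marks an invalid x2APIC ID. */
	if (apic_id == 0xffffffffu)
		return false;

	/*
	 * Per the compatibility note, IDs below 255 belong in Local APIC
	 * entries. If that table already enumerated CPUs, an x2APIC entry
	 * with such an ID is treated as a duplicate and ignored.
	 */
	if (has_lapic_cpus && apic_id < 255)
		return false;

	return true;
}
```

      With the MADT shown earlier, every x2APIC entry (IDs 0x00 through 0x77)
      would be rejected by such a filter, because the Local APIC table already
      enumerates valid IDs.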
      
      [ tglx: Simplify the logic ]
      
      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230702162802.344176-1-rui.zhang@intel.com
      x86/shstk: Delay signal entry SSP write until after user accesses · 31255e07
      Rick Edgecombe authored
      When a signal is being delivered, the kernel needs to make accesses to
      userspace. These accesses could encounter an access error, in which case
      the signal delivery itself will trigger a segfault. Usually this would
      result in the kernel killing the process. But in the case of a SEGV signal
      handler being configured, the failure of the first signal delivery will
      result in *another* signal getting delivered. The second signal may
      succeed if another thread has resolved the issue that triggered the
      segfault (e.g. a well-timed mprotect()/mmap()), or if the second signal
      is being delivered to another stack (e.g. an alt stack).
      
      On x86, in the non-shadow stack case, all the accesses to userspace are
      done before changes to the registers (in pt_regs). The operation is
      aborted when an access error occurs, so although there may be writes done
      for the first signal, control flow changes for the signal (regs->ip,
      regs->sp, etc) are not committed until all the accesses have already
      completed successfully. This means that the second signal will be
      delivered as if it happened at the time of the first signal. It will
      effectively replace the first aborted signal, overwriting the half-written
      frame of the aborted signal. So on sigreturn from the second signal,
      control flow will resume happily from the point of control flow where the
      original signal was delivered.
      
      The problem is, when shadow stack is active, the shadow stack SSP
      register/MSR is updated *before* some of the userspace accesses. This
      means if the earlier accesses succeed and the later ones fail, the second
      signal will not be delivered at the same spot on the shadow stack as the
      first one. So on sigreturn from the second signal, the SSP will be
      pointing to the wrong location on the shadow stack (off by a frame).
      
      Pengfei privately reported that while using a shadow stack enabled glibc,
      the “signal06” test in the LTP test-suite hung. It turns out it is
      testing the above described double signal scenario. When this test was
      compiled with shadow stack, the first signal pushed a shadow stack
      sigframe, then the second pushed another. When the second signal was
      handled, the SSP was at the first shadow stack signal frame instead of
      the original location. The test then got stuck as the #CP from the twice
      incremented SSP was incorrect and generated segfaults in a loop.
      
      Fix this by adjusting the SSP register only after any userspace accesses,
      such that there can be no failures after the SSP is adjusted. Do this by
      moving the shadow stack sigframe push logic to happen after all other
      userspace accesses.
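
      The ordering described above can be sketched in a simplified userspace
      model. Nothing here is kernel code: struct regs, push_shstk_frame(),
      setup_frame(), the frame sizes and the handler address are all
      hypothetical, and faults are simulated with boolean parameters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified register state. */
struct regs {
	uint64_t ip;
	uint64_t sp;
	uint64_t ssp;
};

#define FRAME_SIZE	128u	/* assumed sigframe size */
#define SHSTK_FRAME	8u	/* assumed shadow stack frame size */
#define HANDLER_IP	0x1000u	/* assumed handler address */

/* Models the shadow stack push: SSP moves only if the write succeeds. */
static int push_shstk_frame(struct regs *regs, bool fault)
{
	if (fault)
		return -1;
	regs->ssp -= SHSTK_FRAME;
	return 0;
}

/* Returns 0 on success, -1 if a simulated userspace access faults. */
static int setup_frame(struct regs *regs, bool frame_fault, bool shstk_fault)
{
	/* 1. All ordinary userspace accesses happen first. */
	if (frame_fault)
		return -1;	/* nothing committed, SSP untouched */

	/* 2. The shadow stack push is the last step that can fail. */
	if (push_shstk_frame(regs, shstk_fault))
		return -1;	/* SSP still untouched */

	/* 3. Only steps that cannot fail remain: commit control flow. */
	regs->sp -= FRAME_SIZE;
	regs->ip = HANDLER_IP;
	return 0;
}
```

      In this model a failed delivery leaves ip, sp and ssp unchanged, so a
      second delivery starts from the same state as the first one did.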
      
      Note, sigreturn (as opposed to the signal delivery dealt with in this
      patch) has ordering behavior that could lead to similar failures. The
      ordering issues there extend beyond shadow stack to include the alt stack
      restoration. Fixing that would require cross-arch changes, and the
      ordering today does not cause any known test or application breakage.
      So leave it as is, for now.
      
      [ dhansen: minor changelog/subject tweak ]
      
      Fixes: 05e36022 ("x86/shstk: Handle signals for shadow stack")
      Reported-by: Pengfei Xu <pengfei.xu@intel.com>
      Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Tested-by: Pengfei Xu <pengfei.xu@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/all/20231107182251.91276-1-rick.p.edgecombe%40intel.com
      Link: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/signal/signal06.c
  2. Nov 06, 2023
  3. Nov 04, 2023
  4. Oct 30, 2023
  5. Oct 29, 2023
  6. Oct 28, 2023
  7. Oct 27, 2023
  8. Oct 26, 2023