  1. Oct 18, 2023
    • powerpc/qspinlock: Fix stale propagated yield_cpu · f9bc9bbe
      Nicholas Piggin authored
      yield_cpu is a sample of a preempted lock holder that gets propagated
      back through the queue. Queued waiters use this to yield to the
      preempted lock holder without continually sampling the lock word (which
      would defeat the purpose of MCS queueing by bouncing the cache line).
      
      The problem is that yield_cpu can become stale. It can take some time to
      be passed down the chain, and if any queued waiter gets preempted then
      it will cease to propagate the yield_cpu to later waiters.
      
      This can result in yielding to a CPU that no longer holds the lock,
      which is bad, but particularly if it is currently in H_CEDE (idle),
      then it appears to be preempted and some hypervisors (PowerVM) can
      cause very long H_CONFER latencies waiting for H_CEDE wakeup. This
      results in latency spikes and hard lockups on oversubscribed
      partitions with lock contention.
      
      This is a minimal fix. Before yielding to yield_cpu, sample the lock
      word to confirm yield_cpu is still the owner, and bail out if it is not.
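
      The check described above can be sketched as a toy model. This is an
      illustration in Python, not the kernel code: the lock word is reduced
      to the number of the owning CPU (None when unlocked), and
      should_yield_to() is a hypothetical name.

```python
# Toy model only: the real lock word packs more state than an owner CPU,
# and none of these names come from the kernel source.

def should_yield_to(lock_owner, yield_cpu):
    """Return True only if the propagated yield_cpu still owns the lock."""
    if yield_cpu is None:
        return False  # nothing was propagated down the queue
    # Sample the lock word once, right before yielding, and compare.
    return lock_owner == yield_cpu


# yield_cpu 3 was propagated, but CPU 5 has since taken the lock: bail out.
stale = should_yield_to(lock_owner=5, yield_cpu=3)
# yield_cpu 3 still owns the lock: yielding to it is reasonable.
fresh = should_yield_to(lock_owner=3, yield_cpu=3)
print(stale, fresh)
```

      The single extra sample is the cost of the fix; it is only taken when
      a waiter is about to yield, so the queue still avoids continually
      polling the lock word.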
      
      Thanks to a bunch of people who reported this and tracked down the
      exact problem using tracepoints and dispatch trace logs.
      
      Fixes: 28db61e2 ("powerpc/qspinlock: allow propagation of yield CPU down the queue")
      Cc: stable@vger.kernel.org # v6.2+
      Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Reported-by: Laurent Dufour <ldufour@linux.ibm.com>
      Reported-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
      Debugged-by: "Nysal Jan K.A" <nysal@linux.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Tested-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20231016124305.139923-2-npiggin@gmail.com
    • powerpc/64s/radix: Don't warn on copros in radix__tlb_flush() · 20045f01
      Michael Ellerman authored
      Sachin reported a warning when running the inject-ra-err selftest:
      
        # selftests: powerpc/mce: inject-ra-err
        Disabling lock debugging due to kernel taint
        MCE: CPU19: machine check (Severe)  Real address Load/Store (foreign/control memory) [Not recovered]
        MCE: CPU19: PID: 5254 Comm: inject-ra-err NIP: [0000000010000e48]
        MCE: CPU19: Initiator CPU
        MCE: CPU19: Unknown
        ------------[ cut here ]------------
        WARNING: CPU: 19 PID: 5254 at arch/powerpc/mm/book3s64/radix_tlb.c:1221 radix__tlb_flush+0x160/0x180
        CPU: 19 PID: 5254 Comm: inject-ra-err Kdump: loaded Tainted: G   M        E      6.6.0-rc3-00055-g9ed22ae6be81 #4
        Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
        ...
        NIP radix__tlb_flush+0x160/0x180
        LR  radix__tlb_flush+0x104/0x180
        Call Trace:
          radix__tlb_flush+0xf4/0x180 (unreliable)
          tlb_finish_mmu+0x15c/0x1e0
          exit_mmap+0x1a0/0x510
          __mmput+0x60/0x1e0
          exit_mm+0xdc/0x170
          do_exit+0x2bc/0x5a0
          do_group_exit+0x4c/0xc0
          sys_exit_group+0x28/0x30
          system_call_exception+0x138/0x330
          system_call_vectored_common+0x15c/0x2ec
      
      And bisected it to commit e43c0a0c ("powerpc/64s/radix: combine
      final TLB flush and lazy tlb mm shootdown IPIs"), which added a warning
      in radix__tlb_flush() if mm->context.copros is still elevated.
      
      However it's possible for the copros count to be elevated if a process
      exits without first closing file descriptors that are associated with a
      copro, eg. VAS.
      
      If the process exits with a VAS file still open, the release callback
      is queued up for exit_task_work() via:
        exit_files()
          put_files_struct()
            close_files()
              filp_close()
                fput()
      
      And called via:
        exit_task_work()
          ____fput()
            __fput()
              file->f_op->release(inode, file)
                coproc_release()
                  vas_user_win_ops->close_win()
                    vas_deallocate_window()
                      mm_context_remove_vas_window()
                        mm_context_remove_copro()
      
      But that is after exit_mm() has been called from do_exit() and triggered
      the warning.
      
      Fix it by dropping the warning, and always calling __flush_all_mm().
      
      In the normal case of no copros, that will result in a call to
      _tlbiel_pid(mm->context.id, RIC_FLUSH_ALL) just as the current code
      does.
      
      If the copros count is elevated then it will cause a global flush, which
      should flush translations from any copros. Note that the process table
      entry was cleared in arch_exit_mmap(), so copros should not be able to
      fetch any new translations.
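
      The resulting behaviour can be summarised with a small model. This is
      a sketch in Python under the assumptions stated in the text above;
      final_flush_scope() is a hypothetical name and the real
      __flush_all_mm() handles more cases than this.

```python
# Toy model of the flush choice described in the commit message.
# Only two outcomes are modelled; the names are illustrative.

def final_flush_scope(copros):
    """Pick the flush scope for the final TLB flush of an exiting mm."""
    if copros < 0:
        raise ValueError("copros count cannot be negative")
    if copros > 0:
        # Elevated count (e.g. a VAS window still open at exit):
        # flush globally so coprocessor translations are covered too.
        return "global"
    # Normal case: same effect as _tlbiel_pid(mm->context.id, RIC_FLUSH_ALL).
    return "local"


print(final_flush_scope(0), final_flush_scope(1))
```

      The point of the change is that an elevated count is no longer treated
      as a bug worth a warning; it simply selects the wider flush.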
      
      Fixes: e43c0a0c ("powerpc/64s/radix: combine final TLB flush and lazy tlb mm shootdown IPIs")
      Reported-by: Sachin Sant <sachinp@linux.ibm.com>
      Closes: https://lore.kernel.org/all/A8E52547-4BF1-47CE-8AEA-BC5A9D7E3567@linux.ibm.com/
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Tested-by: Sachin Sant <sachinp@linux.ibm.com>
      Link: https://msgid.link/20231017121527.1574104-1-mpe@ellerman.id.au
  2. Oct 15, 2023
  3. Oct 11, 2023
    • powerpc/47x: Fix 47x syscall return crash · f0eee815
      Michael Ellerman authored
      Eddie reported that newer kernels were crashing during boot on his 476
      FSP2 system:
      
        kernel tried to execute user page (b7ee2000) - exploit attempt? (uid: 0)
        BUG: Unable to handle kernel instruction fetch
        Faulting instruction address: 0xb7ee2000
        Oops: Kernel access of bad area, sig: 11 [#1]
        BE PAGE_SIZE=4K FSP-2
        Modules linked in:
        CPU: 0 PID: 61 Comm: mount Not tainted 6.1.55-d23900f.ppcnf-fsp2 #1
        Hardware name: ibm,fsp2 476fpe 0x7ff520c0 FSP-2
        NIP:  b7ee2000 LR: 8c008000 CTR: 00000000
        REGS: bffebd83 TRAP: 0400   Not tainted (6.1.55-d23900f.ppcnf-fsp2)
        MSR:  00000030 <IR,DR>  CR: 00001000  XER: 20000000
        GPR00: c00110ac bffebe63 bffebe7e bffebe88 8c008000 00001000 00000d12 b7ee2000
        GPR08: 00000033 00000000 00000000 c139df10 48224824 1016c314 10160000 00000000
        GPR16: 10160000 10160000 00000008 00000000 10160000 00000000 10160000 1017f5b0
        GPR24: 1017fa50 1017f4f0 1017fa50 1017f740 1017f630 00000000 00000000 1017f4f0
        NIP [b7ee2000] 0xb7ee2000
        LR [8c008000] 0x8c008000
        Call Trace:
        Instruction dump:
        XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
        XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
        ---[ end trace 0000000000000000 ]---
      
      The problem is in ret_from_syscall where the check for
      icache_44x_need_flush is done. When the flush is needed the code jumps
      out-of-line to do the flush, and then intends to jump back to continue
      the syscall return.
      
      However the branch back to label 1b doesn't return to the correct
      location, instead branching back just prior to the return to userspace,
      causing bogus register values to be used by the rfi.
      
      The breakage was introduced by commit 6f76a011
      ("powerpc/syscall: implement system call entry/exit logic in C for PPC32") which
      inadvertently removed the "1" label and reused it elsewhere.
      
      Fix it by adding named local labels in the correct locations. Note that
      the return label needs to be outside the ifdef so that CONFIG_PPC_47x=n
      compiles.
      
      Fixes: 6f76a011 ("powerpc/syscall: implement system call entry/exit logic in C for PPC32")
      Cc: stable@vger.kernel.org # v5.12+
      Reported-by: Eddie James <eajames@linux.ibm.com>
      Tested-by: Eddie James <eajames@linux.ibm.com>
      Link: https://lore.kernel.org/linuxppc-dev/fdaadc46-7476-9237-e104-1d2168526e72@linux.ibm.com/
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Link: https://msgid.link/20231010114750.847794-1-mpe@ellerman.id.au
  4. Oct 10, 2023
  5. Oct 09, 2023
  6. Sep 30, 2023
    • powerpc/pseries: Remove unused r0 in the hcall tracing code · dfb5f8cb
      Athira Rajeev authored
      
      
      In the plpar_hcall trace code, we currently use r0 to store the value
      of r4, but this value is not used subsequently. Hence remove this
      unused save to r0 in plpar_hcall and plpar_hcall9.
      
      Suggested-by: Naveen N Rao <naveen@kernel.org>
      Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20230929172337.7906-2-atrajeev@linux.vnet.ibm.com
    • powerpc/pseries: Fix STK_PARAM access in the hcall tracing code · 3b678768
      Athira Rajeev authored
      On a powerpc pseries system, the following behaviour is observed when
      enabling tracing on hcalls:
        # cd /sys/kernel/debug/tracing/
        # cat events/powerpc/hcall_exit/enable
        0
        # echo 1 > events/powerpc/hcall_exit/enable
      
        # ls
        -bash: fork: Bad address
      
      The above is from a POWER9 LPAR running the latest kernel. After this,
      a softlockup is observed. Initially, a kernel panic was observed while
      attempting to use "PERF_TYPE_TRACEPOINT" via perf_event_open.
      
      perf config used:
      ================
        memset(&pe[1],0,sizeof(struct perf_event_attr));
        pe[1].type=PERF_TYPE_TRACEPOINT;
        pe[1].size=96;
        pe[1].config=0x26ULL; /* 38 raw_syscalls/sys_exit */
        pe[1].sample_type=0; /* 0 */
        pe[1].read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP|0x10ULL; /* 1f */
        pe[1].inherit=1;
        pe[1].precise_ip=0; /* arbitrary skid */
        pe[1].wakeup_events=0;
        pe[1].bp_type=HW_BREAKPOINT_EMPTY;
        pe[1].config1=0x1ULL;
      
      Kernel panic logs:
      ==================
      
        Kernel attempted to read user page (8) - exploit attempt? (uid: 0)
        BUG: Kernel NULL pointer dereference on read at 0x00000008
        Faulting instruction address: 0xc0000000004c2814
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in: nfnetlink bonding tls rfkill sunrpc dm_service_time dm_multipath pseries_rng xts vmx_crypto xfs libcrc32c sd_mod t10_pi crc64_rocksoft crc64 sg ibmvfc scsi_transport_fc ibmveth dm_mirror dm_region_hash dm_log dm_mod fuse
        CPU: 0 PID: 1431 Comm: login Not tainted 6.4.0+ #1
        Hardware name: IBM,8375-42A POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.30 (VL950_892) hv:phyp pSeries
        NIP page_remove_rmap+0x44/0x320
        LR  wp_page_copy+0x384/0xec0
        Call Trace:
          0xc00000001416e400 (unreliable)
          wp_page_copy+0x384/0xec0
          __handle_mm_fault+0x9d4/0xfb0
          handle_mm_fault+0xf0/0x350
          ___do_page_fault+0x48c/0xc90
          hash__do_page_fault+0x30/0x70
          do_hash_fault+0x1a4/0x330
          data_access_common_virt+0x198/0x1f0
         --- interrupt: 300 at 0x7fffae971abc
      
      git bisect tracked this down to the following commit:
      baa49d81 ("powerpc/pseries: hvcall stack frame overhead")

      This commit changed STACK_FRAME_OVERHEAD (112) to
      STACK_FRAME_MIN_SIZE (32), since 32 bytes is the minimum size
      for an ELFv2 stack frame. With the latest kernel, when running on
      ELFv2, STACK_FRAME_MIN_SIZE is used to allocate the stack frame.
      
      During plpar_hcall_trace, a call is first made to HCALL_INST_PRECALL,
      which saves the registers and allocates a new stack frame. In the
      plpar_hcall_trace code, STK_PARAM is accessed in two places:
        1. To save r4: std     r4,STK_PARAM(R4)(r1)
        2. To access r4 back: ld      r12,STK_PARAM(R4)(r1)

      HCALL_INST_PRECALL allocates a new stack frame, so every stack
      parameter access after the precall needs to add
      +STACK_FRAME_MIN_SIZE. So the store instruction should be:
        std     r4,STACK_FRAME_MIN_SIZE+STK_PARAM(R4)(r1)

      If the "std" is not updated with STACK_FRAME_MIN_SIZE, we will
      end up overwriting stack contents and causing corruption.
      But instead of updating 'std', we can remove it, since
      HCALL_INST_PRECALL already saves r4 to the correct location.

      Similarly, the load instruction should be:
        ld      r12,STACK_FRAME_MIN_SIZE+STK_PARAM(R4)(r1)
      
      Fix the load instruction to correctly access the stack parameter
      with +STACK_FRAME_MIN_SIZE and remove the store of r4 since the
      precall saves it correctly.
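
      The offset arithmetic can be checked with a short worked example. This
      is a sketch in Python: FRAME is the 32-byte STACK_FRAME_MIN_SIZE named
      above, while PARAM_OFFSET and the stack pointer value are hypothetical
      stand-ins chosen only for illustration.

```python
# Worked example of addressing a caller's parameter slot after a new
# stack frame has been allocated. The stack grows down, so allocating
# a frame subtracts its size from r1.

FRAME = 32            # STACK_FRAME_MIN_SIZE on ELFv2, per the commit text
PARAM_OFFSET = 40     # hypothetical stand-in for STK_PARAM(R4)
r1_on_entry = 0x1000  # hypothetical stack pointer on entry

# Where the slot lives, as addressed before the precall.
slot = r1_on_entry + PARAM_OFFSET

# The precall allocates a frame: r1 moves down by FRAME bytes.
r1_after_precall = r1_on_entry - FRAME

# Correct access after the precall adds FRAME back in.
correct = r1_after_precall + FRAME + PARAM_OFFSET
# Access without the adjustment lands FRAME bytes too low.
unadjusted = r1_after_precall + PARAM_OFFSET

print(hex(slot), hex(correct), hex(unadjusted))
```

      The unadjusted address differs from the intended slot by exactly one
      frame size, which is how the old store ended up overwriting unrelated
      stack contents.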
      
      Cc: stable@vger.kernel.org # v6.2+
      Fixes: baa49d81 ("powerpc/pseries: hvcall stack frame overhead")
      Co-developed-by: Naveen N Rao <naveen@kernel.org>
      Signed-off-by: Naveen N Rao <naveen@kernel.org>
      Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20230929172337.7906-1-atrajeev@linux.vnet.ibm.com
  7. Sep 22, 2023
  8. Sep 18, 2023
  9. Sep 17, 2023
    • x86/purgatory: Remove LTO flags · 75b2f7e4
      Song Liu authored
      -flto* implies -ffunction-sections. With LTO enabled, ld.lld generates
      multiple .text sections for purgatory.ro:
      
        $ readelf -S purgatory.ro  | grep " .text"
          [ 1] .text             PROGBITS         0000000000000000  00000040
          [ 7] .text.purgatory   PROGBITS         0000000000000000  000020e0
          [ 9] .text.warn        PROGBITS         0000000000000000  000021c0
          [13] .text.sha256_upda PROGBITS         0000000000000000  000022f0
          [15] .text.sha224_upda PROGBITS         0000000000000000  00002be0
          [17] .text.sha256_fina PROGBITS         0000000000000000  00002bf0
          [19] .text.sha224_fina PROGBITS         0000000000000000  00002cc0
      
      This causes a WARNING from kexec_purgatory_setup_sechdrs():
      
        WARNING: CPU: 26 PID: 110894 at kernel/kexec_file.c:919
        kexec_load_purgatory+0x37f/0x390
      
      Fix this by disabling LTO for purgatory.
      
      [ AFAICT, x86 is the only arch that supports LTO and purgatory. ]
      
      We could also fix this with an explicit linker script to rejoin .text.*
      sections back into .text. However, given the benefit of LTOing purgatory
      is small, simply disable the production of more .text.* sections for now.
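
      Whether an object has ended up with more than one .text section can
      be checked by scanning `readelf -S` output. The following is a sketch
      in Python; text_sections() is a hypothetical helper, and the sample
      input is a shortened copy of the listing above.

```python
import re

# Shortened copy of the readelf -S listing quoted in the commit message.
SAMPLE = """\
  [ 1] .text             PROGBITS         0000000000000000  00000040
  [ 7] .text.purgatory   PROGBITS         0000000000000000  000020e0
  [ 9] .text.warn        PROGBITS         0000000000000000  000021c0
"""

def text_sections(readelf_output):
    """Return the names of sections called .text or .text.<something>."""
    names = []
    for line in readelf_output.splitlines():
        # Section rows look like "  [ 7] .text.purgatory   PROGBITS ..."
        match = re.match(r"\s*\[\s*\d+\]\s+(\S+)", line)
        if match is None:
            continue
        name = match.group(1)
        if name == ".text" or name.startswith(".text."):
            names.append(name)
    return names


found = text_sections(SAMPLE)
print(len(found), found)
```

      More than one name in the result corresponds to the multiple-.text
      situation that the commit message says triggers the warning.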
      
      Fixes: b33fff07 ("x86, build: allow LTO to be selected")
      Signed-off-by: Song Liu <song@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
      Link: https://lore.kernel.org/r/20230914170138.995606-1-song@kernel.org
    • x86/boot/compressed: Reserve more memory for page tables · f530ee95
      Kirill A. Shutemov authored
      The decompressor has a hard limit on the number of page tables it can
      allocate. This limit is defined at compile-time and will cause boot
      failure if it is reached.
      
      The kernel is very strict and calculates the limit precisely for the
      worst-case scenario based on the current configuration. However, it is
      easy to forget to adjust the limit when a new use-case arises. The
      worst-case scenario is rarely encountered during sanity checks.
      
      In the case of enabling 5-level paging, a use-case was overlooked. The
      limit needs to be increased by one to accommodate the additional level.
      This oversight went unnoticed until Aaron attempted to run the kernel
      via kexec with 5-level paging and unaccepted memory enabled.
      
      Update worst-case calculations to include 5-level paging.
      
      To address this issue, let's allocate some extra space for page tables.
      128K should be sufficient for any use-case. The logic can be simplified
      by using a single value for all kernel configurations.
      
      [ Also add a warning, should this memory run low - by Dave Hansen. ]
      
      Fixes: 34bbb000 ("x86/boot/compressed: Enable 5-level paging during decompression stage")
      Reported-by: Aaron Lu <aaron.lu@intel.com>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20230915070221.10266-1-kirill.shutemov@linux.intel.com
    • Merge tag 'kbuild-fixes-v6.6' of... · f0b0d403
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - Fix kernel-devel RPM and linux-headers Deb package
      
       - Fix too long argument list error in 'make modules_install'
      
      * tag 'kbuild-fixes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kbuild: avoid long argument lists in make modules_install
        kbuild: fix kernel-devel RPM package and linux-headers Deb package
    • vm: fix move_vma() memory accounting being off · 3cec5049
      Linus Torvalds authored
      Commit 408579cd ("mm: Update do_vmi_align_munmap() return
      semantics") seems to have updated one of the callers of do_vmi_munmap()
      incorrectly: it used to check for the error case (which didn't
      change: negative means error).
      
      That commit changed the check to the success case (which did change:
      before that commit, 0 was success, and 1 was "success and lock
      downgraded".  After the change, it's always 0 for success, and the lock
      will have been released if requested).
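
      The effect of testing for success instead of error can be shown with
      a small model. This is a sketch in Python: the commit text does not
      quote the exact expressions, so both predicates below are assumed
      renderings, and their names are hypothetical.

```python
# Return-value convention after the change, per the text above:
# negative means error, 0 means success.

def failed_original(ret):
    """Assumed form of the original check: true only for errors."""
    return ret < 0

def failed_inverted(ret):
    """Assumed form of the mistaken check: true for the success value."""
    return ret == 0


for ret in (0, -12):
    print(ret, failed_original(ret), failed_inverted(ret))
```

      With the inverted test, a successful call (0) takes the branch meant
      for failures and a failing call skips it, which matches the
      description of accounting going wrong while everything else appears
      to keep working.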
      
      This didn't change any actual VM behavior _except_ for memory accounting
      when 'VM_ACCOUNT' was set on the vma.  Which made the wrong return value
      test fairly subtle, since everything continues to work.
      
      Or rather - it continues to work but the "Committed memory" accounting
      goes all wonky (Committed_AS value in /proc/meminfo), and depending on
      settings that then causes problems much much later as the VM relies on
      bogus statistics for its heuristics.
      
      Revert that one line of the change back to the original logic.
      
      Fixes: 408579cd ("mm: Update do_vmi_align_munmap() return semantics")
      Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Reported-bisected-and-tested-by: Michael Labiuk <michael.labiuk@virtuozzo.com>
      Cc: Bagas Sanjaya <bagasdotme@gmail.com>
      Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
      Link: https://lore.kernel.org/all/1694366957@msgid.manchmal.in-ulm.de/
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · ad8a69f3
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "16 small(ish) fixes all in drivers.
      
        The major fixes are in pm8001 (fixes MSI-X issue going back to its
        origin), the qla2xxx endianness fix, which fixes a bug on big endian
        and the lpfc ones which can cause an oops on module removal without
        them"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: lpfc: Prevent use-after-free during rmmod with mapped NVMe rports
        scsi: lpfc: Early return after marking final NLP_DROPPED flag in dev_loss_tmo
        scsi: lpfc: Fix the NULL vs IS_ERR() bug for debugfs_create_file()
        scsi: target: core: Fix target_cmd_counter leak
        scsi: pm8001: Setup IRQs on resume
        scsi: pm80xx: Avoid leaking tags when processing OPC_INB_SET_CONTROLLER_CONFIG command
        scsi: pm80xx: Use phy-specific SAS address when sending PHY_START command
        scsi: ufs: core: Poll HCS.UCRDY before issuing a UIC command
        scsi: ufs: core: Move __ufshcd_send_uic_cmd() outside host_lock
        scsi: qedf: Add synchronization between I/O completions and abort
        scsi: target: Replace strlcpy() with strscpy()
        scsi: qla2xxx: Fix NULL vs IS_ERR() bug for debugfs_create_dir()
        scsi: qla2xxx: Use raw_smp_processor_id() instead of smp_processor_id()
        scsi: qla2xxx: Correct endianness for rqstlen and rsplen
        scsi: ppa: Fix accidentally reversed conditions for 16-bit and 32-bit EPP
        scsi: megaraid_sas: Fix deadlock on firmware crashdump
    • Merge tag 'ata-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · cc3e5afc
      Linus Torvalds authored
      Pull ata fixes from Damien Le Moal:
      
       - Fix link power management transitions to disallow unsupported states
         (Niklas)
      
       - A small string handling fix for the sata_mv driver (Christophe)
      
       - Clear port pending interrupts before reset, as per AHCI
         specifications (Szuying).
      
         Followup fixes for this one are to not clear ATA_PFLAG_EH_PENDING in
         ata_eh_reset() to allow EH to continue on with other actions recorded
         with error interrupts triggered before EH completes. And an
         additional fix to avoid thawing a port twice in EH (Niklas)
      
       - Small code style fixes in the pata_parport driver to silence the
         build bot as it keeps complaining about bad indentation (me)
      
       - A fix for the recent CDL code to avoid fetching sense data for
         successful commands when not necessary for correct operation (Niklas)
      
      * tag 'ata-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
        ata: libata-core: fetch sense data for successful commands iff CDL enabled
        ata: libata-eh: do not thaw the port twice in ata_eh_reset()
        ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()
        ata: pata_parport: Fix code style issues
        ata: libahci: clear pending interrupt status
        ata: sata_mv: Fix incorrect string length computation in mv_dump_mem()
        ata: libata: disallow dev-initiated LPM transitions to unsupported states
    • Merge tag 'usb-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · cce67b6b
      Linus Torvalds authored
      Pull USB fix from Greg KH:
       "Here is a single USB fix for a much-reported regression for 6.6-rc1.
      
        It resolves a crash in the typec debugfs code for many systems. It's
        been in linux-next with no reported issues, and many people have
        reported it resolving their problem with 6.6-rc1"
      
      * tag 'usb-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: typec: ucsi: Fix NULL pointer dereference
    • Merge tag 'driver-core-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core · 205d0494
      Linus Torvalds authored
      Pull driver core fixes from Greg KH:
       "Here is a single driver core fix for a much-reported-by-syzbot issue
        that showed up in 6.6-rc1. It's been submitted by many people, all in
        the same way, so it obviously fixes things for them all.
      
        Also in here is a single documentation update adding riscv to the
        embargoed hardware document in case there are any future issues with
        that processor family.
      
        Both of these have been in linux-next with no reported problems"
      
      * tag 'driver-core-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        Documentation: embargoed-hardware-issues.rst: Add myself for RISC-V
        driver core: return an error when dev_set_name() hasn't happened
    • Merge tag 'char-misc-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · fd455e77
      Linus Torvalds authored
      Pull char/misc fix from Greg KH:
       "Here is a single patch for 6.6-rc2 that reverts a 6.5 change for the
        comedi subsystem that has ended up being incorrect and caused drivers
        that were working for people to no longer be selectable to build at
        all.

        To fix this, the Kconfig change needs to be reverted and a future set
        of fixes for the ioport dependencies will show up in 6.7-rc1 (there's
        no rush for them.)
      
        This has been in linux-next with no reported issues"
      
      * tag 'char-misc-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        Revert "comedi: add HAS_IOPORT dependencies"
    • Merge tag 'i2c-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · c37f8efc
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "The main thing is the removal of 'probe_new' because all i2c client
        drivers are converted now. Thanks Uwe, this marks the end of a long
        conversion process.
      
        Other than that, we have a few Kconfig updates and driver bugfixes"
      
      * tag 'i2c-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: cadence: Fix the kernel-doc warnings
        i2c: aspeed: Reset the i2c controller when timeout occurs
        i2c: I2C_MLXCPLD on ARM64 should depend on ACPI
        i2c: Make I2C_ATR invisible
        i2c: Drop legacy callback .probe_new()
        w1: ds2482: Switch back to use struct i2c_driver's .probe()
  10. Sep 16, 2023
    • ata: libata-core: fetch sense data for successful commands iff CDL enabled · 5e35a9ac
      Niklas Cassel authored
      Currently, we fetch sense data for a _successful_ command if either:
      1) Command was NCQ and ATA_DFLAG_CDL_ENABLED flag set (flag
         ATA_DFLAG_CDL_ENABLED will only be set if the Successful NCQ command
         sense data supported bit is set); or
      2) Command was non-NCQ and regular sense data reporting is enabled.
      
      This means that case 2) will trigger for a non-NCQ command which has
      the ATA_SENSE bit set, regardless of whether CDL is enabled or not.

      This decision was by design. If the device reports that it has sense data
      available, it makes sense to fetch that sense data, since the sk/asc/ascq
      could be important information regardless of whether CDL is enabled or not.
      
      However, the fetching of sense data for a successful command is done via
      ATA EH. Considering how intricate the ATA EH is, we really do not want to
      invoke ATA EH unless absolutely needed.
      
      Before commit 18bd7718 ("scsi: ata: libata: Handle completion of CDL
      commands using policy 0xD") we never fetched sense data for successful
      commands.
      
      In order to not invoke the ATA EH unless absolutely necessary, even if the
      device claims support for sense data reporting, only fetch sense data for
      successful commands (NCQ and non-NCQ) that are using CDL.
      
      [Damien] Modified the check to test the qc flag ATA_QCFLAG_HAS_CDL
      instead of the device support for CDL, which is implied for commands
      using CDL.
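
      The change in policy can be written as two predicates. This is a
      sketch in Python, assuming the device has already signalled that sense
      data is available for a successful command; the function and parameter
      names are hypothetical and only mirror the conditions listed above.

```python
# Toy predicates for "should sense data be fetched for a successful
# command?", assuming sense data has been reported as available.

def fetch_before(is_ncq, cdl_enabled, sense_reporting_enabled):
    """Old policy: case 1) NCQ with CDL enabled, or case 2) non-NCQ
    with regular sense data reporting enabled."""
    if is_ncq:
        return cdl_enabled
    return sense_reporting_enabled

def fetch_after(command_uses_cdl):
    """New policy: only commands that are using CDL (the per-command
    flag), whether NCQ or non-NCQ."""
    return command_uses_cdl


# A non-NCQ command that does not use CDL, on a device with sense
# data reporting enabled: fetched before, not fetched after.
print(fetch_before(is_ncq=False, cdl_enabled=False,
                   sense_reporting_enabled=True),
      fetch_after(command_uses_cdl=False))
```

      The narrower predicate keeps ATA EH out of the path for ordinary
      successful commands, which is the stated goal of the change.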
      
      Fixes: 3ac873c7 ("ata: libata-core: fix when to fetch sense data for successful commands")
      Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
      Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    • ata: libata-eh: do not thaw the port twice in ata_eh_reset() · 7a3bc2b3
      Niklas Cassel authored
      Commit 1e641060 ("libata: clear eh_info on reset completion") added
      a workaround that broke the retry mechanism in ATA EH.

      Tejun himself suggested removing this workaround when it was identified
      to cause additional problems:
      https://lore.kernel.org/linux-ide/20110426135027.GI878@htj.dyndns.org/
      
      He even said:
      "Hmm... it seems I wasn't thinking straight when I added that work around."
      https://lore.kernel.org/linux-ide/20110426155229.GM878@htj.dyndns.org/
      
      Although removing the workaround solved the issue, the workaround was
      kept to avoid "spurious hotplug events during reset", and instead another
      workaround was added on top of the existing workaround in commit
      8c56cacc ("libata: fix unexpectedly frozen port after ata_eh_reset()").
      
      Because these IRQs happened when the port was frozen, we know that they
      were actually a side effect of PxIS and IS.IPS(x) not being cleared before
      the COMRESET. This is now done in commit 94152042eaa9 ("ata: libahci: clear
      pending interrupt status"), so these workarounds can now be removed.
      
      Since commit 1e641060 ("libata: clear eh_info on reset completion") has
      now been reverted, the ATA EH retry mechanism is functional again, so there
      is once again no need to thaw the port more than once in ata_eh_reset().
      
      This reverts "the workaround on top of the workaround" introduced in commit
      8c56cacc ("libata: fix unexpectedly frozen port after ata_eh_reset()").

      Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
      Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    • ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset() · 80cc944e
      Niklas Cassel authored
      ata_scsi_port_error_handler() starts off by clearing ATA_PFLAG_EH_PENDING,
      before calling ap->ops->error_handler() (without holding the ap->lock).
      
      If an error IRQ is received while ap->ops->error_handler() is running,
      the irq handler will set ATA_PFLAG_EH_PENDING.
      
      Once ap->ops->error_handler() returns, ata_scsi_port_error_handler()
      checks if ATA_PFLAG_EH_PENDING is set, and if it is, another iteration
      of ATA EH is performed.
      
      The problem is that ATA_PFLAG_EH_PENDING is not only cleared by
      ata_scsi_port_error_handler(), it is also cleared by ata_eh_reset().
      
      ata_eh_reset() is called by ap->ops->error_handler(). This additional
      clearing done by ata_eh_reset() breaks the whole retry logic in
      ata_scsi_port_error_handler(). Thus, if an error IRQ is received while
      ap->ops->error_handler() is running, the port will currently remain
      frozen and will never get re-enabled.
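
      The lost retry can be reproduced with a toy simulation. This is a
      sketch in Python, not libata code: run_eh() is a hypothetical name, a
      single boolean stands in for ATA_PFLAG_EH_PENDING, and the error IRQ
      is assumed to arrive before the reset step clears the flag.

```python
# Toy model of the retry loop described above.

def run_eh(reset_clears_pending, irq_during_handler):
    """Return how many EH iterations run in this model."""
    pending = True
    iterations = 0
    while pending:
        pending = False            # cleared before calling the handler
        iterations += 1
        # --- error handler runs here ---
        if irq_during_handler and iterations == 1:
            pending = True         # IRQ handler requests another round
        if reset_clears_pending:
            pending = False        # reset step wipes that request
        # --- handler returns; the loop condition re-checks the flag ---
    return iterations


print(run_eh(reset_clears_pending=True, irq_during_handler=True),
      run_eh(reset_clears_pending=False, irq_during_handler=True))
```

      With the extra clearing, the second iteration never happens, which
      corresponds to the port staying frozen; without it, the pending
      request survives and EH runs again.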
      
      The additional clearing in ata_eh_reset() was introduced in commit
      1e641060 ("libata: clear eh_info on reset completion").
      
      Looking at the original error report:
      https://marc.info/?l=linux-ide&m=124765325828495&w=2
      
      We can see the following happening:
      [    1.074659] ata3: XXX port freeze
      [    1.074700] ata3: XXX hardresetting link, stopping engine
      [    1.074746] ata3: XXX flipping SControl
      
      [    1.411471] ata3: XXX irq_stat=400040 CONN|PHY
      [    1.411475] ata3: XXX port freeze
      
      [    1.420049] ata3: XXX starting engine
      [    1.420096] ata3: XXX rc=0, class=1
      [    1.420142] ata3: XXX clearing IRQs for thawing
      [    1.420188] ata3: XXX port thawed
      [    1.420234] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
      
      We are not supposed to be able to receive an error IRQ while the port is
      frozen (PxIE is set to 0, i.e. all IRQs for the port are disabled).
      
      AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) states:
      "Each bit location can be thought of as reporting a '1' if the virtual
      "interrupt line" for that port is indicating it wishes to generate an
      interrupt. That is, if a port has one or more interrupt status bit set,
      and the enables for those status bits are set, then this bit shall be set."
      
      Additionally, AHCI state P:ComInit clearly shows that the state machine
      will only jump to P:ComInitSetIS (which sets IS.IPS(x) to '1'), if PxIE.PCE
      is set to '1'. In our case, PxIE is set to 0, so IS.IPS(x) won't get set.
      
      So IS.IPS(x) only gets set if both PxIS and PxIE are set.
      
      AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) also states:
      "The bits in this register are read/write clear. It is set by the level of
      the virtual interrupt line being a set, and cleared by a write of '1' from
      the software."
      
      So if IS.IPS(x) is set, you need to explicitly clear it by writing a 1 to
      IS.IPS(x) for that port.
      
      Since PxIE is cleared, the only way to get an interrupt while the port is
      frozen, is if IS.IPS(x) is set, and the only way IS.IPS(x) can be set when
      the port is frozen, is if it was set before the port was frozen.
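
      This reasoning can be followed with a small register model. It is a
      simplified sketch in Python, not a description of real hardware:
      AhciPortModel and its methods are hypothetical, and IS.IPS(x) is
      modelled as a latch that is set when an enabled status bit is raised
      and cleared only explicitly.

```python
# Simplified model of one port's PxIS / PxIE and its IS.IPS(x) bit.

class AhciPortModel:
    def __init__(self, pxie):
        self.pxis = 0        # port interrupt status bits
        self.pxie = pxie     # port interrupt enable bits
        self.ips = False     # this port's bit in the IS register

    def raise_status(self, bits):
        """Hardware sets status bits; IPS latches only if enabled."""
        self.pxis |= bits
        if self.pxis & self.pxie:
            self.ips = True

    def freeze(self):
        """Freezing the port disables all of its interrupts."""
        self.pxie = 0

    def clear_pending(self):
        """Clear both PxIS and IS.IPS(x) (write-1-to-clear in hardware)."""
        self.pxis = 0
        self.ips = False


port = AhciPortModel(pxie=0x40)
port.raise_status(0x40)     # event arrives before the freeze
port.freeze()
stale_ips = port.ips        # still latched although the port is frozen

port.clear_pending()        # clear after freezing, before the reset
port.raise_status(0x40)     # event arrives while frozen
print(stale_ips, port.ips)
```

      In this model the only way to see the bit set on a frozen port is for
      it to have been latched beforehand, and clearing it after the freeze
      removes that possibility.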
      
      However, since commit 737dd811 ("ata: libahci: clear pending interrupt
      status"), we clear both PxIS and IS.IPS(x) after freezing the port, but
      before the COMRESET, so the problem that commit 1e641060 ("libata:
      clear eh_info on reset completion") fixed can no longer happen.
      
      Thus, revert commit 1e641060 ("libata: clear eh_info on reset
      completion"), so that the retry logic in ata_scsi_port_error_handler()
      works once again. (The retry logic is still needed, since we can still
      get an error IRQ _after_ the port has been thawed, but before
      ata_scsi_port_error_handler() takes the ap->lock in order to check
      if ATA_PFLAG_EH_PENDING is set.)

      Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
      Signed-off-by: Damien Le Moal <dlemoal@kernel.org>