  1. Apr 15, 2024
    • nouveau: fix instmem race condition around ptr stores · fff1386c
      Dave Airlie authored
      
      Running a lot of VK CTS in parallel against nouveau, once every
      few hours you might see something like this crash.
      
      BUG: kernel NULL pointer dereference, address: 0000000000000008
      PGD 8000000114e6e067 P4D 8000000114e6e067 PUD 109046067 PMD 0
      Oops: 0000 [#1] PREEMPT SMP PTI
      CPU: 7 PID: 53891 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27
      Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
      RIP: 0010:gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
      Code: c7 48 01 c8 49 89 45 58 85 d2 0f 84 95 00 00 00 41 0f b7 46 12 49 8b 7e 08 89 da 42 8d 2c f8 48 8b 47 08 41 83 c7 01 48 89 ee <48> 8b 40 08 ff d0 0f 1f 00 49 8b 7e 08 48 89 d9 48 8d 75 04 48 c1
      RSP: 0000:ffffac20c5857838 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: 00000000004d8001 RCX: 0000000000000001
      RDX: 00000000004d8001 RSI: 00000000000006d8 RDI: ffffa07afe332180
      RBP: 00000000000006d8 R08: ffffac20c5857ad0 R09: 0000000000ffff10
      R10: 0000000000000001 R11: ffffa07af27e2de0 R12: 000000000000001c
      R13: ffffac20c5857ad0 R14: ffffa07a96fe9040 R15: 000000000000001c
      FS:  00007fe395eed7c0(0000) GS:ffffa07e2c980000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000008 CR3: 000000011febe001 CR4: 00000000003706f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      
      ...
      
       ? gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
       ? gp100_vmm_pgt_mem+0x37/0x180 [nouveau]
       nvkm_vmm_iter+0x351/0xa20 [nouveau]
       ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
       ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
       ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
       ? __lock_acquire+0x3ed/0x2170
       ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
       nvkm_vmm_ptes_get_map+0xc2/0x100 [nouveau]
       ? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
       ? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
       nvkm_vmm_map_locked+0x224/0x3a0 [nouveau]
      
      Adding any sort of useful debug usually makes it go away, so I
      hand-wrote the function inline and debugged the asm.
      
      Every so often pt->memory->ptrs is NULL. This ptrs ptr is set in
      the nv50_instobj_acquire called from nvkm_kmap.
      
      If Thread A and Thread B both get to nv50_instobj_acquire around
      the same time, and Thread A hits the refcount_set line, and in
      lockstep thread B succeeds at refcount_inc_not_zero, there is a
      chance the ptrs value won't have been stored since refcount_set
      is unordered. Force a memory barrier here; I picked smp_mb, since
      we want it on all CPUs and it's a write followed by a read.
      
      v2: use paired smp_rmb/smp_wmb.
      
      Cc: <stable@vger.kernel.org>
      Fixes: be55287a ("drm/nouveau/imem/nv50: embed nvkm_instobj directly into nv04_instobj")
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      Signed-off-by: Danilo Krummrich <dakr@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240411011510.2546857-1-airlied@gmail.com
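The ordering problem described in the commit message above can be sketched in user-space C11. This is only an illustration under stated assumptions, not the nouveau code: `struct obj`, `publish`, `try_acquire`, and `dummy_ptrs` are hypothetical names, the release fence stands in for `smp_wmb()`, and the acquire fence stands in for `smp_rmb()`. The first mapper stores the pointer and then publishes a non-zero count; a later mapper that sees the non-zero count must then also see the pointer.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the instobj; not the real nouveau struct. */
struct obj {
    atomic_int maps;     /* models the count tested by refcount_inc_not_zero() */
    void *_Atomic ptrs;  /* models the ->ptrs pointer that was observed as NULL */
};

static int dummy_ptrs;   /* placeholder target for the pointer */

/* First mapper: store the pointer, then publish the count. */
static void publish(struct obj *o)
{
    atomic_store_explicit(&o->ptrs, (void *)&dummy_ptrs, memory_order_relaxed);
    /* Plays the role of smp_wmb(): the pointer store is ordered before the count store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&o->maps, 1, memory_order_relaxed);
}

/* Later mapper: take a reference only if the count is already non-zero. */
static bool try_acquire(struct obj *o, void **out)
{
    int old = atomic_load_explicit(&o->maps, memory_order_relaxed);

    while (old != 0) {
        /* On failure the compare-exchange reloads 'old', so the loop retries. */
        if (atomic_compare_exchange_weak_explicit(&o->maps, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
            /* Plays the role of smp_rmb(): the pointer load is ordered after the count read. */
            atomic_thread_fence(memory_order_acquire);
            *out = atomic_load_explicit(&o->ptrs, memory_order_relaxed);
            return true;
        }
    }
    return false; /* not published yet; the caller would take the slow path */
}
```

Without the two fences, nothing in the C11 model orders the pointer store against the count store, which corresponds to the window the commit message describes between `refcount_set` and `refcount_inc_not_zero`.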
  2. Apr 10, 2024
  3. Apr 08, 2024
  4. Apr 06, 2024
  5. Apr 05, 2024
  6. Apr 04, 2024
  7. Apr 02, 2024
  8. Mar 29, 2024
  9. Mar 28, 2024
  10. Mar 26, 2024
  11. Mar 25, 2024
    • Linux 6.9-rc1 · 4cece764
      Linus Torvalds authored
      v6.9-rc1
    • Merge tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · ab8de2db
      Linus Torvalds authored
      Pull EFI fixes from Ard Biesheuvel:
      
       - Fix logic that is supposed to prevent placement of the kernel image
         below LOAD_PHYSICAL_ADDR
      
       - Use the firmware stack in the EFI stub when running in mixed mode
      
       - Clear BSS only once when using mixed mode
      
       - Check efi.get_variable() function pointer for NULL before trying to
         call it
      
      * tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
        efi: fix panic in kdump kernel
        x86/efistub: Don't clear BSS twice in mixed mode
        x86/efistub: Call mixed mode boot services on the firmware's stack
        efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or higher address
    • Merge tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5e74df2f
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
      
       - Ensure that the encryption mask at boot is properly propagated on
         5-level page tables, otherwise the PGD entry is incorrectly set to
         non-encrypted, which causes system crashes during boot.
      
       - Undo the deferred 5-level page table setup as it cannot work with
         memory encryption enabled.
      
       - Prevent inconsistent XFD state on CPU hotplug, where the MSR is reset
         to the default value but the cached variable is not, so subsequent
         comparisons might yield the wrong result and as a consequence the
         result prevents updating the MSR.
      
       - Register the local APIC address only once in the MPPARSE enumeration
         to prevent triggering the related WARN_ONs() in the APIC and topology
         code.
      
       - Handle the case where no APIC is found gracefully by registering a
         fake APIC in the topology code. That makes all related topology
         functions work correctly and does not affect the actual APIC driver
         code at all.
      
       - Don't evaluate logical IDs during early boot as the local APIC IDs
         are not yet enumerated and the invoked function returns an error
         code. Nothing requires the logical IDs before the final CPUID
         enumeration takes place, which happens after the enumeration.
      
       - Cure the fallout of the per CPU rework on UP which misplaced the
         copying of boot_cpu_data to per CPU data so that the final update to
         boot_cpu_data got lost which caused inconsistent state and boot
         crashes.
      
       - Use copy_from_kernel_nofault() in the kprobes setup as there is no
         guarantee that the address can be safely accessed.
      
       - Reorder struct members in struct saved_context to work around another
         kmemleak false positive
      
       - Remove the buggy code which tries to update the E820 kexec table for
         setup_data as that is never passed to the kexec kernel.
      
       - Update the resource control documentation to use the proper units.
      
       - Fix a Kconfig warning observed with tinyconfig
      
      * tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/boot/64: Move 5-level paging global variable assignments back
        x86/boot/64: Apply encryption mask to 5-level pagetable update
        x86/cpu: Add model number for another Intel Arrow Lake mobile processor
        x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD
        Documentation/x86: Document that resctrl bandwidth control units are MiB
        x86/mpparse: Register APIC address only once
        x86/topology: Handle the !APIC case gracefully
        x86/topology: Don't evaluate logical IDs during early boot
        x86/cpu: Ensure that CPU info updates are propagated on UP
        kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address
        x86/pm: Work around false positive kmemleak report in msr_build_context()
        x86/kexec: Do not update E820 kexec table for setup_data
        x86/config: Fix warning for 'make ARCH=x86_64 tinyconfig'
    • Merge tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b136f68e
      Linus Torvalds authored
      Pull scheduler doc clarification from Thomas Gleixner:
       "A single update for the documentation of the base_slice_ns tunable to
        clarify that any value which is less than the tick slice has no effect
        because the scheduler tick is not guaranteed to happen within the set
        time slice"
      
      * tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/doc: Update documentation for base_slice_ns and CONFIG_HZ relation
    • Merge tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping · 864ad046
      Linus Torvalds authored
      Pull dma-mapping fixes from Christoph Hellwig:
       "This has a set of swiotlb alignment fixes for sometimes very long
        standing bugs from Will. We've been discussing them for a while and
        they should be solid now"
      
      * tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping:
        swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE
        iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
        swiotlb: Fix alignment checks when both allocation and DMA masks are present
        swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc()
        swiotlb: Enforce page alignment in swiotlb_alloc()
        swiotlb: Fix double-allocation of slots due to broken alignment handling
  12. Mar 24, 2024