  1. Dec 22, 2022
  2. Dec 21, 2022
    • drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0 · 8660495a
      Tim Huang authored
      
      
      MES is part of gfxoff, and MES suspend and resume are skipped for S0i3.
      However, mes_self_test is still called from amdgpu_device_ip_late_init.
      It should also be skipped for S0ix, as no hardware re-initialization
      happened.
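A minimal sketch of the control flow described above, assuming a simplified model: `struct dev_state`, `mes_self_test()` and `mes_late_init()` are hypothetical stand-ins rather than the driver's functions, and the `in_s0ix` flag plays the role of `adev->in_s0ix`. The self test runs only when the resume did not come from S0ix.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the driver's device state. */
struct dev_state {
	bool in_s0ix;        /* resuming from S0ix: MES was never suspended */
	int self_test_runs;  /* counts how often the self test executed */
};

/* Placeholder for the real self test, which allocates and frees BOs. */
static void mes_self_test(struct dev_state *dev)
{
	dev->self_test_runs++;
}

/* Late-init hook: skip the self test when no hardware re-init happened. */
static void mes_late_init(struct dev_state *dev)
{
	if (!dev->in_s0ix)
		mes_self_test(dev);
}
```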
      
      In addition, mes_self_test frees BOs, which triggers many warning
      messages while in the suspend state:
      
      [   81.656085] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
      [   81.679435] Call Trace:
      [   81.679726]  <TASK>
      [   81.679981]  amdgpu_mes_remove_hw_queue+0x17a/0x230 [amdgpu]
      [   81.680857]  amdgpu_mes_self_test+0x390/0x430 [amdgpu]
      [   81.681665]  mes_v11_0_late_init+0x37/0x50 [amdgpu]
      [   81.682423]  amdgpu_device_ip_late_init+0x53/0x280 [amdgpu]
      [   81.683257]  amdgpu_device_resume+0xae/0x2a0 [amdgpu]
      [   81.684043]  amdgpu_pmops_resume+0x37/0x70 [amdgpu]
      [   81.684818]  pci_pm_resume+0x5c/0xa0
      [   81.685247]  ? pci_pm_thaw+0x90/0x90
      [   81.685658]  dpm_run_callback+0x4e/0x160
      [   81.686110]  device_resume+0xad/0x210
      [   81.686529]  async_resume+0x1e/0x40
      [   81.686931]  async_run_entry_fn+0x33/0x120
      [   81.687405]  process_one_work+0x21d/0x3f0
      [   81.687869]  worker_thread+0x4a/0x3c0
      [   81.688293]  ? process_one_work+0x3f0/0x3f0
      [   81.688777]  kthread+0xff/0x130
      [   81.689157]  ? kthread_complete_and_exit+0x20/0x20
      [   81.689707]  ret_from_fork+0x22/0x30
      [   81.690118]  </TASK>
      [   81.690380] ---[ end trace 0000000000000000 ]---
      
      v2: clean up the comment and use adev->in_s0ix instead of
      adev->suspend
      
      Signed-off-by: Tim Huang <tim.huang@amd.com>
      Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org # 6.0, 6.1
    • drm/amd/pm: correct the fan speed retrieving in PWM for some SMU13 asics · e73fc71e
      Evan Quan authored
      
      
      For SMU 13.0.0 and 13.0.7, the fan speed reported by PMFW is a
      percentage. The driver needs to convert it to the 0-255 PWM scale.
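As a rough illustration of the conversion, assuming PMFW reports an integer percentage from 0 to 100 and the target is the 0-255 scale used by the hwmon `pwm1` attribute, a helper might look like this. The name `fan_percent_to_pwm()` and the clamping of out-of-range values are assumptions made for this sketch.

```c
#include <stdint.h>

/* Hypothetical helper: convert a fan speed reported as a percentage
 * (0-100) into the 0-255 PWM scale. Integer division rounds down. */
static uint32_t fan_percent_to_pwm(uint32_t percent)
{
	if (percent > 100)
		percent = 100;  /* clamp out-of-range values (sketch choice) */
	return percent * 255 / 100;
}
```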
      
      Signed-off-by: Evan Quan <evan.quan@amd.com>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org # 6.0, 6.1
    • drm/amd/pm: bump SMU13.0.0 driver_if header to version 0x34 · 272b9814
      Evan Quan authored
      
      
      Update the header to match the latest PMFW and suppress the warning
      emitted on driver load.
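A hedged sketch of the kind of check that emits such a warning, assuming the driver compares the interface version compiled into its header against the one reported by PMFW. `DRIVER_IF_VERSION_EXPECTED` and `driver_if_version_matches()` are hypothetical names, and the sketch only reports a mismatch rather than failing the load.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: interface version the driver header declares (0x34). */
#define DRIVER_IF_VERSION_EXPECTED 0x34u

/* Compare the header version with the firmware's; warn on mismatch. */
static bool driver_if_version_matches(uint32_t fw_if_version)
{
	if (fw_if_version != DRIVER_IF_VERSION_EXPECTED) {
		fprintf(stderr,
			"driver_if version mismatch: driver 0x%x, firmware 0x%x\n",
			(unsigned int)DRIVER_IF_VERSION_EXPECTED,
			(unsigned int)fw_if_version);
		return false;
	}
	return true;
}
```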
      
      Signed-off-by: Evan Quan <evan.quan@amd.com>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org # 6.0, 6.1
    • drm/amdgpu: skip MES for S0ix as well since it's part of GFX · afa6646b
      Alex Deucher authored
      
      
      Like GFX, MES is part of gfxoff, so skip its suspend and resume for
      S0ix as well.
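A minimal sketch of the skip logic, under the assumption that suspend walks a list of IP blocks and bypasses those that remain in gfxoff during S0ix. `enum ip_type` and `suspend_ip_blocks()` are hypothetical simplifications, not the driver's actual types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical IP block types for this sketch. */
enum ip_type { IP_GFX, IP_MES, IP_SDMA };

/* Suspend every block except those that stay in gfxoff during S0ix.
 * Returns how many blocks were actually suspended. */
static size_t suspend_ip_blocks(const enum ip_type *blocks, size_t n,
				bool in_s0ix)
{
	size_t suspended = 0;

	for (size_t i = 0; i < n; i++) {
		/* GFX and MES are both part of gfxoff: leave them alone. */
		if (in_s0ix && (blocks[i] == IP_GFX || blocks[i] == IP_MES))
			continue;
		suspended++;  /* stand-in for calling the block's suspend hook */
	}
	return suspended;
}
```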
      
      Cc: stable@vger.kernel.org # 6.0, 6.1
      Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amd/pm: avoid large variable on kernel stack · d118b18f
      Arnd Bergmann authored
      The activity_monitor_external[] array is too big to fit on the
      kernel stack, resulting in this warning with clang:
      
      drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/smu_v13_0_7_ppt.c:1438:12: error: stack frame size (1040) exceeds limit (1024) in 'smu_v13_0_7_get_power_profile_mode' [-Werror,-Wframe-larger-than]
      
      Use dynamic allocation instead. It should also be possible to use a
      single element here instead of the array, but this seems easier.
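A userspace sketch of the pattern, assuming hypothetical `struct activity_monitor` and `NUM_PROFILES` definitions: the array is allocated with `calloc()` and released with `free()` instead of living in the stack frame. Kernel code would use its own allocators, such as `kcalloc()` and `kfree()`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-profile table entry (128 bytes). */
struct activity_monitor {
	uint32_t params[32];
};

#define NUM_PROFILES 8  /* hypothetical array length: 1024 bytes total */

/* Sum one field across all profiles using a heap buffer instead of a
 * large on-stack array. Returns 0 on success, -1 on allocation failure. */
static int sum_first_params(uint64_t *out)
{
	struct activity_monitor *mon;
	uint64_t sum = 0;

	/* element size taken from the pointee type */
	mon = calloc(NUM_PROFILES, sizeof(*mon));
	if (!mon)
		return -1;

	for (int i = 0; i < NUM_PROFILES; i++) {
		mon[i].params[0] = (uint32_t)i;  /* stand-in for real table data */
		sum += mon[i].params[0];
	}

	free(mon);
	*out = sum;
	return 0;
}
```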
      
      v2: fix up argument to sizeof() (Alex)
      
      Fixes: 334682ae ("drm/amd/pm: enable workload type change on smu_v13_0_7")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdkfd: Fix double release compute pasid · 1a799c4c
      Philip Yang authored
      
      
      If kfd_process_device_init_vm fails after the VM has been converted
      to a compute VM and vm->pasid has been set to the compute PASID, KFD
      does not take the pdd->drm_file reference. As a result, the DRM file
      close handler may release the compute PASID before the KFD process
      destroy worker releases the same PASID and sets vm->pasid to zero.
      This double release generates the WARNING backtrace and NULL pointer
      access below.
      
      Add the helper amdgpu_amdkfd_gpuvm_set_vm_pasid and call it as the
      last step of kfd_process_device_init_vm. This ensures that the VM
      PASID is either the original PASID, if acquiring the VM failed, or
      the compute PASID with the pdd->drm_file reference taken, so the
      same PASID is never released twice.
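A simplified model of the ordering, assuming hypothetical `struct vm`, `struct pdd` and `init_vm()` definitions that stand in for the driver's structures: the PASID is assigned only after every fallible step and the file reference have succeeded, so a failure leaves the original PASID in place and nothing is released twice.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the VM and per-device process data. */
struct vm { unsigned int pasid; };
struct pdd { struct vm *vm; bool holds_file_ref; };

/* Assign the compute PASID only once nothing else can fail. On failure
 * the VM keeps its original PASID and no file reference is taken. */
static int init_vm(struct pdd *pdd, struct vm *vm, unsigned int compute_pasid,
		   bool acquire_ok)
{
	if (!acquire_ok)
		return -1;  /* vm->pasid still holds the original value */

	pdd->vm = vm;
	pdd->holds_file_ref = true;  /* stand-in for taking pdd->drm_file */

	/* Last step, analogous to calling amdgpu_amdkfd_gpuvm_set_vm_pasid. */
	vm->pasid = compute_pasid;
	return 0;
}
```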
      
       amdgpu: Failed to create process VM object
       ida_free called for id=32770 which is not allocated.
       WARNING: CPU: 57 PID: 72542 at ../lib/idr.c:522 ida_free+0x96/0x140
       RIP: 0010:ida_free+0x96/0x140
       Call Trace:
        amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu]
        amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu]
        drm_file_free.part.13+0x216/0x270 [drm]
        drm_close_helper.isra.14+0x60/0x70 [drm]
        drm_release+0x6e/0xf0 [drm]
        __fput+0xcc/0x280
        ____fput+0xe/0x20
        task_work_run+0x96/0xc0
        do_exit+0x3d0/0xc10
      
       BUG: kernel NULL pointer dereference, address: 0000000000000000
       RIP: 0010:ida_free+0x76/0x140
       Call Trace:
        amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu]
        amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu]
        drm_file_free.part.13+0x216/0x270 [drm]
        drm_close_helper.isra.14+0x60/0x70 [drm]
        drm_release+0x6e/0xf0 [drm]
        __fput+0xcc/0x280
        ____fput+0xe/0x20
        task_work_run+0x96/0xc0
        do_exit+0x3d0/0xc10
      
      Signed-off-by: Philip Yang <Philip.Yang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdkfd: Fix kfd_process_device_init_vm error handling · 29d48b87
      Philip Yang authored
      
      
      On failure, only destroy ib_mem and let the process cleanup worker
      free the outstanding BOs. Reset the pointers in the pdd->qpd
      structure to avoid a NULL pointer access in the process destroy
      worker.
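A minimal sketch of the reset-after-free pattern, assuming a hypothetical `struct qpd` with a single `mapped_kaddr` pointer (not the driver's real layout): the error path frees the buffer and clears the pointer, and the destroy worker skips anything that is already NULL.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a pdd->qpd field that tracks a mapped BO. */
struct qpd { void *mapped_kaddr; };

/* Error path: release the mapping and reset the pointer so the later
 * destroy worker sees NULL instead of a stale address. */
static void release_mapping(struct qpd *qpd)
{
	free(qpd->mapped_kaddr);
	qpd->mapped_kaddr = NULL;
}

/* Destroy worker: only release what is still mapped. Returns 1 if it
 * freed something, 0 if there was nothing left to release. */
static int destroy_mapping(struct qpd *qpd)
{
	if (!qpd->mapped_kaddr)
		return 0;
	free(qpd->mapped_kaddr);
	qpd->mapped_kaddr = NULL;
	return 1;
}
```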
      
       BUG: kernel NULL pointer dereference, address: 0000000000000010
       Call Trace:
        amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel+0x46/0xb0 [amdgpu]
        kfd_process_device_destroy_cwsr_dgpu+0x40/0x70 [amdgpu]
        kfd_process_destroy_pdds+0x71/0x190 [amdgpu]
        kfd_process_wq_release+0x2a2/0x3b0 [amdgpu]
        process_one_work+0x2a1/0x600
        worker_thread+0x39/0x3d0
      
      Signed-off-by: Philip Yang <Philip.Yang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  3. Dec 20, 2022
  4. Dec 16, 2022
  5. Dec 15, 2022
  6. Dec 14, 2022
  7. Dec 10, 2022