  1. Sep 06, 2023
    • mm: memory-failure: use rcu lock instead of tasklist_lock when collect_procs() · d256d1cd
      Tong Tiangen authored
      We found a softlockup issue in our test. Analyzing the logs showed the
      following call traces on the relevant CPUs:
      
      CPU0:
        _do_fork
          -> copy_process()
            -> write_lock_irq(&tasklist_lock)  // Disable irq, waiting for
                                               // tasklist_lock
      
      CPU1:
        wp_page_copy()
          -> pte_offset_map_lock()
            -> spin_lock(&page->ptl);        // Hold page->ptl
          -> ptep_clear_flush()
            -> flush_tlb_others() ...
              -> smp_call_function_many()
                -> arch_send_call_function_ipi_mask()
                  -> csd_lock_wait()         // Waiting for other CPUs to
                                             // respond to the IPI
      
      CPU2:
        collect_procs_anon()
          -> read_lock(&tasklist_lock)       // Hold tasklist_lock
            -> for_each_process(tsk)
              -> page_mapped_in_vma()
                -> page_vma_mapped_walk()
                  -> map_pte()
                    -> spin_lock(&page->ptl)  // Waiting for page->ptl
      
      We can see that CPU1 is waiting for CPU0 to respond to the IPI, CPU0 is
      waiting for CPU2 to release tasklist_lock, and CPU2 is waiting for CPU1
      to release page->ptl. As a result, a softlockup is triggered.
      
      collect_procs_anon() iterates over the task list. Because task_struct is
      released through call_rcu(), the object is freed only after one or more
      grace periods have elapsed. The logic is as follows:
      
      release_task()
        -> __exit_signal()
          -> __unhash_process()
            -> list_del_rcu()
      
        -> put_task_struct_rcu_user()
          -> call_rcu(&task->rcu, delayed_put_task_struct)
      
      delayed_put_task_struct()
        -> put_task_struct()
          -> if (refcount_sub_and_test())
               __put_task_struct()
                 -> free_task()
      
      Therefore, under the protection of the rcu lock, we can safely use
      get_task_struct() to ensure a safe reference to task_struct during the
      iteration.
      
      By removing the use of tasklist_lock in task list iteration, we can break
      the softlockup chain above.
      
      The same logic can also be applied to:
       - collect_procs_file()
       - collect_procs_fsdax()
       - collect_procs_ksm()
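
      The dependency chain described above can be modelled as a wait-for
      graph. The following is a minimal user-space sketch, not kernel code:
      has_wait_cycle(), MAX_CPUS and NOT_WAITING are hypothetical names
      introduced for illustration, and each CPU is assumed to wait on at most
      one other CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 8
#define NOT_WAITING (-1)

/*
 * waits_on[i] is the index of the CPU that CPU i is blocked on, or
 * NOT_WAITING if CPU i can make progress. Entries are assumed to be
 * either NOT_WAITING or a valid index below ncpus.
 *
 * Returns true if following the wait-for edges from some CPU leads back
 * to that same CPU, i.e. no CPU on that chain can ever make progress.
 */
bool has_wait_cycle(const int waits_on[], int ncpus)
{
	assert(ncpus > 0 && ncpus <= MAX_CPUS);

	for (int start = 0; start < ncpus; start++) {
		int cur = waits_on[start];

		/* A chain that returns to start does so within ncpus hops. */
		for (int hops = 0; cur != NOT_WAITING && hops < ncpus; hops++) {
			if (cur == start)
				return true;
			cur = waits_on[cur];
		}
	}
	return false;
}
```

      With waits_on = {2, 0, 1} (CPU0 waits on CPU2 for tasklist_lock, CPU1
      waits on CPU0 for the IPI, CPU2 waits on CPU1 for page->ptl) the
      function reports a cycle. Once CPU2 no longer holds tasklist_lock, CPU0
      stops waiting, the array becomes {NOT_WAITING, 0, 1}, and the cycle
      disappears.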
      
      Link: https://lkml.kernel.org/r/20230828022527.241693-1-tongtiangen@huawei.com
      
      
      Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
      Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • revert "memfd: improve userspace warnings for missing exec-related flags". · 2562d67b
      Andrew Morton authored
      This warning is telling userspace developers to pass MFD_EXEC and
      MFD_NOEXEC_SEAL to memfd_create().  Commit 434ed335 ("memfd: improve
      userspace warnings for missing exec-related flags") made the warning more
      frequent and visible in the hope that this would accelerate the fixing of
      errant userspace.
      
      But the overall effect is to generate far too much dmesg noise.
      
      Fixes: 434ed335 ("memfd: improve userspace warnings for missing exec-related flags")
      Reported-by: Damian Tometzki <dtometzki@fedoraproject.org>
      Closes: https://lkml.kernel.org/r/ZPFzCSIgZ4QuHsSC@fedora.fritz.box
      
      
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Daniel Verkamp <dverkamp@chromium.org>
      Cc: Jeff Xu <jeffxu@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • rcu: dump vmalloc memory info safely · c83ad36a
      Zqiang authored
      Currently, when call_rcu() is invoked twice on the same rcu_head, memory
      information about the rcu_head object is dumped. If the object was not
      allocated from the slab allocator, vmalloc_dump_obj() is invoked, which
      needs to hold the vmap_area_lock spinlock. Since call_rcu() can be
      invoked in interrupt context, a spinlock deadlock is possible.
      
      In addition, on a PREEMPT_RT kernel, the rcutorture test triggers the
      following lockdep warning:
      
      BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
      in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
      preempt_count: 1, expected: 0
      RCU nest depth: 1, expected: 1
      3 locks held by swapper/0/1:
       #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
       #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
       #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
      irq event stamp: 565512
      hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
      hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
      softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
      softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
      Preemption disabled at:
      [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
      CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x68/0xb0
       dump_stack+0x14/0x20
       __might_resched+0x1aa/0x280
       ? __pfx_rcu_torture_err_cb+0x10/0x10
       rt_spin_lock+0x53/0x130
       ? find_vmap_area+0x1f/0x70
       find_vmap_area+0x1f/0x70
       vmalloc_dump_obj+0x20/0x60
       mem_dump_obj+0x22/0x90
       __call_rcu_common+0x5bf/0x940
       ? debug_smp_processor_id+0x1b/0x30
       call_rcu_hurry+0x14/0x20
       rcu_torture_init+0x1f82/0x2370
       ? __pfx_rcu_torture_leak_cb+0x10/0x10
       ? __pfx_rcu_torture_leak_cb+0x10/0x10
       ? __pfx_rcu_torture_init+0x10/0x10
       do_one_initcall+0x6c/0x300
       ? debug_smp_processor_id+0x1b/0x30
       kernel_init_freeable+0x2b9/0x540
       ? __pfx_kernel_init+0x10/0x10
       kernel_init+0x1f/0x150
       ret_from_fork+0x40/0x50
       ? __pfx_kernel_init+0x10/0x10
       ret_from_fork_asm+0x1b/0x30
       </TASK>
      
      The previous patch fixes this by using the deadlock-safe best-effort
      version of find_vm_area.  However, in case of failure print the fact that
      the pointer was a vmalloc pointer so that we print at least something.
      
      Link: https://lkml.kernel.org/r/20230904180806.1002832-2-joel@joelfernandes.org
      Fixes: 98f18083 ("mm: Make mem_dump_obj() handle vmalloc() memory")
      Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/vmalloc: add a safer version of find_vm_area() for debug · 0818e739
      Joel Fernandes (Google) authored
      It is unsafe to dump vmalloc area information when trying to do so from
      some contexts.  Add a safer trylock version of the same function to do a
      best-effort VMA finding and use it from vmalloc_dump_obj().
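
      The trylock idea can be illustrated in user space. This is a minimal
      sketch under stated assumptions, not the kernel implementation: struct
      area, area_trylock(), area_unlock() and find_area_best_effort() are
      hypothetical names, and a C11 atomic_flag stands in for the kernel
      spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an allocator's area descriptor. */
struct area {
	uintptr_t start;
	size_t size;
};

static atomic_flag area_lock = ATOMIC_FLAG_INIT;
static const struct area areas[] = {
	{ 0x1000, 0x1000 },
	{ 0x4000, 0x2000 },
};

/* Try to take the lock once; never spin or sleep. */
bool area_trylock(void)
{
	return !atomic_flag_test_and_set_explicit(&area_lock,
						  memory_order_acquire);
}

void area_unlock(void)
{
	atomic_flag_clear_explicit(&area_lock, memory_order_release);
}

/*
 * Best-effort lookup: if the lock is already held (possibly by the
 * context we interrupted), give up instead of waiting for it. On
 * success the descriptor is copied to *out while the lock is held.
 */
bool find_area_best_effort(uintptr_t addr, struct area *out)
{
	bool found = false;

	assert(out != NULL);

	if (!area_trylock())
		return false;

	for (size_t i = 0; i < sizeof(areas) / sizeof(areas[0]); i++) {
		if (addr >= areas[i].start &&
		    addr - areas[i].start < areas[i].size) {
			*out = areas[i];
			found = true;
			break;
		}
	}

	area_unlock();
	return found;
}
```

      A caller that gets false cannot tell "not found" from "lock busy",
      which is acceptable for a debug dump that only needs to print whatever
      it can obtain safely.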
      
      [applied test robot feedback on unused function fix.]
      [applied Uladzislau feedback on locking.]
      Link: https://lkml.kernel.org/r/20230904180806.1002832-1-joel@joelfernandes.org
      Fixes: 98f18083 ("mm: Make mem_dump_obj() handle vmalloc() memory")
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Zqiang <qiang.zhang1211@gmail.com>
      Cc: <stable@vger.kernel.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • tools/mm: fix undefined reference to pthread_once · 7f33105c
      Xie XiuQi authored
      Commit 97d5f2e9 ("tools api fs: More thread safety for global
      filesystem variables") introduced a call to pthread_once(), so
      libpthread must be linked in; otherwise 'make -C tools/mm' fails
      with the following link error:
      
        gcc -Wall -Wextra -I../lib/ -o page-types page-types.c ../lib/api/libapi.a
        ~/linux/tools/lib/api/fs/fs.c:146: undefined reference to `pthread_once'
        ~/linux/tools/lib/api/fs/fs.c:147: undefined reference to `pthread_once'
        ~/linux/tools/lib/api/fs/fs.c:148: undefined reference to `pthread_once'
        ~/linux/tools/lib/api/fs/fs.c:149: undefined reference to `pthread_once'
        ~/linux/tools/lib/api/fs/fs.c:150: undefined reference to `pthread_once'
        /usr/bin/ld: ../lib/api/libapi.a(libapi-in.o):~/linux/tools/lib/api/fs/fs.c:151:
        more undefined references to `pthread_once' follow
        collect2: error: ld returned 1 exit status
        make: *** [Makefile:22: page-types] Error 1
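
      The same kind of failure can be reproduced outside the kernel tree with
      any code that calls pthread_once(). The following is a minimal sketch;
      mount_once, find_mountpoint() and lookup_mountpoint() are hypothetical
      names and are not taken from tools/lib/api/fs/fs.c.

```c
#include <assert.h>
#include <pthread.h>

static pthread_once_t mount_once = PTHREAD_ONCE_INIT;
static int init_calls;

/* One-time initialisation; pthread_once() runs it at most once. */
static void find_mountpoint(void)
{
	init_calls++;
}

/* Returns how many times the initialiser has run so far. */
int lookup_mountpoint(void)
{
	int rc = pthread_once(&mount_once, find_mountpoint);

	assert(rc == 0);
	return init_calls;
}
```

      On C libraries that keep pthread_once() in a separate libpthread
      (glibc releases before 2.34, to the best of my knowledge), linking a
      program that uses this code without -lpthread or -pthread produces
      the "undefined reference" errors shown above.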
      
      Link: https://lkml.kernel.org/r/20230831034205.2376653-1-xiexiuqi@huaweicloud.com
      Fixes: 97d5f2e9 ("tools api fs: More thread safety for global filesystem variables")
      Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • memcontrol: ensure memcg acquired by id is properly set up · 6f0df8e1
      Johannes Weiner authored
      In the eviction recency check, we attempt to retrieve the memcg to which
      the folio belonged when it was evicted, by the memcg id stored in the
      shadow entry.  However, there is a chance that the retrieved memcg is not
      the original memcg that has been killed, but a new one which happens to
      have the same id.
      
      This is a somewhat unfortunate, but acceptable and rare inaccuracy in the
      heuristics.  However, if we retrieve this new memcg between its allocation
      and when it is properly attached to the memcg hierarchy, we could run into
      the following NULL pointer exception during the memcg hierarchy traversal
      done in mem_cgroup_get_nr_swap_pages():
      
      [ 155757.793456] BUG: kernel NULL pointer dereference, address: 00000000000000c0
      [ 155757.807568] #PF: supervisor read access in kernel mode
      [ 155757.818024] #PF: error_code(0x0000) - not-present page
      [ 155757.828482] PGD 401f77067 P4D 401f77067 PUD 401f76067 PMD 0
      [ 155757.839985] Oops: 0000 [#1] SMP
      [ 155757.887870] RIP: 0010:mem_cgroup_get_nr_swap_pages+0x3d/0xb0
      [ 155757.899377] Code: 29 19 4a 02 48 39 f9 74 63 48 8b 97 c0 00 00 00 48 8b b7 58 02 00 00 48 2b b7 c0 01 00 00 48 39 f0 48 0f 4d c6 48 39 d1 74 42 <48> 8b b2 c0 00 00 00 48 8b ba 58 02 00 00 48 2b ba c0 01 00 00 48
      [ 155757.937125] RSP: 0018:ffffc9002ecdfbc8 EFLAGS: 00010286
      [ 155757.947755] RAX: 00000000003a3b1c RBX: 000007ffffffffff RCX: ffff888280183000
      [ 155757.962202] RDX: 0000000000000000 RSI: 0007ffffffffffff RDI: ffff888bbc2d1000
      [ 155757.976648] RBP: 0000000000000001 R08: 000000000000000b R09: ffff888ad9cedba0
      [ 155757.991094] R10: ffffea0039c07900 R11: 0000000000000010 R12: ffff888b23a7b000
      [ 155758.005540] R13: 0000000000000000 R14: ffff888bbc2d1000 R15: 000007ffffc71354
      [ 155758.019991] FS:  00007f6234c68640(0000) GS:ffff88903f9c0000(0000) knlGS:0000000000000000
      [ 155758.036356] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 155758.048023] CR2: 00000000000000c0 CR3: 0000000a83eb8004 CR4: 00000000007706e0
      [ 155758.062473] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 155758.076924] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 155758.091376] PKRU: 55555554
      [ 155758.096957] Call Trace:
      [ 155758.102016]  <TASK>
      [ 155758.106502]  ? __die+0x78/0xc0
      [ 155758.112793]  ? page_fault_oops+0x286/0x380
      [ 155758.121175]  ? exc_page_fault+0x5d/0x110
      [ 155758.129209]  ? asm_exc_page_fault+0x22/0x30
      [ 155758.137763]  ? mem_cgroup_get_nr_swap_pages+0x3d/0xb0
      [ 155758.148060]  workingset_test_recent+0xda/0x1b0
      [ 155758.157133]  workingset_refault+0xca/0x1e0
      [ 155758.165508]  filemap_add_folio+0x4d/0x70
      [ 155758.173538]  page_cache_ra_unbounded+0xed/0x190
      [ 155758.182919]  page_cache_sync_ra+0xd6/0x1e0
      [ 155758.191738]  filemap_read+0x68d/0xdf0
      [ 155758.199495]  ? mlx5e_napi_poll+0x123/0x940
      [ 155758.207981]  ? __napi_schedule+0x55/0x90
      [ 155758.216095]  __x64_sys_pread64+0x1d6/0x2c0
      [ 155758.224601]  do_syscall_64+0x3d/0x80
      [ 155758.232058]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [ 155758.242473] RIP: 0033:0x7f62c29153b5
      [ 155758.249938] Code: e8 48 89 75 f0 89 7d f8 48 89 4d e0 e8 b4 e6 f7 ff 41 89 c0 4c 8b 55 e0 48 8b 55 e8 48 8b 75 f0 8b 7d f8 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 45 f8 e8 e7 e6 f7 ff 48 8b
      [ 155758.288005] RSP: 002b:00007f6234c5ffd0 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
      [ 155758.303474] RAX: ffffffffffffffda RBX: 00007f628c4e70c0 RCX: 00007f62c29153b5
      [ 155758.318075] RDX: 000000000003c041 RSI: 00007f61d2986000 RDI: 0000000000000076
      [ 155758.332678] RBP: 00007f6234c5fff0 R08: 0000000000000000 R09: 0000000064d5230c
      [ 155758.347452] R10: 000000000027d450 R11: 0000000000000293 R12: 000000000003c041
      [ 155758.362044] R13: 00007f61d2986000 R14: 00007f629e11b060 R15: 000000000027d450
      [ 155758.376661]  </TASK>
      
      This patch fixes the issue by moving the memcg's id publication from the
      alloc stage to online stage, ensuring that any memcg acquired via id must
      be connected to the memcg tree.
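
      The ordering can be sketched in user space as follows. This is a
      simplified, single-threaded illustration rather than the memcg code:
      struct group, id_table, group_alloc(), group_online(), group_from_id()
      and group_depth() are hypothetical names, and a plain array stands in
      for the kernel's id lookup structure.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IDS 16

struct group {
	int id;
	struct group *parent;
};

/* Only groups that are attached to the hierarchy appear here. */
static struct group *id_table[MAX_IDS];
static struct group root_group = { .id = 0, .parent = NULL };

/* Alloc stage: the id is assigned, but not yet visible to lookups. */
void group_alloc(struct group *g, int id)
{
	assert(id > 0 && id < MAX_IDS);
	g->id = id;
	g->parent = NULL;
}

/* Online stage: attach to the tree first, then publish the id. */
void group_online(struct group *g, struct group *parent)
{
	g->parent = parent;
	id_table[g->id] = g;
}

/* Lookup by id; returns NULL for ids that are not published. */
struct group *group_from_id(int id)
{
	if (id <= 0 || id >= MAX_IDS)
		return NULL;
	return id_table[id];
}

/*
 * Walk up to the root. This would dereference NULL for a group whose
 * parent is not set, which is why lookups must never return one.
 */
int group_depth(const struct group *g)
{
	int depth = 0;

	while (g != &root_group) {
		g = g->parent;
		depth++;
	}
	return depth;
}
```

      Between group_alloc() and group_online() a lookup by id returns NULL,
      so a hierarchy walk such as group_depth() only ever sees groups whose
      parent pointer is valid.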
      
      Link: https://lkml.kernel.org/r/20230823225430.166925-1-nphamcs@gmail.com
      Fixes: f78dfc7b ("workingset: fix confusion around eviction vs refault container")
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Co-developed-by: Nhat Pham <nphamcs@gmail.com>
      Signed-off-by: Nhat Pham <nphamcs@gmail.com>
      Acked-by: Shakeel Butt <shakeelb@google.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. Sep 05, 2023
    • Merge tag 'arc-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · 3f86ed6e
      Linus Torvalds authored
      Pull ARC updates from Vineet Gupta:
      
       - fixes for -Wmissing-prototype warnings
      
       - missing compiler barrier in relaxed atomics
      
       - some uaccess simplification, declutter
      
       - removal of massive global struct cpuinfo_arc from bootlog code
      
       - __switch_to consolidation (removal of inline asm variant)
      
       - use GP to cache task pointer (vs. r25)
      
       - misc rework of entry code
      
      * tag 'arc-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (24 commits)
        ARC: boot log: fix warning
        arc: Explicitly include correct DT includes
        ARC: pt_regs: create seperate type for ecr
        ARCv2: entry: rearrange pt_regs slightly
        ARC: entry: replace 8 byte ADD.ne with 4 byte ADD2.ne
        ARC: entry: replace 8 byte OR with 4 byte BSET
        ARC: entry: Add more common chores to EXCEPTION_PROLOGUE
        ARC: entry: EV_MachineCheck dont re-read ECR
        ARC: entry: ARcompact EV_ProtV to use r10 directly
        ARC: entry: rework (non-functional)
        ARC: __switch_to: move ksp to thread_info from thread_struct
        ARC: __switch_to: asm with dwarf ops (vs. inline asm)
        ARC: kernel stack: INIT_THREAD need not setup @init_stack in @ksp
        ARC: entry: use gp to cache task pointer (vs. r25)
        ARC: boot log: eliminate struct cpuinfo_arc #4: boot log per ISA
        ARC: boot log: eliminate struct cpuinfo_arc #3: don't export
        ARC: boot log: eliminate struct cpuinfo_arc #2: cache
        ARC: boot log: eliminate struct cpuinfo_arc #1: mm
        ARCv2: memset: don't prefetch for len == 0 which happens a alot
        ARC: uaccess: elide unaliged handling if hardware supports
        ...
    • Merge tag 'pm-6.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · ea4f9c37
      Linus Torvalds authored
      Pull more power management updates from Rafael Wysocki:
       "These fix cpufreq core and the pcc cpufreq driver, add per-policy
        boost support to cpufreq and add the Georgian translation to Makefile
        LANGUAGES in cpupower.
      
        Specifics:
      
         - Add Georgian translation to Makefile LANGUAGES in cpupower (Shuah
           Khan).
      
         - Add support for per-policy performance boost to cpufreq (Jie Zhan).
      
         - Fix assorted issues in the cpufreq core, common governor code and
           in the pcc cpufreq driver (Liao Chang)"
      
      * tag 'pm-6.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: Support per-policy performance boost
        cpufreq: pcc: Fix the potentinal scheduling delays in target_index()
        cpufreq: governor: Free dbs_data directly when gov->init() fails
        cpufreq: Fix the race condition while updating the transition_task of policy
        cpufreq: Avoid printing kernel addresses in cpufreq_resume()
        cpupower: Add Georgian translation to Makefile LANGUAGES
    • Merge tag 'thermal-6.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 0ca4080a
      Linus Torvalds authored
      Pull more thermal control updates from Rafael Wysocki:
       "These are mostly updates of thermal control drivers for ARM platforms,
        new thermal control support for Loongson-2 and a couple of core
        cleanups made possible by recent changes merged previously.
      
        Specifics:
      
         - Check if the Tegra BPMP supports the trip points in order to set
           the .set_trips callback (Mikko Perttunen)
      
         - Add new Loongson-2 thermal sensor along with the DT bindings (Yinbo
           Zhu)
      
         - Use IS_ERR_OR_NULL() helper to replace a double test on the TI
           bandgap sensor (Li Zetao)
      
         - Remove redundant platform_set_drvdata() calls, as there are no
           corresponding calls to platform_get_drvdata(), from a bunch of
           drivers (Andrei Coardos)
      
         - Switch the Mediatek LVTS mode to filtered in order to enable
           interrupts (Nícolas F. R. A. Prado)
      
         - Fix Wvoid-pointer-to-enum-cast warning on the Exynos TMU (Krzysztof
           Kozlowski)
      
         - Remove redundant dev_err_probe(), because the underlying function
           already called it, from the Mediatek sensor (Chen Jiahao)
      
         - Free calibration nvmem after reading it on sun8i (Mark Brown)
      
         - Remove useless comment from the sun8i driver (Yangtao Li)
      
         - Make tsens_xxxx_nvmem static to fix a sparse warning on QCom tsens
           (Min-Hua Chen)
      
         - Remove error message at probe deferral on imx8mm (Ahmad Fatoum)
      
         - Fix parameter check in lvts_debugfs_init() with IS_ERR() on
           Mediatek LVTS (Minjie Du)
      
         - Fix interrupt routine and configuration for Mediatek LVTS (Nícolas
           F. R. A. Prado)
      
         - Drop unused .get_trip_type(), .get_trip_temp() and .get_trip_hyst()
           thermal zone callbacks from the core and rework the .get_trend()
           one to take a trip point pointer as an argument (Rafael Wysocki)"
      
      * tag 'thermal-6.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (29 commits)
        thermal: core: Rework .get_trend() thermal zone callback
        thermal: core: Drop unused .get_trip_*() callbacks
        thermal/drivers/tegra-bpmp: Check if BPMP supports trip points
        thermal: dt-bindings: add loongson-2 thermal
        thermal/drivers/loongson-2: Add thermal management support
        thermal/drivers/ti-soc-thermal: Use helper function IS_ERR_OR_NULL()
        thermal/drivers/generic-adc: Removed unneeded call to platform_set_drvdata()
        thermal/drivers/max77620_thermal: Removed unneeded call to platform_set_drvdata()
        thermal/drivers/mediatek/auxadc_thermal: Removed call to platform_set_drvdata()
        thermal/drivers/sun8i_thermal: Remove unneeded call to platform_set_drvdata()
        thermal/drivers/broadcom/brcstb_thermal: Removed unneeded platform_set_drvdata()
        thermal/drivers/mediatek/lvts_thermal: Make readings valid in filtered mode
        thermal/drivers/k3_bandgap: Remove unneeded call to platform_set_drvdata()
        thermal/drivers/k3_j72xx_bandgap: Removed unneeded call to platform_set_drvdata()
        thermal/drivers/broadcom/sr-thermal: Removed call to platform_set_drvdata()
        thermal/drivers/samsung: Fix Wvoid-pointer-to-enum-cast warning
        thermal/drivers/db8500: Remove redundant of_match_ptr()
        thermal/drivers/mediatek: Clean up redundant dev_err_probe()
        thermal/drivers/sun8i: Free calibration nvmem after reading it
        thermal/drivers/sun8i: Remove unneeded comments
        ...
    • Merge tag 'rproc-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux · 2a3a850e
      Linus Torvalds authored
      Pull remoteproc updates from Bjorn Andersson:
       "Support for booting the iMX remoteprocs using MMIO, instead of SMCCC
        is added. The iMX driver is also extended to support delivering
        interrupts from an arbitrary number of vdev.
      
        Support is added to the TI PRU driver, to allow GPMUX to be controlled
        from DeviceTree.
      
        The Qualcomm coredump collector is extended to fall back to generating
        a full coredump, in the case that the loaded firmware doesn't support
        generating minidump. The overly terse MD abbreviation of "MINIDUMP" is
        expanded, to make the code easier on the eye.
      
        The list of Qualcomm Sensor Low Power Island (SLPI) instances
        supported is cleaned up, and SDM845 is added. SDM630/636/660 support
        for the modem subsystem (mss) is added.
      
        All the Qualcomm drivers are transitioned to of_reserved_mem_lookup()
        instead of open coding the resolution of reserved-memory regions, to
        gain handling of error cases. A couple of drivers are transitioned to
        use devm_platform_ioremap_resource_byname().
      
        The stm32 remoteproc driver's PM operations are updated to modern
        macros, to avoid the "unused variable"-warning in some configurations.
      
        Drivers are transitioned away from directly including of_device.h"
      
      * tag 'rproc-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: (23 commits)
        remoteproc: pru: add support for configuring GPMUX based on client setup
        remoteproc: stm32: fix incorrect optional pointers
        remoteproc: imx_rproc: Switch iMX8MN/MP from SMCCC to MMIO
        dt-bindings: remoteproc: imx_rproc: Support i.MX8MN/P MMIO
        dt-bindings: remoteproc: qcom,msm8996-mss-pil: Fix 8996 clocks
        remoteproc: qcom: pas: add SDM845 SLPI compatible
        remoteproc: qcom: q6v5-mss: Add support for SDM630/636/660
        dt-bindings: remoteproc: qcom,msm8996-mss-pil: Add SDM660 compatible
        remoteproc: qcom: Expand MD_* as MINIDUMP_*
        remoteproc: qcom: pas: refactor SLPI remoteproc init
        dt-bindings: remoteproc: qcom: adsp: add qcom,sdm845-slpi-pas compatible
        remoteproc: qcom: wcnss: use devm_platform_ioremap_resource_byname()
        remoteproc: qcom: q6v5: use devm_platform_ioremap_resource_byname()
        dt-bindings: remoteproc: qcom: sm6115-pas: Add QCM2290
        remoteproc: qcom: Add full coredump fallback mechanism
        remoteproc: core: Export the rproc coredump APIs
        remoteproc: qcom: Use of_reserved_mem_lookup()
        remoteproc: imx_rproc: iterate all notifiyids in rx callback
        dt-bindings: remoteproc: qcom,adsp: bring back firmware-name
        dt-bindings: remoteproc: qcom,sm8550-pas: require memory-region
        ...
    • Merge tag 'rpmsg-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux · 3d904704
      Linus Torvalds authored
      Pull rpmsg updates from Bjorn Andersson:
       "Add support for the GLINK flow control signals, and expose this to the
        user through the rpmsg_char interface. Add missing kstrdup() failure
        handling during allocation of GLINK channel objects"
      
      * tag 'rpmsg-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
        rpmsg: glink: Avoid dereferencing NULL channel
        rpmsg: glink: Add check for kstrdup
        rpmsg: char: Add RPMSG GET/SET FLOWCONTROL IOCTL support
        rpmsg: glink: Add support to handle signals command
        rpmsg: core: Add signal API support
    • Merge tag 'hwlock-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux · e3a6fa00
      Linus Torvalds authored
      Pull hwspinlock updates from Bjorn Andersson:
       "Convert u8500 and omap drivers to void-returning remove.
      
        Complete the support for representing the Qualcomm TCSR mutex as a
        mmio device, and check the return value of devm_regmap_field_alloc()
        in the same"
      
      * tag 'hwlock-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
        hwspinlock: qcom: add missing regmap config for SFPB MMIO implementation
        hwspinlock: u8500: Convert to platform remove callback returning void
        hwspinlock: omap: Convert to platform remove callback returning void
        hwspinlock: omap: Emit only one error message for errors in .remove()
        hwspinlock: add a check of devm_regmap_field_alloc in qcom_hwspinlock_probe
    • Merge tag 'leds-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds · 2be6bc48
      Linus Torvalds authored
      Pull LED updates from Lee Jones:
       "Core Frameworks:
         - Add new framework to support Group Multi-Color (GMC) LEDs
         - Offer an 'optional' API for non-essential LEDs
         - Support obtaining 'max brightness' values from Device Tree
         - Provide new led_classdev member 'color' (settable via DT and sysfs)
         - Stop TTY Trigger from using the old LED_ON constraints
         - Statically allocate leds_class
      
        New Drivers:
         - Add support for NXP PCA995x I2C Constant Current LED Driver
      
        New Device Support:
         - Add support for Siemens Simatic IPC BX-21 to Simatic IPC
      
        Fix-ups:
         - Some dependency / Kconfig tweaking
         - Move final probe() functions back over from .probe_new()
         - Simplify obtaining resources (memory, device data) using unified
           API helpers
         - Bunch of Device Tree additions, conversions and adaptions
         - Fix trivial styling issues; comments
         - Ensure correct includes are present and remove some that are not
           required
         - Omit the use of redundant casts and if relevant replace with better
           ones
         - Use purpose-built APIs for various actions; sysfs_emit(),
           module_led_trigger()
         - Remove a bunch of superfluous locking
      
        Bug Fixes:
         - Ensure error codes are correctly propagated back up the call chain
         - Fix incorrect error values from being returned (missing '-')
         - Ensure get'ed resources are put'ed to prevent leaks
         - Use correct class when exporting module resources
         - Fix rounding (or lack thereof) issues
         - Fix 'always false' LED_COLOR_ID_MULTI BUG() check"
      
      * tag 'leds-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds: (40 commits)
        leds: aw2013: Enable pull-up supply for interrupt and I2C
        dt-bindings: leds: Document pull-up supply for interrupt and I2C
        dt-bindings: leds: aw2013: Document interrupt
        leds: uleds: Use module_misc_device macro to simplify the code
        leds: trigger: netdev: Use module_led_trigger macro to simplify the code
        dt-bindings: leds: Fix reference to definition of default-state
        leds: turris-omnia: Drop unnecessary mutex locking
        leds: turris-omnia: Use sysfs_emit() instead of sprintf()
        leds: Make leds_class a static const structure
        leds: Remove redundant of_match_ptr()
        dt-bindings: leds: Add gpio-line-names to PCA9532 GPIO
        leds: trigger: tty: Do not use LED_ON/OFF constants, use led_blink_set_oneshot instead
        dt-bindings: leds: rohm,bd71828: Drop select:false
        leds: Fix BUG_ON check for LED_COLOR_ID_MULTI that is always false
        leds: multicolor: Use rounded division when calculating color components
        leds: rgb: Add a multicolor LED driver to group monochromatic LEDs
        dt-bindings: leds: Add binding for a multicolor group of LEDs
        leds: class: Store the color index in struct led_classdev
        leds: Provide devm_of_led_get_optional()
        leds: pca995x: Fix MODULE_DEVICE_TABLE for OF
        ...
    • Merge tag 'mfd-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd · d8723062
      Linus Torvalds authored
      Pull MFD updates from Lee Jones:
       "New Drivers:
         - Add support for the Cirrus Logic CS42L43 Audio CODEC
      
        Fix-ups:
         - Make use of specific printk() format tags for various optimisations
         - Kconfig / module modifications / tweaking
         - Simplify obtaining resources (memory, device data) using unified
           API helpers
         - Bunch of Device Tree additions, conversions and adaptions
         - Convert a bunch of Regmap configurations to use the Maple Tree
           cache
         - Ensure correct includes are present and remove some that are not
           required
         - Remove superfluous code
         - Reduce amount of cycles spent in critical sections
         - Omit the use of redundant casts and if relevant replace with better
           ones
         - Swap out raw_spin_{un}lock_irq{save,restore}() for
           spin_{un}lock_irq{save,restore}()
      
        Bug Fixes:
         - Repair theoretical deadlock situation
         - Fix some link-time dependencies
         - Use more appropriate datatype when casting"
      
      * tag 'mfd-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (70 commits)
        mfd: mc13xxx: Simplify device data fetching in probe()
        mfd: rz-mtu3: Replace raw_spin_lock->spin_lock()
        mfd: rz-mtu3: Reduce critical sections
        mfd: mxs-lradc: Fix Wvoid-pointer-to-enum-cast warning
        mfd: wm31x: Fix Wvoid-pointer-to-enum-cast warning
        mfd: wm8994: Fix Wvoid-pointer-to-enum-cast warning
        mfd: tc3589: Fix Wvoid-pointer-to-enum-cast warning
        mfd: lp87565: Fix Wvoid-pointer-to-enum-cast warning
        mfd: hi6421-pmic: Fix Wvoid-pointer-to-enum-cast warning
        mfd: max77541: Fix Wvoid-pointer-to-enum-cast warning
        mfd: max14577: Fix Wvoid-pointer-to-enum-cast warning
        mfd: stmpe: Fix Wvoid-pointer-to-enum-cast warning
        mfd: rn5t618: Remove redundant of_match_ptr()
        mfd: lochnagar-i2c: Remove redundant of_match_ptr()
        mfd: stpmic1: Remove redundant of_match_ptr()
        mfd: act8945a: Remove redundant of_match_ptr()
        mfd: rsmu_spi: Remove redundant of_match_ptr()
        mfd: altera-a10sr: Remove redundant of_match_ptr()
        mfd: rsmu_i2c: Remove redundant of_match_ptr()
        mfd: tc3589x: Remove redundant of_match_ptr()
        ...
      d8723062
    • Linus Torvalds's avatar
      Merge tag 'i2c-for-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · e3b85b07
      Linus Torvalds authored
      Pull i2c updates from Wolfram Sang:
       "I2C has mainly cleanups this time and a few driver improvements.
      
        Because a lot of developers were on holidays (including myself) it was
        good timing to apply lots of cleanups which would normally cause
        merge conflicts with other floating patches. Extra thanks go to Andi
        Shyti who backed me up when I was on a four-week hiatus. This is also
        the reason that some patches were committed later than ideal"
      
      * tag 'i2c-for-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (67 commits)
        i2c: at91: Use dev_err_probe() instead of dev_err()
        I2C: ali15x3: Do PCI error checks on own line
        i2c: Make return value check more accurate and explicit for devm_pinctrl_get()
        i2c: designware: Add support for recovery when GPIO need pinctrl
        i2c: mlxcpld: Add support for extended transaction length
        i2c: mlxcpld: Allow driver to run on ARM64 architecture
        i2c: nforce2: Do PCI error check on own line
        i2c: sis5595: Do PCI error checks on own line
        i2c: qcom-cci: Fix error checking in cci_probe()
        i2c: muxes: pca954x: Add regulator support
        i2c: muxes: pca954x: Add MAX735x/MAX736x support
        dt-bindings: i2c: Add Maxim MAX735x/MAX736x variants
        dt-bindings: i2c: pca954x: Correct interrupt support
        i2c: pnx: Use devm_platform_get_and_ioremap_resource()
        i2c: pxa: Use devm_platform_get_and_ioremap_resource()
        i2c: s3c2410: Use devm_platform_get_and_ioremap_resource()
        i2c: sh_mobile: Use devm_platform_get_and_ioremap_resource()
        i2c: st: Use devm_platform_get_and_ioremap_resource()
        i2c: qcom-geni: Convert to devm_platform_ioremap_resource()
        i2c: stm32f4: Use devm_platform_get_and_ioremap_resource()
        ...
      e3b85b07
    • Linus Torvalds's avatar
      Merge tag 'printk-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux · 3c31041e
      Linus Torvalds authored
      Pull printk updates from Petr Mladek:
      
       - Do not try to get the console lock when it is not needed or useful in
         panic()
      
       - Replace the global console_suspended state by a per-console flag
      
       - Export symbols needed for dumping the raw printk buffer in panic()
      
       - Fix documentation of printf formats for integer types
      
       - Moved Sergey Senozhatsky to the reviewer role
      
       - Misc cleanups
      
      * tag 'printk-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
        printk: export symbols for debug modules
        lib: test_scanf: Add explicit type cast to result initialization in test_number_prefix()
        printk: ringbuffer: Fix truncating buffer size min_t cast
        printk: Rename abandon_console_lock_in_panic() to other_cpu_in_panic()
        printk: Add per-console suspended state
        printk: Consolidate console deferred printing
        printk: Do not take console lock for console_flush_on_panic()
        printk: Keep non-panic-CPUs out of console lock
        printk: Reduce console_unblank() usage in unsafe scenarios
        kdb: Do not assume write() callback available
        docs: printk-formats: Treat char as always unsigned
        docs: printk-formats: Fix hex printing of signed values
        MAINTAINERS: adjust printk/vsprintf entries
      3c31041e
    • Linus Torvalds's avatar
      Merge tag 'timers-core-2023-09-04-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4accdb98
      Linus Torvalds authored
      Pull clocksource/clockevent driver updates from Thomas Gleixner:
      
       - Remove the OXNAS driver instead of adding a new one!
      
       - A set of boring fixes, cleanups and improvements
      
      * tag 'timers-core-2023-09-04-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource: Explicitly include correct DT includes
        clocksource/drivers/sun5i: Convert to platform device driver
        clocksource/drivers/sun5i: Remove pointless struct
        clocksource/drivers/sun5i: Remove duplication of code and data
        clocksource/drivers/loongson1: Set variable ls1x_timer_lock storage-class-specifier to static
        clocksource/drivers/arm_arch_timer: Disable timer before programming CVAL
        dt-bindings: timer: oxsemi,rps-timer: remove obsolete bindings
        clocksource/drivers/timer-oxnas-rps: Remove obsolete timer driver
      4accdb98
    • Linus Torvalds's avatar
      Merge tag 'm68knommu-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · 7a1415ee
      Linus Torvalds authored
      Pull m68knommu updates from Greg Ungerer:
       "Two changes, one a trivial white space clean up, the other removes the
        unnecessary local pcibios_setup() code"
      
      * tag 'm68knommu-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
        m68k: coldfire: dma_timer: ERROR: "foo __init  bar" should be "foo __init bar"
        m68k/pci: Drop useless pcibios_setup()
      7a1415ee
    • Linus Torvalds's avatar
      Merge tag 'uml-for-linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux · 68d76d4e
      Linus Torvalds authored
      Pull UML updates from Richard Weinberger:
      
       - Drop 32-bit checksum implementation and re-use it from arch/x86
      
       - String function cleanup
      
       - Fixes for -Wmissing-variable-declarations and -Wmissing-prototypes
         builds
      
      * tag 'uml-for-linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
        um: virt-pci: fix missing declaration warning
        um: Refactor deprecated strncpy to memcpy
        um: fix 3 instances of -Wmissing-prototypes
        um: port_kern: fix -Wmissing-variable-declarations
        uml: audio: fix -Wmissing-variable-declarations
        um: vector: refactor deprecated strncpy
        um: use obj-y to descend into arch/um/*/
        um: Hard-code the result of 'uname -s'
        um: Use the x86 checksum implementation on 32-bit
        asm-generic: current: Don't include thread-info.h if building asm
        um: Remove unsued extern declaration ldt_host_info()
        um: Fix hostaudio build errors
        um: Remove strlcpy usage
      68d76d4e
    • Linus Torvalds's avatar
      Merge tag 'hyperv-next-signed-20230902' of... · 0b90c563
      Linus Torvalds authored
      Merge tag 'hyperv-next-signed-20230902' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
      
      Pull hyperv updates from Wei Liu:
      
       - Support for SEV-SNP guests on Hyper-V (Tianyu Lan)
      
       - Support for TDX guests on Hyper-V (Dexuan Cui)
      
       - Use SBRM API in Hyper-V balloon driver (Mitchell Levy)
      
       - Avoid dereferencing ACPI root object handle in VMBus driver (Maciej
         Szmigiero)
      
       - A few miscellaneous fixes (Jiapeng Chong, Nathan Chancellor, Saurabh
         Sengar)
      
      * tag 'hyperv-next-signed-20230902' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits)
        x86/hyperv: Remove duplicate include
        x86/hyperv: Move the code in ivm.c around to avoid unnecessary ifdef's
        x86/hyperv: Remove hv_isolation_type_en_snp
        x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the paravisor
        Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisor
        x86/hyperv: Introduce a global variable hyperv_paravisor_present
        Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM
        x86/hyperv: Fix serial console interrupts for fully enlightened TDX guests
        Drivers: hv: vmbus: Support fully enlightened TDX guests
        x86/hyperv: Support hypercalls for fully enlightened TDX guests
        x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
        x86/hyperv: Fix undefined reference to isolation_type_en_snp without CONFIG_HYPERV
        x86/hyperv: Add missing 'inline' to hv_snp_boot_ap() stub
        hv: hyperv.h: Replace one-element array with flexible-array member
        Drivers: hv: vmbus: Don't dereference ACPI root object handle
        x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES
        x86/hyperv: Add smp support for SEV-SNP guest
        clocksource: hyper-v: Mark hyperv tsc page unencrypted in sev-snp enlightened guest
        x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest
        drivers: hv: Mark percpu hvcall input arg page unencrypted in SEV-SNP enlightened guest
        ...
      0b90c563
    • Linus Torvalds's avatar
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · e4f1b820
      Linus Torvalds authored
      Pull virtio updates from Michael Tsirkin:
       "A small pull request this time around, mostly because the vduse
        network got postponed to next release so we can be sure we got the
        security store right"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio_ring: fix avail_wrap_counter in virtqueue_add_packed
        virtio_vdpa: build affinity masks conditionally
        virtio_net: merge dma operations when filling mergeable buffers
        virtio_ring: introduce dma sync api for virtqueue
        virtio_ring: introduce dma map api for virtqueue
        virtio_ring: introduce virtqueue_reset()
        virtio_ring: separate the logic of reset/enable from virtqueue_resize
        virtio_ring: correct the expression of the description of virtqueue_resize()
        virtio_ring: skip unmap for premapped
        virtio_ring: introduce virtqueue_dma_dev()
        virtio_ring: support add premapped buf
        virtio_ring: introduce virtqueue_set_dma_premapped()
        virtio_ring: put mapping error check in vring_map_one_sg
        virtio_ring: check use_dma_api before unmap desc for indirect
        vdpa_sim: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
        vdpa: add get_backend_features vdpa operation
        vdpa: accept VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature
        vdpa: add VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK flag
        vdpa/mlx5: Remove unused function declarations
      e4f1b820
    • Linus Torvalds's avatar
      Merge tag 'tomoyo-pr-20230903' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1 · 5c5e0e81
      Linus Torvalds authored
      Pull tomoyo updates from Tetsuo Handa:
       "Three cleanup patches, no behavior changes"
      
      * tag 'tomoyo-pr-20230903' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
        tomoyo: remove unused function declaration
        tomoyo: refactor deprecated strncpy
        tomoyo: add format attributes to functions
      5c5e0e81
    • Rafael J. Wysocki's avatar
      Merge branch 'pm-cpufreq' · 19a56a6b
      Rafael J. Wysocki authored
      Merge additional cpufreq updates for 6.6-rc1:
      
       - Add support for per-policy performance boost (Jie Zhan).
      
       - Fix assorted issues in the cpufreq core, common governor code and in
         the pcc cpufreq driver (Liao Chang).
      
      * pm-cpufreq:
        cpufreq: Support per-policy performance boost
        cpufreq: pcc: Fix the potentinal scheduling delays in target_index()
        cpufreq: governor: Free dbs_data directly when gov->init() fails
        cpufreq: Fix the race condition while updating the transition_task of policy
        cpufreq: Avoid printing kernel addresses in cpufreq_resume()
      19a56a6b
  3. Sep 04, 2023
    • Petr Mladek's avatar
      f0f69239
    • Petr Mladek's avatar
      b3553628
    • Yuan Yao's avatar
      virtio_ring: fix avail_wrap_counter in virtqueue_add_packed · 1acfe2c1
      Yuan Yao authored
      In the current packed virtqueue implementation, the avail_wrap_counter
      won't flip when the driver supplies a descriptor chain with a length
      equal to the queue size; total_sg == vq->packed.vring.num.
      
      Let’s assume the following situation:
      vq->packed.vring.num: 4
      vq->packed.next_avail_idx: 1
      vq->packed.avail_wrap_counter: 0
      
      Then the driver adds a descriptor chain containing 4 descriptors.
      
      We expect the following result with avail_wrap_counter flipped:
      vq->packed.next_avail_idx: 1
      vq->packed.avail_wrap_counter: 1
      
      But, the current implementation gives the following result:
      vq->packed.next_avail_idx: 1
      vq->packed.avail_wrap_counter: 0
      
      To reproduce the bug, you can set a packed queue size as small as
      possible, so that the driver is more likely to provide a descriptor
      chain with a length equal to the packed queue size. For example, run
      the following command in QEMU:
      sudo qemu-system-x86_64 \
      -enable-kvm \
      -nographic \
      -kernel "path/to/kernel_image" \
      -m 1G \
      -drive file="path/to/rootfs",if=none,id=disk \
      -device virtio-blk,drive=disk \
      -drive file="path/to/disk_image",if=none,id=rwdisk \
      -device virtio-blk,drive=rwdisk,packed=on,queue-size=4,\
      indirect_desc=off \
      -append "console=ttyS0 root=/dev/vda rw init=/bin/bash"
      
      Inside the VM, create a directory and mount the rwdisk device on it. The
      rwdisk will hang and the mount operation will not complete.
      
      This commit fixes the wrap counter error by flipping
      packed.avail_wrap_counter when the start of the descriptor chain equals
      the end of the descriptor chain (head == i).
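
The index arithmetic described above can be checked with a small user-space model. This is only a sketch, not the kernel code: `struct ring_model`, `ring_model_add()` and the `flip_on_equal` switch are hypothetical names, and the "old" behaviour is assumed to flip only when the end index is strictly below the head.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the packed-ring producer state (not the kernel struct). */
struct ring_model {
	unsigned int num;            /* queue size */
	unsigned int next_avail_idx; /* slot where the next chain starts */
	bool avail_wrap_counter;
};

/*
 * Advance the model by one descriptor chain of total_sg entries.
 * flip_on_equal = false models the assumed old check (i < head);
 * flip_on_equal = true also flips when the chain ends where it began.
 */
static void ring_model_add(struct ring_model *r, unsigned int total_sg,
			   bool flip_on_equal)
{
	unsigned int head = r->next_avail_idx;
	unsigned int i = head;
	unsigned int n;

	/* A chain has at least one descriptor and cannot exceed the ring. */
	assert(total_sg >= 1 && total_sg <= r->num);

	for (n = 0; n < total_sg; n++) {
		if (++i >= r->num)
			i = 0;
	}

	/* The index wrapped if it ended before, or exactly at, the head. */
	if (i < head || (flip_on_equal && i == head))
		r->avail_wrap_counter = !r->avail_wrap_counter;

	r->next_avail_idx = i;
}
```

Starting from num = 4, next_avail_idx = 1 and avail_wrap_counter = 0, adding a chain of 4 descriptors leaves the counter at 0 with `flip_on_equal` false and sets it to 1 with `flip_on_equal` true, matching the two results listed in the commit message.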
      
      Fixes: 1ce9e605 ("virtio_ring: introduce packed ring support")
      Signed-off-by: Yuan Yao <yuanyaogoog@chromium.org>
      Message-Id: <20230808051110.3492693-1-yuanyaogoog@chromium.org>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      1acfe2c1
    • Jason Wang's avatar
      virtio_vdpa: build affinity masks conditionally · ae15acea
      Jason Wang authored
      We try to build the affinity mask via create_affinity_masks()
      unconditionally, which may lead to several issues:
      
      - the affinity mask is not used for a parent without affinity support
        (only VDUSE supports affinity now)
      - the logic of create_affinity_masks() might not work for devices
        other than block. For example, it's not rare for a networking device
        to have more queues than there are CPUs. Such a case breaks the
        current affinity logic, which is based on group_cpus_evenly(), which
        assumes the number of CPUs is not less than the number of groups.
        This can trigger a warning[1]:
      
      	if (ret >= 0)
      		WARN_ON(nr_present + nr_others < numgrps);
      
      Fix this by building the affinity masks only when:
      
      - The driver passes an affinity descriptor; a driver like virtio-blk
        can make sure to limit the number of queues when it exceeds the
        number of CPUs
      - The parent supports the affinity setting config ops
      
      This helps to avoid the warning. More optimizations could be done on
      top.
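
The two conditions and the warning predicate can be written down as a small user-space sketch. The names here (`struct parent_ops_model`, `should_build_affinity_masks()`, `grouping_would_warn()`, `noop_set_vq_affinity()`) are hypothetical stand-ins chosen for illustration and are not the kernel's types or functions; only the `nr_present + nr_others < numgrps` expression is taken from the message above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parent's config ops. */
struct parent_ops_model {
	int (*set_vq_affinity)(unsigned int idx);
};

/* Example stub for a parent that does support affinity setting. */
static int noop_set_vq_affinity(unsigned int idx)
{
	(void)idx;
	return 0;
}

/*
 * Build masks only when the driver passed an affinity descriptor AND
 * the parent provides an affinity-setting op.
 */
static bool should_build_affinity_masks(const void *affinity_desc,
					const struct parent_ops_model *ops)
{
	return affinity_desc != NULL && ops != NULL &&
	       ops->set_vq_affinity != NULL;
}

/* Mirrors the WARN_ON() condition quoted in the commit message. */
static bool grouping_would_warn(unsigned int nr_present,
				unsigned int nr_others,
				unsigned int numgrps)
{
	return nr_present + nr_others < numgrps;
}
```

With this model, a driver that passes no descriptor, or a parent without the op, never reaches the grouping step, so the warning condition cannot be hit on that path.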
      
      [1]
      [  682.146655] WARNING: CPU: 6 PID: 1550 at lib/group_cpus.c:400 group_cpus_evenly+0x1aa/0x1c0
      [  682.146668] CPU: 6 PID: 1550 Comm: vdpa Not tainted 6.5.0-rc5jason+ #79
      [  682.146671] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
      [  682.146673] RIP: 0010:group_cpus_evenly+0x1aa/0x1c0
      [  682.146676] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc e8 1b c4 74 ff 48 89 ef e8 13 ac 98 ff 4c 89 e7 45 31 e4 e8 08 ac 98 ff eb c2 <0f> 0b eb b6 e8 fd 05 c3 00 45 31 e4 eb e5 cc cc cc cc cc cc cc cc
      [  682.146679] RSP: 0018:ffffc9000215f498 EFLAGS: 00010293
      [  682.146682] RAX: 000000000001f1e0 RBX: 0000000000000041 RCX: 0000000000000000
      [  682.146684] RDX: ffff888109922058 RSI: 0000000000000041 RDI: 0000000000000030
      [  682.146686] RBP: ffff888109922058 R08: ffffc9000215f498 R09: ffffc9000215f4a0
      [  682.146687] R10: 00000000000198d0 R11: 0000000000000030 R12: ffff888107e02800
      [  682.146689] R13: 0000000000000030 R14: 0000000000000030 R15: 0000000000000041
      [  682.146692] FS:  00007fef52315740(0000) GS:ffff888237380000(0000) knlGS:0000000000000000
      [  682.146695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  682.146696] CR2: 00007fef52509000 CR3: 0000000110dbc004 CR4: 0000000000370ee0
      [  682.146698] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  682.146700] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  682.146701] Call Trace:
      [  682.146703]  <TASK>
      [  682.146705]  ? __warn+0x7b/0x130
      [  682.146709]  ? group_cpus_evenly+0x1aa/0x1c0
      [  682.146712]  ? report_bug+0x1c8/0x1e0
      [  682.146717]  ? handle_bug+0x3c/0x70
      [  682.146721]  ? exc_invalid_op+0x14/0x70
      [  682.146723]  ? asm_exc_invalid_op+0x16/0x20
      [  682.146727]  ? group_cpus_evenly+0x1aa/0x1c0
      [  682.146729]  ? group_cpus_evenly+0x15c/0x1c0
      [  682.146731]  create_affinity_masks+0xaf/0x1a0
      [  682.146735]  virtio_vdpa_find_vqs+0x83/0x1d0
      [  682.146738]  ? __pfx_default_calc_sets+0x10/0x10
      [  682.146742]  virtnet_find_vqs+0x1f0/0x370
      [  682.146747]  virtnet_probe+0x501/0xcd0
      [  682.146749]  ? vp_modern_get_status+0x12/0x20
      [  682.146751]  ? get_cap_addr.isra.0+0x10/0xc0
      [  682.146754]  virtio_dev_probe+0x1af/0x260
      [  682.146759]  really_probe+0x1a5/0x410
      
      Fixes: 3dad5682 ("virtio-vdpa: Support interrupt affinity spreading mechanism")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20230811091539.1359865-1-jasowang@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      ae15acea
    • Xuan Zhuo's avatar
      virtio_net: merge dma operations when filling mergeable buffers · 295525e2
      Xuan Zhuo authored
      
      
      Currently, the virtio core performs a DMA operation for each buffer,
      even though the same page may be operated on multiple times.
      
      With this patch, the driver does the DMA operation and manages the DMA
      address itself, based on the premapped feature of the virtio core.
      
      This way, we can perform only one DMA operation for the pages of the
      alloc frag. This is beneficial for IOMMU devices.
      
      kernel command line: intel_iommu=on iommu.passthrough=0
      
             |  strict=0  | strict=1
      Before |  775496pps | 428614pps
      After  | 1109316pps | 742853pps
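
The relative gain implied by the table can be worked out with a short helper. This is only an arithmetic sketch; `pct_gain()` is a hypothetical name introduced here.

```c
#include <assert.h>

/* Percentage improvement of after_pps over before_pps. */
static double pct_gain(double before_pps, double after_pps)
{
	assert(before_pps > 0.0);
	return (after_pps - before_pps) / before_pps * 100.0;
}
```

Applied to the figures above, `pct_gain(775496, 1109316)` comes to roughly 43% for strict=0 and `pct_gain(428614, 742853)` to roughly 73% for strict=1.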
      
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Message-Id: <20230810123057.43407-13-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      295525e2
    • Xuan Zhuo's avatar
      virtio_ring: introduce dma sync api for virtqueue · 8bd2f710
      Xuan Zhuo authored
      
      
      These APIs have been introduced:
      
      * virtqueue_dma_need_sync
      * virtqueue_dma_sync_single_range_for_cpu
      * virtqueue_dma_sync_single_range_for_device
      
      These APIs can be used together with the premapped mechanism to sync the
      DMA address.
      
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Message-Id: <20230810123057.43407-12-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      8bd2f710
    • Xuan Zhuo's avatar
      virtio_ring: introduce dma map api for virtqueue · b6253b4e
      Xuan Zhuo authored
      
      
      Added virtqueue_dma_map_api* to map DMA addresses for virtual memory in
      advance. The purpose is to keep memory mapped across multiple add/get
      buf operations.
      
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Message-Id: <20230810123057.43407-11-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      b6253b4e
    • Xuan Zhuo's avatar
      virtio_ring: introduce virtqueue_reset() · ba3e0c47
      Xuan Zhuo authored
      
      
      Introduce virtqueue_reset() to release all buffers inside the vq.
      
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20230810123057.43407-10-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      ba3e0c47
    • Xuan Zhuo's avatar
      virtio_ring: separate the logic of reset/enable from virtqueue_resize · ad48d53b
      Xuan Zhuo authored
      
      
      The subsequent reset function will reuse this logic.
      
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20230810123057.43407-9-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      ad48d53b
    • Xuan Zhuo's avatar
      virtio_ring: correct the expression of the description of virtqueue_resize() · 4d09f240
      Xuan Zhuo authored
      
      
      Change "useless" to the more accurate "unused".
      
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20230810123057.43407-8-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      4d09f240
    • Xuan Zhuo's avatar
      virtio_ring: skip unmap for premapped · b319940f
      Xuan Zhuo authored
      
      
      Now we add a case where we skip DMA unmap: when vq->premapped is true.
      
      We can't just rely on use_dma_api to determine whether to skip the dma
      operation. For convenience, I introduced the "do_unmap". By default, it
      is the same as use_dma_api. If the driver is configured with premapped,
      then do_unmap is false.
      
      So as long as do_unmap is false, for addr of desc, we should skip dma
      unmap operation.
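
The relationship between the three flags can be sketched as a tiny user-space model. `struct vq_dma_model` and its helpers are hypothetical names, and the -1 error return is an assumption made for illustration; this is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the DMA-related flags of a virtqueue. */
struct vq_dma_model {
	bool use_dma_api;
	bool premapped;
	bool do_unmap;
};

/* By default, do_unmap is the same as use_dma_api. */
static void vq_dma_model_init(struct vq_dma_model *vq, bool use_dma_api)
{
	assert(vq != NULL);
	vq->use_dma_api = use_dma_api;
	vq->premapped = false;
	vq->do_unmap = use_dma_api;
}

/*
 * Switch to premapped mode: the driver maps buffers itself, so the
 * core must not unmap them. Assumed to fail without the DMA API.
 */
static int vq_dma_model_set_premapped(struct vq_dma_model *vq)
{
	assert(vq != NULL);
	if (!vq->use_dma_api)
		return -1;
	vq->premapped = true;
	vq->do_unmap = false;
	return 0;
}

/* The core unmaps a descriptor address only when do_unmap is set. */
static bool vq_dma_model_should_unmap(const struct vq_dma_model *vq)
{
	assert(vq != NULL);
	return vq->do_unmap;
}
```

In this model the core unmaps only when the DMA API is in use and the driver has not switched the queue to premapped mode.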
      
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Message-Id: <20230810123057.43407-7-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      b319940f
    • Xuan Zhuo's avatar
      virtio_ring: introduce virtqueue_dma_dev() · 2df64759
      Xuan Zhuo authored
      
      
      Added virtqueue_dma_dev() to get DMA device for virtio. Then the
      caller can do dma operation in advance. The purpose is to keep memory
      mapped across multiple add/get buf operations.
      
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20230810123057.43407-6-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      2df64759
    • Xuan Zhuo's avatar
      virtio_ring: support add premapped buf · d7344a2f
      Xuan Zhuo authored
      
      
      If the vq is in premapped mode, use sg_dma_address() directly.
      
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Message-Id: <20230810123057.43407-5-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      d7344a2f
    • Xuan Zhuo's avatar
      virtio_ring: introduce virtqueue_set_dma_premapped() · 8daafe9e
      Xuan Zhuo authored
      
      
      This helper allows the driver to change the DMA mode to premapped mode.
      In premapped mode, the virtio core does not do DMA mapping
      internally.
      
      This only works when use_dma_api is true. If use_dma_api is false,
      the DMA operations do not go through the DMA APIs, which is not the
      standard way of the Linux kernel.
      
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Message-Id: <20230810123057.43407-4-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      8daafe9e
    • Xuan Zhuo's avatar
      virtio_ring: put mapping error check in vring_map_one_sg · 0e27fa6d
      Xuan Zhuo authored
      
      
      This patch puts the DMA address error check in vring_map_one_sg().
      
      The benefits of doing this:
      
      1. It removes one check of vq->use_dma_api.
      2. Using vring_map_one_sg() becomes simpler: callers no longer call
         vring_mapping_error() to check the return value, which simplifies
         subsequent code.
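
The pattern of folding the error check into the mapping helper can be sketched in user-space C. Everything here is hypothetical (`struct map_model`, `model_map_one()`, `MODEL_DMA_ERROR`, the fake address offset and the -1 return value); it only illustrates a helper that reports failure through its return value and hands the address back through an out-parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t model_dma_addr_t;

/* Hypothetical sentinel for a failed mapping. */
#define MODEL_DMA_ERROR UINT64_MAX

/* Hypothetical mapping context; fail_next simulates a mapping failure. */
struct map_model {
	bool use_dma_api;
	bool fail_next;
};

/*
 * Map one buffer and check for errors in the same place.
 * Returns 0 on success and stores the address in *addr, or -1 on failure,
 * so callers need no separate error-check call.
 */
static int model_map_one(const struct map_model *m, uint64_t phys,
			 model_dma_addr_t *addr)
{
	assert(m != NULL && addr != NULL);

	/* Without the DMA API the physical address is used as-is. */
	if (!m->use_dma_api) {
		*addr = phys;
		return 0;
	}

	/* Fake "mapping": offset the address, or produce the error value. */
	*addr = m->fail_next ? MODEL_DMA_ERROR : phys + 0x1000;
	if (*addr == MODEL_DMA_ERROR)
		return -1;

	return 0;
}
```

A caller then only tests the return value, for example `if (model_map_one(&m, phys, &addr)) goto unmap_release;`, instead of mapping first and checking the address in a second step.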
      
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20230810123057.43407-3-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      0e27fa6d
    • Xuan Zhuo's avatar
      virtio_ring: check use_dma_api before unmap desc for indirect · 610c708b
      Xuan Zhuo authored
      
      
      Inside detach_buf_split(), if use_dma_api is false,
      vring_unmap_one_split_indirect() will be called many times, but actually
      nothing is done. So this patch checks use_dma_api first.
      
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20230810123057.43407-2-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      610c708b
    • Eugenio Pérez's avatar
      vdpa_sim: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK · 2c9c6371
      Eugenio Pérez authored
      
      
      Start offering the feature in the simulator.  Other parent drivers can
      follow this code to offer it too.
      
      Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
      Acked-by: Shannon Nelson <shannon.nelson@amd.com>
      Message-Id: <20230609092127.170673-5-eperezma@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      2c9c6371