Commit 6d1f82e6 authored by Yu Kuai, committed by Zheng Zengkai

block: ensure the memory order between bi_private and bi_status

hulk inclusion
category: bugfix
bugzilla: 167067 https://gitee.com/openeuler/kernel/issues/I4DDEL



--------------------------------

When running a stress test on null_blk under linux-4.19.y, the following
warning is reported:

  percpu_ref_switch_to_atomic_rcu: percpu ref (css_release) <= 0 (-3) after switching to atomic

The cause is that css_put() is invoked twice on the same bio as shown below:

CPU 1:                         CPU 2:

// IO completion kworker       // IO submit thread
                               __blkdev_direct_IO_simple
                                 submit_bio

bio_endio
  bio_uninit(bio)
    css_put(bi_css)
    bi_css = NULL
                               set_current_state(TASK_UNINTERRUPTIBLE)
  bio->bi_end_io
    blkdev_bio_end_io_simple
      bio->bi_private = NULL
                               // bi_private is NULL
                               READ_ONCE(bio->bi_private)
        wake_up_process
          smp_mb__after_spinlock

                               bio_uninit(bio)
                                 // read bi_css as non-NULL
                                 // so call css_put() again
                                 css_put(bi_css)

Because there are no memory barriers between the reads and the writes of
bi_private and bi_css, reading bi_private as NULL does not guarantee that
bi_css is also read as NULL on a weakly ordered host (e.g. ARM64).

In the latest kernel source, css_put() has been removed from bio_uninit(),
but the memory-ordering problem still exists, because
__blkdev_direct_IO_simple() also assumes an order between
bio->bi_private and {bi_status|bi_blkg}. It is reproducible that
__blkdev_direct_IO_simple() may read bi_status as 0 even if
bi_status has been set to an error in req_bio_endio().

In __blkdev_direct_IO(), the memory order between dio->waiter and
dio->bio.bi_status is not guaranteed either. So far we have been unable
to reproduce the problem there, perhaps because dio->waiter and
dio->bio.bi_status are in the same cache line. It is still better to
guarantee the memory order.

Fix it by using smp_wmb() and smp_rmb() to guarantee the order between
{bio->bi_private|dio->waiter} and {bi_status|bi_blkg}.
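
The barrier pairing described above can be illustrated with a small
userspace sketch. This is not the kernel patch: struct fake_bio,
complete_io(), wait_for_io() and run_once() are hypothetical names, the
status value 5 is arbitrary, and C11 release/acquire fences are used as
the closest portable analogue of smp_wmb()/smp_rmb() (they are somewhat
stronger than the kernel primitives). The waiter also busy-waits instead
of sleeping.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the two bio fields discussed above. */
struct fake_bio {
	int status;              /* plays the role of bi_status  */
	_Atomic(void *) private; /* plays the role of bi_private */
};

/* Completion side: store the status, then clear private. */
static void *complete_io(void *arg)
{
	struct fake_bio *bio = arg;

	bio->status = 5;                           /* arbitrary error value */
	atomic_thread_fence(memory_order_release); /* analogue of smp_wmb() */
	atomic_store_explicit(&bio->private, NULL, memory_order_relaxed);
	return NULL;
}

/* Submit side: wait until private is NULL, then read the status. */
static int wait_for_io(struct fake_bio *bio)
{
	while (atomic_load_explicit(&bio->private, memory_order_relaxed) != NULL)
		; /* busy-wait; the kernel code sleeps instead */
	atomic_thread_fence(memory_order_acquire); /* analogue of smp_rmb() */
	return bio->status;
}

/* Run one submit/complete round and return the status the waiter saw. */
int run_once(void)
{
	struct fake_bio bio = { .status = 0 };
	pthread_t completer;
	int in_flight; /* its address is only used as a non-NULL marker */
	int rc, status;

	atomic_store_explicit(&bio.private, &in_flight, memory_order_relaxed);
	rc = pthread_create(&completer, NULL, complete_io, &bio);
	assert(rc == 0);
	status = wait_for_io(&bio);
	pthread_join(completer, NULL);
	return status;
}
```

With both fences in place, a waiter that observes private == NULL is
guaranteed to observe the status written before it. Dropping either
fence reintroduces the reordering window shown in the diagram above on
weakly ordered hardware.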

Fixes: 189ce2b9 ("block: fast-path for small and simple direct I/O requests")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
parent ae96841e