  1. Oct 09, 2022
  2. Oct 06, 2022
  3. Oct 05, 2022
  4. Sep 30, 2022
    • sbitmap: fix lockup while swapping · 30514bd2
      Hugh Dickins authored
      Commit 4acb8341 ("sbitmap: fix batched wait_cnt accounting")
      is a big improvement: without it, I had to revert to before commit
      040b83fc ("sbitmap: fix possible io hung due to lost wakeup")
      to avoid the high system time and freezes which that had introduced.
      
      Now okay on the NVME laptop, but 4acb8341 is a disaster for heavy
      swapping (kernel builds in low memory) on another: soon locking up in
      sbitmap_queue_wake_up() (into which __sbq_wake_up() is inlined), cycling
      around with waitqueue_active() but wait_cnt 0.  Here is a backtrace,
      showing the common pattern of outer sbitmap_queue_wake_up() interrupted
      before setting wait_cnt 0 back to wake_batch (in some cases other CPUs
      are idle, in other cases they're spinning for a lock in dd_bio_merge()):
      
      sbitmap_queue_wake_up < sbitmap_queue_clear < blk_mq_put_tag <
      __blk_mq_free_request < blk_mq_free_request < __blk_mq_end_request <
      scsi_end_request < scsi_io_completion < scsi_finish_command <
      scsi_complete < blk_complete_reqs < blk_done_softirq < __do_softirq <
      __irq_exit_rcu < irq_exit_rcu < common_interrupt < asm_common_interrupt <
      _raw_spin_unlock_irqrestore < __wake_up_common_lock < __wake_up <
      sbitmap_queue_wake_up < sbitmap_queue_clear < blk_mq_put_tag <
      __blk_mq_free_request < blk_mq_free_request < dd_bio_merge <
      blk_mq_sched_bio_merge < blk_mq_attempt_bio_merge < blk_mq_submit_bio <
      __submit_bio < submit_bio_noacct_nocheck < submit_bio_noacct <
      submit_bio < __swap_writepage < swap_writepage < pageout <
      shrink_folio_list < evict_folios < lru_gen_shrink_lruvec <
      shrink_lruvec < shrink_node < do_try_to_free_pages < try_to_free_pages <
      __alloc_pages_slowpath < __alloc_pages < folio_alloc < vma_alloc_folio <
      do_anonymous_page < __handle_mm_fault < handle_mm_fault <
      do_user_addr_fault < exc_page_fault < asm_exc_page_fault
      
      See how the process-context sbitmap_queue_wake_up() has been interrupted,
      after bringing wait_cnt down to 0 (and in this example, after doing its
      wakeups), before advancing wake_index and refilling wait_cnt: an
      interrupt-context sbitmap_queue_wake_up() of the same sbq gets stuck.
      
      I have almost no grasp of all the possible sbitmap races, and their
      consequences: but __sbq_wake_up() can do nothing useful while wait_cnt 0,
      so it is better if sbq_wake_ptr() skips on to the next ws in that case:
      which fixes the lockup and shows no adverse consequence for me.
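
The selection rule described above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the kernel patch itself: `struct model_ws`, `model_wake_ptr()`, and `MODEL_WAIT_QUEUES` are hypothetical names, `has_waiters` stands in for `waitqueue_active()`, and plain integers replace the kernel's atomics, so none of the real concurrency is modelled. It shows only the idea that a wait state with waiters but `wait_cnt` 0 is skipped in favour of the next one.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of wait states in the model; the value 8 is an assumption. */
#define MODEL_WAIT_QUEUES 8

/* Hypothetical stand-in for a wait state, not the kernel's structure. */
struct model_ws {
    bool has_waiters; /* models waitqueue_active() being true */
    int  wait_cnt;    /* completions left before the next batched wakeup */
};

/*
 * Scan round-robin from `start` and return the index of the first wait
 * state that has waiters AND a nonzero wait_cnt. A wait state whose
 * wait_cnt is 0 is skipped, because nothing useful can be done with it
 * until its owner refills the count. Returns -1 if none qualifies.
 */
static int model_wake_ptr(const struct model_ws *ws, int start)
{
    assert(start >= 0 && start < MODEL_WAIT_QUEUES);

    for (int i = 0; i < MODEL_WAIT_QUEUES; i++) {
        int idx = (start + i) % MODEL_WAIT_QUEUES;

        if (ws[idx].has_waiters && ws[idx].wait_cnt != 0)
            return idx;
    }
    return -1;
}
```

With waiters on wait states 2 and 5, where 2 has `wait_cnt` 0 and 5 has a nonzero count, a scan starting at 2 returns 5 rather than returning 2 again. If every wait state with waiters has `wait_cnt` 0, the model returns -1 and the caller has nothing to wake.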
      
      The check for wait_cnt being 0 is obviously racy, and ultimately can lead
      to lost wakeups: for example, when there is only a single waitqueue with
      waiters.  However, lost wakeups are unlikely to matter in these cases,
      and a proper fix requires redesign (and benchmarking) of the batched
      wakeup code: so let's plug the hole with this bandaid for now.
      
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Link: https://lore.kernel.org/r/9c2038a7-cdc5-5ee-854c-fbc6168bf16@google.com
      
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. Sep 29, 2022
  6. Sep 28, 2022
  7. Sep 27, 2022