  1. Jan 11, 2022
    • ubifs: Fix to add refcount once page is set private · 3b67db8a
      Zhihao Cheng authored
      The MM rule [1] is clear: once a page has the PG_private flag set, its
      refcount must be incremented. Main flows such as pageout() and
      migrate_page() assume one additional page reference if
      page_has_private() returns true. Otherwise, we may get a BUG in page
      migration:
      
        page:0000000080d05b9d refcount:-1 mapcount:0 mapping:000000005f4d82a8
        index:0xe2 pfn:0x14c12
        aops:ubifs_file_address_operations [ubifs] ino:8f1 dentry name:"f30e"
        flags: 0x1fffff80002405(locked|uptodate|owner_priv_1|private|node=0|
        zone=1|lastcpupid=0x1fffff)
        page dumped because: VM_BUG_ON_PAGE(page_count(page) != 0)
        ------------[ cut here ]------------
        kernel BUG at include/linux/page_ref.h:184!
        invalid opcode: 0000 [#1] SMP
        CPU: 3 PID: 38 Comm: kcompactd0 Not tainted 5.15.0-rc5
        RIP: 0010:migrate_page_move_mapping+0xac3/0xe70
        Call Trace:
          ubifs_migrate_page+0x22/0xc0 [ubifs]
          move_to_new_page+0xb4/0x600
          migrate_pages+0x1523/0x1cc0
          compact_zone+0x8c5/0x14b0
          kcompactd+0x2bc/0x560
          kthread+0x18c/0x1e0
          ret_from_fork+0x1f/0x30
      
      First, we should clarify what the refcount means for a page obtained
      from grab_cache_page_write_begin(). There are 2 situations:
      Situation 1: refcount is 3, page is created by __page_cache_alloc.
        TYPE_A - the write process is using this page
        TYPE_B - page is assigned to one certain mapping by calling
      	   __add_to_page_cache_locked()
        TYPE_C - page is added into pagevec list corresponding current cpu by
      	   calling lru_cache_add()
      Situation 2: refcount is 2, page is gotten from the mapping's tree
        TYPE_B - page has been assigned to one certain mapping
        TYPE_A - the write process is using this page (by calling
      	   page_cache_get_speculative())
      The filesystem releases one refcount by calling put_page() in
      xxx_write_end(); the released refcount corresponds to TYPE_A (the write
      task is using the page). If any process is still using a page, page
      migration skips it by checking whether expected_page_refs() equals the
      page refcount.
      
      The BUG is caused by following process:
          PA(cpu 0)                           kcompactd(cpu 1)
      				compact_zone
      ubifs_write_begin
        page_a = grab_cache_page_write_begin
          add_to_page_cache_lru
            lru_cache_add
              pagevec_add // put page into cpu 0's pagevec
        (refcnt = 3, for page creation process)
      ubifs_write_end
        SetPagePrivate(page_a) // doesn't increase page count !
        unlock_page(page_a)
        put_page(page_a)  // refcnt = 2
      				[...]
      
          PB(cpu 0)
      filemap_read
        filemap_get_pages
          add_to_page_cache_lru
            lru_cache_add
              __pagevec_lru_add // traverse all pages in cpu 0's pagevec
      	  __pagevec_lru_add_fn
      	    SetPageLRU(page_a)
      				isolate_migratepages
                                        isolate_migratepages_block
      				    get_page_unless_zero(page_a)
      				    // refcnt = 3
                                            list_add(page_a, from_list)
      				migrate_pages(from_list)
      				  __unmap_and_move
      				    move_to_new_page
      				      ubifs_migrate_page(page_a)
      				        migrate_page_move_mapping
      					  expected_page_refs get 3
                                        (migration[1] + mapping[1] + private[1])
      	 release_pages
      	   put_page_testzero(page_a) // refcnt = 3
                                                page_ref_freeze  // refcnt = 0
      	     page_ref_dec_and_test(0 - 1 = -1)
                                                page_ref_unfreeze
                                                  VM_BUG_ON_PAGE(-1 != 0, page)
      
      UBIFS doesn't increase the page refcount after setting the private flag,
      so the page migration task believes the page is not used by any other
      process and migrates it. This causes concurrent access to the page
      refcount between put_page() called by another process (e.g. a read
      process calling lru_cache_add) and page_ref_unfreeze() called by the
      migration task.
      
      Actually zhangjun has tried to fix this problem [2] by recalculating page
      refcnt in ubifs_migrate_page(). It's better to follow MM rules [1], because
      just like Kirill suggested in [2], we need to check all users of
      page_has_private() helper. Like f2fs does in [3], fix it by adding/deleting
      refcount when setting/clearing private for a page. BTW, according to [4],
      we set 'page->private' as 1 because ubifs just simply SetPagePrivate().
      And, [5] provided a common helper to set/clear page private, ubifs can
      use this helper following the example of iomap, afs, btrfs, etc.
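
A user-space toy model can make the refcount arithmetic above concrete. It is a sketch, not kernel code: `struct page_model` and all function names are hypothetical, and the starting refcount of 2 (mapping + pagevec) follows the race diagram in this message. The only rule it encodes is that migration expects one reference for itself, one for the mapping and one for the private flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a page: only the fields the refcount argument needs. */
struct page_model {
	int refcount;
	bool has_mapping;
	bool has_private;
};

/* Buggy pattern: set the private flag without taking a reference. */
void set_private_no_ref(struct page_model *p)
{
	p->has_private = true;
}

/* Fixed pattern: the private flag always comes with one reference. */
void attach_private_with_ref(struct page_model *p)
{
	p->refcount++;
	p->has_private = true;
}

/* References migration expects: its own, the mapping's, the private one. */
int expected_refs(const struct page_model *p)
{
	return 1 + (p->has_mapping ? 1 : 0) + (p->has_private ? 1 : 0);
}

/*
 * Model of the migration check: the isolating task takes a reference,
 * then migrates only if nobody else holds one.
 */
bool migration_proceeds(struct page_model *p)
{
	p->refcount++;	/* models get_page_unless_zero() during isolation */
	return p->refcount == expected_refs(p);
}

/* Page state after write_end while a per-CPU pagevec still holds a ref. */
struct page_model page_after_write_end(bool fixed)
{
	/* mapping reference + pagevec reference */
	struct page_model p = { .refcount = 2, .has_mapping = true };

	if (fixed)
		attach_private_with_ref(&p);
	else
		set_private_no_ref(&p);
	return p;
}
```

With the flag set but no reference taken, the check passes even though the pagevec still holds the page. Once the private flag carries its own reference, the counts differ and the model backs off.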
      
      See [6] for a reproducer.
      
      [1] https://lore.kernel.org/lkml/2b19b3c4-2bc4-15fa-15cc-27a13e5c7af1@aol.com
      [2] https://www.spinics.net/lists/linux-mtd/msg04018.html
      [3] http://lkml.iu.edu/hypermail/linux/kernel/1903.0/03313.html
      [4] https://lore.kernel.org/linux-f2fs-devel/20210422154705.GO3596236@casper.infradead.org
      [5] https://lore.kernel.org/all/20200517214718.468-1-guoqing.jiang@cloud.ionos.com
      [6] https://bugzilla.kernel.org/show_bug.cgi?id=214961
      
      Fixes: 1e51764a ("UBIFS: add new flash file system")
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock() · 4f2262a3
      Zhihao Cheng authored
      Function ubifs_wbuf_write_nolock() may access buf out of bounds in
      following process:
      
      ubifs_wbuf_write_nolock():
        aligned_len = ALIGN(len, 8);   // Assume len = 4089, aligned_len = 4096
        if (aligned_len <= wbuf->avail) ... // Not satisfy
        if (wbuf->used) {
          ubifs_leb_write()  // Fill some data in avail wbuf
          len -= wbuf->avail;   // len is still not 8-bytes aligned
          aligned_len -= wbuf->avail;
        }
        n = aligned_len >> c->max_write_shift;
        if (n) {
          n <<= c->max_write_shift;
          err = ubifs_leb_write(c, wbuf->lnum, buf + written,
                                wbuf->offs, n);
          // n > len, read out of bounds less than 8(n-len) bytes
        }
      
      This can be caught by KASAN:
        =========================================================
        BUG: KASAN: slab-out-of-bounds in ecc_sw_hamming_calculate+0x1dc/0x7d0
        Read of size 4 at addr ffff888105594ff8 by task kworker/u8:4/128
        Workqueue: writeback wb_workfn (flush-ubifs_0_0)
        Call Trace:
          kasan_report.cold+0x81/0x165
          nand_write_page_swecc+0xa9/0x160
          ubifs_leb_write+0xf2/0x1b0 [ubifs]
          ubifs_wbuf_write_nolock+0x421/0x12c0 [ubifs]
          write_head+0xdc/0x1c0 [ubifs]
          ubifs_jnl_write_inode+0x627/0x960 [ubifs]
          wb_workfn+0x8af/0xb80
      
      ubifs_wbuf_write_nolock() accepts a 'len' that is not 8-byte aligned.
      'len' is the true length of buf (which is allocated in 'ubifs_jnl_xxx',
      e.g. ubifs_jnl_write_inode), so ubifs_wbuf_write_nolock() must never
      read more than 'len' bytes from 'buf' when writing the LEB.
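
The arithmetic behind the over-read can be replayed in user space. The sketch below is illustrative only: the helper names are hypothetical, and the values `avail = 1024` and `max_write_shift = 10` are assumed for the example (the message only fixes `len = 4089`). Bounding the bulk write by the true length is one possible way to stay inside `buf`, not necessarily what the patch does; the unaligned tail would still have to go through a padded buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of 8, like ALIGN(len, 8). */
size_t align8(size_t len)
{
	return (len + 7) & ~(size_t)7;
}

/*
 * Bytes the original flow passes to the bulk write: aligned_len rounded
 * down to a multiple of the max write size (1 << max_write_shift).
 */
size_t bulk_bytes_from_aligned(size_t aligned_len, unsigned int max_write_shift)
{
	return (aligned_len >> max_write_shift) << max_write_shift;
}

/*
 * Bytes that can be read straight from the caller's buffer without
 * passing its end: round the true remaining length down instead.
 */
size_t bulk_bytes_from_len(size_t len, unsigned int max_write_shift)
{
	return (len >> max_write_shift) << max_write_shift;
}

/* How far a read of n bytes runs past a buffer holding len bytes. */
size_t overread(size_t n, size_t len)
{
	return n > len ? n - len : 0;
}
```

With the assumed numbers, 3065 bytes remain after the write buffer is filled while the aligned length is 3072, so the bulk write computed from the aligned length reads 7 bytes past the end of `buf`.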
      
      Fetch a reproducer in [Link].
      
      Fixes: 1e51764a ("UBIFS: add new flash file system")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=214785
      Reported-by: Chengsong Ke <kechengsong@huawei.com>
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • ubifs: setflags: Make dirtied_ino_d 8 bytes aligned · 1b83ec05
      Zhihao Cheng authored
      Make 'ui->data_len' 8-byte aligned before it is assigned to
      dirtied_ino_d. Since 8871d84c ("ubifs: convert to fileattr") was
      applied, 'setflags()' only affects regular files and directories, and
      only xattr inodes, symlink inodes and special inodes
      (pipe/char_dev/block_dev) have a non-zero 'ui->data_len' field, so the
      assertion '!(req->dirtied_ino_d & 7)' cannot fail in
      ubifs_budget_space(). To keep the assertion from failing after future
      changes (e.g. setflags operating on special inodes), it's better to
      make dirtied_ino_d 8-byte aligned; the aligned size is still zero for
      regular files.
      
      Fixes: 1e51764a ("UBIFS: add new flash file system")
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • ubifs: Rectify space amount budget for mkdir/tmpfile operations · a6dab660
      Zhihao Cheng authored
      UBIFS should make sure the flash has enough space to store dirty data
      (in-memory data that is newer than what is on disk); the space budget
      is designed to do exactly that. If the space budget accounts for less
      data than we need, 'make_reservation()' will do more work (it returns
      -ENOSPC if no free space is left; sometimes we can see "cannot reserve
      xxx bytes in jhead xxx, error -28" in ubifs error messages) with ubifs
      inodes locked, which may affect other syscalls.

      A simple way to decide how much space a budget needs: see how much
      space 'make_reservation()' requests in the ubifs_jnl_xxx() function
      for the corresponding operation.
      
      It's better to report ENOSPC in ubifs_budget_space(), as early as we can.
      
      Fixes: 474b9370 ("ubifs: Implement O_TMPFILE")
      Fixes: 1e51764a ("UBIFS: add new flash file system")
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work · 60eb3b9c
      Zhihao Cheng authored
      'ui->dirty' is not protected by 'ui_mutex' in do_tmpfile(), which may
      race with ubifs_write_inode [wb_workfn] accessing/updating 'ui->dirty',
      so the dirty space is eventually released twice.
      
      	open(O_TMPFILE)                wb_workfn
      do_tmpfile
        ubifs_budget_space(ino_req = { .dirtied_ino = 1})
        d_tmpfile // mark inode(tmpfile) dirty
        ubifs_jnl_update // without holding tmpfile's ui_mutex
          mark_inode_clean(ui)
            if (ui->dirty)
              ubifs_release_dirty_inode_budget(ui)  // release first time
                                         ubifs_write_inode
      				     mutex_lock(&ui->ui_mutex)
                                           ubifs_release_dirty_inode_budget(ui)
      				     // release second time
      				     mutex_unlock(&ui->ui_mutex)
            ui->dirty = 0
      
      Running generic/476 easily reproduces the following message
      (see the reproducer in [Link]):
      
        UBIFS error (ubi0:0 pid 2578): ubifs_assert_failed [ubifs]: UBIFS assert
        failed: c->bi.dd_growth >= 0, in fs/ubifs/budget.c:554
        UBIFS warning (ubi0:0 pid 2578): ubifs_ro_mode [ubifs]: switched to
        read-only mode, error -22
        Workqueue: writeback wb_workfn (flush-ubifs_0_0)
        Call Trace:
          ubifs_ro_mode+0x54/0x60 [ubifs]
          ubifs_assert_failed+0x4b/0x80 [ubifs]
          ubifs_release_budget+0x468/0x5a0 [ubifs]
          ubifs_release_dirty_inode_budget+0x53/0x80 [ubifs]
          ubifs_write_inode+0x121/0x1f0 [ubifs]
          ...
          wb_workfn+0x283/0x7b0
      
      Fix it by holding tmpfile ubifs inode lock during ubifs_jnl_update().
      Similar problem exists in whiteout renaming, but previous fix("ubifs:
      Rename whiteout atomically") has solved the problem.
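
The double release can be replayed deterministically in a single-threaded model. This is a sketch under stated assumptions: `struct budget_model` and the function names are hypothetical, the budget is reduced to a counter that starts at 1, and the interleaving is the one drawn in the diagram above (both paths see the inode as dirty before either clears the flag).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state: one dirty flag and the budget counter it guards. */
struct budget_model {
	bool dirty;
	long dd_growth;	/* budget reserved for dirty inode data */
};

/* Release the budget; kept separate so interleavings can be replayed. */
void release_budget(struct budget_model *m)
{
	m->dd_growth -= 1;
}

/*
 * Unlocked interleaving: both paths test 'dirty' before either clears
 * it, so the budget is released twice.
 */
long replay_unlocked(void)
{
	struct budget_model m = { .dirty = true, .dd_growth = 1 };
	bool tmpfile_sees_dirty = m.dirty;	/* do_tmpfile path */
	bool writeback_sees_dirty = m.dirty;	/* wb_workfn path */

	if (tmpfile_sees_dirty)
		release_budget(&m);
	if (writeback_sees_dirty)
		release_budget(&m);
	m.dirty = false;
	return m.dd_growth;
}

/* Test, release and clear as one step, as if done under one mutex. */
void release_if_dirty(struct budget_model *m)
{
	if (m->dirty) {
		release_budget(m);
		m->dirty = false;
	}
}

/* Serialized version: the second caller finds the flag already clear. */
long replay_serialized(void)
{
	struct budget_model m = { .dirty = true, .dd_growth = 1 };

	release_if_dirty(&m);	/* do_tmpfile path */
	release_if_dirty(&m);	/* wb_workfn path */
	return m.dd_growth;
}
```

In the unlocked replay the counter ends below zero, which corresponds to the failed `c->bi.dd_growth >= 0` assertion; in the serialized replay it ends at zero.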
      
      Fixes: 474b9370 ("ubifs: Implement O_TMPFILE")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=214765
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • ubifs: Rename whiteout atomically · 278d9a24
      Zhihao Cheng authored
      Currently, rename whiteout has 3 steps:
        1. create tmpfile(which associates old dentry to tmpfile inode) for
           whiteout, and store tmpfile to disk
        2. link whiteout, associate whiteout inode to old dentry again and
           store old dentry, old inode, new dentry on disk
        3. writeback dirty whiteout inode to disk
      
      A sudden power cut or an error (e.g. ENOSPC returned by the budget, or
      a memory allocation failure) during the above steps may cause several
      kinds of problems:
        Problem 1: ENOSPC returned by whiteout space budget (before step 2),
      	     old dentry will disappear after rename syscall, whiteout file
      	     cannot be found either.
      
      	     ls dir  // we get file, whiteout
      	     rename(dir/file, dir/whiteout, RENAME_WHITEOUT)
      	     ENOSPC = ubifs_budget_space(&wht_req) // return
      	     ls dir  // empty (no file, no whiteout)
        Problem 2: Power-cut happens before step 3, whiteout inode with 'nlink=1'
      	     is not stored on disk, whiteout dentry(old dentry) is written
      	     on disk, whiteout file is lost on next mount (We get "dead
      	     directory entry" after executing 'ls -l' on whiteout file).
      
      Now, we use following 3 steps to finish rename whiteout:
        1. create an in-mem inode with 'nlink = 1' as whiteout
        2. ubifs_jnl_rename (Write on disk to finish associating old dentry to
           whiteout inode, associating new dentry with old inode)
        3. iput(whiteout)
      
      Rely on ubifs_jnl_rename() to write the in-mem inode to disk and finish
      the rename whiteout, which avoids the intermediate on-disk states
      caused by a sudden power cut or an error.
      
      Fixes: 9e0a1fff ("ubifs: Implement RENAME_WHITEOUT")
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
  2. Jan 10, 2022
    • ubifs: Add missing iput if do_tmpfile() failed in rename whiteout · 716b4573
      Zhihao Cheng authored
      whiteout inode should be put when do_tmpfile() failed if inode has been
      initialized. Otherwise we will get following warning during umount:
        UBIFS error (ubi0:0 pid 1494): ubifs_assert_failed [ubifs]: UBIFS
        assert failed: c->bi.dd_growth == 0, in fs/ubifs/super.c:1930
        VFS: Busy inodes after unmount of ubifs. Self-destruct in 5 seconds.
      
      Fixes: 9e0a1fff ("ubifs: Implement RENAME_WHITEOUT")
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Suggested-by: Sascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • ubifs: Fix wrong number of inodes locked by ui_mutex in ubifs_inode comment · 7a8884fe
      Zhihao Cheng authored
      Since 9ec64962 ("ubifs: Implement RENAME_EXCHANGE") and
      9e0a1fff ("ubifs: Implement RENAME_WHITEOUT") were applied,
      ubifs_rename locks and changes 4 ubifs inodes. Correct the comment
      for ui_mutex in ubifs_inode.
      
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • ubifs: Fix deadlock in concurrent rename whiteout and inode writeback · afd42704
      Zhihao Cheng authored
      Following hung tasks:
      [   77.028764] task:kworker/u8:4    state:D stack:    0 pid:  132
      [   77.028820] Call Trace:
      [   77.029027]  schedule+0x8c/0x1b0
      [   77.029067]  mutex_lock+0x50/0x60
      [   77.029074]  ubifs_write_inode+0x68/0x1f0 [ubifs]
      [   77.029117]  __writeback_single_inode+0x43c/0x570
      [   77.029128]  writeback_sb_inodes+0x259/0x740
      [   77.029148]  wb_writeback+0x107/0x4d0
      [   77.029163]  wb_workfn+0x162/0x7b0
      
      [   92.390442] task:aa              state:D stack:    0 pid: 1506
      [   92.390448] Call Trace:
      [   92.390458]  schedule+0x8c/0x1b0
      [   92.390461]  wb_wait_for_completion+0x82/0xd0
      [   92.390469]  __writeback_inodes_sb_nr+0xb2/0x110
      [   92.390472]  writeback_inodes_sb_nr+0x14/0x20
      [   92.390476]  ubifs_budget_space+0x705/0xdd0 [ubifs]
      [   92.390503]  do_rename.cold+0x7f/0x187 [ubifs]
      [   92.390549]  ubifs_rename+0x8b/0x180 [ubifs]
      [   92.390571]  vfs_rename+0xdb2/0x1170
      [   92.390580]  do_renameat2+0x554/0x770
      
      These hung tasks are caused by concurrent rename whiteout and inode
      writeback processes:
      	rename_whiteout(Thread 1)	        wb_workfn(Thread2)
      ubifs_rename
        do_rename
          lock_4_inodes (Hold ui_mutex)
          ubifs_budget_space
            make_free_space
              shrink_liability
      	  __writeback_inodes_sb_nr
      	    bdi_split_work_to_wbs (Queue new wb work)
      					      wb_do_writeback(wb work)
      						__writeback_single_inode
      					          ubifs_write_inode
      					            LOCK(ui_mutex)
      							   ↑
      	      wb_wait_for_completion (Wait wb work) <-- deadlock!
      
      Reproducer (Detail program in [Link]):
        1. SYS_renameat2("/mp/dir/file", "/mp/dir/whiteout", RENAME_WHITEOUT)
        2. Exhaust the free space before the kernel (slowed down with mdelay)
           does the budget for whiteout
      
      Fix it by doing whiteout space budget before locking ubifs inodes.
      BTW, it also fixes wrong goto tag 'out_release' in whiteout budget
      error handling path(It should at least recover dir i_size and unlock
      4 ubifs inodes).
      
      Fixes: 9e0a1fff ("ubifs: Implement RENAME_WHITEOUT")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=214733
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • ubifs: rename_whiteout: Fix double free for whiteout_ui->data · 40a8f0d5
      Zhihao Cheng authored
      'whiteout_ui->data' will be freed twice if the space budget fails for
      the rename whiteout operation, as in the following process:
      
      rename_whiteout
        dev = kmalloc
        whiteout_ui->data = dev
        kfree(whiteout_ui->data)  // Free first time
        iput(whiteout)
          ubifs_free_inode
            kfree(ui->data)	    // Double free!
      
      KASAN reports:
      ==================================================================
      BUG: KASAN: double-free or invalid-free in ubifs_free_inode+0x4f/0x70
      Call Trace:
        kfree+0x117/0x490
        ubifs_free_inode+0x4f/0x70 [ubifs]
        i_callback+0x30/0x60
        rcu_do_batch+0x366/0xac0
        __do_softirq+0x133/0x57f
      
      Allocated by task 1506:
        kmem_cache_alloc_trace+0x3c2/0x7a0
        do_rename+0x9b7/0x1150 [ubifs]
        ubifs_rename+0x106/0x1f0 [ubifs]
        do_syscall_64+0x35/0x80
      
      Freed by task 1506:
        kfree+0x117/0x490
        do_rename.cold+0x53/0x8a [ubifs]
        ubifs_rename+0x106/0x1f0 [ubifs]
        do_syscall_64+0x35/0x80
      
      The buggy address belongs to the object at ffff88810238bed8 which
      belongs to the cache kmalloc-8 of size 8
      ==================================================================
      
      Let ubifs_free_inode() free 'whiteout_ui->data'. BTW, delete the unused
      assignment 'whiteout_ui->data_len = 0'; the process 'ubifs_evict_inode()
      -> ubifs_jnl_delete_inode() -> ubifs_jnl_write_inode()' doesn't need it
      (because 'inc_nlink(whiteout)' won't be executed after 'goto
      out_release', and the nlink of the whiteout inode is 0).
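
The ownership rule behind the fix, "only the inode destructor frees the data", can be sketched in user space. This is a toy model with hypothetical names (`struct inode_model`, `free_inode_model`, `whiteout_error_path`), using malloc/free in place of kmalloc/kfree; the counter exists only so the single free can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts how many times inode data is actually freed. */
int data_free_calls;

/* Toy inode that owns its 'data' buffer. */
struct inode_model {
	void *data;
};

/* Destructor: the single place that frees 'data'. */
void free_inode_model(struct inode_model *ino)
{
	if (ino->data) {
		free(ino->data);
		data_free_calls++;
	}
	free(ino);
}

/*
 * Error path after the fix: do not free ino->data here, just drop the
 * inode and let the destructor release everything once.
 * Returns how many times 'data' was freed, or -1 if allocation failed.
 */
int whiteout_error_path(void)
{
	struct inode_model *ino = calloc(1, sizeof(*ino));

	if (!ino)
		return -1;
	ino->data = malloc(8);
	if (!ino->data) {
		free(ino);
		return -1;
	}

	data_free_calls = 0;
	/* The buggy flow also freed ino->data at this point. */
	free_inode_model(ino);
	return data_free_calls;
}
```

Keeping a single owner avoids having to reset the pointer after an early free, which is the other common way to prevent this kind of double free.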
      
      Fixes: 9e0a1fff ("ubifs: Implement RENAME_WHITEOUT")
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • ubi: Fix race condition between ctrl_cdev_ioctl and ubi_cdev_ioctl · 3cbf0e39
      Baokun Li authored
      Hulk Robot reported a KASAN report about use-after-free:
       ==================================================================
       BUG: KASAN: use-after-free in __list_del_entry_valid+0x13d/0x160
       Read of size 8 at addr ffff888035e37d98 by task ubiattach/1385
       [...]
       Call Trace:
        klist_dec_and_del+0xa7/0x4a0
        klist_put+0xc7/0x1a0
        device_del+0x4d4/0xed0
        cdev_device_del+0x1a/0x80
        ubi_attach_mtd_dev+0x2951/0x34b0 [ubi]
        ctrl_cdev_ioctl+0x286/0x2f0 [ubi]
      
       Allocated by task 1414:
        device_add+0x60a/0x18b0
        cdev_device_add+0x103/0x170
        ubi_create_volume+0x1118/0x1a10 [ubi]
        ubi_cdev_ioctl+0xb7f/0x1ba0 [ubi]
      
       Freed by task 1385:
        cdev_device_del+0x1a/0x80
        ubi_remove_volume+0x438/0x6c0 [ubi]
        ubi_cdev_ioctl+0xbf4/0x1ba0 [ubi]
       [...]
       ==================================================================
      
      The lock held by ctrl_cdev_ioctl is ubi_devices_mutex, but the lock held
      by ubi_cdev_ioctl is ubi->device_mutex. Therefore, the two ioctls can
      run concurrently.
      
      ctrl_cdev_ioctl contains two operations: ubi_attach and ubi_detach.
      ubi_detach is bug-free because it uses reference counting to prevent
      concurrency. However, uif_init and uif_close in ubi_attach may race with
      ubi_cdev_ioctl.
      
      uif_init will race with ubi_cdev_ioctl as in the following stack.
                 cpu1                   cpu2                  cpu3
      _______________________|________________________|______________________
      ctrl_cdev_ioctl
       ubi_attach_mtd_dev
        uif_init
                                 ubi_cdev_ioctl
                                  ubi_create_volume
                                   cdev_device_add
         ubi_add_volume
         // sysfs exist
         kill_volumes
                                                          ubi_cdev_ioctl
                                                           ubi_remove_volume
                                                            cdev_device_del
                                                             // first free
          ubi_free_volume
           cdev_del
           // double free
         cdev_device_del
      
      And uif_close will race with ubi_cdev_ioctl as in the following stack.
                 cpu1                   cpu2                  cpu3
      _______________________|________________________|______________________
      ctrl_cdev_ioctl
       ubi_attach_mtd_dev
        uif_init
                                 ubi_cdev_ioctl
                                  ubi_create_volume
                                   cdev_device_add
        ubi_debugfs_init_dev
        //error goto out_uif;
        uif_close
         kill_volumes
                                                          ubi_cdev_ioctl
                                                           ubi_remove_volume
                                                            cdev_device_del
                                                             // first free
          ubi_free_volume
          // double free
      
      The cause of this problem is that commit 714fb87e made the device
      "available" before it becomes accessible via sysfs. Therefore, we
      roll back that modification. We fix the race condition between
      ubi device creation and udev by removing ubi_get_device in
      vol_attribute_show and dev_attribute_show. This avoids accessing
      uninitialized ubi_devices[ubi_num].
      
      ubi_get_device is used to prevent devices from being deleted during
      sysfs execution. However, kernfs now ensures that devices will not
      be deleted before all references are released.
      The key process is shown in the following stack.
      
      device_del
        device_remove_attrs
          device_remove_groups
            sysfs_remove_groups
              sysfs_remove_group
                remove_files
                  kernfs_remove_by_name
                    kernfs_remove_by_name_ns
                      __kernfs_remove
                        kernfs_drain
      
      Fixes: 714fb87e ("ubi: Fix race condition between ubi device creation and udev")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
  3. Dec 24, 2021
    • jffs2: GC deadlock reading a page that is used in jffs2_write_begin() · aa39cc67
      Kyeong Yoo authored
      
      
      The GC task can deadlock in read_cache_page() because it may attempt
      to release a page that is actually allocated by another task in
      jffs2_write_begin().
      The reason is that in jffs2_write_begin() there is a small window
      in which a cache page is allocated for use but not yet set Uptodate.
      
      This ends up with a deadlock between two tasks:
      1) A task (e.g. file copy)
         - jffs2_write_begin() locks a cache page
         - jffs2_write_end() tries to lock "alloc_sem" from
      	 jffs2_reserve_space() <-- STUCK
      2) GC task (jffs2_gcd_mtd3)
         - jffs2_garbage_collect_pass() locks "alloc_sem"
         - try to lock the same cache page in read_cache_page() <-- STUCK
      
      So to avoid this deadlock, hold "alloc_sem" in jffs2_write_begin()
      while reading data in a cache page.
      
      Signed-off-by: Kyeong Yoo <kyeong.yoo@alliedtelesis.co.nz>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • ubifs: read-only if LEB may always be taken in ubifs_garbage_collect · 50cb4373
      Baokun Li authored
      
      
      If ubifs_garbage_collect_leb() returns -EAGAIN and ubifs_return_leb
      returns an error, the LEB will keep its "taken" flag forever. In this
      case, set the ubifs to read-only to prevent a worse situation.
      
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • ubifs: fix double return leb in ubifs_garbage_collect · 0d765021
      Baokun Li authored
      
      
      If ubifs_garbage_collect_leb() returns -EAGAIN and enters the "out"
      branch, ubifs_return_leb will execute twice on the same lnum. This
      can cause data loss in concurrency situations.
      
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • ubifs: fix slab-out-of-bounds in ubifs_change_lp · 88618fee
      Baokun Li authored
      
      
      Hulk Robot reported a KASAN report about slab-out-of-bounds:
       ==================================================================
       BUG: KASAN: slab-out-of-bounds in ubifs_change_lp+0x3a9/0x1390 [ubifs]
       Read of size 8 at addr ffff888101c961f8 by task fsstress/1068
       [...]
       Call Trace:
        check_memory_region+0x1c1/0x1e0
        ubifs_change_lp+0x3a9/0x1390 [ubifs]
        ubifs_change_one_lp+0x170/0x220 [ubifs]
        ubifs_garbage_collect+0x7f9/0xda0 [ubifs]
        ubifs_budget_space+0xfe4/0x1bd0 [ubifs]
        ubifs_write_begin+0x528/0x10c0 [ubifs]
      
       Allocated by task 1068:
        kmemdup+0x25/0x50
        ubifs_lpt_lookup_dirty+0x372/0xb00 [ubifs]
        ubifs_update_one_lp+0x46/0x260 [ubifs]
        ubifs_tnc_end_commit+0x98b/0x1720 [ubifs]
        do_commit+0x6cb/0x1950 [ubifs]
        ubifs_run_commit+0x15a/0x2b0 [ubifs]
        ubifs_budget_space+0x1061/0x1bd0 [ubifs]
        ubifs_write_begin+0x528/0x10c0 [ubifs]
       [...]
       ==================================================================
      
      In ubifs_garbage_collect(), if ubifs_find_dirty_leb returns an error,
      lp is an uninitialized variable. But lp.lnum, which then holds a random
      value, might be used in the out branch. If the value is -1 or another
      value that can pass the check, a slab-out-of-bounds access may occur in
      ubifs_change_lp() in the following procedure.

      To solve this problem, we initialize lp.lnum to -1 and let
      ubifs_find_dirty_leb set it to a valid value (never -1);
      ubifs_return_leb is executed only when lp.lnum != -1.

      If a retained or indexing LEB is found and the loop continues, but
      breaks before finding another LEB, the "taken" flag of this LEB will be
      cleared in ubifs_return_leb(). This bug is also fixed in this patch.
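
The sentinel pattern described above can be sketched as follows. This is a user-space illustration, not the patched function: `struct lprops_model`, `find_dirty_leb_model`, `return_leb_model` and the LEB number 42 are all hypothetical, and the control flow is reduced to a single pass with one common exit path.

```c
#include <assert.h>

#define NO_LEB (-1)

struct lprops_model {
	int lnum;
};

int returned_lebs;	/* how many times a LEB was handed back */

/* Hypothetical finder: fills in lp only on success. */
int find_dirty_leb_model(struct lprops_model *lp, int fail)
{
	if (fail)
		return -1;	/* lp is left untouched on error */
	lp->lnum = 42;
	return 0;
}

void return_leb_model(int lnum)
{
	(void)lnum;
	returned_lebs++;
}

/* One pass: start from the sentinel, return only a LEB that was taken. */
int gc_pass_model(int fail)
{
	struct lprops_model lp = { .lnum = NO_LEB };
	int err;

	returned_lebs = 0;
	err = find_dirty_leb_model(&lp, fail);

	/* Common exit path, reached on success and on error alike. */
	if (lp.lnum != NO_LEB)
		return_leb_model(lp.lnum);
	return err;
}
```

Because the sentinel is set before the finder runs, the exit path never acts on an indeterminate LEB number when the lookup fails.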
      
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • ubifs: fix snprintf() length check · d3de970b
      Dan Carpenter authored
      
      
      The snprintf() function returns the number of bytes (not including the
      NUL terminator) which would have been printed if there were enough
      space.  So it can be greater than UBIFS_DFS_DIR_LEN.  And actually if
      it equals UBIFS_DFS_DIR_LEN then that's okay so this check is too
      strict.
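
The return-value rule described above can be checked with the standard C snprintf(), which follows the same convention. The sketch is illustrative: `DIR_LEN` is a hypothetical stand-in for UBIFS_DFS_DIR_LEN, the "ubi%d_%d" format and the function name are assumptions, and the buffer is assumed to hold DIR_LEN characters plus the NUL terminator.

```c
#include <assert.h>
#include <stdio.h>

#define DIR_LEN 8	/* hypothetical maximum name length, without the NUL */

/*
 * Format "ubi<dev>_<vol>" into a DIR_LEN + 1 byte buffer.
 * Returns 0 on success, -1 if the name was truncated.
 */
int format_dir_name(char buf[DIR_LEN + 1], int dev, int vol)
{
	int n = snprintf(buf, DIR_LEN + 1, "ubi%d_%d", dev, vol);

	if (n < 0)
		return -1;	/* output error */
	/*
	 * n is the length the full string needs, NUL excluded. n == DIR_LEN
	 * still fits, so only n > DIR_LEN means truncation.
	 */
	if (n > DIR_LEN)
		return -1;
	return 0;
}
```

A name of exactly DIR_LEN characters, such as the one produced for device 12 and volume 34, is accepted; one character more is reported as truncated.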
      
      Fixes: 9a620291fc01 ("ubifs: Export filesystem error counters")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • ubifs: Document sysfs nodes · 58225631
      Stefan Schaeckeler authored
      
      
      Add documentation for the new sysfs nodes
      
       /sys/fs/ubifs/ubiX_Y/errors_magic
       /sys/fs/ubifs/ubiX_Y/errors_node
       /sys/fs/ubifs/ubiX_Y/errors_crc
      
      Signed-off-by: Stefan Schaeckeler <sschaeck@cisco.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • ubifs: Export filesystem error counters · 2e3cbf42
      Stefan Schaeckeler authored
      
      
      Not all ubifs filesystem errors are propagated to userspace.
      
      Export bad magic, bad node and crc errors via sysfs. This allows userspace
      to notice filesystem errors:
      
       /sys/fs/ubifs/ubiX_Y/errors_magic
       /sys/fs/ubifs/ubiX_Y/errors_node
       /sys/fs/ubifs/ubiX_Y/errors_crc
      
      The counters are reset to 0 with a remount.
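
A user-space monitor could poll these nodes roughly as sketched below. The sketch assumes each node contains a single decimal number followed by a newline, which the message does not state; the function names are hypothetical, and the example path uses ubi0_0 as one possible instance of ubiX_Y.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse one counter value; assumes the node holds one decimal number. */
int parse_counter(const char *text, unsigned long *out)
{
	char *end;

	errno = 0;
	*out = strtoul(text, &end, 10);
	if (errno || end == text)
		return -1;
	return 0;
}

/* Read a counter file such as /sys/fs/ubifs/ubi0_0/errors_crc. */
int read_counter(const char *path, unsigned long *out)
{
	char line[64];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = parse_counter(line, out);
	fclose(f);
	return ret;
}
```

Since the counters are reset on remount, a monitor built this way would need to treat a value lower than the previous reading as a restart rather than as an error.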
      
      Signed-off-by: Stefan Schaeckeler <sschaeck@cisco.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • ubifs: Error path in ubifs_remount_rw() seems to wrongly free write buffers · 3fea4d9d
      Petr Cvachoucek authored
      It seems that freeing the write buffers in the error path of
      ubifs_remount_rw() is wrong. It later leads to a kernel oops like this:
      
      [10016.431274] UBIFS (ubi0:0): start fixing up free space
      [10090.810042] UBIFS (ubi0:0): free space fixup complete
      [10090.814623] UBIFS error (ubi0:0 pid 512): ubifs_remount_fs: cannot
      spawn "ubifs_bgt0_0", error -4
      [10101.915108] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started,
      PID 517
      [10105.275498] Unable to handle kernel NULL pointer dereference at
      virtual address 0000000000000030
      [10105.284352] Mem abort info:
      [10105.287160]   ESR = 0x96000006
      [10105.290252]   EC = 0x25: DABT (current EL), IL = 32 bits
      [10105.295592]   SET = 0, FnV = 0
      [10105.298652]   EA = 0, S1PTW = 0
      [10105.301848] Data abort info:
      [10105.304723]   ISV = 0, ISS = 0x00000006
      [10105.308573]   CM = 0, WnR = 0
      [10105.311564] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000f03d1000
      [10105.318034] [0000000000000030] pgd=00000000f6cee003,
      pud=00000000f4884003, pmd=0000000000000000
      [10105.326783] Internal error: Oops: 96000006 [#1] PREEMPT SMP
      [10105.332355] Modules linked in: ath10k_pci ath10k_core ath mac80211
      libarc4 cfg80211 nvme nvme_core cryptodev(O)
      [10105.342468] CPU: 3 PID: 518 Comm: touch Tainted: G           O
      5.4.3 #1
      [10105.349517] Hardware name: HYPEX CPU (DT)
      [10105.353525] pstate: 40000005 (nZcv daif -PAN -UAO)
      [10105.358324] pc : atomic64_try_cmpxchg_acquire.constprop.22+0x8/0x34
      [10105.364596] lr : mutex_lock+0x1c/0x34
      [10105.368253] sp : ffff000075633aa0
      [10105.371563] x29: ffff000075633aa0 x28: 0000000000000001
      [10105.376874] x27: ffff000076fa80c8 x26: 0000000000000004
      [10105.382185] x25: 0000000000000030 x24: 0000000000000000
      [10105.387495] x23: 0000000000000000 x22: 0000000000000038
      [10105.392807] x21: 000000000000000c x20: ffff000076fa80c8
      [10105.398119] x19: ffff000076fa8000 x18: 0000000000000000
      [10105.403429] x17: 0000000000000000 x16: 0000000000000000
      [10105.408741] x15: 0000000000000000 x14: fefefefefefefeff
      [10105.414052] x13: 0000000000000000 x12: 0000000000000fe0
      [10105.419364] x11: 0000000000000fe0 x10: ffff000076709020
      [10105.424675] x9 : 0000000000000000 x8 : 00000000000000a0
      [10105.429986] x7 : ffff000076fa80f4 x6 : 0000000000000030
      [10105.435297] x5 : 0000000000000000 x4 : 0000000000000000
      [10105.440609] x3 : 0000000000000000 x2 : ffff00006f276040
      [10105.445920] x1 : ffff000075633ab8 x0 : 0000000000000030
      [10105.451232] Call trace:
      [10105.453676]  atomic64_try_cmpxchg_acquire.constprop.22+0x8/0x34
      [10105.459600]  ubifs_garbage_collect+0xb4/0x334
      [10105.463956]  ubifs_budget_space+0x398/0x458
      [10105.468139]  ubifs_create+0x50/0x180
      [10105.471712]  path_openat+0x6a0/0x9b0
      [10105.475284]  do_filp_open+0x34/0x7c
      [10105.478771]  do_sys_open+0x78/0xe4
      [10105.482170]  __arm64_sys_openat+0x1c/0x24
      [10105.486180]  el0_svc_handler+0x84/0xc8
      [10105.489928]  el0_svc+0x8/0xc
      [10105.492808] Code: 52800013 17fffffb d2800003 f9800011 (c85ffc05)
      [10105.498903] ---[ end trace 46b721d93267a586 ]---
      
      To reproduce the problem:
      
      1. Filesystem initially mounted read-only, free space fixup flag set.
      
      2. mount -o remount,rw <mountpoint>
      
      3. it takes some time (free space fixup running)
          ... try to terminate running mount by CTRL-C
          ... does not respond, only after free space fixup is complete
          ... then "ubifs_remount_fs: cannot spawn "ubifs_bgt0_0", error -4"
      
      4. mount -o remount,rw <mountpoint>
          ... now finished instantly (fixup already done).
      
      5. Create file or just unmount the filesystem and we get the oops.
      
      Cc: <stable@vger.kernel.org>
      Fixes: b50b9f40 ("UBIFS: do not free write-buffers when in R/O mode")
      Signed-off-by: Petr Cvachoucek <cvachoucek@gmail.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
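
The failure mode described in this message can be modelled in user space. The sketch below is a simplified illustration, not the kernel patch: `struct fs_state`, `fs_mount_ro()` and `fs_remount_rw()` are hypothetical names, and it assumes (as the title of the commit named in the Fixes tag suggests) that the write buffers stay allocated while the filesystem is read-only, so a failed remount must leave them alone.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-filesystem state. */
struct fs_state {
    char *wbuf;      /* write buffer: set up at mount, kept in R/O mode */
    char *rw_buf;    /* buffer needed only while mounted read-write */
    int   read_only;
};

static int fs_mount_ro(struct fs_state *c)
{
    assert(c != NULL);
    c->wbuf = malloc(64);
    c->rw_buf = NULL;
    c->read_only = 1;
    return c->wbuf ? 0 : -1;
}

/* fail_late simulates a failure after the R/W-only resources exist. */
static int fs_remount_rw(struct fs_state *c, int fail_late)
{
    int err = 0;

    assert(c != NULL);
    c->rw_buf = malloc(64);
    if (!c->rw_buf)
        return -1;
    if (fail_late) {
        err = -4; /* mirrors the "error -4" in the log above */
        goto out;
    }
    c->read_only = 0;
    return 0;

out:
    /* Undo only what this function set up. */
    free(c->rw_buf);
    c->rw_buf = NULL;
    /* c->wbuf is deliberately left alone: the R/O mount still owns it. */
    return err;
}
```

If the error path also freed `wbuf`, the filesystem would be left in read-only mode without its write buffers, and a second, successful remount would later dereference a NULL pointer, which matches steps 3 to 5 of the reproduction.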
    • Cai Huoqing's avatar
      ubifs: Make use of the helper macro kthread_run() · d98c6c35
      Cai Huoqing authored
      
      
      Replace kthread_create/wake_up_process() with kthread_run()
      to simplify the code.
      
      Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • Kai Song's avatar
      ubi: Fix a mistake in comment · bc7849e2
      Kai Song authored
      Fixes: 2a734bb8 ("UBI: use debugfs for the extra checks knobs")
      There is a mistake in the docstring: it should be ubi_debugfs_exit_dev
      instead of dbg_debug_exit_dev.
      
      Signed-off-by: Kai Song <songkai01@inspur.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • Alexander Dahl's avatar
      ubifs: Fix spelling mistakes · 7296c8af
      Alexander Dahl authored
      
      
      Found with `codespell -i 3 -w fs/ubifs/**` and by proofreading those parts.
      
      Signed-off-by: Alexander Dahl <ada@thorsis.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
  4. Dec 20, 2021
  5. Dec 19, 2021
  6. Dec 18, 2021
    • Adrian Hunter's avatar
      perf inject: Fix segfault due to perf_data__fd() without open · c271a55b
      Adrian Hunter authored
      The fixed commit attempts to get the output file descriptor even if the
      file was never opened e.g.
      
        $ perf record uname
        Linux
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ]
        $ perf inject -i perf.data --vm-time-correlation=dry-run
        Segmentation fault (core dumped)
        $ gdb --quiet perf
        Reading symbols from perf...
        (gdb) r inject -i perf.data --vm-time-correlation=dry-run
        Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run
        [Thread debugging using libthread_db enabled]
        Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
      
        Program received signal SIGSEGV, Segmentation fault.
        __GI___fileno (fp=0x0) at fileno.c:35
        35      fileno.c: No such file or directory.
        (gdb) bt
        #0  __GI___fileno (fp=0x0) at fileno.c:35
        #1  0x00005621e48dd987 in perf_data__fd (data=0x7fff4c68bd08) at util/data.h:72
        #2  perf_data__fd (data=0x7fff4c68bd08) at util/data.h:69
        #3  cmd_inject (argc=<optimized out>, argv=0x7fff4c69c1f0) at builtin-inject.c:1017
        #4  0x00005621e4936783 in run_builtin (p=0x5621e4ee6878 <commands+600>, argc=4, argv=0x7fff4c69c1f0) at perf.c:313
        #5  0x00005621e4897d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365
        #6  run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409
        #7  main (argc=4, argv=0x7fff4c69c1f0) at perf.c:539
        (gdb)
      
      Fixes: 0ae03893 ("perf tools: Pass a fd to perf_file_header__read_pipe()")
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: stable@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20211213084829.114772-3-adrian.hunter@intel.com
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
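
The backtrace above ends in fileno() being called with a NULL stream. The sketch below shows one defensive way to look up a descriptor for a file that may never have been opened. It is not the change made by this commit: `struct data_file` and `data_file_fd()` are hypothetical, simplified stand-ins and do not reproduce perf's own structures.

```c
#define _POSIX_C_SOURCE 200809L /* for fileno() */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified description of an output file. */
struct data_file {
    bool  is_open;   /* false until the file has actually been opened */
    bool  use_stdio; /* true: use fptr, false: use fd */
    FILE *fptr;
    int   fd;
};

/* Return the descriptor, or -1 if the file was never opened. */
static int data_file_fd(const struct data_file *d)
{
    assert(d != NULL);
    if (!d->is_open)
        return -1;
    if (d->use_stdio)
        return d->fptr ? fileno(d->fptr) : -1; /* never pass NULL to fileno() */
    return d->fd;
}
```

A caller then checks for -1 before using the result, instead of relying on the stream being valid.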