- Jun 30, 2024
-
-
Kent Overstreet authored
fragmentation_lru derives from dirty_sectors, and wasn't being checked. Co-developed-by:
Daniel Hill <daniel@gluo.nz> Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We need to make sure we're not missing any fragmenation entries in the LRU BTREE after repairing ALLOC BTREE Also, use the new bch2_btree_write_buffer_maybe_flush() helper; this was only working without it before since bucket invalidation (usually) wasn't happening while fsck was running. Co-developed-by:
Daniel Hill <daniel@gluo.nz> Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Add a new helper for checking references to write buffer btrees, where we need a flush before we definitively know we have an inconsistency. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Fixes warnings from bch2_print_allocator_stuck() Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
- Jun 29, 2024
-
-
Kent Overstreet authored
Accidental infinite loop; also fix btree_deadlock_to_text() Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
BCH_READ_NODECODE mode - used by the move paths - really wants to use only the original rbio, but the retry path really wants to clone - oof. Make sure to copy the crc of the pointer we read from back to the original rbio, or we'll see spurious checksum errors later. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
On a new filesystem or device we have to allocate the journal with a bump allocator, because allocation info isn't ready yet - but when hot-adding a device that doesn't have a journal, we don't want to use that path. Reported-by:
<syzbot+24a867cb90d8315cccff@syzkaller.appspotmail.com> Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Reported-by:
<syzbot+e5292b50f1957164a4b6@syzkaller.appspotmail.com> Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
the unlock is now in read_extent, this fixes an assertion pop in read_from_stale_dirty_pointer() Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
- Jun 28, 2024
-
-
Kent Overstreet authored
Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
- Jun 26, 2024
-
-
Pei Li authored
When allocating too huge a snapshot table, we should fail gracefully in __snapshot_t_mut() instead of fail in kmalloc(). Reported-by:
<syzbot+770e99b65e26fa023ab1@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=770e99b65e26fa023ab1 Tested-by:
<syzbot+770e99b65e26fa023ab1@syzkaller.appspotmail.com> Signed-off-by:
Pei Li <peili.dev@gmail.com> Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
There's no reason for discards to be single threaded across all devices; this will improve performance on multi device setups. Additionally, making them per-device simplifies the refcounting on bch_dev->io_ref; we now hold it for the duration that the discard path is running, which fixes a race between the discard path and device removal. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Pei Li authored
This series fix the shift-out-of-bounds issue in bch2_blacklist_entries_gc(). Instead of passing 0 to eytzinger0_first() when iterating the entries, we explicitly check 0 and initialize i to be 0. syzbot has tested the proposed patch and the reproducer did not trigger any issue: Reported-and-tested-by:
<syzbot+835d255ad6bc7f29ee12@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=835d255ad6bc7f29ee12 Signed-off-by:
Pei Li <peili.dev@gmail.com> Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Pei Li authored
Acquire fsck_error_counts_lock before accessing the critical section protected by this lock. syzbot has tested the proposed patch and the reproducer did not trigger any issue. Reported-by:
<syzbot+a2bc0e838efd7663f4d9@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=a2bc0e838efd7663f4d9 Signed-off-by:
Pei Li <peili.dev@gmail.com> Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
- Jun 24, 2024
-
-
Kent Overstreet authored
This fixes a rare deadlock when we're doing an emergency shutdown due to failure to do a journal write. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
- Jun 23, 2024
-
-
Kent Overstreet authored
This fixes filesystem size not changing on device removal. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
The debug code relies on btree_trans_list being ordered so that it can resume on subsequent calls or lock restarts. However, it was using trans->locknig_wait.task.pid, which is incorrect since btree_trans objects are cached and reused - typically by different tasks. Fix this by switching to pointer order, and also sort them lazily when required - speeding up the btree_trans_get() fastpath. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
debug.c was using closure_get() on a different thread's closure where the we don't know if the object being refcounted is alive. We keep btree_trans objects on a list so they can be printed by debug code, and because it is cost prohibitive to touch the btree_trans list every time we allocate and free btree_trans objects, cached objects are also on this list. However, we do not want the debug code to see cached but not in use btree_trans objects - critically because the btree_paths array will have been freed (if it was reallocated). closure_get() is also incorrect to use when that get may race with it hitting zero, i.e. we must already have a ref on the object or know the ref can't currently hit 0 for other reasons (as used in the cycle detector). to fix this, use the previously introduced closure_get_not_zero(), closure_return_sync(), and closure_init_stack_release(); the debug code now can only take a ref on a trans object if it's alive and in use. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
btree_deadlock_to_text() searches the list of btree transactions to find a deadlock - when it finds one it's done; it's not like other *_read() functions that's printing each object. Factor out btree_deadlock_to_text() to make this clearer. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We were grabbing the sequence number before unlock incremented it - fix this by moving the increment to seqmutex_lock() (so the seqmutex_relock() failure path skips the mutex_trylock()), and returning the sequence number from unlock(), to make the API simpler and safer. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This fixes incorrect/missign checking of strndup_user() returns. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
- Jun 21, 2024
-
-
Youling Tang authored
`inode->ei_flags` setting and cleaning should be done after initialization, otherwise the operation is invalid. Fixes: 9ca4853b ("bcachefs: Fix quota support for snapshots") Signed-off-by:
Youling Tang <tangyouling@kylinos.cn> Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
write_super() may reallocate the superblock buffer - but bch_sb_field_ext was referencing it; don't use it after the write_super call. Reported-by:
<syzbot+8992fc10a192067b8d8a@syzkaller.appspotmail.com> Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
printk strings get truncated to 1024 bytes; if we have a long error message (journal debug info) we need to use a helper. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
discard_new_inode() is the correct interface for tearing down an indoe that was fully created but not made visible to other threads, but it expects I_NEW to be set, which we don't use. Reported-by: https://github.com/koverstreet/bcachefs/issues/690 Fixes: bcachefs: Fix race path in bch2_inode_insert() Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Incorrect bucket state transition in the discard path; when incrementing a bucket's generation number that had already been discarded, we were forgetting to check if it should be need_gc_gens, not free. This was caught by the .invalid checks in the transaction commit path, causing us to go emergency read only. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
- Jun 20, 2024
-
-
Youling Tang authored
With CONFIG_READ_ONLY_THP_FOR_FS, the Linux kernel supports using THPs for read-only mmapped files, such as shared libraries. However, the kernel makes no attempt to actually align those mappings on 2MB boundaries, which makes it impossible to use those THPs most of the time. This issue applies to general file mapping THP as well as existing setups using CONFIG_READ_ONLY_THP_FOR_FS. This is easily fixed by using thp_get_unmapped_area for the unmapped_area function in bcachefs, which is what ext2, ext4, fuse, xfs and btrfs all use. Similar to commit b0c58223 ("btrfs: fix alignment of VMA for memory mapped files on THP"). Signed-off-by:
Youling Tang <tangyouling@kylinos.cn> Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
i.e. the start of automatic self healing: If errors=continue or fix_safe, we now automatically fix simple errors without user intervention. New error action option: fix_safe This replaces the existing errors=ro option, which gets a new slot, i.e. existing errors=ro users now get errors=fix_safe. This is currently only enabled for a limited set of errors - initially just disk accounting; errors we would never not want to fix, and we don't want to require user intervention (i.e. to make sure a bug report gets filed). Errors will still be counted in the superblock, so we (developers) will still know they've been occuring if a bug report gets filed (as bug reports typically include the errors superblock section). Eventually we'll be enabling this for a much wider set of errors, after we've done thorough error injection testing. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
reference: https://github.com/koverstreet/bcachefs/issues/692 trans->ref is the reference used by the cycle detector, which walks btree_trans objects of other threads to walk the graph of held locks and issue wakeups when an abort is required. We have to wait for the ref to go to 1 before freeing trans->paths or clearing trans->locking_wait.task. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
this is long running - help users see what's going on Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Missing enum conversion Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We only have 48 bits for the LRU time field, which is insufficient to prevent wraparound. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
LRUs only have 48 bits for the time field (i.e. LRU order); thus we need overflow checks and guards. Reported-by:
<syzbot+df3bf3f088dcaa728857@syzkaller.appspotmail.com> Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We've been moving away from going RW lazily; if we want to go RW we do that in set_may_go_rw(), and if we didn't go RW we don't need to delete dead snapshots. Reported-by:
<syzbot+4366624c0b5aac4906cf@syzkaller.appspotmail.com> Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We shouln't be running the journal shutdown sequence if we never fully initialized the journal. Reported-by:
<syzbot+ffd2270f0bca3322ee00@syzkaller.appspotmail.com> Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We can only handle btree IDs up to 62, since the btree id (plus the type for interior btree nodes) has to fit ito a 64 bit bitmask - check for invalid ones to avoid invalid shifts later. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
these should be 64 bit bitmasks, not 32 bit. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-