ext4: Commit transaction before writing back pages in data=journal mode
When journalling data we currently just walk over pages, journal those that are marked for delayed dirtying (only pinned pages dirtied behing our back these days) and checkpoint other dirty pages. Because some pages may be part of running transaction the result is that after filemap_write_and_wait() we are not guaranteed pages are stable on disk. Thus places that want to flush current pagecache content need to jump through hoops to make sure journalled data is not lost. This is manageable in cases completely controlled by ext4 (such as extent shifting operations or inode eviction) but it gets ugly for stuff like fsverity. Furthermore it is rather error prone as people often do not realize journalled data needs special handling. So change ext4_writepages() to commit transaction with inode's data before going through the writeback loop in WB_SYNC_ALL mode. As a result filemap_write_and_wait() is now really getting pages to stable storage and makes pagecache pages safe to reclaim. Consequently we can remove the special handling of journalled data from several places in follow up patches. Note that this will make fsync(2) for journalled data more expensive as we will end up not only committing the transaction we need but also checkpointing the data (which we may have previously skipped if the data was part of the running transaction). If we really cared, we would need to introduce special VFS function for writing out & invalidating page cache for a range, use ->launder_page callback to perform checkpointing, and use it from all the places that need this functionality. But at this point I'm not convinced the complexity is worth it. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230329154950.19720-5-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Please register or sign in to comment