  1. Sep 01, 2021
  2. Aug 31, 2021
  3. Aug 24, 2021
    • f2fs: rebuild nat_bits during umount · 94c821fb
      Chao Yu authored
      
      
      If the whole free_nat_bitmap is available, we can rebuild nat_bits
      entirely from free_nat_bitmap during umount, which gives us another
      chance to re-enable nat_bits for the image.
      
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce periodic iostat io latency traces · a4b68176
      Daeho Jeong authored
      
      
      Whenever we notice sluggishness on our machines, we want to know how
      well each type of I/O in the f2fs filesystem is being handled. But
      this kind of real data is hard to get. First, we need to reproduce the
      issue with a profiling tool such as blktrace turned on, and the issue
      does not recur easily. Second, any tool's intervention slightly
      changes the overall timing of the issue, which sometimes makes it
      hard to figure out.
      
      So, I added a feature to the F2FS_IOSTAT kernel config that prints
      I/O latency statistics as tracepoint events, the minimum needed to
      understand the filesystem's I/O behavior. With the "iostat_enable"
      sysfs node turned on, we get these statistics periodically and with
      minimal overhead.
      
      [samples]
       f2fs_ckpt-254:1-507     [003] ....  2842.439683: f2fs_iostat_latency:
      dev = (254,11), iotype [peak lat.(ms)/avg lat.(ms)/count],
      rd_data [136/1/801], rd_node [136/1/1704], rd_meta [4/2/4],
      wr_sync_data [164/16/3331], wr_sync_node [152/3/648],
      wr_sync_meta [160/2/4243], wr_async_data [24/13/15],
      wr_async_node [0/0/0], wr_async_meta [0/0/0]
      
       f2fs_ckpt-254:1-507     [002] ....  2845.450514: f2fs_iostat_latency:
      dev = (254,11), iotype [peak lat.(ms)/avg lat.(ms)/count],
      rd_data [60/3/456], rd_node [60/3/1258], rd_meta [0/0/1],
      wr_sync_data [120/12/2285], wr_sync_node [88/5/428],
      wr_sync_meta [52/6/2990], wr_async_data [4/1/3],
      wr_async_node [0/0/0], wr_async_meta [0/0/0]
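
The sample events above pack each I/O type as "name [peak/avg/count]". A minimal userspace sketch for pulling those triples out of one event line could look like this; the function name, the regular expression and the SAMPLE constant are illustrative assumptions, not part of the patch:

```python
import re

# One f2fs_iostat_latency payload, copied from the first sample above.
SAMPLE = (
    "dev = (254,11), iotype [peak lat.(ms)/avg lat.(ms)/count], "
    "rd_data [136/1/801], rd_node [136/1/1704], rd_meta [4/2/4], "
    "wr_sync_data [164/16/3331], wr_sync_node [152/3/648], "
    "wr_sync_meta [160/2/4243], wr_async_data [24/13/15], "
    "wr_async_node [0/0/0], wr_async_meta [0/0/0]"
)

# Matches "name [peak/avg/count]" with purely numeric fields, so the
# "iotype [peak lat.(ms)/...]" legend is skipped automatically.
FIELD_RE = re.compile(r"(\w+) \[(\d+)/(\d+)/(\d+)\]")


def parse_iostat_latency(line):
    """Return {iotype: (peak_ms, avg_ms, count)} for one event line."""
    return {
        name: (int(peak), int(avg), int(count))
        for name, peak, avg, count in FIELD_RE.findall(line)
    }


stats = parse_iostat_latency(SAMPLE)
worst = max(stats, key=lambda name: stats[name][0])  # iotype with highest peak
```

Feeding successive events through such a parser makes it easy to track which I/O type dominates peak latency over time.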
      
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: separate out iostat feature · 52118743
      Daeho Jeong authored
      
      
      Add the F2FS_IOSTAT config option to support getting I/O statistics
      through sysfs and printing periodic I/O statistics tracepoint events,
      and move the I/O statistics code into separate files for easier
      maintenance.
      
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      [Jaegeuk Kim: set default=y]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  4. Aug 18, 2021
  5. Aug 14, 2021
  6. Aug 13, 2021
  7. Aug 06, 2021
    • f2fs: fix to do sanity check for sb/cp fields correctly · 65ddf656
      Chao Yu authored
      
      
      This patch fixes the following problems in the sb/cp sanity checks:
      - sanity_check_raw_super() did not account for log header blocks in
      the cp_payload check.
      - f2fs_sanity_check_ckpt() did not check nat_bits_blocks.
      
      Cc: <stable@kernel.org>
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: avoid unneeded memory allocation in __add_ino_entry() · 4b106518
      Chao Yu authored
      
      
      __add_ino_entry() allocates from the slab cache even if the ino entry
      is already cached in the radix tree, e.g. in the multiple-devices case.

      Let's check the radix tree first, under RCU lock protection, to see
      whether a slab allocation is needed; this mitigates memory pressure
      from the "f2fs_ino_entry" slab cache.
      
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: extent cache: support unaligned extent · 94afd6d6
      Chao Yu authored
      
      
      A compressed inode may suffer from poor read performance because it
      cannot use the extent cache, so I propose adding unaligned extent
      support to improve it.

      Currently, it only works on a readonly-format f2fs image.

      Unaligned extent: in one compressed cluster, the number of physical
      blocks is less than the number of logical blocks, so we add an extra
      physical block length to the extent info to indicate this extent
      status.

      The idea is that if all blocks of a cluster are physically contiguous,
      then once its mapping info has been read for the first time, we cache
      an unaligned (or aligned) extent info entry in the extent cache, so
      that the mapping info is expected to hit when the cluster is reread.
      
      Merge policy:
      - Aligned extents can be merged.
      - Aligned extent and unaligned extent can not be merged.
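
The merge policy above can be sketched in userspace as follows. This is only an illustration: the ExtentInfo class, its field names (including c_len for the extra physical block length) and the helper functions are assumptions made for the example, and rejecting a merge when either side is unaligned is an assumed reading of the policy, not a statement about the kernel code.

```python
from dataclasses import dataclass


@dataclass
class ExtentInfo:
    fofs: int       # first logical block offset within the file
    blk: int        # first physical block address
    len: int        # logical length in blocks
    c_len: int = 0  # physical length in blocks; 0 means "not compressed"


def is_aligned(ext):
    """Aligned: physical length equals logical length (or is unused)."""
    return ext.c_len == 0 or ext.c_len == ext.len


def can_merge(back, front):
    """True if `front` directly follows `back` and both are aligned."""
    if not (is_aligned(back) and is_aligned(front)):
        return False  # assumed: any unaligned extent blocks the merge
    logically_adjacent = back.fofs + back.len == front.fofs
    physically_adjacent = back.blk + back.len == front.blk
    return logically_adjacent and physically_adjacent


plain_a = ExtentInfo(fofs=0, blk=100, len=4)
plain_b = ExtentInfo(fofs=4, blk=104, len=4)
cluster = ExtentInfo(fofs=8, blk=108, len=4, c_len=2)  # 4 logical, 2 physical
```

Here plain_a and plain_b are candidates for merging, while cluster is kept as its own entry because its physical length differs from its logical length.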
      
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Kconfig: clean up config options about compression · 6b3ba1e7
      Tiezhu Yang authored
      
      
      In fs/f2fs/Kconfig, F2FS_FS_LZ4HC depends on F2FS_FS_LZ4, and F2FS_FS_LZ4
      depends on F2FS_FS_COMPRESSION, so there is no need for F2FS_FS_LZ4HC to
      depend on F2FS_FS_COMPRESSION explicitly. Remove the redundant
      "depends on", and do the same for F2FS_FS_LZORLE.

      At the same time, it is better to move F2FS_FS_LZORLE next to F2FS_FS_LZO.
      This looks a little clearer in make menuconfig: "LZO-RLE compression
      support" now sits under "LZO compression support" instead of directly
      under "F2FS compression feature".
      
      Without this patch:
      
      F2FS compression feature
        LZO compression support
        LZ4 compression support
          LZ4HC compression support
        ZSTD compression support
        LZO-RLE compression support
      
      With this patch:
      
      F2FS compression feature
        LZO compression support
          LZO-RLE compression support
        LZ4 compression support
          LZ4HC compression support
        ZSTD compression support
      
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  8. Aug 05, 2021
    • f2fs: reduce the scope of setting fsck tag when de->name_len is zero · d4bf15a7
      Yangtao Li authored
      
      
      I recently found an easily reproduced case where de->name_len is 0 in
      f2fs_fill_dentries(), which ends up setting the fsck flag.
      
      Thread A			Thread B
      - f2fs_readdir
       - f2fs_read_inline_dir
        - ctx->pos = d.max
      				- f2fs_add_dentry
      				 - f2fs_add_inline_entry
      				  - do_convert_inline_dir
      				 - f2fs_add_regular_entry
      - f2fs_readdir
       - f2fs_fill_dentries
        - set_sbi_flag(sbi, SBI_NEED_FSCK)
      
      Process A opens the directory and keeps reading it without closing it.
      Meanwhile, process B creates a file in that directory (occupying
      multiple f2fs_dir_entry slots and exceeding the d.max of the inline
      dir). After the creation, process A reads again using the d.max of the
      inline dir and finds that de->name_len is 0.

      Chao pointed out that, even without inline conversion, the race
      condition can still happen as follows:
      
      dir_entry1: A
      dir_entry2: B
      dir_entry3: C
      free slot: _
      ctx->pos: ^
      
      Thread A is traversing the directory;
      ctx->pos moves to the position below after readdir() by thread A:
      AAAABBBB___
              ^
      
      Then thread B deletes dir_entry2 and creates dir_entry3.

      Thread A calls readdir() again and looks up dirents starting from the
      middle of the new dirent's slots, as below:
      AAAACCCCCC_
              ^
      In these scenarios the file system is not damaged, and the race is
      hard to avoid. But we can bypass tagging the FSCK flag if:
      a) bit_pos (:= ctx->pos % d->max) is non-zero, and
      b) bit_pos has not yet reached the first valid dir_entry.
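
Conditions a) and b) can be illustrated with a small userspace simulation. This is a sketch under stated assumptions, not the kernel implementation: a directory block is modeled as a list of slots where None is a free slot, a positive number is the name_len stored in the first slot of a dentry, and 0 is a continuation slot. The function names and the 8-byte SLOT_LEN are assumptions of the model.

```python
SLOT_LEN = 8  # assumed bytes of file name covered by one dentry slot


def slots_for(name_len):
    """Number of slots a dentry with this name length occupies."""
    return (name_len + SLOT_LEN - 1) // SLOT_LEN


def readdir_needs_fsck(slots, start):
    """Walk the slots from `start`; True if a zero name_len looks like
    real corruption rather than a stale mid-dentry position."""
    found_valid_dirent = False
    bit_pos = start
    while bit_pos < len(slots):
        name_len = slots[bit_pos]
        if name_len is None:          # free slot, nothing to report
            bit_pos += 1
            continue
        if name_len == 0:
            # Only suspicious when scanning from slot 0 or after a valid
            # dentry has already been seen in this pass.
            if found_valid_dirent or bit_pos == 0:
                return True
            bit_pos += 1              # skip the stale continuation slot
            continue
        found_valid_dirent = True
        bit_pos += slots_for(name_len)
    return False


# Layout AAAACCCCCC_ from the description: A takes 4 slots, C takes 6.
raced = [32, 0, 0, 0, 48, 0, 0, 0, 0, 0, None]
stale_pos = 8  # where ctx->pos was left before B rewrote the entries
```

Calling readdir_needs_fsck(raced, stale_pos) models thread A resuming in the middle of dir_entry3: the zero name_len slots it sees there are skipped instead of being treated as damage.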
      
      Fixes: ddf06b75 ("f2fs: fix to trigger fsck if dirent.name_len is zero")
      Signed-off-by: Yangtao Li <frank.li@vivo.com>
      [Chao: clean up description]
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  9. Aug 04, 2021
    • f2fs: fix to stop filesystem update once CP failed · 91803392
      Chao Yu authored
      
      
      During f2fs_write_checkpoint(), once f2fs_flush_nat_entries() or
      do_checkpoint() fails, filesystem metadata such as the prefree bitmap
      and the nat/sit version bitmaps won't be recovered, which may leave
      the f2fs image inconsistent. Let's just set the CP error flag to avoid
      further updates until we figure out a scheme to roll back all
      metadata in this condition.
      
      Reported-by: Yangtao Li <frank.li@vivo.com>
      Signed-off-by: Yangtao Li <frank.li@vivo.com>
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: add sysfs node to control ra_pages for fadvise seq file · 0f6b56ec
      Daeho Jeong authored
      
      
      fadvise() currently lets the user double the readahead window with
      POSIX_FADV_SEQUENTIAL. In some use cases that is not sufficient, and
      we need to meet the need in a restricted way. With this sysfs node, we
      can set the multiplier applied to the bdi device readahead for the
      POSIX_FADV_SEQUENTIAL advice option to a value between 2 (default)
      and 256.
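
As a rough model of the behavior described above, the multiplier can be thought of as a bounded factor applied to the device readahead size. The constant and function names below are hypothetical and chosen for the example; only the 2 to 256 range and the default of 2 come from the description.

```python
MIN_RA_MUL = 2    # default multiplier, per the description above
MAX_RA_MUL = 256  # upper bound, per the description above


def validate_ra_multiplier(value):
    """Accept only multipliers inside the documented range."""
    if not MIN_RA_MUL <= value <= MAX_RA_MUL:
        raise ValueError(
            f"multiplier must be within [{MIN_RA_MUL}, {MAX_RA_MUL}]"
        )
    return value


def seq_readahead_pages(bdi_ra_pages, multiplier=MIN_RA_MUL):
    """Readahead window, in pages, for a file marked as sequential."""
    return bdi_ra_pages * validate_ra_multiplier(multiplier)


default_window = seq_readahead_pages(32)                # doubled window
larger_window = seq_readahead_pages(32, multiplier=16)  # opt-in expansion
```

Rejecting out-of-range values keeps the expansion "restricted" in the sense of the description: a user can grow the window, but not without bound.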
      
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce discard_unit mount option · 4f993264
      Chao Yu authored
      As James Z reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=213877
      
      
      
      [1.] One-line summary of the problem:
      Mounting more than a certain number of SMR block devices makes the system unresponsive
      
      [2.] Full description of the problem/report:
      Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
      Empirically, found that when the number of SMR devices * 1.5GB > system RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR disks need 24*1.5GB = 36GB. A system with 32GB RAM can only mount 21 devices; mounting the 22nd device reproducibly hangs the system.
      The number of SMR devices with other FS mounted on this system does not interfere with the result above.
      
      [3.] Keywords (i.e., modules, networking, kernel):
      F2FS, SMR, Memory
      
      [4.] Kernel information
      [4.1.] Kernel version (uname -a):
      Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
      
      [4.2.] Kernel .config file:
      Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
      
      [5.] Most recent kernel version which did not have the bug:
      None
      
      [6.] Output of Oops.. message (if applicable) with symbolic information
           resolved (see Documentation/admin-guide/oops-tracing.rst)
      None
      
      [7.] A small shell script or example program which triggers the
           problem (if possible)
      mount /dev/sdX /mnt/0X
      
      [8.] Memory consumption
      
      With 24 * 14T SMR Block device with F2FS
      free -g
                    total        used        free      shared  buff/cache   available
      Mem:             46          36           0           0          10          10
      Swap:             0           0           0
      
      With 3 * 14T SMR Block device with F2FS
      free -g
                     total        used        free      shared  buff/cache   available
      Mem:               7           5           0           0           1           1
      Swap:              7           0           7
      
      The root cause is that there are three bitmaps:
      - cur_valid_map
      - ckpt_valid_map
      - discard_map
      and each of them costs ~500MB of memory. {cur, ckpt}_valid_map are
      necessary, but discard_map is optional, since this bitmap is only
      useful on a mountpoint where small discard is enabled.
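
A back-of-the-envelope estimate shows where a per-bitmap cost of this order comes from. The sketch assumes 4 KiB blocks, one bit per block in each bitmap, and a decimal 14 TB device; the function names are illustrative. It gives a figure in the same range as the ~500MB quoted above, not an exact match.

```python
BLOCK_SIZE = 4096  # assumed f2fs block size in bytes


def bitmap_bytes(device_bytes, block_size=BLOCK_SIZE):
    """Bytes needed for a bitmap with one bit per block."""
    blocks = device_bytes // block_size
    return (blocks + 7) // 8  # round up to whole bytes


def per_mount_bitmap_bytes(device_bytes, with_discard_map=True):
    """cur_valid_map + ckpt_valid_map, plus discard_map if enabled."""
    bitmaps = 3 if with_discard_map else 2
    return bitmaps * bitmap_bytes(device_bytes)


DEVICE = 14 * 10**12  # one 14 TB SMR disk, decimal terabytes assumed

one_bitmap = bitmap_bytes(DEVICE)
with_discard = per_mount_bitmap_bytes(DEVICE, with_discard_map=True)
without_discard = per_mount_bitmap_bytes(DEVICE, with_discard_map=False)
saved_per_mount = with_discard - without_discard
```

Under these assumptions, dropping discard_map saves one bitmap's worth of memory per mounted device, which adds up quickly with dozens of disks.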
      
      For blkzoned devices such as SMR or ZNS devices, f2fs only issues a
      discard for a section (zone) when all blocks of that section are
      invalid, so such devices don't need small discard functionality at all.
      
      This patch introduces a new mount option,
      "discard_unit=block|segment|section", to support issuing discards with
      a different basic unit, aligned to block, segment or section, so that
      the user can specify "discard_unit=segment" or "discard_unit=section"
      to disable the small discard functionality.
      
      Note that this mount option cannot be changed by remount(), because
      the related metadata must be initialized during mount().
      
      To save memory, let's use "discard_unit=section" by default for
      blkzoned devices.
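
The option handling described above can be summarized in a short sketch. The function names are hypothetical, and treating "block" as the default for non-zoned devices is an assumption; the three accepted values, the blkzoned default and the remount restriction come from the description.

```python
VALID_UNITS = ("block", "segment", "section")


def default_discard_unit(is_blkzoned):
    """Default unit; "block" for non-zoned devices is an assumption."""
    return "section" if is_blkzoned else "block"


def parse_discard_unit(value):
    """Validate the value given as discard_unit=<value>."""
    if value not in VALID_UNITS:
        raise ValueError(f"discard_unit must be one of {VALID_UNITS}")
    return value


def needs_discard_map(unit):
    """Only small (block-granularity) discard needs the extra bitmap."""
    return unit == "block"


def check_remount(mounted_unit, requested_unit):
    """The unit is fixed at mount time, so a remount may not change it."""
    if requested_unit != mounted_unit:
        raise ValueError("discard_unit cannot be changed by remount")
    return mounted_unit


smr_unit = default_discard_unit(is_blkzoned=True)
smr_allocates_discard_map = needs_discard_map(smr_unit)
```

With the section default, a zoned device skips the discard_map allocation entirely, which is the memory saving the patch is after.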
      
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  10. Aug 03, 2021
  11. Jul 25, 2021