Skip to content
  1. Feb 09, 2013
    • Theodore Ts'o's avatar
      ext4: grab page before starting transaction handle in write_begin() · 47564bfb
      Theodore Ts'o authored
      
      
      The grab_cache_page_write_begin() function can potentially sleep for a
      long time, since it may need to do memory allocation which can block
      if the system is under significant memory pressure, and because it may
      be blocked on page writeback.  If it does take a long time to grab the
      page, it's better that we not hold an active jbd2 handle.
      
      So grab a handle on the page first, and _then_ start the transaction
      handle.
      
      This commit fixes the following long transaction handle hold time:
      
      postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32
         tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
         dirtied_blocks 0
      
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      47564bfb
    • Theodore Ts'o's avatar
      ext4: pass context information to jbd2__journal_start() · 9924a92a
      Theodore Ts'o authored
      
      
      So we can better understand what bits of ext4 are responsible for
      long-running jbd2 handles, use jbd2__journal_start() so we can pass
      context information for logging purposes.
      
      The recommended way for finding the longer-running handles is:
      
         T=/sys/kernel/debug/tracing
         EVENT=$T/events/jbd2/jbd2_handle_stats
         echo "interval > 5" > $EVENT/filter
         echo 1 > $EVENT/enable
      
         ./run-my-fs-benchmark
      
         cat $T/trace > /tmp/problem-handles
      
      This will list handles that were active for longer than 20ms.  Having
      longer-running handles is bad, because a commit started at the wrong
      time could stall for those 20+ milliseconds, which could delay an
      fsync() or an O_SYNC operation.  Here is an example line from the
      trace file describing a handle which lived on for 311 jiffies, or over
      1.2 seconds:
      
      postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 
         tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
         dirtied_blocks 0
      
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      9924a92a
    • Theodore Ts'o's avatar
      ext4: move the jbd2 wrapper functions out of super.c · 722887dd
      Theodore Ts'o authored
      
      
      Move the jbd2 wrapper functions which start and stop handles out of
      super.c, where they don't really logically belong, and into
      ext4_jbd2.c.
      
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      722887dd
    • Theodore Ts'o's avatar
      jbd2: add tracepoints which provide per-handle statistics · 343d9c28
      Theodore Ts'o authored
      
      
      Handles which stay open a long time are problematic when it comes time
      to close down a transaction so it can be committed.  These tracepoints
      will help us determine which ones are the problematic ones, and to
      validate whether changes makes things better or worse.
      
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      343d9c28
  2. Feb 07, 2013
  3. Feb 05, 2013
    • Theodore Ts'o's avatar
      ext4: optimize mballoc for large allocations · 40ae3487
      Theodore Ts'o authored
      
      
      The ext4 block allocator only maintains buddy bitmaps for chunks which
      are less than or equal to one quarter of a block group.  That is, for
      a file aystem with a 1k blocksize, and where the number of blocks in a
      block group is 8192 blocks, the largest chunk size tracked by buddy
      bitmaps is 2048 blocks.
      
      For a file system with a 4k blocksize, and where the number of blocks
      in a block group is 32768 blocks, the largest chunk size tracked by
      buddy bitmaps is 8192 blocks.
      
      To work around this code, mballoc.c before this commit would truncate
      allocation requests to the number of blocks in a block group minus 10.
      Why 10?  Aside from being a completely arbitrary number, it avoids
      block allocation to be a power of two larger than 25% of the block
      group.  If you try to explicitly fallocate 50% of the block group
      size, this will demonstrate the problem; the block allocation code
      will scan the all of the blocks in the file system with cr==0 (since
      the request is for a natural power of two), but then completely fail
      for all blocks groups, since the buddy bitmaps don't track chunk sizes
      of 50% of the block group.
      
      To fix this, in these we use ext4_mb_complex_scan_group() instead of
      ext4_mb_simple_scan_group().
      
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger@dilger.ca>
      40ae3487
  4. Feb 03, 2013
  5. Feb 02, 2013
  6. Jan 30, 2013
  7. Jan 29, 2013
  8. Jan 28, 2013
  9. Jan 25, 2013
  10. Jan 17, 2013
  11. Jan 13, 2013
    • Theodore Ts'o's avatar
      ext4: trigger the lazy inode table initialization after resize · 7f511862
      Theodore Ts'o authored
      
      
      After we have finished extending the file system, we need to trigger a
      the lazy inode table thread to zero out the inode tables.
      
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      7f511862
    • Eryu Guan's avatar
      ext4: check bh in ext4_read_block_bitmap() · 15b49132
      Eryu Guan authored
      
      
      Validate the bh pointer before using it, since
      ext4_read_block_bitmap_nowait() might return NULL.
      
      I've seen this in fsfuzz testing.
      
       EXT4-fs error (device loop0): ext4_read_block_bitmap_nowait:385: comm touch: Cannot get buffer for block bitmap - block_group = 0, block_bitmap = 3925999616
       BUG: unable to handle kernel NULL pointer dereference at           (null)
       IP: [<ffffffff8121de25>] ext4_wait_block_bitmap+0x25/0xe0
       ...
       Call Trace:
        [<ffffffff8121e1e5>] ext4_read_block_bitmap+0x35/0x60
        [<ffffffff8125e9c6>] ext4_free_blocks+0x236/0xb80
        [<ffffffff811d0d36>] ? __getblk+0x36/0x70
        [<ffffffff811d0a5f>] ? __find_get_block+0x8f/0x210
        [<ffffffff81191ef3>] ? kmem_cache_free+0x33/0x140
        [<ffffffff812678e5>] ext4_xattr_release_block+0x1b5/0x1d0
        [<ffffffff812679be>] ext4_xattr_delete_inode+0xbe/0x100
        [<ffffffff81222a7c>] ext4_free_inode+0x7c/0x4d0
        [<ffffffff812277b8>] ? ext4_mark_inode_dirty+0x88/0x230
        [<ffffffff8122993c>] ext4_evict_inode+0x32c/0x490
        [<ffffffff811b8cd7>] evict+0xa7/0x1c0
        [<ffffffff811b8ed3>] iput_final+0xe3/0x170
        [<ffffffff811b8f9e>] iput+0x3e/0x50
        [<ffffffff812316fd>] ext4_add_nondir+0x4d/0x90
        [<ffffffff81231d0b>] ext4_create+0xeb/0x170
        [<ffffffff811aae9c>] vfs_create+0xac/0xd0
        [<ffffffff811ac845>] lookup_open+0x185/0x1c0
        [<ffffffff8129e3b9>] ? selinux_inode_permission+0xa9/0x170
        [<ffffffff811acb54>] do_last+0x2d4/0x7a0
        [<ffffffff811af743>] path_openat+0xb3/0x480
        [<ffffffff8116a8a1>] ? handle_mm_fault+0x251/0x3b0
        [<ffffffff811afc49>] do_filp_open+0x49/0xa0
        [<ffffffff811bbaad>] ? __alloc_fd+0xdd/0x150
        [<ffffffff8119da28>] do_sys_open+0x108/0x1f0
        [<ffffffff8119db51>] sys_open+0x21/0x30
        [<ffffffff81618959>] system_call_fastpath+0x16/0x1b
      
      Also fix comment for ext4_read_block_bitmap_nowait()
      
      Signed-off-by: default avatarEryu Guan <guaneryu@gmail.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      15b49132
    • Wang Shilong's avatar
      ext4: use unlikely to improve the efficiency of the kernel · aebf0243
      Wang Shilong authored
      
      
      Because the function 'sb_getblk' seldomly fails to return NULL
      value,it will be better to use 'unlikely' to optimize it.
      
      Signed-off-by: default avatarWang Shilong <wangsl-fnst@cn.fujitsu.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      aebf0243