  1. Mar 25, 2014
  2. Mar 20, 2014
    • Theodore Ts'o's avatar
      ext4: kill i_version support for Hurd-castrated file systems · c4f65706
      Theodore Ts'o authored
      
      
      The Hurd file system uses the inode field which is now used for
      i_version for its translator block.  This means that ext2 file systems
      that are formatted for GNU Hurd can't be used to support NFSv4.  Given
      that Hurd file systems don't support extents, and a huge number of
      modern file system features, this is no great loss.
      
      If we don't do this, the attempt to update the i_version field will
      stomp over the translator block field, which will cause file system
      corruption for Hurd file systems.  This can be replicated via:
      
      mke2fs -t ext2 -o hurd /dev/vdc
      mount -t ext4 /dev/vdc /vdc
      touch /vdc/bug0000
      umount /dev/vdc
      e2fsck -f /dev/vdc
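
The overlap described above can be pictured with a simplified sketch. The names `demo_osd1`, `demo_bump_version` and `demo_bump_version_checked` are hypothetical, and the union only stands in for the OS-dependent 32-bit word of the on-disk inode; it is not the kernel's definition or the code this patch adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the OS-dependent 32-bit word of an
 * on-disk ext2 inode: Linux and Hurd interpret the same bytes differently. */
union demo_osd1 {
    uint32_t linux_version;   /* Linux view: version counter */
    uint32_t hurd_translator; /* Hurd view: translator block number */
};

/* What an unconditional version update effectively does to the word. */
void demo_bump_version(union demo_osd1 *osd1)
{
    assert(osd1 != NULL);
    osd1->linux_version++;
}

/* Guarded variant: leave the word alone when the creator OS is Hurd. */
void demo_bump_version_checked(union demo_osd1 *osd1, int creator_is_hurd)
{
    assert(osd1 != NULL);
    if (!creator_is_hurd)
        osd1->linux_version++;
}
```

With the unguarded variant, a Hurd translator block number stored in the word changes as a side effect of the version bump, which is the corruption that e2fsck reports in the reproduction steps.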
      
      Addresses-Debian-Bug: #738758
      
      Reported-By: Gabriele Giacone <1o5g4r8o@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      c4f65706
  3. Mar 19, 2014
    • T Makphaibulchoke's avatar
      ext4: each filesystem creates and uses its own mb_cache · 9c191f70
      T Makphaibulchoke authored
      
      
      This patch adds new interfaces to create and destroy the cache,
      ext4_xattr_create_cache() and ext4_xattr_destroy_cache(), and removes
      the cache creation and destroy calls from ext4_init_xattr() and
      ext4_exit_xattr() in fs/ext4/xattr.c.
      
      fs/ext4/super.c has been changed so that when a filesystem is mounted
      a cache is allocated and attached to its ext4_sb_info structure.
      
      fs/mbcache.c has been changed so that only one slab allocator is
      allocated and used by all mbcache structures.
      
      Signed-off-by: T. Makphaibulchoke <tmac@hp.com>
      9c191f70
    • T Makphaibulchoke's avatar
      fs/mbcache.c: decouple the locking of local from global data · 1f3e55fe
      T Makphaibulchoke authored
      
      
      The patch increases the parallelism of mbcache by using the built-in
      lock in the hlist_bl_node to protect the mb_cache's local block and
      index hash chains.  The global data mb_cache_lru_list and
      mb_cache_list continue to be protected by the global
      mb_cache_spinlock.
      
      A new block group spinlock, mb_cache_bg_lock, is also added to serialize
      accesses to mb_cache_entry's local data.
      
      A new member e_refcnt is added to the mb_cache_entry structure to help
      prevent an mb_cache_entry from being deallocated by a free while it
      is being referenced by either mb_cache_entry_get() or
      mb_cache_entry_find().
      
      Signed-off-by: T. Makphaibulchoke <tmac@hp.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      1f3e55fe
    • T Makphaibulchoke's avatar
      fs/mbcache.c: change block and index hash chain to hlist_bl_node · 3e037e52
      T Makphaibulchoke authored
      
      
      This patch changes both the block and index hash chains of each
      mb_cache to use hlist_bl_node, which contains a built-in lock.  This
      is the first step in decoupling the locks that serialize accesses to
      mb_cache global data from those for each mb_cache_entry's local data.
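
To illustrate what a hash chain with a built-in lock means, here is a minimal userspace sketch of the idea: bit 0 of the chain's head pointer doubles as a spinlock. All `demo_*` names are hypothetical, C11 atomics stand in for the kernel's bit spinlock primitives, and the real list_bl implementation differs in detail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct demo_node {
    struct demo_node *next;
    int key;
};

/* Head of one hash chain. Nodes are at least 2-byte aligned, so bit 0 of
 * the pointer is always free and can double as the chain's lock bit. */
struct demo_bl_head {
    atomic_uintptr_t first;
};

#define DEMO_LOCK_BIT ((uintptr_t)1)

/* Spin until this caller is the one that flips the lock bit from 0 to 1. */
void demo_bl_lock(struct demo_bl_head *h)
{
    while (atomic_fetch_or_explicit(&h->first, DEMO_LOCK_BIT,
                                    memory_order_acquire) & DEMO_LOCK_BIT)
        ; /* another thread holds this chain's lock */
}

void demo_bl_unlock(struct demo_bl_head *h)
{
    atomic_fetch_and_explicit(&h->first, ~DEMO_LOCK_BIT,
                              memory_order_release);
}

/* First node of the chain, with the lock bit masked off. */
struct demo_node *demo_bl_first(struct demo_bl_head *h)
{
    return (struct demo_node *)(atomic_load(&h->first) & ~DEMO_LOCK_BIT);
}

/* Insert at the head; the caller must already hold the chain lock. */
void demo_bl_add_head(struct demo_bl_head *h, struct demo_node *n)
{
    assert(atomic_load(&h->first) & DEMO_LOCK_BIT);
    n->next = demo_bl_first(h);
    atomic_store(&h->first, (uintptr_t)n | DEMO_LOCK_BIT);
}
```

Because each chain carries its own lock bit, two threads working on different chains never contend with each other, which is what allows a global lock to be reserved for the global lists only.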
      
      Signed-off-by: T. Makphaibulchoke <tmac@hp.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      3e037e52
    • Lukas Czerner's avatar
      ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate · b8a86845
      Lukas Czerner authored
      
      
      Introduce new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same
      functionality as xfs ioctl XFS_IOC_ZERO_RANGE.
      
      It can be used to convert a range of a file to zeros, preferably without
      issuing data IO. Blocks should be preallocated for the regions that span
      holes in the file, and the entire range is preferably converted to
      unwritten extents.
      
      This can also be used to preallocate blocks past EOF in the same way as
      with fallocate. The FALLOC_FL_KEEP_SIZE flag should cause the inode
      size to remain the same.
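
From userspace, the flag is passed in the mode argument of fallocate(2). The wrapper below is a minimal sketch; `demo_zero_range` and its `keep_size` parameter are hypothetical names, and it assumes glibc on Linux with kernel headers recent enough to define FALLOC_FL_ZERO_RANGE.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for the fallocate() declaration in glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>

/* Ask the file system to make [offset, offset + len) read back as zeros.
 * With keep_size set, the file size is left alone even if the range
 * extends past EOF. Returns 0 on success or a negative errno value, for
 * example -EOPNOTSUPP when the file system does not implement the flag. */
int demo_zero_range(int fd, off_t offset, off_t len, int keep_size)
{
    int mode = FALLOC_FL_ZERO_RANGE;

    assert(len > 0);
    if (keep_size)
        mode |= FALLOC_FL_KEEP_SIZE;
    if (fallocate(fd, mode, offset, len) != 0)
        return -errno;
    return 0;
}
```

Callers should be prepared for -EOPNOTSUPP and fall back to writing zeros explicitly, since not every file system supports the operation.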
      
      Also add appropriate tracepoints.
      
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      b8a86845
    • Lukas Czerner's avatar
      ext4: refactor ext4_fallocate code · 0e8b6879
      Lukas Czerner authored
      
      
      Move block allocation out of ext4_fallocate() into a separate function
      called ext4_alloc_file_blocks(). This will allow us to use the same
      allocation code for other allocation operations such as zero range, which
      is added in the next patch.
      
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      0e8b6879
    • Lukas Czerner's avatar
      ext4: Update inode i_size after the preallocation · f282ac19
      Lukas Czerner authored
      
      
      Currently in ext4_fallocate we update the inode size and c_time, and sync
      the file, with every partial allocation, which is entirely unnecessary. It
      is true that if a crash happens in the middle of truncate we might end
      up with an unchanged i_size or c_time, which I do not think is really a
      problem - it does not mean file system corruption in any way. Note that
      xfs does things the same way, i.e. it updates all of the above after
      the allocation is done.
      
      This commit moves all the updates after the allocation is done. In
      addition we also need to change m_time, as not only the inode has been
      changed but the data regions might have changed too (unwritten extents).
      However, m_time will only be updated when i_size has changed.
      
      Also we do not need to be paranoid about changing the c_time only if the
      actual allocation has happened; we can change it even if we try to
      allocate only to find out that there are already blocks allocated. It's
      not really a big deal and it will save us some additional complexity.
      
      Also use ext4_debug, instead of ext4_warning in #ifdef EXT4FS_DEBUG
      section.
      
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      --
      v3: Do not remove the code to set EXT4_INODE_EOFBLOCKS flag
      
       fs/ext4/extents.c | 96 ++++++++++++++++++++++++-------------------------------
       1 file changed, 42 insertions(+), 54 deletions(-)
      f282ac19
  4. Mar 14, 2014
  5. Mar 13, 2014
    • Theodore Ts'o's avatar
      fs: push sync_filesystem() down to the file system's remount_fs() · 02b9984d
      Theodore Ts'o authored
      
      
      Previously, the no-op "mount -o remount /dev/xxx" operation when the
      file system is already mounted read-write causes an implied,
      unconditional syncfs().  This seems pretty stupid, and it's certainly
      not documented or guaranteed to do this, nor is it particularly useful,
      except in the case where the file system was mounted rw and is getting
      remounted read-only.
      
      However, it's possible that there might be some file systems that are
      actually depending on this behavior.  In most file systems, it's
      probably fine to only call sync_filesystem() when transitioning from
      read-write to read-only, and there are some file systems where this is
      not needed at all (for example, for a pseudo-filesystem or something
      like romfs).
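
The read-write to read-only check suggested above can be sketched as a small predicate. `demo_remount_needs_sync` is a hypothetical userspace helper using the MS_RDONLY flag from <sys/mount.h>; it is not the code this patch adds to any file system's remount_fs().

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mount.h> /* MS_RDONLY */

/* Hypothetical helper: a remount only needs to flush dirty data when a
 * read-write mount is about to become read-only. */
bool demo_remount_needs_sync(unsigned long old_flags, unsigned long new_flags)
{
    bool was_rw = (old_flags & MS_RDONLY) == 0;
    bool will_be_ro = (new_flags & MS_RDONLY) != 0;

    return was_rw && will_be_ro;
}

/* Usage: of the four possible transitions, only rw -> ro asks for a sync. */
void demo_remount_examples(void)
{
    assert(demo_remount_needs_sync(0, MS_RDONLY));
    assert(!demo_remount_needs_sync(0, 0));
    assert(!demo_remount_needs_sync(MS_RDONLY, MS_RDONLY));
    assert(!demo_remount_needs_sync(MS_RDONLY, 0));
}
```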
      
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Cc: Jan Kara <jack@suse.cz>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Anders Larsen <al@alarsen.net>
      Cc: Phillip Lougher <phillip@squashfs.org.uk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: xfs@oss.sgi.com
      Cc: linux-btrfs@vger.kernel.org
      Cc: linux-cifs@vger.kernel.org
      Cc: samba-technical@lists.samba.org
      Cc: codalist@coda.cs.cmu.edu
      Cc: linux-ext4@vger.kernel.org
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Cc: fuse-devel@lists.sourceforge.net
      Cc: cluster-devel@redhat.com
      Cc: linux-mtd@lists.infradead.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-nilfs@vger.kernel.org
      Cc: linux-ntfs-dev@lists.sourceforge.net
      Cc: ocfs2-devel@oss.oracle.com
      Cc: reiserfs-devel@vger.kernel.org
      02b9984d
    • Theodore Ts'o's avatar
      jbd2: improve error messages for inconsistent journal heads · 66a4cb18
      Theodore Ts'o authored
      Fix up error messages printed when the transaction pointers in a
      journal head are inconsistent.  This improves the error messages which
      are printed when running xfstests generic/068 in data=journal mode.
      See the bug report at: https://bugzilla.kernel.org/show_bug.cgi?id=60786
      
      
      
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      66a4cb18
  6. Mar 09, 2014
  7. Mar 04, 2014
    • Jan Kara's avatar
      ext4: Speedup WB_SYNC_ALL pass called from sync(2) · 10542c22
      Jan Kara authored
      
      
      When doing filesystem wide sync, there's no need to force transaction
      commit (or synchronously write inode buffer) separately for each inode
      because ext4_sync_fs() takes care of forcing commit at the end (VFS
      takes care of flushing buffer cache, respectively). Most of the time
      this slowness doesn't manifest because previous WB_SYNC_NONE writeback
      doesn't leave much to write but when there are processes aggressively
      creating new files and several filesystems to sync, the sync slowness
      can be noticeable. In the following test script sync(1) takes around 6
      minutes when there are two ext4 filesystems mounted on a standard SATA
      drive. After this patch sync takes a couple of seconds so we have about
      two orders of magnitude improvement.
      
            function run_writers
            {
              for (( i = 0; i < 10; i++ )); do
                mkdir $1/dir$i
                for (( j = 0; j < 40000; j++ )); do
                  dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null
                done &
              done
            }
      
            for dir in "$@"; do
              run_writers $dir
            done
      
            sleep 40
            time sync
      
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      10542c22
  8. Feb 24, 2014
  9. Feb 22, 2014
  10. Feb 21, 2014
    • Darrick J. Wong's avatar
      ext4: merge uninitialized extents · a9b82415
      Darrick J. Wong authored
      
      
      Allow for merging uninitialized extents.
      
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      a9b82415
    • Maxim Patlasov's avatar
      ext4: avoid exposure of stale data in ext4_punch_hole() · e251f9bc
      Maxim Patlasov authored
      
      
      While handling punch-hole fallocate, it's useless to truncate page cache
      before removing the range from extent tree (or block map in indirect case)
      because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
      read) immediately after truncating page cache, but before updating extent
      tree (or block map). In that case the user will see stale data even after
      fallocate is completed.
      
      Until the problem of data corruption resulting from pages backed by
      already freed blocks is fully resolved, the simple thing we can do now
      is to add another truncation of pagecache after punch hole is done.
      
      Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      e251f9bc
    • Eric Whitney's avatar
      ext4: silence warnings in extent status tree debugging code · ce140cdd
      Eric Whitney authored
      
      
      Adjust the conversion specifications in a few optionally compiled debug
      messages to match the return type of ext4_es_status().  Also, make a
      couple of minor grammatical message edits while we're at it.
      
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      ce140cdd
    • Eric Sandeen's avatar
      ext4: remove unused ac_ex_scanned · dc9ddd98
      Eric Sandeen authored
      
      
      When looking at a bug report with:
      
      > kernel: EXT4-fs: 0 scanned, 0 found
      
      I thought wow, 0 scanned, that's odd?  But it's not odd; it's printing
      a variable that is initialized to 0 and never touched again.
      
      It's never been used since the original merge, so I don't really even
      know what the original intent was, either.
      
      If anyone knows how to hook it up, speak now via patch, otherwise just
      yank it so it's not making a confusing situation more confusing in
      kernel logs.
      
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      dc9ddd98
    • Theodore Ts'o's avatar
      ext4: avoid possible overflow in ext4_map_blocks() · e861b5e9
      Theodore Ts'o authored
      
      
      The ext4_map_blocks() function returns the number of blocks which
      satisfy the caller's request.  The number of blocks requested by
      the caller is specified by an unsigned integer, but the return value
      of ext4_map_blocks() is a signed integer (to accommodate error codes
      per the kernel's standard error signalling convention).
      
      Historically, overflows could never happen since mballoc() will refuse
      to allocate more than 2048 blocks at a time (which is something we
      should fix), and if the blocks were already allocated, the fact that
      there would be some number of intervening metadata blocks pretty much
      guaranteed that there could never be a contiguous region of data
      blocks that was greater than 2**31 blocks.
      
      However, this is now possible if there is a file system which is a bit
      bigger than 8TB, and is created using the new mke2fs hugeblock
      feature, which can create a perfectly contiguous file.  In that case,
      if a userspace program attempted to call fallocate() on this already
      fully allocated file, it's possible that ext4_map_blocks() could
      return a number large enough that it would overflow a signed integer,
      resulting in ext4 thinking that the ext4_map_blocks() call had
      failed with some strange error code.
      
      Since ext4_map_blocks() is always free to return a smaller number of
      blocks than what was requested by the caller, fix this by capping the
      number of blocks that ext4_map_blocks() will ever try to map to 2**31
      - 1.  In practice this should never get hit, except by someone
      deliberately trying to provoke the above-described bug.
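
For scale: assuming 4 KiB blocks, 2**31 blocks is exactly 8 TiB, which is why the file system has to be a bit bigger than that. The sketch below shows the failure mode and the clamp in isolation. The `demo_*` functions are hypothetical stand-ins, not the kernel code, and the example assumes a 32-bit int with the usual two's-complement wraparound on conversion.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the fix: the request length is unsigned, but
 * the result travels back through a signed int, so cap it at INT_MAX. */
unsigned int demo_clamp_map_len(unsigned int requested)
{
    if (requested > (unsigned int)INT_MAX)
        requested = INT_MAX;
    return requested;
}

/* Nonzero if a block count would be misread as an error code once stored
 * in a signed int. Converting an out-of-range unsigned value to int is
 * implementation-defined; on common targets it wraps negative. */
int demo_looks_like_error(unsigned int nblocks)
{
    return (int)nblocks < 0;
}

/* Usage: 2**31 blocks looks like an error unless the request is clamped. */
void demo_clamp_examples(void)
{
    assert(demo_looks_like_error(0x80000000u));
    assert(!demo_looks_like_error(demo_clamp_map_len(0x80000000u)));
    assert(demo_clamp_map_len(0x80000000u) == (unsigned int)INT_MAX);
    assert(demo_clamp_map_len(2048) == 2048);
}
```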
      
      Thanks to the PaX team for asking whether this could possibly happen
      in some off-line discussions about using some static code checking
      technology they are developing to find bugs in kernel code.
      
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      e861b5e9
  11. Feb 20, 2014
    • Theodore Ts'o's avatar
      ext4: make sure ex.fe_logical is initialized · ab0c00fc
      Theodore Ts'o authored
      
      
      The lowest levels of mballoc set all of the fields of struct
      ext4_free_extent except for fe_logical, since they are just trying to
      find the requested free set of blocks, and the logical block hasn't
      been set yet.  This makes some static code checkers sad.  Set it to
      various different debug values, which would be useful when
      debugging mballoc if these values were ever to show up because
      parts of mballoc tried to use ac->ac_b_ex.fe_logical before the upper
      layers of mballoc had properly set it, which is usually done by
      ext4_mb_use_best_found().
      
      Addresses-Coverity-Id: #139697
      Addresses-Coverity-Id: #139698
      Addresses-Coverity-Id: #139699
      
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      
      ab0c00fc
    • Theodore Ts'o's avatar
      ext4: don't calculate total xattr header size unless needed · 7b1b2c1b
      Theodore Ts'o authored
      
      
      The function ext4_expand_extra_isize_ea() doesn't need the size of all
      of the extended attribute headers.  So if we don't calculate it when
      it is unneeded, we can skip some unneeded memory references, and as
      a bonus, we eliminate some kvetching by static code analysis tools.
      
      Addresses-Coverity-Id: #741291
      
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      7b1b2c1b
    • Theodore Ts'o's avatar
      ext4: add ext4_es_store_pblock_status() · 9a6633b1
      Theodore Ts'o authored
      
      
      Avoid false positives by static code analysis tools such as sparse and
      coverity caused by the fact that we set the physical block, and then
      the status in the extent_status structure.  It is also more efficient
      to set both of these values at once.
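
A minimal sketch of storing both values with one assignment follows. The bit layout (status in the top 4 bits of a 64-bit word, block number in the low 60) and every `demo_*` name are assumptions for illustration and may not match the kernel's extent_status definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for illustration: the top 4 bits of a 64-bit word carry
 * the status flags and the low 60 bits carry the physical block number. */
#define DEMO_ES_SHIFT 60
#define DEMO_ES_MASK  (~(uint64_t)0 << DEMO_ES_SHIFT)

struct demo_extent_status {
    uint64_t es_pblk; /* packed: status (high bits) | physical block */
};

/* Store block and status with a single assignment, so the packed word is
 * never read before it has been fully written. */
void demo_es_store_pblock_status(struct demo_extent_status *es,
                                 uint64_t pblk, unsigned int status)
{
    assert(status <= 0xF); /* must fit in the 4 status bits */
    es->es_pblk = (((uint64_t)status << DEMO_ES_SHIFT) & DEMO_ES_MASK) |
                  (pblk & ~DEMO_ES_MASK);
}

uint64_t demo_es_pblock(const struct demo_extent_status *es)
{
    return es->es_pblk & ~DEMO_ES_MASK;
}

unsigned int demo_es_status(const struct demo_extent_status *es)
{
    return (unsigned int)(es->es_pblk >> DEMO_ES_SHIFT);
}
```

Setting the two halves separately would require each setter to read the existing word in order to preserve the other half, and a static checker can flag that read when the structure has not been initialized yet.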
      
      Addresses-Coverity-Id: #989077
      Addresses-Coverity-Id: #989078
      Addresses-Coverity-Id: #1080722
      
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
      9a6633b1
    • Eric Whitney's avatar
      ext4: fix error return from ext4_ext_handle_uninitialized_extents() · ce37c429
      Eric Whitney authored
      Commit 37794732 breaks the return of error codes from
      ext4_ext_handle_uninitialized_extents() in ext4_ext_map_blocks().  A
      portion of the patch assigns that function's signed integer return
      value to an unsigned int.  Consequently, negatively valued error codes
      are lost and can be treated as a bogus allocated block count.
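
The bug pattern is easy to reproduce in isolation. In this sketch all `demo_*` functions are hypothetical stand-ins for the caller and callee named above, not the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callee: returns a block count, or a negative errno. */
int demo_handle_extents(int fail)
{
    return fail ? -ENOSPC : 8;
}

/* Buggy pattern: the signed result lands directly in an unsigned variable,
 * so the error can no longer be detected and turns into a huge count. */
int demo_map_blocks_buggy(int fail, unsigned int *allocated)
{
    assert(allocated != NULL);
    *allocated = demo_handle_extents(fail);
    return 0; /* error silently lost */
}

/* Fixed pattern: keep the result signed until it has been checked. */
int demo_map_blocks_fixed(int fail, unsigned int *allocated)
{
    int ret = demo_handle_extents(fail);

    assert(allocated != NULL);
    if (ret < 0)
        return ret;
    *allocated = (unsigned int)ret;
    return 0;
}
```

With the buggy variant, a failing callee leaves the caller holding a block count above 2**31 and a success return value, while the fixed variant propagates -ENOSPC.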
      
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      ce37c429
  12. Feb 18, 2014
  13. Feb 17, 2014