Skip to content
  1. Jan 25, 2014
    • Eric Sandeen's avatar
      xfs: allow logical-sector sized O_DIRECT · 7c71ee78
      Eric Sandeen authored
      
      
      Some time ago, mkfs.xfs started picking the storage physical
      sector size as the default filesystem "sector size" in order
      to avoid RMW costs incurred by doing IOs at logical sector
      size alignments.
      
      However, this means that for a filesystem made with i.e.
      a 4k sector size on an "advanced format" 4k/512 disk,
      512-byte direct IOs are no longer allowed.  This means
      that XFS has essentially turned this AF drive into a hard
      4K device, from the filesystem on up.
      
      XFS's mkfs-specified "sector size" is really just controlling
      the minimum size & alignment of filesystem metadata.
      
      There is no real need to tightly couple XFS's minimal
      metadata size to the minimum allowed direct IO size;
      XFS can continue doing metadata in optimal sizes, but
      still allow smaller DIOs for apps which issue them,
      for whatever reason.
      
      This patch adds a new field to the xfs_buftarg, so that
      we now track 2 sizes:
      
       1) The metadata sector size, which is the minimum unit and
          alignment of IO which will be performed by metadata operations.
       2) The device logical sector size
      
      The first is used internally by the file system for metadata
      alignment and IOs.
      The second is used for the minimum allowed direct IO alignment.
      
      This has passed xfstests on filesystems made with 4k sectors,
      including when run under the patch I sent to ignore
      XFS_IOC_DIOINFO, and issue 512 DIOs anyway.  I also directly
      tested end of block behavior on preallocated, sparse, and
      existing files when we do a 512 IO into a 4k file on a 
      4k-sector filesystem, to be sure there were no unexpected
      behaviors.
      
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      7c71ee78
    • Eric Sandeen's avatar
      xfs: rename xfs_buftarg structure members · 6da54179
      Eric Sandeen authored
      
      
      In preparation for adding new members to the structure,
      give these old ones more descriptive names:
      
      	bt_ssize -> bt_meta_sectorsize
      	bt_smask -> bt_meta_sectormask
      
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      6da54179
    • Eric Sandeen's avatar
      xfs: clean up xfs_buftarg · f0bc9985
      Eric Sandeen authored
      
      
      Clean up the xfs_buftarg structure a bit:
      - remove bt_bsize which is never used
      - replace bt_sshift with bt_ssize; we only ever shift it back
      
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      f0bc9985
  2. Jan 10, 2014
  3. Jan 07, 2014
    • Jie Liu's avatar
      xfs: fix off-by-one error in xfs_attr3_rmt_verify · 85dd0707
      Jie Liu authored
      
      
      With CRC check is enabled, if trying to set an attributes value just
      equal to the maximum size of XATTR_SIZE_MAX would cause the v3 remote
      attr write verification procedure failure, which would yield the back
      trace like below:
      
      <snip>
      XFS (sda7): Internal error xfs_attr3_rmt_write_verify at line 191 of file fs/xfs/xfs_attr_remote.c
      <snip>
      Call Trace:
      [<ffffffff816f0042>] dump_stack+0x45/0x56
      [<ffffffffa0d99c8b>] xfs_error_report+0x3b/0x40 [xfs]
      [<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
      [<ffffffffa0d99ce5>] xfs_corruption_error+0x55/0x80 [xfs]
      [<ffffffffa0dbef6b>] xfs_attr3_rmt_write_verify+0x14b/0x1a0 [xfs]
      [<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
      [<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
      [<ffffffffa0d96edd>] _xfs_buf_ioapply+0x6d/0x390 [xfs]
      [<ffffffff81184cda>] ? vm_map_ram+0x31a/0x460
      [<ffffffff81097230>] ? wake_up_state+0x20/0x20
      [<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
      [<ffffffffa0d9726b>] xfs_buf_iorequest+0x6b/0xc0 [xfs]
      [<ffffffffa0d97315>] xfs_bdstrat_cb+0x55/0xb0 [xfs]
      [<ffffffffa0d97906>] xfs_bwrite+0x46/0x80 [xfs]
      [<ffffffffa0dbfa94>] xfs_attr_rmtval_set+0x334/0x490 [xfs]
      [<ffffffffa0db84aa>] xfs_attr_leaf_addname+0x24a/0x410 [xfs]
      [<ffffffffa0db8893>] xfs_attr_set_int+0x223/0x470 [xfs]
      [<ffffffffa0db8b76>] xfs_attr_set+0x96/0xb0 [xfs]
      [<ffffffffa0db13b2>] xfs_xattr_set+0x42/0x70 [xfs]
      [<ffffffff811df9b2>] generic_setxattr+0x62/0x80
      [<ffffffff811e0213>] __vfs_setxattr_noperm+0x63/0x1b0
      [<ffffffff81307afe>] ? evm_inode_setxattr+0xe/0x10
      [<ffffffff811e0415>] vfs_setxattr+0xb5/0xc0
      [<ffffffff811e054e>] setxattr+0x12e/0x1c0
      [<ffffffff811c6e82>] ? final_putname+0x22/0x50
      [<ffffffff811c708b>] ? putname+0x2b/0x40
      [<ffffffff811cc4bf>] ? user_path_at_empty+0x5f/0x90
      [<ffffffff811bdfd9>] ? __sb_start_write+0x49/0xe0
      [<ffffffff81168589>] ? vm_mmap_pgoff+0x99/0xc0
      [<ffffffff811e07df>] SyS_setxattr+0x8f/0xe0
      [<ffffffff81700c2d>] system_call_fastpath+0x1a/0x1f
      
      Tests:
          setfattr -n user.longxattr -v `perl -e 'print "A"x65536'` testfile
      
      This patch fix it to check the remote EA size is greater than the
      XATTR_SIZE_MAX rather than more than or equal to it, because it's
      valid if the specified EA value size is equal to the limitation as
      per VFS setxattr interface.
      
      Signed-off-by: default avatarJie Liu <jeff.liu@oracle.com>
      Reviewed-by: default avatarMark Tinguely <tinguely@sgi.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      85dd0707
  4. Dec 19, 2013
  5. Dec 17, 2013
    • Dave Chinner's avatar
      xfs: abort metadata writeback on permanent errors · ac8809f9
      Dave Chinner authored
      
      
      If we are doing aysnc writeback of metadata, we can get write errors
      but have nobody to report them to. At the moment, we simply attempt
      to reissue the write from io completion in the hope that it's a
      transient error.
      
      When it's not a transient error, the buffer is stuck forever in
      this loop, and we cannot break out of it. Eventually, unmount will
      hang because the AIL cannot be emptied and everything goes downhill
      from them.
      
      To solve this problem, only retry the write IO once before aborting
      it. We don't throw the buffer away because some transient errors can
      last minutes (e.g.  FC path failover) or even hours (thin
      provisioned devices that have run out of backing space) before they
      go away. Hence we really want to keep trying until we can't try any
      more.
      
      Because the buffer was not cleaned, however, it does not get removed
      from the AIL and hence the next pass across the AIL will start IO on
      it again. As such, we still get the "retry forever" semantics that
      we currently have, but we allow other access to the buffer in the
      mean time. Meanwhile the filesystem can continue to modify the
      buffer and relog it, so the IO errors won't hang the log or the
      filesystem.
      
      Now when we are pushing the AIL, we can see all these "permanent IO
      error" buffers and we can issue a warning about failures before we
      retry the IO. We can also catch these buffers when unmounting an
      issue a corruption warning, too.
      
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      ac8809f9
    • Dave Chinner's avatar
      xfs: swalloc doesn't align allocations properly · 33177f05
      Dave Chinner authored
      
      
      When swalloc is specified as a mount option, allocations are
      supposed to be aligned to the stripe width rather than the stripe
      unit of the underlying filesystem. However, it does not do this.
      
      What the implementation does is round up the allocation size to a
      stripe width, hence ensuring that all allocations span a full stripe
      width. It does not, however, ensure that that allocation is aligned
      to a stripe width, and hence the allocations can span multiple
      underlying stripes and so still see RMW cycles for things like
      direct IO on MD RAID.
      
      So, if the swalloc mount option is set, change the allocation
      alignment in xfs_bmap_btalloc() to use the stripe width rather than
      the stripe unit.
      
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarBen Myers <bpm@sgi.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      33177f05
    • Christoph Hellwig's avatar
      xfs: remove xfsbdstrat error · 83a0adc3
      Christoph Hellwig authored
      
      
      The xfsbdstrat helper is a small but useless wrapper for xfs_buf_iorequest that
      handles the case of a shut down filesystem.  Most of the users have private,
      uncached buffers that can just be freed in this case, but the complex error
      handling in xfs_bioerror_relse messes up the case when it's called without
      a locked buffer.
      
      Remove xfsbdstrat and opencode the error handling in the callers.  All but
      one can simply return an error and don't need to deal with buffer state,
      and the one caller that cares about the buffer state could do with a major
      cleanup as well, but we'll defer that to later.
      
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarBen Myers <bpm@sgi.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      83a0adc3
    • Dave Chinner's avatar
      xfs: align initial file allocations correctly · 6e708bcf
      Dave Chinner authored
      
      
      The function xfs_bmap_isaeof() is used to indicate that an
      allocation is occurring at or past the end of file, and as such
      should be aligned to the underlying storage geometry if possible.
      
      Commit 27a3f8f2 ("xfs: introduce xfs_bmap_last_extent") changed the
      behaviour of this function for empty files - it turned off
      allocation alignment for this case accidentally. Hence large initial
      allocations from direct IO are not getting correctly aligned to the
      underlying geometry, and that is cause write performance to drop in
      alignment sensitive configurations.
      
      Fix it by considering allocation into empty files as requiring
      aligned allocation again.
      
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      
      (cherry picked from commit f9b395a8)
      6e708bcf
    • Namjae Jeon's avatar
      MAINTAINERS: fix incorrect mail address of XFS maintainer · 809625ca
      Namjae Jeon authored
      
      
      When I tried to send the patches to XFS Maintainers,
      I got returned mail included delivery fail message for Dave's mail.
      Maybe, Dave Chinner mail address is incorrect.
      I try to fix it correctly.
      
      Signed-off-by: default avatarNamjae Jeon <namjae.jeon@samsung.com>
      Reviewed-by: default avatarBen Myers <bpm@sgi.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      
      (cherry picked from commit db10bddc)
      809625ca
    • Jie Liu's avatar
      xfs: fix infinite loop by detaching the group/project hints from user dquot · 718cc6f8
      Jie Liu authored
      
      
      xfs_quota(8) will hang up if trying to turn group/project quota off
      before the user quota is off, this could be 100% reproduced by:
        # mount -ouquota,gquota /dev/sda7 /xfs
        # mkdir /xfs/test
        # xfs_quota -xc 'off -g' /xfs <-- hangs up
        # echo w > /proc/sysrq-trigger
        # dmesg
      
        SysRq : Show Blocked State
        task                        PC stack   pid father
        xfs_quota       D 0000000000000000     0 27574   2551 0x00000000
        [snip]
        Call Trace:
        [<ffffffff81aaa21d>] schedule+0xad/0xc0
        [<ffffffff81aa327e>] schedule_timeout+0x35e/0x3c0
        [<ffffffff8114b506>] ? mark_held_locks+0x176/0x1c0
        [<ffffffff810ad6c0>] ? call_timer_fn+0x2c0/0x2c0
        [<ffffffffa0c25380>] ? xfs_qm_shrink_count+0x30/0x30 [xfs]
        [<ffffffff81aa3306>] schedule_timeout_uninterruptible+0x26/0x30
        [<ffffffffa0c26155>] xfs_qm_dquot_walk+0x235/0x260 [xfs]
        [<ffffffffa0c059d8>] ? xfs_perag_get+0x1d8/0x2d0 [xfs]
        [<ffffffffa0c05805>] ? xfs_perag_get+0x5/0x2d0 [xfs]
        [<ffffffffa0b7707e>] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs]
        [<ffffffffa0c22280>] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs]
        [<ffffffffa0b7709f>] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs]
        [<ffffffffa0c261e6>] xfs_qm_dqpurge_all+0x66/0xb0 [xfs]
        [<ffffffffa0c2497a>] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs]
        [<ffffffffa0c2b8f6>] xfs_fs_set_xstate+0x136/0x180 [xfs]
        [<ffffffff8136cf7a>] do_quotactl+0x53a/0x6b0
        [<ffffffff812fba4b>] ? iput+0x5b/0x90
        [<ffffffff8136d257>] SyS_quotactl+0x167/0x1d0
        [<ffffffff814cf2ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
        [<ffffffff81abcd19>] system_call_fastpath+0x16/0x1b
      
      It's fine if we turn user quota off at first, then turn off other
      kind of quotas if they are enabled since the group/project dquot
      refcount is decreased to zero once the user quota if off. Otherwise,
      those dquots refcount is non-zero due to the user dquot might refer
      to them as hint(s).  Hence, above operation cause an infinite loop
      at xfs_qm_dquot_walk() while trying to purge dquot cache.
      
      This problem has been around since Linux 3.4, it was introduced by:
        [ b84a3a96 xfs: remove the per-filesystem list of dquots ]
      
      Originally we will release the group dquot pointers because the user
      dquots maybe carrying around as a hint via xfs_qm_detach_gdquots().
      However, with above change, there is no such work to be done before
      purging group/project dquot cache.
      
      In order to solve this problem, this patch introduces a special routine
      xfs_qm_dqpurge_hints(), and it would release the group/project dquot
      pointers the user dquots maybe carrying around as a hint, and then it
      will proceed to purge the user dquot cache if requested.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJie Liu <jeff.liu@oracle.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      
      (cherry picked from commit df8052e7)
      718cc6f8
    • Jie Liu's avatar
      xfs: fix assertion failure at xfs_setattr_nonsize · 5c227278
      Jie Liu authored
      
      
      For CRC enabled v5 super block, change a file's ownership can simply
      trigger an ASSERT failure at xfs_setattr_nonsize() if both group and
      project quota are enabled, i.e,
      
      [  305.337609] XFS: Assertion failed: !XFS_IS_PQUOTA_ON(mp), file: fs/xfs/xfs_iops.c, line: 621
      [  305.339250] Kernel BUG at ffffffffa0a7fa32 [verbose debug info unavailable]
      [  305.383939] Call Trace:
      [  305.385536]  [<ffffffffa0a7d95a>] xfs_setattr_nonsize+0x69a/0x720 [xfs]
      [  305.387142]  [<ffffffffa0a7dea9>] xfs_vn_setattr+0x29/0x70 [xfs]
      [  305.388727]  [<ffffffff811ca388>] notify_change+0x1a8/0x350
      [  305.390298]  [<ffffffff811ac39d>] chown_common+0xfd/0x110
      [  305.391868]  [<ffffffff811ad6bf>] SyS_fchownat+0xaf/0x110
      [  305.393440]  [<ffffffff811ad760>] SyS_lchown+0x20/0x30
      [  305.394995]  [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
      [  305.399870] RIP  [<ffffffffa0a7fa32>] assfail+0x22/0x30 [xfs]
      
      This fix adjust the assertion to check if the super block support both
      quota inodes or not.
      
      Signed-off-by: default avatarJie Liu <jeff.liu@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 5a01dd54)
      5c227278
    • Jie Liu's avatar
      xfs: fix false assertion at xfs_qm_vop_create_dqattach · 30d161c9
      Jie Liu authored
      
      
      After the previous fix, there still has another ASSERT failure if turning
      off any type of quota while fsstress is running at the same time.
      
      Backtrace in this case:
      
      [   50.867897] XFS: Assertion failed: XFS_IS_GQUOTA_ON(mp), file: fs/xfs/xfs_qm.c, line: 2118
      [   50.867924] ------------[ cut here ]------------
      ... <snip>
      [   50.867957] Kernel BUG at ffffffffa0b55a32 [verbose debug info unavailable]
      [   50.867999] invalid opcode: 0000 [#1] SMP
      [   50.869407] Call Trace:
      [   50.869446]  [<ffffffffa0bc408a>] xfs_qm_vop_create_dqattach+0x19a/0x2d0 [xfs]
      [   50.869512]  [<ffffffffa0b9cc45>] xfs_create+0x5c5/0x6a0 [xfs]
      [   50.869564]  [<ffffffffa0b5307c>] xfs_vn_mknod+0xac/0x1d0 [xfs]
      [   50.869615]  [<ffffffffa0b531d6>] xfs_vn_mkdir+0x16/0x20 [xfs]
      [   50.869655]  [<ffffffff811becd5>] vfs_mkdir+0x95/0x130
      [   50.869689]  [<ffffffff811bf63a>] SyS_mkdirat+0xaa/0xe0
      [   50.869723]  [<ffffffff811bf689>] SyS_mkdir+0x19/0x20
      [   50.869757]  [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
      [   50.869793] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 <snip>
      [   50.870003] RIP  [<ffffffffa0b55a32>] assfail+0x22/0x30 [xfs]
      [   50.870050]  RSP <ffff88002941fd60>
      [   50.879251] ---[ end trace c93a2b342341c65b ]---
      
      We're hitting the ASSERT(XFS_IS_*QUOTA_ON(mp)) in xfs_qm_vop_create_dqattach(),
      however the assertion itself is not right IMHO.  While performing quota off, we
      firstly clear the XFS_*QUOTA_ACTIVE bit(s) from struct xfs_mount without taking
      any special locks, see xfs_qm_scall_quotaoff().  Hence there is no guarantee
      that the desired quota is still active.
      
      Signed-off-by: default avatarJie Liu <jeff.liu@oracle.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 37eb9706)
      30d161c9
    • Mark Tinguely's avatar
      xfs: fix memory leak in xfs_dir2_node_removename · 3a8c9208
      Mark Tinguely authored
      
      
      Fix the leak of kernel memory in xfs_dir2_node_removename()
      when xfs_dir2_leafn_remove() returns an error code.
      
      Signed-off-by: default avatarMark Tinguely <tinguely@sgi.com>
      Reviewed-by: default avatarBen Myers <bpm@sgi.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      
      (cherry picked from commit ef701600)
      3a8c9208
  6. Dec 14, 2013
  7. Dec 13, 2013