  1. Apr 16, 2015
    • Merge branch 'xfs-dio-extend-fix' into for-next · 542c3118
      Dave Chinner authored
      Conflicts:
      	fs/xfs/xfs_file.c
      542c3118
    • xfs: using generic_file_direct_write() is unnecessary · 0cefb29e
      Dave Chinner authored
      
      
      generic_file_direct_write() does all sorts of things to make DIO
      work "sorta ok" with mixed buffered IO workloads. We already do
      most of this work in xfs_file_aio_dio_write() because of the locking
      requirements, so there's only a couple of things it does for us.
      
      The first thing is that it does a page cache invalidation after the
      ->direct_IO callout. This can easily be added to the XFS code.
      
      The second thing it does is that if data was written, it updates the
      iov_iter structure to reflect the data written, and then does EOF
      size updates if necessary. For XFS, these EOF size updates are now
      not necessary, as we do them safely and race-free in IO completion
      context. That leaves just the iov_iter update, and that's also moved
      to the XFS code.
      
      Therefore we don't need to call generic_file_direct_write() and in
      doing so remove redundant buffered writeback and page cache
      invalidation calls from the DIO submission path. We also remove a
      racy EOF size update, and make the DIO submission code in XFS much
      easier to follow. Wins all round, really.
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      0cefb29e
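The two leftover steps named in this message (invalidate the page cache over the written range, then advance the iterator) can be sketched in userspace C. This is an illustration under assumptions, not the XFS code: `struct demo_cache`, `struct demo_iter`, and `demo_dio_finish()` are hypothetical names, and a boolean array stands in for the page cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u
#define DEMO_NR_PAGES  16u

/* Hypothetical stand-in for the page cache: true means the page is cached. */
struct demo_cache {
    bool cached[DEMO_NR_PAGES];
};

/* Hypothetical stand-in for an iov_iter plus the file position. */
struct demo_iter {
    size_t pos;   /* current file offset */
    size_t count; /* bytes still to be written */
};

/* Drop every cached page that overlaps [start, start + len). */
static void demo_invalidate(struct demo_cache *c, size_t start, size_t len)
{
    if (len == 0)
        return;
    size_t first = start / DEMO_PAGE_SIZE;
    size_t last = (start + len - 1) / DEMO_PAGE_SIZE;
    for (size_t i = first; i <= last && i < DEMO_NR_PAGES; i++)
        c->cached[i] = false;
}

/*
 * Called after the direct IO callout returned `written` bytes (or a
 * negative error). Assumes written <= it->count. No EOF size update is
 * done here; per the commit message that belongs to IO completion.
 */
static long demo_dio_finish(struct demo_cache *c, struct demo_iter *it,
                            long written)
{
    /* Invalidate the whole range the IO targeted. */
    demo_invalidate(c, it->pos, it->count);

    /* Advance the iterator only by what was actually written. */
    if (written > 0) {
        it->pos += (size_t)written;
        it->count -= (size_t)written;
    }
    return written;
}
```

A caller would invoke `demo_dio_finish(&cache, &iter, ret)` immediately after the IO submission returns, so cached pages over the written range are never served stale to a later buffered read.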
    • xfs: direct IO EOF zeroing needs to drain AIO · 40c63fbc
      Dave Chinner authored
      
      
      When we are doing AIO DIO writes, the IOLOCK only provides an IO
      submission barrier. When we need to do EOF zeroing, we need to ensure
      that no other IO is in progress and all pending in-core EOF updates
      have been completed. This requires us to wait for all outstanding
      AIO DIO writes to the inode to complete and, if necessary, run their
      EOF updates.
      
      Once all the EOF updates are complete, we can then restart
      xfs_file_aio_write_checks() while holding the IOLOCK_EXCL, knowing
      that EOF is up to date and we have exclusive IO access to the file
      so we can run EOF block zeroing if we need to without interference.
      This gives EOF zeroing the same exclusivity against other IO as we
      provide truncate operations.
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      40c63fbc
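The drain-then-restart flow described in this message can be sketched as a small single-threaded model. Everything here is an assumption made for illustration: `struct demo_inode`, `demo_dio_wait()`, and `demo_write_checks()` are hypothetical names, and "waiting" is modelled by simply applying the pending EOF update.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_lock { DEMO_IOLOCK_SHARED, DEMO_IOLOCK_EXCL };

/* Hypothetical in-core inode state, reduced to what the sketch needs. */
struct demo_inode {
    long long isize;        /* in-core EOF */
    int dio_in_flight;      /* outstanding AIO DIO writes */
    long long pending_eof;  /* EOF those writes will set on completion */
};

/* Model of waiting for all in-flight DIO: completions run their EOF updates. */
static void demo_dio_wait(struct demo_inode *ip)
{
    ip->dio_in_flight = 0;
    if (ip->pending_eof > ip->isize)
        ip->isize = ip->pending_eof;
    ip->pending_eof = 0;
}

/*
 * Returns true when the caller must zero the range [isize, offset).
 * The first time a write appears to start beyond EOF, take the lock
 * exclusively, drain in-flight IO, and re-run the check against the
 * now up-to-date EOF.
 */
static bool demo_write_checks(struct demo_inode *ip, long long offset,
                              enum demo_lock *lock)
{
    bool drained = false;

restart:
    if (offset > ip->isize) {
        if (!drained) {
            *lock = DEMO_IOLOCK_EXCL;
            demo_dio_wait(ip);
            drained = true;
            goto restart;
        }
        return true;   /* EOF is stable and still below offset */
    }
    return false;      /* write starts at or within EOF */
}
```

In this model, a write at offset 8192 against a stale EOF of 4096 no longer triggers zeroing once a pending completion has moved EOF to 16384, which is the interference the restart is meant to rule out.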
    • xfs: DIO write completion size updates race · b9d59846
      Dave Chinner authored
      
      
      xfs_end_io_direct_write() can race with other IO completions when
      updating the in-core inode size. IO completion processing is not
      serialised for direct IO: it runs either under the IOLOCK_SHARED
      for non-AIO DIO, or without any IOLOCK held at all during AIO DIO
      completion. Hence the non-atomic test-and-set update of the in-core
      inode size is racy and can result in the in-core inode size going
      backwards if the race is hit just right.
      
      If the inode size goes backwards, this can trigger the EOF zeroing
      code to run incorrectly on the next IO, which then will zero data
      that has successfully been written to disk by a previous DIO.
      
      To fix this bug, we need to serialise the test/set updates of the
      in-core inode size. This first patch introduces locking around the
      relevant updates and checks in the DIO path. Because we now have an
      ioend in xfs_end_io_direct_write(), we know exactly when we are
      doing an IO that requires an in-core EOF update, and we know that
      they are not running in interrupt context. As such, we do not need to
      use irqsave() spinlock variants to protect against interrupts while
      the lock is held.
      
      Hence we can use an existing spinlock in the inode to do this
      serialisation and so not need to grow the struct xfs_inode just to
      work around this problem.
      
      This patch does not address the test/set EOF update in
      generic_file_direct_write() for various reasons - that will be done
      as a followup with separate explanation.
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      b9d59846
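The race described in this message can be replayed deterministically. The sketch below is a single-threaded model under assumptions, not the XFS code: all `demo_*` names are hypothetical, the bad interleaving is written out by hand, and the "locked" variant only marks where a spinlock would be held.

```c
#include <assert.h>

/* Hypothetical in-core inode reduced to its size field. */
struct demo_inode {
    long long isize;
};

/* Racy form: the test and the set are separate steps. */
static int demo_test(const struct demo_inode *ip, long long new_size)
{
    return new_size > ip->isize;
}

static void demo_set(struct demo_inode *ip, long long new_size)
{
    ip->isize = new_size;
}

/* Serialised form: test and set happen as one indivisible step. */
static void demo_update_locked(struct demo_inode *ip, long long new_size)
{
    /* a spinlock would be taken here */
    if (new_size > ip->isize)
        ip->isize = new_size;
    /* and released here */
}

/* Replay: completions A and B both test before either one sets. */
static long long demo_racy_interleaving(long long start, long long a,
                                        long long b)
{
    struct demo_inode ip = { .isize = start };
    int a_extends = demo_test(&ip, a);
    int b_extends = demo_test(&ip, b);

    if (b_extends)
        demo_set(&ip, b);
    if (a_extends)
        demo_set(&ip, a);   /* may overwrite B's larger size */
    return ip.isize;
}

/* Same completion order, but each update is atomic. */
static long long demo_serialised(long long start, long long a, long long b)
{
    struct demo_inode ip = { .isize = start };

    demo_update_locked(&ip, b);
    demo_update_locked(&ip, a);
    return ip.isize;
}
```

With a starting size of 0 and completions for sizes 4096 and 8192, the racy replay ends at 4096 after briefly reaching 8192, while the serialised version ends at 8192.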
    • xfs: DIO writes within EOF don't need an ioend · a06c277a
      Dave Chinner authored
      
      
      DIO writes that lie entirely within EOF have nothing to do in IO
      completion. In this case, we don't need no steekin' ioend, and so we
      can avoid allocating an ioend until we have a mapping that spans
      EOF.
      
      This means that IO completion has two contexts - deferred completion
      to the dio workqueue that uses an ioend, and interrupt completion
      that does nothing because there is nothing that can be done in this
      context.
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      a06c277a
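The choice between the two completion contexts can be sketched as a pure function of the mapping. This is a minimal illustration under assumptions: `demo_dio_completion()` and the enum are hypothetical names, and the `unwritten` flag reflects the unwritten extent conversion case mentioned elsewhere in this series.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_completion {
    DEMO_COMPLETE_INLINE,   /* nothing to do; fine in interrupt context */
    DEMO_COMPLETE_DEFERRED  /* needs an ioend; run from the DIO workqueue */
};

/*
 * Decide at mapping time whether a DIO write of `len` bytes at `offset`
 * needs an ioend. A write lying entirely within EOF over written extents
 * has no completion work, so no ioend is allocated for it.
 */
static enum demo_completion
demo_dio_completion(long long offset, long long len, long long isize,
                    bool unwritten)
{
    if (unwritten)
        return DEMO_COMPLETE_DEFERRED;  /* extent conversion required */
    if (offset + len > isize)
        return DEMO_COMPLETE_DEFERRED;  /* mapping spans EOF: size update */
    return DEMO_COMPLETE_INLINE;
}
```

Making the decision when the blocks are mapped, rather than in the completion handler, is what lets the interrupt-context path stay empty.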
    • xfs: handle DIO overwrite EOF update completion correctly · 6dfa1b67
      Dave Chinner authored
      
      
      Currently a DIO overwrite that extends the EOF (e.g. sub-block IO or
      write into allocated blocks beyond EOF) requires a transaction for
      the EOF update. This is done in IO completion context, but we aren't
      explicitly handling this situation properly and so it can run in
      interrupt context. Ensure that we defer IO that spans EOF correctly
      to the DIO completion workqueue, and now that we have an ioend in IO
      completion we can use the common ioend completion path to do all the
      work.
      
      Note: we do not preallocate the append transaction as we can have
      multiple mapping and allocation calls per direct IO. Hence
      preallocating can still leave us with nested transactions by
      attempting to map and allocate more blocks after we've preallocated
      an append transaction.
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      6dfa1b67
    • xfs: DIO needs an ioend for writes · d5cc2e3f
      Dave Chinner authored
      
      
      Currently we can only tell DIO completion that an IO requires
      unwritten extent completion. This is done by a hacky non-null
      private pointer passed to IO completion, but the private pointer
      does not actually contain any information that is used.
      
      We also need to pass to IO completion the fact that the IO may be
      beyond EOF and so a size update transaction needs to be done. This
      is currently determined by checks in the IO completion, but we need
      to determine if this is necessary at block mapping time as we need
      to defer the size update transactions to a completion workqueue,
      just like unwritten extent conversion.
      
      To do this, first we need to allocate and pass an ioend to IO
      completion. Add this for unwritten extent conversion; we'll do the
      EOF updates in the next commit.
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      d5cc2e3f
    • xfs: move DIO mapping size calculation · 1fdca9c2
      Dave Chinner authored
      
      
      The mapping size calculation is done last in __xfs_get_blocks(), but
      we are going to need the actual mapping size we will use to map the
      direct IO correctly in xfs_map_direct(). Factor out the calculation
      for code clarity, and move the call to be the first operation in
      mapping the extent to the returned buffer.
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      1fdca9c2
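A mapping size calculation of the kind this message refers to can be sketched as follows. This is a simplified illustration under assumptions, not the factored-out kernel helper: `struct demo_extent` and `demo_mapping_size()` are hypothetical names, `iblock` is assumed to lie inside the extent, and any further clamping the real code applies is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical extent record, in units of filesystem blocks. */
struct demo_extent {
    uint64_t startoff;    /* first file block covered by the extent */
    uint64_t blockcount;  /* extent length in blocks */
};

/*
 * Bytes that can be mapped starting at file block `iblock`: the rest of
 * the extent, limited to the `size` the caller asked for.
 */
static uint64_t demo_mapping_size(const struct demo_extent *ext,
                                  uint64_t iblock, unsigned blkbits,
                                  uint64_t size)
{
    uint64_t blocks_left = ext->startoff + ext->blockcount - iblock;
    uint64_t bytes = blocks_left << blkbits;

    return bytes < size ? bytes : size;
}
```

Computing this value first means later steps, such as deciding whether the mapped range reaches past EOF, can work from the size that will actually be used rather than the size that was requested.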
    • xfs: factor DIO write mapping from get_blocks · a719370b
      Dave Chinner authored
      
      
      Clarify and separate the buffer mapping logic so that the direct IO mapping is
      not tangled up in propagating the extent status to the mapping buffer. This
      makes it easier to extend the direct IO mapping to use an ioend in future.
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      a719370b
  2. Apr 13, 2015
  3. Mar 25, 2015
  4. Feb 24, 2015