  1. Aug 08, 2018
  2. Aug 07, 2018
    • gfs2: Fix gfs2_testbit to use clone bitmaps · dffe12a8
      Bob Peterson authored
      
      
      Function gfs2_testbit is called in three places. Two of those places,
      gfs2_alloc_extent and gfs2_unaligned_extlen, should be using the clone
      bitmaps, not the "real" bitmaps. Function gfs2_unaligned_extlen is used
      by the block reservations scheme to determine the length of an extent of
      free blocks. Before this patch, it wasn't using the clone bitmap, which
      means recently-freed blocks were treated as free blocks for the purposes
      of an allocation.
      
      This patch adds a new parameter to gfs2_testbit to indicate whether or
      not the clone bitmaps should be used (if available).
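      The following is a minimal userspace sketch of the new parameter. It assumes GFS2's two-bits-per-block encoding. It also assumes that the clone bitmap keeps blocks freed since the last log flush marked as used. The struct bitmap_pair, the helper testbit() and its use_clone flag are hypothetical stand-ins, not the kernel's gfs2_testbit() signature.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* GFS2-style encoding: two bits per block, four blocks per byte. */
#define BITS_PER_BLOCK  2
#define BLOCKS_PER_BYTE 4
#define BLOCK_MASK      0x3u

/* The four block states the two bits can encode. */
enum block_state {
	BLKST_FREE     = 0,
	BLKST_USED     = 1,
	BLKST_UNLINKED = 2,
	BLKST_DINODE   = 3,
};

/* Hypothetical pairing of a "real" bitmap with its optional clone. */
struct bitmap_pair {
	const uint8_t *real;
	const uint8_t *clone; /* NULL when no clone bitmap exists */
};

/*
 * Return the state of @block. When @use_clone is set and a clone exists,
 * read the clone, in which recently freed blocks still look used
 * (an assumption about clone semantics made for this sketch).
 */
static enum block_state testbit(const struct bitmap_pair *bp, bool use_clone,
				uint32_t block)
{
	const uint8_t *map = (use_clone && bp->clone) ? bp->clone : bp->real;
	unsigned int shift = (block % BLOCKS_PER_BYTE) * BITS_PER_BLOCK;

	return (enum block_state)((map[block / BLOCKS_PER_BYTE] >> shift) & BLOCK_MASK);
}
```

      Take realmap = { 0x00 } and clonemap = { 0x04 }. Block 1 reads as free from the real bitmap but as used from the clone. This is the recently-freed case described above.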
      
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
  3. Aug 03, 2018
  4. Jul 27, 2018
  5. Jul 26, 2018
    • gfs2: Special-case rindex for gfs2_grow · 77612578
      Andreas Gruenbacher authored
      
      
      To speed up the common case of appending to a file,
      gfs2_write_alloc_required presumes that writing beyond the end of a file
      will always require additional blocks to be allocated.  This assumption
      is incorrect for preallocated files, but there are no negative
      consequences as long as *some* space is still left on the filesystem.
      
      One special file that always has some space preallocated beyond the end
      of the file is the rindex: when growing a filesystem, gfs2_grow adds one
      or more new resource groups and appends records describing those
      resource groups to the rindex; the preallocated space ensures that this
      is always possible.
      
      However, when a filesystem is completely full, gfs2_write_alloc_required
      will indicate that an additional allocation is required, and appending
      the next record to the rindex will fail even though space for that
      record has already been preallocated.  To fix that, skip the incorrect
      optimization in gfs2_write_alloc_required, but for the rindex only.
      Other writes to preallocated space beyond the end of the file are still
      allowed to fail on completely full filesystems.
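      The check described above might be modeled as follows. This is a sketch, not gfs2_write_alloc_required() itself. The struct inode_model and write_alloc_required() are hypothetical. The real block-mapping lookup is reduced to a single alloc_end offset that covers both allocated and preallocated blocks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of an inode. */
struct inode_model {
	uint64_t size;      /* file size (i_size) in bytes */
	bool is_rindex;     /* true for the resource index file */
	uint64_t alloc_end; /* bytes backed by allocated or preallocated blocks */
};

/* Return true if writing [pos, pos + len) would need new blocks. */
static bool write_alloc_required(const struct inode_model *ip,
				 uint64_t pos, uint64_t len)
{
	uint64_t end = pos + len;

	if (len == 0)
		return false;
	/* Shortcut: writing past EOF is presumed to allocate, except for the rindex. */
	if (!ip->is_rindex && end > ip->size)
		return true;
	/* Otherwise consult the (modeled) block mapping. */
	return end > ip->alloc_end;
}
```

      An append to the rindex that fits inside its preallocated space no longer reports an allocation. The same append to a regular file still does.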
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
  6. Jul 25, 2018
    • GFS2: rgrp free blocks used incorrectly · f6753df3
      Bob Peterson authored
      
      
      Before this patch, several functions in rgrp.c checked the value of
      rgd->rd_free_clone. That does not take into account blocks that were
      reserved by a multi-block reservation. This causes a problem when
      space gets tight in the file system. For example, when function
      gfs2_inplace_reserve checks to see if a rgrp has enough blocks to
      satisfy the request, it can accept a rgrp that it should reject
      because, although there are enough blocks to satisfy the request
      _now_, those blocks may be reserved for another running process.
      
      A second problem with this occurs when we've reserved the remaining
      blocks in an rgrp: function rg_mblk_search() can reject an rgrp
      improperly because it calculates:
      
         u32 free_blocks = rgd->rd_free_clone - rgd->rd_reserved;
      
      But rd_reserved includes blocks that the current process just
      reserved in its own call to inplace_reserve. For example, it can
      reserve the last 128 blocks of an rgrp, then reject that same rgrp
      because the above evaluates to free_blocks = 0.
      
      Consequences include, but are not limited to, (1) leaving holes,
      and thus increasing file system fragmentation, and (2) reporting
      that the file system is full long before it actually is.
      
      This patch introduces a new function, rgd_free, which returns the
      number of clone-free blocks (blocks that are truly free as opposed
      to blocks that are still being used because an unlinked file is
      still open) minus the number of blocks reserved by processes, but
      not counting the blocks we ourselves reserved (because obviously
      we need to allocate them).
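      The following sketches the arithmetic, using the 128-block example above. The structs are hypothetical. The fields rd_free_clone and rd_reserved come from the text. The field rs_free, meaning blocks still held by the caller's own reservation, is an assumption, as are the function names free_blocks_old() and free_blocks_new().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for an rgrp and a reservation. */
struct rgrp_model {
	uint32_t rd_free_clone; /* blocks free according to the clone bitmap */
	uint32_t rd_reserved;   /* blocks reserved by all processes, us included */
};

struct blkreserv_model {
	uint32_t rs_free;       /* blocks still held by our own reservation */
};

/* Old calculation quoted above: our own reservation counts against us. */
static uint32_t free_blocks_old(const struct rgrp_model *rgd)
{
	return rgd->rd_free_clone - rgd->rd_reserved;
}

/* rgd_free-style: clone-free blocks minus blocks reserved by *other* processes. */
static uint32_t free_blocks_new(const struct rgrp_model *rgd,
				const struct blkreserv_model *rs)
{
	uint32_t others = rgd->rd_reserved;

	/* Exclude the blocks we reserved ourselves; we are about to use them. */
	if (rs && rs->rs_free <= others)
		others -= rs->rs_free;
	/* Avoid unsigned wraparound if the counters are inconsistent. */
	if (others > rgd->rd_free_clone)
		return 0;
	return rgd->rd_free_clone - others;
}
```

      Suppose there are 128 clone-free blocks and the caller has reserved all 128. The old formula yields 0 and the new one yields 128.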
      
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: remove redundant variable 'moved' · d1b0cb93
      Colin Ian King authored
      Variable 'moved' is being assigned but is never used, hence it is
      redundant and can be removed.  This has been the case ever since commit
      c752666c.
      
      Cleans up clang warning:
      warning: variable 'moved' set but not used [-Wunused-but-set-variable]
      
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: use iomap_readpage for blocksize == PAGE_SIZE · f95cbb44
      Andreas Gruenbacher authored
      
      
      We only use iomap_readpage for pages that don't have buffer heads
      attached yet: iomap_readpage would otherwise read pages from disk that
      are marked buffer_uptodate() but not PageUptodate().  Those pages may
      actually contain data more recent than what's on disk.
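      The path selection might look like this, modeled as a pure function. The struct page_ctx, the enum and choose_read_path() are hypothetical. Pages that do not take the iomap path fall back to a stuffed path or a block-mapped path. Those fallbacks are assumptions, not taken from this commit message.

```c
#include <stdbool.h>
#include <stddef.h>

enum read_path {
	READ_IOMAP,   /* iomap_readpage-style path */
	READ_STUFFED, /* copy inline ("stuffed") data out of the dinode */
	READ_MPAGE,   /* buffer-head based, block-mapped path */
};

/* Hypothetical summary of the page and inode properties that matter here. */
struct page_ctx {
	size_t blocksize;
	size_t page_size;
	bool has_buffers; /* buffer heads already attached to the page */
	bool stuffed;     /* file data lives inside the dinode block */
};

static enum read_path choose_read_path(const struct page_ctx *c)
{
	/*
	 * Only pages without buffer heads go through iomap: an attached
	 * buffer may be buffer_uptodate() and newer than the on-disk data.
	 */
	if (c->blocksize == c->page_size && !c->has_buffers)
		return READ_IOMAP;
	if (c->stuffed)
		return READ_STUFFED;
	return READ_MPAGE;
}
```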
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: Use iomap for stuffed direct I/O reads · 1d45bb7f
      Andreas Gruenbacher authored
      
      
      Remove the fallback code from direct to buffered I/O for stuffed reads.
      
      For stuffed writes, we must keep the fallback code: the deferred glock
      we are holding under direct I/O doesn't allow us to write to the inode
      or change the file size.
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
    • Merge branch 'iomap-4.19-merge' into linux-gfs2/for-next · 0ed91eca
      Andreas Gruenbacher authored
      
      
      Merge xfs branch 'iomap-4.19-merge' into linux-gfs2/for-next.  This
      brings in readpage and direct I/O support for inline data.
      
      The IOMAP_F_BUFFER_HEAD flag introduced in commit "iomap: add initial
      support for writes without buffer heads" needs to be set for gfs2 as
      well, so do that in the merge.
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: fallocate_chunk: Always initialize struct iomap · c2589282
      Andreas Gruenbacher authored
      
      
      In fallocate_chunk, always initialize the iomap before calling
      gfs2_iomap_get_alloc: future changes could otherwise cause things like
      iomap.flags to leak across calls.
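      The following illustrates the hazard with hypothetical types. The struct map_result stands in for struct iomap, get_alloc() stands in for gfs2_iomap_get_alloc(), and MAP_F_NEW is a flag the getter only ever sets. A struct reused without re-initialization carries flags from one call into the next.

```c
#include <stdint.h>
#include <string.h>

#define MAP_F_NEW 0x1u /* hypothetical "newly allocated" flag */

/* Hypothetical stand-in for struct iomap. */
struct map_result {
	uint64_t addr;
	uint64_t length;
	unsigned int flags;
};

/* Hypothetical getter: fills addr/length but only ever ORs bits into flags. */
static void get_alloc(struct map_result *m, uint64_t pos, int newly_allocated)
{
	m->addr = pos;
	m->length = 4096;
	if (newly_allocated)
		m->flags |= MAP_F_NEW;
}

/* Struct reused across calls without re-initialization: flags leak. */
static unsigned int count_new_leaky(const int *alloc, unsigned int n)
{
	struct map_result m = { 0 };
	unsigned int count = 0;

	for (unsigned int i = 0; i < n; i++) {
		get_alloc(&m, (uint64_t)i * 4096, alloc[i]);
		if (m.flags & MAP_F_NEW)
			count++;
	}
	return count;
}

/* Struct re-initialized before every call, as the commit describes. */
static unsigned int count_new(const int *alloc, unsigned int n)
{
	struct map_result m;
	unsigned int count = 0;

	for (unsigned int i = 0; i < n; i++) {
		memset(&m, 0, sizeof(m));
		get_alloc(&m, (uint64_t)i * 4096, alloc[i]);
		if (m.flags & MAP_F_NEW)
			count++;
	}
	return count;
}
```

      For alloc = { 1, 0, 0 }, count_new_leaky() reports 3 new chunks. count_new() reports the correct 1.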
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
    • GFS2: Fix recovery issues for spectators · 4a772772
      Bob Peterson authored
      
      
      This patch fixes a couple of problems dealing with spectators who
      remain with gfs2 mounts after the last non-spectator node fails.
      
      Before this patch, spectator mounts would try to acquire the dlm's
      mounted lock in EX mode as part of their normal recovery sequence.
      The mounted lock is only used to determine whether the node is
      the first mounter, the first node to mount the file system, for
      the purposes of file system recovery and journal replay.
      
      It's not necessary for spectators: they should never do journal
      recovery. If they acquire the lock it will prevent another "real"
      first-mounter from acquiring the lock in EX mode, which means it
      also cannot do journal recovery because it doesn't think it's the
      first node to mount the file system.
      
      This patch checks if the mounter is a spectator, and if so, avoids
      grabbing the mounted lock. This allows a secondary mounter that is
      really the first non-spectator mounter to do journal recovery:
      since the spectator doesn't acquire the lock, it can grab it in
      EX mode, and therefore consider itself to be the first mounter
      both as a "real" first mount, and as a first-real-after-spectator.
      
      Note that the control lock still needs to be taken in PR mode
      in order to fetch the lvb value so it has the current status of
      all journals' recovery. This is used as it is today by a first
      mounter to replay the journals. For spectators, it's merely
      used to fetch the status bits. All recovery is bypassed and the
      node waits until recovery is completed by a non-spectator node.
      
      I also improved the cryptic message given by control_mount when
      a spectator is waiting for a non-spectator to perform recovery.
      
      It also fixes a problem in gfs2_recover_set whereby spectators
      were never queueing recovery work for their own journal.
      They cannot do recovery themselves, but they still need to queue
      the work so they can check the recovery bits and clear the
      DFL_BLOCK_LOCKS bit once the recovery happens on another node.
      
      When the work queue runs on a spectator, it bypasses most of the
      work so it won't print a bunch of annoying messages. All it will
      print is a bunch of messages that look like this until recovery
      completes on the non-spectator node:
      
      GFS2: fsid=mycluster:scratch.s: recover generation 3 jid 0
      GFS2: fsid=mycluster:scratch.s: recover jid 0 result busy
      
      These continue every 1.5 seconds until the recovery is done by
      the non-spectator, at which time it says:
      
      GFS2: fsid=mycluster:scratch.s: recover generation 4 done
      
      Then it proceeds with its mount.
      
      If the file system is mounted in spectator mode and the last
      remaining non-spectator is fenced, any IO to the file system is
      blocked by dlm and the spectator waits until recovery is
      performed by a non-spectator.
      
      If a spectator tries to mount the file system before any
      non-spectators, it blocks and repeatedly gives this kernel
      message:
      
      GFS2: fsid=mycluster:scratch: Recovery is required. Waiting for a non-spectator to mount.
      GFS2: fsid=mycluster:scratch: Recovery is required. Waiting for a non-spectator to mount.
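      The following is a highly simplified model of the mount-time decision described above. Everything here is hypothetical. The struct cluster_model reduces the dlm mounted lock to a holder count. Recovery status read from the control lock's LVB is a single boolean. The model leaves out any later lock conversion, the 1.5-second retry, and any waiting done by non-first real mounters.

```c
#include <stdbool.h>

enum mount_action {
	MOUNT_RECOVER_ALL, /* first real mounter: replay all journals */
	MOUNT_PROCEED,     /* continue mounting as a later mounter */
	MOUNT_WAIT,        /* spectator: wait for a non-spectator to recover */
};

/* Hypothetical cluster-wide view of the dlm "mounted" lock. */
struct cluster_model {
	int mounted_holders; /* nodes currently holding the mounted lock */
};

/*
 * A spectator never takes the mounted lock, so it cannot stop the first
 * real mounter from getting it in EX mode. @recovery_needed models the
 * status bits a spectator reads via the control lock (PR mode).
 */
static enum mount_action try_mount(struct cluster_model *cl, bool spectator,
				   bool recovery_needed)
{
	bool first;

	if (spectator)
		return recovery_needed ? MOUNT_WAIT : MOUNT_PROCEED;

	/* EX on the mounted lock is only grantable when nobody holds it. */
	first = (cl->mounted_holders == 0);
	cl->mounted_holders++;
	return first ? MOUNT_RECOVER_ALL : MOUNT_PROCEED;
}
```

      A spectator that mounts first now leaves the holder count at zero. The next real mounter therefore still sees itself as the first mounter and performs recovery.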
      
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • Merge branch 'iomap-write' into linux-gfs2/for-next · a3479c7f
      Andreas Gruenbacher authored
      
      
      Pull in the gfs2 iomap-write changes: Tweak the existing code to
      properly support iomap write and eliminate an unnecessary special case
      in gfs2_block_map.  Implement iomap write support for buffered and
      direct I/O.  Simplify some of the existing code and eliminate code that
      is no longer used:
      
        gfs2: Remove gfs2_write_{begin,end}
        gfs2: iomap direct I/O support
        gfs2: gfs2_extent_length cleanup
        gfs2: iomap buffered write support
        gfs2: Further iomap cleanups
      
      This is based on the following changes on the xfs 'iomap-4.19-merge'
      branch:
      
        iomap: add private pointer to struct iomap
        iomap: add a page_done callback
        iomap: generic inline data handling
        iomap: complete partial direct I/O writes synchronously
        iomap: mark newly allocated buffer heads as new
        fs: factor out a __generic_write_end helper
      
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • fs: gfs2: Adding new return type vm_fault_t · 109dbb1e
      Souptick Joarder authored
      Use new return type vm_fault_t for gfs2_page_mkwrite handler.

      See commit 1c8f4220 ("mm: change return type to vm_fault_t") for
      reference.
      
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: using posix_acl_xattr_size instead of posix_acl_to_xattr · 910f3d58
      Chengguang Xu authored
      
      
      It seems better to get the size by calling posix_acl_xattr_size()
      instead of calling posix_acl_to_xattr() with a NULL buffer argument.
      
      posix_acl_xattr_size() never returns 0, so remove the unnecessary check.
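      The size computation is modeled on the kernel helper's formula: a fixed header plus one fixed-size entry per ACL entry. Even an empty ACL therefore yields a nonzero size. The structs below are userspace mirrors of posix_acl_xattr_header and posix_acl_xattr_entry, written for illustration. The name acl_xattr_size() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirrors of the POSIX ACL xattr layout (illustrative copies). */
struct acl_xattr_header {
	uint32_t a_version;
};

struct acl_xattr_entry {
	uint16_t e_tag;
	uint16_t e_perm;
	uint32_t e_id;
};

/* Header plus one fixed-size entry per ACL entry; never 0. */
static size_t acl_xattr_size(size_t count)
{
	return sizeof(struct acl_xattr_header) +
	       count * sizeof(struct acl_xattr_entry);
}
```

      On common ABIs the header is 4 bytes and each entry is 8 bytes. So acl_xattr_size(0) is 4 and acl_xattr_size(3) is 28.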
      
      Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Don't reject a supposedly full bitmap if we have blocks reserved · e79e0e14
      Bob Peterson authored
      
      
      Before this patch, you could get into situations like this:
      
      1. Process 1 searches for X free blocks, finds them, makes a reservation
      2. Process 2 searches for free blocks in the same rgrp, but now the
         bitmap is full because process 1's reservation is skipped over.
         So it marks the bitmap as GBF_FULL.
      3. Process 1 tries to allocate blocks from its own reservation, but
         since the GBF_FULL bit is set, it skips over the rgrp and searches
         elsewhere, thus not using its own reservation.
      
      This patch adds an additional check to allow processes to use their
      own reservations.
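      The extra check might be sketched like this. BF_FULL, struct bitmap_model, struct searcher_model and skip_full_bitmap() are hypothetical stand-ins for GBF_FULL and the kernel's bitmap and reservation state. The kernel's exact condition may differ.

```c
#include <stdbool.h>
#include <stddef.h>

#define BF_FULL 0x1u /* hypothetical stand-in for GBF_FULL */

struct bitmap_model {
	unsigned int flags;
};

struct searcher_model {
	bool has_active_reservation; /* caller holds an active reservation */
};

/*
 * Decide whether a free-block search may skip this bitmap.
 * Before the fix, any bitmap marked full was skipped. After it, a caller
 * with its own reservation still looks inside, because its reserved
 * blocks are exactly what made the bitmap look full to other searchers.
 */
static bool skip_full_bitmap(const struct bitmap_model *bi,
			     const struct searcher_model *who)
{
	if (!(bi->flags & BF_FULL))
		return false;
	return !(who && who->has_active_reservation);
}
```

      In the scenario above, process 2 still skips the full bitmap. Process 1 no longer skips it and can allocate from its own reservation.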
      
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
  7. Jul 05, 2018
  8. Jul 04, 2018
  9. Jul 02, 2018
  10. Jun 21, 2018
  11. Jun 20, 2018
  12. Jun 17, 2018
    • Linux 4.18-rc1 · ce397d21
      Linus Torvalds authored
      v4.18-rc1
    • Merge tag 'for-linus-20180616' of git://git.kernel.dk/linux-block · 265c5596
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A collection of fixes that should go into -rc1. This contains:
      
         - bsg_open vs bsg_unregister race fix (Anatoliy)
      
         - NVMe pull request from Christoph, with fixes for regressions in
           this window, FC connect/reconnect path code unification, and a
           trace point addition.
      
         - timeout fix (Christoph)
      
         - remove a few unused functions (Christoph)
      
         - blk-mq tag_set reinit fix (Roman)"
      
      * tag 'for-linus-20180616' of git://git.kernel.dk/linux-block:
        bsg: fix race of bsg_open and bsg_unregister
        block: remove blk_queue_invalidate_tags
        nvme-fabrics: fix and refine state checks in __nvmf_check_ready
        nvme-fabrics: handle the admin-only case properly in nvmf_check_ready
        nvme-fabrics: refactor queue ready check
        blk-mq: remove blk_mq_tagset_iter
        nvme: remove nvme_reinit_tagset
        nvme-fc: fix nulling of queue data on reconnect
        nvme-fc: remove reinit_request routine
        blk-mq: don't time out requests again that are in the timeout handler
        nvme-fc: change controllers first connect to use reconnect path
        nvme: don't rely on the changed namespace list log
        nvmet: free smart-log buffer after use
        nvme-rdma: fix error flow during mapping request data
        nvme: add bio remapping tracepoint
        nvme: fix NULL pointer dereference in nvme_init_subsystem
        blk-mq: reinit q->tag_set_list entry only after grace period