  1. Jun 25, 2018
    • xfs: fix fdblocks accounting w/ RMAPBT per-AG reservation · d8cb5e42
      Darrick J. Wong authored
      
      
      In __xfs_ag_resv_init we incorrectly calculate the amount by which to
      decrease fdblocks when reserving blocks for the rmapbt.  Because rmapbt
      allocations do not decrease fdblocks, we must decrease fdblocks by the
      entire size of the requested reservation in order to achieve our goal of
      always having enough free blocks to satisfy an rmapbt expansion.
      
      This is in contrast to the refcountbt/finobt, which /do/ subtract from
      fdblocks whenever they allocate a block.  For this allocation type we
      preserve the existing behavior where we decrease fdblocks only by the
      requested reservation minus the size of the existing tree.
      
      This fixes the problem where the available block counts reported by
      statfs change across a remount if there had been an rmapbt size change
      since mount time.
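
      A simplified model of the accounting rule described above, written as a
      sketch only: the names sketch_resv_type, SKETCH_RESV_* and
      sketch_resv_fdblocks_delta are hypothetical and do not reproduce
      __xfs_ag_resv_init. It computes how much to hide from fdblocks for a
      reservation of "ask" blocks whose tree already uses "used" blocks.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two kinds of per-AG reservation. */
enum sketch_resv_type {
	SKETCH_RESV_METADATA,	/* refcountbt/finobt: new tree blocks DO come out of fdblocks */
	SKETCH_RESV_RMAPBT,	/* rmapbt: new tree blocks do NOT come out of fdblocks */
};

/*
 * How many blocks to remove from fdblocks when reserving @ask blocks for a
 * btree that already occupies @used blocks.
 */
static uint64_t sketch_resv_fdblocks_delta(enum sketch_resv_type type,
					   uint64_t ask, uint64_t used)
{
	if (type == SKETCH_RESV_RMAPBT)
		return ask;	/* rmapbt growth never hits fdblocks: hide it all */
	if (used >= ask)
		return 0;	/* assumption: clamp so the unsigned math can't wrap */
	return ask - used;	/* hide only the not-yet-used part */
}
```

      With the old rule the rmapbt case also returned ask - used, so the hidden
      amount depended on how big the rmapbt happened to be at mount time;
      charging the full ask makes it independent of the tree's current size.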
      
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    • xfs: ensure post-EOF zeroing happens after zeroing part of a file · e53c4b59
      Darrick J. Wong authored
      
      
      If a user asks us to zero_range part of a file, the end of the range is
      EOF, and not aligned to a page boundary, invoke writeback of the EOF
      page to ensure that the post-EOF part of the page is zeroed.  This
      ensures that we don't expose stale memory contents via mmap, if in a
      clumsy manner.
      
      Found by running generic/127 when it runs zero_range and mapread at EOF
      one after the other.
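
      The condition that triggers the extra writeback can be sketched as a
      small predicate. This is a hypothetical helper, not the XFS code, and it
      assumes 4 KiB pages (the kernel uses PAGE_SIZE).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE UINT64_C(4096)

/*
 * Hypothetical helper: after zeroing the byte range [offset, offset + len)
 * of a file whose size is @isize, report whether the page containing EOF
 * must be written back so that the part of that page beyond EOF is zeroed
 * before mmap can expose it.
 */
static bool sketch_zero_range_needs_eof_writeback(uint64_t offset,
						  uint64_t len,
						  uint64_t isize)
{
	uint64_t end = offset + len;

	/* The range ends exactly at EOF, and EOF is not page aligned. */
	return end == isize && (end % SKETCH_PAGE_SIZE) != 0;
}
```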
      
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
    • xfs: fix off-by-one error in xfs_rtalloc_query_range · a3a374bf
      Darrick J. Wong authored
      In commit 8ad560d2 ("xfs: strengthen rtalloc query range checks")
      we strengthened the input parameter checks in the rtbitmap range query
      function, but introduced an off-by-one error in the process.  The call
      to xfs_rtfind_forw deals with the high key being rextents, but we clamp
      the high key to rextents - 1.  This causes the returned results to stop
      one block short of the end of the rtdev, which is incorrect.
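
      The shape of this off-by-one can be shown with a generic illustration
      that is not the XFS code: a scan that treats its high bound as
      exclusive must be clamped to the unit count, not the unit count minus
      one. The bitmap model and all names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Generic illustration: count set entries ("free units") in the half-open
 * range [low, high) of @bitmap, which has @nunits entries. Because the scan
 * treats @high as exclusive, the correct clamp is @nunits; @buggy_clamp
 * reproduces the "nunits - 1" mistake that drops the last unit.
 */
static size_t sketch_count_free(const uint8_t *bitmap, size_t nunits,
				size_t low, size_t high, int buggy_clamp)
{
	size_t limit, count = 0;

	if (nunits == 0)
		return 0;
	limit = buggy_clamp ? nunits - 1 : nunits;
	if (high > limit)
		high = limit;
	for (size_t i = low; i < high; i++)
		if (bitmap[i])
			count++;
	return count;
}
```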
      
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: fix uninitialized field in rtbitmap fsmap backend · 232d0a24
      Darrick J. Wong authored
      
      
      Initialize the extent count field of the high key so that when we use
      the high key to synthesize an 'unknown owner' record (i.e. used space
      record) at the end of the queried range we have a field with which to
      compute rm_blockcount.  This is not strictly necessary because the
      synthesizer never uses the rm_blockcount field, but we can shut up the
      static code analysis anyway.
      
      Coverity-id: 1437358
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: recheck reflink state after grabbing ILOCK_SHARED for a write · 5bd88d15
      Darrick J. Wong authored
      
      
      The reflink iflag could have changed since the earlier unlocked check,
      so if we got ILOCK_SHARED for a write but we're now a reflink inode,
      we have to switch to ILOCK_EXCL and relock.
      
      This helps us avoid blowing lock assertions in things like generic/166:
      
      XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/xfs_reflink.c, line: 383
      WARNING: CPU: 1 PID: 24707 at fs/xfs/xfs_message.c:104 assfail+0x25/0x30 [xfs]
      Modules linked in: deadline_iosched dm_snapshot dm_bufio ext4 mbcache jbd2 dm_flakey xfs libcrc32c dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: scsi_debug]
      CPU: 1 PID: 24707 Comm: xfs_io Not tainted 4.18.0-rc1-djw #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
      RIP: 0010:assfail+0x25/0x30 [xfs]
      Code: ff 0f 0b c3 90 66 66 66 66 90 48 89 f1 41 89 d0 48 c7 c6 e8 ef 1b a0 48 89 fa 31 ff e8 54 f9 ff ff 80 3d fd ba 0f 00 00 75 03 <0f> 0b c3 0f 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 63 f6 49 89 f9
      RSP: 0018:ffffc90006423ad8 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff880030b65e80 RCX: 0000000000000000
      RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa01b0447
      RBP: ffffc90006423c10 R08: 0000000000000000 R09: 0000000000000000
      R10: ffff88003d43fc30 R11: f000000000000000 R12: ffff880077cda000
      R13: 0000000000000000 R14: ffffc90006423c30 R15: ffffc90006423bf9
      FS:  00007feba8986800(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000138ab58 CR3: 000000003d40a000 CR4: 00000000000006a0
      Call Trace:
       xfs_reflink_allocate_cow+0x24c/0x3d0 [xfs]
       xfs_file_iomap_begin+0x6d2/0xeb0 [xfs]
       ? iomap_to_fiemap+0x80/0x80
       iomap_apply+0x5e/0x130
       iomap_dio_rw+0x2e0/0x400
       ? iomap_to_fiemap+0x80/0x80
       ? xfs_file_dio_aio_write+0x133/0x4a0 [xfs]
       xfs_file_dio_aio_write+0x133/0x4a0 [xfs]
       xfs_file_write_iter+0x7b/0xb0 [xfs]
       __vfs_write+0x16f/0x1f0
       vfs_write+0xc8/0x1c0
       ksys_pwrite64+0x74/0x90
       do_syscall_64+0x56/0x180
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: don't allow insert-range to shift extents past the maximum offset · f62cb48e
      Darrick J. Wong authored
      
      
      Zorro Lang reports that generic/485 blows an assert on a filesystem with
      512 byte blocks.  The test tries to fallocate a post-eof extent at the
      maximum file size and calls insert range to shift the extents right by
      two blocks.  On a 512b block filesystem this causes startoff to overflow
      the 54-bit startoff field, leading to the assert.
      
      Therefore, always check the rightmost extent to see if it would overflow
      prior to invoking the insert range machinery.
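
      The overflow check on the rightmost extent can be sketched as follows,
      taking the 54-bit startoff width from the text above. The macro and
      function names are hypothetical, not XFS identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Width of the startoff field, per the description above. */
#define SKETCH_STARTOFF_BITS 54
#define SKETCH_STARTOFF_MAX ((UINT64_C(1) << SKETCH_STARTOFF_BITS) - 1)

/*
 * Would shifting the rightmost extent (starting at file block @startoff,
 * assumed <= SKETCH_STARTOFF_MAX) right by @shift blocks overflow the
 * startoff field?
 */
static bool sketch_insert_range_would_overflow(uint64_t startoff,
					       uint64_t shift)
{
	/* Compare via subtraction so the check itself cannot wrap. */
	return shift > SKETCH_STARTOFF_MAX - startoff;
}
```

      Roughly why 512-byte blocks are the problem case: a file offset near
      2^63 bytes is block number 2^63 / 2^9 = 2^54, right at the limit of a
      54-bit field, while with 4 KiB blocks the same offset is only about 2^51.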
      
      Reported-by: Zorro Lang <zlang@redhat.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200137
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: don't trip over negative free space in xfs_reserve_blocks · aafe12ce
      Darrick J. Wong authored
      
      
      If we somehow end up with a filesystem that has fewer free blocks than
      the blocks set aside to avoid ENOSPC deadlocks, it's possible that the
      free space calculation in xfs_reserve_blocks will spit out a negative
      number (because percpu_counter_sum returns s64).  We fail to notice
      this negative number and set fdblks_delta to it.  Now we increment
      fdblocks(!) and the unsigned type of m_resblks means that we end up
      setting a ridiculously huge m_resblks reservation.
      
      Avoid this comedy of errors by detecting the negative free space and
      returning -ENOSPC.
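
      A minimal model of the guard, assuming a signed free-block sum (as
      percpu_counter_sum returns s64) and a count of blocks set aside. The
      function name is hypothetical; this is not the xfs_reserve_blocks code.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Compute usable free space from the signed free-block sum @fdblocks and
 * the @set_aside blocks held back to avoid ENOSPC deadlocks. Returns 0 and
 * stores the result in @avail_out, or -ENOSPC if it would be negative.
 */
static int sketch_free_space(int64_t fdblocks, int64_t set_aside,
			     uint64_t *avail_out)
{
	int64_t avail = fdblocks - set_aside;

	if (avail < 0)
		return -ENOSPC;	/* never let a negative value reach unsigned math */
	*avail_out = (uint64_t)avail;
	return 0;
}
```

      Without the check, converting a result of -5 to an unsigned 64-bit
      value gives 18446744073709551611, which is how a "ridiculously huge"
      m_resblks reservation can arise.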
      
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: allow empty transactions while frozen · 10ee2526
      Darrick J. Wong authored
      In commit e89c0413 ("xfs: implement the GETFSMAP ioctl") we
      created the ability to obtain empty transactions.  These transactions
      have no log or block reservations and therefore can't modify anything.
      Since they're also NO_WRITECOUNT they can run while the fs is frozen,
      so we don't need to WARN_ON about that usage.
      
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  2. Jun 22, 2018
    • xfs: xfs_iflush_abort() can be called twice on cluster writeback failure · e53946db
      Dave Chinner authored
      
      
      When a corrupt inode is detected during xfs_iflush_cluster, we can
      get a shutdown ASSERT failure like this:
      
      XFS (pmem1): Metadata corruption detected at xfs_symlink_shortform_verify+0x5c/0xa0, inode 0x86627 data fork
      XFS (pmem1): Unmount and run xfs_repair
      XFS (pmem1): xfs_do_force_shutdown(0x8) called from line 3372 of file fs/xfs/xfs_inode.c.  Return address = ffffffff814f4116
      XFS (pmem1): Corruption of in-memory data detected.  Shutting down filesystem
      XFS (pmem1): xfs_do_force_shutdown(0x1) called from line 222 of file fs/xfs/libxfs/xfs_defer.c.  Return address = ffffffff814a8a88
      XFS (pmem1): xfs_do_force_shutdown(0x1) called from line 222 of file fs/xfs/libxfs/xfs_defer.c.  Return address = ffffffff814a8ef9
      XFS (pmem1): Please umount the filesystem and rectify the problem(s)
      XFS: Assertion failed: xfs_isiflocked(ip), file: fs/xfs/xfs_inode.h, line: 258
      .....
      Call Trace:
       xfs_iflush_abort+0x10a/0x110
       xfs_iflush+0xf3/0x390
       xfs_inode_item_push+0x126/0x1e0
       xfsaild+0x2c5/0x890
       kthread+0x11c/0x140
       ret_from_fork+0x24/0x30
      
      Essentially, xfs_iflush_abort() has been called twice on the
      original inode that was flushed. This happens because the
      inode has been flushed to the buffer successfully via
      xfs_iflush_int(), and so when another inode is detected as corrupt
      in xfs_iflush_cluster, the buffer is marked stale and EIO, and
      iodone callbacks are run on it.
      
      Running the iodone callbacks walks across the original inode and
      calls xfs_iflush_abort() on it. When xfs_iflush_cluster() returns
      to xfs_iflush(), it runs the error path for that function, and that
      calls xfs_iflush_abort() on the inode a second time, leading to the
      above assert failure as the inode is not flush locked anymore.
      
      This bug has been there a long time.
      
      The simple fix would be to just avoid calling xfs_iflush_abort() in
      xfs_iflush() if we've got a failure from xfs_iflush_cluster().
      However, xfs_iflush_cluster() has magic delwri buffer handling that
      means it may or may not have run IO completion on the buffer, and
      hence sometimes we have to call xfs_iflush_abort() from
      xfs_iflush(), and sometimes we shouldn't.
      
      After reading through all the error paths and the delwri buffer
      code, it's clear that the error handling in xfs_iflush_cluster() is
      unnecessary. If the buffer is delwri, it leaves it on the delwri
      list so that when the delwri list is submitted it sees a shutdown
      filesystem in xfs_buf_submit(), which marks the buffer stale, EIO
      and runs IO completion, i.e. exactly what xfs_iflush_cluster() does
      when it's not a delwri buffer. Further, marking a buffer stale
      clears the _XBF_DELWRI_Q flag on the buffer, which means when
      submission of the buffer occurs, it just skips over it and releases
      it.
      
      IOWs, the error handling in xfs_iflush_cluster doesn't need to care
      if the buffer is already on the delwri queue or not - it just
      needs to mark the buffer stale, EIO and run completions. That means
      we can just use the easy fix for xfs_iflush() to avoid the double
      abort.
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: More robust inode extent count validation · 23fcb334
      Dave Chinner authored
      
      
      When the inode is in extent format, it can't have more extents than
      fit in the inode fork. We don't currently check this, and so this
      corruption goes unnoticed by the inode verifiers. This can lead to
      crashes operating on invalid in-memory structures.
      
      Attempts to access such an inode will now error out in the verifier
      rather than allowing modification operations to proceed.
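
      The bound the verifier needs can be sketched as a one-line check. This
      assumes 16-byte extent records (two 64-bit words, as in the on-disk
      bmbt record); the names are hypothetical and the sketch ignores the
      other fork-format checks a real verifier performs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: each inline extent record is 16 bytes. */
#define SKETCH_EXTENT_REC_SIZE 16u

/*
 * An extent-format fork stores its records inline, so a fork of
 * @fork_bytes bytes cannot legitimately hold more than
 * @fork_bytes / SKETCH_EXTENT_REC_SIZE extents.
 */
static bool sketch_extent_count_ok(uint32_t nextents, uint32_t fork_bytes)
{
	/* Widen before multiplying so a corrupt huge count cannot wrap. */
	return (uint64_t)nextents * SKETCH_EXTENT_REC_SIZE <= fork_bytes;
}
```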
      
      Reported-by: Wen Xu <wen.xu@gatech.edu>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      [darrick: fix a typedef, add some braces and breaks to shut up compiler warnings]
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: simplify xfs_bmap_punch_delalloc_range · e2ac8363
      Christoph Hellwig authored
      
      
      Instead of using xfs_bmapi_read to find delalloc extents and then punch
      them out using xfs_bunmapi, opencode the loop to iterate over the extents
      and call xfs_bmap_del_extent_delay directly.  This both simplifies the
      code and reduces the number of extent tree lookups required.
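
      The single-pass walk can be modelled over a sorted in-memory extent
      array. This is a hypothetical model, not XFS structures: instead of
      calling xfs_bmap_del_extent_delay it only totals how many delalloc
      blocks a punch of [start, end) would release.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory extent record. */
struct sketch_extent {
	uint64_t startoff;	/* first file block */
	uint64_t blockcount;	/* length in blocks */
	bool delalloc;		/* reserved but not yet allocated on disk */
};

/* Walk extents sorted by startoff once, summing delalloc overlap with [start, end). */
static uint64_t sketch_delalloc_blocks_in_range(const struct sketch_extent *ext,
						size_t n, uint64_t start,
						uint64_t end)
{
	uint64_t released = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t ext_end = ext[i].startoff + ext[i].blockcount;
		uint64_t lo, hi;

		if (ext[i].startoff >= end)
			break;		/* sorted: nothing later can overlap */
		if (!ext[i].delalloc || ext_end <= start)
			continue;	/* real allocation, or entirely before range */
		lo = ext[i].startoff > start ? ext[i].startoff : start;
		hi = ext_end < end ? ext_end : end;
		released += hi - lo;
	}
	return released;
}
```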
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  3. Jun 17, 2018
    • Linux 4.18-rc1 · ce397d21
      Linus Torvalds authored
    • Merge tag 'for-linus-20180616' of git://git.kernel.dk/linux-block · 265c5596
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A collection of fixes that should go into -rc1. This contains:
      
         - bsg_open vs bsg_unregister race fix (Anatoliy)
      
         - NVMe pull request from Christoph, with fixes for regressions in
           this window, FC connect/reconnect path code unification, and a
           trace point addition.
      
         - timeout fix (Christoph)
      
         - remove a few unused functions (Christoph)
      
         - blk-mq tag_set reinit fix (Roman)"
      
      * tag 'for-linus-20180616' of git://git.kernel.dk/linux-block:
        bsg: fix race of bsg_open and bsg_unregister
        block: remov blk_queue_invalidate_tags
        nvme-fabrics: fix and refine state checks in __nvmf_check_ready
        nvme-fabrics: handle the admin-only case properly in nvmf_check_ready
        nvme-fabrics: refactor queue ready check
        blk-mq: remove blk_mq_tagset_iter
        nvme: remove nvme_reinit_tagset
        nvme-fc: fix nulling of queue data on reconnect
        nvme-fc: remove reinit_request routine
        blk-mq: don't time out requests again that are in the timeout handler
        nvme-fc: change controllers first connect to use reconnect path
        nvme: don't rely on the changed namespace list log
        nvmet: free smart-log buffer after use
        nvme-rdma: fix error flow during mapping request data
        nvme: add bio remapping tracepoint
        nvme: fix NULL pointer dereference in nvme_init_subsystem
        blk-mq: reinit q->tag_set_list entry only after grace period
    • Merge tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental · 5e7b9212
      Linus Torvalds authored
      Pull documentation fixes from Mauro Carvalho Chehab:
       "This solves a series of broken links for files under Documentation,
        and improves a script meant to detect such broken links (see
        scripts/documentation-file-ref-check).
      
        The changes in this series are:
      
         - can.rst: fix a footnote reference;
      
         - crypto_engine.rst: Fix two parsing warnings;
      
         - Fix a lot of broken references to Documentation/*;
      
         - improve the scripts/documentation-file-ref-check script, in order
           to help detect and fix broken references while avoiding
           false positives.
      
        After this patch series, only 33 broken references to doc files are
        detected by scripts/documentation-file-ref-check"
      
      * tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental: (26 commits)
        fix a series of Documentation/ broken file name references
        Documentation: rstFlatTable.py: fix a broken reference
        ABI: sysfs-devices-system-cpu: remove a broken reference
        devicetree: fix a series of wrong file references
        devicetree: fix name of pinctrl-bindings.txt
        devicetree: fix some bindings file names
        MAINTAINERS: fix location of DT npcm files
        MAINTAINERS: fix location of some display DT bindings
        kernel-parameters.txt: fix pointers to sound parameters
        bindings: nvmem/zii: Fix location of nvmem.txt
        docs: Fix more broken references
        scripts/documentation-file-ref-check: check tools/*/Documentation
        scripts/documentation-file-ref-check: get rid of false-positives
        scripts/documentation-file-ref-check: hint: dash or underline
        scripts/documentation-file-ref-check: add a fix logic for DT
        scripts/documentation-file-ref-check: accept more wildcards at filenames
        scripts/documentation-file-ref-check: fix help message
        media: max2175: fix location of driver's companion documentation
        media: v4l: fix broken video4linux docs locations
        media: dvb: point to the location of the old README.dvb-usb file
        ...
    • Merge tag 'fsnotify_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · dbb2816f
      Linus Torvalds authored
      Pull fsnotify updates from Jan Kara:
       "fsnotify cleanups unifying handling of different watch types.
      
        This is the shortened fsnotify series from Amir with the last five
        patches pulled out. Amir has modified those patches to not change
        struct inode but obviously it's too late for those to go into this
        merge window"
      
      * tag 'fsnotify_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        fsnotify: add fsnotify_add_inode_mark() wrappers
        fanotify: generalize fanotify_should_send_event()
        fsnotify: generalize send_to_group()
        fsnotify: generalize iteration of marks by object type
        fsnotify: introduce marks iteration helpers
        fsnotify: remove redundant arguments to handle_event()
        fsnotify: use type id to identify connector object type
    • Merge tag 'fbdev-v4.18' of git://github.com/bzolnier/linux · 644f2639
      Linus Torvalds authored
      Pull fbdev updates from Bartlomiej Zolnierkiewicz:
       "There is nothing really major here, just a few small fixes, some cleanups and
        dead drivers removal:
      
         - mark omapfb drivers as orphans in MAINTAINERS file (Tomi Valkeinen)
      
         - add missing module license tags to omap/omapfb driver (Arnd
           Bergmann)
      
         - add missing GPIOLIB dependency to omap2/omapfb driver (Arnd
           Bergmann)
      
         - convert savagefb, aty128fb & radeonfb drivers to use msleep & co.
           (Jia-Ju Bai)
      
         - allow COMPILE_TEST build for viafb driver (media part was reviewed
           by media subsystem Maintainer)
      
         - remove unused MERAM support from sh_mobile_lcdcfb and shmob-drm
           drivers (drm parts were acked by shmob-drm driver Maintainer)
      
         - remove unused auo_k190xfb drivers
      
         - misc cleanups (Souptick Joarder, Wolfram Sang, Markus Elfring, Andy
           Shevchenko, Colin Ian King)"
      
      * tag 'fbdev-v4.18' of git://github.com/bzolnier/linux: (26 commits)
        fb_omap2: add gpiolib dependency
        video/omap: add module license tags
        MAINTAINERS: make omapfb orphan
        video: fbdev: pxafb: match_string() conversion fixup
        video: fbdev: nvidia: fix spelling mistake: "scaleing" -> "scaling"
        video: fbdev: fix spelling mistake: "frambuffer" -> "framebuffer"
        video: fbdev: pxafb: Convert to use match_string() helper
        video: fbdev: via: allow COMPILE_TEST build
        video: fbdev: remove unused sh_mobile_meram driver
        drm: shmobile: remove unused MERAM support
        video: fbdev: sh_mobile_lcdcfb: remove unused MERAM support
        video: fbdev: remove unused auo_k190xfb drivers
        video: omap: Improve a size determination in omapfb_do_probe()
        video: sm501fb: Improve a size determination in sm501fb_probe()
        video: fbdev-MMP: Improve a size determination in path_init()
        video: fbdev-MMP: Delete an error message for a failed memory allocation in two functions
        video: auo_k190x: Delete an error message for a failed memory allocation in auok190x_common_probe()
        video: sh_mobile_lcdcfb: Delete an error message for a failed memory allocation in two functions
        video: sh_mobile_meram: Delete an error message for a failed memory allocation in sh_mobile_meram_probe()
        video: fbdev: sh_mobile_meram: Drop SUPERH platform dependency
        ...
  4. Jun 16, 2018