  1. Mar 15, 2017
  2. Mar 11, 2017
  3. Mar 10, 2017
    • md/raid1/10: fix potential deadlock · 61eb2b43
      Shaohua Li authored
      
      
      Neil Brown pointed out a potential deadlock in raid 10 code with
      bio_split/chain. The raid1 code could have the same issue, but recent
      barrier rework makes it less likely to happen. The deadlock happens in
      the following sequence:
      
      1. generic_make_request(bio), this will set current->bio_list
      2. raid10_make_request will split bio to bio1 and bio2
      3. __make_request(bio1), wait_barrier, add underlying disk bio to
      current->bio_list
      4. __make_request(bio2), wait_barrier
      
      If raise_barrier happens between 3 & 4, since wait_barrier runs at 3,
      raise_barrier waits for IO completion from 3. And since raise_barrier
      sets barrier, 4 waits for raise_barrier. But IO from 3 can't be
      dispatched because raid10_make_request() hasn't finished yet.
      
      The solution is to adjust the IO ordering. Quotes from Neil:
      "
      It is much safer to:
      
          if (need to split) {
              split = bio_split(bio, ...)
              bio_chain(...)
              make_request_fn(split);
              generic_make_request(bio);
          } else
              make_request_fn(mddev, bio);
      
      This way we first process the initial section of the bio (in 'split')
      which will queue some requests to the underlying devices.  These
      requests will be queued in generic_make_request.
      Then we queue the remainder of the bio, which will be added to the end
      of the generic_make_request queue.
      Then we return.
      generic_make_request() will pop the lower-level device requests off the
      queue and handle them first.  Then it will process the remainder
      of the original bio once the first section has been fully processed.
      "
      
      Note, this only happens in the read path. In the write path, the bio is
      flushed to the underlying disks either by blk flush (from schedule) or
      offloaded to raid1/10d. It's queued in current->bio_list.
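
The split-first ordering quoted above can be illustrated with a toy model. This is only a sketch under stated assumptions: `toy_list`, `toy_queue`, `toy_make_request` and `toy_drain` are hypothetical names that stand in for current->bio_list and the request functions; none of this is kernel code, and the list is modelled as a plain FIFO.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 8

/* Hypothetical stand-in for current->bio_list: a FIFO of request names. */
struct toy_list {
    const char *item[TOY_MAX];
    size_t head, tail;
};

/* Append one request to the tail of the list. */
static void toy_queue(struct toy_list *l, const char *name)
{
    assert(l->tail < TOY_MAX);
    l->item[l->tail++] = name;
}

/*
 * Models the split-first ordering: handling 'split' queues its lower-level
 * request first, then the remainder is queued behind it and we return.
 */
static void toy_make_request(struct toy_list *l)
{
    toy_queue(l, "disk-io-for-split"); /* stands in for make_request_fn(split) */
    toy_queue(l, "remainder");         /* stands in for generic_make_request(bio) */
}

/* Drain the list in FIFO order, recording the order in 'out'; returns count. */
static size_t toy_drain(struct toy_list *l, const char *out[TOY_MAX])
{
    size_t n = 0;

    while (l->head < l->tail)
        out[n++] = l->item[l->head++];
    return n;
}
```

In this model the lower-level request produced by the first section is always drained before the remainder is looked at, so the remainder never waits behind a barrier while earlier IO is still stuck on the list.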
      
      Cc: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org (v3.14+, only the raid10 part)
      Suggested-by: NeilBrown <neilb@suse.com>
      Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md: don't impose the MD_SB_DISKS limit on arrays without metadata. · 1b3bae49
      NeilBrown authored
      
      
      These arrays, created with "mdadm --build" don't benefit from a limit.
      The default will be used, which is '0' and is interpreted as "don't
      impose a limit".
      
      Reported-by: <ian_bruce@mail.ru>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md: move funcs from pers->resize to update_size · c9483634
      Guoqing Jiang authored
      
      
      raid1_resize and raid5_resize should also check the
      mddev->queue if run underneath dm-raid.
      
      And both set_capacity and revalidate_disk are used in
      pers->resize such as raid1, raid10 and raid5. So
      move them from personality file to common code.
      
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md-cluster: remove useless memset from gather_all_resync_info · 75df023f
      Guoqing Jiang authored
      
      
      This memset is not needed.  The lvb is already zeroed because
      it was recently allocated by lockres_init, which uses kzalloc(),
      and read_resync_info() doesn't need it to be zero anyway.
      
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md-cluster: free md_cluster_info if node leave cluster · 9c8043f3
      Guoqing Jiang authored
      
      
      To avoid a memory leak, we need to free the cinfo that
      is allocated when a node joins the cluster.
      
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md: delete dead code · 99b3d74e
      Shaohua Li authored
      
      
      Nobody is using mddev_check_plugged(), so delete the dead code
      
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md/raid10: submit bio directly to replacement disk · 6d399783
      Shaohua Li authored
      Commit 57c67df4 (md/raid10: submit IO from originating thread instead of
      md thread) submits bio directly for normal disks but not for replacement
      disks. There is no reason not to do this for replacement disks as well.
      
      Cc: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  4. Mar 09, 2017
    • Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ea6200e8
      Linus Torvalds authored
      Pull sched.h split-up fixes for MIPS from Ingo Molnar:
       "These are the fixes for MIPS build failures due to the sched.h
        split-up, from Arnd Bergmann"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        MIPS: Add missing include files
    • mm, page_alloc: Add missing check for memory holes · b4fb8f66
      Tony Luck authored
      Commit 13ad59df ("mm, page_alloc: avoid page_to_pfn() when merging
      buddies") moved the check for memory holes out of page_is_buddy() and
      had the callers do the check.
      
      But this wasn't done correctly in one place which caused ia64 to crash
      very early in boot.
      
      Update to fix that and make ia64 boot again.
      
      [ v2: Vlastimil pointed out we don't need to call page_to_pfn()
            since we already have the result of that in "buddy_pfn" ]
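
A minimal sketch of the kind of check described above, assuming a toy memory map with a single hole. The names `toy_find_buddy_pfn`, `toy_pfn_valid` and `toy_buddy_mergeable`, and the hole range itself, are hypothetical and not taken from the patch; the XOR only mirrors the usual way a buddy's page frame number is derived from a pfn and an order.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical hole in the toy memory map: pfns 16..31 have no backing page. */
#define TOY_HOLE_START 16UL
#define TOY_HOLE_END   32UL

/* The buddy of a block at 'pfn' with the given order differs in one bit. */
static unsigned long toy_find_buddy_pfn(unsigned long pfn, unsigned int order)
{
    assert(order < sizeof(unsigned long) * CHAR_BIT);
    return pfn ^ (1UL << order);
}

/* Stand-in for a pfn validity check: false inside the hole. */
static bool toy_pfn_valid(unsigned long pfn)
{
    return pfn < TOY_HOLE_START || pfn >= TOY_HOLE_END;
}

/*
 * The caller checks for a memory hole using the buddy pfn it already
 * computed, before looking at the buddy page at all.
 */
static bool toy_buddy_mergeable(unsigned long pfn, unsigned int order)
{
    unsigned long buddy_pfn = toy_find_buddy_pfn(pfn, order);

    if (!toy_pfn_valid(buddy_pfn))
        return false; /* buddy lies in a hole: do not touch it */
    return true;      /* a real allocator would now test the buddy page */
}
```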
      
      Fixes: 13ad59df ("avoid page_to_pfn() when merging buddies")
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'ktest-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest · 8557b8e4
      Linus Torvalds authored
      Pull ktest fixes from Steven Rostedt:
       "Greg Kroah-Hartman reported to me that the ktest of v4.11-rc1 locked
        up in an infinite loop while doing the make mrproper.
      
        Looking into the cause I noticed that a recent update to the function
        run_command (used for running all shell commands, including "make
        mrproper") changed the internal loop to use the function
        wait_for_input.
      
        The wait_for_input function uses select to look at two file
        descriptors. One is the file descriptor of the command it is running,
        the other is STDIN. The STDIN check was not checking the return status
        of the sysread call, and was also just writing a lot of data into
        syswrite without regard to the size of the data read.
      
        Changing the code to check the return status of sysread, and also to
        still process the passed in descriptor data without looping back to
        the select fixed Greg's problem.
      
        While looking at this code I also realized that the loop did not honor
        the timeout if STDIN always had input (or for some reason returned an
        error). This could prevent wait_for_input from timing out on the file
        descriptor it is supposed to be waiting for. That is fixed too"
      
      * tag 'ktest-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
        ktest: Make sure wait_for_input does honor the timeout
        ktest: Fix while loop in wait_for_input
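
ktest itself is a Perl script, so the following is not its code. It is a rough C analogue of the loop described in the pull message, assuming a hypothetical function `wait_for_input_sketch` with made-up parameters: it checks the return status of the read on the input descriptor, still processes the command descriptor in the same pass, and recomputes the remaining time on every iteration so the timeout is honored.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/*
 * Wait up to timeout_sec for data on cmd_fd, forwarding anything that shows
 * up on in_fd to out_fd in the meantime (pass in_fd = -1 to watch nothing).
 * Returns bytes read from cmd_fd, 0 on timeout or EOF, -1 on error.
 */
static ssize_t wait_for_input_sketch(int cmd_fd, int in_fd, int out_fd,
                                     char *buf, size_t len, int timeout_sec)
{
    struct timespec now, end;

    assert(buf != NULL && len > 0);
    clock_gettime(CLOCK_MONOTONIC, &end);
    end.tv_sec += timeout_sec;

    for (;;) {
        /* Recompute the remaining time so busy input cannot extend the wait. */
        clock_gettime(CLOCK_MONOTONIC, &now);
        long remain_ms = (long)(end.tv_sec - now.tv_sec) * 1000L +
                         (end.tv_nsec - now.tv_nsec) / 1000000L;
        if (remain_ms <= 0)
            return 0;

        struct timeval tv = { .tv_sec = remain_ms / 1000,
                              .tv_usec = (remain_ms % 1000) * 1000 };
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(cmd_fd, &rfds);
        if (in_fd >= 0)
            FD_SET(in_fd, &rfds);

        int maxfd = cmd_fd > in_fd ? cmd_fd : in_fd;
        int nr = select(maxfd + 1, &rfds, NULL, NULL, &tv);
        if (nr < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (nr == 0)
            return 0;

        if (in_fd >= 0 && FD_ISSET(in_fd, &rfds)) {
            char tmp[256];
            ssize_t n = read(in_fd, tmp, sizeof(tmp));

            if (n > 0) {
                /* Forward only the bytes that were actually read. */
                if (write(out_fd, tmp, (size_t)n) < 0)
                    return -1;
            } else {
                in_fd = -1; /* EOF or error: stop watching this descriptor */
            }
        }

        /* Handle the command descriptor without looping back to select. */
        if (FD_ISSET(cmd_fd, &rfds))
            return read(cmd_fd, buf, len);
    }
}
```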
    • overlayfs: remove now unnecessary header file include · 04bb94b1
      Linus Torvalds authored
      This removes the extra include header file that was added in commit
      e58bc927 "Pull overlayfs updates from Miklos Szeredi" now that it
      is no longer needed.
      
      There are probably other such includes that got added during the
      scheduler header splitup series, but this is the one that annoyed me
      personally and I know about.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sched/headers: fix up header file dependency on <linux/sched/signal.h> · bd0f9b35
      Linus Torvalds authored
      The scheduler header file split and cleanups ended up exposing a few
      nasty header file dependencies, and in particular it showed how we in
      <linux/wait.h> ended up depending on "signal_pending()", which now comes
      from <linux/sched/signal.h>.
      
      That's a very subtle and annoying dependency, which already caused a
      semantic merge conflict (see commit e58bc927 "Pull overlayfs updates
      from Miklos Szeredi", which added that fixup in the merge commit).
      
      It turns out that we can avoid this dependency _and_ improve code
      generation by moving the guts of the fairly nasty helper #define
      __wait_event_interruptible_locked() to out-of-line code.  The code that
      includes the signal_pending() check is all in the slow-path where we
      actually go to sleep waiting for the event anyway, so using a helper
      function is the right thing to do.
      
      Using a helper function is also what we already did for the non-locked
      versions, see the "__wait_event*()" macros and the "prepare_to_wait*()"
      set of helper functions.
      
      We might want to try to unify all these macro games, we have a _lot_ of
      subtly different wait-event loops.  But this is the minimal patch to fix
      the annoying header dependency.
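
The general shape of this change can be sketched in user space. Everything below is a hypothetical toy, not the kernel's wait.h: `toy_waiter`, `toy_do_wait_intr` and `toy_wait_event_interruptible` are invented names, -1 stands in for the kernel's error return, and the macro uses a GNU statement expression as kernel macros commonly do. The point is only that the signal check lives in an out-of-line helper, so the macro itself needs nothing from a signal header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-waiter state for the toy model. */
struct toy_waiter {
    bool signal_pending; /* stand-in for what signal_pending() would report */
    int sleeps;          /* counts how often the slow path "slept" */
};

/*
 * Out-of-line slow path: the only place that needs to know about signals.
 * Returns -1 if interrupted, 0 after one simulated sleep.
 */
int toy_do_wait_intr(struct toy_waiter *w)
{
    assert(w != NULL);
    if (w->signal_pending)
        return -1;
    w->sleeps++; /* stand-in for unlocking, sleeping and relocking */
    return 0;
}

/* The macro keeps only the condition loop and calls the helper. */
#define toy_wait_event_interruptible(w, condition)  \
({                                                  \
    int ret_ = 0;                                   \
    while (!(condition)) {                          \
        ret_ = toy_do_wait_intr(w);                 \
        if (ret_)                                   \
            break;                                  \
    }                                               \
    ret_;                                           \
})
```

Because the condition is only re-evaluated around the helper call, the slow path is off the common path where the condition is already true, which is the case where an extra function call costs nothing.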
      
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. Mar 08, 2017