Skip to content
  1. Feb 17, 2010
    • Kiyoshi Ueda's avatar
      dm mpath: fix stall when requeueing io · 9eef87da
      Kiyoshi Ueda authored
      
      
      This patch fixes the problem that system may stall if target's ->map_rq
      returns DM_MAPIO_REQUEUE in map_request().
      E.g. stall happens on 1 CPU box when a dm-mpath device with queue_if_no_path
           bounces between all-paths-down and paths-up on I/O load.
      
      When target's ->map_rq returns DM_MAPIO_REQUEUE, map_request() requeues
      the request and returns to dm_request_fn().  Then, dm_request_fn()
      doesn't exit the I/O dispatching loop and continues processing
      the requeued request again.
      This map and requeue loop can be done with interrupt disabled,
      so 1 CPU system can be stalled if this situation happens.
      
      For example, commands below can stall my 1 CPU box within 1 minute or so:
        # dmsetup table mp
        mp: 0 2097152 multipath 1 queue_if_no_path 0 1 1 service-time 0 1 2 8:144 1 1
        # while true; do dd if=/dev/mapper/mp of=/dev/null bs=1M count=100; done &
        # while true; do \
        > dmsetup message mp 0 "fail_path 8:144" \
        > dmsetup suspend --noflush mp \
        > dmsetup resume mp \
        > dmsetup message mp 0 "reinstate_path 8:144" \
        > done
      
      To fix the problem above, this patch changes dm_request_fn() to exit
      the I/O dispatching loop once if a request is requeued in map_request().
      
      Signed-off-by: default avatarKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: default avatarJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      9eef87da
    • Takahiro Yasui's avatar
      dm raid1: fix null pointer dereference in suspend · 558569aa
      Takahiro Yasui authored
      When suspending a failed mirror, bios are completed by mirror_end_io() and
      __rh_lookup() in dm_rh_dec() returns NULL where a non-NULL return value is
      required by design.  Fix this by not changing the state of the recovery failed
      region from DM_RH_RECOVERING to DM_RH_NOSYNC in dm_rh_recovery_end().
      
      Issue
      
      On 2.6.33-rc1 kernel, I hit the bug when I suspended the failed
      mirror by dmsetup command.
      
      BUG: unable to handle kernel NULL pointer dereference at 00000020
      IP: [<f94f38e2>] dm_rh_dec+0x35/0xa1 [dm_region_hash]
      ...
      EIP: 0060:[<f94f38e2>] EFLAGS: 00010046 CPU: 0
      EIP is at dm_rh_dec+0x35/0xa1 [dm_region_hash]
      EAX: 00000286 EBX: 00000000 ECX: 00000286 EDX: 00000000
      ESI: eff79eac EDI: eff79e80 EBP: f6915cd4 ESP: f6915cc4
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      Process dmsetup (pid: 2849, ti=f6914000 task=eff03e80 task.ti=f6914000)
       ...
      Call Trace:
       [<f9530af6>] ? mirror_end_io+0x53/0x1b1 [dm_mirror]
       [<f9413104>] ? clone_endio+0x4d/0xa2 [dm_mod]
       [<f9530aa3>] ? mirror_end_io+0x0/0x1b1 [dm_mirror]
       [<f94130b7>] ? clone_endio+0x0/0xa2 [dm_mod]
       [<c02d6bcb>] ? bio_endio+0x28/0x2b
       [<f952f303>] ? hold_bio+0x2d/0x62 [dm_mirror]
       [<f952f942>] ? mirror_presuspend+0xeb/0xf7 [dm_mirror]
       [<c02aa3e2>] ? vmap_page_range+0xb/0xd
       [<f9414c8d>] ? suspend_targets+0x2d/0x3b [dm_mod]
       [<f9414ca9>] ? dm_table_presuspend_targets+0xe/0x10 [dm_mod]
       [<f941456f>] ? dm_suspend+0x4d/0x150 [dm_mod]
       [<f941767d>] ? dev_suspend+0x55/0x18a [dm_mod]
       [<c0343762>] ? _copy_from_user+0x42/0x56
       [<f9417fb0>] ? dm_ctl_ioctl+0x22c/0x281 [dm_mod]
       [<f9417628>] ? dev_suspend+0x0/0x18a [dm_mod]
       [<f9417d84>] ? dm_ctl_ioctl+0x0/0x281 [dm_mod]
       [<c02c3c4b>] ? vfs_ioctl+0x22/0x85
       [<c02c422c>] ? do_vfs_ioctl+0x4cb/0x516
       [<c02c42b7>] ? sys_ioctl+0x40/0x5a
       [<c0202858>] ? sysenter_do_call+0x12/0x28
      
      Analysis
      
      When recovery process of a region failed, dm_rh_recovery_end() function
      changes the state of the region from RM_RH_RECOVERING to DM_RH_NOSYNC.
      When recovery_complete() is executed between dm_rh_update_states() and
      dm_writes() in do_mirror(), bios are processed with the region state,
      DM_RH_NOSYNC. However, the region data is freed without checking its
      pending count when dm_rh_update_states() is called next time.
      
      When bios are finished by mirror_end_io(), __rh_lookup() in dm_rh_dec()
      returns NULL even though a valid return value are expected.
      
      Solution
      
      Remove the state change of the recovery failed region from DM_RH_RECOVERING
      to DM_RH_NOSYNC in dm_rh_recovery_end(). We can remove the state change
      because:
      
        - If the region data has been released by dm_rh_update_states(),
          a new region data is created with the state of DM_RH_NOSYNC, and
          bios are processed according to the DM_RH_NOSYNC state.
      
        - If the region data has not been released by dm_rh_update_states(),
          a state of the region is DM_RH_RECOVERING and bios are put in the
          delayed_bio list.
      
      The flag change from DM_RH_RECOVERING to DM_RH_NOSYNC in dm_rh_recovery_end()
      was added in the following commit:
        dm raid1: handle resync failures
        author  Jonathan Brassow <jbrassow@redhat.com>
          Thu, 12 Jul 2007 16:29:04 +0000 (17:29 +0100)
        http://git.kernel.org/linus/f44db678edcc6f4c2779ac43f63f0b9dfa28b724
      
      
      
      Signed-off-by: default avatarTakahiro Yasui <tyasui@redhat.com>
      Reviewed-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      558569aa
    • Mikulas Patocka's avatar
      dm raid1: fail writes if errors are not handled and log fails · 5528d17d
      Mikulas Patocka authored
      If the mirror log fails when the handle_errors option was not selected
      and there is no remaining valid mirror leg, writes return success even
      though they weren't actually written to any device.  This patch
      completes them with EIO instead.
      
      This code path is taken:
      do_writes:
      	bio_list_merge(&ms->failures, &sync);
      do_failures:
      	if (!get_valid_mirror(ms)) (false)
      	else if (errors_handled(ms)) (false)
      	else bio_endio(bio, 0);
      
      The logic in do_failures is based on presuming that the write was already
      tried: if it succeeded at least on one leg (without handle_errors) it
      is reported as success.
      
      Reference: https://bugzilla.redhat.com/show_bug.cgi?id=555197
      
      
      
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      5528d17d
    • Jonathan Brassow's avatar
      dm log: userspace fix overhead_size calcuations · ebfd32bb
      Jonathan Brassow authored
      
      
      This patch fixes two bugs that revolve around the miscalculation and
      misuse of the variable 'overhead_size'.  'overhead_size' is the size of
      the various header structures used during communication.
      
      The first bug is the use of 'sizeof' with the pointer of a structure
      instead of the structure itself - resulting in the wrong size being
      computed.  This is then used in a check to see if the payload
      (data_size) would be to large for the preallocated structure.  Since the
      bug produces a smaller value for the overhead, it was possible for the
      structure to be breached.  (Although the current users of the code do
      not currently send enough data to trigger this bug.)
      
      The second bug is that the 'overhead_size' value is used to compute how
      much of the preallocated space should be cleared before populating it
      with fresh data.  This should have simply been 'sizeof(struct cn_msg)'
      not overhead_size.  The fact that 'overhead_size' was computed
      incorrectly made this problem "less bad" - leaving only a pointer's
      worth of space at the end uncleared.  Thus, this bug was never producing
      a bad result, but still needs to be fixed - especially now that the
      value is computed correctly.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      ebfd32bb
    • Mike Snitzer's avatar
      dm snapshot: persistent annotate work_queue as on stack · 55f67f2d
      Mike Snitzer authored
      
      
      chunk_io() declares its 'struct mdata_req' on the stack and then
      initializes its 'struct work_struct' member.  Annotate the
      initialization of this workqueue with INIT_WORK_ON_STACK to suppress a
      debugobjects warning seen when CONFIG_DEBUG_OBJECTS_WORK is enabled.
      
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      55f67f2d
    • Nikanth Karthikesan's avatar
      dm stripe: avoid divide by zero with invalid stripe count · 781248c1
      Nikanth Karthikesan authored
      
      
      If a table containing zero as stripe count is passed into stripe_ctr
      the code attempts to divide by zero.
      
      This patch changes DM_TABLE_LOAD to return -EINVAL if the stripe count
      is zero.
      
      We now get the following error messages:
        device-mapper: table: 253:0: striped: Invalid stripe count
        device-mapper: ioctl: error adding target to table
      
      Signed-off-by: default avatarNikanth Karthikesan <knikanth@suse.de>
      Cc: stable@kernel.org
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      781248c1
  2. Feb 16, 2010
  3. Feb 15, 2010
  4. Feb 14, 2010
    • Clemens Ladisch's avatar
      firewire: ohci: retransmit isochronous transmit packets on cycle loss · 7f51a100
      Clemens Ladisch authored
      
      
      In isochronous transmit DMA descriptors, link the skip address pointer
      back to the descriptor itself.  When a cycle is lost, the controller
      will send the packet in the next cycle, instead of terminating the
      entire DMA program.
      
      There are two reasons for this:
      
      * This behaviour is compatible with the old IEEE1394 stack.  Old
        applications would not expect the DMA program to stop in this case.
      
      * Since the OHCI driver does not report any uncompleted packets, the
        context would stop silently; clients would not have any chance to
        detect and handle this error without a watchdog timer.
      
      Signed-off-by: default avatarClemens Ladisch <clemens@ladisch.de>
      
      Pieter Palmers notes:
      
      "The reason I added this retry behavior to the old stack is because some
      cards now and then fail to send a packet (e.g. the o2micro card in my
      dell laptop).  I couldn't figure out why exactly this happens, my best
      guess is that the card cannot fetch the payload data on time.  This
      happens much more frequently when sending large packets, which leads me
      to suspect that there are some contention issues with the DMA that fills
      the transmit FIFO.
      
      In the old stack it was a pretty critical issue as it resulted in a
      freeze of the userspace application.
      
      The omission of a packet doesn't necessarily have to be an issue.  E.g.
      in IEC61883 streams the DBC field can be used to detect discontinuities
      in the stream.  So as long as the other side doesn't bail when no
      [packet] is present in a cycle, there is not really a problem.
      
      I'm not convinced though that retrying is the proper solution, but it is
      simple and effective for what it had to do.  And I think there are no
      reasons not to do it this way.  Userspace can still detect this by
      checking the cycle the descriptor was sent in."
      
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (changelog, comment)
      7f51a100
    • Kirill Smelkov's avatar
      perf top: Fix help text alignment · 1a72cfa6
      Kirill Smelkov authored
      
      
      Print this:
      
      Mapped keys:
              [d]     display refresh delay.                  (2)
              [e]     display entries (lines).                (46)
              [f]     profile display filter (count).         (5)
              [F]     annotate display filter (percent).      (5%)
              [s]     annotate symbol.                        (NULL)
              [S]     stop annotation.
              [K]     hide kernel_symbols symbols.            (no)
              [U]     hide user symbols.                      (no)
              [z]     toggle sample zeroing.                  (0)
              [qQ]    quit.
      
      instead of:
      
      Mapped keys:
              [d]     display refresh delay.                  (2)
              [e]     display entries (lines).                (46)
              [f]     profile display filter (count).         (5)
              [F]     annotate display filter (percent).      (5%)
              [s]     annotate symbol.                        (NULL)
              [S]     stop annotation.
              [K]     hide kernel_symbols symbols.                    (no)
              [U]     hide user symbols.                      (no)
              [z]     toggle sample zeroing.                  (0)
              [qQ]    quit.
      
      Signed-off-by: default avatarKirill Smelkov <kirr@landau.phys.spbu.ru>
      Acked-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100212162059.GA30041@landau.phys.spbu.ru>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      1a72cfa6
    • Heiko Carstens's avatar
      tracing/kprobes: Fix probe parsing · a9bb18f3
      Heiko Carstens authored
      
      
      Trying to add a probe like:
      
        echo p:myprobe 0x10000 > /sys/kernel/debug/tracing/kprobe_events
      
      will fail since the wrong pointer is passed to strict_strtoul
      when trying to convert the address to an unsigned long.
      
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <20100210162346.GA6933@osiris.boeblingen.de.ibm.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      a9bb18f3
  5. Feb 13, 2010
  6. Feb 12, 2010