  1. Dec 11, 2013
    • dm thin: switch to read only mode if a mapping insert fails · fafc7a81
      Joe Thornber authored
      
      Switch the thin pool to read-only mode when dm_thin_insert_block() fails
      since there is little reason to expect the cause of the failure to be
      resolved without further action by user space.
      
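      The shape of the change in process_prepared_mapping() is roughly the
      following (a hedged sketch, not the verbatim patch; set_pool_mode(),
      PM_READ_ONLY and cell_error() are dm-thin.c names of this period):

          r = dm_thin_insert_block(tc->td, m->virt_block, m->data_block);
          if (r) {
                  /* log rate-limited, not once per bio */
                  DMERR_LIMIT("%s: dm_thin_insert_block() failed: error = %d",
                              dm_device_name(pool->pool_md), r);
                  /* no point retrying; wait for userspace to intervene */
                  set_pool_mode(pool, PM_READ_ONLY);
                  cell_error(pool, m->cell);
                  goto out;
          }
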
      This issue was noticed with the device-mapper-test-suite using:
      dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/
      
      This patch also reduces the quantity of errors logged in this case.
      
      before patch:
      
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      <snip ... these repeat for a long while ... >
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      device-mapper: thin: 253:4: no free metadata space available.
      device-mapper: thin: 253:4: switching pool to read-only mode
      
      after patch:
      
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: 253:4: dm_thin_insert_block() failed: error = -28
      device-mapper: thin: 253:4: switching pool to read-only mode
      
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
    • dm space map metadata: return on failure in sm_metadata_new_block · f62b6b8f
      Mike Snitzer authored

      Commit 2fc48021 ("dm persistent metadata: add space map threshold
      callback") introduced a regression to the metadata block allocation
      path that resulted in errors being
      to the metadata block allocation path that resulted in errors being
      ignored.  This regression was uncovered by running the following
      device-mapper-test-suite test:
      dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/
      
      The ignored error codes in sm_metadata_new_block() could crash the
      kernel through use of either the dm-thin or dm-cache targets, e.g.:
      
      device-mapper: thin: 253:4: reached low water mark for metadata device: sending event.
      device-mapper: space map metadata: unable to allocate new metadata block
      general protection fault: 0000 [#1] SMP
      ...
      Workqueue: dm-thin do_worker [dm_thin_pool]
      task: ffff880035ce2ab0 ti: ffff88021a054000 task.ti: ffff88021a054000
      RIP: 0010:[<ffffffffa0331385>]  [<ffffffffa0331385>] metadata_ll_load_ie+0x15/0x30 [dm_persistent_data]
      RSP: 0018:ffff88021a055a68  EFLAGS: 00010202
      RAX: 003fc8243d212ba0 RBX: ffff88021a780070 RCX: ffff88021a055a78
      RDX: ffff88021a055a78 RSI: 0040402222a92a80 RDI: ffff88021a780070
      RBP: ffff88021a055a68 R08: ffff88021a055ba4 R09: 0000000000000010
      R10: 0000000000000000 R11: 00000002a02e1000 R12: ffff88021a055ad4
      R13: 0000000000000598 R14: ffffffffa0338470 R15: ffff88021a055ba4
      FS:  0000000000000000(0000) GS:ffff88033fca0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00007f467c0291b8 CR3: 0000000001a0b000 CR4: 00000000000007e0
      Stack:
       ffff88021a055ab8 ffffffffa0332020 ffff88021a055b30 0000000000000001
       ffff88021a055b30 0000000000000000 ffff88021a055b18 0000000000000000
       ffff88021a055ba4 ffff88021a055b98 ffff88021a055ae8 ffffffffa033304c
      Call Trace:
       [<ffffffffa0332020>] sm_ll_lookup_bitmap+0x40/0xa0 [dm_persistent_data]
       [<ffffffffa033304c>] sm_metadata_count_is_more_than_one+0x8c/0xc0 [dm_persistent_data]
       [<ffffffffa0333825>] dm_tm_shadow_block+0x65/0x110 [dm_persistent_data]
       [<ffffffffa0331b00>] sm_ll_mutate+0x80/0x300 [dm_persistent_data]
       [<ffffffffa0330e60>] ? set_ref_count+0x10/0x10 [dm_persistent_data]
       [<ffffffffa0331dba>] sm_ll_inc+0x1a/0x20 [dm_persistent_data]
       [<ffffffffa0332270>] sm_disk_new_block+0x60/0x80 [dm_persistent_data]
       [<ffffffff81520036>] ? down_write+0x16/0x40
       [<ffffffffa001e5c4>] dm_pool_alloc_data_block+0x54/0x80 [dm_thin_pool]
       [<ffffffffa001b23c>] alloc_data_block+0x9c/0x130 [dm_thin_pool]
       [<ffffffffa001c27e>] provision_block+0x4e/0x180 [dm_thin_pool]
       [<ffffffffa001fe9a>] ? dm_thin_find_block+0x6a/0x110 [dm_thin_pool]
       [<ffffffffa001c57a>] process_bio+0x1ca/0x1f0 [dm_thin_pool]
       [<ffffffff8111e2ed>] ? mempool_free+0x8d/0xa0
       [<ffffffffa001d755>] process_deferred_bios+0xc5/0x230 [dm_thin_pool]
       [<ffffffffa001d911>] do_worker+0x51/0x60 [dm_thin_pool]
       [<ffffffff81067872>] process_one_work+0x182/0x3b0
       [<ffffffff81068c90>] worker_thread+0x120/0x3a0
       [<ffffffff81068b70>] ? manage_workers+0x160/0x160
       [<ffffffff8106eb2e>] kthread+0xce/0xe0
       [<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70
       [<ffffffff8152af6c>] ret_from_fork+0x7c/0xb0
       [<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70
       [<ffffffff8152af6c>] ret_from_fork+0x7c/0xb0
       [<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70
      
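      The fix amounts to checking and propagating the return codes that the
      regression dropped.  A hedged sketch of the resulting function (names
      follow drivers/md/persistent-data/dm-space-map-metadata.c; treat the
      exact body as illustrative):

          static int sm_metadata_new_block(struct dm_space_map *sm, dm_block_t *b)
          {
                  dm_block_t count;
                  struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm);

                  int r = sm_metadata_new_block_(sm, b);
                  if (r) {
                          DMERR("unable to allocate new metadata block");
                          return r;       /* was ignored before the fix */
                  }

                  r = sm_metadata_get_nr_free(sm, &count);
                  if (r) {
                          DMERR("couldn't get free block count");
                          return r;
                  }

                  check_threshold(&smm->threshold, count);
                  return r;
          }
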
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org # v3.10+
    • dm table: fail dm_table_create on dm_round_up overflow · 5b2d0657
      Mikulas Patocka authored
      
      The dm_round_up function may overflow to zero.  In this case,
      dm_table_create() must fail rather than go on to allocate an empty array
      with alloc_targets().
      
      This fixes a possible memory corruption that could be caused by passing
      too large a number in "param->target_count".
      
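      A sketch of the guard added in dm_table_create() (dm_round_up() and
      KEYS_PER_NODE are existing dm-table.c names; the exact placement is
      illustrative):

          if (!num_targets)
                  num_targets = KEYS_PER_NODE;

          num_targets = dm_round_up(num_targets, KEYS_PER_NODE);

          /* dm_round_up() can wrap to 0 for a huge param->target_count */
          if (!num_targets) {
                  kfree(t);
                  return -EOVERFLOW;
          }
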
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
    • dm snapshot: avoid snapshot space leak on crash · 230c83af
      Mikulas Patocka authored
      
      There is a possible leak of snapshot space in case of a crash.

      Space leaks because chunks in the snapshot device are allocated
      sequentially, but they are completed (and recorded in the metadata)
      out of order, depending on the order in which the copying of each
      chunk finishes.
      
      For example, suppose that the metadata contains the following records:
      SUPERBLOCK
      METADATA (blocks 0 ... 250)
      DATA 0
      DATA 1
      DATA 2
      ...
      DATA 250
      
      Now suppose that you allocate 10 new data blocks, 251-260, and that
      copying of these blocks finishes out of order (block 260 finished
      first and block 251 finished last).  The snapshot device now looks
      like this:
      SUPERBLOCK
      METADATA (blocks 0 ... 250, 260, 259, 258, 257, 256)
      DATA 0
      DATA 1
      DATA 2
      ...
      DATA 250
      DATA 251
      DATA 252
      DATA 253
      DATA 254
      DATA 255
      METADATA (blocks 255, 254, 253, 252, 251)
      DATA 256
      DATA 257
      DATA 258
      DATA 259
      DATA 260
      
      Now, if the machine crashes after writing the first metadata block
      but before writing the second metadata block, the space for areas
      DATA 251-255 is leaked: it contains no valid data and will never be
      used in the future.
      
      This patch makes dm-snapshot complete exceptions in the same order they
      were allocated, thus fixing this bug.
      
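      A hedged sketch of the mechanism: each pending exception is stamped
      with a sequence number at allocation time, and the copy-completion
      callback only commits exceptions whose turn has come, parking early
      finishers on a list kept sorted by sequence (field and helper names
      follow dm-snap.c conventions but are illustrative):

          /* in the kcopyd completion path, once a chunk copy finishes */
          if (pe->exception_sequence == s->exception_complete_sequence) {
                  s->exception_complete_sequence++;
                  complete_exception(pe);    /* commit in allocation order */

                  /* drain parked completions that are now next in line */
                  while (!list_empty(&s->out_of_order_list)) {
                          pe = list_entry(s->out_of_order_list.next,
                                          struct dm_snap_pending_exception,
                                          out_of_order_entry);
                          if (pe->exception_sequence != s->exception_complete_sequence)
                                  break;
                          s->exception_complete_sequence++;
                          list_del(&pe->out_of_order_entry);
                          complete_exception(pe);
                  }
          } else {
                  /* finished early: park it, keeping the list sorted */
                  struct dm_snap_pending_exception *pe2;

                  list_for_each_entry_reverse(pe2, &s->out_of_order_list,
                                              out_of_order_entry)
                          if (pe2->exception_sequence < pe->exception_sequence)
                                  break;
                  list_add(&pe->out_of_order_entry, &pe2->out_of_order_entry);
          }
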
      Note: when backporting this patch to the stable kernel, change the version
      field in the following way:
      * if version in the stable kernel is {1, 11, 1}, change it to {1, 12, 0}
      * if version in the stable kernel is {1, 10, 0} or {1, 10, 1}, change it
        to {1, 10, 2}
      Userspace reads the version to determine if the bug was fixed, so the
      version change is needed.
      
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
  2. Nov 12, 2013
    • dm cache: add cache block invalidation support · 65790ff9
      Joe Thornber authored
      
      Cache block invalidation is removing an entry from the cache without
      writing it back.  Cache blocks can be invalidated via the
      'invalidate_cblocks' message, which takes an arbitrary number of cblock
      ranges:
         invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*
      
      E.g.
         dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789
      
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm cache: add remove_cblock method to policy interface · 532906aa
      Joe Thornber authored
      
      Implement policy_remove_cblock() and add remove_cblock method to the mq
      policy.  These methods will be used by the following cache block
      invalidation patch which adds the 'invalidate_cblocks' message to the
      cache core.
      
      Also, update some comments in dm-cache-policy.h.
      
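      The interface addition itself is small — roughly (a sketch of the
      shape; see dm-cache-policy.h for the authoritative contract):

          struct dm_cache_policy {
                  ...
                  /*
                   * Remove the mapping for a particular cache block,
                   * without writing it back.  Returns -ENODATA if the
                   * cblock is not currently mapped.
                   */
                  int (*remove_cblock)(struct dm_cache_policy *p, dm_cblock_t cblock);
                  ...
          };

          static inline int policy_remove_cblock(struct dm_cache_policy *p,
                                                 dm_cblock_t cblock)
          {
                  return p->remove_cblock(p, cblock);
          }
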
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm cache policy mq: reduce memory requirements · 633618e3
      Joe Thornber authored
      
      Rather than storing the cblock in each cache entry, we allocate all
      entries in an array and infer the cblock from the entry position.
      
      Saves 4 bytes of memory per cache block.  In addition, this gives us an
      easy way of looking up cache entries by cblock.
      
      We no longer need to keep an explicit bitset to track which cblocks
      have been allocated.  And no searching is needed to find free cblocks.
      
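      With all entries in one pre-allocated array, the cblock falls out of
      pointer arithmetic — roughly (a sketch; the helper and struct names
      follow dm-cache-policy-mq.c but are illustrative):

          /*
           * An entry's index in the pre-allocated array *is* its cblock,
           * so no per-entry cblock field is needed.
           */
          static dm_cblock_t infer_cblock(struct entry_pool *ep, struct entry *e)
          {
                  return to_cblock(e - ep->entries);
          }
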
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm cache metadata: check the metadata version when reading the superblock · 53d49819
      Joe Thornber authored
      
      We need to check the version to verify that the on-disk metadata
      format is supported.
      
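      The check takes the usual bounded-version form — a sketch (constant
      and helper names follow dm-cache-metadata.c conventions and are
      illustrative):

          #define MIN_CACHE_VERSION 1
          #define MAX_CACHE_VERSION 1

          static int __check_metadata_version(struct cache_disk_superblock *disk_super)
          {
                  uint32_t metadata_version = le32_to_cpu(disk_super->version);

                  if (metadata_version < MIN_CACHE_VERSION ||
                      metadata_version > MAX_CACHE_VERSION) {
                          DMERR("Cache metadata version %u found, but only "
                                "versions between %u and %u supported.",
                                metadata_version, MIN_CACHE_VERSION,
                                MAX_CACHE_VERSION);
                          return -EINVAL;
                  }

                  return 0;
          }
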
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm cache: add passthrough mode · 2ee57d58
      Joe Thornber authored
      "Passthrough" is a dm-cache operating mode (like writethrough or
      writeback) which is intended to be used when the cache contents are not
      known to be coherent with the origin device.  It behaves as follows:
      
      * All reads are served from the origin device (all reads miss the cache)
      * All writes are forwarded to the origin device; additionally, write
        hits cause cache block invalidates
      
      This mode decouples cache coherency checks from cache device creation,
      largely to avoid having to perform coherency checks while booting.  Boot
      scripts can create cache devices in passthrough mode and put them into
      service (mount cached filesystems, for example) without having to worry
      about coherency.  Coherency that exists is maintained, although the
      cache will gradually cool as writes take place.
      
      Later, applications can perform coherency checks, the nature of which
      will depend on the type of the underlying storage.  If coherency can
      be verified, the cache device can be transitioned to writethrough or
      writeback mode while still warm; otherwise, the cache contents can be
      discarded prior to transitioning to the desired operating mode.
    • dm cache: cache shrinking support · f494a9c6
      Joe Thornber authored
      
      Allow a cache to shrink if the blocks being removed from the cache are
      not dirty.
      
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  3. Nov 06, 2013
    • dm array: fix bug in growing array · 9c1d4de5
      Joe Thornber authored
      
      Entries would be lost if the old tail block was partially filled.
      
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.9+
    • dm mpath: requeue I/O during pg_init · b63349a7
      Hannes Reinecke authored
      
      While pg_init is running no I/O can be submitted to the underlying
      devices, as the path priorities etc. might change.  When queue_io is
      used for this, requests pile up within multipath because the block
      I/O scheduler just sees a _very fast_ device.  All of this queued
      I/O has to be resubmitted from within multipathing once pg_init is
      done.

      This approach makes it virtually impossible to abort I/O while
      pg_init is running, and it adds heavy load to the devices after
      pg_init, since all of the queued I/O needs to be resubmitted
      _before_ any new requests can be pulled off of the request queue
      and normal operation can continue.
      
      This patch will requeue the I/O that triggers the pg_init call, and
      return 'busy' when pg_init is in progress.  With these changes the block
      I/O scheduler will stop submitting I/O during pg_init, resulting in a
      quicker path switch and less I/O pressure (and memory consumption) after
      pg_init.
      
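      Sketches of the two halves of the change (function and field names
      follow dm-mpath.c; the surrounding locking is omitted for brevity):

          /* in multipath_busy(): make the block layer back off while
           * pg_init runs, instead of letting requests pile up in dm */
          if (m->pg_init_in_progress) {
                  busy = 1;
                  goto out;
          }

          /* in the map path: the request that triggered pg_init goes
           * back onto the request queue rather than being queued
           * internally */
          __pg_init_all_paths(m);
          r = DM_MAPIO_REQUEUE;
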
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      [patch header edited for clarity and typos by Mike Snitzer]
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  4. Nov 01, 2013
    • dm mpath: fix race condition between multipath_dtr and pg_init_done · 954a73d5
      Shiva Krishna Merla authored
      
      While multipath_dtr() is running we must prevent any further path
      activation work from being queued.  Implement this by adding a new
      'pg_init_disabled' flag to the multipath structure that denotes
      future path activation work should be skipped if it is set.  By
      disabling pg_init and then re-enabling it in flush_multipath_work()
      we also avoid the potential for pg_init to be initiated while
      suspending an mpath device.
      
      Without this patch a race condition exists that may result in a kernel
      panic:
      
      1) After pg_init_done() decrements pg_init_in_progress to 0, a call
         to wait_for_pg_init_completion() assumes there are no more pending
         path management commands.
      2) If pg_init_required is set by pg_init_done(), due to retryable
         mode_select errors, then process_queued_ios() will again queue the
         path activation work.
      3) If free_multipath() completes before activate_path() work is called a
         NULL pointer dereference like the following can be seen when
         accessing members of the recently destructed multipath:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
      RIP: 0010:[<ffffffffa003db1b>]  [<ffffffffa003db1b>] activate_path+0x1b/0x30 [dm_multipath]
      [<ffffffff81090ac0>] worker_thread+0x170/0x2a0
      [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
      
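      A sketch of how flush_multipath_work() brackets the flush with the
      new flag (close to the shape of the final code, but treat details as
      illustrative):

          static void flush_multipath_work(struct multipath *m)
          {
                  unsigned long flags;

                  /* forbid new path activation work while we flush */
                  spin_lock_irqsave(&m->lock, flags);
                  m->pg_init_disabled = 1;
                  spin_unlock_irqrestore(&m->lock, flags);

                  flush_workqueue(kmpath_handlerd);
                  multipath_wait_for_pg_init_completion(m);
                  flush_workqueue(kmultipathd);
                  flush_work(&m->trigger_event);

                  spin_lock_irqsave(&m->lock, flags);
                  m->pg_init_disabled = 0;
                  spin_unlock_irqrestore(&m->lock, flags);
          }
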
      [switch to disabling pg_init in flush_multipath_work & header edits by Mike Snitzer]
      Signed-off-by: Shiva Krishna Merla <shivakrishna.merla@netapp.com>
      Reviewed-by: Krishnasamy Somasundaram <somasundaram.krishnasamy@netapp.com>
      Tested-by: Speagle Andy <Andy.Speagle@netapp.com>
      Acked-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
    • dm: allocate buffer for messages with small number of arguments using GFP_NOIO · f36afb39
      Mikulas Patocka authored
      
      dm-mpath and dm-thin must process messages even if some device is
      suspended, so we allocate the argv buffer with GFP_NOIO.  These
      messages have a small, fixed number of arguments.
      
      On the other hand, dm-switch needs to process bulk data using messages
      so excessive use of GFP_NOIO could cause trouble.
      
      The patch also lowers the default number of arguments from 64 to 8,
      reducing the load placed on GFP_NOIO allocations.
      
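      A hedged sketch of the resulting argv allocator in dm-table.c (close
      to the shape of the final code, but treat details as illustrative):

          static char **realloc_argv(unsigned *array_size, char **old_argv)
          {
                  char **argv;
                  unsigned new_size;
                  gfp_t gfp;

                  if (*array_size) {
                          new_size = *array_size * 2;
                          gfp = GFP_KERNEL;
                  } else {
                          new_size = 8;   /* small fixed start for messages */
                          gfp = GFP_NOIO; /* safe while devices are suspended */
                  }
                  argv = kmalloc(new_size * sizeof(*argv), gfp);
                  if (argv) {
                          memcpy(argv, old_argv, *array_size * sizeof(*argv));
                          *array_size = new_size;
                  }

                  kfree(old_argv);
                  return argv;
          }
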
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Acked-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  5. Oct 14, 2013