  1. Feb 11, 2021
    • dm era: only resize metadata in preresume · cca2c6ae
      Nikos Tsironis authored
      Metadata resize shouldn't happen in the constructor (ctr). The ctr loads
      a temporary (inactive) table that only becomes active upon resume, so
      the resize should always be done as part of resume. Otherwise, a load
      (ctr) whose inactive table never becomes active will incorrectly resize
      the metadata.
      
      Also, perform the resize directly in preresume, instead of using the
      worker to do it.
      
      The worker might run other metadata operations, e.g., it could start
      digestion, before resizing the metadata. These operations will end up
      using the old size.
      
      This could lead to errors, like:
      
        device-mapper: era: metadata_digest_transcribe_writeset: dm_array_set_value failed
        device-mapper: era: process_old_eras: digest step failed, stopping digestion
      
      The reason for the above error is that the worker started the digestion
      of the archived writeset using the old, larger size.
      
      As a result, metadata_digest_transcribe_writeset tried to write beyond
      the end of the era array.
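
      The ordering problem can be illustrated with a small toy model. This
      is only a sketch, not the kernel code: ToyMetadata, resume() and the
      sizes used are hypothetical, and the model simply assumes that
      digestion captures the number of bits to process when it starts.

```python
class ToyMetadata:
    """Toy model of era metadata: an era array plus one archived writeset."""

    def __init__(self, nr_blocks):
        self.nr_blocks = nr_blocks
        self.era_array = [0] * nr_blocks
        # Archived writeset recorded while the device had nr_blocks blocks.
        self.archived_writeset = [True] * nr_blocks

    def resize(self, new_nr_blocks):
        # Shrink or grow the era array to the new device size.
        self.era_array = (self.era_array + [0] * new_nr_blocks)[:new_nr_blocks]
        self.nr_blocks = new_nr_blocks

    def digest_start(self):
        # Assumption: digestion fixes the number of bits to process here.
        return min(len(self.archived_writeset), self.nr_blocks)

    def digest_finish(self, nr_bits, era):
        for block in range(nr_bits):
            if self.archived_writeset[block]:
                if block >= len(self.era_array):
                    raise IndexError("write beyond the end of the era array")
                self.era_array[block] = era


def resume(resize_first):
    """Shrink a device from 8 to 4 blocks and digest, in either order."""
    md = ToyMetadata(nr_blocks=8)
    if resize_first:
        md.resize(4)
        nr_bits = md.digest_start()
    else:
        nr_bits = md.digest_start()  # the old, larger size is captured here
        md.resize(4)
    try:
        md.digest_finish(nr_bits, era=1)
    except IndexError:
        return "digest step failed"
    return "ok"
```

      In this model, resume(resize_first=True) completes, while
      resume(resize_first=False) fails because digestion still iterates
      over the old size after the era array has shrunk.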
      
      Fixes: eec40579 ("dm: add era target")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm era: Use correct value size in equality function of writeset tree · 64f2d15a
      Nikos Tsironis authored
      Fix the writeset tree equality test function to use the right value size
      when comparing two btree values.
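
      A rough illustration of why the compared size matters, as a sketch
      only: the 12-byte packed layout, the buffer contents and the helper
      values_equal() below are assumptions made for the example, not taken
      from the dm-era sources.

```python
import struct

# Hypothetical packed value layout: 32-bit nr_bits followed by 64-bit root.
DISK_FORMAT = "<IQ"
DISK_SIZE = struct.calcsize(DISK_FORMAT)


def values_equal(buf, off1, off2, size):
    """Compare two values stored in buf, memcmp-style, over size bytes."""
    return buf[off1:off1 + size] == buf[off2:off2 + size]


value = struct.pack(DISK_FORMAT, 64, 1234)
# Two identical values, each followed by unrelated neighbouring bytes.
buf = value + b"\xaa\xaa\xaa\xaa" + value + b"\xbb\xbb\xbb\xbb"
second = DISK_SIZE + 4

right = values_equal(buf, 0, second, DISK_SIZE)      # correct value size
wrong = values_equal(buf, 0, second, DISK_SIZE + 4)  # size too large
```

      With the correct size the two identical values compare equal; with a
      size that is too large the comparison also covers bytes that do not
      belong to the values, so the result depends on unrelated data.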
      
      Fixes: eec40579 ("dm: add era target")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
      Reviewed-by: Ming-Hung Tsai <mtsai@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm era: Fix bitset memory leaks · 904e6b26
      Nikos Tsironis authored
      Deallocate the memory allocated for the in-core bitsets when destroying
      the target and in error paths.
      
      Fixes: eec40579 ("dm: add era target")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
      Reviewed-by: Ming-Hung Tsai <mtsai@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm era: Verify the data block size hasn't changed · c8e846ff
      Nikos Tsironis authored
      dm-era doesn't support changing the data block size of existing devices,
      so check explicitly that the requested block size for a new target
      matches the one stored in the metadata.
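
      The check amounts to a simple comparison at load time. A minimal
      sketch, assuming block sizes are given in sectors; the function name
      and the error text are made up for illustration and are not the
      kernel's:

```python
def check_block_size(requested_sectors, stored_sectors):
    """Reject a table load whose block size differs from the stored one."""
    if requested_sectors != stored_sectors:
        raise ValueError(
            f"block size {requested_sectors} does not match "
            f"metadata block size {stored_sectors}"
        )
    return True
```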
      
      Fixes: eec40579 ("dm: add era target")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
      Reviewed-by: Ming-Hung Tsai <mtsai@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm era: Reinitialize bitset cache before digesting a new writeset · 25249333
      Nikos Tsironis authored
      In case of devices with at most 64 blocks, the digestion of consecutive
      eras uses the writeset of the first era as the writeset of all eras to
      digest, leading to lost writes. That is, we lose the information about
      what blocks were written during the affected eras.
      
      The digestion code uses a dm_disk_bitset object to access the archived
      writesets. This structure includes a one word (64-bit) cache to reduce
      the number of array lookups.
      
      This structure is initialized only once, in metadata_digest_start(),
      when we kick off digestion.
      
      But, when we insert a new writeset into the writeset tree, before the
      digestion of the previous writeset is done, or equivalently when there
      are multiple writesets in the writeset tree to digest, then all these
      writesets are digested using the same cache and the cache is not
      re-initialized when moving from one writeset to the next.
      
      For devices with more than 64 blocks, i.e., the size of the cache, the
      cache is indirectly invalidated when we move to a next set of blocks, so
      we avoid the bug.
      
      But for devices with at most 64 blocks we end up using the same cached
      data for digesting all archived writesets, i.e., the cache is loaded
      when digesting the first writeset and it never gets reloaded, until the
      digestion is done.
      
      As a result, the writeset of the first era to digest is used as the
      writeset of all the following archived eras, leading to lost writes.
      
      Fix this by reinitializing the dm_disk_bitset structure, and thus
      invalidating the cache, every time the digestion code starts digesting a
      new writeset.
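
      The stale-cache behaviour can be reproduced with a toy model. This is
      a sketch under simplifying assumptions, not the dm_disk_bitset
      implementation: CachedBitsetReader and digest() are hypothetical
      names, and each writeset is represented as a list of 64-bit words.

```python
WORD_BITS = 64


class CachedBitsetReader:
    """Toy reader that caches the one 64-bit word it looked up last."""

    def __init__(self):
        self.cached_index = None
        self.cached_word = 0

    def test_bit(self, words, bit):
        index = bit // WORD_BITS
        if index != self.cached_index:
            # Cache miss: fetch the word from the bitset being read.
            self.cached_word = words[index]
            self.cached_index = index
        return bool((self.cached_word >> (bit % WORD_BITS)) & 1)


def digest(writesets, nr_blocks, reinit_per_writeset):
    """Return, per writeset, the set of blocks digestion sees as written."""
    reader = CachedBitsetReader()
    seen = []
    for words in writesets:
        if reinit_per_writeset:
            reader = CachedBitsetReader()  # drops the stale cached word
        seen.append({b for b in range(nr_blocks) if reader.test_bit(words, b)})
    return seen


# Two archived writesets for a 64-block device: block 0, then blocks 1 and 2.
small = [[0b001], [0b110]]
buggy = digest(small, 64, reinit_per_writeset=False)
fixed = digest(small, 64, reinit_per_writeset=True)

# A 128-block device: moving to the second word invalidates the cache.
large = [[0b001, 0], [0b010, 0b100]]
large_without_reinit = digest(large, 128, reinit_per_writeset=False)
```

      Without reinitialization, the 64-block case reports block 0 for both
      writesets, so the writes to blocks 1 and 2 are lost. The 128-block
      case yields the right answer even without reinitialization, because
      the cached word index changes between writesets.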
      
      Fixes: eec40579 ("dm: add era target")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm era: Update in-core bitset after committing the metadata · 2099b145
      Nikos Tsironis authored
      In case of a system crash, dm-era might fail to mark blocks as written
      in its metadata, although the corresponding writes to these blocks were
      passed down to the origin device and completed successfully.
      
      Consider the following sequence of events:
      
      1. We write to a block that has not yet been written in the current era.
      2. era_map() checks the in-core bitmap for the current era and sees
         that the block is not marked as written.
      3. The write is deferred for submission after the metadata have been
         updated and committed.
      4. The worker thread processes the deferred write
         (process_deferred_bios()) and marks the block as written in the
         in-core bitmap, **before** committing the metadata.
      5. The worker thread starts committing the metadata.
      6. We do more writes that map to the same block as the write of step (1)
      7. era_map() checks the in-core bitmap and sees that the block is marked
         as written, **although the metadata have not been committed yet**.
      8. These writes are passed down to the origin device immediately and the
         device reports them as completed.
      9. The system crashes, e.g., power failure, before the commit from step
         (5) finishes.
      
      When the system recovers and we query the dm-era target for the list of
      written blocks it doesn't report the aforementioned block as written,
      although the writes of step (6) completed successfully.
      
      The issue is that era_map() decides whether or not to defer a write
      based on uncommitted information. The root cause of the bug is that we
      update the in-core bitmap **before** committing the metadata.
      
      Fix this by updating the in-core bitmap **after** successfully
      committing the metadata.
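
      The sequence of events above can be replayed against a toy model.
      This is a sketch only: ToyEra and crash_scenario() are hypothetical,
      sets stand in for the in-core bitmap and the on-disk metadata, and
      the crash is modelled by never calling commit().

```python
class ToyEra:
    """Toy model of the in-core bitmap versus the committed metadata."""

    def __init__(self, mark_before_commit):
        self.mark_before_commit = mark_before_commit
        self.in_core = set()    # blocks the map path treats as written
        self.pending = set()    # metadata updates not yet on disk
        self.committed = set()  # the only state that survives a crash

    def map_write(self, block):
        # Simplified stand-in for era_map(): pass through or defer.
        return "passthrough" if block in self.in_core else "deferred"

    def process_deferred(self, block):
        self.pending.add(block)
        if self.mark_before_commit:
            self.in_core.add(block)  # buggy order: visible before commit

    def commit(self):
        self.committed |= self.pending
        self.in_core |= self.pending  # fixed order: visible after commit
        self.pending.clear()


def crash_scenario(mark_before_commit):
    """Replay steps 1-9 and report (second write's fate, block recorded?)."""
    era = ToyEra(mark_before_commit)
    first = era.map_write(7)   # steps 1-3: first write is deferred
    era.process_deferred(7)    # step 4
    # Step 5: the commit has started but has not finished.
    second = era.map_write(7)  # steps 6-7
    # Step 9: crash before commit() completes; only era.committed survives.
    return second, 7 in era.committed
```

      With mark_before_commit=True the second write is passed through even
      though block 7 is not recorded on disk, which is the lost write
      described above. With mark_before_commit=False the second write is
      deferred as well, so no write completes before its metadata is
      committed.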
      
      Fixes: eec40579 ("dm: add era target")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm era: Recover committed writeset after crash · de89afc1
      Nikos Tsironis authored
      Following a system crash, dm-era fails to recover the committed writeset
      for the current era, leading to lost writes. That is, we lose the
      information about what blocks were written during the affected era.
      
      dm-era assumes that the writeset of the current era is archived when the
      device is suspended. So, when resuming the device, it just moves on to
      the next era, ignoring the committed writeset.
      
      This assumption holds when the device is properly shut down. But, when
      the system crashes, the code that suspends the target never runs, so the
      writeset for the current era is not archived.
      
      There are three issues that cause the committed writeset to get lost:
      
      1. dm-era doesn't load the committed writeset when opening the metadata
      2. The code that resizes the metadata wipes the information about the
         committed writeset (assuming it was loaded at step 1)
      3. era_preresume() starts a new era, without taking into account that
         the current era might not have been archived, due to a system crash.
      
      To fix this:
      
      1. Load the committed writeset when opening the metadata
      2. Fix the code that resizes the metadata to make sure it doesn't wipe
         the loaded writeset
      3. Fix era_preresume() to check for a loaded writeset and archive it,
         before starting a new era.
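
      The third step can be sketched as follows. This is a toy model with
      hypothetical names (preresume(), archive_loaded) and plain Python
      containers in place of the on-disk metadata; it only shows the
      difference between archiving and ignoring a recovered writeset.

```python
def preresume(current_era, archived, committed_writeset, archive_loaded):
    """Start a new era, optionally archiving a writeset recovered on open.

    current_era:         era number found in the metadata
    archived:            dict mapping era -> set of written blocks
    committed_writeset:  blocks committed for current_era (empty if none)
    archive_loaded:      True models the fixed behaviour, False the old one
    """
    archived = dict(archived)
    if archive_loaded and committed_writeset:
        # After a crash the current era was never archived; do it now.
        archived[current_era] = set(committed_writeset)
    return current_era + 1, archived


# After a crash in era 3, blocks 5 and 9 were committed but not archived.
fixed = preresume(3, {}, {5, 9}, archive_loaded=True)
old = preresume(3, {}, {5, 9}, archive_loaded=False)
```

      In this model the fixed path returns era 4 with the writeset of era 3
      archived, while the old path returns era 4 with nothing recorded for
      era 3.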
      
      Fixes: eec40579 ("dm: add era target")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  2. Feb 10, 2021
  3. Feb 09, 2021
  4. Feb 03, 2021
  5. Feb 02, 2021
  6. Jan 29, 2021
  7. Jan 28, 2021