  1. Jun 13, 2013
  2. May 18, 2013
    • rbd: fix cleanup in rbd_add() · 3abef3b3
      Alex Elder authored
      Bjorn Helgaas pointed out that a recent commit introduced a
      use-after-free condition in an error path for rbd_add().
      He correctly stated:
      
          I think b536f69a ("rbd: set up devices only for mapped images")
          introduced a use-after-free error in rbd_add():
      	...
          If rbd_dev_device_setup() returns an error, we call
          rbd_dev_image_release(), which ultimately kfrees rbd_dev.
          Then we call rbd_dev_destroy(), which references fields in
          the already-freed rbd_dev struct before kfreeing it again.
      
      The simple fix is to return the error code after the call to
      rbd_dev_image_release().
      
      Closer examination revealed that there's no need to clean up
      rbd_opts in that function, so fix that too.
      
      Update some other comments that have also become out of date.
      
      Reported-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
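The error-path fix described in this commit can be illustrated with a small userspace sketch. This is not the rbd code: `demo_dev`, `demo_image_release()`, `demo_device_setup()` and `demo_add()` are hypothetical stand-ins, and the only point shown is returning immediately after the call that frees the device.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the rbd device; not the kernel's struct. */
struct demo_dev {
	int id;
};

static int demo_free_count;	/* how many times a device was freed */

/* Releases the image and, as its last step, frees the device itself. */
static void demo_image_release(struct demo_dev *dev)
{
	free(dev);
	demo_free_count++;
}

/* Pretend device setup; fails on request so the error path can be tested. */
static int demo_device_setup(struct demo_dev *dev, int should_fail)
{
	(void)dev;
	return should_fail ? -ENOMEM : 0;
}

/*
 * Returns 0 and stores the device in *out on success.  On a setup failure
 * the device is released exactly once and the error is returned right away,
 * so no later cleanup code can touch the freed memory.
 */
static int demo_add(int should_fail, struct demo_dev **out)
{
	struct demo_dev *dev = calloc(1, sizeof(*dev));
	int rc;

	*out = NULL;
	if (!dev)
		return -ENOMEM;

	rc = demo_device_setup(dev, should_fail);
	if (rc) {
		demo_image_release(dev);	/* frees dev */
		return rc;			/* no fall-through to further cleanup */
	}

	*out = dev;
	return 0;
}
```

A caller would check the return value and use `*out` only when it is 0.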
    • rbd: don't destroy ceph_opts in rbd_add() · 7262cfca
      Alex Elder authored
      
      
      Whether rbd_client_create() successfully creates a new client or
      not, it takes responsibility for getting the ceph_opts structure
      it's passed destroyed.  If successful, the structure becomes
      associated with the created client; if not, rbd_client_create()
      will destroy it.
      
      Previously, rbd_add() would call ceph_destroy_options()
      if rbd_get_client() failed, and that meant it got called twice.
      That led to freeing various pointers more than once, which is never a
      good idea.
      
      This resolves:
          http://tracker.ceph.com/issues/4559
      
      Cc: stable@vger.kernel.org # 3.8+
      Reported-by: Dan van der Ster <dan@vanderster.com>
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
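The ownership rule in this commit (the create function always takes responsibility for the options it is passed) can be sketched in userspace as follows. All names here (`demo_opts`, `demo_client_create()`, `demo_map()`) are hypothetical and only model the convention; they are not the libceph or rbd interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_opts { int flags; };
struct demo_client { struct demo_opts *opts; };

static int demo_opts_destroyed;	/* counts calls to demo_destroy_opts() */

static void demo_destroy_opts(struct demo_opts *opts)
{
	free(opts);
	demo_opts_destroyed++;
}

/*
 * Takes ownership of opts unconditionally: on success the client keeps
 * it, on failure it is destroyed here.
 */
static struct demo_client *demo_client_create(struct demo_opts *opts,
					      int should_fail)
{
	struct demo_client *client = should_fail ? NULL : malloc(sizeof(*client));

	if (!client) {
		demo_destroy_opts(opts);
		return NULL;
	}
	client->opts = opts;
	return client;
}

static void demo_client_destroy(struct demo_client *client)
{
	demo_destroy_opts(client->opts);
	free(client);
}

/* The caller never destroys opts itself once it has handed them over. */
static int demo_map(int should_fail, struct demo_client **out)
{
	struct demo_opts *opts = calloc(1, sizeof(*opts));

	*out = NULL;
	if (!opts)
		return -ENOMEM;
	*out = demo_client_create(opts, should_fail);
	return *out ? 0 : -ENOMEM;	/* note: no demo_destroy_opts(opts) here */
}
```

With this convention the options are destroyed exactly once on every path, which is the property the commit restores.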
    • ceph: ceph_pagelist_append might sleep while atomic · 39be95e9
      Jim Schutt authored
      
      
      Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc()
      while holding a lock, but it's spoiled because ceph_pagelist_addpage()
      always calls kmap(), which might sleep.  Here's the result:
      
      [13439.295457] ceph: mds0 reconnect start
      [13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58
      [13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1
          . . .
      [13439.376225] Call Trace:
      [13439.378757]  [<ffffffff81076f4c>] __might_sleep+0xfc/0x110
      [13439.384353]  [<ffffffffa03f4ce0>] ceph_pagelist_append+0x120/0x1b0 [libceph]
      [13439.391491]  [<ffffffffa0448fe9>] ceph_encode_locks+0x89/0x190 [ceph]
      [13439.398035]  [<ffffffff814ee849>] ? _raw_spin_lock+0x49/0x50
      [13439.403775]  [<ffffffff811cadf5>] ? lock_flocks+0x15/0x20
      [13439.409277]  [<ffffffffa045e2af>] encode_caps_cb+0x41f/0x4a0 [ceph]
      [13439.415622]  [<ffffffff81196748>] ? igrab+0x28/0x70
      [13439.420610]  [<ffffffffa045e9f8>] ? iterate_session_caps+0xe8/0x250 [ceph]
      [13439.427584]  [<ffffffffa045ea25>] iterate_session_caps+0x115/0x250 [ceph]
      [13439.434499]  [<ffffffffa045de90>] ? set_request_path_attr+0x2d0/0x2d0 [ceph]
      [13439.441646]  [<ffffffffa0462888>] send_mds_reconnect+0x238/0x450 [ceph]
      [13439.448363]  [<ffffffffa0464542>] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph]
      [13439.455250]  [<ffffffffa0462e42>] check_new_map+0x352/0x500 [ceph]
      [13439.461534]  [<ffffffffa04631ad>] ceph_mdsc_handle_map+0x1bd/0x260 [ceph]
      [13439.468432]  [<ffffffff814ebc7e>] ? mutex_unlock+0xe/0x10
      [13439.473934]  [<ffffffffa043c612>] extra_mon_dispatch+0x22/0x30 [ceph]
      [13439.480464]  [<ffffffffa03f6c2c>] dispatch+0xbc/0x110 [libceph]
      [13439.486492]  [<ffffffffa03eec3d>] process_message+0x1ad/0x1d0 [libceph]
      [13439.493190]  [<ffffffffa03f1498>] ? read_partial_message+0x3e8/0x520 [libceph]
          . . .
      [13439.587132] ceph: mds0 reconnect success
      [13490.720032] ceph: mds0 caps stale
      [13501.235257] ceph: mds0 recovery completed
      [13501.300419] ceph: mds0 caps renewed
      
      Fix it up by encoding locks into a buffer first, and when the number
      of encoded locks is stable, copy that into a ceph_pagelist.
      
      [elder@inktank.com: abbreviated the stack info a bit.]
      
      Cc: stable@vger.kernel.org # 3.4+
      Signed-off-by: Jim Schutt <jaschut@sandia.gov>
      Reviewed-by: Alex Elder <elder@inktank.com>
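The approach named in this commit (size a buffer, allocate it with the lock dropped, then copy under the lock and retry if the count has grown) can be sketched in userspace. This is a simulation under stated assumptions: the lock is just a flag, `demo_alloc_may_sleep()` stands in for an allocation that may sleep, and the `grow_once` field is a test hook that mimics another thread adding entries. None of these names come from the ceph code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct demo_table {
	int locked;		/* 1 while the simulated spinlock is held */
	size_t count;		/* number of valid entries in items[] */
	uint32_t items[8];
	size_t grow_once;	/* test hook: entries added "concurrently" */
};

static void demo_lock(struct demo_table *t)
{
	assert(!t->locked);
	t->locked = 1;
}

static void demo_unlock(struct demo_table *t)
{
	assert(t->locked);
	t->locked = 0;
	/* Simulate another thread adding entries while the lock is dropped. */
	while (t->grow_once && t->count < 8) {
		t->items[t->count] = (uint32_t)(100 + t->count);
		t->count++;
		t->grow_once--;
	}
}

/* Stand-in for an allocation that may sleep: illegal while "atomic". */
static void *demo_alloc_may_sleep(const struct demo_table *t, size_t n)
{
	assert(!t->locked);
	return calloc(n ? n : 1, sizeof(uint32_t));
}

/*
 * Copies the table into a freshly allocated buffer.  Returns NULL on
 * allocation failure; *retries reports how often the count had grown.
 */
static uint32_t *demo_snapshot(struct demo_table *t, size_t *out_count,
			       int *retries)
{
	uint32_t *buf;
	size_t want;

	*retries = 0;
	for (;;) {
		demo_lock(t);
		want = t->count;		/* size the buffer under the lock */
		demo_unlock(t);

		buf = demo_alloc_may_sleep(t, want);	/* lock is dropped here */
		if (!buf)
			return NULL;

		demo_lock(t);
		if (t->count <= want) {		/* entries still fit: copy them */
			memcpy(buf, t->items, t->count * sizeof(uint32_t));
			*out_count = t->count;
			demo_unlock(t);
			return buf;
		}
		demo_unlock(t);			/* count grew: start over */
		free(buf);
		(*retries)++;
	}
}
```

Once the snapshot is stable, the buffer can be copied into whatever output structure is needed without holding the lock.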
    • ceph: add cpu_to_le32() calls when encoding a reconnect capability · c420276a
      Jim Schutt authored
      
      
      In his review, Alex Elder mentioned that he hadn't checked that
      num_fcntl_locks and num_flock_locks were properly decoded on the
      server side, from a le32 over-the-wire type to a cpu type.
      I checked, and AFAICS it is done; those interested can consult
          Locker::_do_cap_update()
      in src/mds/Locker.cc and src/include/encoding.h in the Ceph server
      code (git://github.com/ceph/ceph).
      
      I also checked the server side for flock_len decoding, and I believe
      that also happens correctly, by virtue of having been declared
      __le32 in struct ceph_mds_cap_reconnect, in src/include/ceph_fs.h.
      
      Cc: stable@vger.kernel.org # 3.4+
      Signed-off-by: Jim Schutt <jaschut@sandia.gov>
      Reviewed-by: Alex Elder <elder@inktank.com>
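As a rough userspace illustration of what a CPU-to-little-endian conversion achieves for an over-the-wire 32-bit field, the sketch below encodes two counts byte by byte. The helper names are hypothetical; the kernel uses cpu_to_le32() rather than hand-written shifts, and the 8-byte layout here is an assumption made only for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Store v at p in little-endian order, whatever the host byte order is. */
static void demo_put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read a little-endian 32-bit value back into host order. */
static uint32_t demo_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/* Encode the two lock counts back to back (layout assumed for the demo). */
static void demo_encode_lock_counts(uint8_t out[8], uint32_t num_fcntl_locks,
				    uint32_t num_flock_locks)
{
	demo_put_le32(out, num_fcntl_locks);
	demo_put_le32(out + 4, num_flock_locks);
}
```

Because both sides agree on little-endian wire order, a big-endian client and a little-endian server decode the same counts.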
    • libceph: must hold mutex for reset_changed_osds() · 14d2f38d
      Alex Elder authored
      
      
      An osd client has a red-black tree describing its osds, and
      occasionally we would get crashes due to one of these trees
      becoming corrupt somehow.
      
      The problem turned out to be that reset_changed_osds() was being
      called without protection of the osd client request mutex.  That
      function would call __reset_osd() for any osd that had changed, and
      __reset_osd() would call __remove_osd() for any osd with no
      outstanding requests, and finally __remove_osd() would remove the
      corresponding entry from the red-black tree.  Thus, the tree was
      getting modified without having any lock protection, and was
      vulnerable to problems due to concurrent updates.
      
      This appears to be the only osd tree updating path that has this
      problem.  It can be fairly easily fixed by moving the call up
      a few lines, to just before the request mutex gets dropped
      in kick_requests().
      
      This resolves:
          http://tracker.ceph.com/issues/5043
      
      Cc: stable@vger.kernel.org # 3.4+
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
  3. May 14, 2013
    • rbd: re-submit flattened write request (part 2) · 638f5abe
      Alex Elder authored
      
      
      Add code to rbd_img_obj_exists_callback() to detect when a clone's
      parent image has disappeared, and re-submit the original write
      request in that case.
      
      Kill off some redundant assertions.
      
      This completes the resolution for:
          http://tracker.ceph.com/issues/3763
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: re-submit write request for flattened clone · bbea1c1a
      Alex Elder authored
      
      
      Add code to rbd_img_parent_read_full_callback() to detect when a
      clone's parent image has disappeared, and re-submit the original
      write request in that case.  (See the previous commit for more
      reasoning about why this is appropriate.)
      
      Rename some variables in rbd_img_obj_parent_read_full_callback()
      to match the convention used in the previous patch.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: re-submit read request for flattened clone · 02c74fba
      Alex Elder authored
      
      
      If a clone image gets flattened while a parent read request is
      underway, the original rbd object request needs to be resubmitted.
      
      The reason is that by the time we get the response to the parent
      read request, the data read from the parent may be out of date.
      In other words, we could see this sequence of events:
      
          rbd client                      parent image/osd
          ----------                      ----------------
          original object ENOENT;
              issue parent read
                                          respond to parent read
                                          child image flattened
          original image header refresh
                   <--- original object written independently here
          parent read response received
      
      Add code to rbd_img_parent_read_callback() to detect when a clone's
      parent image has disappeared (as evidenced by its parent overlap
      becoming 0), and re-submit the original read request in that case.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
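The decision made in the parent read callback can be reduced to a few lines. The sketch below is a simplified model with hypothetical names (`demo_image`, `demo_parent_read_done()`); it only shows the check on the parent overlap, not how the real driver resubmits a request.

```c
#include <assert.h>
#include <stdint.h>

enum demo_action { DEMO_COMPLETE, DEMO_RESUBMIT };

/* Hypothetical image state; an overlap of 0 means "no usable parent". */
struct demo_image {
	uint64_t parent_overlap;
};

/*
 * Called when the parent read finishes.  If the clone was flattened in
 * the meantime (overlap dropped to 0), the parent data may be stale, so
 * the original read is issued again instead of being completed.
 */
static enum demo_action demo_parent_read_done(const struct demo_image *img,
					      int parent_result, int *result)
{
	if (img->parent_overlap == 0)
		return DEMO_RESUBMIT;	/* *result is left untouched */

	*result = parent_result;
	return DEMO_COMPLETE;
}
```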
    • rbd: detect when clone image is flattened · 392a9dad
      Alex Elder authored
      
      
      A format 2 clone image can be the subject of a "flatten" operation,
      during which all of its data gets "copied up" from its parent image,
      leaving the image fully populated.  Once this is complete, the
      clone's association with the parent is abolished.
      
      Since this can occur when a clone is mapped, we need to detect when
      it has occurred and handle it accordingly.  We know an image has
      been flattened when we know it at one time had a parent, but we have
      learned (via a "get_parent" object class method call) it no longer
      has one.
      
      There might be in-flight requests at the point we learn an image has
      been flattened, so we can't simply clean up parent data structures
      right away.  Instead, we'll drop the initial parent reference when
      the parent has disappeared (rather than when the image gets
      destroyed), which will allow the last in-flight reference to clean
      things up when it's complete.
      
      We leverage the fact that a zero parent overlap renders an image
      effectively unlayered.  We set the overlap to 0 at the point we
      detect the clone image has flattened, which allows the unlayered
      behavior to take effect immediately, while keeping other parent
      structures in place until in-flight requests complete.
      
      This and the next few patches resolve:
          http://tracker.ceph.com/issues/3763
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: reference count parent requests · a2acd00e
      Alex Elder authored
      
      
      Keep a reference count for uses of the parent information for an rbd
      device.
      
      An initial reference is set in rbd_img_request_create() if the
      target image has a parent (with non-zero overlap).  Each image
      request for an image with a non-zero parent overlap gets another
      reference when it's created, and that reference is dropped when the
      request is destroyed.
      
      The initial reference is dropped when the image gets torn down.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
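The reference-counting scheme described here (one initial reference, one per request, cleanup on the last put) might look roughly like the userspace sketch below. It uses C11 atomics in place of the kernel's atomic_t, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct demo_parent { int id; };

struct demo_dev {
	atomic_int parent_ref;		/* 0 means "no usable parent" */
	struct demo_parent *parent;
	int teardown_count;		/* how often parent cleanup ran */
};

/* Attach a parent and take the initial reference. */
static void demo_parent_init(struct demo_dev *dev, struct demo_parent *parent)
{
	dev->parent = parent;
	dev->teardown_count = 0;
	atomic_init(&dev->parent_ref, 1);
}

/* Take a reference for a new request.  Returns 0 if the parent is gone. */
static int demo_parent_get(struct demo_dev *dev)
{
	int old = atomic_load(&dev->parent_ref);

	while (old > 0) {
		/* On failure, old is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&dev->parent_ref, &old, old + 1))
			return 1;
	}
	return 0;
}

/* Drop a reference; whoever drops the last one cleans up the parent. */
static void demo_parent_put(struct demo_dev *dev)
{
	if (atomic_fetch_sub(&dev->parent_ref, 1) == 1) {
		free(dev->parent);
		dev->parent = NULL;
		dev->teardown_count++;
	}
}
```

Dropping the initial reference while a request is still in flight leaves the parent in place until that request finishes, which is the behavior the surrounding commits rely on.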
    • rbd: define parent image request routines · e93f3152
      Alex Elder authored
      
      
      Define rbd_parent_request_create() and rbd_parent_request_destroy()
      to handle the creation of parent image requests submitted for
      layered image objects.  For simplicity, let rbd_img_request_put()
      handle dropping the reference to any image request (parent or not),
      and call whichever destructor is appropriate on the last put.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: define rbd_dev_unparent() · fb65d228
      Alex Elder authored
      
      
      Define rbd_dev_unparent() to encapsulate cleaning up parent data
      structures from a layered rbd image.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: don't release write request until necessary · 8785b1d4
      Alex Elder authored
      
      
      Previously when a layered write was going to involve a copyup
      request, the original osd request was released before submitting the
      parent full-object read.  The osd request for the copyup would then
      be allocated in rbd_img_obj_parent_read_full_callback().
      
      Shortly we will be handling the event of mapped layered images
      getting flattened, and when that occurs we need to resubmit the
      original request.  We therefore don't want to release the osd
      request until we really know we're going to replace it--in the
      callback function.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: get parent info on refresh · 642a2537
      Alex Elder authored
      
      
      Get parent info for format 2 images on every refresh (rather than
      just during the initial probe).  This will be needed to detect the
      disappearance of the parent image in the event a mapped image
      becomes unlayered (i.e., flattened).  Avoid leaking the previous
      parent spec on the second and subsequent times this information is
      requested by dropping the previous one (if any) before updating it.
      (Also, extract the pool id into a local variable before assigning
      it into the parent spec.)
      
      Switch to using a non-zero parent overlap value rather than the
      existence of a parent (a non-null parent_spec pointer) to determine
      whether to mark a request layered.  It will soon be possible for
      a layered image to become unlayered while a request is in flight.
      
      This means that the layered flag for an image request indicates that
      there was a non-zero parent overlap at the time the image request
      was created.  The parent overlap can change thereafter, which may
      lead to special handling at request submission or completion time.
      
      This and the next several patches are related to:
          http://tracker.ceph.com/issues/3763
      
      NOTE:
      If an error occurs while refreshing the parent info (i.e.,
      requesting it after initial probe), the old parent info will
      persist.  This is not really correct, and is a scenario that needs
      to be addressed.  For now we'll assert that the failure mode is
      unlikely, but the issue has been documented in tracker issue 5040.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: ignore zero-overlap parent · 70cf49cf
      Alex Elder authored
      
      
      An rbd clone image that has an overlap with its parent of 0 is
      effectively not a layered image at all.  Detect this case and treat
      such an image as non-layered.  Issue a warning to be sure the user
      knows what's going on.
      
      This resolves:
          http://tracker.ceph.com/issues/5028
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: support reading parent page data for writes · b91f09f1
      Alex Elder authored
      Currently, rbd_img_obj_parent_read_full() assumes the incoming
      object request contains bio data.  But if a layered image is part of
      a multi-layer stack of images it will result in read requests of
      page data to parent images.
      
      This is handling the same kind of issue as was resolved by this
      commit:
          5b2ab72d ("rbd: support reading parent page data")
      
      This resolves:
          http://tracker.ceph.com/issues/5027
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: fix parent request size assumption · ebda6408
      Alex Elder authored
      
      
      The code that reads object data from the parent for a copyup on
      write request currently assumes that the size of that request is the
      size of a "full" object from the original target image.
      
      That is not necessarily the case.  The parent overlap could reduce
      the request size below that.  To fix that assumption we need to
      record the number of pages in the copyup_pages array, for both an
      image request and an object request.  Rename a local variable in
      rbd_img_obj_parent_read_full_callback() to reflect we're recording
      the length of the parent read request, not the size of the target
      object.
      
      This resolves:
          http://tracker.ceph.com/issues/5038
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
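The size calculation behind this fix can be sketched as plain arithmetic: the parent read covers a full object unless the parent overlap ends inside it. The functions below are an illustration under assumptions (a 4096-byte page, hypothetical names, offsets measured from the start of the image), not the driver's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/*
 * Length of the parent read for the object starting at image offset
 * obj_start.  The caller guarantees obj_start < parent_overlap.
 */
static uint64_t demo_parent_read_length(uint64_t obj_start,
					unsigned int obj_order,
					uint64_t parent_overlap)
{
	uint64_t length = (uint64_t)1 << obj_order;	/* full object size */

	assert(obj_start < parent_overlap);
	if (length > parent_overlap - obj_start)
		length = parent_overlap - obj_start;	/* clipped by overlap */
	return length;
}

/* Number of pages needed to hold length bytes, rounding up. */
static uint32_t demo_page_count(uint64_t length)
{
	return (uint32_t)((length + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE);
}
```

For 4 MiB objects (order 22), an object that lies fully inside the overlap needs 1024 pages, while one that the overlap cuts short needs fewer, which is why the page count has to be recorded rather than assumed.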
    • libceph: init sent and completed when starting · c10ebbf5
      Alex Elder authored
      
      
      The rbd code has a need to be able to restart an osd request that
      has already been started and completed once before.  This currently
      wouldn't work right because the osd client code assumes an osd
      request will be started exactly once.  Certain fields in a request
      are never cleared and this leads to trouble if you try to reuse it.
      
      Specifically, the r_sent, r_got_reply, and r_completed fields are
      never cleared.  The r_sent field records the osd incarnation at the
      time the request was sent to that osd.  If that's non-zero, the
      message won't get re-mapped to a target osd properly, and won't be
      put on the unsafe requests list the first time it's sent as it
      should.  The r_got_reply field is used in handle_reply() to ensure
      the reply to a request is processed only once.  And the r_completed
      field is used for lingering requests to avoid calling the callback
      function every time the osd client re-sends the request on behalf of
      its initiator.
      
      Each osd request passes through ceph_osdc_start_request() when
      responsibility for the request is handed over to the osd client for
      completion.  We can safely zero these three fields there each time a
      request gets started.
      
      One last related change--clear the r_linger flag when a request
      is no longer registered as a linger request.
      
      This resolves:
          http://tracker.ceph.com/issues/5026
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
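A minimal model of the restart problem: if per-submission state is not reset when a request is started, the second reply looks like a duplicate. The struct below borrows the three field names from the commit message, but the functions and their behavior are a hypothetical simplification, not the osd client code.

```c
#include <assert.h>
#include <stdint.h>

struct demo_osd_request {
	uint32_t r_sent;	/* osd incarnation at send time; 0 = not sent */
	int r_got_reply;	/* reply already processed */
	int r_completed;	/* completion callback already run */
};

/* Every (re)start passes through here, so per-run state is reset here. */
static void demo_start_request(struct demo_osd_request *req)
{
	req->r_sent = 0;
	req->r_got_reply = 0;
	req->r_completed = 0;
}

static void demo_send(struct demo_osd_request *req, uint32_t incarnation)
{
	req->r_sent = incarnation;
}

/* Returns 1 if the reply was processed, 0 if it was a duplicate. */
static int demo_handle_reply(struct demo_osd_request *req)
{
	if (req->r_got_reply)
		return 0;
	req->r_got_reply = 1;
	req->r_completed = 1;
	return 1;
}
```

Without the reset in `demo_start_request()`, a reused request would have its second reply dropped as a duplicate.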
  4. May 09, 2013
    • rbd: kill rbd_img_request_get() · c48f3f86
      Alex Elder authored
      
      
      Get rid of rbd_img_request_get(), because it isn't used, and maybe
      won't ever be needed.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: only set up watch for mapped images · 1f3ef788
      Alex Elder authored
      
      
      Any changes to parent images are immaterial to any mapped clone.
      So there is no need to have a watch event registered on header
      objects except for the header object of an image that is mapped.
      In fact, a watch request is a write operation, and we may only
      have read access to a parent image.
      
      We can't set up the watch request until we know the name of the
      header object though.  So pass a flag to rbd_dev_image_probe() to
      indicate whether this probe is for a mapping or for a parent image.
      
      Change the second parameter to rbd_dev_header_watch_sync() to be
      Boolean while we're at it.
      
      This resolves:
          http://tracker.ceph.com/issues/4941
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: set mapping read-only flag in rbd_add() · 7ce4eef7
      Alex Elder authored
      
      
      The rbd_dev->mapping field for a parent image is not meaningful.
      Since rbd_dev_image_probe() is used both for images being mapped and
      their parents, it doesn't make sense to set that flag in that
      function.
      
      So move the setting of the mapping.read_only flag out of
      rbd_dev_image_probe() and into rbd_add() instead.
      
      This resolves:
          http://tracker.ceph.com/issues/4940
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: support reading parent page data · 5b2ab72d
      Alex Elder authored
      
      
      Currently, rbd_img_parent_read() assumes the incoming object request
      contains bio data.  But if a layered image is part of a multi-layer
      stack of images it will result in read requests of page data to parent
      images.
      
      Fortunately, it's not hard to add support for page data.
      
      This resolves:
          http://tracker.ceph.com/issues/4939
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: fix an incorrect assertion condition · 91c6febb
      Alex Elder authored
      
      
      In rbd_img_obj_parent_read_full_callback() there is an assertion
      intended to verify the size of the image request for a full parent
      read was the size of the original request's target object.  But the
      assertion was looking at the parent image order rather than the
      original one, and these values can differ.
      
      Fix that.
      
      This resolves:
          http://tracker.ceph.com/issues/4938
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: define rbd_dev_v2_header_info() · 2df3fac7
      Alex Elder authored
      
      
      This rearranges rbd_dev_v2_refresh() so it works more like
      rbd_dev_v1_header_info().  While format 1 images need to read the
      whole header object to get any information, format 2 can collect
      almost all information selectively.  So the one-time initialization
      will remain in a separate function--based on rbd_dev_v2_probe().
      
      Rename rbd_dev_v2_refresh() to be rbd_dev_v2_header_info(), and have
      it call rbd_dev_v2_header_onetime() if it's being called for the
      first time for the given rbd device.
      
      Rename rbd_dev_v2_probe() to be rbd_dev_v2_header_onetime() and
      remove the image size and snapshot context calls it held in
      common with the refresh function.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: get rid of trivial v1 header wrappers · 99a41ebc
      Alex Elder authored
      
      
      Get rid of the trivial wrapper functions rbd_dev_v1_refresh() and
      rbd_dev_v1_probe(), substituting rbd_dev_v1_header_read() calls
      in their place.
      
      Rename rbd_dev_v1_header_read() to be rbd_dev_v1_header_info(), to
      be more generic (it will better reflect what happens with format 2
      images).
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: simplify rbd_dev_v1_probe() · 30d60ba2
      Alex Elder authored
      
      
      An rbd_dev structure's fields are all zero-filled for an initial
      probe, so there's no need to explicitly zero the parent_spec
      and parent_overlap fields in rbd_dev_v1_probe().  Removing these
      assignments makes rbd_dev_v1_probe() *almost* trivial.
      
      Move the dout() message that announces discovery of an image into
      rbd_dev_image_probe(), generalize to support images in either format
      and only show it if an image is fully discovered.
      
      This highlights that there are some unnecessary cleanups in the error
      path for rbd_dev_v1_probe(), so they can be removed.
      
      Now rbd_dev_v1_probe() *is* a trivial wrapper function.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: update in-core header directly · 662518b1
      Alex Elder authored
      
      
      Now that rbd_header_from_disk() only fills in one-time fields once,
      we can extend it slightly so it releases the other fields before
      replacing their values.  This way there's no need to pass a
      temporary buffer and then copy all the results in.  Just use the rbd
      device header structure in rbd_header_from_disk() so its values get
      updated directly.
      
      Note that this means we need to take the header semaphore at the
      point we update things.  So pass the rbd_dev rather than the address
      of its header as its first argument to rbd_header_from_disk(), and
      have it return an error code.
      
      As a result, rbd_dev_v1_header_read() does all the work,
      rbd_read_header() becomes unnecessary, and rbd_dev_v1_refresh()
      becomes a very simple wrapper.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: refactor rbd_header_from_disk() · bb23e37a
      Alex Elder authored
      
      
      This rearranges rbd_header_from_disk so that it:
          - allocates the snapshot context right away
          - keeps results in local variables, not changing the passed-in
            header until it's known we'll succeed
          - does initialization of set-once fields in a header only if
            they have not already been set
      
      The last point is moot at the moment, because rbd_read_header()
      (the only caller) always supplies a zero-filled header buffer.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
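The three points listed in this commit (do the fallible work first, keep results in locals until success is certain, initialize set-once fields only when unset) form a common pattern. A hedged userspace sketch, with a made-up header layout, a made-up `DEMO_MAX_SNAPS` limit and hypothetical function names, might read:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_SNAPS 16u	/* arbitrary limit, for the sketch only */

struct demo_header {
	char *object_prefix;	/* set once, on the first successful read */
	uint64_t image_size;	/* may change on every refresh */
	uint32_t snap_count;	/* may change on every refresh */
};

/*
 * Fills in header from freshly read values.  Returns 0 on success; on
 * any failure the passed-in header is left exactly as it was.
 */
static int demo_header_from_disk(struct demo_header *header,
				 const char *prefix, uint64_t image_size,
				 uint32_t snap_count)
{
	int first_time = header->object_prefix == NULL;
	char *object_prefix = NULL;

	if (snap_count > DEMO_MAX_SNAPS)
		return -EIO;			/* validate before allocating */

	if (first_time) {
		object_prefix = malloc(strlen(prefix) + 1);
		if (!object_prefix)
			return -ENOMEM;
		strcpy(object_prefix, prefix);
	}

	/* Nothing can fail past this point, so commit the results. */
	if (first_time)
		header->object_prefix = object_prefix;
	header->image_size = image_size;
	header->snap_count = snap_count;
	return 0;
}
```

A second call with a different prefix leaves `object_prefix` alone and updates only the refreshable fields.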
    • rbd: zero format 1 header structure earlier · 46578dcd
      Alex Elder authored
      
      
      The passed-in header structure is zeroed in rbd_header_from_disk().
      Instead, have the caller do it.  Note that there are two callers,
      rbd_dev_v1_refresh() and rbd_dev_v1_probe().  The latter already has
      a zeroed header structure so zeroing it isn't necessary there.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: set the mapping size and features later · f35a4dee
      Alex Elder authored
      
      
      Defer setting the size and features fields of a mapped image until
      after the Linux disk structure is set up.  Set the capacity of the
      disk after that.
      
      Rearrange the definition of rbd_image_header, separating the fields
      that are set only once from those that can be updated.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
  5. May 08, 2013
    • rbd: always set read-only flag in rbd_add() · 51344a38
      Alex Elder authored
      
      
      Hold off setting the read-only flag in rbd_add() for an image being
      mapped until we have successfully probed the image.  At that point
      we know whether it's a snapshot mapping or not, so we can set the
      read-only flag in that one place rather than doing so (for
      snapshots) in rbd_dev_mapping_set().  To do this, pass a flag to the
      image probe routine indicating whether we want a read-only mapping.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: kill rbd_dev_clear_mapping() · 6d80b130
      Alex Elder authored
      
      
      This function is a duplicate of rbd_dev_mapping_clear(), and was
      added by mistake.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: don't look up snapshot id in rbd_dev_mapping_set() · 8f4b7d98
      Alex Elder authored
      
      
      Currently rbd_dev_mapping_set() looks up the snapshot id for the
      snapshot whose name is found in the rbd device's spec structure.
      
      That function gets called by rbd_dev_device_setup(), which is
      called by rbd_add() *after* rbd_dev_image_probe().  If the
      image probe succeeds, the rbd device's spec will already have
      been updated to include names and ids for all fields.
      
      Therefore there's no need to look up the snapshot id in
      rbd_dev_mapping_set().
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
    • rbd: don't print warning if not mapping a parent · c734b796
      Alex Elder authored
      
      
      The presence of the LAYERING bit in an rbd image's feature mask does
      not guarantee the image actually has a parent image.  Currently that
      bit is set only when a clone (i.e., image with a parent) is created,
      but it is (currently) not cleared if that clone gets flattened back
      into a "normal" image.  A "parent_id" query will leave the
      parent_spec for the image being mapped a null pointer, but will not
      return an error.
      
      Currently, whenever an image with the LAYERING feature gets mapped, a
      warning about the use of layered images gets printed.  But we don't
      want to do this for a flattened image, so print the warning only
      if we find there is a parent spec after the probe.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      c734b796
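
The check described in "rbd: don't print warning if not mapping a parent" can be sketched roughly as follows. This is not the driver's code: the `sketch_` types, the function name, and the value of the feature bit are all hypothetical. It only illustrates that the decision keys off the presence of a parent spec after the probe, not off the feature bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit value, for illustration only. */
#define SKETCH_FEATURE_LAYERING (1ULL << 0)

struct sketch_spec {
    const char *image_name;
};

struct sketch_rbd_dev {
    uint64_t features;               /* feature mask reported for the image */
    struct sketch_spec *parent_spec; /* NULL when the image has no parent */
};

/*
 * A flattened clone may still carry the layering bit, so the bit alone
 * is not evidence of a parent.  Warn only when the probe actually left
 * a parent spec behind.
 */
bool sketch_should_warn_layered(const struct sketch_rbd_dev *dev)
{
    assert(dev != NULL);
    return dev->parent_spec != NULL;
}
```

With this shape, a flattened image (feature bit set, `parent_spec` null) produces no warning, while a real clone does.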
    • Alex Elder's avatar
      rbd: kill rbd_update_mapping_size() · 29334ba4
      Alex Elder authored
      
      
      Since rbd_update_mapping_size() is now a trivial wrapper, just open
      code it in its two callers.
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      29334ba4
    • Alex Elder's avatar
      rbd: update capacity in rbd_dev_refresh() · 00a653e2
      Alex Elder authored
      
      
      When a mapped image changes size, we change the capacity recorded
      for the Linux disk associated with it, in rbd_update_mapping_size().
      That function is called in two places--the format 1 and format 2
      refresh routines.
      
      There is no need to set the capacity while holding the header
      semaphore.  Instead, do it in the common rbd_dev_refresh(), using
      the logic that's already there to initiate disk revalidation.
      
      Add handling in the request function, just in case a request
      that exceeds the capacity of the device comes in (perhaps one
      that was started before a refresh shrunk the device).
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      00a653e2
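
The bounds handling mentioned in "rbd: update capacity in rbd_dev_refresh()" might look something like the sketch below. The struct, the function name, and the choice of `-EIO` as the error code are assumptions made for illustration; the commit message does not say which error is returned. The comparison is written to avoid overflow in `offset + length`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the mapped device's current size. */
struct sketch_mapping {
    uint64_t size; /* capacity of the mapping, in bytes */
};

/*
 * Reject a request whose byte range extends past the mapping, for
 * example one queued before a refresh shrank the device.  Returns 0
 * when the range fits and -EIO (an assumed error code) otherwise.
 */
int sketch_check_request(const struct sketch_mapping *m,
                         uint64_t offset, uint64_t length)
{
    assert(m != NULL);
    if (offset > m->size || length > m->size - offset)
        return -EIO;
    return 0;
}
```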
    • Alex Elder's avatar
      rbd: revalidate only for mapping size changes · e627db08
      Alex Elder authored
      This commit:
          d98df63e rbd: revalidate_disk upon rbd resize
      instituted a call to revalidate_disk() to notify interested parties
      that a mapped image has changed size.  This works well, as long as
      the rbd device doesn't map a snapshot.
      
      A snapshot will never change size.  However, the base image the
      snapshot is associated with can, and it can do so while the snapshot
      is mapped.
      
      The problem is that the test for the size is looking at the size of
      the base image, not the size of the mapped snapshot.  This patch
      corrects that.
      
      Update the warning message shown in the event of error, and move
      it into the callers.
      
      This resolves:
          http://tracker.ceph.com/issues/4911
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      e627db08
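
To illustrate the distinction made in "rbd: revalidate only for mapping size changes", here is a minimal sketch with hypothetical names (`sketch_dev`, `sketch_refresh`). It assumes a device tracks the base image size and the mapped size separately, and reports a change only when the mapped size differs after a refresh.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_dev {
    uint64_t header_size;  /* size of the base image */
    uint64_t mapping_size; /* size of what is actually mapped */
    bool maps_snapshot;    /* true when a snapshot is mapped */
};

/*
 * Record the new base image size.  A mapped snapshot never changes
 * size, so only a mapping of the base image follows the new value.
 * Returns true when the mapped size changed, which is the only case
 * in which the disk would need revalidating.
 */
bool sketch_refresh(struct sketch_dev *dev, uint64_t new_base_size)
{
    uint64_t old_mapping_size;

    assert(dev != NULL);
    old_mapping_size = dev->mapping_size;
    dev->header_size = new_base_size;
    if (!dev->maps_snapshot)
        dev->mapping_size = new_base_size;
    return dev->mapping_size != old_mapping_size;
}
```

Comparing the old and new base image sizes instead would report a change for a mapped snapshot whenever its base image is resized, which is the behavior the commit describes correcting.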
    • Alex Elder's avatar
      rbd: fix leak of format 2 snapshot context · 49ece554
      Alex Elder authored
      
      
      When rbd_dev_v2_refresh() is called, the rbd device already has a
      snapshot context associated with it.  But that never gets freed,
      the pointer just gets overwritten.
      
      Fix this by dropping the rbd device's reference to the snapshot
      context before overwriting the pointer.
      
      Because ceph_put_snap_context() already handles a null pointer,
      we don't need to check for that (for the probe case, where no
      context has yet been assigned).
      
      This resolves:
          http://tracker.ceph.com/issues/4912
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      49ece554
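
The pattern behind "rbd: fix leak of format 2 snapshot context" is to drop the old reference before overwriting the pointer. The sketch below uses a toy reference-counted type; every `sketch_` name and the live-object counter are hypothetical and exist only to make the leak observable. It is not the ceph snapshot context API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_snapc {
    int refs; /* reference count */
};

/* Counts live contexts so a leak would be visible in a test. */
int sketch_live_contexts = 0;

/* Allocate a context holding one reference; returns NULL on failure. */
struct sketch_snapc *sketch_snapc_create(void)
{
    struct sketch_snapc *sc = malloc(sizeof(*sc));

    if (sc == NULL)
        return NULL;
    sc->refs = 1;
    sketch_live_contexts++;
    return sc;
}

/* Drop one reference.  A NULL pointer is accepted and ignored. */
void sketch_snapc_put(struct sketch_snapc *sc)
{
    if (sc == NULL)
        return;
    assert(sc->refs > 0);
    if (--sc->refs == 0) {
        free(sc);
        sketch_live_contexts--;
    }
}

struct sketch_dev {
    struct sketch_snapc *snapc; /* NULL until the first probe */
};

/*
 * Install a freshly fetched context.  The old reference is dropped
 * first; because put tolerates NULL, the initial probe needs no
 * special case.
 */
void sketch_refresh_snapc(struct sketch_dev *dev, struct sketch_snapc *fresh)
{
    assert(dev != NULL);
    sketch_snapc_put(dev->snapc);
    dev->snapc = fresh;
}
```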
  6. May 03, 2013
    • Alex Elder's avatar
      rbd: fix image request leak on parent read · b5b09be3
      Alex Elder authored
      
      
      When a read for a layered image object finds the target object
      doesn't exist, a read image request for the parent image is created
      and submitted.  When that completes, the callback routine was
      not releasing that parent image request.  Fix that.
      
      The slab allocation stuff just added has greatly simplified the
      search for the source of this memory leak.
      
      This resolves:
          http://tracker.ceph.com/issues/4803
      
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
      b5b09be3