  1. Sep 27, 2018
    • block, scsi: Change the preempt-only flag into a counter · cd84a62e
      Bart Van Assche authored
      
      
      The RQF_PREEMPT flag is used for three purposes:
      - In the SCSI core, for making sure that power management requests
        are executed even if a device is in the "quiesced" state.
      - For domain validation by SCSI drivers that use the parallel port.
      - In the IDE driver, for IDE preempt requests.
      Rename "preempt-only" to "pm-only" because the primary purpose of
      this mode is power management. The power management core may, but
      does not have to, resume a runtime-suspended device before
      performing a system-wide suspend, and a later patch will set
      "pm-only" mode for as long as a block device is runtime suspended.
      Hence make it possible to set "pm-only" mode from more than one
      context. Since with this change scsi_device_quiesce() is no longer
      idempotent, make that function return early if it is called for a
      quiesced queue.
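
      The counter semantics described above can be sketched in userspace
      C. This is only an illustration under stated assumptions, not the
      code of this commit: struct squeue, struct sdev and all function
      names below are hypothetical stand-ins, and C11 atomics replace
      whatever primitive the kernel actually uses.

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <stdbool.h>

      /* Hypothetical stand-in for a request queue (not the kernel struct). */
      struct squeue {
          atomic_int pm_only;   /* > 0: only power management requests may enter */
      };

      /* Each context that needs "pm-only" mode takes one reference. */
      static void squeue_set_pm_only(struct squeue *q)
      {
          atomic_fetch_add(&q->pm_only, 1);
      }

      /* Drop one reference; the mode ends when the last holder clears it. */
      static void squeue_clear_pm_only(struct squeue *q)
      {
          int old = atomic_fetch_sub(&q->pm_only, 1);

          assert(old > 0);      /* an unbalanced clear would be a bug */
      }

      static bool squeue_pm_only(struct squeue *q)
      {
          return atomic_load(&q->pm_only) > 0;
      }

      /* Hypothetical device whose state lets quiesce return early. */
      enum sdev_state { SDEV_RUNNING, SDEV_QUIESCED };

      struct sdev {
          struct squeue q;
          enum sdev_state state;
      };

      static void sdev_init(struct sdev *d)
      {
          atomic_init(&d->q.pm_only, 0);
          d->state = SDEV_RUNNING;
      }

      /* With a counter, quiescing twice must not count twice. */
      static void sdev_quiesce(struct sdev *d)
      {
          if (d->state == SDEV_QUIESCED)
              return;           /* already quiesced: keep the counter balanced */
          squeue_set_pm_only(&d->q);
          d->state = SDEV_QUIESCED;
      }

      static void sdev_resume(struct sdev *d)
      {
          if (d->state != SDEV_QUIESCED)
              return;
          d->state = SDEV_RUNNING;
          squeue_clear_pm_only(&d->q);
      }
      ```

      With a plain flag, the first context to clear the mode would end
      it for every other context as well. With the counter, the queue
      stays in "pm-only" mode until every context that set the mode has
      cleared it again.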
      
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Move power management code into a new source file · bca6b067
      Bart Van Assche authored
      
      
      Move the code for runtime power management from blk-core.c into the
      new source file blk-pm.c. Move the corresponding declarations from
      <linux/blkdev.h> into <linux/blk-pm.h>. For CONFIG_PM=n, leave out
      the declarations of the functions that are not used in that mode.
      This patch not only reduces the number of #ifdefs in the block layer
      core code but also reduces the size of header file <linux/blkdev.h>
      and hence should help to reduce the build time of the Linux kernel
      if CONFIG_PM is not defined.
      
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. Sep 26, 2018
  3. Sep 25, 2018
  4. Sep 22, 2018
  5. Sep 21, 2018
  6. Sep 20, 2018
  7. Sep 15, 2018
    • block, bfq: do not plug I/O if all queues are weight-raised · c8765de0
      Paolo Valente authored
      
      
      To reduce latency for interactive and soft real-time applications, bfq
      privileges the bfq_queues containing the I/O of these
      applications. These privileged queues, referred to as weight-raised
      queues, get a much higher share of the device throughput than
      non-privileged queues. To preserve this higher share, the I/O
      of any non-weight-raised queue must be plugged whenever a sync
      weight-raised queue, while being served, remains temporarily empty. To
      attain this goal, bfq simply plugs any I/O (from any queue), if a sync
      weight-raised queue remains empty while in service.
      
      Unfortunately, this plugging typically lowers throughput with random
      I/O, on devices with internal queueing (because it reduces the filling
      level of the internal queues of the device).
      
      This commit addresses this issue by restricting the cases where
      plugging is performed: if a sync weight-raised queue remains empty
      while in service, then I/O plugging is performed only if some of the
      active bfq_queues are *not* weight-raised (which is actually the only
      circumstance where plugging is needed to preserve the higher share of
      the throughput of weight-raised queues). This restriction proved able
      to boost throughput in many use cases where only maximum throughput
      matters.
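
      The restricted plugging condition can be sketched as a small
      predicate. This is a minimal illustration, not the actual bfq
      code: struct sched_counts, its fields and the function name are
      hypothetical, and the sketch covers only the weight-raising reason
      for plugging, ignoring any other reason bfq may have to plug I/O.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical per-device counters (not the kernel's data structure). */
      struct sched_counts {
          int busy_queues;      /* number of active queues */
          int wr_busy_queues;   /* how many of them are weight-raised */
      };

      /*
       * Decide whether I/O must be plugged to protect the share of
       * weight-raised queues. Plugging is needed only if a sync
       * weight-raised queue is in service, it is currently empty, and at
       * least one active queue is NOT weight-raised.
       */
      static bool must_plug_for_wr(bool in_service_is_sync_wr,
                                   bool in_service_is_empty,
                                   const struct sched_counts *c)
      {
          assert(c->wr_busy_queues <= c->busy_queues);

          if (!in_service_is_sync_wr || !in_service_is_empty)
              return false;

          /* All active queues weight-raised: nothing to protect against. */
          return c->wr_busy_queues < c->busy_queues;
      }
      ```

      Before the restriction, the predicate would have returned true
      whenever its first two arguments were true; the last comparison is
      the only part that models this commit.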
      
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: inject other-queue I/O into seeky idle queues on NCQ flash · d0edc247
      Paolo Valente authored
      
      
      The Achilles' heel of BFQ is its failure to reach a high throughput
      with sync random I/O on flash storage with internal queueing when
      the processes doing I/O have differentiated weights.
      
      The cause of this failure is as follows. If at least two processes do
      sync I/O, and have a different weight from each other, then BFQ plugs
      I/O dispatching every time one of these processes, while it is being
      served, remains temporarily without pending I/O requests. This
      plugging is necessary to guarantee that every process enjoys a
      bandwidth proportional to its weight; but it empties the internal
      queue(s) of the drive. And this kills throughput with random I/O. So,
      if some processes have differentiated weights and do both sync and
      random I/O, the end result is a throughput collapse.
      
      This commit tries to counter this problem by injecting the service of
      other processes, in a controlled way, while the process in service
      happens to have no I/O. This injection is performed only if the medium
      is non-rotational and performs internal queueing, and the process in
      service does random I/O (service injection might be beneficial for
      sequential I/O too; we'll work on that).
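
      The conditions under which injection is attempted can be sketched
      as follows. The struct and field names are hypothetical and do not
      come from the bfq sources, and the "controlled way" in which the
      amount of injected service is limited is deliberately left out,
      since the text above does not specify it.

      ```c
      #include <stdbool.h>

      /* Hypothetical device properties (not the kernel's data structure). */
      struct dev_props {
          bool nonrot;        /* medium is non-rotational (flash) */
          bool hw_queueing;   /* drive performs internal queueing, e.g. NCQ */
      };

      /* Hypothetical state of the process (queue) currently in service. */
      struct in_service {
          bool has_pending_io;  /* false while the process is temporarily idle */
          bool seeky;           /* the process does random I/O */
      };

      /*
       * Injection of other processes' I/O is considered only while the
       * in-service process has no I/O, on non-rotational media with
       * internal queueing, and only if the in-service process is seeky.
       */
      static bool may_inject_other_io(const struct dev_props *d,
                                      const struct in_service *q)
      {
          return !q->has_pending_io && d->nonrot && d->hw_queueing && q->seeky;
      }
      ```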
      
      As an example of the benefits of this commit, on a PLEXTOR PX-256M5S
      SSD, and with five processes having differentiated weights and doing
      sync random 4KB I/O, this commit makes the throughput with bfq grow
      fourfold, from 25 to 100MB/s. This higher throughput is 10MB/s lower
      than that reached with none. As some less random I/O is added to the
      mix, the throughput becomes equal to or higher than that with none.
      
      This commit is a very first attempt to recover throughput without
      losing control, and certainly has many limitations. One is, e.g., that
      the processes whose service is injected are not chosen so as to
      distribute the extra bandwidth they receive in accordance with their
      weights. Thus there might be loss of weighted fairness in some
      cases. Anyway, this loss concerns extra service, which would not have
      been received at all without this commit. Other limitations and issues
      will probably show up with usage.
      
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: correctly charge and reset entity service in all cases · cbeb869a
      Paolo Valente authored
      
      
      BFQ schedules entities (which represent either per-process queues or
      groups of queues) as a function of their timestamps, in particular
      as a function of their (virtual) finish times. The finish time of an
      entity is computed as a function of the budget assigned to the entity,
      assuming, tentatively, that the entity, once in service, will receive
      an amount of service equal to its budget. Then, when the entity is
      expired because it has finished being served, this finish time is updated
      as a function of the actual service received by the entity. This
      allows the entity to be correctly charged with only the service
      received, and then to be correctly re-scheduled.
      
      Yet an entity may also receive service while not being the entity in
      service (in the scheduling environment of its parent entity), for
      several reasons. If the entity remains with no backlog while receiving
      this 'unofficial' service, then it is expired. Also on such an
      expiration, the finish time of the entity should be updated to account
      for only the service actually received by the entity. Unfortunately,
      such an update is not performed for an entity expiring without being
      the entity in service.
      
      In a similar vein, the service counter of the entity in service is
      reset when the entity is expired, so that it is ready for the next
      service cycle. This reset should also be performed when an
      entity is expired because it remains empty after receiving service
      while not being the entity in service. But in this case the reset is
      not performed.
      
      This commit performs the above update of the finish time and reset of
      the service received, also for an entity expiring while not being the
      entity in service.
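
      The charging and reset described above can be sketched as follows.
      This is an illustration under assumptions, not the code of this
      commit: the message only says that the finish time is "a function
      of" the budget or of the service received, so the form finish =
      start + service / weight used here is an assumed, simplified one,
      and struct sentity and the function names are hypothetical.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Hypothetical scheduling entity (not the kernel's data structure). */
      struct sentity {
          uint64_t start;     /* virtual start time */
          uint64_t finish;    /* virtual finish time */
          uint64_t budget;    /* service the entity is expected to receive */
          uint64_t service;   /* service actually received so far */
          uint32_t weight;
      };

      /* Assumed form: virtual time advances by service / weight. */
      static uint64_t vtime_delta(uint64_t service, uint32_t weight)
      {
          assert(weight > 0);
          return service / weight;
      }

      /* On scheduling, tentatively assume the whole budget will be used. */
      static void sentity_schedule(struct sentity *e)
      {
          e->finish = e->start + vtime_delta(e->budget, e->weight);
      }

      /*
       * On expiration, charge only the service actually received and reset
       * the counter for the next service cycle. The point of the commit is
       * that both steps run on every expiration, whether or not the entity
       * was the one in service.
       */
      static void sentity_expire(struct sentity *e)
      {
          e->finish = e->start + vtime_delta(e->service, e->weight);
          e->service = 0;
      }
      ```

      For example, an entity with start 100, budget 80 and weight 4 is
      scheduled with a tentative finish time of 120 in this sketch; if
      it expires after receiving only 20 units of service, its finish
      time is corrected to 105 and its service counter returns to 0.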
      
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. Sep 14, 2018
  9. Sep 12, 2018
  10. Sep 08, 2018
  11. Sep 07, 2018