author | Ming Lei <ming.lei@redhat.com> | 2020-05-29 15:53:15 +0200 |
---|---|---|
committer | Jens Axboe <axboe@kernel.dk> | 2020-05-29 10:23:25 -0600 |
commit | bf0beec0607db3c6f6fb7bd2c6d503792b05cf3f (patch) | |
tree | 10b8e3b1cc69eb2e8b4f23cc176726c371d2c239 /block/blk-mq-tag.c | |
parent | 602380d28e28b454683efac41dc4b2862d055d91 (diff) | |
blk-mq: drain I/O when all CPUs in a hctx are offline
Most blk-mq drivers depend on the managed IRQ auto-affinity to set up
queue mapping. Thomas mentioned the following point[1]:
"That was the constraint of managed interrupts from the very beginning:
The driver/subsystem has to quiesce the interrupt line and the associated
queue _before_ it gets shutdown in CPU unplug and not fiddle with it
until it's restarted by the core when the CPU is plugged in again."
However, the current blk-mq implementation doesn't quiesce the hw queue before
the last CPU in the hctx is shut down. Even worse, CPUHP_BLK_MQ_DEAD is a
cpuhp state handled after the CPU is down, so there isn't any chance to
quiesce the hctx before shutting down the CPU.
Add a new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq hctxs
where the last CPU goes away, and wait for completion of in-flight
requests. This guarantees that there is no in-flight I/O before shutting
down the managed IRQ.
Add a BLK_MQ_F_STACKING flag and set it for dm-rq and loop, so we don't need
to wait for completion of in-flight requests from these drivers, which avoids
a potential deadlock. It is safe to do this for stacking drivers as those
do not use interrupts at all and their I/O completions are triggered by
the underlying devices' I/O completion.
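The offline handling described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the code from this patch: every `model_*` name is hypothetical, CPU masks are plain 64-bit integers, and the polling loop completes requests itself where a real handler would sleep and let the device finish them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of the model_* names are kernel APIs. */
struct model_hctx {
	uint64_t cpumask;      /* CPUs mapped to this hardware queue */
	bool stacking;         /* stands in for BLK_MQ_F_STACKING */
	bool inactive;         /* stands in for BLK_MQ_S_INACTIVE */
	unsigned int inflight; /* requests issued but not yet completed */
};

/* True if 'cpu' is mapped to the hctx and no other mapped CPU is online. */
static bool model_last_cpu_in_hctx(const struct model_hctx *hctx,
				   uint64_t online_mask, unsigned int cpu)
{
	uint64_t bit;

	assert(cpu < 64); /* the model only covers 64 CPUs */
	bit = UINT64_C(1) << cpu;
	if ((hctx->cpumask & bit) == 0)
		return false;
	return (hctx->cpumask & online_mask & ~bit) == 0;
}

/* Stand-in for the device completing one request. */
static void model_complete_one(struct model_hctx *hctx)
{
	if (hctx->inflight > 0)
		hctx->inflight--;
}

/*
 * Runs while 'cpu' is still online but about to go down.
 * Returns the number of poll iterations spent draining.
 */
static unsigned int model_hctx_offline(struct model_hctx *hctx,
				       uint64_t online_mask, unsigned int cpu)
{
	unsigned int polls = 0;

	/* Stacking drivers complete via the underlying device: do not wait. */
	if (hctx->stacking)
		return 0;
	if (!model_last_cpu_in_hctx(hctx, online_mask, cpu))
		return 0;

	/* From here on, new allocations on this hctx must fail and retry. */
	hctx->inactive = true;

	/* Drain: a real handler would sleep between checks instead. */
	while (hctx->inflight > 0) {
		model_complete_one(hctx);
		polls++;
	}
	return polls;
}
```

In this model a queue mapped to CPUs 0 and 1 is left alone when CPU 0 goes down first, and is marked inactive and drained only when CPU 1, the last mapped CPU, follows.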
[1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
[hch: different retry mechanism, merged two patches, minor cleanups]
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Diffstat (limited to 'block/blk-mq-tag.c')
-rw-r--r-- | block/blk-mq-tag.c | 8 |
1 files changed, 8 insertions, 0 deletions
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 762198b62088..96a39d0724a2 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -180,6 +180,14 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	sbitmap_finish_wait(bt, ws, &wait);
 
 found_tag:
+	/*
+	 * Give up this allocation if the hctx is inactive. The caller will
+	 * retry on an active hctx.
+	 */
+	if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state))) {
+		blk_mq_put_tag(tags, data->ctx, tag + tag_offset);
+		return BLK_MQ_NO_TAG;
+	}
 	return tag + tag_offset;
 }
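The "give up and let the caller retry" pattern in this hunk can be sketched in plain userspace C. This is a simplified model, not the kernel implementation: all `sk_*` names are hypothetical, the tag map is a small boolean array rather than an sbitmap, an exhausted tag map returns immediately instead of waiting, and the caller retries by walking an array of queues, which is an assumption made for illustration (the caller's actual retry logic is not part of this file's diff).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_NO_TAG  (-1) /* stands in for BLK_MQ_NO_TAG */
#define SK_NR_TAGS 4

/* Hypothetical model of one hardware queue and its tag map. */
struct sk_hctx {
	bool inactive;             /* stands in for BLK_MQ_S_INACTIVE */
	bool tag_used[SK_NR_TAGS]; /* simplified tag bitmap */
};

static void sk_put_tag(struct sk_hctx *hctx, int tag)
{
	hctx->tag_used[tag] = false;
}

static int sk_get_tag(struct sk_hctx *hctx)
{
	int tag;

	for (tag = 0; tag < SK_NR_TAGS; tag++) {
		if (!hctx->tag_used[tag]) {
			hctx->tag_used[tag] = true;
			break;
		}
	}
	if (tag == SK_NR_TAGS)
		return SK_NO_TAG; /* the model does not wait for a free tag */

	/*
	 * As in the hunk: the inactive check comes after the tag is taken,
	 * and the tag is handed back before giving up.
	 */
	if (hctx->inactive) {
		sk_put_tag(hctx, tag);
		return SK_NO_TAG;
	}
	return tag;
}

/* Caller side (assumed): retry on the next queue until one is active. */
static int sk_alloc(struct sk_hctx *hctxs, size_t nr, size_t *used_hctx)
{
	for (size_t i = 0; i < nr; i++) {
		int tag = sk_get_tag(&hctxs[i]);

		if (tag != SK_NO_TAG) {
			*used_hctx = i;
			return tag;
		}
	}
	return SK_NO_TAG;
}
```

Releasing the tag before returning matters in this model: an inactive queue is being drained, so a tag left marked as used there would look like an in-flight request that never completes.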