diff options
author | Song Liu <songliubraving@fb.com> | 2016-11-17 15:24:38 -0800 |
---|---|---|
committer | Shaohua Li <shli@fb.com> | 2016-11-18 13:26:07 -0800 |
commit | 2ded370373a400c20cf0c6e941e724e61582a867 (patch) | |
tree | 704038326bbbe6f2e7fddd1b98b1f0ae1d3d9f44 /drivers/md/raid5.c | |
parent | 937621c36e0ea1af2aceeaea412ba3bd80247199 (diff) | |
download | linux-stable-2ded370373a400c20cf0c6e941e724e61582a867.tar.gz linux-stable-2ded370373a400c20cf0c6e941e724e61582a867.tar.bz2 linux-stable-2ded370373a400c20cf0c6e941e724e61582a867.zip |
md/r5cache: State machine for raid5-cache write back mode
This patch adds state machine for raid5-cache. With log device, the
raid456 array could operate in two different modes (r5c_journal_mode):
- write-back (R5C_MODE_WRITE_BACK)
- write-through (R5C_MODE_WRITE_THROUGH)
Existing code of raid5-cache only has write-through mode. For write-back
cache, it is necessary to extend the state machine.
With write-back cache, every stripe could operate in two different
phases:
- caching
- writing-out
In caching phase, the stripe handles writes as:
- write to journal
- return IO
In writing-out phase, the stripe behaviors as a stripe in write through
mode R5C_MODE_WRITE_THROUGH.
STRIPE_R5C_CACHING is added to sh->state to differentiate caching and
writing-out phase.
Please note: this is a "no-op" patch for raid5-cache write-through
mode.
The following detailed explanation is copied from the raid5-cache.c:
/*
* raid5 cache state machine
*
* With rhe RAID cache, each stripe works in two phases:
* - caching phase
* - writing-out phase
*
* These two phases are controlled by bit STRIPE_R5C_CACHING:
* if STRIPE_R5C_CACHING == 0, the stripe is in writing-out phase
* if STRIPE_R5C_CACHING == 1, the stripe is in caching phase
*
* When there is no journal, or the journal is in write-through mode,
* the stripe is always in writing-out phase.
*
* For write-back journal, the stripe is sent to caching phase on write
* (r5c_handle_stripe_dirtying). r5c_make_stripe_write_out() kicks off
* the write-out phase by clearing STRIPE_R5C_CACHING.
*
* Stripes in caching phase do not write the raid disks. Instead, all
* writes are committed from the log device. Therefore, a stripe in
* caching phase handles writes as:
* - write to log device
* - return IO
*
* Stripes in writing-out phase handle writes as:
* - calculate parity
* - write pending data and parity to journal
* - write data and parity to raid disks
* - return IO for pending writes
*/
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Diffstat (limited to 'drivers/md/raid5.c')
-rw-r--r-- | drivers/md/raid5.c | 45 |
1 files changed, 41 insertions, 4 deletions
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 34895f3218d9..7c98eb06d1b2 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -4107,6 +4107,9 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s) if (rdev && !test_bit(Faulty, &rdev->flags)) do_recovery = 1; } + + if (test_bit(R5_InJournal, &dev->flags)) + s->injournal++; } if (test_bit(STRIPE_SYNCING, &sh->state)) { /* If there is a failed device being replaced, @@ -4386,14 +4389,47 @@ static void handle_stripe(struct stripe_head *sh) || s.expanding) handle_stripe_fill(sh, &s, disks); - /* Now to consider new write requests and what else, if anything - * should be read. We do not handle new writes when: + /* + * When the stripe finishes full journal write cycle (write to journal + * and raid disk), this is the clean up procedure so it is ready for + * next operation. + */ + r5c_finish_stripe_write_out(conf, sh, &s); + + /* + * Now to consider new write requests, cache write back and what else, + * if anything should be read. We do not handle new writes when: * 1/ A 'write' operation (copy+xor) is already in flight. * 2/ A 'check' operation is in flight, as it may clobber the parity * block. + * 3/ A r5c cache log write is in flight. */ - if (s.to_write && !sh->reconstruct_state && !sh->check_state) - handle_stripe_dirtying(conf, sh, &s, disks); + + if (!sh->reconstruct_state && !sh->check_state && !sh->log_io) { + if (!r5c_is_writeback(conf->log)) { + if (s.to_write) + handle_stripe_dirtying(conf, sh, &s, disks); + } else { /* write back cache */ + int ret = 0; + + /* First, try handle writes in caching phase */ + if (s.to_write) + ret = r5c_try_caching_write(conf, sh, &s, + disks); + /* + * If caching phase failed: ret == -EAGAIN + * OR + * stripe under reclaim: !caching && injournal + * + * fall back to handle_stripe_dirtying() + */ + if (ret == -EAGAIN || + /* stripe under reclaim: !caching && injournal */ + (!test_bit(STRIPE_R5C_CACHING, &sh->state) && + s.injournal > 0)) + handle_stripe_dirtying(conf, sh, &s, disks); + } + } /* maybe we need to check and possibly fix the parity for this stripe * Any reads will already have been scheduled, so we just see if enough @@ -5110,6 +5146,7 @@ static void raid5_make_request(struct mddev *mddev, struct bio * bi) * data on failed drives. */ if (rw == READ && mddev->degraded == 0 && + !r5c_is_writeback(conf->log) && mddev->reshape_position == MaxSector) { bi = chunk_aligned_read(mddev, bi); if (!bi) |