author		Song Liu <songliubraving@fb.com>	2016-11-17 15:24:39 -0800
committer	Shaohua Li <shli@fb.com>		2016-11-18 13:26:30 -0800
commit		1e6d690b9334b7e1b31d25fd8d93e980e449a5f9 (patch)
tree		878a16fa392d23a942a1bc0efe3ef5e3ae2e3ab0 /drivers/md/raid5.h
parent		2ded370373a400c20cf0c6e941e724e61582a867 (diff)
md/r5cache: caching phase of r5cache
As described in the previous patch, the write-back cache operates in two
phases: caching and writing-out. The caching phase works as follows:
1. Write data to the journal
(r5c_handle_stripe_dirtying, r5c_cache_data)
2. Call bio_endio
(r5c_handle_data_cached, r5c_return_dev_pending_writes).
The writing-out phase then proceeds as follows (a simplified model of
both phases is sketched after this list):
1. Mark the stripe as write-out (r5c_make_stripe_write_out)
2. Calculate parity (reconstruct or RMW)
3. Write parity (and maybe some other data) to the journal device
4. Write data and parity to the RAID disks
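A minimal user-space sketch of the two-phase flow; all struct, field,
and function names here are a hypothetical model, not the kernel code:

#include <stdio.h>

/* Hypothetical model of the two phases; names are illustrative. */
enum r5c_phase { R5C_CACHING, R5C_WRITE_OUT };

struct stripe {
	enum r5c_phase phase;
	int cached_blocks;	/* data blocks that only live in the journal */
};

/* Caching phase: journal the data, then complete the bio right away. */
static void cache_write(struct stripe *sh)
{
	sh->cached_blocks++;				/* 1. write data to journal */
	printf("bio_endio: write acknowledged\n");	/* 2. call bio_endio */
}

/* Writing-out phase: all parity work is deferred until here. */
static void write_out(struct stripe *sh)
{
	sh->phase = R5C_WRITE_OUT;	/* 1. mark stripe write-out */
	/* 2. calculate parity (reconstruct or RMW)
	 * 3. write parity (and maybe some data) to the journal device
	 * 4. write data and parity to the RAID disks
	 */
	sh->cached_blocks = 0;
}

int main(void)
{
	struct stripe sh = { R5C_CACHING, 0 };

	cache_write(&sh);	/* write latency: one journal write */
	write_out(&sh);		/* happens later, off the write path */
	return 0;
}

The point of the split is that the caching phase acknowledges the write
before any parity work starts.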
This patch implements the caching phase. The cache is integrated with
the stripe cache of raid456 and leverages the r5l_log code to write
data to the journal device.
The writing-out phase of the cache is implemented in the next patch.
With r5cache, a write operation does not wait for parity calculation
and write-out, so the write latency is lower (one write to the journal
device vs. a read and then a write to the RAID disks). r5cache also
reduces RAID overhead (multiple IOs due to read-modify-write of
parity) and provides more opportunities for full-stripe writes.
This patch adds two flags to stripe_head.state:
- STRIPE_R5C_PARTIAL_STRIPE
- STRIPE_R5C_FULL_STRIPE
Instead of on the inactive_list, stripes with cached data are tracked
on r5conf->r5c_full_stripe_list and r5conf->r5c_partial_stripe_list,
with STRIPE_R5C_FULL_STRIPE and STRIPE_R5C_PARTIAL_STRIPE set for
stripes on those lists. Note: stripes on r5c_full/partial_stripe_list
are not considered "active".
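A hedged sketch of the list placement, as a user-space model; the real
decision lives in the raid5.c stripe release path, where the data-block
count is conf->raid_disks - conf->max_degraded:

#include <stdbool.h>

/* Hypothetical model of the list placement; names are illustrative. */
struct stripe {
	int injournal;		/* data blocks already cached in the journal */
	int data_disks;		/* data blocks per stripe */
	bool on_full_list;	/* STRIPE_R5C_FULL_STRIPE */
	bool on_partial_list;	/* STRIPE_R5C_PARTIAL_STRIPE */
};

/* A stripe holding cached data is parked on one of the two r5c lists
 * instead of going back on inactive_list, and is not counted "active".
 */
static void r5c_track_cached_stripe(struct stripe *sh)
{
	if (sh->injournal == sh->data_disks)
		sh->on_full_list = true;	/* every data block cached */
	else if (sh->injournal > 0)
		sh->on_partial_list = true;	/* only some blocks cached */
}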
For RMW, the code allocates an extra page for each data block being
updated. The extra page is stored in r5dev->orig_page, and the old
data is read into it. The prexor calculation then subtracts
->orig_page from the parity block, and the reconstruct calculation
adds the ->page data back into the parity block.
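Since RAID5 parity is XOR, the prexor/reconstruct arithmetic can be
shown in a few lines; this is a self-contained model with illustrative
names, not the kernel's async_tx calls:

#include <stddef.h>
#include <stdint.h>

/* Byte-wise model of the RMW parity update. XOR is its own inverse,
 * so "subtracting" and "adding" a block are both XOR.
 */
static void xor_into(uint8_t *parity, const uint8_t *data, size_t len)
{
	for (size_t i = 0; i < len; i++)
		parity[i] ^= data[i];
}

static void rmw_update_parity(uint8_t *parity,
			      const uint8_t *orig_page,	/* old data */
			      const uint8_t *page,	/* new data */
			      size_t len)
{
	xor_into(parity, orig_page, len);	/* prexor: subtract old data */
	xor_into(parity, page, len);		/* reconstruct: add new data */
}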
r5cache naturally excludes SkipCopy: when the array has a write-back
cache, async_copy_data() will not skip the copy.
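A hedged condensation of why the copy becomes mandatory;
may_skip_copy() is a hypothetical helper, while the real check sits
inside async_copy_data() in raid5.c:

#include <stdbool.h>

/* Hypothetical helper condensing the SkipCopy decision. */
static bool may_skip_copy(bool skip_copy_enabled, bool full_page_write,
			  bool write_back_cache)
{
	/* With a write-back cache, bio_endio() fires once the data is
	 * in the journal, so the bio's pages may be reused while the
	 * stripe cache still needs the data: the copy is mandatory.
	 */
	if (write_back_cache)
		return false;
	return skip_copy_enabled && full_page_write;
}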
There are some known limitations of the cache implementation:
1. The write cache only covers full-page writes (R5_OVERWRITE). Writes
of smaller granularity are handled as write-through.
2. Only one log io (sh->log_io) is allowed per stripe at any time, so
later writes to the same stripe have to wait. This can be improved by
moving log_io to r5dev.
3. With a write-back cache, the read path must enter the state
machine, which is a significant bottleneck for some workloads.
4. There is no per-stripe checkpoint (with r5l_payload_flush) in the
log, so recovery has to replay more data than necessary (sometimes all
the log since last_checkpoint). This reduces the availability of the
array.
This patch includes a fix proposed by ZhengYuan Liu
<liuzhengyuan@kylinos.cn>.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Diffstat (limited to 'drivers/md/raid5.h')
-rw-r--r--	drivers/md/raid5.h	19
1 file changed, 18 insertions, 1 deletion
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index c9590a8e1425..73c183398e38 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -264,7 +264,7 @@ struct stripe_head_state {
 	int syncing, expanding, expanded, replacing;
 	int locked, uptodate, to_read, to_write, failed, written;
 	int to_fill, compute, req_compute, non_overwrite;
-	int injournal;
+	int injournal, just_cached;
 	int failed_num[2];
 	int p_failed, q_failed;
 	int dec_preread_active;
@@ -368,6 +368,12 @@ enum {
 	STRIPE_R5C_CACHING,	/* the stripe is in caching phase
 				 * see more detail in the raid5-cache.c
 				 */
+	STRIPE_R5C_PARTIAL_STRIPE,	/* in r5c cache (to-be/being handled or
+					 * in conf->r5c_partial_stripe_list)
+					 */
+	STRIPE_R5C_FULL_STRIPE,	/* in r5c cache (to-be/being handled or
+				 * in conf->r5c_full_stripe_list)
+				 */
 };
 
 #define STRIPE_EXPAND_SYNC_FLAGS \
@@ -618,6 +624,12 @@ struct r5conf {
 					  */
 	atomic_t		active_stripes;
 	struct list_head	inactive_list[NR_STRIPE_HASH_LOCKS];
+
+	atomic_t		r5c_cached_full_stripes;
+	struct list_head	r5c_full_stripe_list;
+	atomic_t		r5c_cached_partial_stripes;
+	struct list_head	r5c_partial_stripe_list;
+
 	atomic_t		empty_inactive_list_nr;
 	struct llist_head	released_stripes;
 	wait_queue_head_t	wait_for_quiescent;
@@ -739,4 +751,9 @@ r5c_try_caching_write(struct r5conf *conf, struct stripe_head *sh,
 extern void r5c_finish_stripe_write_out(struct r5conf *conf,
 					struct stripe_head *sh,
 					struct stripe_head_state *s);
+extern void r5c_release_extra_page(struct stripe_head *sh);
+extern void r5c_handle_cached_data_endio(struct r5conf *conf,
+	struct stripe_head *sh, int disks, struct bio_list *return_bi);
+extern int r5c_cache_data(struct r5l_log *log, struct stripe_head *sh,
+			  struct stripe_head_state *s);
 #endif