diff options
author | Tejun Heo <tj@kernel.org> | 2015-05-28 14:50:51 -0400 |
---|---|---|
committer | Jens Axboe <axboe@fb.com> | 2015-06-02 08:40:20 -0600 |
commit | 2a81490811d0296d390c571bb64eaa93e5ed7def (patch) | |
tree | a2716ba515488d22b4f2e428134c44f1f1674f5b /include/linux/writeback.h | |
parent | b16b1deb553adcd7b3b7ce3e6d6fd1b923f314da (diff) | |
download | linux-stable-2a81490811d0296d390c571bb64eaa93e5ed7def.tar.gz linux-stable-2a81490811d0296d390c571bb64eaa93e5ed7def.tar.bz2 linux-stable-2a81490811d0296d390c571bb64eaa93e5ed7def.zip |
writeback: implement foreign cgroup inode detection
As concurrent write sharing of an inode is expected to be very rare
and memcg only tracks page ownership on first-use basis severely
confining the usefulness of such sharing, cgroup writeback tracks
ownership per-inode. While the support for concurrent write sharing
of an inode is deemed unnecessary, an inode being written to by
different cgroups at different points in time is a lot more common,
and, more importantly, charging only by first-use can too readily lead
to grossly incorrect behaviors (single foreign page can lead to
gigabytes of writeback to be incorrectly attributed).
To resolve this issue, cgroup writeback detects the majority dirtier
of an inode and will transfer the ownership to it. To avoid
unnnecessary oscillation, the detection mechanism keeps track of
history and gives out the switch verdict only if the foreign usage
pattern is stable over a certain amount of time and/or writeback
attempts.
The detection mechanism has fairly low space and computation overhead.
It adds 8 bytes to struct inode (one int and two u16's) and minimal
amount of calculation per IO. The detection mechanism converges to
the correct answer usually in several seconds of IO time when there's
a clear majority dirtier. Even when there isn't, it can reach an
acceptable answer fairly quickly under most circumstances.
Please see wb_detach_inode() for more details.
This patch only implements detection. Following patches will
implement actual switching.
v2: wbc_account_io() now checks whether the wbc is associated with a
wb before dereferencing it. This can happen when pageout() is
writing pages directly without going through the usual writeback
path. As pageout() path is single-threaded, we don't want it to
be blocked behind a slow cgroup and ultimately want it to delegate
actual writing to the usual writeback path.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Diffstat (limited to 'include/linux/writeback.h')
-rw-r--r-- | include/linux/writeback.h | 16 |
1 files changed, 16 insertions, 0 deletions
diff --git a/include/linux/writeback.h b/include/linux/writeback.h index 8f964e558af5..b333c945e571 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -88,6 +88,15 @@ struct writeback_control { unsigned for_sync:1; /* sync(2) WB_SYNC_ALL writeback */ #ifdef CONFIG_CGROUP_WRITEBACK struct bdi_writeback *wb; /* wb this writeback is issued under */ + struct inode *inode; /* inode being written out */ + + /* foreign inode detection, see wbc_detach_inode() */ + int wb_id; /* current wb id */ + int wb_lcand_id; /* last foreign candidate wb id */ + int wb_tcand_id; /* this foreign candidate wb id */ + size_t wb_bytes; /* bytes written by current wb */ + size_t wb_lcand_bytes; /* bytes written by last candidate */ + size_t wb_tcand_bytes; /* bytes written by this candidate */ #endif }; @@ -187,6 +196,8 @@ void wbc_attach_and_unlock_inode(struct writeback_control *wbc, struct inode *inode) __releases(&inode->i_lock); void wbc_detach_inode(struct writeback_control *wbc); +void wbc_account_io(struct writeback_control *wbc, struct page *page, + size_t bytes); /** * inode_attach_wb - associate an inode with its wb @@ -285,6 +296,11 @@ static inline void wbc_init_bio(struct writeback_control *wbc, struct bio *bio) { } +static inline void wbc_account_io(struct writeback_control *wbc, + struct page *page, size_t bytes) +{ +} + #endif /* CONFIG_CGROUP_WRITEBACK */ /* |