author | Linus Torvalds <torvalds@linux-foundation.org> | 2013-09-08 13:46:52 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2013-09-08 13:46:52 -0700 |
commit | 0d98439ea3c6ffb2af931f6de4480e744634e2c5 (patch) | |
tree | d272064d6bf26a38e246320ecd0445bb75345bf1 /fs/namei.c | |
parent | 8aab6a27332bbf2abfcb35224738394e784d940b (diff) | |
download | linux-0d98439ea3c6ffb2af931f6de4480e744634e2c5.tar.gz linux-0d98439ea3c6ffb2af931f6de4480e744634e2c5.tar.bz2 linux-0d98439ea3c6ffb2af931f6de4480e744634e2c5.zip |
vfs: use lockref "dead" flag to mark unrecoverably dead dentries
This simplifies the RCU to refcounting code in particular.
I was originally intending to leave this for later, but walking through
all the dput() logic (see previous commit), I realized that the dput()
"might_sleep()" check was misleadingly weak. And I removed it as
misleading, both for performance profiling and for debugging.
However, the might_sleep() debugging case is actually true: the final
dput() can indeed sleep, if the inode of the dentry that you are
releasing ends up sleeping at iput time (see dentry_iput()). So the
problem with the might_sleep() in dput() wasn't that it wasn't true, it
was that it wasn't actually testing and triggering on the interesting
case.
In particular, just about *any* dput() can indeed sleep, if you happen
to race with another thread deleting the file in question, and you then
lose the race to be the last dput() for that file. But because it's
a very rare race, the debugging code would never trigger it in practice.
Why is this problematic? The new d_rcu_to_refcount() (see commit
15570086b590: "vfs: reimplement d_rcu_to_refcount() using
lockref_get_or_lock()") does a dput() for the failure case, and it does
it under the RCU lock. So potentially sleeping really is a bug.
But there's no way I'm going to fix this with the previous complicated
"lockref_get_or_lock()" interface. And rather than revert to the old
and crufty nested dentry locking code (which did get this right by
delaying the reference count updates until they were verified to be
safe), let's make forward progress.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'fs/namei.c')
-rw-r--r-- | fs/namei.c | 23 |
1 files changed, 5 insertions, 18 deletions
diff --git a/fs/namei.c b/fs/namei.c
index f415c6683a83..cc4bcfaa8624 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -517,25 +517,12 @@ static inline void unlock_rcu_walk(void)
  */
 static inline int d_rcu_to_refcount(struct dentry *dentry, seqcount_t *validate, unsigned seq)
 {
-	int gotref;
-
-	gotref = lockref_get_or_lock(&dentry->d_lockref);
-
-	/* Does the sequence number still match? */
-	if (read_seqcount_retry(validate, seq)) {
-		if (gotref)
-			dput(dentry);
-		else
-			spin_unlock(&dentry->d_lock);
-		return -ECHILD;
-	}
-
-	/* Get the ref now, if we couldn't get it originally */
-	if (!gotref) {
-		dentry->d_lockref.count++;
-		spin_unlock(&dentry->d_lock);
+	if (likely(lockref_get_not_dead(&dentry->d_lockref))) {
+		if (!read_seqcount_retry(validate, seq))
+			return 0;
+		dput(dentry);
 	}
-	return 0;
+	return -ECHILD;
 }
 
 /**
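
For readers who want to experiment with the control flow of the rewritten d_rcu_to_refcount() outside the kernel, the following is a minimal userspace sketch, not the kernel implementation. Everything prefixed with toy_ is hypothetical: toy_lockref models only the idea that a negative count means "dead", toy_get_not_dead() stands in for lockref_get_not_dead(), toy_seq_retry() stands in for read_seqcount_retry(), and toy_put() stands in for dput() without any of its teardown or sleeping behaviour. The sentinel value TOY_DEAD is an arbitrary choice for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct lockref: count < 0 means "dead". */
struct toy_lockref {
	atomic_int count;
};

/* Hypothetical stand-in for a seqcount_t. */
struct toy_seq {
	atomic_uint seq;
};

/* Arbitrary negative sentinel chosen for this sketch. */
#define TOY_DEAD (-128)

/* Take a reference unless the object has been marked dead. */
static bool toy_get_not_dead(struct toy_lockref *ref)
{
	int old = atomic_load(&ref->count);

	while (old >= 0) {
		/* On failure, 'old' is reloaded and the dead check repeats. */
		if (atomic_compare_exchange_weak(&ref->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Mark the object dead: later toy_get_not_dead() calls fail. */
static void toy_mark_dead(struct toy_lockref *ref)
{
	atomic_store(&ref->count, TOY_DEAD);
}

/* Drop a reference; the real dput() does far more than this. */
static void toy_put(struct toy_lockref *ref)
{
	int old = atomic_fetch_sub(&ref->count, 1);

	assert(old > 0);
}

/* True if the sequence moved since 'seq' was sampled. */
static bool toy_seq_retry(struct toy_seq *s, unsigned seq)
{
	return atomic_load(&s->seq) != seq;
}

/* Same shape as the new d_rcu_to_refcount() in the diff above. */
static int toy_rcu_to_refcount(struct toy_lockref *ref,
			       struct toy_seq *validate, unsigned seq)
{
	if (toy_get_not_dead(ref)) {
		if (!toy_seq_retry(validate, seq))
			return 0;	/* reference taken and still valid */
		toy_put(ref);		/* raced with a change: undo */
	}
	return -ECHILD;
}
```

In this model there are three outcomes: a live object with a matching sequence number returns 0 and keeps the extra reference, a live object with a stale sequence number returns -ECHILD after dropping the reference it just took, and a dead object returns -ECHILD without touching the count at all. Note that the sketch says nothing about whether the real dput() may sleep on that middle path.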