| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch devide __ceph_flush_snaps() into two stags. In the first
stage, __ceph_flush_snaps() assign snapcaps flush TIDs and add them
to cap flush lists. __ceph_flush_snaps() keeps holding the
i_ceph_lock in this stagge. So inode's auth cap can not change. In
the second stage, __ceph_flush_snaps() send flushsnap cap messages.
i_ceph_lock is unlocked before sending each cap message. If auth cap
changes in the middle, __ceph_flush_snaps() just stops. This is OK
because kick_flushing_inode_caps() will re-send flushsnap cap messages
to inode's new auth MDS.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| | |
If ceph_check_caps() wants to send cap message to a recovering MDS,
make sure it kicks cap flushes first.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| | |
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| | |
make ceph_kick_flushing_caps() ignore inodes whose cap flushes
have already been re-sent by ceph_early_kick_flushing_caps()
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch includes following changes
- Assign flush tid to snapcap flush
- Remove session's s_cap_snaps_flushing list. Add inode to session's
s_cap_flushing list instead. Inode is removed from the list when
there is no pending snapcap flush or cap flush.
- make __kick_flushing_caps() re-send both snapcap flushes and cap
flushes.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
We don't have requirement of searching cap flush by TID. In most cases,
we just need to know TID of the oldest cap flush. List is ideal for this
usage.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| | |
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| | |
This helps the recovering MDS to reconstruct the internal states that
tracking pending snapflush.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| | |
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
To mount non-default filesytem, user currently needs to provide mds
namespace ID. This is inconvenience.
This patch makes user be able to mount filesystem by name. If user
wants to mount non-default filesystem. Client first subscribes to
fsmap.user. Subscribe to mdsmap.<ID> after getting ID of filesystem.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We can now handle the snapshot cases under RCU, as well as the
non-snapshot case when we don't need to queue up a lease renewal
allow LOOKUP_RCU walks to proceed under those conditions.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Under rcuwalk, we need to take extra care when dereferencing d_parent.
We want to do that once and pass a pointer to dentry_lease_is_valid.
Also, we must ensure that that function can handle the case where we're
racing with d_release. Check whether "di" is NULL under the d_lock, and
just return 0 if so.
Finally, we still need to kick off a renewal job if the lease is getting
close to expiration. If that's the case, then just drop out of rcuwalk
mode since that could block.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
To check for a valid dentry lease, we need to get at the
ceph_dentry_info. Under rcuwalk though, we may end up with a dentry that
is on its way to destruction. Since we need to take the d_lock in
dentry_lease_is_valid already, we can just ensure that we clear the
d_fsinfo pointer out under the same lock before destroying it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| | |
Nothing calls it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| | |
Pretty simple: just use ceph_dentry_info.time instead (which was already
there, unused).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| | |
trivial fix to spelling mistake in pr_err message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
|
| |
| |
| |
| |
| |
| | |
old_snapc->seq is used in dout(...)
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| | |
Otherwise ceph_sync_write_unsafe() may access/modify freed inode.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| | |
ceph_aio_complete() can free the ceph_aio_request struct before
the code exits the while loop.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| | |
Track usage count for individual fmode bit. This can reduce the
array size by half.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| | |
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This patch adds codes that decode pool namespace information in
cap message and request reply. Pool namespace is saved in i_layout,
it will be passed to libceph when doing read/write.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add pool namesapce pointer to struct ceph_file_layout and struct
ceph_object_locator. Pool namespace is used by when mapping object
to PG, it's also used when composing OSD request.
The namespace pointer in struct ceph_file_layout is RCU protected.
So libceph can read namespace without taking lock.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
[idryomov@gmail.com: ceph_oloc_destroy(), misc minor changes]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Define new ceph_file_layout structure and rename old ceph_file_layout
to ceph_file_layout_legacy. This is preparation for adding namespace
to ceph_file_layout structure.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
An on-stack oid in ceph_ioctl_get_dataloc() is not initialized,
resulting in a WARN and a NULL pointer dereference later on. We will
have more of these on-stack in the future, so fix it with a convenience
macro.
Fixes: d30291b985d1 ("libceph: variable-sized ceph_object_id")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This changes the vfs dentry hashing to mix in the parent pointer at the
_beginning_ of the hash, rather than at the end.
That actually improves both the hash and the code generation, because we
can move more of the computation to the "static" part of the dcache
setup, and do less at lookup runtime.
It turns out that a lot of other hash users also really wanted to mix in
a base pointer as a 'salt' for the hash, and so the slightly extended
interface ends up working well for other cases too.
Users that want a string hash that is purely about the string pass in a
'salt' pointer of NULL.
* merge branch 'salted-string-hash':
fs/dcache.c: Save one 32-bit multiply in dcache lookup
vfs: make the string hashes salt the hash
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We always mixed in the parent pointer into the dentry name hash, but we
did it late at lookup time. It turns out that we can simplify that
lookup-time action by salting the hash with the parent pointer early
instead of late.
A few other users of our string hashes also wanted to mix in their own
pointers into the hash, and those are updated to use the same mechanism.
Hash users that don't have any particular initial salt can just use the
NULL pointer as a no-salt.
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: George Spelvin <linux@sciencehorizons.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
->atomic_open() can be given an in-lookup dentry *or* a negative one
found in dcache. Use d_in_lookup() to tell one from another, rather
than d_unhashed().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"Tmpfs readdir throughput regression fix (this cycle) + some -stable
fodder all over the place.
One missing bit is Miklos' tonight locks.c fix - NFS folks had already
grabbed that one by the time I woke up ;-)"
[ The locks.c fix came through the nfsd tree just moments ago ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
namespace: update event counter when umounting a deleted dentry
9p: use file_dentry()
ceph: fix d_obtain_alias() misuses
lockless next_positive()
libfs.c: new helper - next_positive()
dcache_{readdir,dir_lseek}(): don't bother with nested ->d_lock
|
| |
| |
| |
| |
| |
| | |
on failure d_obtain_alias() will have done iput()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| |
| |
| |
| | |
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There are several issues in fscache revalidation code.
- In ceph_revalidate_work(), fscache_invalidate() is called when
fscache_check_consistency() return 0. This is complete wrong
because 0 means cache is valid.
- Handle_cap_grant() calls ceph_queue_revalidate() if client
already has CAP_FILE_CACHE. This code is confusing. Client
should revalidate the cache each time it got CAP_FILE_CACHE
anew.
- In Handle_cap_grant(), fscache_invalidate() is called if MDS
revokes CAP_FILE_CACHE. This is inconsistency with the case
that inode get evicted. In the later case, the cache is not
discarded. Client may use the cache when inode is reloaded.
This patch moves the fscache revalidation into ceph_get_caps().
Client revalidates the cache after it gets CAP_FILE_CACHE.
i_rdcache_gen should keep constance while CAP_FILE_CACHE is
used. If i_fscache_gen is not equal to i_rdcache_gen, client
needs to check cache's consistency.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
All other filesystems do not add dirty pages to fscache. They all
disable fscache when inode is opened for write. Only ceph adds
dirty pages to fscache, but the code is buggy.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| | |
ceph_fill_file_size() has already called ceph_fscache_invalidate()
if it return true.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| | |
If readpages fails, fscache needs to cleanup its internal state.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
|/
|
|
|
|
|
| |
For the benefit of every single caller, take osdc instead of map.
Also, now that osdc->osdmap can't ever be NULL, drop the check.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"Followups to the parallel lookup work:
- update docs
- restore killability of the places that used to take ->i_mutex
killably now that we have down_write_killable() merged
- Additionally, it turns out that I missed a prerequisite for
security_d_instantiate() stuff - ->getxattr() wasn't the only thing
that could be called before dentry is attached to inode; with smack
we needed the same treatment applied to ->setxattr() as well"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
switch ->setxattr() to passing dentry and inode separately
switch xattr_handler->set() to passing dentry and inode separately
restore killability of old mutex_lock_killable(&inode->i_mutex) users
add down_write_killable_nested()
update D/f/directory-locking
|
| |
| |
| |
| |
| |
| |
| | |
preparation for similar switch in ->setxattr() (see the next commit for
rationale).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
"This changeset has a few main parts:
- Ilya has finished a huge refactoring effort to sync up the
client-side logic in libceph with the user-space client code, which
has evolved significantly over the last couple years, with lots of
additional behaviors (e.g., how requests are handled when cluster
is full and transitions from full to non-full).
This structure of the code is more closely aligned with userspace
now such that it will be much easier to maintain going forward when
behavior changes take place. There are some locking improvements
bundled in as well.
- Zheng adds multi-filesystem support (multiple namespaces within the
same Ceph cluster)
- Zheng has changed the readdir offsets and directory enumeration so
that dentry offsets are hash-based and therefore stable across
directory fragmentation events on the MDS.
- Zheng has a smorgasbord of bug fixes across fs/ceph"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
ceph: fix wake_up_session_cb()
ceph: don't use truncate_pagecache() to invalidate read cache
ceph: SetPageError() for writeback pages if writepages fails
ceph: handle interrupted ceph_writepage()
ceph: make ceph_update_writeable_page() uninterruptible
libceph: make ceph_osdc_wait_request() uninterruptible
ceph: handle -EAGAIN returned by ceph_update_writeable_page()
ceph: make fault/page_mkwrite return VM_FAULT_OOM for -ENOMEM
ceph: block non-fatal signals for fault/page_mkwrite
ceph: make logical calculation functions return bool
ceph: tolerate bad i_size for symlink inode
ceph: improve fragtree change detection
ceph: keep leaf frag when updating fragtree
ceph: fix dir_auth check in ceph_fill_dirfrag()
ceph: don't assume frag tree splits in mds reply are sorted
ceph: fix inode reference leak
ceph: using hash value to compose dentry offset
ceph: don't forbid marking directory complete after forward seek
ceph: record 'offset' for each entry of readdir result
ceph: define 'end/complete' in readdir reply as bit flags
...
|
| |
| |
| |
| |
| |
| |
| | |
We should reset i_requested_max_size before waking the waiters.
(zero i_requested_max_size make waiter re-request the max size)
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
truncate_pagecache() drops dirty pages, it's dangerous to use it
to invalidate read cache. Besides, we shouldn't start invalidating
read cache while there are buffer writers. Because buffer writers
may add dirty pages later.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| | |
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
writepage() can be interrupted when it's called by direct memory
reclaimer (the direct memory relaimer is killed). To avoid lossing
data, we redirty the page.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| | |
ceph_update_writeable_page() is used by ceph_write_begin(). It beaks
atomicity of write operation if it's interruptible.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| | |
when ceph_update_writeable_page() return -EAGAIN, caller should
lock the page and call ceph_update_writeable_page() again.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| | |
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Fault and page_mkwrite are supposed to be uninterruptable. But they
call ceph functions that are interruptible. So they should block
signals before calling functions that are interruptible
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch makes serverl logical caculation functions return bool to
improve readability due to these particular functions only using 0/1
as their return value.
No functional change.
Signed-off-by: Zhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com>
|
| |
| |
| |
| |
| |
| | |
A mds bug can cause symlink's size to be truncated to zero.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| |
| |
| |
| |
| |
| |
| | |
check if number of splits in i_fragtree is equal to number of splits
in mds reply
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|