linux.git - Linux kernel mainline tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	ceph: fix ceph_get_caps() interruption	Yan, Zheng	2017-01-18	1	-1/+6
\| \| \| \| \| \| \| \|	Commit 5c341ee32881 ("ceph: fix scheduler warning due to nested blocking") causes infinite loop when process is interrupted. Fix it. Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
*	ceph: properly set issue_seq for cap release	Yan, Zheng	2016-12-12	1	-0/+1
\| \| \| \|	Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: add flags parameter to send_cap_msg	Jeff Layton	2016-12-12	1	-10/+15
\| \| \| \| \| \| \| \| \| \| \| \|	Add a flags parameter to send_cap_msg, so we can request expedited service from the MDS when we know we'll be waiting on the result. Set that flag in the case of try_flush_caps. The callers of that function generally wait synchronously on the result, so it's beneficial to ask the server to expedite it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
*	ceph: update cap message struct version to 10	Jeff Layton	2016-12-12	1	-6/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The userland ceph has MClientCaps at struct version 10. This brings the kernel up the same version. For now, all of the the new stuff is set to default values including the flags field, which will be conditionally set in a later patch. Note that we don't need to set the change_attr and btime to anything since we aren't currently setting the feature flag. The MDS should ignore those values. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
*	ceph: define new argument structure for send_cap_msg	Jeff Layton	2016-12-12	1	-99/+126
\| \| \| \| \| \| \| \| \| \| \| \|	When we get to this many arguments, it's hard to work with positional parameters. send_cap_msg is already at 25 arguments, with more needed. Define a new args structure and pass a pointer to it to send_cap_msg. Eventually it might make sense to embed one of these inside ceph_cap_snap instead of tracking individual fields. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
*	ceph: move xattr initialzation before the encoding past the ceph_mds_caps	Jeff Layton	2016-12-12	1	-7/+7
\| \| \| \| \| \| \| \|	Just for clarity. This part is inside the header, so it makes sense to group it with the rest of the stuff in the header. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
*	ceph: fix minor typo in unsafe_request_wait	Jeff Layton	2016-12-12	1	-1/+1
\| \| \| \| \|	Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
*	ceph: try getting buffer capability for readahead/fadvise	Yan, Zheng	2016-12-12	1	-0/+21
\| \| \| \| \| \| \| \| \|	For readahead/fadvise cases, caller of ceph_readpages does not hold buffer capability. Pages can be added to page cache while there is no buffer capability. This can cause data integrity issue. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: fix scheduler warning due to nested blocking	Nikolay Borisov	2016-12-12	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	try_get_cap_refs can be used as a condition in a wait_event* calls. This is all fine until it has to call __ceph_do_pending_vmtruncate, which in turn acquires the i_truncate_mutex. This leads to a situation in which a task's state is !TASK_RUNNING and at the same time it's trying to acquire a sleeping primitive. In essence a nested sleeping primitives are being used. This causes the following warning: WARNING: CPU: 22 PID: 11064 at kernel/sched/core.c:7631 __might_sleep+0x9f/0xb0() do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8109447d>] prepare_to_wait_event+0x5d/0x110 ipmi_msghandler tcp_scalable ib_qib dca ib_mad ib_core ib_addr ipv6 CPU: 22 PID: 11064 Comm: fs_checker.pl Tainted: G O 4.4.20-clouder2 #6 Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1a 10/16/2015 0000000000000000 ffff8838b416fa88 ffffffff812f4409 ffff8838b416fad0 ffffffff81a034f2 ffff8838b416fac0 ffffffff81052b46 ffffffff81a0432c 0000000000000061 0000000000000000 0000000000000000 ffff88167bda54a0 Call Trace: [<ffffffff812f4409>] dump_stack+0x67/0x9e [<ffffffff81052b46>] warn_slowpath_common+0x86/0xc0 [<ffffffff81052bcc>] warn_slowpath_fmt+0x4c/0x50 [<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110 [<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110 [<ffffffff8107767f>] __might_sleep+0x9f/0xb0 [<ffffffff81612d30>] mutex_lock+0x20/0x40 [<ffffffffa04eea14>] __ceph_do_pending_vmtruncate+0x44/0x1a0 [ceph] [<ffffffffa04fa692>] try_get_cap_refs+0xa2/0x320 [ceph] [<ffffffffa04fd6f5>] ceph_get_caps+0x255/0x2b0 [ceph] [<ffffffff81094370>] ? wait_woken+0xb0/0xb0 [<ffffffffa04f2c11>] ceph_write_iter+0x2b1/0xde0 [ceph] [<ffffffff81613f22>] ? schedule_timeout+0x202/0x260 [<ffffffff8117f01a>] ? kmem_cache_free+0x1ea/0x200 [<ffffffff811b46ce>] ? iput+0x9e/0x230 [<ffffffff81077632>] ? __might_sleep+0x52/0xb0 [<ffffffff81156147>] ? __might_fault+0x37/0x40 [<ffffffff8119e123>] ? cp_new_stat+0x153/0x170 [<ffffffff81198cfa>] __vfs_write+0xaa/0xe0 [<ffffffff81199369>] vfs_write+0xa9/0x190 [<ffffffff811b6d01>] ? set_close_on_exec+0x31/0x70 [<ffffffff8119a056>] SyS_write+0x46/0xa0 This happens since wait_event_interruptible can interfere with the mutex locking code, since they both fiddle with the task state. Fix the issue by using the newly-added nested blocking infrastructure in 61ada528dea0 ("sched/wait: Provide infrastructure to deal with nested blocking") Link: https://lwn.net/Articles/628628/ Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: fix null pointer dereference in ceph_flush_snaps()	Yan, Zheng	2016-08-08	1	-1/+4
\| \| \| \|	Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: optimize cap flush waiting	Yan, Zheng	2016-07-28	1	-27/+64
\| \| \| \| \| \| \| \| \| \| \| \|	Add a 'wake' flag to ceph_cap_flush struct, which indicates if there is someone waiting for it to finish. When getting flush ack message, we check the 'wake' flag in corresponding ceph_cap_flush struct to decide if we should wake up waiters. One corner case is that the acked cap flush has 'wake' flags is set, but it is not the first one on the flushing list. We do not wake up waiters in this case, set 'wake' flags of preceding ceph_cap_flush struct instead Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: cleanup ceph_flush_snaps()	Yan, Zheng	2016-07-28	1	-83/+102
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch devide __ceph_flush_snaps() into two stags. In the first stage, __ceph_flush_snaps() assign snapcaps flush TIDs and add them to cap flush lists. __ceph_flush_snaps() keeps holding the i_ceph_lock in this stagge. So inode's auth cap can not change. In the second stage, __ceph_flush_snaps() send flushsnap cap messages. i_ceph_lock is unlocked before sending each cap message. If auth cap changes in the middle, __ceph_flush_snaps() just stops. This is OK because kick_flushing_inode_caps() will re-send flushsnap cap messages to inode's new auth MDS. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: kick cap flushes before sending other cap message	Yan, Zheng	2016-07-28	1	-9/+34
\| \| \| \| \| \| \|	If ceph_check_caps() wants to send cap message to a recovering MDS, make sure it kicks cap flushes first. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: introduce an inode flag to indicates if snapflush is needed	Yan, Zheng	2016-07-28	1	-15/+33
\| \| \| \|	Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: avoid sending duplicated cap flush message	Yan, Zheng	2016-07-28	1	-1/+17
\| \| \| \| \| \| \|	make ceph_kick_flushing_caps() ignore inodes whose cap flushes have already been re-sent by ceph_early_kick_flushing_caps() Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: unify cap flush and snapcap flush	Yan, Zheng	2016-07-28	1	-135/+156
\| \| \| \| \| \| \| \| \| \| \| \|	This patch includes following changes - Assign flush tid to snapcap flush - Remove session's s_cap_snaps_flushing list. Add inode to session's s_cap_flushing list instead. Inode is removed from the list when there is no pending snapcap flush or cap flush. - make __kick_flushing_caps() re-send both snapcap flushes and cap flushes. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: use list instead of rbtree to track cap flushes	Yan, Zheng	2016-07-28	1	-89/+31
\| \| \| \| \| \| \| \|	We don't have requirement of searching cap flush by TID. In most cases, we just need to know TID of the oldest cap flush. List is ideal for this usage. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: update types of some local varibles	Yan, Zheng	2016-07-28	1	-11/+12
\| \| \| \|	Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: wait unsafe sync writes for evicting inode	Yan, Zheng	2016-07-28	1	-48/+2
\| \| \| \| \| \|	Otherwise ceph_sync_write_unsafe() may access/modify freed inode. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: reduce i_nr_by_mode array size	Yan, Zheng	2016-07-28	1	-14/+31
\| \| \| \| \| \| \|	Track usage count for individual fmode bit. This can reduce the array size by half. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: rados pool namespace support	Yan, Zheng	2016-07-28	1	-20/+26
\| \| \| \| \| \| \| \|	This patch adds codes that decode pool namespace information in cap message and request reply. Pool namespace is saved in i_layout, it will be passed to libceph when doing read/write. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	libceph: define new ceph_file_layout structure	Yan, Zheng	2016-07-28	1	-1/+4
\| \| \| \| \| \| \| \|	Define new ceph_file_layout structure and rename old ceph_file_layout to ceph_file_layout_legacy. This is preparation for adding namespace to ceph_file_layout structure. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: improve fscache revalidation	Yan, Zheng	2016-06-01	1	-10/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are several issues in fscache revalidation code. - In ceph_revalidate_work(), fscache_invalidate() is called when fscache_check_consistency() return 0. This is complete wrong because 0 means cache is valid. - Handle_cap_grant() calls ceph_queue_revalidate() if client already has CAP_FILE_CACHE. This code is confusing. Client should revalidate the cache each time it got CAP_FILE_CACHE anew. - In Handle_cap_grant(), fscache_invalidate() is called if MDS revokes CAP_FILE_CACHE. This is inconsistency with the case that inode get evicted. In the later case, the cache is not discarded. Client may use the cache when inode is reloaded. This patch moves the fscache revalidation into ceph_get_caps(). Client revalidates the cache after it gets CAP_FILE_CACHE. i_rdcache_gen should keep constance while CAP_FILE_CACHE is used. If i_fscache_gen is not equal to i_rdcache_gen, client needs to check cache's consistency. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: avoid unnecessary fscache invalidation/revlidation	Yan, Zheng	2016-06-01	1	-6/+3
\| \| \| \| \| \| \|	ceph_fill_file_size() has already called ceph_fscache_invalidate() if it return true. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: don't use truncate_pagecache() to invalidate read cache	Yan, Zheng	2016-05-26	1	-4/+4
\| \| \| \| \| \| \| \| \|	truncate_pagecache() drops dirty pages, it's dangerous to use it to invalidate read cache. Besides, we shouldn't start invalidating read cache while there are buffer writers. Because buffer writers may add dirty pages later. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: renew caps for read/write if mds session got killed.	Yan, Zheng	2016-05-26	1	-10/+33
\| \| \| \| \| \| \| \| \|	When mds session gets killed, read/write operation may hang. Client waits for Frw caps, but mds does not know what caps client wants. To recover this, client sends an open request to mds. The request will tell mds what caps client wants. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros	Kirill A. Shutemov	2016-04-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced long time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ceph: encode ctime in cap message	Yan, Zheng	2016-03-25	1	-4/+7
\| \| \| \|	Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: initial CEPH_FEATURE_FS_FILE_LAYOUT_V2 support	Yan, Zheng	2016-03-04	1	-3/+24
\| \| \| \| \| \| \| \|	Add support for the format change of MClientReply/MclientCaps. Also add code that denies access to inodes with pool_ns layouts. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
*	wrappers for ->i_mutex access	Al Viro	2016-01-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	ceph: make fsync() wait unsafe requests that created/modified inode	Yan, Zheng	2015-11-02	1	-37/+34
\| \| \| \| \| \| \| \|	If we get a unsafe reply for request that created/modified inode, add the unsafe request to a list in the newly created/modified inode. So we can make fsync() wait these unsafe requests. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: don't invalidate page cache when inode is no longer used	Yan, Zheng	2015-11-02	1	-3/+2
\| \| \| \| \| \| \| \|	ceph_check_caps() invalidate page cache when inode is not used by any open file. This behaviour is not friendly for workload that repeatly read files. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: EIO all operations after forced umount	Yan, Zheng	2015-09-08	1	-0/+8
\| \| \| \| \| \| \| \| \|	This patch makes try_get_cap_refs() and __do_request() check if the file system was forced umount, and return -EIO if it was. This patch also adds a helper function to drops dirty caps and wakes up blocking operation. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: always re-send cap flushes when MDS recovers	Yan, Zheng	2015-07-31	1	-17/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit e548e9b93d3e565e42b938a99804114565be1f81 makes the kclient only re-send cap flush once during MDS failover. If the kclient sends a cap flush after MDS enters reconnect stage but before MDS recovers. The kclient will skip re-sending the same cap flush when MDS recovers. This causes problem for newly created inode. The MDS handles cap flushes before replaying unsafe requests, so it's possible that MDS find corresponding inode is missing when handling cap flush. The fix is reverting to old behaviour: always re-send when MDS recovers Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
*	ceph: rework dcache readdir	Yan, Zheng	2015-06-25	1	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously our dcache readdir code relies on that child dentries in directory dentry's d_subdir list are sorted by dentry's offset in descending order. When adding dentries to the dcache, if a dentry already exists, our readdir code moves it to head of directory dentry's d_subdir list. This design relies on dcache internals. Al Viro suggests using ncpfs's approach: keeping array of pointers to dentries in page cache of directory inode. the validity of those pointers are presented by directory inode's complete and ordered flags. When a dentry gets pruned, we clear directory inode's complete flag in the d_prune() callback. Before moving a dentry to other directory, we clear the ordered flag for both old and new directory. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: pre-allocate data structure that tracks caps flushing	Yan, Zheng	2015-06-25	1	-4/+22
\| \| \| \|	Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: re-send flushing caps (which are revoked) in reconnect stage	Yan, Zheng	2015-06-25	1	-4/+53
\| \| \| \| \| \| \| \|	if flushing caps were revoked, we should re-send the cap flush in client reconnect stage. This guarantees that MDS processes the cap flush message before issuing the flushing caps to other client. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: send TID of the oldest pending caps flush to MDS	Yan, Zheng	2015-06-25	1	-18/+49
\| \| \| \| \| \| \|	According to this information, MDS can trim its completed caps flush list (which is used to detect duplicated cap flush). Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: track pending caps flushing globally	Yan, Zheng	2015-06-25	1	-9/+41
\| \| \| \| \| \| \| \| \| \|	So we know TID of the oldest pending caps flushing. Later patch will send this information to MDS, so that MDS can trim its completed caps flush list. Tracking pending caps flushing globally also simplifies syncfs code. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: track pending caps flushing accurately	Yan, Zheng	2015-06-25	1	-85/+160
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously we do not trace accurate TID for flushing caps. when MDS failovers, we have no choice but to re-send all flushing caps with a new TID. This can cause problem because MDS can has already flushed some caps and has issued the same caps to other client. The re-sent cap flush has a new TID, which makes MDS unable to detect if it has already processed the cap flush. This patch adds code to track pending caps flushing accurately. When re-sending cap flush is needed, we use its original flush TID. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: fix directory fsync	Yan, Zheng	2015-06-25	1	-9/+64
\| \| \| \| \| \| \| \|	fsync() on directory should flush dirty caps and wait for any uncommitted directory opertions to commit. But ceph_dir_fsync() only waits for uncommitted directory opertions. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: fix flushing caps	Yan, Zheng	2015-06-25	1	-24/+25
\| \| \| \| \| \| \| \| \| \|	Current ceph_fsync() only flushes dirty caps and wait for them to be flushed. It doesn't wait for caps that has already been flushing. This patch makes ceph_fsync() wait for pending flushing caps too. Besides, this patch also makes caps_are_flushed() peroperly handle tid wrapping. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: don't include used caps in cap_wanted	Yan, Zheng	2015-06-25	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	when copying files to cephfs, file data may stay in page cache after corresponding file is closed. Cached data use Fc capability. If we include Fc capability in cap_wanted, MDS will treat files with cached data as open files, and journal them in an EOpen event when trimming log segment. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: don't pre-allocate space for cap release messages	Yan, Zheng	2015-06-25	1	-56/+29
\| \| \| \| \| \| \| \| \| \| \|	Previously we pre-allocate cap release messages for each caps. This wastes lots of memory when there are large amount of caps. This patch make the code not pre-allocate the cap release messages. Instead, we add the corresponding ceph_cap struct to a list when releasing a cap. Later when flush cap releases is needed, we allocate the cap release messages dynamically. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: make sure syncfs flushes all cap snaps	Yan, Zheng	2015-06-25	1	-7/+11
\| \| \| \|	Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: take snap_rwsem when accessing snap realm's cached_context	Yan, Zheng	2015-06-25	1	-1/+3
\| \| \| \| \| \| \| \|	When ceph inode's i_head_snapc is NULL, __ceph_mark_dirty_caps() accesses snap realm's cached_context. So we need take read lock of snap_rwsem. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: avoid sending unnessesary FLUSHSNAP message	Yan, Zheng	2015-06-25	1	-31/+38
\| \| \| \| \| \| \| \| \|	when a snap notification contains no new snapshot, we can avoid sending FLUSHSNAP message to MDS. But we still need to create cap_snap in some case because it's required by write path and page writeback path Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: set i_head_snapc when getting CEPH_CAP_FILE_WR reference	Yan, Zheng	2015-06-25	1	-46/+109
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In most cases that snap context is needed, we are holding reference of CEPH_CAP_FILE_WR. So we can set ceph inode's i_head_snapc when getting the CEPH_CAP_FILE_WR reference, and make codes get snap context from i_head_snapc. This makes the code simpler. Another benefit of this change is that we can handle snap notification more elegantly. Especially when snap context is updated while someone else is doing write. The old queue cap_snap code may set cap_snap's context to ether the old context or the new snap context, depending on if i_head_snapc is set. The new queue capp_snap code always set cap_snap's context to the old snap context. Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	ceph: check OSD caps before read/write	Yan, Zheng	2015-06-25	1	-0/+4
\| \| \| \|	Signed-off-by: Yan, Zheng <zyan@redhat.com>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2015-04-26	1	-1/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fourth vfs update from Al Viro: "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() VFS: assorted d_backing_inode() annotations VFS: fs/inode.c helpers: d_inode() annotations VFS: fs/cachefiles: d_backing_inode() annotations VFS: fs library helpers: d_inode() annotations VFS: assorted weird filesystems: d_inode() annotations VFS: normal filesystems (and lustre): d_inode() annotations VFS: security/: d_inode() annotations VFS: security/: d_backing_inode() annotations VFS: net/: d_inode() annotations VFS: net/unix: d_backing_inode() annotations VFS: kernel/: d_inode() annotations VFS: audit: d_backing_inode() annotations VFS: Fix up some ->d_inode accesses in the chelsio driver VFS: Cachefiles should perform fs modifications on the top layer only VFS: AF_UNIX sockets should call mknod on the top layer only