summaryrefslogtreecommitdiffstats
path: root/fs/ceph
Commit message (Collapse)AuthorAgeFilesLines
* Ceph: remove left-over reject fileLinus Torvalds2014-12-171-10/+0
| | | | | | | | | | | | Neither Sage nor I noticed that Zheng Yan had mistakenly committed fs/ceph/super.h.rej as part of commit 31c542a199d7 ("ceph: add inline data to pagecache"). Remove it. Requested-by: Yan, Zheng <ukernel@gmail.com> Cc: Sage Weil <sweil@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2014-12-1713-116/+712
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "The big item here is support for inline data for CephFS and for message signatures from Zheng. There are also several bug fixes, including interrupted flock request handling, 0-length xattrs, mksnap, cached readdir results, and a message version compat field. Finally there are several cleanups from Ilya, Dan, and Markus. Note that there is another series coming soon that fixes some bugs in the RBD 'lingering' requests, but it isn't quite ready yet" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits) ceph: fix setting empty extended attribute ceph: fix mksnap crash ceph: do_sync is never initialized libceph: fixup includes in pagelist.h ceph: support inline data feature ceph: flush inline version ceph: convert inline data to normal data before data write ceph: sync read inline data ceph: fetch inline data when getting Fcr cap refs ceph: use getattr request to fetch inline data ceph: add inline data to pagecache ceph: parse inline data in MClientReply and MClientCaps libceph: specify position of extent operation libceph: add CREATE osd operation support libceph: add SETXATTR/CMPXATTR osd operations support rbd: don't treat CEPH_OSD_OP_DELETE as extent op ceph: remove unused stringification macros libceph: require cephx message signature by default ceph: introduce global empty snap context ceph: message versioning fixes ...
| * ceph: fix setting empty extended attributeYan, Zheng2014-12-171-2/+5
| | | | | | | | | | | | | | | | make sure 'value' is not null. otherwise __ceph_setxattr will remove the extended attribute. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
| * ceph: fix mksnap crashYan, Zheng2014-12-171-1/+3
| | | | | | | | | | | | | | | | | | mksnap reply only contain 'target', does not contain 'dentry'. So it's wrong to use req->r_reply_info.head->is_dentry to detect traceless reply. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
| * ceph: do_sync is never initializedDan Carpenter2014-12-171-1/+1
| | | | | | | | | | | | | | | | | | | | Probably this code was syncing a lot more often then intended because the do_sync variable wasn't set to zero. Cc: stable@vger.kernel.org # v3.11+ Fixes: c62988ec0910 ('ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
| * ceph: support inline data featureYan, Zheng2014-12-171-1/+2
| | | | | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: flush inline versionYan, Zheng2014-12-173-4/+23
| | | | | | | | | | | | | | | | | | | | After converting inline data to normal data, client need to flush the new i_inline_version (CEPH_INLINE_NONE) to MDS. This commit makes cap messages (sent to MDS) contain inline_version and inline_data. Client always converts inline data to normal data before data write, so the inline data length part is always zero. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: convert inline data to normal data before data writeYan, Zheng2014-12-173-3/+161
| | | | | | | | | | | | | | | | | | | | Before any data write, convert inline data to normal data and set i_inline_version to CEPH_INLINE_NONE. The OSD request that saves inline data to object contains 3 operations (CMPXATTR, WRITE and SETXATTR). It compares a xattr named 'inline_version' to prevent old data overwrites newer data. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: sync read inline dataYan, Zheng2014-12-172-13/+116
| | | | | | | | | | | | | | | | we can't use getattr to fetch inline data while holding Fr cap, because it can cause deadlock. If we need to sync read inline data, drop cap refs first, then use getattr to fetch inline data. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: fetch inline data when getting Fcr cap refsYan, Zheng2014-12-173-18/+63
| | | | | | | | | | | | | | | | | | | | we can't use getattr to fetch inline data after getting Fcr caps, because it can cause deadlock. The solution is try bringing inline data to page cache when not holding any cap, and hope the inline data page is still there after getting the Fcr caps. If the page is still there, pin it in page cache for later IO. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: use getattr request to fetch inline dataYan, Zheng2014-12-173-10/+32
| | | | | | | | | | | | | | Add a new parameter 'locked_page' to ceph_do_getattr(). If inline data in getattr reply will be copied to the page. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: add inline data to pagecacheYan, Zheng2014-12-175-1/+84
| | | | | | | | | | | | | | Request reply and cap message can contain inline data. add inline data to the page cache if there is Fc cap. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: parse inline data in MClientReply and MClientCapsYan, Zheng2014-12-173-11/+36
| | | | | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * libceph: specify position of extent operationYan, Zheng2014-12-172-6/+11
| | | | | | | | | | | | | | | | | | allow specifying position of extent operation in multi-operations osd request. This is required for cephfs to convert inline data to normal data (compare xattr, then write object). Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
| * ceph: remove unused stringification macrosIlya Dryomov2014-12-171-3/+0
| | | | | | | | | | | | These were used to report git versions a long time ago. Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
| * ceph: introduce global empty snap contextYan, Zheng2014-12-173-3/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current snaphost code does not properly handle moving inode from one empty snap realm to another empty snap realm. After changing inode's snap realm, some dirty pages' snap context can be not equal to inode's i_head_snap. This can trigger BUG() in ceph_put_wrbuffer_cap_refs() The fix is introduce a global empty snap context for all empty snap realm. This avoids triggering the BUG() for filesystem with no snapshot. Fixes: http://tracker.ceph.com/issues/9928 Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
| * ceph: message versioning fixesJohn Spray2014-12-171-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | There were two places we were assigning version in host byte order instead of network byte order. Also in MSG_CLIENT_SESSION we weren't setting compat_version in the header to reflect continued compatability with older MDSs. Fixes: http://tracker.ceph.com/issues/9945 Signed-off-by: John Spray <john.spray@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
| * libceph: message signature supportYan, Zheng2014-12-171-0/+16
| | | | | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph, rbd: delete unnecessary checks before two function callsSF Markus Elfring2014-12-173-12/+6
| | | | | | | | | | | | | | | | | | | | | | | | The functions ceph_put_snap_context() and iput() test whether their argument is NULL and then return immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> [idryomov@redhat.com: squashed rbd.c hunk, changelog] Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
| * ceph: introduce a new inode flag indicating if cached dentries are orderedYan, Zheng2014-12-173-19/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | After creating/deleting/renaming file, offsets of sibling dentries may change. So we can not use cached dentries to satisfy readdir. But we can still use the cached dentries to conclude -ENOENT for lookup. This patch introduces a new inode flag indicating if child dentries are ordered. The flag is set at the same time marking a directory complete. After creating/deleting/renaming file, we clear the flag on directory inode. This prevents ceph_readdir() from using cached dentries to satisfy readdir syscall. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: fix file lock interruptionYan, Zheng2014-12-173-10/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | When a lock operation is interrupted, current code sends a unlock request to MDS to undo the lock operation. This method does not work as expected because the unlock request can drop locks that have already been acquired. The fix is use the newly introduced CEPH_LOCK_FCNTL_INTR/CEPH_LOCK_FLOCK_INTR requests to interrupt blocked file lock request. These requests do not drop locks that have alread been acquired, they only interrupt blocked file lock request. Signed-off-by: Yan, Zheng <zyan@redhat.com>
* | Merge branch 'iov_iter' into for-nextAl Viro2014-12-081-1/+1
|\|
| * ceph: fix flush tid comparisionYan, Zheng2014-11-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | TID of cap flush ack is 64 bits, but ceph_inode_info::flushing_cap_tid is only 16 bits. 16 bits should be plenty to let the cap flush updates pipeline appropriately, but we need to cast in the proper direction when comparing these differently-sized versions. So downcast the 64-bits one to 16 bits. Reflects ceph.git commit a5184cf46a6e867287e24aeb731634828467cd98. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
* | kill f_dentry usesAl Viro2014-11-192-4/+4
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | assorted conversions to %p[dD]Al Viro2014-11-194-41/+33
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | switch d_materialise_unique() users to d_splice_alias()Al Viro2014-11-191-1/+1
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | move d_rcu from overlapping d_child to overlapping d_aliasAl Viro2014-11-032-5/+5
|/ | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2014-10-1512-177/+386
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "There is the long-awaited discard support for RBD (Guangliang Zhao, Josh Durgin), a pile of RBD bug fixes that didn't belong in late -rc's (Ilya Dryomov, Li RongQing), a pile of fs/ceph bug fixes and performance and debugging improvements (Yan, Zheng, John Spray), and a smattering of cleanups (Chao Yu, Fabian Frederick, Joe Perches)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits) ceph: fix divide-by-zero in __validate_layout() rbd: rbd workqueues need a resque worker libceph: ceph-msgr workqueue needs a resque worker ceph: fix bool assignments libceph: separate multiple ops with commas in debugfs output libceph: sync osd op definitions in rados.h libceph: remove redundant declaration ceph: additional debugfs output ceph: export ceph_session_state_name function ceph: include the initial ACL in create/mkdir/mknod MDS requests ceph: use pagelist to present MDS request data libceph: reference counting pagelist ceph: fix llistxattr on symlink ceph: send client metadata to MDS ceph: remove redundant code for max file size verification ceph: remove redundant io_iter_advance() ceph: move ceph_find_inode() outside the s_mutex ceph: request xattrs if xattr_version is zero rbd: set the remaining discard properties to enable support rbd: use helpers to handle discard for layered images correctly ...
| * ceph: fix divide-by-zero in __validate_layout()Yan, Zheng2014-10-141-1/+1
| | | | | | | | | | | | The 'stripe_unit' field is 64 bits, casting it to 32 bits can result zero. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: fix bool assignmentsFabian Frederick2014-10-141-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix some coccinelle warnings: fs/ceph/caps.c:2400:6-10: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2401:6-15: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2402:6-17: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2403:6-22: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2404:6-22: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2405:6-19: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2440:4-20: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2469:3-16: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2490:2-18: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2519:3-7: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2549:3-12: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2575:2-6: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2589:3-7: WARNING: Assignment of bool to 0/1 Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
| * ceph: additional debugfs outputJohn Spray2014-10-142-0/+47
| | | | | | | | | | | | | | MDS session state and client global ID is useful instrumentation when testing. Signed-off-by: John Spray <john.spray@redhat.com>
| * ceph: export ceph_session_state_name functionJohn Spray2014-10-142-7/+9
| | | | | | | | | | | | | | ...so that it can be used from the ceph debugfs code when dumping session info. Signed-off-by: John Spray <john.spray@redhat.com>
| * ceph: include the initial ACL in create/mkdir/mknod MDS requestsYan, Zheng2014-10-144-47/+170
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Current code set new file/directory's initial ACL in a non-atomic manner. Client first sends request to MDS to create new file/directory, then set the initial ACL after the new file/directory is successfully created. The fix is include the initial ACL in create/mkdir/mknod MDS requests. So MDS can handle creating file/directory and setting the initial ACL in one request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
| * ceph: use pagelist to present MDS request dataYan, Zheng2014-10-143-39/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | Current code uses page array to present MDS request data. Pages in the array are allocated/freed by caller of ceph_mdsc_do_request(). If request is interrupted, the pages can be freed while they are still being used by the request message. The fix is use pagelist to present MDS request data. Pagelist is reference counted. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
| * libceph: reference counting pagelistYan, Zheng2014-10-141-1/+0
| | | | | | | | | | | | | | this allow pagelist to present data that may be sent multiple times. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
| * ceph: fix llistxattr on symlinkYan, Zheng2014-10-141-2/+1
| | | | | | | | | | | | only regular file and directory have vxattrs. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: send client metadata to MDSJohn Spray2014-10-141-1/+70
| | | | | | | | | | | | | | | | | | | | Implement version 2 of CEPH_MSG_CLIENT_SESSION syntax, which includes additional client metadata to allow the MDS to report on clients by user-sensible names like hostname. Signed-off-by: John Spray <john.spray@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
| * ceph: remove redundant code for max file size verificationChao Yu2014-10-142-15/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Both ceph_update_writeable_page and ceph_setattr will verify file size with max size ceph supported. There are two caller for ceph_update_writeable_page, ceph_write_begin and ceph_page_mkwrite. For ceph_write_begin, we have already verified the size in generic_write_checks of ceph_write_iter; for ceph_page_mkwrite, we have no chance to change file size when mmap. Likewise we have already verified the size in inode_change_ok when we call ceph_setattr. So let's remove the redundant code for max file size verification. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
| * ceph: remove redundant io_iter_advance()Yan, Zheng2014-10-141-1/+0
| | | | | | | | | | | | | | ceph_sync_read and generic_file_read_iter() have already advanced the IO iterator. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: move ceph_find_inode() outside the s_mutexYan, Zheng2014-10-142-8/+10
| | | | | | | | | | | | | | | | | | ceph_find_inode() may wait on freeing inode, using it inside the s_mutex may cause deadlock. (the freeing inode is waiting for OSD read reply, but dispatch thread is blocked by the s_mutex) Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
| * ceph: request xattrs if xattr_version is zeroYan, Zheng2014-10-145-30/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Following sequence of events can happen. - Client releases an inode, queues cap release message. - A 'lookup' reply brings the same inode back, but the reply doesn't contain xattrs because MDS didn't receive the cap release message and thought client already has up-to-data xattrs. The fix is force sending a getattr request to MDS if xattrs_version is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so MDS knows client does not have xattr. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: make sure request isn't in any waiting list when kicking request.Yan, Zheng2014-10-141-0/+1
| | | | | | | | | | | | | | we may corrupt waiting list if a request in the waiting list is kicked. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
| * ceph: protect kick_requests() with mdsc->mutexYan, Zheng2014-10-141-2/+3
| | | | | | | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
| * ceph: trim unused inodes before reconnecting to recovering MDSYan, Zheng2014-10-141-10/+13
| | | | | | | | | | | | | | So the recovering MDS does not need to fetch these ununsed inodes during cache rejoin. This may reduce MDS recovery time. Signed-off-by: Yan, Zheng <zyan@redhat.com>
* | vfs: Remove d_drop calls from d_revalidate implementationsEric W. Biederman2014-10-091-1/+0
|/ | | | | | | | | | | | | | Now that d_invalidate always succeeds it is not longer necessary or desirable to hard code d_drop calls into filesystem specific d_revalidate implementations. Remove the unnecessary d_drop calls and rely on d_invalidate to drop the dentries. Using d_invalidate ensures that paths to mount points will not be dropped. Reviewed-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2014-08-135-17/+43
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "There is a lot of refactoring and hardening of the libceph and rbd code here from Ilya that fix various smaller bugs, and a few more important fixes with clone overlap. The main fix is a critical change to the request_fn handling to not sleep that was exposed by the recent mutex changes (which will also go to the 3.16 stable series). Yan Zheng has several fixes in here for CephFS fixing ACL handling, time stamps, and request resends when the MDS restarts. Finally, there are a few cleanups from Himangi Saraogi based on Coccinelle" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (39 commits) libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly rbd: remove extra newlines from rbd_warn() messages rbd: allocate img_request with GFP_NOIO instead GFP_ATOMIC rbd: rework rbd_request_fn() ceph: fix kick_requests() ceph: fix append mode write ceph: fix sizeof(struct tYpO *) typo ceph: remove redundant memset(0) rbd: take snap_id into account when reading in parent info rbd: do not read in parent info before snap context rbd: update mapping size only on refresh rbd: harden rbd_dev_refresh() and callers a bit rbd: split rbd_dev_spec_update() into two functions rbd: remove unnecessary asserts in rbd_dev_image_probe() rbd: introduce rbd_dev_header_info() rbd: show the entire chain of parent images ceph: replace comma with a semicolon rbd: use rbd_segment_name_free() instead of kfree() ceph: check zero length in ceph_sync_read() ceph: reset r_resend_mds after receiving -ESTALE ...
| * ceph: fix kick_requests()Yan, Zheng2014-08-071-2/+3
| | | | | | | | | | | | | | __do_request() may unregister the request. So we should update iterator 'p' before calling __do_request() Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
| * ceph: fix append mode writeYan, Zheng2014-07-281-6/+5
| | | | | | | | | | | | | | generic_write_checks() may update 'pos', so we need to pass 'pos' to ceph_sync_write() and ceph_sync_direct_write(); Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
| * ceph: fix sizeof(struct tYpO *) typoIlya Dryomov2014-07-281-1/+1
| | | | | | | | | | | | | | | | struct ceph_xattr -> struct ceph_inode_xattr Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
| * ceph: remove redundant memset(0)Ilya Dryomov2014-07-281-1/+1
| | | | | | | | | | | | | | | | xattrs array of pointers is allocated with kcalloc() - no need to memset() it to 0 right after that. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>