summaryrefslogtreecommitdiffstats
path: root/fs/ceph/caps.c
Commit message (Collapse)AuthorAgeFilesLines
* ceph: defer flushing the capsnap if the Fb is usedXiubo Li2021-02-161-13/+20
| | | | | | | | | | If the Fb cap is used it means the current inode is flushing the dirty data to OSD, just defer flushing the capsnap. URL: https://tracker.ceph.com/issues/48640 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: allow queueing cap/snap handling after putting cap referencesJeff Layton2021-02-161-4/+25
| | | | | | | | | | | | | | | | Testing with the fscache overhaul has triggered some lockdep warnings about circular lock dependencies involving page_mkwrite and the mmap_lock. It'd be better to do the "real work" without the mmap lock being held. Change the skip_checking_caps parameter in __ceph_put_cap_refs to an enum, and use that to determine whether to queue check_caps, do it synchronously or not at all. Change ceph_page_mkwrite to do a ceph_put_cap_refs_async(). Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: fix flush_snap logic after putting capsJeff Layton2021-02-161-4/+6
| | | | | | | | | | | | | | | A primary reason for skipping ceph_check_caps after putting the references was to avoid the locking in ceph_check_caps during a reconnect. __ceph_put_cap_refs can still call ceph_flush_snaps in that case though, and that takes many of the same inconvenient locks. Fix the logic in __ceph_put_cap_refs to skip flushing snaps when the skip_checking_caps flag is set. Fixes: e64f44a88465 ("ceph: skip checking caps when session reconnecting and releasing reqs") Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: fix race in concurrent __ceph_remove_cap invocationsLuis Henriques2020-12-141-2/+9
| | | | | | | | | | | | | | | | | A NULL pointer dereference may occur in __ceph_remove_cap with some of the callbacks used in ceph_iterate_session_caps, namely trim_caps_cb and remove_session_caps_cb. Those callers hold the session->s_mutex, so they are prevented from concurrent execution, but ceph_evict_inode does not. Since the callers of this function hold the i_ceph_lock, the fix is simply a matter of returning immediately if caps->ci is NULL. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/43272 Suggested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Luis Henriques <lhenriques@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: fix up some warnings on W=1 buildsJeff Layton2020-12-141-7/+4
| | | | | | | | Convert some decodes into unused variables into skips, and fix up some non-kerneldoc comment headers to not start with "/**". Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: add new RECOVER mount_state when recovering sessionJeff Layton2020-12-141-1/+1
| | | | | | | | | | | | | | | | | When recovering a session (a'la recover_session=clean), we want to do all of the operations that we do on a forced umount, but changing the mount state to SHUTDOWN is can cause queued MDS requests to fail when the session comes back. Most of those can idle until the session is recovered in this situation. Reserve SHUTDOWN state for forced umount, and make a new RECOVER state for the forced reconnect situation. Change several tests for equality with SHUTDOWN to test for that or RECOVER. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: don't WARN when removing caps due to blocklistingJeff Layton2020-12-141-1/+2
| | | | | | | | | | | | We expect to remove dirty caps when the client is blocklisted. Don't throw a warning in that case. [ idryomov: break unnecessarily long line ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: check session state after bumping session->s_seqJeff Layton2020-11-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some messages sent by the MDS entail a session sequence number increment, and the MDS will drop certain types of requests on the floor when the sequence numbers don't match. In particular, a REQUEST_CLOSE message can cross with one of the sequence morphing messages from the MDS which can cause the client to stall, waiting for a response that will never come. Originally, this meant an up to 5s delay before the recurring workqueue job kicked in and resent the request, but a recent change made it so that the client would never resend, causing a 60s stall unmounting and sometimes a blockisting event. Add a new helper for incrementing the session sequence and then testing to see whether a REQUEST_CLOSE needs to be resent, and move the handling of CEPH_MDS_SESSION_CLOSING into that function. Change all of the bare sequence counter increments to use the new helper. Reorganize check_session_state with a switch statement. It should no longer be called when the session is CLOSING, so throw a warning if it ever is (but still handle that case sanely). [ idryomov: whitespace, pr_err() call fixup ] URL: https://tracker.ceph.com/issues/47563 Fixes: fa9967734227 ("ceph: fix potential mdsc use-after-free crash") Reported-by: Patrick Donnelly <pdonnell@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: comment cleanups and clarificationsJeff Layton2020-10-121-0/+16
| | | | | Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: break up send_cap_msgJeff Layton2020-10-121-32/+28
| | | | | | | | Push the allocation of the msg and the send into the caller. Rename the function to encode_cap_msg and make it void return. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: drop separate mdsc argument from __send_capJeff Layton2020-10-121-6/+5
| | | | | | | We can get it from the session if we need it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: metrics for opened files, pinned caps and opened inodesXiubo Li2020-10-121-2/+36
| | | | | | | | | | | | | | | | | | | | | | | | | In client for each inode, it may have many opened files and may have been pinned in more than one MDS servers. And some inodes are idle, which have no any opened files. This patch will show these metrics in the debugfs, likes: item total ----------------------------------------- opened files / total inodes 14 / 5 pinned i_caps / total inodes 7 / 5 opened inodes / total inodes 3 / 5 Will send these metrics to ceph, which will be used by the `fs top`, later. [ jlayton: drop unrelated hunk, count hashed inodes instead of allocated ones ] URL: https://tracker.ceph.com/issues/47005 Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: add ceph_sb_to_mdsc helper support to parse the mdscXiubo Li2020-10-121-2/+1
| | | | | | | | | | This will help simplify the code. [ jlayton: fix minor merge conflict in quota.c ] Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: fix inode number handling on arches with 32-bit ino_tJeff Layton2020-08-241-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tuan and Ulrich mentioned that they were hitting a problem on s390x, which has a 32-bit ino_t value, even though it's a 64-bit arch (for historical reasons). I think the current handling of inode numbers in the ceph driver is wrong. It tries to use 32-bit inode numbers on 32-bit arches, but that's actually not a problem. 32-bit arches can deal with 64-bit inode numbers just fine when userland code is compiled with LFS support (the common case these days). What we really want to do is just use 64-bit numbers everywhere, unless someone has mounted with the ino32 mount option. In that case, we want to ensure that we hash the inode number down to something that will fit in 32 bits before presenting the value to userland. Add new helper functions that do this, and only do the conversion before presenting these values to userland in getattr and readdir. The inode table hashvalue is changed to just cast the inode number to unsigned long, as low-order bits are the most likely to vary anyway. While it's not strictly required, we do want to put something in inode->i_ino. Instead of basing it on BITS_PER_LONG, however, base it on the size of the ino_t type. NOTE: This is a user-visible change on 32-bit arches: 1/ inode numbers will be seen to have changed between kernel versions. 32-bit arches will see large inode numbers now instead of the hashed ones they saw before. 2/ any really old software not built with LFS support may start failing stat() calls with -EOVERFLOW on inode numbers >2^32. Nothing much we can do about these, but hopefully the intersection of people running such code on ceph will be very small. The workaround for both problems is to mount with "-o ino32". [ idryomov: changelog tweak ] URL: https://tracker.ceph.com/issues/46828 Reported-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> Reported-and-Tested-by: Tuan Hoang1 <Tuan.Hoang1@ibm.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: clean up and optimize ceph_check_delayed_caps()Jeff Layton2020-08-031-6/+4
| | | | | | | | | Make this loop look a bit more sane. Also optimize away the spinlock release/reacquire if we can't get an inode reference. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: add global total_caps to count the mdsc's total caps numberXiubo Li2020-08-031-0/+2
| | | | | | | | | This will help to reduce using the global mdsc->mutex lock in many places. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: skip checking caps when session reconnecting and releasing reqsXiubo Li2020-06-011-2/+13
| | | | | | | | | | | It make no sense to check the caps when reconnecting to mds. And for the async dirop caps, they will be put by its _cb() function, so when releasing the requests, it will make no sense too. URL: https://tracker.ceph.com/issues/45635 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: ceph_kick_flushing_caps needs the s_mutexJeff Layton2020-06-011-0/+2
| | | | | | | | | | | | | | | | The mdsc->cap_dirty_lock is not held while walking the list in ceph_kick_flushing_caps, which is not safe. ceph_early_kick_flushing_caps does something similar, but the s_mutex is held while it's called and I think that guards against changes to the list. Ensure we hold the s_mutex when calling ceph_kick_flushing_caps, and add some clarifying comments. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: request expedited service on session's last cap flushJeff Layton2020-06-011-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | When flushing a lot of caps to the MDS's at once (e.g. for syncfs), we can end up waiting a substantial amount of time for MDS replies, due to the fact that it may delay some of them so that it can batch them up together in a single journal transaction. This can lead to stalls when calling sync or syncfs. What we'd really like to do is request expedited service on the _last_ cap we're flushing back to the server. If the CHECK_CAPS_FLUSH flag is set on the request and the current inode was the last one on the session->s_cap_dirty list, then mark the request with CEPH_CLIENT_CAPS_SYNC. Note that this heuristic is not perfect. New inodes can race onto the list after we've started flushing, but it does seem to fix some common use cases. URL: https://tracker.ceph.com/issues/44744 Reported-by: Jan Fajerski <jfajerski@suse.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: convert mdsc->cap_dirty to a per-session listJeff Layton2020-06-011-13/+65
| | | | | | | | | | | | | | | | | | | | This is a per-sb list now, but that makes it difficult to tell when the cap is the last dirty one associated with the session. Switch this to be a per-session list, but continue using the mdsc->cap_dirty_lock to protect the lists. This list is only ever walked in ceph_flush_dirty_caps, so change that to walk the sessions array and then flush the caps for inodes on each session's list. If the auth cap ever changes while the inode has dirty caps, then move the inode to the appropriate session for the new auth_cap. Also, ensure that we never remove an auth cap while the inode is still on the s_cap_dirty list. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: reset i_requested_max_size if file write is not wantedYan, Zheng2020-06-011-10/+19
| | | | | | | | | | | | | | | | | | | | write can stuck at waiting for larger max_size in following sequence of events: - client opens a file and writes to position 'A' (larger than unit of max size increment) - client closes the file handle and updates wanted caps (not wanting file write caps) - client opens and truncates the file, writes to position 'A' again. At the 1st event, client set inode's requested_max_size to 'A'. At the 2nd event, mds removes client's writable range, but client does not reset requested_max_size. At the 3rd event, client does not request max size because requested_max_size is already larger than 'A'. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: fix potential race in ceph_check_capsJeff Layton2020-06-011-1/+13
| | | | | | | | | | | Nothing ensures that session will still be valid by the time we dereference the pointer. Take and put a reference. In principle, we should always be able to get a reference here, but throw a warning if that's ever not the case. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: don't take i_ceph_lock in handle_cap_importJeff Layton2020-06-011-3/+2
| | | | | | | | | Just take it before calling it. This means we have to do a couple of minor in-memory operations under the spinlock now, but those shouldn't be an issue. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: don't release i_ceph_lock in handle_cap_truncJeff Layton2020-06-011-8/+10
| | | | | | | | There's no reason to do this here. Just have the caller handle it. Also, add a lockdep assertion. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: add comments for handle_cap_flush_ack logicJeff Layton2020-06-011-1/+13
| | | | | Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: split up __finish_cap_flushJeff Layton2020-06-011-31/+29
| | | | | | | | | | | This function takes a mdsc argument or ci argument, but if both are passed in, it ignores the ci arg. Fortunately, nothing does that, but there's no good reason to have the same function handle both cases. Also, get rid of some branches and just use |= to set the wake_* vals. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: reorganize __send_cap for less spinlock abuseJeff Layton2020-06-011-79/+86
| | | | | | | | | | | | | Get rid of the __releases annotation by breaking it up into two functions: __prep_cap which is done under the spinlock and __send_cap that is done outside it. Add new fields to cap_msg_args for the wake boolean and old_xattr_buf pointer. Nothing checks the return value from __send_cap, so make it void return. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: add caps perf metric for each superblockXiubo Li2020-06-011-0/+19
| | | | | | | | | | | Count hits and misses in the caps cache. If the client has all of the necessary caps when a task needs references, then it's counted as a hit. Any other situation is a miss. URL: https://tracker.ceph.com/issues/43215 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: flush release queue when handling caps for unknown inodeJeff Layton2020-05-271-1/+1
| | | | | | | | | | | | | | | | | | It's possible for the VFS to completely forget about an inode, but for it to still be sitting on the cap release queue. If the MDS sends the client a cap message for such an inode, it just ignores it today, which can lead to a stall of up to 5s until the cap release queue is flushed. If we get a cap message for an inode that can't be located, then go ahead and flush the cap release queue. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/45532 Fixes: 1e9c2eb6811e ("ceph: delete stale dentry when last reference is dropped") Reported-and-Tested-by: Andrej Filipčič <andrej.filipcic@ijs.si> Suggested-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: fix double unlock in handle_cap_export()Wu Bo2020-05-041-0/+1
| | | | | | | | | | | If the ceph_mdsc_open_export_target_session() return fails, it will do a "goto retry", but the session mutex has already been unlocked. Re-lock the mutex in that case to ensure that we don't unlock it twice. Signed-off-by: Wu Bo <wubo40@huawei.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: fix special error code in ceph_try_get_caps()Wu Bo2020-05-041-1/+1
| | | | | | | | | | There are 3 speical error codes: -EAGAIN/-EFBIG/-ESTALE. After calling try_get_cap_refs, ceph_try_get_caps test for the -EAGAIN twice. Ensure that it tests for -ESTALE instead. Signed-off-by: Wu Bo <wubo40@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: wait for async creating inode before requesting new max sizeYan, Zheng2020-03-301-0/+5
| | | | | | | | | | | ceph_check_caps() can't request new max size for async creating inode. This may make ceph_get_caps() loop busily until getting reply of the async create. Also, wait for async creating reply before calling ceph_renew_caps(). Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: don't skip updating wanted caps when cap is staleYan, Zheng2020-03-301-2/+6
| | | | | | | | | | | | | | | | 1. try_get_cap_refs() fails to get caps and finds that mds_wanted does not include what it wants. It returns -ESTALE. 2. ceph_get_caps() calls ceph_renew_caps(). ceph_renew_caps() finds that inode has cap, so it calls ceph_check_caps(). 3. ceph_check_caps() finds that issued caps (without checking if it's stale) already includes caps wanted by open file, so it skips updating wanted caps. Above events can cause an infinite loop inside ceph_get_caps(). Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: request new max size only when there is auth capYan, Zheng2020-03-301-1/+1
| | | | | | | | | When there is no auth cap, check_max_size() can't do anything and may cause an infinite loop inside ceph_get_caps(). Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: cleanup return error of try_get_cap_refs()Yan, Zheng2020-03-301-11/+15
| | | | | | | | | | | | | | | | Returns 0 if caps were not able to be acquired (yet), 1 if cap acquisition succeeded, or a negative error code. There are 3 special error codes: -EAGAIN: need to sleep but non-blocking is specified -EFBIG: ask caller to call check_max_size() and try again. -ESTALE: ask caller to call ceph_renew_caps() and try again. [ jlayton: add WARN_ON_ONCE check for -EAGAIN ] Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: check all mds' caps after page writebackYan, Zheng2020-03-301-1/+1
| | | | | | | | | | | | | If an inode has caps from multiple mds's, the following can happen: - non-auth mds revokes Fsc. Fcb is used, so page writeback is queued. - when writeback finishes, ceph_check_caps() is called with auth only flag. ceph_check_caps() invalidates pagecache, but skips checking any non-auth caps. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: update i_requested_max_size only when sending cap msg to auth mdsYan, Zheng2020-03-301-1/+2
| | | | | | | | Non-auth mds can't do anything to 'update max' cap message. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: simplify calling of ceph_get_fmode()Yan, Zheng2020-03-301-23/+3
| | | | | | | | | | | | | | | | | | Originally, calling ceph_get_fmode() for open files is by thread that handles request reply. There is a small window between updating caps and and waking the request initiator. We need to prevent ceph_check_caps() from releasing wanted caps in the window. Previous patches made fill_inode() call __ceph_touch_fmode() for open file requests. This prevented ceph_check_caps() from releasing wanted caps for 'caps_wanted_delay_min' seconds, enough for request initiator to get woken up and call ceph_get_fmode(). This allows us to now call ceph_get_fmode() in ceph_open() instead. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: remove delay check logic from ceph_check_caps()Yan, Zheng2020-03-301-111/+37
| | | | | | | | | | __ceph_caps_file_wanted() already checks 'caps_wanted_delay_min' and 'caps_wanted_delay_max'. There is no need to duplicate the logic in ceph_check_caps() and __send_cap() Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: consider inode's last read/write when calculating wanted capsYan, Zheng2020-03-301-54/+129
| | | | | | | | | | | | | | | | | | | | | | | Add i_last_rd and i_last_wr to ceph_inode_info. These fields are used to track the last time the client acquired read/write caps for the inode. If there is no read/write on an inode for 'caps_wanted_delay_max' seconds, __ceph_caps_file_wanted() does not request caps for read/write even there are open files. Call __ceph_touch_fmode() for dir operations. __ceph_caps_file_wanted() calculates dir's wanted caps according to last dir read/modification. If there is recent dir read, dir inode wants CEPH_CAP_ANY_SHARED caps. If there is recent dir modification, also wants CEPH_CAP_FILE_EXCL. Readdir is a special case. Dir inode wants CEPH_CAP_FILE_EXCL after readdir, as with that, modifications do not need to release CEPH_CAP_FILE_SHARED or invalidate all dentry leases issued by readdir. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: always renew caps if mds_wanted is insufficientYan, Zheng2020-03-301-21/+15
| | | | | | | | | | | Original code only renews caps for inodes with CEPH_I_CAP_DROPPED flag, which indicates that mds has closed the session and caps were dropped. Remove this flag in preparation for not requesting caps for idle open files. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: cache layout in parent dir on first sync createJeff Layton2020-03-301-3/+10
| | | | | | | | | | | | | | | | If a create is done, then typically we'll end up writing to the file soon afterward. We don't want to wait for the reply before doing that when doing an async create, so that means we need the layout for the new file before we've gotten the response from the MDS. All files created in a directory will initially inherit the same layout, so copy off the requisite info from the first synchronous create in the directory, and save it in a new i_cached_layout field. Zero out the layout when we lose Dc caps in the dir. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: don't take refs to want mask unless we have all bitsYan, Zheng2020-03-301-1/+4
| | | | | | | | | If we don't have all of the cap bits for the want mask in try_get_cap_refs, then just take refs on the need bits. Signed-off-by: "Yan, Zheng" <ukernel@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: cap tracking for async directory operationsJeff Layton2020-03-301-8/+19
| | | | | | | | | | | | | | | Track and correctly handle directory caps for asynchronous operations. Add aliases for Frc caps that we now designate at Dcu caps (when dealing with directories). Unlike file caps, we don't reclaim these when the session goes away, and instead preemptively release them. In-flight async dirops are instead handled during reconnect phase. The client needs to re-do a synchronous operation in order to re-get directory caps. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: make __take_cap_refs non-staticJeff Layton2020-03-301-6/+6
| | | | | | | | | Rename it to ceph_take_cap_refs and make it available to other files. Also replace a comment with a lockdep assertion. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: add infrastructure for waiting for async create to completeJeff Layton2020-03-301-1/+12
| | | | | | | | | | | | | When we issue an async create, we must ensure that any later on-the-wire requests involving it wait for the create reply. Expand i_ceph_flags to be an unsigned long, and add a new bit that MDS requests can wait on. If the bit is set in the inode when sending caps, then don't send it and just return that it has been delayed. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: more caps.c lockdep assertionsJeff Layton2020-03-301-0/+3
| | | | | | Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: clean up kick_flushing_inode_caps()Jeff Layton2020-03-301-12/+9
| | | | | | | | | | | | | | | The last thing that this function does is release i_ceph_lock, so have the caller do that instead. Add a lockdep assertion to ensure that the function is always called with i_ceph_lock held. Change the prototype to take a ceph_inode_info pointer and drop the separate mdsc argument as we can get that from the session. While at it, make it non-static. We'll need this to kick any flushing caps once the create reply comes in. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: check inode type for CEPH_CAP_FILE_{CACHE,RD,REXTEND,LAZYIO}Yan, Zheng2020-03-301-12/+32
| | | | | | | | These bits will have new meaning for directory inodes. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: add refcounting for Fx capsJeff Layton2020-03-301-0/+7
| | | | | | | | | In future patches we'll be taking and relying on Fx caps. Add proper refcounting for them. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>