summaryrefslogtreecommitdiffstats
path: root/fs/ceph
Commit message (Collapse)AuthorAgeFilesLines
...
| * | ceph: fix error handling of start_read()Yan, Zheng2016-10-031-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If start_page() fails to add a page to page cache or fails to send OSD request. It should cal put_page() (instead of free_page()) for relevant pages. Besides, start_page() need to cancel fscache readpage if it fails to send OSD request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reported-by: Zhi Zhang <zhang.david2011@gmail.com>
* | | Merge remote-tracking branch 'jk/vfs' into work.miscAl Viro2016-10-082-12/+18
|\ \ \ | |_|/ |/| |
| * | fs: Give dentry to inode_change_ok() instead of inodeJan Kara2016-09-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | inode_change_ok() will be resposible for clearing capabilities and IMA extended attributes and as such will need dentry. Give it as an argument to inode_change_ok() instead of an inode. Also rename inode_change_ok() to setattr_prepare() to better relect that it does also some modifications in addition to checks. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
| * | ceph: Propagate dentry down to inode_change_ok()Jan Kara2016-09-222-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To avoid clearing of capabilities or security related extended attributes too early, inode_change_ok() will need to take dentry instead of inode. ceph_setattr() has the dentry easily available but __ceph_setattr() is also called from ceph_set_acl() where dentry is not easily available. Luckily that call path does not need inode_change_ok() to be called anyway. So reorganize functions a bit so that inode_change_ok() is called only from paths where dentry is available. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * | posix_acl: Clear SGID bit when setting file permissionsJan Kara2016-09-221-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When file permissions are modified via chmod(2) and the user is not in the owning group or capable of CAP_FSETID, the setgid bit is cleared in inode_change_ok(). Setting a POSIX ACL via setxattr(2) sets the file permissions as well as the new ACL, but doesn't clear the setgid bit in a similar way; this allows to bypass the check in chmod(2). Fix that. References: CVE-2016-7097 Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* | | ceph: do not modify fi->frag in need_reset_readdir()Nicolas Iooss2016-09-051-1/+1
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset") modified "if (fpos_frag(new_pos) != fi->frag)" to "if (fi->frag |= fpos_frag(new_pos))" in need_reset_readdir(), thus replacing a comparison operator with an assignment one. This looks like a typo which is reported by clang when building the kernel with some warning flags: fs/ceph/dir.c:600:22: error: using the result of an assignment as a condition without parentheses [-Werror,-Wparentheses] } else if (fi->frag |= fpos_frag(new_pos)) { ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~ fs/ceph/dir.c:600:22: note: place parentheses around the assignment to silence this warning } else if (fi->frag |= fpos_frag(new_pos)) { ^ ( ) fs/ceph/dir.c:600:22: note: use '!=' to turn this compound assignment into an inequality comparison } else if (fi->frag |= fpos_frag(new_pos)) { ^~ != Fixes: f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset") Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | ceph: initialize pathbase in the !dentry case in encode_caps_cb()Ilya Dryomov2016-08-091-0/+1
| | | | | | | | | | | | | | | | pathbase is the base inode; set it to 0 if we've got no path. Coverity-id: 146348 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
* | ceph: fix null pointer dereference in ceph_flush_snaps()Yan, Zheng2016-08-081-1/+4
|/ | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
* Merge tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds2016-08-0213-752/+999
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull Ceph updates from Ilya Dryomov: "The highlights are: - RADOS namespace support in libceph and CephFS (Zheng Yan and myself). The stopgaps added in 4.5 to deny access to inodes in namespaces are removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature bit is now fully supported - A large rework of the MDS cap flushing code (Zheng Yan) - Handle some of ->d_revalidate() in RCU mode (Jeff Layton). We were overly pessimistic before, bailing at the first sight of LOOKUP_RCU On top of that we've got a few CephFS bug fixes, a couple of cleanups and Arnd's workaround for a weird genksyms issue" * tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client: (34 commits) ceph: fix symbol versioning for ceph_monc_do_statfs ceph: Correctly return NXIO errors from ceph_llseek ceph: Mark the file cache as unreclaimable ceph: optimize cap flush waiting ceph: cleanup ceph_flush_snaps() ceph: kick cap flushes before sending other cap message ceph: introduce an inode flag to indicates if snapflush is needed ceph: avoid sending duplicated cap flush message ceph: unify cap flush and snapcap flush ceph: use list instead of rbtree to track cap flushes ceph: update types of some local varibles ceph: include 'follows' of pending snapflush in cap reconnect message ceph: update cap reconnect message to version 3 ceph: mount non-default filesystem by name libceph: fsmap.user subscription support ceph: handle LOOKUP_RCU in ceph_d_revalidate ceph: allow dentry_lease_is_valid to work under RCU walk ceph: clear d_fsinfo pointer under d_lock ceph: remove ceph_mdsc_lease_release ceph: don't use ->d_time ...
| * ceph: Correctly return NXIO errors from ceph_llseekPhil Turnbull2016-07-281-7/+5
| | | | | | | | | | | | | | | | | | ceph_llseek does not correctly return NXIO errors because the 'out' path always returns 'offset'. Fixes: 06222e491e66 ("fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek") Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: Mark the file cache as unreclaimableNikolay Borisov2016-07-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ceph creates multiple caches with the SLAB_RECLAIMABLE flag set, so that it can satisfy its internal needs. Inspecting the code shows that most of the caches are indeed reclaimable since they are directly related to the generic inode/dentry shrinkers. However, one of the cache used to satisfy struct file is not reclaimable since its entries are freed only when the last reference to the file is dropped. If a heavily loaded node opens a lot of files it can introduce non-trivial discrepancies between memory shown as reclaimable and what is actually reclaimed when drop_caches is used. Fix this by removing the reclaimable flag for the file's cache. Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: optimize cap flush waitingYan, Zheng2016-07-283-27/+73
| | | | | | | | | | | | | | | | | | | | | | | | Add a 'wake' flag to ceph_cap_flush struct, which indicates if there is someone waiting for it to finish. When getting flush ack message, we check the 'wake' flag in corresponding ceph_cap_flush struct to decide if we should wake up waiters. One corner case is that the acked cap flush has 'wake' flags is set, but it is not the first one on the flushing list. We do not wake up waiters in this case, set 'wake' flags of preceding ceph_cap_flush struct instead Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: cleanup ceph_flush_snaps()Yan, Zheng2016-07-283-88/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch devide __ceph_flush_snaps() into two stags. In the first stage, __ceph_flush_snaps() assign snapcaps flush TIDs and add them to cap flush lists. __ceph_flush_snaps() keeps holding the i_ceph_lock in this stagge. So inode's auth cap can not change. In the second stage, __ceph_flush_snaps() send flushsnap cap messages. i_ceph_lock is unlocked before sending each cap message. If auth cap changes in the middle, __ceph_flush_snaps() just stops. This is OK because kick_flushing_inode_caps() will re-send flushsnap cap messages to inode's new auth MDS. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: kick cap flushes before sending other cap messageYan, Zheng2016-07-281-9/+34
| | | | | | | | | | | | | | If ceph_check_caps() wants to send cap message to a recovering MDS, make sure it kicks cap flushes first. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: introduce an inode flag to indicates if snapflush is neededYan, Zheng2016-07-283-15/+36
| | | | | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: avoid sending duplicated cap flush messageYan, Zheng2016-07-282-1/+18
| | | | | | | | | | | | | | make ceph_kick_flushing_caps() ignore inodes whose cap flushes have already been re-sent by ceph_early_kick_flushing_caps() Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: unify cap flush and snapcap flushYan, Zheng2016-07-285-222/+175
| | | | | | | | | | | | | | | | | | | | | | | | This patch includes following changes - Assign flush tid to snapcap flush - Remove session's s_cap_snaps_flushing list. Add inode to session's s_cap_flushing list instead. Inode is removed from the list when there is no pending snapcap flush or cap flush. - make __kick_flushing_caps() re-send both snapcap flushes and cap flushes. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: use list instead of rbtree to track cap flushesYan, Zheng2016-07-285-118/+56
| | | | | | | | | | | | | | | | We don't have requirement of searching cap flush by TID. In most cases, we just need to know TID of the oldest cap flush. List is ideal for this usage. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: update types of some local variblesYan, Zheng2016-07-281-11/+12
| | | | | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: include 'follows' of pending snapflush in cap reconnect messageYan, Zheng2016-07-281-1/+16
| | | | | | | | | | | | | | This helps the recovering MDS to reconstruct the internal states that tracking pending snapflush. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: update cap reconnect message to version 3Yan, Zheng2016-07-281-21/+47
| | | | | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: mount non-default filesystem by nameYan, Zheng2016-07-284-17/+117
| | | | | | | | | | | | | | | | | | | | | | To mount non-default filesytem, user currently needs to provide mds namespace ID. This is inconvenience. This patch makes user be able to mount filesystem by name. If user wants to mount non-default filesystem. Client first subscribes to fsmap.user. Subscribe to mdsmap.<ID> after getting ID of filesystem. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: handle LOOKUP_RCU in ceph_d_revalidateJeff Layton2016-07-281-6/+14
| | | | | | | | | | | | | | | | | | We can now handle the snapshot cases under RCU, as well as the non-snapshot case when we don't need to queue up a lease renewal allow LOOKUP_RCU walks to proceed under those conditions. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
| * ceph: allow dentry_lease_is_valid to work under RCU walkJeff Layton2016-07-281-15/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Under rcuwalk, we need to take extra care when dereferencing d_parent. We want to do that once and pass a pointer to dentry_lease_is_valid. Also, we must ensure that that function can handle the case where we're racing with d_release. Check whether "di" is NULL under the d_lock, and just return 0 if so. Finally, we still need to kick off a renewal job if the lease is getting close to expiration. If that's the case, then just drop out of rcuwalk mode since that could block. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
| * ceph: clear d_fsinfo pointer under d_lockJeff Layton2016-07-281-1/+5
| | | | | | | | | | | | | | | | | | | | | | To check for a valid dentry lease, we need to get at the ceph_dentry_info. Under rcuwalk though, we may end up with a dentry that is on its way to destruction. Since we need to take the d_lock in dentry_lease_is_valid already, we can just ensure that we clear the d_fsinfo pointer out under the same lock before destroying it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
| * ceph: remove ceph_mdsc_lease_releaseJeff Layton2016-07-282-45/+0
| | | | | | | | | | | | | | Nothing calls it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
| * ceph: don't use ->d_timeMiklos Szeredi2016-07-284-8/+8
| | | | | | | | | | | | | | Pretty simple: just use ceph_dentry_info.time instead (which was already there, unused). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * ceph: fix spelling mistake: "resgister" -> "register"Colin Ian King2016-07-281-1/+1
| | | | | | | | | | | | trivial fix to spelling mistake in pr_err message Signed-off-by: Colin Ian King <colin.king@canonical.com>
| * ceph: fix NULL dereference in ceph_queue_cap_snap()Yan, Zheng2016-07-281-1/+1
| | | | | | | | | | | | old_snapc->seq is used in dout(...) Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: wait unsafe sync writes for evicting inodeYan, Zheng2016-07-285-48/+61
| | | | | | | | | | | | Otherwise ceph_sync_write_unsafe() may access/modify freed inode. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: fix use-after-free bug in ceph_direct_read_write()Yan, Zheng2016-07-281-2/+5
| | | | | | | | | | | | | | ceph_aio_complete() can free the ceph_aio_request struct before the code exits the while loop. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: reduce i_nr_by_mode array sizeYan, Zheng2016-07-284-22/+35
| | | | | | | | | | | | | | Track usage count for individual fmode bit. This can reduce the array size by half. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: set user pages dirty after direct IO readYan, Zheng2016-07-281-2/+2
| | | | | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: rados pool namespace supportYan, Zheng2016-07-288-77/+159
| | | | | | | | | | | | | | | | This patch adds codes that decode pool namespace information in cap message and request reply. Pool namespace is saved in i_layout, it will be passed to libceph when doing read/write. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * libceph: rados pool namespace supportYan, Zheng2016-07-281-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Add pool namesapce pointer to struct ceph_file_layout and struct ceph_object_locator. Pool namespace is used by when mapping object to PG, it's also used when composing OSD request. The namespace pointer in struct ceph_file_layout is RCU protected. So libceph can read namespace without taking lock. Signed-off-by: Yan, Zheng <zyan@redhat.com> [idryomov@gmail.com: ceph_oloc_destroy(), misc minor changes] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: define new ceph_file_layout structureYan, Zheng2016-07-287-45/+43
| | | | | | | | | | | | | | | | Define new ceph_file_layout structure and rename old ceph_file_layout to ceph_file_layout_legacy. This is preparation for adding namespace to ceph_file_layout structure. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * libceph: add an ONSTACK initializer for oidsIlya Dryomov2016-07-281-1/+1
| | | | | | | | | | | | | | | | | | | | An on-stack oid in ceph_ioctl_get_dataloc() is not initialized, resulting in a WARN and a NULL pointer dereference later on. We will have more of these on-stack in the future, so fix it with a convenience macro. Fixes: d30291b985d1 ("libceph: variable-sized ceph_object_id") Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | Merge branch 'salted-string-hash'Linus Torvalds2016-07-282-3/+3
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This changes the vfs dentry hashing to mix in the parent pointer at the _beginning_ of the hash, rather than at the end. That actually improves both the hash and the code generation, because we can move more of the computation to the "static" part of the dcache setup, and do less at lookup runtime. It turns out that a lot of other hash users also really wanted to mix in a base pointer as a 'salt' for the hash, and so the slightly extended interface ends up working well for other cases too. Users that want a string hash that is purely about the string pass in a 'salt' pointer of NULL. * merge branch 'salted-string-hash': fs/dcache.c: Save one 32-bit multiply in dcache lookup vfs: make the string hashes salt the hash
| * vfs: make the string hashes salt the hashLinus Torvalds2016-06-102-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We always mixed in the parent pointer into the dentry name hash, but we did it late at lookup time. It turns out that we can simplify that lookup-time action by salting the hash with the parent pointer early instead of late. A few other users of our string hashes also wanted to mix in their own pointers into the hash, and those are updated to use the same mechanism. Hash users that don't have any particular initial salt can just use the NULL pointer as a no-salt. Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Use the right predicate in ->atomic_open() instancesAl Viro2016-07-051-1/+1
| | | | | | | | | | | | | | | | ->atomic_open() can be given an in-lookup dentry *or* a negative one found in dcache. Use d_in_lookup() to tell one from another, rather than d_unhashed(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'for-linus' of ↵Linus Torvalds2016-07-011-7/+3
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Tmpfs readdir throughput regression fix (this cycle) + some -stable fodder all over the place. One missing bit is Miklos' tonight locks.c fix - NFS folks had already grabbed that one by the time I woke up ;-)" [ The locks.c fix came through the nfsd tree just moments ago ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: namespace: update event counter when umounting a deleted dentry 9p: use file_dentry() ceph: fix d_obtain_alias() misuses lockless next_positive() libfs.c: new helper - next_positive() dcache_{readdir,dir_lseek}(): don't bother with nested ->d_lock
| * ceph: fix d_obtain_alias() misusesAl Viro2016-06-241-7/+3
| | | | | | | | | | | | on failure d_obtain_alias() will have done iput() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | ceph: use i_version to check validity of fscacheYan, Zheng2016-06-011-0/+3
| | | | | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
* | ceph: improve fscache revalidationYan, Zheng2016-06-014-83/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are several issues in fscache revalidation code. - In ceph_revalidate_work(), fscache_invalidate() is called when fscache_check_consistency() return 0. This is complete wrong because 0 means cache is valid. - Handle_cap_grant() calls ceph_queue_revalidate() if client already has CAP_FILE_CACHE. This code is confusing. Client should revalidate the cache each time it got CAP_FILE_CACHE anew. - In Handle_cap_grant(), fscache_invalidate() is called if MDS revokes CAP_FILE_CACHE. This is inconsistency with the case that inode get evicted. In the later case, the cache is not discarded. Client may use the cache when inode is reloaded. This patch moves the fscache revalidation into ceph_get_caps(). Client revalidates the cache after it gets CAP_FILE_CACHE. i_rdcache_gen should keep constance while CAP_FILE_CACHE is used. If i_fscache_gen is not equal to i_rdcache_gen, client needs to check cache's consistency. Signed-off-by: Yan, Zheng <zyan@redhat.com>
* | ceph: disable fscache when inode is opened for writeYan, Zheng2016-06-014-53/+52
| | | | | | | | | | | | | | | | All other filesystems do not add dirty pages to fscache. They all disable fscache when inode is opened for write. Only ceph adds dirty pages to fscache, but the code is buggy. Signed-off-by: Yan, Zheng <zyan@redhat.com>
* | ceph: avoid unnecessary fscache invalidation/revlidationYan, Zheng2016-06-011-6/+3
| | | | | | | | | | | | | | ceph_fill_file_size() has already called ceph_fscache_invalidate() if it return true. Signed-off-by: Yan, Zheng <zyan@redhat.com>
* | ceph: call __fscache_uncache_page() if readpages failsYan, Zheng2016-06-011-1/+3
| | | | | | | | | | | | If readpages fails, fscache needs to cleanup its internal state. Signed-off-by: Yan, Zheng <zyan@redhat.com>
* | libceph: change ceph_osdmap_flag() to take osdcIlya Dryomov2016-05-301-4/+4
|/ | | | | | | For the benefit of every single caller, take osdc instead of map. Also, now that osdc->osdmap can't ever be NULL, drop the check. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* Merge branch 'for-linus' of ↵Linus Torvalds2016-05-271-3/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Followups to the parallel lookup work: - update docs - restore killability of the places that used to take ->i_mutex killably now that we have down_write_killable() merged - Additionally, it turns out that I missed a prerequisite for security_d_instantiate() stuff - ->getxattr() wasn't the only thing that could be called before dentry is attached to inode; with smack we needed the same treatment applied to ->setxattr() as well" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch ->setxattr() to passing dentry and inode separately switch xattr_handler->set() to passing dentry and inode separately restore killability of old mutex_lock_killable(&inode->i_mutex) users add down_write_killable_nested() update D/f/directory-locking
| * switch xattr_handler->set() to passing dentry and inode separatelyAl Viro2016-05-271-3/+4
| | | | | | | | | | | | | | preparation for similar switch in ->setxattr() (see the next commit for rationale). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>