linux-stable.git - Linux kernel stable tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge branch 'work.misc' of ↵	Linus Torvalds	2017-07-08	10	-46/+61
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc filesystem updates from Al Viro: "Assorted normal VFS / filesystems stuff..." * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: dentry name snapshots Make statfs properly return read-only state after emergency remount fs/dcache: init in_lookup_hashtable minix: Deinline get_block, save 2691 bytes fs: Reorder inode_owner_or_capable() to avoid needless fs: warn in case userspace lied about modprobe return
\| *	dentry name snapshots	Al Viro	2017-07-07	6	-42/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	take_dentry_name_snapshot() takes a safe snapshot of dentry name; if the name is a short one, it gets copied into caller-supplied structure, otherwise an extra reference to external name is grabbed (those are never modified). In either case the pointer to stable string is stored into the same structure. dentry must be held by the caller of take_dentry_name_snapshot(), but may be freely dropped afterwards - the snapshot will stay until destroyed by release_dentry_name_snapshot(). Intended use: struct name_snapshot s; take_dentry_name_snapshot(&s, dentry); ... access s.name ... release_dentry_name_snapshot(&s); Replaces fsnotify_oldname_...(), gets used in fsnotify to obtain the name to pass down with event. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| *	Make statfs properly return read-only state after emergency remount	Carlos Maiolino	2017-06-29	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Emergency remount (sysrq-u) sets MS_RDONLY to the superblock but doesn't set MNT_READONLY to the mount point. Once calculate_f_flags() only check for the mount point read only state, when setting kstatfs flags, after an emergency remount, statfs does not report the filesystem as read-only, even though it is. Enable flags_by_sb() to also check for superblock read only state, so the kstatfs and consequently statfs can properly show the read-only state of the filesystem. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| *	fs/dcache: init in_lookup_hashtable	Sebastian Andrzej Siewior	2017-06-29	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in_lookup_hashtable was introduced in commit 94bdd655caba ("parallel lookups machinery, part 3") and never initialized but since it is in the data it is all zeros. But we need this for -RT. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| *	minix: Deinline get_block, save 2691 bytes	Denys Vlasenko	2017-06-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This function compiles to 1402 bytes of machine code. It has 2 callsites, and also a not-inlined copy gets created by compiler anyway since its address gets passed as a parameter to block_truncate_page(). Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| *	fs: Reorder inode_owner_or_capable() to avoid needless	Kees Cook	2017-06-29	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Checking for capabilities should be the last operation when performing access control tests so that PF_SUPERPRIV is set only when it was required for success (implying that the capability was needed for the operation). Reported-by: Solar Designer <solar@openwall.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| *	fs: warn in case userspace lied about modprobe return	Luis R. Rodriguez	2017-06-29	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	kmod <= v19 was broken -- it could return 0 to modprobe calls, incorrectly assuming that a kernel module was built-in, whereas in reality the module was just forming in the kernel. The reason for this is an incorrect userspace heuristics. A userspace kmod fix is available for it [0], however should userspace break again we could go on with an failed get_fs_type() which is hard to debug as the request_module() is detected as returning 0. The first suspect would be that there is something worth with the kernel's module loader and obviously in this case that is not the issue. Since these issues are painful to debug complain when we know userspace has outright lied to us. [0] http://git.kernel.org/cgit/utils/kernel/kmod/kmod.git/commit/libkmod/libkmod-module.c?id=fd44a98ae2eb5eb32161088954ab21e58e19dfc4 Suggested-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Jessica Yu <jeyu@redhat.com> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	Merge branch 'for-spi' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds	2017-07-08	1	-31/+11
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull spi uaccess delousing from Al Viro: "Getting rid of pointless __get_user() and friends in drivers/spi. [ the only reason it's on a separate branch is that I hoped it would be picked by spi folks; looks like mail asking them to grab it got lost and I hadn't followed up on that ]" * 'for-spi' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: spidev: quit messing with access_ok()
\| * \|	spidev: quit messing with access_ok()	Al Viro	2017-06-29	1	-31/+11
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \| \|	Merge branch 'work.__copy_in_user' of ↵	Linus Torvalds	2017-07-08	2	-16/+9
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull __copy_in_user removal from Al Viro: "There used to be 6 places in the entire tree calling __copy_in_user(), all of them bogus. Four got killed off in work.drm branch, this takes care of the remaining ones and kills the definition of that sucker" * 'work.__copy_in_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: kill __copy_in_user() sanitize do_i2c_smbus_ioctl()
\| * \| \|	kill __copy_in_user()	Al Viro	2017-07-04	1	-6/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	no users left Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \| \|	Merge branch 'work.drm' into work.__copy_in_user	Al Viro	2017-07-04	12	-1045/+476
\| \|\ \ \
\| * \| \| \|	sanitize do_i2c_smbus_ioctl()	Al Viro	2017-05-25	1	-10/+9
\| \| \|/ / \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	no need to mess with __copy_in_user() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \| \| \|	Merge branch 'work.read_write' of ↵	Linus Torvalds	2017-07-07	1	-2/+4
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull read/write fix from Al Viro: "file_start_write()/file_end_write() got mixed into vfs_iter_write() by accident; that's a deadlock for all existing callers - they already do that, some - quite a bit outside. Easily fixed, fortunately" * 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: move file_{start,end}_write() out of do_iter_write()
\| * \| \| \|	move file_{start,end}_write() out of do_iter_write()	Al Viro	2017-07-06	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	... and do not grab it in vfs_write_iter(). Fixes: "fs: implement vfs_iter_read using do_iter_read" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \| \| \| \|	Merge branch 'uaccess-work.iov_iter' of ↵	Linus Torvalds	2017-07-07	5	-75/+178
\|\ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull iov_iter hardening from Al Viro: "This is the iov_iter/uaccess/hardening pile. For one thing, it trims the inline part of copy_to_user/copy_from_user to the minimum that does need to be inlined - object size checks, basically. For another, it sanitizes the checks for iov_iter primitives. There are 4 groups of checks: access_ok(), might_fault(), object size and KASAN. - access_ok() had been verified by whoever had set the iov_iter up. However, that has happened in a function far away, so proving that there's no path to actual copying bypassing those checks is hard and proving that iov_iter has not been buggered in the meanwhile is also not pleasant. So we want those redone in actual copyin/copyout. - might_fault() is better off consolidated - we know whether it needs to be checked as soon as we enter iov_iter primitive and observe the iov_iter flavour. No need to wait until the copyin/copyout. The call chains are short enough to make sure we won't miss anything - in fact, it's more robust that way, since there are cases where we do e.g. forced fault-in before getting to copyin/copyout. It's not quite what we need to check (in particular, combination of iovec-backed and set_fs(KERNEL_DS) is almost certainly a bug, not a cause to skip checks), but that's for later series. For now let's keep might_fault(). - KASAN checks belong in copyin/copyout - at the same level where other iov_iter flavours would've hit them in memcpy(). - object size checks should apply to all iov_iter flavours, not just iovec-backed ones. There are two groups of primitives - one gets the kernel object described as pointer + size (copy_to_iter(), etc.) while another gets it as page + offset + size (copy_page_to_iter(), etc.) For the first group the checks are best done where we actually have a chance to find the object size. In other words, those belong in inline wrappers in uio.h, before calling into iov_iter.c. Same kind as we have for inlined part of copy_to_user(). For the second group there is no object to look at - offset in page is just a number, it bears no type information. So we do them in the common helper called by iov_iter.c primitives of that kind. All it currently does is checking that we are not trying to access outside of the compound page; eventually we might want to add some sanity checks on the page involved. So the things we need in copyin/copyout part of iov_iter.c do not quite match anything in uaccess.h (we want no zeroing, we do want access_ok() and KASAN and we want no might_fault() or object size checks done on that level). OTOH, these needs are simple enough to provide a couple of helpers (static in iov_iter.c) doing just what we need..." * 'uaccess-work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: iov_iter: saner checks on copyin/copyout iov_iter: sanity checks for copy to/from page primitives iov_iter/hardening: move object size checks to inlined part copy_{to,from}_user(): consolidate object size checks copy_{from,to}_user(): move kasan checks and might_fault() out-of-line
\| * \| \| \| \|	iov_iter: saner checks on copyin/copyout	Al Viro	2017-07-07	1	-16/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* might_fault() is better checked in caller (and e.g. fault-in + kmap_atomic codepath also needs might_fault() coverage) * we have already done object size checks * we have NOT done access_ok() recently enough; we rely upon the iovec array having passed sanity checks back when it had been created and not nothing having buggered it since. However, that's very much non-local, so we'd better recheck that. So the thing we want does not match anything in uaccess - we need access_ok + kasan checks + raw copy without any zeroing. Just define such helpers and use them here. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \| \| \| \|	iov_iter: sanity checks for copy to/from page primitives	Al Viro	2017-06-29	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	for now - just that we don't attempt to cross out of compound page Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \| \| \| \|	iov_iter/hardening: move object size checks to inlined part	Al Viro	2017-06-29	2	-16/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There we actually have useful information about object sizes. Note: this patch has them done for all iov_iter flavours. Right now we do them twice in iovec case, but that'll change very shortly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \| \| \| \|	copy_{to,from}_user(): consolidate object size checks	Al Viro	2017-06-29	2	-26/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	... and move them into thread_info.h, next to check_object_size() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \| \| \| \|	copy_{from,to}_user(): move kasan checks and might_fault() out-of-line	Al Viro	2017-06-29	2	-10/+16
\| \| \|/ / / \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \| \| \| \|	exec: Limit arg stack to at most 75% of _STK_LIM	Kees Cook	2017-07-07	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To avoid pathological stack usage or the need to special-case setuid execs, just limit all arg stack usage to at most 75% of _STK_LIM (6MB). Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \| \|	Merge tag 'for-linus-v4.13-2' of ↵	Linus Torvalds	2017-07-07	24	-63/+572
\|\ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull Writeback error handling updates from Jeff Layton: "This pile represents the bulk of the writeback error handling fixes that I have for this cycle. Some of the earlier patches in this pile may look trivial but they are prerequisites for later patches in the series. The aim of this set is to improve how we track and report writeback errors to userland. Most applications that care about data integrity will periodically call fsync/fdatasync/msync to ensure that their writes have made it to the backing store. For a very long time, we have tracked writeback errors using two flags in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a writeback error occurs (via mapping_set_error) and are cleared as a side-effect of filemap_check_errors (as you noted yesterday). This model really sucks for userland. Only the first task to call fsync (or msync or fdatasync) will see the error. Any subsequent task calling fsync on a file will get back 0 (unless another writeback error occurs in the interim). If I have several tasks writing to a file and calling fsync to ensure that their writes got stored, then I need to have them coordinate with one another. That's difficult enough, but in a world of containerized setups that coordination may even not be possible. But wait...it gets worse! The calls to filemap_check_errors can be buried pretty far down in the call stack, and there are internal callers of filemap_write_and_wait and the like that also end up clearing those errors. Many of those callers ignore the error return from that function or return it to userland at nonsensical times (e.g. truncate() or stat()). If I get back -EIO on a truncate, there is no reason to think that it was because some previous writeback failed, and a subsequent fsync() will (incorrectly) return 0. This pile aims to do three things: 1) ensure that when a writeback error occurs that that error will be reported to userland on a subsequent fsync/fdatasync/msync call, regardless of what internal callers are doing 2) report writeback errors on all file descriptions that were open at the time that the error occurred. This is a user-visible change, but I think most applications are written to assume this behavior anyway. Those that aren't are unlikely to be hurt by it. 3) document what filesystems should do when there is a writeback error. Today, there is very little consistency between them, and a lot of cargo-cult copying. We need to make it very clear what filesystems should do in this situation. To achieve this, the set adds a new data type (errseq_t) and then builds new writeback error tracking infrastructure around that. Once all of that is in place, we change the filesystems to use the new infrastructure for reporting wb errors to userland. Note that this is just the initial foray into cleaning up this mess. There is a lot of work remaining here: 1) convert the rest of the filesystems in a similar fashion. Once the initial set is in, then I think most other fs' will be fairly simple to convert. Hopefully most of those can in via individual filesystem trees. 2) convert internal waiters on writeback to use errseq_t for detecting errors instead of relying on the AS_* flags. I have some draft patches for this for ext4, but they are not quite ready for prime time yet. This was a discussion topic this year at LSF/MM too. If you're interested in the gory details, LWN has some good articles about this: https://lwn.net/Articles/718734/ https://lwn.net/Articles/724307/" * tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: btrfs: minimal conversion to errseq_t writeback error reporting on fsync xfs: minimal conversion to errseq_t writeback error reporting ext4: use errseq_t based error handling for reporting data writeback errors fs: convert __generic_file_fsync to use errseq_t based reporting block: convert to errseq_t based writeback error tracking dax: set errors in mapping when writeback fails Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error fs: new infrastructure for writeback error handling and reporting lib: add errseq_t type and infrastructure for handling it mm: don't TestClearPageError in __filemap_fdatawait_range mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails jbd2: don't clear and reset errors after waiting on writeback buffer: set errors in mapping at the time that the error occurs fs: check for writeback errors after syncing out buffers in generic_file_fsync buffer: use mapping_set_error instead of setting the flag mm: fix mapping_set_error call in me_pagecache_dirty
\| * \| \| \| \|	btrfs: minimal conversion to errseq_t writeback error reporting on fsync	Jeff Layton	2017-07-06	1	-5/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Just check and advance the errseq_t in the file before returning, and use an errseq_t based check for writeback errors. Other internal callers of filemap_* functions are left as-is. Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| * \| \| \| \|	xfs: minimal conversion to errseq_t writeback error reporting	Jeff Layton	2017-07-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Just check and advance the data errseq_t in struct file before before returning from fsync on normal files. Internal filemap_* callers are left as-is. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| * \| \| \| \|	ext4: use errseq_t based error handling for reporting data writeback errors	Jeff Layton	2017-07-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a call to filemap_report_wb_err at the end of ext4_sync_file. This will ensure that we check and advance the errseq_t in the file, which allows us to track and report errors on all open fds when they occur. Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| * \| \| \| \|	fs: convert __generic_file_fsync to use errseq_t based reporting	Jeff Layton	2017-07-06	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Many simple, block-based filesystems use generic_file_fsync as their fsync operation. Some others (ext* and fat) also call this function to handle syncing out data. Switch this code over to use errseq_t based error reporting so that all of these filesystems get reliable error reporting via fsync, fdatasync and msync. Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| * \| \| \| \|	block: convert to errseq_t based writeback error tracking	Jeff Layton	2017-07-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a very minimal conversion to errseq_t based error tracking for raw block device access. Just have it use the standard file_write_and_wait_range call. Note that there are internal callers that call sync_blockdev and the like that are not affected by this. They'll continue to use the AS_EIO/AS_ENOSPC flags for error reporting like they always have for now. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| * \| \| \| \|	dax: set errors in mapping when writeback fails	Jeff Layton	2017-07-06	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jan Kara's description for this patch is much better than mine, so I'm quoting it verbatim here: DAX currently doesn't set errors in the mapping when cache flushing fails in dax_writeback_mapping_range(). Since this function can get called only from fsync(2) or sync(2), this is actually as good as it can currently get since we correctly propagate the error up from dax_writeback_mapping_range() to filemap_fdatawrite() However, in the future better writeback error handling will enable us to properly report these errors on fsync(2) even if there are multiple file descriptors open against the file or if sync(2) gets called before fsync(2). So convert DAX to using standard error reporting through the mapping. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-and-tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
\| * \| \| \| \|	Documentation: flesh out the section in vfs.txt on storing and reporting ↵	Jeff Layton	2017-07-06	1	-3/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	writeback errors Let's try to make this extra clear for fs authors. Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| * \| \| \| \|	mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error	Jeff Layton	2017-07-06	1	-6/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a writeback error occurs, we want later callers to be able to pick up that fact when they go to wait on that writeback to complete. Traditionally, we've used AS_EIO/AS_ENOSPC flags to track that, but that's problematic since only one "checker" will be informed when an error occurs. In later patches, we're going to want to convert many of these callers to check for errors since a well-defined point in time. For now, ensure that we can handle both sorts of checks by both setting errors in both places when there is a writeback failure. Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| * \| \| \| \|	fs: new infrastructure for writeback error handling and reporting	Jeff Layton	2017-07-06	7	-1/+206
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Most filesystems currently use mapping_set_error and filemap_check_errors for setting and reporting/clearing writeback errors at the mapping level. filemap_check_errors is indirectly called from most of the filemap_fdatawait_* functions and from filemap_write_and_wait*. These functions are called from all sorts of contexts to wait on writeback to finish -- e.g. mostly in fsync, but also in truncate calls, getattr, etc. The non-fsync callers are problematic. We should be reporting writeback errors during fsync, but many places spread over the tree clear out errors before they can be properly reported, or report errors at nonsensical times. If I get -EIO on a stat() call, there is no reason for me to assume that it is because some previous writeback failed. The fact that it also clears out the error such that a subsequent fsync returns 0 is a bug, and a nasty one since that's potentially silent data corruption. This patch adds a small bit of new infrastructure for setting and reporting errors during address_space writeback. While the above was my original impetus for adding this, I think it's also the case that current fsync semantics are just problematic for userland. Most applications that call fsync do so to ensure that the data they wrote has hit the backing store. In the case where there are multiple writers to the file at the same time, this is really hard to determine. The first one to call fsync will see any stored error, and the rest get back 0. The processes with open fds may not be associated with one another in any way. They could even be in different containers, so ensuring coordination between all fsync callers is not really an option. One way to remedy this would be to track what file descriptor was used to dirty the file, but that's rather cumbersome and would likely be slow. However, there is a simpler way to improve the semantics here without incurring too much overhead. This set adds an errseq_t to struct address_space, and a corresponding one is added to struct file. Writeback errors are recorded in the mapping's errseq_t, and the one in struct file is used as the "since" value. This changes the semantics of the Linux fsync implementation such that applications can now use it to determine whether there were any writeback errors since fsync(fd) was last called (or since the file was opened in the case of fsync having never been called). Note that those writeback errors may have occurred when writing data that was dirtied via an entirely different fd, but that's the case now with the current mapping_set_error/filemap_check_error infrastructure. This will at least prevent you from getting a false report of success. The new behavior is still consistent with the POSIX spec, and is more reliable for application developers. This patch just adds some basic infrastructure for doing this, and ensures that the f_wb_err "cursor" is properly set when a file is opened. Later patches will change the existing code to use this new infrastructure for reporting errors at fsync time. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
\| * \| \| \| \|	lib: add errseq_t type and infrastructure for handling it	Jeff Layton	2017-07-06	4	-1/+234
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An errseq_t is a way of recording errors in one place, and allowing any number of "subscribers" to tell whether an error has been set again since a previous time. It's implemented as an unsigned 32-bit value that is managed with atomic operations. The low order bits are designated to hold an error code (max size of MAX_ERRNO). The upper bits are used as a counter. The API works with consumers sampling an errseq_t value at a particular point in time. Later, that value can be used to tell whether new errors have been set since that time. Note that there is a 1 in 512k risk of collisions here if new errors are being recorded frequently, since we have so few bits to use as a counter. To mitigate this, one bit is used as a flag to tell whether the value has been sampled since a new value was recorded. That allows us to avoid bumping the counter if no one has sampled it since it was last bumped. Later patches will build on this infrastructure to change how writeback errors are tracked in the kernel. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: NeilBrown <neilb@suse.com> Reviewed-by: Jan Kara <jack@suse.cz>
\| * \| \| \| \|	mm: don't TestClearPageError in __filemap_fdatawait_range	Jeff Layton	2017-07-06	1	-15/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The -EIO returned here can end up overriding whatever error is marked in the address space, and be returned at fsync time, even when there is a more appropriate error stored in the mapping. Read errors are also sometimes tracked on a per-page level using PG_error. Suppose we have a read error on a page, and then that page is subsequently dirtied by overwriting the whole page. Writeback doesn't clear PG_error, so we can then end up successfully writing back that page and still return -EIO on fsync. Worse yet, PG_error is cleared during a sync() syscall, but the -EIO return from that is silently discarded. Any subsystem that is relying on PG_error to report errors during fsync can easily lose writeback errors due to this. All you need is a stray sync() call to wait for writeback to complete and you've lost the error. Since the handling of the PG_error flag is somewhat inconsistent across subsystems, let's just rely on marking the address space when there are writeback errors. Change the TestClearPageError call to ClearPageError, and make __filemap_fdatawait_range a void return function. Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| * \| \| \| \|	mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails	Jeff Layton	2017-07-06	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	filemap_write_and_wait{_range} will return an error if writeback initiation fails, but won't clear errors in the address_space. This is particularly problematic on DAX, as filemap_fdatawrite* is effectively synchronous there. Ensure that we clear the AS_EIO/AS_ENOSPC flags when filemap_fdatawrite* returns an error. Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| * \| \| \| \|	jbd2: don't clear and reset errors after waiting on writeback	Jeff Layton	2017-07-06	3	-15/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Resetting this flag is almost certainly racy, and will be problematic with some coming changes. Make filemap_fdatawait_keep_errors return int, but not clear the flag(s). Have jbd2 call it instead of filemap_fdatawait and don't attempt to re-set the error flag if it fails. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| * \| \| \| \|	buffer: set errors in mapping at the time that the error occurs	Jeff Layton	2017-07-06	3	-8/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I noticed on xfs that I could still sometimes get back an error on fsync on a fd that was opened after the error condition had been cleared. The problem is that the buffer code sets the write_io_error flag and then later checks that flag to set the error in the mapping. That flag perisists for quite a while however. If the file is later opened with O_TRUNC, the buffers will then be invalidated and the mapping's error set such that a subsequent fsync will return error. I think this is incorrect, as there was no writeback between the open and fsync. Add a new mark_buffer_write_io_error operation that sets the flag and the error in the mapping at the same time. Replace all calls to set_buffer_write_io_error with mark_buffer_write_io_error, and remove the places that check this flag in order to set the error in the mapping. This sets the error in the mapping earlier, at the time that it's first detected. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
\| * \| \| \| \|	fs: check for writeback errors after syncing out buffers in generic_file_fsync	Jeff Layton	2017-07-06	2	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ext2 currently does a test+clear of the AS_EIO flag, which is is problematic for some coming changes. What we really need to do instead is call filemap_check_errors in __generic_file_fsync after syncing out the buffers. That will be sufficient for this case, and help other callers detect these errors properly as well. With that, we don't need to twiddle it in ext2. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
\| * \| \| \| \|	buffer: use mapping_set_error instead of setting the flag	Jeff Layton	2017-07-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
\| * \| \| \| \|	mm: fix mapping_set_error call in me_pagecache_dirty	Jeff Layton	2017-07-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The error code should be negative. Since this ends up in the default case anyway, this is harmless, but it's less confusing to negate it. Also, later patches will require a negative error code here. Link: http://lkml.kernel.org/r/20170525103355.6760-1-jlayton@redhat.com Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* \| \| \| \| \|	Merge tag 'for-linus-v4.13-1' of ↵	Linus Torvalds	2017-07-07	12	-26/+23
\|\ \ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull Writeback error handling fixes from Jeff Layton: "The main rationale for all of these changes is to tighten up writeback error reporting to userland. There are many ways now that writeback errors can be lost, such that fsync/fdatasync/msync return 0 when writeback actually failed. This pile contains a small set of cleanups and writeback error handling fixes that I was able to break off from the main pile (#2). Two of the patches in this pile are trivial. The exceptions are the patch to fix up error handling in write_one_page, and the patch to make JFS pay attention to write_one_page errors" * tag 'for-linus-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fs: remove call_fsync helper function mm: clean up error handling in write_one_page JFS: do not ignore return code from write_one_page() mm: drop "wait" parameter from write_one_page()
\| * \| \| \| \| \|	fs: remove call_fsync helper function	Jeff Layton	2017-07-05	3	-8/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| * \| \| \| \| \|	mm: clean up error handling in write_one_page	Jeff Layton	2017-07-05	1	-7/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't try to check PageError since that's potentially racy and not necessarily going to be set after writepage errors out. Instead, check the mapping for an error after writepage returns. That should also help us detect errors that occurred if the VM tried to clean the page earlier due to memory pressure. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
\| * \| \| \| \| \|	JFS: do not ignore return code from write_one_page()	Dave Kleikamp	2017-07-05	2	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are a couple places where jfs calls write_one_page() where clean recovery is not possible. In these cases, the file system should be marked dirty. To do this, it is now necessary to store the superblock in the metapage structure. Link: http://lkml.kernel.org/r/db45ab67-55c7-08ff-6776-f76b3bf5cbf5@oracle.com Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Jeff Layton <jlayton@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| * \| \| \| \| \|	mm: drop "wait" parameter from write_one_page()	Jeff Layton	2017-07-05	8	-15/+15
\| \|/ / / / / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The callers all set it to 1. Also, make it clear that this function will not set any sort of AS_* error, and that the caller must do so if necessary. No existing caller uses this on normal files, so none of them need it. Also, add __must_check here since, in general, the callers need to handle an error here in some fashion. Link: http://lkml.kernel.org/r/20170525103303.6524-1-jlayton@redhat.com Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* \| \| \| \| \|	Merge tag 'cifs-bug-fixes-for-4.13' of git://git.samba.org/sfrench/cifs-2.6	Linus Torvalds	2017-07-07	8	-24/+218
\|\ \ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull cifs fixes from Steve French: "First set of CIFS/SMB3 fixes for the merge window. Also improves POSIX character mapping for SMB3" * tag 'cifs-bug-fixes-for-4.13' of git://git.samba.org/sfrench/cifs-2.6: CIFS: fix circular locking dependency cifs: set oparms.create_options rather than or'ing in CREATE_OPEN_BACKUP_INTENT cifs: Do not modify mid entry after submitting I/O in cifs_call_async CIFS: add SFM mapping for 0x01-0x1F cifs: hide unused functions cifs: Use smb 2 - 3 and cifsacl mount options getacl functions cifs: prototype declaration and definition for smb 2 - 3 and cifsacl mount options CIFS: add CONFIG_CIFS_DEBUG_KEYS to dump encryption keys cifs: set mapping error when page writeback fails in writepage or launder_pages SMB3: Enable encryption for SMB3.1.1
\| * \| \| \| \| \|	CIFS: fix circular locking dependency	Rabin Vincent	2017-07-05	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a CIFS filesystem is mounted with the forcemand option and the following command is run on it, lockdep warns about a circular locking dependency between CifsInodeInfo::lock_sem and the inode lock. while echo foo > hello; do :; done & while touch -c hello; do :; done cifs_writev() takes the locks in the wrong order, but note that we can't only flip the order around because it releases the inode lock before the call to generic_write_sync() while it holds the lock_sem across that call. But, AFAICS, there is no need to hold the CifsInodeInfo::lock_sem across the generic_write_sync() call either, so we can release both the locks before generic_write_sync(), and change the order. ====================================================== WARNING: possible circular locking dependency detected 4.12.0-rc7+ #9 Not tainted ------------------------------------------------------ touch/487 is trying to acquire lock: (&cifsi->lock_sem){++++..}, at: cifsFileInfo_put+0x88f/0x16a0 but task is already holding lock: (&sb->s_type->i_mutex_key#11){+.+.+.}, at: utimes_common+0x3ad/0x870 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sb->s_type->i_mutex_key#11){+.+.+.}: __lock_acquire+0x1f74/0x38f0 lock_acquire+0x1cc/0x600 down_write+0x74/0x110 cifs_strict_writev+0x3cb/0x8c0 __vfs_write+0x4c1/0x930 vfs_write+0x14c/0x2d0 SyS_write+0xf7/0x240 entry_SYSCALL_64_fastpath+0x1f/0xbe -> #0 (&cifsi->lock_sem){++++..}: check_prevs_add+0xfa0/0x1d10 __lock_acquire+0x1f74/0x38f0 lock_acquire+0x1cc/0x600 down_write+0x74/0x110 cifsFileInfo_put+0x88f/0x16a0 cifs_setattr+0x992/0x1680 notify_change+0x61a/0xa80 utimes_common+0x3d4/0x870 do_utimes+0x1c1/0x220 SyS_utimensat+0x84/0x1a0 entry_SYSCALL_64_fastpath+0x1f/0xbe other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sb->s_type->i_mutex_key#11); lock(&cifsi->lock_sem); lock(&sb->s_type->i_mutex_key#11); lock(&cifsi->lock_sem); * DEADLOCK * 2 locks held by touch/487: #0: (sb_writers#10){.+.+.+}, at: mnt_want_write+0x41/0xb0 #1: (&sb->s_type->i_mutex_key#11){+.+.+.}, at: utimes_common+0x3ad/0x870 stack backtrace: CPU: 0 PID: 487 Comm: touch Not tainted 4.12.0-rc7+ #9 Call Trace: dump_stack+0xdb/0x185 print_circular_bug+0x45b/0x790 __lock_acquire+0x1f74/0x38f0 lock_acquire+0x1cc/0x600 down_write+0x74/0x110 cifsFileInfo_put+0x88f/0x16a0 cifs_setattr+0x992/0x1680 notify_change+0x61a/0xa80 utimes_common+0x3d4/0x870 do_utimes+0x1c1/0x220 SyS_utimensat+0x84/0x1a0 entry_SYSCALL_64_fastpath+0x1f/0xbe Fixes: 19dfc1f5f2ef03a52 ("cifs: fix the race in cifs_writev()") Signed-off-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Steve French <smfrench@gmail.com> Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
\| * \| \| \| \| \|	cifs: set oparms.create_options rather than or'ing in CREATE_OPEN_BACKUP_INTENT	Colin Ian King	2017-07-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently oparms.create_options is uninitialized and the code is logically or'ing in CREATE_OPEN_BACKUP_INTENT onto a garbage value of oparms.create_options from the stack. Fix this by just setting the value rather than or'ing in the setting. Detected by CoverityScan, CID#1447220 ("Unitialized scale value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
\| * \| \| \| \| \|	cifs: Do not modify mid entry after submitting I/O in cifs_call_async	Long Li	2017-07-05	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In cifs_call_async, server may respond as soon as I/O is submitted. Because mid entry is freed on the return path, it should not be modified after I/O is submitted. cifs_save_when_sent modifies the sent timestamp in mid entry, and should not be called after I/O. Call it before I/O. Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com>
\| * \| \| \| \| \|	CIFS: add SFM mapping for 0x01-0x1F	Björn JACKE	2017-07-05	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Hi, attached patch adds more missing mappings for the 0x01-0x1f range. Please review, if you're fine with it, considere it also for stable. Björn >From a97720c26db2ee77d4e798e3d383fcb6a348bd29 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B6rn=20Jacke?= <bjacke@samba.org> Date: Wed, 31 May 2017 22:48:41 +0200 Subject: [PATCH] cifs: add SFM mapping for 0x01-0x1F 0x1-0x1F has to be mapped to 0xF001-0xF01F Signed-off-by: Bjoern Jacke <bjacke@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>