summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* ext4: flush background discard kwork when retry allocationWang Jianchao2021-08-303-3/+13
| | | | | | | | | | | | The background discard kwork tries to mark blocks used and issue discard. This can make filesystem suffer from NOSPC error, xfstest generic/371 can fail due to it. Fix it by flushing discard kwork in ext4_should_retry_alloc. At the same time, give up discard at the moment. Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com> Link: https://lore.kernel.org/r/20210830075246.12516-6-jianchao.wan9@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: get discard out of jbd2 commit kthread contexWang Jianchao2021-08-302-25/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now, discard is issued and waited to be completed in jbd2 commit kthread context after the logs are committed. When large amount of files are deleted and discard is flooding, jbd2 commit kthread can be blocked for long time. Then all of the metadata operations can be blocked to wait the log space. One case is the page fault path with read mm->mmap_sem held, which wants to update the file time but has to wait for the log space. When other threads in the task wants to do mmap, then write mmap_sem is blocked. Finally all of the following read mmap_sem requirements are blocked, even the ps command which need to read the /proc/pid/ -cmdline. Our monitor service which needs to read /proc/pid/cmdline used to be blocked for 5 mins. This patch frees the blocks back to buddy after commit and then do discard in a async kworker context in fstrim fashion, namely, - mark blocks to be discarded as used if they have not been allocated - do discard - mark them free After this, jbd2 commit kthread won't be blocked any more by discard and we won't get NOSPC even if the discard is slow or throttled. Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2 Suggested-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com> Link: https://lore.kernel.org/r/20210830075246.12516-5-jianchao.wan9@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: remove the repeated comment of ext4_trim_all_freeWang Jianchao2021-08-121-6/+1
| | | | | | | | Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210724074124.25731-4-jianchao.wan9@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: add new helper interface ext4_try_to_trim_range()Wang Jianchao2021-08-121-45/+57
| | | | | | | | | | | | | There is no functional change in this patch but just split the codes, which serachs free block and does trim, into a new function ext4_try_to_trim_range. This is preparing for the following async backgroup discard. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210724074124.25731-3-jianchao.wan9@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: remove the 'group' parameter of ext4_trim_extentWang Jianchao2021-08-121-5/+4
| | | | | | | | | | | Get rid of the 'group' parameter of ext4_trim_extent as we can get it from the 'e4b'. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210724074124.25731-2-jianchao.wan9@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: clean up two gcc -Wall warnings in recovery.cTheodore Ts'o2021-08-101-3/+3
| | | | | | | | | Fix a signed vs unsigned and a void * pointer arithmetic warning. This cleanup is also in e2fsprogs commit aec460db9a93 ("e2fsck: clean up two gcc -Wall warnings in recovery.c"). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: fix clang warning in recovery.cTheodore Ts'o2021-08-101-1/+0
| | | | | | | | | | Remove unused variable store which was never used. This fix is also in e2fsprogs commit 99a2294f85f0 ("e2fsck: value stored to err is never read"). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: fix portability problems caused by unaligned accessesTheodore Ts'o2021-08-101-11/+11
| | | | | | | | | | | | | This commit applies the e2fsck/recovery.c portions of commit 1e0c8ca7c08a ("e2fsck: fix portability problems caused by unaligned accesses) from the e2fsprogs git tree. The on-disk format for the ext4 journal can have unaigned 32-bit integers. This can happen when replaying a journal using a obsolete checksum format (which was never popularly used, since the v3 format replaced v2 while the metadata checksum feature was being stablized). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix potential htree corruption when growing large_dir directoriesTheodore Ts'o2021-08-061-1/+1
| | | | | | | | | | | | | Commit b5776e7524af ("ext4: fix potential htree index checksum corruption) removed a required restart when multiple levels of index nodes need to be split. Fix this to avoid directory htree corruptions when using the large_dir feature. Cc: stable@kernel.org # v5.11 Cc: Благодаренко Артём <artem.blagodarenko@gmail.com> Fixes: b5776e7524af ("ext4: fix potential htree index checksum corruption) Reported-by: Denis <denis@voxelsoft.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: remove conflicting comment from __ext4_forgetGuoqing Jiang2021-07-231-3/+0
| | | | | | | | | | We do a bforget and return for no journal case, so let's remove this conflict comment. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Link: https://lore.kernel.org/r/20210714055940.1553705-1-guoqing.jiang@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix potential uninitialized access to retval in kmmpdYe Bin2021-07-231-1/+1
| | | | | | | | | | if (!ext4_has_feature_mmp(sb)) then retval can be unitialized before we jump to the wait_to_exit label. Fixes: 61bb4a1c417e ("ext4: fix possible UAF when remounting r/o a mmp-protected file system") Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20210713022728.2533770-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* Merge tag 'xfs-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2021-07-189-37/+174
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull xfs fixes from Darrick Wong: "A few fixes for issues in the new online shrink code, additional corrections for my recent bug-hunt w.r.t. extent size hints on realtime, and improved input checking of the GROWFSRT ioctl. IOW, the usual 'I somehow got bored during the merge window and resumed auditing the farther reaches of xfs': - Fix shrink eligibility checking when sparse inode clusters enabled - Reset '..' directory entries when unlinking directories to prevent verifier errors if fs is shrinked later - Don't report unusable extent size hints to FSGETXATTR - Don't warn when extent size hints are unusable because the sysadmin configured them that way - Fix insufficient parameter validation in GROWFSRT ioctl - Fix integer overflow when adding rt volumes to filesystem" * tag 'xfs-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: detect misaligned rtinherit directory extent size hints xfs: fix an integer overflow error in xfs_growfs_rt xfs: improve FSGROWFSRT precondition checking xfs: don't expose misaligned extszinherit hints to userspace xfs: correct the narrative around misaligned rtinherit/extszinherit dirs xfs: reset child dir '..' entry when unlinking child xfs: check for sparse inode clusters that cross new EOAG when shrinking
| * xfs: detect misaligned rtinherit directory extent size hintsDarrick J. Wong2021-07-151-2/+16
| | | | | | | | | | | | | | | | | | | | | | | | If we encounter a directory that has been configured to pass on an extent size hint to a new realtime file and the hint isn't an integer multiple of the rt extent size, we should flag the hint for administrative review because that is a misconfiguration (that other parts of the kernel will fix automatically). Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * xfs: fix an integer overflow error in xfs_growfs_rtDarrick J. Wong2021-07-151-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | During a realtime grow operation, we run a single transaction for each rt bitmap block added to the filesystem. This means that each step has to be careful to increase sb_rblocks appropriately. Fix the integer overflow error in this calculation that can happen when the extent size is very large. Found by running growfs to add a rt volume to a filesystem formatted with a 1g rt extent size. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * xfs: improve FSGROWFSRT precondition checkingDarrick J. Wong2021-07-151-7/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve the checking at the start of a realtime grow operation so that we avoid accidentally set a new extent size that is too large and avoid adding an rt volume to a filesystem with rmap or reflink because we don't support rt rmap or reflink yet. While we're at it, separate the checks so that we're only testing one aspect at a time. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * xfs: don't expose misaligned extszinherit hints to userspaceDarrick J. Wong2021-07-151-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 603f000b15f2 changed xfs_ioctl_setattr_check_extsize to reject an attempt to set an EXTSZINHERIT extent size hint on a directory with RTINHERIT set if the hint isn't a multiple of the realtime extent size. However, I have recently discovered that it is possible to change the realtime extent size when adding a rt device to a filesystem, which means that the existence of directories with misaligned inherited hints is not an accident. As a result, it's possible that someone could have set a valid hint and added an rt volume with a different rt extent size, which invalidates the ondisk hints. After such a sequence, FSGETXATTR will report a misaligned hint, which FSSETXATTR will trip over, causing confusion if the user was doing the usual GET/SET sequence to change some other attribute. Change xfs_fill_fsxattr to omit the hint if it isn't aligned properly. Fixes: 603f000b15f2 ("xfs: validate extsz hints against rt extent size when rtinherit is set") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * xfs: correct the narrative around misaligned rtinherit/extszinherit dirsDarrick J. Wong2021-07-153-22/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While auditing the realtime growfs code, I realized that the GROWFSRT ioctl (and by extension xfs_growfs) has always allowed sysadmins to change the realtime extent size when adding a realtime section to the filesystem. Since we also have always allowed sysadmins to set RTINHERIT and EXTSZINHERIT on directories even if there is no realtime device, this invalidates the premise laid out in the comments added in commit 603f000b15f2. In other words, this is not a case of inadequate metadata validation. This is a case of nearly forgotten (and apparently untested) but supported functionality. Update the comments to reflect what we've learned, and remove the log message about correcting the misalignment. Fixes: 603f000b15f2 ("xfs: validate extsz hints against rt extent size when rtinherit is set") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * xfs: reset child dir '..' entry when unlinking childDarrick J. Wong2021-07-151-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While running xfs/168, I noticed a second source of post-shrink corruption errors causing shutdowns. Let's say that directory B has a low inode number and is a child of directory A, which has a high number. If B is empty but open, and unlinked from A, B's dotdot link continues to point to A. If A is then unlinked and the filesystem shrunk so that A is no longer a valid inode, a subsequent AIL push of B will trip the inode verifiers because the dotdot entry points outside of the filesystem. To avoid this problem, reset B's dotdot entry to the root directory when unlinking directories, since the root directory cannot be removed. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
| * xfs: check for sparse inode clusters that cross new EOAG when shrinkingDarrick J. Wong2021-07-153-0/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While running xfs/168, I noticed occasional write verifier shutdowns involving inodes at the very end of the filesystem. Existing inode btree validation code checks that all inode clusters are fully contained within the filesystem. However, due to inadequate checking in the fs shrink code, it's possible that there could be a sparse inode cluster at the end of the filesystem where the upper inodes of the cluster are marked as holes and the corresponding blocks are free. In this case, the last blocks in the AG are listed in the bnobt. This enables the shrink to proceed but results in a filesystem that trips the inode verifiers. Fix this by disallowing the shrink. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | Merge tag 'iomap-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2021-07-182-20/+13
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull iomap fixes from Darrick Wong: "A handful of bugfixes for the iomap code. There's nothing especially exciting here, just fixes for UBSAN (not KASAN as I erroneously wrote in the tag message) warnings about undefined behavior in the SEEK_DATA/SEEK_HOLE code, and some reshuffling of per-page block state info to fix some problems with gfs2. - Fix KASAN warnings due to integer overflow in SEEK_DATA/SEEK_HOLE - Fix assertion errors when using inlinedata files on gfs2" * tag 'iomap-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: Don't create iomap_page objects in iomap_page_mkwrite_actor iomap: Don't create iomap_page objects for inline files iomap: Permit pages without an iop to enter writeback iomap: remove the length variable in iomap_seek_hole iomap: remove the length variable in iomap_seek_data
| * | iomap: Don't create iomap_page objects in iomap_page_mkwrite_actorAndreas Gruenbacher2021-07-151-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we create those objects in iomap_writepage_map when needed, there's no need to pre-create them in iomap_page_mkwrite_actor anymore. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
| * | iomap: Don't create iomap_page objects for inline filesAndreas Gruenbacher2021-07-151-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In iomap_readpage_actor, don't create iop objects for inline inodes. Otherwise, iomap_read_inline_data will set PageUptodate without setting iop->uptodate, and iomap_page_release will eventually complain. To prevent this kind of bug from occurring in the future, make sure the page doesn't have private data attached in iomap_read_inline_data. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
| * | iomap: Permit pages without an iop to enter writebackAndreas Gruenbacher2021-07-151-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create an iop in the writeback path if one doesn't exist. This allows us to avoid creating the iop in some cases. We'll initially do that for pages with inline data, but it can be extended to pages which are entirely within an extent. It also allows for an iop to be removed from pages in the future (eg page split). Co-developed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
| * | iomap: remove the length variable in iomap_seek_holeChristoph Hellwig2021-07-151-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The length variable is rather pointless given that it can be trivially deduced from offset and size. Also the initial calculation can lead to KASAN warnings. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Leizhen (ThunderTown) <thunder.leizhen@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
| * | iomap: remove the length variable in iomap_seek_dataChristoph Hellwig2021-07-151-10/+6
| |/ | | | | | | | | | | | | | | | | | | | | | | The length variable is rather pointless given that it can be trivially deduced from offset and size. Also the initial calculation can lead to KASAN warnings. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Leizhen (ThunderTown) <thunder.leizhen@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
* | Merge tag '5.14-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2021-07-178-20/+124
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull cifs fixes from Steve French: "Eight cifs/smb3 fixes, including three for stable. Three are DFS related fixes, and two to fix problems pointed out by static checkers" * tag '5.14-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: do not share tcp sessions of dfs connections SMB3.1.1: fix mount failure to some servers when compression enabled cifs: added WARN_ON for all the count decrements cifs: fix missing null session check in mount cifs: handle reconnect of tcon when there is no cached dfs referral cifs: fix the out of range assignment to bit fields in parse_server_interfaces cifs: Do not use the original cruid when following DFS links for multiuser mounts cifs: use the expiry output of dns_query to schedule next resolution
| * | cifs: do not share tcp sessions of dfs connectionsPaulo Alcantara2021-07-162-3/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure that we do not share tcp sessions of dfs mounts when mounting regular shares that connect to same server. DFS connections rely on a single instance of tcp in order to do failover properly in cifs_reconnect(). Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | SMB3.1.1: fix mount failure to some servers when compression enabledSteve French2021-07-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | When sending the compression context to some servers, they rejected the SMB3.1.1 negotiate protocol because they expect the compression context to have a data length of a multiple of 8. Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | cifs: added WARN_ON for all the count decrementsShyam Prasad N2021-07-152-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a few ref counters srv_count, ses_count and tc_count which we use for ref counting. Added a WARN_ON during the decrement of each of these counters to make sure that they don't go below their minimum values. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | cifs: fix missing null session check in mountSteve French2021-07-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although it is unlikely to be have ended up with a null session pointer calling cifs_try_adding_channels in cifs_mount. Coverity correctly notes that we are already checking for it earlier (when we return from do_dfs_failover), so at a minimum to clarify the code we should make sure we also check for it when we exit the loop so we don't end up calling cifs_try_adding_channels or mount_setup_tlink with a null ses pointer. Addresses-Coverity: 1505608 ("Derefernce after null check") Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | cifs: handle reconnect of tcon when there is no cached dfs referralPaulo Alcantara2021-07-151-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | When there is no cached DFS referral of tcon->dfs_path, then reconnect to same share. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | cifs: fix the out of range assignment to bit fields in parse_server_interfacesHyunchul Lee2021-07-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Because the out of range assignment to bit fields are compiler-dependant, the fields could have wrong value. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | cifs: Do not use the original cruid when following DFS links for multiuser ↵Ronnie Sahlberg2021-07-141-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mounts Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213565 cruid should only be used for the initial mount and after this we should use the current users credentials. Ignore the original cruid mount argument when creating a new context for a multiuser mount following a DFS link. Fixes: 24e0a1eff9e2 ("cifs: switch to new mount api") Cc: stable@vger.kernel.org # 5.11+ Reported-by: Xiaoli Feng <xifeng@redhat.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | cifs: use the expiry output of dns_query to schedule next resolutionShyam Prasad N2021-07-146-10/+65
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | We recently fixed DNS resolution of the server hostname during reconnect. However, server IP address may change, even when the old one continues to server (although sub-optimally). We should schedule the next DNS resolution based on the TTL of the DNS record used for the last resolution. This way, we resolve the server hostname again when a DNS record expires. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: <stable@vger.kernel.org> # v5.11+ Signed-off-by: Steve French <stfrench@microsoft.com>
* | Merge tag 'io_uring-5.14-2021-07-16' of git://git.kernel.dk/linux-blockLinus Torvalds2021-07-161-3/+5
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull io_uring fixes from Jens Axboe: "Two small fixes: one fixing the process target of a check, and the other a minor issue with the drain error handling" * tag 'io_uring-5.14-2021-07-16' of git://git.kernel.dk/linux-block: io_uring: fix io_drain_req() io_uring: use right task for exiting checks
| * | io_uring: fix io_drain_req()Pavel Begunkov2021-07-111-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | io_drain_req() return whether the request has been consumed or not, not an error code. Fix a stupid mistake slipped from optimisation patches. Reported-by: syzbot+ba6fcd859210f4e9e109@syzkaller.appspotmail.com Fixes: 76cc33d79175a ("io_uring: refactor io_req_defer()") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4d3c53c4274ffff307c8ae062fc7fda63b978df2.1626039606.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | io_uring: use right task for exiting checksPavel Begunkov2021-07-111-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | When we use delayed_work for fallback execution of requests, current will be not of the submitter task, and so checks in io_req_task_submit() may not behave as expected. Currently, it leaves inline completions not flushed, so making io_ring_exit_work() to hang. Use the submitter task for all those checks. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cb413c715bed0bc9c98b169059ea9c8a2c770715.1625881431.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'zonefs-5.14-rc2' of ↵Linus Torvalds2021-07-161-3/+0
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs fix from Damien Le Moal: "A single patch to remove an unnecessary NULL bio check (from Xianting)" * tag 'zonefs-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: remove redundant null bio check
| * | zonefs: remove redundant null bio checkXianting Tian2021-07-161-3/+0
| |/ | | | | | | | | | | | | | | bio_alloc() with __GFP_DIRECT_RECLAIM, which is included in GFP_NOFS, never fails, see comments in bio_alloc_bioset(). Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
* | Merge tag 'configfs-5.13-1' of git://git.infradead.org/users/hch/configfsLinus Torvalds2021-07-151-7/+22
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | Pull configfs fix from Christoph Hellwig: - fix the read and write iterators (Bart Van Assche) * tag 'configfs-5.13-1' of git://git.infradead.org/users/hch/configfs: configfs: fix the read and write iterators
| * | configfs: fix the read and write iteratorsBart Van Assche2021-07-131-7/+22
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 7fe1e79b59ba ("configfs: implement the .read_iter and .write_iter methods") changed the simple_read_from_buffer() calls into copy_to_iter() calls and the simple_write_to_buffer() calls into copy_from_iter() calls. The simple*buffer() methods update the file offset (*ppos) but the read and write iterators not yet. Make the read and write iterators update the file offset (iocb->ki_pos). This patch has been tested as follows: # modprobe target_core_user # dd if=/sys/kernel/config/target/dbroot bs=1 /var/target 12+0 records in 12+0 records out 12 bytes copied, 9.5539e-05 s, 126 kB/s # cd /sys/kernel/config/acpi/table # mkdir test # cd test # dmesg -c >/dev/null; printf 'SSDT\x8\0\0\0abcdefghijklmnopqrstuvwxyz' | dd of=aml bs=1; dmesg -c 34+0 records in 34+0 records out 34 bytes copied, 0.010627 s, 3.2 kB/s [ 261.056551] ACPI configfs: invalid table length Reported-by: Yanko Kaneti <yaneti@declera.com> Cc: Yanko Kaneti <yaneti@declera.com> Fixes: 7fe1e79b59ba ("configfs: implement the .read_iter and .write_iter methods") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
* | Merge tag 'Wimplicit-fallthrough-clang-5.14-rc2' of ↵Linus Torvalds2021-07-152-9/+9
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull fallthrough fixes from Gustavo Silva: "This fixes many fall-through warnings when building with Clang and -Wimplicit-fallthrough, and also enables -Wimplicit-fallthrough for Clang, globally. It's also important to notice that since we have adopted the use of the pseudo-keyword macro fallthrough, we also want to avoid having more /* fall through */ comments being introduced. Contrary to GCC, Clang doesn't recognize any comments as implicit fall-through markings when the -Wimplicit-fallthrough option is enabled. So, in order to avoid having more comments being introduced, we use the option -Wimplicit-fallthrough=5 for GCC, which similar to Clang, will cause a warning in case a code comment is intended to be used as a fall-through marking. The patch for Makefile also enforces this. We had almost 4,000 of these issues for Clang in the beginning, and there might be a couple more out there when building some architectures with certain configurations. However, with the recent fixes I think we are in good shape and it is now possible to enable the warning for Clang" * tag 'Wimplicit-fallthrough-clang-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits) Makefile: Enable -Wimplicit-fallthrough for Clang powerpc/smp: Fix fall-through warning for Clang dmaengine: mpc512x: Fix fall-through warning for Clang usb: gadget: fsl_qe_udc: Fix fall-through warning for Clang powerpc/powernv: Fix fall-through warning for Clang MIPS: Fix unreachable code issue MIPS: Fix fall-through warnings for Clang ASoC: Mediatek: MT8183: Fix fall-through warning for Clang power: supply: Fix fall-through warnings for Clang dmaengine: ti: k3-udma: Fix fall-through warning for Clang s390: Fix fall-through warnings for Clang dmaengine: ipu: Fix fall-through warning for Clang iommu/arm-smmu-v3: Fix fall-through warning for Clang mmc: jz4740: Fix fall-through warning for Clang PCI: Fix fall-through warning for Clang scsi: libsas: Fix fall-through warning for Clang video: fbdev: Fix fall-through warning for Clang math-emu: Fix fall-through warning cpufreq: Fix fall-through warning for Clang drm/msm: Fix fall-through warning in msm_gem_new_impl() ...
| * | fcntl: Fix unreachable code in do_fcntl()Gustavo A. R. Silva2021-07-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the following warning: fs/fcntl.c:373:3: warning: fallthrough annotation in unreachable code [-Wimplicit-fallthrough] fallthrough; ^ include/linux/compiler_attributes.h:210:41: note: expanded from macro 'fallthrough' # define fallthrough __attribute__((__fallthrough__)) by placing the fallthrough; statement inside ifdeffery. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
| * | xfs: Fix multiple fall-through warnings for ClangGustavo A. R. Silva2021-07-121-8/+8
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation to enable -Wimplicit-fallthrough for Clang, fix the following warnings by replacing /* fallthrough */ comments, and its variants, with the new pseudo-keyword macro fallthrough: fs/xfs/libxfs/xfs_attr.c:487:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_attr.c:500:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_attr.c:532:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_attr.c:594:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_attr.c:607:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_attr.c:1410:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_attr.c:1445:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_attr.c:1473:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Notice that Clang doesn't recognize /* fallthrough */ comments as implicit fall-through markings, so in order to globally enable -Wimplicit-fallthrough for Clang, these comments need to be replaced with fallthrough; in the whole codebase. Link: https://github.com/KSPP/linux/issues/115 Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
* | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2021-07-154-11/+45
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc fixes from Andrew Morton: "13 patches. Subsystems affected by this patch series: mm (kasan, pagealloc, rmap, hmm, and hugetlb), and hfs" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/hugetlb: fix refs calculation from unaligned @vaddr hfs: add lock nesting notation to hfs_find_init hfs: fix high memory mapping in hfs_bnode_read hfs: add missing clean-up in hfs_fill_super lib/test_hmm: remove set but unused page variable mm: fix the try_to_unmap prototype for !CONFIG_MMU mm/page_alloc: further fix __alloc_pages_bulk() return value mm/page_alloc: correct return value when failing at preparing mm/page_alloc: avoid page allocator recursion with pagesets.lock held Revert "mm/page_alloc: make should_fail_alloc_page() static" kasan: fix build by including kernel.h kasan: add memzero init for unaligned size at DEBUG mm: move helper to check slub_debug_enabled
| * | hfs: add lock nesting notation to hfs_find_initDesmond Cheong Zhi Xi2021-07-152-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Syzbot reports a possible recursive lock in [1]. This happens due to missing lock nesting information. From the logs, we see that a call to hfs_fill_super is made to mount the hfs filesystem. While searching for the root inode, the lock on the catalog btree is grabbed. Then, when the parent of the root isn't found, a call to __hfs_bnode_create is made to create the parent of the root. This eventually leads to a call to hfs_ext_read_extent which grabs a lock on the extents btree. Since the order of locking is catalog btree -> extents btree, this lock hierarchy does not lead to a deadlock. To tell lockdep that this locking is safe, we add nesting notation to distinguish between catalog btrees, extents btrees, and attributes btrees (for HFS+). This has already been done in hfsplus. Link: https://syzkaller.appspot.com/bug?id=f007ef1d7a31a469e3be7aeb0fde0769b18585db [1] Link: https://lkml.kernel.org/r/20210701030756.58760-4-desmondcheongzx@gmail.com Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Reported-by: syzbot+b718ec84a87b7e73ade4@syzkaller.appspotmail.com Tested-by: syzbot+b718ec84a87b7e73ade4@syzkaller.appspotmail.com Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | hfs: fix high memory mapping in hfs_bnode_readDesmond Cheong Zhi Xi2021-07-151-5/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pages that we read in hfs_bnode_read need to be kmapped into kernel address space. However, currently only the 0th page is kmapped. If the given offset + length exceeds this 0th page, then we have an invalid memory access. To fix this, we kmap relevant pages one by one and copy their relevant portions of data. An example of invalid memory access occurring without this fix can be seen in the following crash report: ================================================================== BUG: KASAN: use-after-free in memcpy include/linux/fortify-string.h:191 [inline] BUG: KASAN: use-after-free in hfs_bnode_read+0xc4/0xe0 fs/hfs/bnode.c:26 Read of size 2 at addr ffff888125fdcffe by task syz-executor5/4634 CPU: 0 PID: 4634 Comm: syz-executor5 Not tainted 5.13.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x195/0x1f8 lib/dump_stack.c:120 print_address_description.constprop.0+0x1d/0x110 mm/kasan/report.c:233 __kasan_report mm/kasan/report.c:419 [inline] kasan_report.cold+0x7b/0xd4 mm/kasan/report.c:436 check_region_inline mm/kasan/generic.c:180 [inline] kasan_check_range+0x154/0x1b0 mm/kasan/generic.c:186 memcpy+0x24/0x60 mm/kasan/shadow.c:65 memcpy include/linux/fortify-string.h:191 [inline] hfs_bnode_read+0xc4/0xe0 fs/hfs/bnode.c:26 hfs_bnode_read_u16 fs/hfs/bnode.c:34 [inline] hfs_bnode_find+0x880/0xcc0 fs/hfs/bnode.c:365 hfs_brec_find+0x2d8/0x540 fs/hfs/bfind.c:126 hfs_brec_read+0x27/0x120 fs/hfs/bfind.c:165 hfs_cat_find_brec+0x19a/0x3b0 fs/hfs/catalog.c:194 hfs_fill_super+0xc13/0x1460 fs/hfs/super.c:419 mount_bdev+0x331/0x3f0 fs/super.c:1368 hfs_mount+0x35/0x40 fs/hfs/super.c:457 legacy_get_tree+0x10c/0x220 fs/fs_context.c:592 vfs_get_tree+0x93/0x300 fs/super.c:1498 do_new_mount fs/namespace.c:2905 [inline] path_mount+0x13f5/0x20e0 fs/namespace.c:3235 do_mount fs/namespace.c:3248 [inline] __do_sys_mount fs/namespace.c:3456 [inline] __se_sys_mount fs/namespace.c:3433 [inline] __x64_sys_mount+0x2b8/0x340 fs/namespace.c:3433 do_syscall_64+0x37/0xc0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x45e63a Code: 48 c7 c2 bc ff ff ff f7 d8 64 89 02 b8 ff ff ff ff eb d2 e8 88 04 00 00 0f 1f 84 00 00 00 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f9404d410d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 0000000020000248 RCX: 000000000045e63a RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007f9404d41120 RBP: 00007f9404d41120 R08: 00000000200002c0 R09: 0000000020000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 R13: 0000000000000003 R14: 00000000004ad5d8 R15: 0000000000000000 The buggy address belongs to the page: page:00000000dadbcf3e refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x125fdc flags: 0x2fffc0000000000(node=0|zone=2|lastcpupid=0x3fff) raw: 02fffc0000000000 ffffea000497f748 ffffea000497f6c8 0000000000000000 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888125fdce80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888125fdcf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff888125fdcf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff888125fdd000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888125fdd080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== Link: https://lkml.kernel.org/r/20210701030756.58760-3-desmondcheongzx@gmail.com Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | hfs: add missing clean-up in hfs_fill_superDesmond Cheong Zhi Xi2021-07-151-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "hfs: fix various errors", v2. This series ultimately aims to address a lockdep warning in hfs_find_init reported by Syzbot [1]. The work done for this led to the discovery of another bug, and the Syzkaller repro test also reveals an invalid memory access error after clearing the lockdep warning. Hence, this series is broken up into three patches: 1. Add a missing call to hfs_find_exit for an error path in hfs_fill_super 2. Fix memory mapping in hfs_bnode_read by fixing calls to kmap 3. Add lock nesting notation to tell lockdep that the observed locking hierarchy is safe This patch (of 3): Before exiting hfs_fill_super, the struct hfs_find_data used in hfs_find_init should be passed to hfs_find_exit to be cleaned up, and to release the lock held on the btree. The call to hfs_find_exit is missing from an error path. We add it back in by consolidating calls to hfs_find_exit for error paths. Link: https://syzkaller.appspot.com/bug?id=f007ef1d7a31a469e3be7aeb0fde0769b18585db [1] Link: https://lkml.kernel.org/r/20210701030756.58760-1-desmondcheongzx@gmail.com Link: https://lkml.kernel.org/r/20210701030756.58760-2-desmondcheongzx@gmail.com Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | fs: add vfs_parse_fs_param_source() helperChristian Brauner2021-07-141-18/+36
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a simple helper that filesystems can use in their parameter parser to parse the "source" parameter. A few places open-coded this function and that already caused a bug in the cgroup v1 parser that we fixed. Let's make it harder to get this wrong by introducing a helper which performs all necessary checks. Link: https://syzkaller.appspot.com/bug?id=6312526aba5beae046fdae8f00399f87aab48b12 Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge tag 'vboxsf-v5.14-1' of ↵Linus Torvalds2021-07-133-38/+116
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux Pull vboxsf fixes from Hans de Goede: "This adds support for the atomic_open directory-inode op to vboxsf. Note this is not just an enhancement this also fixes an actual issue which users are hitting, see the commit message of the "boxsf: Add support for the atomic_open directory-inode" patch" * tag 'vboxsf-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux: vboxsf: Add support for the atomic_open directory-inode op vboxsf: Add vboxsf_[create|release]_sf_handle() helpers vboxsf: Make vboxsf_dir_create() return the handle for the created file vboxsf: Honor excl flag to the dir-inode create op