summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | | | ksmbd: set FILE_NAMED_STREAMS attribute in FS_ATTRIBUTE_INFORMATIONNamjae Jeon2023-03-221-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If vfs objects = streams_xattr in ksmbd.conf FILE_NAMED_STREAMS should be set to Attributes in FS_ATTRIBUTE_INFORMATION. MacOS client show "Format: SMB (Unknown)" on faked NTFS and no streams support. Cc: stable@vger.kernel.org Reported-by: Miao Lihua <441884205@qq.com> Tested-by: Miao Lihua <441884205@qq.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | | | | ksmbd: fix wrong signingkey creation when encryption is AES256Namjae Jeon2023-03-221-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MacOS and Win11 support AES256 encrytion and it is included in the cipher array of encryption context. Especially on macOS, The most preferred cipher is AES256. Connecting to ksmbd fails on newer MacOS clients that support AES256 encryption. MacOS send disconnect request after receiving final session setup response from ksmbd. Because final session setup is signed with signing key was generated incorrectly. For signging key, 'L' value should be initialized to 128 if key size is 16bytes. Cc: stable@vger.kernel.org Reported-by: Miao Lihua <441884205@qq.com> Tested-by: Miao Lihua <441884205@qq.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
* | | | | | | | | Merge tag 'for-6.3-rc3-tag' of ↵Linus Torvalds2023-03-248-71/+71
|\ \ \ \ \ \ \ \ \ | |_|/ / / / / / / |/| | | | | / / / | | |_|_|_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes, the zoned accounting fix is spread across a few patches, preparatory and the actual fixes: - zoned mode: - fix accounting of unusable zone space - fix zone activation condition for DUP profile - preparatory patches - improved error handling of missing chunks - fix compiler warning" * tag 'for-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: drop space_info->active_total_bytes btrfs: zoned: count fresh BG region as zone unusable btrfs: use temporary variable for space_info in btrfs_update_block_group btrfs: rename BTRFS_FS_NO_OVERCOMMIT to BTRFS_FS_ACTIVE_ZONE_TRACKING btrfs: zoned: fix btrfs_can_activate_zone() to support DUP profile btrfs: fix compiler warning on SPARC/PA-RISC handling fscrypt_setup_filename btrfs: handle missing chunk mapping more gracefully
| * | | | | | | btrfs: zoned: drop space_info->active_total_bytesNaohiro Aota2023-03-154-43/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The space_info->active_total_bytes is no longer necessary as we now count the region of newly allocated block group as zone_unusable. Drop its usage. Fixes: 6a921de58992 ("btrfs: zoned: introduce space_info->active_total_bytes") CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | | | btrfs: zoned: count fresh BG region as zone unusableNaohiro Aota2023-03-152-6/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The naming of space_info->active_total_bytes is misleading. It counts not only active block groups but also full ones which are previously active but now inactive. That confusion results in a bug not counting the full BGs into active_total_bytes on mount time. For a background, there are three kinds of block groups in terms of activation. 1. Block groups never activated 2. Block groups currently active 3. Block groups previously active and currently inactive (due to fully written or zone finish) What we really wanted to exclude from "total_bytes" is the total size of BGs #1. They seem empty and allocatable but since they are not activated, we cannot rely on them to do the space reservation. And, since BGs #1 never get activated, they should have no "used", "reserved" and "pinned" bytes. OTOH, BGs #3 can be counted in the "total", since they are already full we cannot allocate from them anyway. For them, "total_bytes == used + reserved + pinned + zone_unusable" should hold. Tracking #2 and #3 as "active_total_bytes" (current implementation) is confusing. And, tracking #1 and subtract that properly from "total_bytes" every time you need space reservation is cumbersome. Instead, we can count the whole region of a newly allocated block group as zone_unusable. Then, once that block group is activated, release [0 .. zone_capacity] from the zone_unusable counters. With this, we can eliminate the confusing ->active_total_bytes and the code will be common among regular and the zoned mode. Also, no additional counter is needed with this approach. Fixes: 6a921de58992 ("btrfs: zoned: introduce space_info->active_total_bytes") CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | | | btrfs: use temporary variable for space_info in btrfs_update_block_groupJosef Bacik2023-03-151-10/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We do cache->space_info->counter += num_bytes; everywhere in here. This is makes the lines longer than they need to be, and will be especially noticeable when we add the active tracking in, so add a temp variable for the space_info so this is cleaner. Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | | | btrfs: rename BTRFS_FS_NO_OVERCOMMIT to BTRFS_FS_ACTIVE_ZONE_TRACKINGJosef Bacik2023-03-153-8/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This flag only gets set when we're doing active zone tracking, and we're going to need to use this flag for things related to this behavior. Rename the flag to represent what it actually means for the file system so it can be used in other ways and still make sense. Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | | | btrfs: zoned: fix btrfs_can_activate_zone() to support DUP profileNaohiro Aota2023-03-151-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | btrfs_can_activate_zone() returns true if at least one device has one zone available for activation. This is OK for the single profile, but not OK for DUP profile. We need two zones to create a DUP block group. Fix it by properly handling the case with the profile flags. Fixes: 265f7237dd25 ("btrfs: zoned: allow DUP on meta-data block groups") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | | | btrfs: fix compiler warning on SPARC/PA-RISC handling fscrypt_setup_filenameSweet Tea Dorminy2023-03-151-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1ec49744ba83 ("btrfs: turn on -Wmaybe-uninitialized") exposed that on SPARC and PA-RISC, gcc is unaware that fscrypt_setup_filename() only returns negative error values or 0. This ultimately results in a maybe-uninitialized warning in btrfs_lookup_dentry(). Change to only return negative error values or 0 from fscrypt_setup_filename() at the relevant call site, and assert that no positive error codes are returned (which would have wider implications involving other users). Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/all/481b19b5-83a0-4793-b4fd-194ad7b978c3@roeck-us.net/ Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | | | btrfs: handle missing chunk mapping more gracefullyQu Wenruo2023-03-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [BUG] During my scrub rework, I did a stupid thing like this: bio->bi_iter.bi_sector = stripe->logical; btrfs_submit_bio(fs_info, bio, stripe->mirror_num); Above bi_sector assignment is using logical address directly, which lacks ">> SECTOR_SHIFT". This results a read on a range which has no chunk mapping. This results the following crash: BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536 assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387 Sure this is all my fault, but this shows a possible problem in real world, that some bit flip in file extents/tree block can point to unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT is not configured, cause invalid pointer access. [PROBLEMS] In the above call chain, we just don't handle the possible error from btrfs_get_chunk_map() inside __btrfs_map_block(). [FIX] The fix is straightforward, replace the ASSERT() with proper error handling (callers handle errors already). Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* | | | | | | | Merge tag 'gfs2-v6.3-rc3-fix' of ↵Linus Torvalds2023-03-231-0/+18
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Andreas Gruenbacher: - Reinstate commit 970343cd4904 ("GFS2: free disk inode which is deleted by remote node -V2") as reverting that commit could cause gfs2_put_super() to hang. * tag 'gfs2-v6.3-rc3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: Reinstate "GFS2: free disk inode which is deleted by remote node -V2"
| * | | | | | | | Reinstate "GFS2: free disk inode which is deleted by remote node -V2"Bob Peterson2023-03-231-0/+18
| | |/ / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It turns out that reverting commit 970343cd4904 ("GFS2: free disk inode which is deleted by remote node -V2") causes a regression related to evicting inodes that were unlinked on a different cluster node. We could also have simply added a call to d_mark_dontcache() to function gfs2_try_evict(), but the original pre-revert code is better tested and proven. This reverts commit 445cb1277e10d7e19b631ef8a64aa3f055df377d. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* | | | | | | | Merge tag 'zonefs-6.3-rc4' of ↵Linus Torvalds2023-03-231-2/+2
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs fixes from Damien Le Moal: - Silence a false positive smatch warning about an uninitialized variable - Fix an error message to provide more useful information about invalid zone append write results * tag 'zonefs-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: Fix error message in zonefs_file_dio_append() zonefs: Prevent uninitialized symbol 'size' warning
| * | | | | | | | zonefs: Fix error message in zonefs_file_dio_append()Damien Le Moal2023-03-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the expected write location in a sequential file is always at the end of the file (append write), when an invalid write append location is detected in zonefs_file_dio_append(), print the invalid written location instead of the expected write location. Fixes: a608da3bd730 ("zonefs: Detect append writes at invalid locations") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
| * | | | | | | | zonefs: Prevent uninitialized symbol 'size' warningDamien Le Moal2023-03-211-1/+1
| |/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In zonefs_file_dio_append(), initialize the variable size to 0 to prevent compilation and static code analizers warning such as: New smatch warnings: fs/zonefs/file.c:441 zonefs_file_dio_append() error: uninitialized symbol 'size'. The warning is a false positive as size is never actually used uninitialized. No functional change. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/202303191227.GL8Dprbi-lkp@intel.com/ Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
* | | | | | | | Merge tag 'nfsd-6.3-3' of ↵Linus Torvalds2023-03-213-3/+10
|\ \ \ \ \ \ \ \ | | |_|_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix a crash during NFS READs from certain client implementations - Address a minor kbuild regression in v6.3 * tag 'nfsd-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: don't replace page in rq_pages if it's a continuation of last page NFS & NFSD: Update GSS dependencies
| * | | | | | | nfsd: don't replace page in rq_pages if it's a continuation of last pageJeff Layton2023-03-171-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The splice read calls nfsd_splice_actor to put the pages containing file data into the svc_rqst->rq_pages array. It's possible however to get a splice result that only has a partial page at the end, if (e.g.) the filesystem hands back a short read that doesn't cover the whole page. nfsd_splice_actor will plop the partial page into its rq_pages array and return. Then later, when nfsd_splice_actor is called again, the remainder of the page may end up being filled out. At this point, nfsd_splice_actor will put the page into the array _again_ corrupting the reply. If this is done enough times, rq_next_page will overrun the array and corrupt the trailing fields -- the rq_respages and rq_next_page pointers themselves. If we've already added the page to the array in the last pass, don't add it to the array a second time when dealing with a splice continuation. This was originally handled properly in nfsd_splice_actor, but commit 91e23b1c3982 ("NFSD: Clean up nfsd_splice_actor()") removed the check for it. Fixes: 91e23b1c3982 ("NFSD: Clean up nfsd_splice_actor()") Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Dario Lesca <d.lesca@solinos.it> Tested-by: David Critch <dcritch@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2150630 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | | | | NFS & NFSD: Update GSS dependenciesChuck Lever2023-03-102-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Geert reports that: > On v6.2, "make ARCH=m68k defconfig" gives you > CONFIG_RPCSEC_GSS_KRB5=m > On v6.3, it became builtin, due to dropping the dependencies on > the individual crypto modules. > > $ grep -E "CRYPTO_(MD5|DES|CBC|CTS|ECB|HMAC|SHA1|AES)" .config > CONFIG_CRYPTO_AES=y > CONFIG_CRYPTO_AES_TI=m > CONFIG_CRYPTO_DES=m > CONFIG_CRYPTO_CBC=m > CONFIG_CRYPTO_CTS=m > CONFIG_CRYPTO_ECB=m > CONFIG_CRYPTO_HMAC=m > CONFIG_CRYPTO_MD5=m > CONFIG_CRYPTO_SHA1=m This behavior is triggered by the "default y" in the definition of RPCSEC_GSS. The "default y" was added in 2010 by commit df486a25900f ("NFS: Fix the selection of security flavours in Kconfig"). However, svc_gss_principal was removed in 2012 by commit 03a4e1f6ddf2 ("nfsd4: move principal name into svc_cred"), so the 2010 fix is no longer necessary. We can safely change the NFS_V4 and NFSD_V4 dependencies back to RPCSEC_GSS_KRB5 to get the nicer v6.2 behavior back. Selecting KRB5 symbolically represents the true requirement here: that all spec-compliant NFSv4 implementations must have Kerberos available to use. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: dfe9a123451a ("SUNRPC: Enable rpcsec_gss_krb5.ko to be built without CRYPTO_DES") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | | | | | | Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linuxLinus Torvalds2023-03-202-18/+19
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull fsverity fixes from Eric Biggers: "Fix two significant performance issues with fsverity" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITY fsverity: Remove WQ_UNBOUND from fsverity read workqueue
| * | | | | | | | fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITYEric Biggers2023-03-151-12/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The full pagecache drop at the end of FS_IOC_ENABLE_VERITY is causing performance problems and is hindering adoption of fsverity. It was intended to solve a race condition where unverified pages might be left in the pagecache. But actually it doesn't solve it fully. Since the incomplete solution for this race condition has too much performance impact for it to be worth it, let's remove it for now. Fixes: 3fda4c617e84 ("fs-verity: implement FS_IOC_ENABLE_VERITY ioctl") Cc: stable@vger.kernel.org Reviewed-by: Victor Hsieh <victorhsieh@google.com> Link: https://lore.kernel.org/r/20230314235332.50270-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
| * | | | | | | | fsverity: Remove WQ_UNBOUND from fsverity read workqueueNathan Huckleberry2023-03-141-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | WQ_UNBOUND causes significant scheduler latency on ARM64/Android. This is problematic for latency sensitive workloads, like I/O post-processing. Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related scheduler latency and improves app cold startup times by ~30ms. WQ_UNBOUND was also removed from the dm-verity workqueue for the same reason [1]. This code was tested by running Android app startup benchmarks and measuring how long the fsverity workqueue spent in the runnable state. Before Total workqueue scheduler latency: 553800us After Total workqueue scheduler latency: 18962us [1]: https://lore.kernel.org/all/20230202012348.885402-1-nhuck@google.com/ Signed-off-by: Nathan Huckleberry <nhuck@google.com> Fixes: 8a1d0f9cacc9 ("fs-verity: add data verification hooks for ->readpages()") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230310193325.620493-1-nhuck@google.com Signed-off-by: Eric Biggers <ebiggers@google.com>
* | | | | | | | | Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linuxLinus Torvalds2023-03-202-13/+25
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull fscrypt fix from Eric Biggers: "Fix a bug where when a filesystem was being unmounted, the fscrypt keyring was destroyed before inodes have been released by the Landlock LSM. This bug was found by syzbot" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: check for NULL keyring in fscrypt_put_master_key_activeref() fscrypt: improve fscrypt_destroy_keyring() documentation fscrypt: destroy keyring after security_sb_delete()
| * | | | | | | | | fscrypt: check for NULL keyring in fscrypt_put_master_key_activeref()Eric Biggers2023-03-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is a bug for fscrypt_put_master_key_activeref() to see a NULL keyring. But it used to be possible due to the bug, now fixed, where fscrypt_destroy_keyring() was called before security_sb_delete(). To be consistent with how fscrypt_destroy_keyring() uses WARN_ON for the same issue, WARN and leak the fscrypt_master_key if the keyring is NULL instead of dereferencing the NULL pointer. This is a robustness improvement, not a fix. Link: https://lore.kernel.org/r/20230313221231.272498-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
| * | | | | | | | | fscrypt: improve fscrypt_destroy_keyring() documentationEric Biggers2023-03-181-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Document that fscrypt_destroy_keyring() must be called after all potentially-encrypted inodes have been evicted. Link: https://lore.kernel.org/r/20230313221231.272498-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
| * | | | | | | | | fscrypt: destroy keyring after security_sb_delete()Eric Biggers2023-03-141-3/+12
| | |_|_|_|_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fscrypt_destroy_keyring() must be called after all potentially-encrypted inodes were evicted; otherwise it cannot safely destroy the keyring. Since inodes that are in-use by the Landlock LSM don't get evicted until security_sb_delete(), this means that fscrypt_destroy_keyring() must be called *after* security_sb_delete(). This fixes a WARN_ON followed by a NULL dereference, only possible if Landlock was being used on encrypted files. Fixes: d7e7b9af104c ("fscrypt: stop using keyrings subsystem for fscrypt_master_key") Cc: stable@vger.kernel.org Reported-by: syzbot+93e495f6a4f748827c88@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/00000000000044651705f6ca1e30@google.com Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230313221231.272498-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
* | | | | | | | | Merge tag 'nfs-for-6.3-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2023-03-204-10/+17
|\ \ \ \ \ \ \ \ \ | |_|_|_|/ / / / / |/| | | | | | | / | | |_|_|_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client fixes from Anna Schumaker: - Fix /proc/PID/io read_bytes accounting - Fix setting NLM file_lock start and end during decoding testargs - Fix timing for setting access cache timestamps * tag 'nfs-for-6.3-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Correct timing for assigning access cache timestamp lockd: set file_lock start and end when decoding nlm4 testargs NFS: Fix /proc/PID/io read_bytes for buffered reads
| * | | | | | | NFS: Correct timing for assigning access cache timestampChengen Du2023-03-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the user's login time is newer than the cache's timestamp, the original entry in the RB-tree will be replaced by a new entry. Currently, the timestamp is only set if the entry is not found in the RB-tree, which can cause the timestamp to be undefined when the entry exists. This may result in a significant increase in ACCESS operations if the timestamp is set to zero. Signed-off-by: Chengen Du <chengen.du@canonical.com> Fixes: 0eb43812c027 ("NFS: Clear the file access cache upon login”) Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | | | | | lockd: set file_lock start and end when decoding nlm4 testargsJeff Layton2023-03-142-9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 6930bcbfb6ce dropped the setting of the file_lock range when decoding a nlm_lock off the wire. This causes the client side grant callback to miss matching blocks and reject the lock, only to rerequest it 30s later. Add a helper function to set the file_lock range from the start and end values that the protocol uses, and have the nlm_lock decoder call that to set up the file_lock args properly. Fixes: 6930bcbfb6ce ("lockd: detect and reject lock arguments that overflow") Reported-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Tested-by: Amir Goldstein <amir73il@gmail.com> Cc: stable@vger.kernel.org #6.0 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | | | | | NFS: Fix /proc/PID/io read_bytes for buffered readsDave Wysochanski2023-03-141-0/+3
| | |/ / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to commit 8786fde8421c ("Convert NFS from readpages to readahead"), nfs_readpages() used the old mm interface read_cache_pages() which called task_io_account_read() for each NFS page read. After this commit, nfs_readpages() is converted to nfs_readahead(), which now uses the new mm interface readahead_page(). The new interface requires callers to call task_io_account_read() themselves. In addition, to nfs_readahead() task_io_account_read() should also be called from nfs_read_folio(). Fixes: 8786fde8421c ("Convert NFS from readpages to readahead") Link: https://lore.kernel.org/linux-nfs/CAPt2mGNEYUk5u8V4abe=5MM5msZqmvzCVrtCP4Qw1n=gCHCnww@mail.gmail.com/ Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | | | | | | Merge tag 'ext4_for_linus_urgent' of ↵Linus Torvalds2023-03-191-3/+1
|\ \ \ \ \ \ \ | |_|_|_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fix from Ted Ts'o: "Fix a double unlock bug on an error path in ext4, found by smatch and syzkaller" * tag 'ext4_for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix possible double unlock when moving a directory
| * | | | | | ext4: fix possible double unlock when moving a directoryTheodore Ts'o2023-03-171-3/+1
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes: 0813299c586b ("ext4: Fix possible corruption when moving a directory") Link: https://lore.kernel.org/r/5efbe1b9-ad8b-4a4f-b422-24824d2b775c@kili.mountain Reported-by: Dan Carpenter <error27@gmail.com> Reported-by: syzbot+0c73d1d8b952c5f3d714@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | | | | | Merge tag '6.3-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2023-03-1614-195/+118
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull cifs client fixes from Steve French: "Seven cifs/smb3 client fixes, all also for stable: - four DFS fixes - multichannel reconnect fix - fix smb1 stats for cancel command - fix for set file size error path" * tag '6.3-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: use DFS root session instead of tcon ses cifs: return DFS root session id in DebugData cifs: fix use-after-free bug in refresh_cache_worker() cifs: set DFS root session in cifs_get_smb_ses() cifs: generate signkey for the channel that's reconnecting cifs: Fix smb2_set_path_size() cifs: Move the in_send statistic to __smb_send_rqst()
| * | | | | | cifs: use DFS root session instead of tcon sesPaulo Alcantara2023-03-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use DFS root session whenever possible to get new DFS referrals otherwise we might end up with an IPC tcon (tcon->ses->tcon_ipc) that doesn't respond to them. It should be safe accessing @ses->dfs_root_ses directly in cifs_inval_name_dfs_link_error() as it has same lifetime as of @tcon. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Cc: stable@vger.kernel.org # 6.2 Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | | cifs: return DFS root session id in DebugDataPaulo Alcantara2023-03-141-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return the DFS root session id in /proc/fs/cifs/DebugData to make it easier to track which IPC tcon was used to get new DFS referrals for a specific connection, and aids in debugging. A simple output of it would be Sessions: 1) Address: 192.168.1.13 Uses: 1 Capability: 0x300067 Session Status: 1 Security type: RawNTLMSSP SessionId: 0xd80000000009 User: 0 Cred User: 0 DFS root session id: 0x128006c000035 Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Cc: stable@vger.kernel.org # 6.2 Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | | cifs: fix use-after-free bug in refresh_cache_worker()Paulo Alcantara2023-03-148-164/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The UAF bug occurred because we were putting DFS root sessions in cifs_umount() while DFS cache refresher was being executed. Make DFS root sessions have same lifetime as DFS tcons so we can avoid the use-after-free bug is DFS cache refresher and other places that require IPCs to get new DFS referrals on. Also, get rid of mount group handling in DFS cache as we no longer need it. This fixes below use-after-free bug catched by KASAN [ 379.946955] BUG: KASAN: use-after-free in __refresh_tcon.isra.0+0x10b/0xc10 [cifs] [ 379.947642] Read of size 8 at addr ffff888018f57030 by task kworker/u4:3/56 [ 379.948096] [ 379.948208] CPU: 0 PID: 56 Comm: kworker/u4:3 Not tainted 6.2.0-rc7-lku #23 [ 379.948661] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014 [ 379.949368] Workqueue: cifs-dfscache refresh_cache_worker [cifs] [ 379.949942] Call Trace: [ 379.950113] <TASK> [ 379.950260] dump_stack_lvl+0x50/0x67 [ 379.950510] print_report+0x16a/0x48e [ 379.950759] ? __virt_addr_valid+0xd8/0x160 [ 379.951040] ? __phys_addr+0x41/0x80 [ 379.951285] kasan_report+0xdb/0x110 [ 379.951533] ? __refresh_tcon.isra.0+0x10b/0xc10 [cifs] [ 379.952056] ? __refresh_tcon.isra.0+0x10b/0xc10 [cifs] [ 379.952585] __refresh_tcon.isra.0+0x10b/0xc10 [cifs] [ 379.953096] ? __pfx___refresh_tcon.isra.0+0x10/0x10 [cifs] [ 379.953637] ? __pfx___mutex_lock+0x10/0x10 [ 379.953915] ? lock_release+0xb6/0x720 [ 379.954167] ? __pfx_lock_acquire+0x10/0x10 [ 379.954443] ? refresh_cache_worker+0x34e/0x6d0 [cifs] [ 379.954960] ? __pfx_wb_workfn+0x10/0x10 [ 379.955239] refresh_cache_worker+0x4ad/0x6d0 [cifs] [ 379.955755] ? __pfx_refresh_cache_worker+0x10/0x10 [cifs] [ 379.956323] ? __pfx_lock_acquired+0x10/0x10 [ 379.956615] ? read_word_at_a_time+0xe/0x20 [ 379.956898] ? lockdep_hardirqs_on_prepare+0x12/0x220 [ 379.957235] process_one_work+0x535/0x990 [ 379.957509] ? __pfx_process_one_work+0x10/0x10 [ 379.957812] ? lock_acquired+0xb7/0x5f0 [ 379.958069] ? __list_add_valid+0x37/0xd0 [ 379.958341] ? __list_add_valid+0x37/0xd0 [ 379.958611] worker_thread+0x8e/0x630 [ 379.958861] ? __pfx_worker_thread+0x10/0x10 [ 379.959148] kthread+0x17d/0x1b0 [ 379.959369] ? __pfx_kthread+0x10/0x10 [ 379.959630] ret_from_fork+0x2c/0x50 [ 379.959879] </TASK> Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Cc: stable@vger.kernel.org # 6.2 Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | | cifs: set DFS root session in cifs_get_smb_ses()Paulo Alcantara2023-03-146-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set the DFS root session pointer earlier when creating a new SMB session to prevent racing with smb2_reconnect(), cifs_reconnect_tcon() and DFS cache refresher. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Cc: stable@vger.kernel.org # 6.2 Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | | cifs: generate signkey for the channel that's reconnectingShyam Prasad N2023-03-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before my changes to how multichannel reconnects work, the primary channel was always used to do a non-binding session setup. With my changes, that is not the case anymore. Missed this place where channel at index 0 was forcibly updated with the signing key. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | | cifs: Fix smb2_set_path_size()Volker Lendecke2023-03-141-7/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If cifs_get_writable_path() finds a writable file, smb2_compound_op() must use that file's FID and not the COMPOUND_FID. Cc: stable@vger.kernel.org Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | | cifs: Move the in_send statistic to __smb_send_rqst()Zhang Xiaoxu2023-03-051-12/+9
| | |/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When send SMB_COM_NT_CANCEL and RFC1002_SESSION_REQUEST, the in_send statistic was lost. Let's move the in_send statistic to the send function to avoid this scenario. Fixes: 7ee1af765dfa ("[CIFS]") Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
* | | | | | Merge tag 'mm-hotfixes-stable-2023-03-14-16-51' of ↵Linus Torvalds2023-03-141-2/+17
|\ \ \ \ \ \ | |_|/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Eleven hotfixes. Four of these are cc:stable and the remainder address post-6.2 issues or aren't considered suitable for backporting. Seven of these fixes are for MM" * tag 'mm-hotfixes-stable-2023-03-14-16-51' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/damon/paddr: fix folio_nr_pages() after folio_put() in damon_pa_mark_accessed_or_deactivate() mm/damon/paddr: fix folio_size() call after folio_put() in damon_pa_young() ocfs2: fix data corruption after failed write migrate_pages: try migrate in batch asynchronously firstly migrate_pages: move split folios processing out of migrate_pages_batch() migrate_pages: fix deadlock in batched migration .mailmap: add Alexandre Ghiti personal email address mailmap: correct Dikshita Agarwal's Qualcomm email address mailmap: updates for Jarkko Sakkinen mm/userfaultfd: propagate uffd-wp bit when PTE-mapping the huge zeropage mm: teach mincore_hugetlb about pte markers
| * | | | | ocfs2: fix data corruption after failed writeJan Kara via Ocfs2-devel2023-03-071-2/+17
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When buffered write fails to copy data into underlying page cache page, ocfs2_write_end_nolock() just zeroes out and dirties the page. This can leave dirty page beyond EOF and if page writeback tries to write this page before write succeeds and expands i_size, page gets into inconsistent state where page dirty bit is clear but buffer dirty bits stay set resulting in page data never getting written and so data copied to the page is lost. Fix the problem by invalidating page beyond EOF after failed write. Link: https://lkml.kernel.org/r/20230302153843.18499-1-jack@suse.cz Fixes: 6dbf7bb55598 ("fs: Don't invalidate page buffers in block_write_full_page()") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | | | | Merge tag 'xfs-6.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2023-03-122-21/+40
|\ \ \ \ \ | | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull xfs fixes from Darrick Wong: - Fix a crash if mount time quotacheck fails when there are inodes queued for garbage collection. - Fix an off by one error when discarding folios after writeback failure. * tag 'xfs-6.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix off-by-one-block in xfs_discard_folio() xfs: quotacheck failure can race with background inode inactivation
| * | | | xfs: fix off-by-one-block in xfs_discard_folio()Dave Chinner2023-03-051-7/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent writeback corruption fixes changed the code in xfs_discard_folio() to calculate a byte range to for punching delalloc extents. A mistake was made in using round_up(pos) for the end offset, because when pos points at the first byte of a block, it does not get rounded up to point to the end byte of the block. hence the punch range is short, and this leads to unexpected behaviour in certain cases in xfs_bmap_punch_delalloc_range. e.g. pos = 0 means we call xfs_bmap_punch_delalloc_range(0,0), so there is no previous extent and it rounds up the punch to the end of the delalloc extent it found at offset 0, not the end of the range given to xfs_bmap_punch_delalloc_range(). Fix this by handling the zero block offset case correctly. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217030 Link: https://lore.kernel.org/linux-xfs/Y+vOfaxIWX1c%2Fyy9@bfoster/ Fixes: 7348b322332d ("xfs: xfs_bmap_punch_delalloc_range() should take a byte range") Reported-by: Pengfei Xu <pengfei.xu@intel.com> Found-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
| * | | | xfs: quotacheck failure can race with background inode inactivationDave Chinner2023-03-051-14/+26
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The background inode inactivation can attached dquots to inodes, but this can race with a foreground quotacheck failure that leads to disabling quotas and freeing the mp->m_quotainfo structure. The background inode inactivation then tries to allocate a quota, tries to dereference mp->m_quotainfo, and crashes like so: XFS (loop1): Quotacheck: Unsuccessful (Error -5): Disabling quotas. xfs filesystem being mounted at /root/syzkaller.qCVHXV/0/file0 supports timestamps until 2038 (0x7fffffff) BUG: kernel NULL pointer dereference, address: 00000000000002a8 .... CPU: 0 PID: 161 Comm: kworker/0:4 Not tainted 6.2.0-c9c3395d5e3d #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: xfs-inodegc/loop1 xfs_inodegc_worker RIP: 0010:xfs_dquot_alloc+0x95/0x1e0 .... Call Trace: <TASK> xfs_qm_dqread+0x46/0x440 xfs_qm_dqget_inode+0x154/0x500 xfs_qm_dqattach_one+0x142/0x3c0 xfs_qm_dqattach_locked+0x14a/0x170 xfs_qm_dqattach+0x52/0x80 xfs_inactive+0x186/0x340 xfs_inodegc_worker+0xd3/0x430 process_one_work+0x3b1/0x960 worker_thread+0x52/0x660 kthread+0x161/0x1a0 ret_from_fork+0x29/0x50 </TASK> .... Prevent this race by flushing all the queued background inode inactivations pending before purging all the cached dquots when quotacheck fails. Reported-by: Pengfei Xu <pengfei.xu@intel.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* | | | Merge tag 'vfs.misc.v6.3-rc2' of ↵Linus Torvalds2023-03-122-3/+2
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull vfs fixes from Christian Brauner: - When allocating pages for a watch queue failed, we didn't return an error causing userspace to proceed even though all subsequent notifcations would be lost. Make sure to return an error. - Fix a misformed tree entry for the idmapping maintainers entry. - When setting file leases from an idmapped mount via generic_setlease() we need to take the idmapping into account otherwise taking a lease would fail from an idmapped mount. - Remove two redundant assignments, one in splice code and the other in locks code, that static checkers complained about. * tag 'vfs.misc.v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: filelocks: use mount idmapping for setlease permission check fs/locks: Remove redundant assignment to cmd splice: Remove redundant assignment to ret MAINTAINERS: repair a malformed T: entry in IDMAPPED MOUNTS watch_queue: fix IOC_WATCH_QUEUE_SET_SIZE alloc error paths
| * | | | filelocks: use mount idmapping for setlease permission checkSeth Forshee2023-03-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A user should be allowed to take out a lease via an idmapped mount if the fsuid matches the mapped uid of the inode. generic_setlease() is checking the unmapped inode uid, causing these operations to be denied. Fix this by comparing against the mapped inode uid instead of the unmapped uid. Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP") Cc: stable@vger.kernel.org Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
| * | | | fs/locks: Remove redundant assignment to cmdJiapeng Chong2023-03-091-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Variable 'cmd' set but not used. fs/locks.c:2428:3: warning: Value stored to 'cmd' is never read. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4439 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
| * | | | splice: Remove redundant assignment to retJiapeng Chong2023-03-091-1/+0
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The variable ret belongs to redundant assignment and can be deleted. fs/splice.c:940:2: warning: Value stored to 'ret' is never read. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4406 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* | | | Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds2023-03-1211-30/+87
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Bug fixes and regressions for ext4, the most serious of which is a potential deadlock during directory renames that was introduced during the merge window discovered by a combination of syzbot and lockdep" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: zero i_disksize when initializing the bootloader inode ext4: make sure fs error flag setted before clear journal error ext4: commit super block if fs record error when journal record without error ext4, jbd2: add an optimized bmap for the journal inode ext4: fix WARNING in ext4_update_inline_data ext4: move where set the MAY_INLINE_DATA flag is set ext4: Fix deadlock during directory rename ext4: Fix comment about the 64BIT feature docs: ext4: modify the group desc size to 64 ext4: fix another off-by-one fsmap error on 1k block filesystems ext4: fix RENAME_WHITEOUT handling for inline directories ext4: make kobj_type structures constant ext4: fix cgroup writeback accounting with fs-layer encryption
| * | | | ext4: zero i_disksize when initializing the bootloader inodeZhihao Cheng2023-03-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the boot loader inode has never been used before, the EXT4_IOC_SWAP_BOOT inode will initialize it, including setting the i_size to 0. However, if the "never before used" boot loader has a non-zero i_size, then i_disksize will be non-zero, and the inconsistency between i_size and i_disksize can trigger a kernel warning: WARNING: CPU: 0 PID: 2580 at fs/ext4/file.c:319 CPU: 0 PID: 2580 Comm: bb Not tainted 6.3.0-rc1-00004-g703695902cfa RIP: 0010:ext4_file_write_iter+0xbc7/0xd10 Call Trace: vfs_write+0x3b1/0x5c0 ksys_write+0x77/0x160 __x64_sys_write+0x22/0x30 do_syscall_64+0x39/0x80 Reproducer: 1. create corrupted image and mount it: mke2fs -t ext4 /tmp/foo.img 200 debugfs -wR "sif <5> size 25700" /tmp/foo.img mount -t ext4 /tmp/foo.img /mnt cd /mnt echo 123 > file 2. Run the reproducer program: posix_memalign(&buf, 1024, 1024) fd = open("file", O_RDWR | O_DIRECT); ioctl(fd, EXT4_IOC_SWAP_BOOT); write(fd, buf, 1024); Fix this by setting i_disksize as well as i_size to zero when initiaizing the boot loader inode. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217159 Cc: stable@kernel.org Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20230308032643.641113-1-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>