summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* mm/writeback: Add folio_redirty_for_writepage()Matthew Wilcox (Oracle)2021-10-181-0/+1
| | | | | | | | | | | Reimplement redirty_page_for_writepage() as a wrapper around folio_redirty_for_writepage(). Account the number of pages in the folio, add kernel-doc and move the prototype to writeback.h. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz>
* mm/filemap: Convert page wait queues to be foliosMatthew Wilcox (Oracle)2021-09-271-8/+8
| | | | | | | | | | | | | | Reinforce that page flags are actually in the head page by changing the type from page to folio. Increases the size of cachefiles by two bytes, but the kernel core is unchanged in size. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: David Howells <dhowells@redhat.com>
* mm/writeback: Add folio_wait_writeback()Matthew Wilcox (Oracle)2021-09-271-4/+5
| | | | | | | | | | | | | | | | | | | | | wait_on_page_writeback_killable() only has one caller, so convert it to call folio_wait_writeback_killable(). For the wait_on_page_writeback() callers, add a compatibility wrapper around folio_wait_writeback(). Turning PageWriteback() into folio_test_writeback() eliminates a call to compound_head() which saves 8 bytes and 15 bytes in the two functions. Unfortunately, that is more than offset by adding the wait_on_page_writeback compatibility wrapper for a net increase in text of 7 bytes. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: David Howells <dhowells@redhat.com>
* mm/filemap: Add __folio_lock_async()Matthew Wilcox (Oracle)2021-09-271-1/+1
| | | | | | | | | | | | | | | | | | | There aren't any actual callers of lock_page_async(), so remove it. Convert filemap_update_page() to call __folio_lock_async(). __folio_lock_async() is 21 bytes smaller than __lock_page_async(), but the real savings come from using a folio in filemap_update_page(), shrinking it from 515 bytes to 404 bytes, saving 110 bytes. The text shrinks by 132 bytes in total. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: David Howells <dhowells@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com>
* Merge tag '5.15-rc2-ksmbd-fixes' of git://git.samba.org/ksmbdLinus Torvalds2021-09-268-260/+164
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ksmbd fixes from Steve French: "Five fixes for the ksmbd kernel server, including three security fixes: - remove follow symlinks support - use LOOKUP_BENEATH to prevent out of share access - SMB3 compounding security fix - fix for returning the default streams correctly, fixing a bug when writing ppt or doc files from some clients - logging more clearly that ksmbd is experimental (at module load time)" * tag '5.15-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: use LOOKUP_BENEATH to prevent the out of share access ksmbd: remove follow symlinks support ksmbd: check protocol id in ksmbd_verify_smb_message() ksmbd: add default data stream name in FILE_STREAM_INFORMATION ksmbd: log that server is experimental at module load
| * ksmbd: use LOOKUP_BENEATH to prevent the out of share accessHyunchul Lee2021-09-245-206/+140
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | instead of removing '..' in a given path, call kern_path with LOOKUP_BENEATH flag to prevent the out of share access. ran various test on this: smb2-cat-async smb://127.0.0.1/homes/../out_of_share smb2-cat-async smb://127.0.0.1/homes/foo/../../out_of_share smbclient //127.0.0.1/homes -c "mkdir ../foo2" smbclient //127.0.0.1/homes -c "rename bar ../bar" Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Ralph Boehme <slow@samba.org> Tested-by: Steve French <smfrench@gmail.com> Tested-by: Namjae Jeon <linkinjeon@kernel.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * ksmbd: remove follow symlinks supportNamjae Jeon2021-09-222-56/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use LOOKUP_NO_SYMLINKS flags for default lookup to prohibit the middle of symlink component lookup and remove follow symlinks parameter support. We re-implement it as reparse point later. Test result: smbclient -Ulinkinjeon%1234 //172.30.1.42/share -c "get hacked/passwd passwd" NT_STATUS_OBJECT_NAME_NOT_FOUND opening remote file \hacked\passwd Cc: Ralph Böhme <slow@samba.org> Cc: Steve French <smfrench@gmail.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * ksmbd: check protocol id in ksmbd_verify_smb_message()Namjae Jeon2021-09-223-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | When second smb2 pdu has invalid protocol id, ksmbd doesn't detect it and allow to process smb2 request. This patch add the check it in ksmbd_verify_smb_message() and don't use protocol id of smb2 request as protocol id of response. Reviewed-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> Reviewed-by: Ralph Böhme <slow@samba.org> Reported-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * ksmbd: add default data stream name in FILE_STREAM_INFORMATIONNamjae Jeon2021-09-211-5/+3
| | | | | | | | | | | | | | | | | | | | Windows client expect to get default stream name(::DATA) in FILE_STREAM_INFORMATION response even if there is no stream data in file. This patch fix update failure when writing ppt or doc files. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-By: Tom Talpey <tom@talpey.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * ksmbd: log that server is experimental at module loadSteve French2021-09-211-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While we are working through detailed security reviews of ksmbd server code we should remind users that it is an experimental module by adding a warning when the module loads. Currently the module shows as experimental in Kconfig and is disabled by default, but we don't want to confuse users. Although ksmbd passes a wide variety of the important functional tests (since initial focus had been largely on functional testing such as smbtorture, xfstests etc.), and ksmbd has added key security features (e.g. GCM256 encryption, Kerberos support), there are ongoing detailed reviews of the code base for path processing and network buffer decoding, and this patch reminds users that the module should be considered "experimental." Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
* | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2021-09-252-3/+8
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc fixes from Andrew Morton: "16 patches. Subsystems affected by this patch series: xtensa, sh, ocfs2, scripts, lib, and mm (memory-failure, kasan, damon, shmem, tools, pagecache, debug, and pagemap)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: fix uninitialized use in overcommit_policy_handler mm/memory_failure: fix the missing pte_unmap() call kasan: always respect CONFIG_KASAN_STACK sh: pgtable-3level: fix cast to pointer from integer of different size mm/debug: sync up latest migrate_reason to migrate_reason_names mm/debug: sync up MR_CONTIG_RANGE and MR_LONGTERM_PIN mm: fs: invalidate bh_lrus for only cold path lib/zlib_inflate/inffast: check config in C to avoid unused function warning tools/vm/page-types: remove dependency on opt_file for idle page tracking scripts/sorttable: riscv: fix undeclared identifier 'EM_RISCV' error ocfs2: drop acl cache for directories too mm/shmem.c: fix judgment error in shmem_is_huge() xtensa: increase size of gcc stack frame check mm/damon: don't use strnlen() with known-bogus source length kasan: fix Kconfig check of CC_HAS_WORKING_NOSANITIZE_ADDRESS mm, hwpoison: add is_free_buddy_page() in HWPoisonHandlable()
| * | mm: fs: invalidate bh_lrus for only cold pathMinchan Kim2021-09-241-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel test robot reported the regression of fio.write_iops[1] with commit 8cc621d2f45d ("mm: fs: invalidate BH LRU during page migration"). Since lru_add_drain is called frequently, invalidate bh_lrus there could increase bh_lrus cache miss ratio, which needs more IO in the end. This patch moves the bh_lrus invalidation from the hot path( e.g., zap_page_range, pagevec_release) to cold path(i.e., lru_add_drain_all, lru_cache_disable). Zhengjun Xing confirmed "I test the patch, the regression reduced to -2.9%" [1] https://lore.kernel.org/lkml/20210520083144.GD14190@xsang-OptiPlex-9020/ [2] 8cc621d2f45d, mm: fs: invalidate BH LRU during page migration Link: https://lkml.kernel.org/r/20210907212347.1977686-1-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: Chris Goldsworthy <cgoldswo@codeaurora.org> Tested-by: "Xing, Zhengjun" <zhengjun.xing@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | ocfs2: drop acl cache for directories tooWengang Wang2021-09-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ocfs2_data_convert_worker() is currently dropping any cached acl info for FILE before down-converting meta lock. It should also drop for DIRECTORY. Otherwise the second acl lookup returns the cached one (from VFS layer) which could be already stale. The problem we are seeing is that the acl changes on one node doesn't get refreshed on other nodes in the following case: Node 1 Node 2 -------------- ---------------- getfacl dir1 getfacl dir1 <-- this is OK setfacl -m u:user1:rwX dir1 getfacl dir1 <-- see the change for user1 getfacl dir1 <-- can't see change for user1 Link: https://lkml.kernel.org/r/20210903012631.6099-1-wen.gang.wang@oracle.com Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'io_uring-5.15-2021-09-25' of git://git.kernel.dk/linux-blockLinus Torvalds2021-09-252-16/+72
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull io_uring fixes from Jens Axboe: "This one looks a bit bigger than it is, but that's mainly because 2/3 of it is enabling IORING_OP_CLOSE to close direct file descriptors. We've had a few folks using them and finding it confusing that the way to close them is through using -1 for file update, this just brings API symmetry for direct descriptors. Hence I think we should just do this now and have a better API for 5.15 release. There's some room for de-duplicating the close code, but we're leaving that for the next merge window. Outside of that, just small fixes: - Poll race fixes (Hao) - io-wq core dump exit fix (me) - Reschedule around potentially intensive tctx and buffer iterators on teardown (me) - Fix for always ending up punting files update to io-wq (me) - Put the provided buffer meta data under memcg accounting (me) - Tweak for io_write(), removing dead code that was added with the iterator changes in this release (Pavel)" * tag 'io_uring-5.15-2021-09-25' of git://git.kernel.dk/linux-block: io_uring: make OP_CLOSE consistent with direct open io_uring: kill extra checks in io_write() io_uring: don't punt files update to io-wq unconditionally io_uring: put provided buffer meta data under memcg accounting io_uring: allow conditional reschedule for intensive iterators io_uring: fix potential req refcount underflow io_uring: fix missing set of EPOLLONESHOT for CQ ring overflow io_uring: fix race between poll completion and cancel_hash insertion io-wq: ensure we exit if thread group is exiting
| * | | io_uring: make OP_CLOSE consistent with direct openPavel Begunkov2021-09-241-1/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From recently open/accept are now able to manipulate fixed file table, but it's inconsistent that close can't. Close the gap, keep API same as with open/accept, i.e. via sqe->file_slot. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | io_uring: kill extra checks in io_write()Pavel Begunkov2021-09-241-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't retry short writes and so we would never get to async setup in io_write() in that case. Thus ret2 > 0 is always false and iov_iter_advance() is never used. Apparently, the same is found by Coverity, which complains on the code. Fixes: cd65869512ab ("io_uring: use iov_iter state save/restore helpers") Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5b33e61034748ef1022766efc0fb8854cfcf749c.1632500058.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | io_uring: don't punt files update to io-wq unconditionallyJens Axboe2021-09-241-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no reason to punt it unconditionally, we just need to ensure that the submit lock grabbing is conditional. Fixes: 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update") Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | io_uring: put provided buffer meta data under memcg accountingJens Axboe2021-09-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For each provided buffer, we allocate a struct io_buffer to hold the data associated with it. As a large number of buffers can be provided, account that data with memcg. Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS") Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | io_uring: allow conditional reschedule for intensive iteratorsJens Axboe2021-09-241-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we have a lot of threads and rings, the tctx list can get quite big. This is especially true if we keep creating new threads and rings. Likewise for the provided buffers list. Be nice and insert a conditional reschedule point while iterating the nodes for deletion. Link: https://lore.kernel.org/io-uring/00000000000064b6b405ccb41113@google.com/ Reported-by: syzbot+111d2a03f51f5ae73775@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | io_uring: fix potential req refcount underflowHao Xu2021-09-241-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For multishot mode, there may be cases like: iowq original context io_poll_add _arm_poll() mask = vfs_poll() is not 0 if mask (2) io_poll_complete() compl_unlock (interruption happens tw queued to original context) io_poll_task_func() compl_lock (3) done = io_poll_complete() is true compl_unlock put req ref (1) if (poll->flags & EPOLLONESHOT) put req ref EPOLLONESHOT flag in (1) may be from (2) or (3), so there are multiple combinations that can cause ref underfow. Let's address it by: - check the return value in (2) as done - change (1) to if (done) in this way, we only do ref put in (1) if 'oneshot flag' is from (2) - do poll.done check in io_poll_task_func(), so that we won't put ref for the second time. Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20210922101238.7177-4-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | io_uring: fix missing set of EPOLLONESHOT for CQ ring overflowHao Xu2021-09-241-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should set EPOLLONESHOT if cqring_fill_event() returns false since io_poll_add() decides to put req or not by it. Fixes: 5082620fb2ca ("io_uring: terminate multishot poll for CQ ring overflow") Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20210922101238.7177-3-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | io_uring: fix race between poll completion and cancel_hash insertionHao Xu2021-09-241-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If poll arming and poll completion runs in parallel, there maybe races. For instance, run io_poll_add in iowq and io_poll_task_func in original context, then: iowq original context io_poll_add vfs_poll (interruption happens tw queued to original context) io_poll_task_func generate cqe del from cancel_hash[] if !poll.done insert to cancel_hash[] The entry left in cancel_hash[], similar case for fast poll. Fix it by set poll.done = true when del from cancel_hash[]. Fixes: 5082620fb2ca ("io_uring: terminate multishot poll for CQ ring overflow") Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20210922101238.7177-2-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | io-wq: ensure we exit if thread group is exitingJens Axboe2021-09-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dave reports that a coredumping workload gets stuck in 5.15-rc2, and identified the culprit in the Fixes line below. The problem is that relying solely on fatal_signal_pending() to gate whether to exit or not fails miserably if a process gets eg SIGILL sent. Don't exclusively rely on fatal signals, also check if the thread group is exiting. Fixes: 15e20db2e0ce ("io-wq: only exit on fatal signals") Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | Merge tag 'erofs-for-5.15-rc3-fixes' of ↵Linus Torvalds2021-09-252-2/+3
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "Two bugfixes to fix the 4KiB blockmap chunk format availability and a dangling pointer usage. There is also a trivial cleanup to clarify compacted_2b if compacted_4b_initial > totalidx. Summary: - fix the dangling pointer use in erofs_lookup tracepoint - fix unsupported chunk format check - zero out compacted_2b if compacted_4b_initial > totalidx" * tag 'erofs-for-5.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: clear compacted_2b if compacted_4b_initial > totalidx erofs: fix misbehavior of unsupported chunk format check erofs: fix up erofs_lookup tracepoint
| * | | | erofs: clear compacted_2b if compacted_4b_initial > totalidxYue Hu2021-09-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the whole indexes will only be compacted 4B if compacted_4b_initial > totalidx. So, the calculated compacted_2b is worthless for that case. It may waste CPU resources. No need to update compacted_4b_initial as mkfs since it's used to fulfill the alignment of the 1st compacted_2b pack and would handle the case above. We also need to clarify compacted_4b_end here. It's used for the last lclusters which aren't fitted in the previous compacted_2b packs. Some messages are from Xiang. Link: https://lore.kernel.org/r/20210914035915.1190-1-zbestahu@gmail.com Signed-off-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> [ Gao Xiang: it's enough to use "compacted_4b_initial < totalidx". ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
| * | | | erofs: fix misbehavior of unsupported chunk format checkGao Xiang2021-09-231-1/+1
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unsupported chunk format should be checked with "if (vi->chunkformat & ~EROFS_CHUNK_FORMAT_ALL)" Found when checking with 4k-byte blockmap (although currently mkfs uses inode chunk indexes format by default.) Link: https://lore.kernel.org/r/20210922095141.233938-1-hsiangkao@linux.alibaba.com Fixes: c5aa903a59db ("erofs: support reading chunk-based uncompressed files") Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | | | Merge tag '5.15-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2021-09-254-9/+21
|\ \ \ \ | |_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull cifs fixes from Steve French: "Six small cifs/smb3 fixes, two for stable: - important fix for deferred close (found by a git functional test) related to attribute caching on close. - four (two cosmetic, two more serious) small fixes for problems pointed out by smatch via Dan Carpenter - fix for comment formatting problems pointed out by W=1" * tag '5.15-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix incorrect check for null pointer in header_assemble smb3: correct server pointer dereferencing check to be more consistent smb3: correct smb3 ACL security descriptor cifs: Clear modified attribute bit from inode flags cifs: Deal with some warnings from W=1 cifs: fix a sign extension bug
| * | | cifs: fix incorrect check for null pointer in header_assembleSteve French2021-09-231-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although very unlikely that the tlink pointer would be null in this case, get_next_mid function can in theory return null (but not an error) so need to check for null (not for IS_ERR, which can not be returned here). Address warning: fs/smbfs_client/connect.c:2392 cifs_match_super() warn: 'tlink' isn't an ERR_PTR Pointed out by Dan Carpenter via smatch code analysis tool CC: stable@vger.kernel.org Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | smb3: correct server pointer dereferencing check to be more consistentSteve French2021-09-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Address warning: fs/smbfs_client/misc.c:273 header_assemble() warn: variable dereferenced before check 'treeCon->ses->server' Pointed out by Dan Carpenter via smatch code analysis tool Although the check is likely unneeded, adding it makes the code more consistent and easier to read, as the same check is done elsewhere in the function. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | smb3: correct smb3 ACL security descriptorSteve French2021-09-231-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Address warning: fs/smbfs_client/smb2pdu.c:2425 create_sd_buf() warn: struct type mismatch 'smb3_acl vs cifs_acl' Pointed out by Dan Carpenter via smatch code analysis tool Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | cifs: Clear modified attribute bit from inode flagsSteve French2021-09-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clear CIFS_INO_MODIFIED_ATTR bit from inode flags after updating mtime and ctime Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Cc: stable@vger.kernel.org # 5.13+ Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | cifs: Deal with some warnings from W=1David Howells2021-09-231-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Deal with some warnings generated from make W=1: (1) Add/remove/fix kerneldoc parameters descriptions. (2) Turn cifs' rqst_page_get_length()'s banner comment into a kerneldoc comment. It should probably be prefixed with "cifs_" though. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | cifs: fix a sign extension bugDan Carpenter2021-09-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem is the mismatched types between "ctx->total_len" which is an unsigned int, "rc" which is an int, and "ctx->rc" which is a ssize_t. The code does: ctx->rc = (rc == 0) ? ctx->total_len : rc; We want "ctx->rc" to store the negative "rc" error code. But what happens is that "rc" is type promoted to a high unsigned int and 'ctx->rc" will store the high positive value instead of a negative value. The fix is to change "rc" from an int to a ssize_t. Fixes: c610c4b619e5 ("CIFS: Add asynchronous write support through kernel AIO") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <stfrench@microsoft.com>
* | | | Merge tag 'ceph-for-5.15-rc3' of git://github.com/ceph/ceph-clientLinus Torvalds2021-09-241-2/+2
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph fix from Ilya Dryomov: "A fix for a potential array out of bounds access from Dan" * tag 'ceph-for-5.15-rc3' of git://github.com/ceph/ceph-client: ceph: fix off by one bugs in unsafe_request_wait()
| * | | | ceph: fix off by one bugs in unsafe_request_wait()Dan Carpenter2021-09-211-2/+2
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "> max" tests should be ">= max" to prevent an out of bounds access on the next lines. Fixes: e1a4541ec0b9 ("ceph: flush the mdlog before waiting on unsafe reqs") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | | | Merge tag 'fixes_for_v5.15-rc3' of ↵Linus Torvalds2021-09-242-10/+10
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull misc filesystem fixes from Jan Kara: "A for ext2 sleep in atomic context in case of some fs problems and a cleanup of an invalidate_lock initialization" * tag 'fixes_for_v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: fix sleeping in atomic bugs on error mm: Fully initialize invalidate_lock, amend lock class later
| * | | | ext2: fix sleeping in atomic bugs on errorDan Carpenter2021-09-221-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ext2_error() function syncs the filesystem so it sleeps. The caller is holding a spinlock so it's not allowed to sleep. ext2_statfs() <- disables preempt -> ext2_count_free_blocks() -> ext2_get_group_desc() Fix this by using WARN() to print an error message and a stack trace instead of using ext2_error(). Link: https://lore.kernel.org/r/20210921203233.GA16529@kili Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * | | | mm: Fully initialize invalidate_lock, amend lock class laterSebastian Andrzej Siewior2021-09-171-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function __init_rwsem() is not part of the official API, it just a helper function used by init_rwsem(). Changing the lock's class and name should be done by using lockdep_set_class_and_name() after the has been fully initialized. The overhead of the additional class struct and setting it twice is negligible and it works across all locks. Fully initialize the lock with init_rwsem() and then set the custom class and name for the lock. Fixes: 730633f0b7f95 ("mm: Protect operations adding pages to page cache with invalidate_lock") Link: https://lore.kernel.org/r/20210901084403.g4fezi23cixemlhh@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jan Kara <jack@suse.cz>
* | | | | Merge tag 'for-5.15-rc2-tag' of ↵Linus Torvalds2021-09-234-5/+32
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - regression fix for leak of transaction handle after verity rollback failure - properly reset device last error between mounts - improve one error handling case when checksumming bios - fixup confusing displayed size of space info free space * tag 'for-5.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: prevent __btrfs_dump_space_info() to underflow its free space btrfs: fix mount failure due to past and transient device flush error btrfs: fix transaction handle leak after verity rollback failure btrfs: replace BUG_ON() in btrfs_csum_one_bio() with proper error handling
| * | | | | btrfs: prevent __btrfs_dump_space_info() to underflow its free spaceQu Wenruo2021-09-171-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's not uncommon where __btrfs_dump_space_info() gets called under over-commit situations. In that case free space would underflow as total allocated space is not enough to handle all the over-committed space. Such underflow values can sometimes cause confusion for users enabled enospc_debug mount option, and takes some seconds for developers to convert the underflow value to signed result. Just output the free space as s64 to avoid such problem. Reported-by: Eli V <eliventer@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAJtFHUSy4zgyhf-4d9T+KdJp9w=UgzC2A0V=VtmaeEpcGgm1-Q@mail.gmail.com/ CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | btrfs: fix mount failure due to past and transient device flush errorFilipe Manana2021-09-171-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we get an error flushing one device, during a super block commit, we record the error in the device structure, in the field 'last_flush_error'. This is used to later check if we should error out the super block commit, depending on whether the number of flush errors is greater than or equals to the maximum tolerated device failures for a raid profile. However if we get a transient device flush error, unmount the filesystem and later try to mount it, we can fail the mount because we treat that past error as critical and consider the device is missing. Even if it's very likely that the error will happen again, as it's probably due to a hardware related problem, there may be cases where the error might not happen again. One example is during testing, and a test case like the new generic/648 from fstests always triggers this. The test cases generic/019 and generic/475 also trigger this scenario, but very sporadically. When this happens we get an error like this: $ mount /dev/sdc /mnt mount: /mnt wrong fs type, bad option, bad superblock on /dev/sdc, missing codepage or helper program, or other error. $ dmesg (...) [12918.886926] BTRFS warning (device sdc): chunk 13631488 missing 1 devices, max tolerance is 0 for writable mount [12918.888293] BTRFS warning (device sdc): writable mount is not allowed due to too many missing devices [12918.890853] BTRFS error (device sdc): open_ctree failed The failure happens because when btrfs_check_rw_degradable() is called at mount time, or at remount from RO to RW time, is sees a non zero value in a device's ->last_flush_error attribute, and therefore considers that the device is 'missing'. Fix this by setting a device's ->last_flush_error to zero when we close a device, making sure the error is not seen on the next mount attempt. We only need to track flush errors during the current mount, so that we never commit a super block if such errors happened. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | btrfs: fix transaction handle leak after verity rollback failureFilipe Manana2021-09-171-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During a verity rollback, if we fail to update the inode or delete the orphan, we abort the transaction and return without releasing our transaction handle. Fix that by releasing the handle. Fixes: 146054090b0859 ("btrfs: initial fsverity support") Fixes: 705242538ff348 ("btrfs: verity metadata orphan items") Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | btrfs: replace BUG_ON() in btrfs_csum_one_bio() with proper error handlingQu Wenruo2021-09-171-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a BUG_ON() in btrfs_csum_one_bio() to catch code logic error. It has indeed caught several bugs during subpage development. But the BUG_ON() itself will bring down the whole system which is an overkill. Replace it with a WARN() and exit gracefully, so that it won't crash the whole system while we can still catch the code logic error. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* | | | | | Merge tag 'nfsd-5.15-2' of ↵Linus Torvalds2021-09-222-14/+15
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Critical bug fixes: - Fix crash in NLM TEST procedure - NFSv4.1+ backchannel not restored after PATH_DOWN" * tag 'nfsd-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: back channel stuck in SEQ4_STATUS_CB_PATH_DOWN NLM: Fix svcxdr_encode_owner()
| * | | | | | nfsd: back channel stuck in SEQ4_STATUS_CB_PATH_DOWNDai Ngo2021-09-171-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the back channel enters SEQ4_STATUS_CB_PATH_DOWN state, the client recovers by sending BIND_CONN_TO_SESSION but the server fails to recover the back channel and leaves it as NFSD4_CB_DOWN. Fix by enhancing nfsd4_bind_conn_to_session to probe the back channel by calling nfsd4_probe_callback. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | | | NLM: Fix svcxdr_encode_owner()Chuck Lever2021-09-171-11/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dai Ngo reports that, since the XDR overhaul, the NLM server crashes when the TEST procedure wants to return NLM_DENIED. There is a bug in svcxdr_encode_owner() that none of our standard test cases found. Replace the open-coded function with a call to an appropriate pre-fabricated XDR helper. Reported-by: Dai Ngo <Dai.Ngo@oracle.com> Fixes: a6a63ca5652e ("lockd: Common NLM XDR helpers") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | | | | | qnx4: work around gcc false positive warning bugLinus Torvalds2021-09-211-9/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit b7213ffa0e58 ("qnx4: avoid stringop-overread errors") I tried to teach gcc about how the directory entry structure can be two different things depending on a status flag. It made the code clearer, and it seemed to make gcc happy. However, Arnd points to a gcc bug, where despite using two different members of a union, gcc then gets confused, and uses the size of one of the members to decide if a string overrun happens. And not necessarily the rigth one. End result: with some configurations, gcc-11 will still complain about the source buffer size being overread: fs/qnx4/dir.c: In function 'qnx4_readdir': fs/qnx4/dir.c:76:32: error: 'strnlen' specified bound [16, 48] exceeds source size 1 [-Werror=stringop-overread] 76 | size = strnlen(name, size); | ^~~~~~~~~~~~~~~~~~~ fs/qnx4/dir.c:26:22: note: source object declared here 26 | char de_name; | ^~~~~~~ because gcc will get confused about which union member entry is actually getting accessed, even when the source code is very clear about it. Gcc internally will have combined two "redundant" pointers (pointing to different union elements that are at the same offset), and takes the size checking from one or the other - not necessarily the right one. This is clearly a gcc bug, but we can work around it fairly easily. The biggest thing here is the big honking comment about why we do what we do. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578#c6 Reported-and-tested-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | Merge tag 'afs-fixes-20210913' of ↵Linus Torvalds2021-09-2015-118/+287
|\ \ \ \ \ \ \ | |_|_|_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS fixes from David Howells: "Fixes for AFS problems that can cause data corruption due to interaction with another client modifying data cached locally: - When d_revalidating a dentry, don't look at the inode to which it points. Only check the directory to which the dentry belongs. This was confusing things and causing the silly-rename cleanup code to remove the file now at the dentry of a file that got deleted. - Fix mmap data coherency. When a callback break is received that relates to a file that we have cached, the data content may have been changed (there are other reasons, such as the user's rights having been changed). However, we're checking it lazily, only on entry to the kernel, which doesn't happen if we have a writeable shared mapped page on that file. We make the kernel keep track of mmapped files and clear all PTEs mapping to that file as soon as the callback comes in by calling unmap_mapping_pages() (we don't necessarily want to zap the pagecache). This causes the kernel to be reentered when userspace tries to access the mmapped address range again - and at that point we can query the server and, if we need to, zap the page cache. Ideally, I would check each file at the point of notification, but that involves poking the server[*] - which is holding an exclusive lock on the vnode it is changing, waiting for all the clients it notified to reply. This could then deadlock against the server. Further, invalidating the pagecache might call ->launder_page(), which would try to write to the file, which would definitely deadlock. (AFS doesn't lease file access). [*] Checking to see if the file content has changed is a matter of comparing the current data version number, but we have to ask the server for that. We also need to get a new callback promise and we need to poke the server for that too. - Add some more points at which the inode is validated, since we're doing it lazily, notably in ->read_iter() and ->page_mkwrite(), but also when performing some directory operations. Ideally, checking in ->read_iter() would be done in some derivation of filemap_read(). If we're going to call the server to read the file, then we get the file status fetch as part of that. - The above is now causing us to make a lot more calls to afs_validate() to check the inode - and afs_validate() takes the RCU read lock each time to make a quick check (ie. afs_check_validity()). This is entirely for the purpose of checking cb_s_break to see if the server we're using reinitialised its list of callbacks - however this isn't a very common event, so most of the time we're taking this needlessly. Add a new cell-wide counter to count the number of reinitialisations done by any server and check that - and only if that changes, take the RCU read lock and check the server list (the server list may change, but the cell a file is part of won't). - Don't update vnode->cb_s_break and ->cb_v_break inside the validity checking loop. The cb_lock is done with read_seqretry, so we might go round the loop a second time after resetting those values - and that could cause someone else checking validity to miss something (I think). Also included are patches for fixes for some bugs encountered whilst debugging this: - Fix a leak of afs_read objects and fix a leak of keys hidden by that. - Fix a leak of pages that couldn't be added to extend a writeback. - Fix the maintenance of i_blocks when i_size is changed by a local write or a local dir edit" Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217 [1] Link: https://lore.kernel.org/r/163111665183.283156.17200205573146438918.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163113612442.352844.11162345591911691150.stgit@warthog.procyon.org.uk/ # i_blocks patch * tag 'afs-fixes-20210913' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix updating of i_blocks on file/dir extension afs: Fix corruption in reads at fpos 2G-4G from an OpenAFS server afs: Try to avoid taking RCU read lock when checking vnode validity afs: Fix mmap coherency vs 3rd-party changes afs: Fix incorrect triggering of sillyrename on 3rd-party invalidation afs: Add missing vnode validation checks afs: Fix page leak afs: Fix missing put on afs_read objects and missing get on the key therein
| * | | | | | afs: Fix updating of i_blocks on file/dir extensionDavid Howells2021-09-134-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an afs file or directory is modified locally such that the total file size is extended, i_blocks needs to be recalculated too. Fix this by making afs_write_end() and afs_edit_dir_add() call afs_set_i_size() rather than setting inode->i_size directly as that also recalculates inode->i_blocks. This can be tested by creating and writing into directories and files and then examining them with du. Without this change, directories show a 4 blocks (they start out at 2048 bytes) and files show 0 blocks; with this change, they should show a number of blocks proportional to the file size rounded up to 1024. Fixes: 31143d5d515e ("AFS: implement basic file write support") Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...") Reported-by: Markus Suvanto <markus.suvanto@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Markus Suvanto <markus.suvanto@gmail.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/163113612442.352844.11162345591911691150.stgit@warthog.procyon.org.uk/
| * | | | | | afs: Fix corruption in reads at fpos 2G-4G from an OpenAFS serverDavid Howells2021-09-135-12/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AFS-3 has two data fetch RPC variants, FS.FetchData and FS.FetchData64, and Linux's afs client switches between them when talking to a non-YFS server if the read size, the file position or the sum of the two have the upper 32 bits set of the 64-bit value. This is a problem, however, since the file position and length fields of FS.FetchData are *signed* 32-bit values. Fix this by capturing the capability bits obtained from the fileserver when it's sent an FS.GetCapabilities RPC, rather than just discarding them, and then picking out the VICED_CAPABILITY_64BITFILES flag. This can then be used to decide whether to use FS.FetchData or FS.FetchData64 - and also FS.StoreData or FS.StoreData64 - rather than using upper_32_bits() to switch on the parameter values. This capabilities flag could also be used to limit the maximum size of the file, but all servers must be checked for that. Note that the issue does not exist with FS.StoreData - that uses *unsigned* 32-bit values. It's also not a problem with Auristor servers as its YFS.FetchData64 op uses unsigned 64-bit values. This can be tested by cloning a git repo through an OpenAFS client to an OpenAFS server and then doing "git status" on it from a Linux afs client[1]. Provided the clone has a pack file that's in the 2G-4G range, the git status will show errors like: error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index This can be observed in the server's FileLog with something like the following appearing: Sun Aug 29 19:31:39 2021 SRXAFS_FetchData, Fid = 2303380852.491776.3263114, Host 192.168.11.201:7001, Id 1001 Sun Aug 29 19:31:39 2021 CheckRights: len=0, for host=192.168.11.201:7001 Sun Aug 29 19:31:39 2021 FetchData_RXStyle: Pos 18446744071815340032, Len 3154 Sun Aug 29 19:31:39 2021 FetchData_RXStyle: file size 2400758866 ... Sun Aug 29 19:31:40 2021 SRXAFS_FetchData returns 5 Note the file position of 18446744071815340032. This is the requested file position sign-extended. Fixes: b9b1f8d5930a ("AFS: write support fixes") Reported-by: Markus Suvanto <markus.suvanto@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Markus Suvanto <markus.suvanto@gmail.com> cc: linux-afs@lists.infradead.org cc: openafs-devel@openafs.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c9 [1] Link: https://lore.kernel.org/r/951332.1631308745@warthog.procyon.org.uk/