summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag '6.3-rc-smb3-client-fixes-part2' of ↵Linus Torvalds2023-03-0310-157/+190
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.samba.org/sfrench/cifs-2.6 Pull more cifs updates from Steve French: - xfstest generic/208 fix (memory leak) - minor netfs fix (to address smatch warning) - a DFS fix for stable - a reconnect race fix - two multichannel fixes - RDMA (smbdirect) fix - two additional writeback fixes from David * tag '6.3-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix memory leak in direct I/O cifs: prevent data race in cifs_reconnect_tcon() cifs: improve checking of DFS links over STATUS_OBJECT_NAME_INVALID iov: Fix netfs_extract_user_to_sg() cifs: Fix cifs_write_back_from_locked_folio() cifs: reuse cifs_match_ipaddr for comparison of dstaddr too cifs: match even the scope id for ipv6 addresses cifs: Fix an uninitialised variable cifs: Add some missing xas_retry() calls
| * cifs: Fix memory leak in direct I/ODavid Howells2023-03-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When __cifs_readv() and __cifs_writev() extract pages from a user-backed iterator into a BVEC-type iterator, they set ->bv_need_unpin to note whether they need to unpin the pages later. However, in both cases they examine the BVEC-type iterator and not the source iterator - and so bv_need_unpin doesn't get set and the pages are leaked. I think this may be responsible for the generic/208 xfstest failing occasionally with: WARNING: CPU: 0 PID: 3064 at mm/gup.c:218 try_grab_page+0x65/0x100 RIP: 0010:try_grab_page+0x65/0x100 follow_page_pte+0x1a7/0x570 __get_user_pages+0x1a2/0x650 __gup_longterm_locked+0xdc/0xb50 internal_get_user_pages_fast+0x17f/0x310 pin_user_pages_fast+0x46/0x60 iov_iter_extract_pages+0xc9/0x510 ? __kmalloc_large_node+0xb1/0x120 ? __kmalloc_node+0xbe/0x130 netfs_extract_user_iter+0xbf/0x200 [netfs] __cifs_writev+0x150/0x330 [cifs] vfs_write+0x2a8/0x3c0 ksys_pwrite64+0x65/0xa0 with the page refcount going negative. This is less unlikely than it seems because the page is being pinned, not simply got, and so the refcount increased by 1024 each time, and so only needs to be called around ~2097152 for the refcount to go negative. Further, the test program (aio-dio-invalidate-failure) uses a 32MiB static buffer and all the PTEs covering it refer to the same page because it's never written to. The warning in try_grab_page(): if (WARN_ON_ONCE(folio_ref_count(folio) <= 0)) return -ENOMEM; then trips and prevents us ever using the page again for DIO at least. Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list") Reported-by: Murphy Zhou <jencce.kernel@gmail.com> Link: https://lore.kernel.org/r/CAH2r5mvaTsJ---n=265a4zqRA7pP+o4MJ36WCQUS6oPrOij8cw@mail.gmail.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: prevent data race in cifs_reconnect_tcon()Paulo Alcantara2023-03-014-101/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | Make sure to get an up-to-date TCP_Server_Info::nr_targets value prior to waiting the server to be reconnected in cifs_reconnect_tcon(). It is set in cifs_tcp_ses_needs_reconnect() and protected by TCP_Server_Info::srv_lock. Create a new cifs_wait_for_server_reconnect() helper that can be used by both SMB2+ and CIFS reconnect code. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: improve checking of DFS links over STATUS_OBJECT_NAME_INVALIDPaulo Alcantara2023-03-014-25/+106
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Do not map STATUS_OBJECT_NAME_INVALID to -EREMOTE under non-DFS shares, or 'nodfs' mounts or CONFIG_CIFS_DFS_UPCALL=n builds. Otherwise, in the slow path, get a referral to figure out whether it is an actual DFS link. This could be simply reproduced under a non-DFS share by running the following $ mount.cifs //srv/share /mnt -o ... $ cat /mnt/$(printf '\U110000') cat: '/mnt/'$'\364\220\200\200': Object is remote Fixes: c877ce47e137 ("cifs: reduce roundtrips on create/qinfo requests") CC: stable@vger.kernel.org # 6.2 Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * iov: Fix netfs_extract_user_to_sg()David Howells2023-03-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the loop check in netfs_extract_user_to_sg() for extraction from user-backed iterators to do the body if npages > 0, not if npages < 0 (which it can never be). This isn't currently used by cifs, which only ever extracts data from BVEC, KVEC and XARRAY iterators at this level, user-backed iterators having being decanted into BVEC iterators at a higher level to accommodate the work being done in a kernel thread. Found by smatch: fs/netfs/iterator.c:139 netfs_extract_user_to_sg() warn: unsigned 'npages' is never less than zero. Fixes: 018584697533 ("netfs: Add a function to extract an iterator into a scatterlist") Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202302261115.P3TQi1ZO-lkp@intel.com/ Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/Y/yYnAhoAYDBKixX@kili Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-cachefs@redhat.com Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: Fix cifs_write_back_from_locked_folio()David Howells2023-03-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cifs_write_back_from_locked_folio() should return the number of bytes read, but returns the result of ->async_writev(), which will be 0 on success. As it happens, this doesn't prevent cifs_writepages_region() from working as it will then examine and ignore the pages that are no longer dirty rather than just skipping over them. Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list") Signed-off-by: David Howells <dhowells@redhat.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Tom Talpey <tom@talpey.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: reuse cifs_match_ipaddr for comparison of dstaddr tooShyam Prasad N2023-03-011-26/+2
| | | | | | | | | | | | | | | | | | We have two pieces of code that does pretty much the same comparison. This change reuses cifs_match_ipaddr within match_address. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: match even the scope id for ipv6 addressesShyam Prasad N2023-03-011-1/+2
| | | | | | | | | | | | | | | | | | match_address function matches the scope id for ipv6 addresses, but cifs_match_ipaddr (which is another function used for comparison) does not use scope id. Doing so with this change. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: Fix an uninitialised variableDavid Howells2023-03-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix an uninitialised variable introduced in cifs. Fixes: 3d78fe73fa12 ("cifs: Build the RDMA SGE list directly from an iterator") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> cc: Steve French <sfrench@samba.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Tom Talpey <tom@talpey.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-rdma@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: Add some missing xas_retry() callsDavid Howells2023-03-011-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The xas_for_each loops added into fs/cifs/file.c need to go round again if indicated by xas_retry(). Fixes: b8713c4dbfa3 ("cifs: Add some helper functions") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Tom Talpey <tom@talpey.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
* | Merge tag 'ceph-for-6.3-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds2023-03-021-0/+8
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph fixes from Ilya Dryomov: "Two small fixes from Xiubo and myself, marked for stable" * tag 'ceph-for-6.3-rc1' of https://github.com/ceph/ceph-client: rbd: avoid use-after-free in do_rbd_add() when rbd_dev_create() fails ceph: update the time stamps and try to drop the suid/sgid
| * | ceph: update the time stamps and try to drop the suid/sgidXiubo Li2023-02-261-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fallocate will try to clear the suid/sgid if a unprevileged user changed the file. There is no POSIX item requires that we should clear the suid/sgid in fallocate code path but this is the default behaviour for most of the filesystems and the VFS layer. And also the same for the write code path, which have already support it. And also we need to update the time stamps since the fallocate will change the file contents. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/58054 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | | capability: just use a 'u64' instead of a 'u32[2]' arrayLinus Torvalds2023-03-011-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Back in 2008 we extended the capability bits from 32 to 64, and we did it by extending the single 32-bit capability word from one word to an array of two words. It was then obfuscated by hiding the "2" behind two macro expansions, with the reasoning being that maybe it gets extended further some day. That reasoning may have been valid at the time, but the last thing we want to do is to extend the capability set any more. And the array of values not only causes source code oddities (with loops to deal with it), but also results in worse code generation. It's a lose-lose situation. So just change the 'u32[2]' into a 'u64' and be done with it. We still have to deal with the fact that the user space interface is designed around an array of these 32-bit values, but that was the case before too, since the array layouts were different (ie user space doesn't use an array of 32-bit values for individual capability masks, but an array of 32-bit slices of multiple masks). So that marshalling of data is actually simplified too, even if it does remain somewhat obscure and odd. This was all triggered by my reaction to the new "cap_isidentical()" introduced recently. By just using a saner data structure, it went from unsigned __capi; CAP_FOR_EACH_U32(__capi) { if (a.cap[__capi] != b.cap[__capi]) return false; } return true; to just being return a.val == b.val; instead. Which is rather more obvious both to humans and to compilers. Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Paul Moore <paul@paul-moore.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'uml-for-linus-6.3-rc1' of ↵Linus Torvalds2023-03-011-7/+8
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - Add support for rust (yay!) - Add support for LTO - Add platform bus support to virtio-pci - Various virtio fixes - Coding style, spelling cleanups * tag 'uml-for-linus-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (27 commits) Documentation: rust: Fix arch support table uml: vector: Remove unused definitions VECTOR_{WRITE,HEADERS} um: virt-pci: properly remove PCI device from bus um: virtio_uml: move device breaking into workqueue um: virtio_uml: mark device as unregistered when breaking it um: virtio_uml: free command if adding to virtqueue failed UML: define RUNTIME_DISCARD_EXIT virt-pci: add platform bus support um-virt-pci: Make max delay configurable um: virt-pci: implement pcibios_get_phb_of_node() um: Support LTO um: put power options in a menu um: Use CFLAGS_vmlinux um: Prevent building modules incompatible with MODVERSIONS um: Avoid pcap multiple definition errors um: Make the definition of cpu_data more compatible x86: um: vdso: Add '%rcx' and '%r11' to the syscall clobber list rust: arch/um: Add support for CONFIG_RUST under x86_64 UML rust: arch/um: Disable FP/SIMD instruction to match x86 rust: arch/um: Use 'pie' relocation mode under UML ...
| * | | hostfs: Replace kmap() with kmap_local_page()Fabio M. De Francesco2023-02-011-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The use of kmap() is being deprecated in favor of kmap_local_page(). There are two main problems with kmap(): (1) It comes with an overhead as the mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and still valid. Therefore, replace kmap() with kmap_local_page() in hostfs_kern.c, it being the only file with kmap() call sites currently left in fs/hostfs. Cc: "Venkataramanan, Anirudh" <anirudh.venkataramanan@intel.com> Suggested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* | | | Merge tag 'ubifs-for-linus-6.3-rc1' of ↵Linus Torvalds2023-03-0113-62/+155
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull jffs2, ubi and ubifs updates from Richard Weinberger: "JFFS2: - Fix memory corruption in error path - Spelling and coding style fixes UBI: - Switch to BLK_MQ_F_BLOCKING in ubiblock - Wire up partent device (for sysfs) - Multiple UAF bugfixes - Fix for an infinite loop in WL error path UBIFS: - Fix for multiple memory leaks in error paths - Fixes for wrong space accounting - Minor cleanups - Spelling and coding style fixes" * tag 'ubifs-for-linus-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: (36 commits) ubi: block: Fix a possible use-after-free bug in ubiblock_create() ubifs: make kobj_type structures constant mtd: ubi: block: wire-up device parent mtd: ubi: wire-up parent MTD device ubi: use correct names in function kernel-doc comments ubi: block: set BLK_MQ_F_BLOCKING jffs2: Fix list_del corruption if compressors initialized failed jffs2: Use function instead of macro when initialize compressors jffs2: fix spelling mistake "neccecary"->"necessary" ubifs: Fix kernel-doc ubifs: Fix some kernel-doc comments UBI: Fastmap: Fix kernel-doc ubi: ubi_wl_put_peb: Fix infinite loop when wear-leveling work failed ubi: Fix UAF wear-leveling entry in eraseblk_count_seq_show() ubi: fastmap: Fix missed fm_anchor PEB in wear-leveling after disabling fastmap ubifs: ubifs_releasepage: Remove ubifs_assert(0) to valid this process ubifs: ubifs_writepage: Mark page dirty after writing inode failed ubifs: dirty_cow_znode: Fix memleak in error handling path ubifs: Re-statistic cleaned znode count if commit failed ubi: Fix permission display of the debugfs files ...
| * | | | ubifs: make kobj_type structures constantThomas Weißschuh2023-02-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definitions to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | jffs2: Fix list_del corruption if compressors initialized failedZhang Xiaoxu2023-02-021-5/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a list_del corruption when remove the jffs2 module: list_del corruption, ffffffffa0623e60->next is NULL WARNING: CPU: 6 PID: 6332 at lib/list_debug.c:49 __list_del_entry_valid+0x98/0x130 Modules linked in: jffs2(-) ] CPU: 6 PID: 6332 Comm: rmmod Tainted: G W 6.1.0-rc2+ #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 RIP: 0010:__list_del_entry_valid+0x98/0x130 ... Call Trace: <TASK> jffs2_unregister_compressor+0x3e/0xe0 [jffs2] jffs2_zlib_exit+0x11/0x30 [jffs2] jffs2_compressors_exit+0x1e/0x30 [jffs2] exit_jffs2_fs+0x16/0x44f [jffs2] __do_sys_delete_module.constprop.0+0x244/0x370 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 If one of the compressor initialize failed, the module always insert success since jffs2_compressors_init() always return success, then something bad may happen during remove the module. For this scenario, let's insmod failed. Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | jffs2: Use function instead of macro when initialize compressorsZhang Xiaoxu2023-02-022-22/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The initialized compressors should be released if one of them initialize fail, this is the pre-patch for fix the problem, use function instead of the macro in jffs2_compressors_init() to simplify the codes, no functional change intended. Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | jffs2: fix spelling mistake "neccecary"->"necessary"Yu Zhe2023-02-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a spelling mistake in comment. Fix it. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | ubifs: Fix kernel-docYang Li2023-02-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix function name in fs/ubifs/io.c kernel-doc comment to remove some warnings found by clang(make W=1 LLVM=1). fs/ubifs/io.c:497: warning: expecting prototype for wbuf_timer_callback(). Prototype was for wbuf_timer_callback_nolock() instead fs/ubifs/io.c:513: warning: expecting prototype for new_wbuf_timer(). Prototype was for new_wbuf_timer_nolock() instead fs/ubifs/io.c:538: warning: expecting prototype for cancel_wbuf_timer(). Prototype was for cancel_wbuf_timer_nolock() instead Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | ubifs: Fix some kernel-doc commentsYang Li2023-02-021-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/ubifs/journal.c:1221: warning: Function parameter or member 'old_inode' not described in 'ubifs_jnl_rename' fs/ubifs/journal.c:1221: warning: Function parameter or member 'old_nm' not described in 'ubifs_jnl_rename' fs/ubifs/journal.c:1221: warning: Function parameter or member 'new_inode' not described in 'ubifs_jnl_rename' fs/ubifs/journal.c:1221: warning: Function parameter or member 'new_nm' not described in 'ubifs_jnl_rename' fs/ubifs/journal.c:1221: warning: Function parameter or member 'whiteout' not described in 'ubifs_jnl_rename' fs/ubifs/journal.c:1221: warning: Excess function parameter 'old_dentry' description in 'ubifs_jnl_rename' fs/ubifs/journal.c:1221: warning: Excess function parameter 'new_dentry' description in 'ubifs_jnl_rename' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | ubifs: ubifs_releasepage: Remove ubifs_assert(0) to valid this processZhihao Cheng2023-02-021-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two states for ubifs writing pages: 1. Dirty, Private 2. Not Dirty, Not Private The normal process cannot go to ubifs_releasepage() which means there exists pages being private but not dirty. Reproducer[1] shows that it could occur (which maybe related to [2]) with following process: PA PB PC lock(page)[PA] ubifs_write_end attach_page_private // set Private __set_page_dirty_nobuffers // set Dirty unlock(page) write_cache_pages[PA] lock(page) clear_page_dirty_for_io(page) // clear Dirty ubifs_writepage do_truncation[PB] truncate_setsize i_size_write(inode, newsize) // newsize = 0 i_size = i_size_read(inode) // i_size = 0 end_index = i_size >> PAGE_SHIFT if (page->index > end_index) goto out // jump out: unlock(page) // Private, Not Dirty generic_fadvise[PC] lock(page) invalidate_inode_page try_to_release_page ubifs_releasepage ubifs_assert(c, 0) // bad assertion! unlock(page) truncate_pagecache[PB] Then we may get following assertion failed: UBIFS error (ubi0:0 pid 1683): ubifs_assert_failed [ubifs]: UBIFS assert failed: 0, in fs/ubifs/file.c:1513 UBIFS warning (ubi0:0 pid 1683): ubifs_ro_mode [ubifs]: switched to read-only mode, error -22 CPU: 2 PID: 1683 Comm: aa Not tainted 5.16.0-rc5-00184-g0bca5994cacc-dirty #308 Call Trace: dump_stack+0x13/0x1b ubifs_ro_mode+0x54/0x60 [ubifs] ubifs_assert_failed+0x4b/0x80 [ubifs] ubifs_releasepage+0x67/0x1d0 [ubifs] try_to_release_page+0x57/0xe0 invalidate_inode_page+0xfb/0x130 __invalidate_mapping_pages+0xb9/0x280 invalidate_mapping_pagevec+0x12/0x20 generic_fadvise+0x303/0x3c0 ksys_fadvise64_64+0x4c/0xb0 [1] https://bugzilla.kernel.org/show_bug.cgi?id=215373 [2] https://linux-mtd.infradead.narkive.com/NQoBeT1u/patch-rfc-ubifs-fix-assert-failed-in-ubifs-set-page-dirty Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | ubifs: ubifs_writepage: Mark page dirty after writing inode failedZhihao Cheng2023-02-021-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two states for ubifs writing pages: 1. Dirty, Private 2. Not Dirty, Not Private There is a third possibility which maybe related to [1] that page is private but not dirty caused by following process: PA lock(page) ubifs_write_end attach_page_private // set Private __set_page_dirty_nobuffers // set Dirty unlock(page) write_cache_pages lock(page) clear_page_dirty_for_io(page) // clear Dirty ubifs_writepage write_inode // fail, goto out, following codes are not executed // do_writepage // set_page_writeback // set Writeback // detach_page_private // clear Private // end_page_writeback // clear Writeback out: unlock(page) // Private, Not Dirty PB ksys_fadvise64_64 generic_fadvise invalidate_inode_page // page is neither Dirty nor Writeback invalidate_complete_page // page_has_private is true try_to_release_page ubifs_releasepage ubifs_assert(c, 0) !!! Then we may get following assertion failed: UBIFS error (ubi0:0 pid 1492): ubifs_assert_failed [ubifs]: UBIFS assert failed: 0, in fs/ubifs/file.c:1499 UBIFS warning (ubi0:0 pid 1492): ubifs_ro_mode [ubifs]: switched to read-only mode, error -22 CPU: 2 PID: 1492 Comm: aa Not tainted 5.16.0-rc2-00012-g7bb767dee0ba-dirty Call Trace: dump_stack+0x13/0x1b ubifs_ro_mode+0x54/0x60 [ubifs] ubifs_assert_failed+0x4b/0x80 [ubifs] ubifs_releasepage+0x7e/0x1e0 [ubifs] try_to_release_page+0x57/0xe0 invalidate_inode_page+0xfb/0x130 invalidate_mapping_pagevec+0x12/0x20 generic_fadvise+0x303/0x3c0 vfs_fadvise+0x35/0x40 ksys_fadvise64_64+0x4c/0xb0 Jump [2] to find a reproducer. [1] https://linux-mtd.infradead.narkive.com/NQoBeT1u/patch-rfc-ubifs-fix-assert-failed-in-ubifs-set-page-dirty [2] https://bugzilla.kernel.org/show_bug.cgi?id=215357 Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | ubifs: dirty_cow_znode: Fix memleak in error handling pathZhihao Cheng2023-02-021-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Following process will cause a memleak for copied up znode: dirty_cow_znode zn = copy_znode(c, znode); err = insert_old_idx(c, zbr->lnum, zbr->offs); if (unlikely(err)) return ERR_PTR(err); // No one refers to zn. Fix it by adding copied znode back to tnc, then it will be freed by ubifs_destroy_tnc_subtree() while closing tnc. Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216705 Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | ubifs: Re-statistic cleaned znode count if commit failedZhihao Cheng2023-02-021-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dirty znodes will be written on flash in committing process with following states: process A | znode state ------------------------------------------------------ do_commit | DIRTY_ZNODE ubifs_tnc_start_commit | DIRTY_ZNODE get_znodes_to_commit | DIRTY_ZNODE | COW_ZNODE layout_commit | DIRTY_ZNODE | COW_ZNODE fill_gap | 0 write master | 0 or OBSOLETE_ZNODE process B | znode state ------------------------------------------------------ do_commit | DIRTY_ZNODE[1] ubifs_tnc_start_commit | DIRTY_ZNODE get_znodes_to_commit | DIRTY_ZNODE | COW_ZNODE ubifs_tnc_end_commit | DIRTY_ZNODE | COW_ZNODE write_index | 0 write master | 0 or OBSOLETE_ZNODE[2] or | DIRTY_ZNODE[3] [1] znode is dirtied without concurrent committing process [2] znode is copied up (re-dirtied by other process) before cleaned up in committing process [3] znode is re-dirtied after cleaned up in committing process Currently, the clean znode count is updated in free_obsolete_znodes(), which is called only in normal path. If do_commit failed, clean znode count won't be updated, which triggers a failure ubifs assertion[4] in ubifs_tnc_close(): ubifs_assert_failed [ubifs]: UBIFS assert failed: freed == n [4] Commit 380347e9ca7682 ("UBIFS: Add an assertion for clean_zn_cnt"). Fix it by re-statisticing cleaned znode count in tnc_destroy_cnext(). Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216704 Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | ubifs: Fix memory leak in alloc_wbufs()Li Zetao2023-02-021-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kmemleak reported a sequence of memory leaks, and show them as following: unreferenced object 0xffff8881575f8400 (size 1024): comm "mount", pid 19625, jiffies 4297119604 (age 20.383s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8176cecd>] __kmalloc+0x4d/0x150 [<ffffffffa0406b2b>] ubifs_mount+0x307b/0x7170 [ubifs] [<ffffffff819fa8fd>] legacy_get_tree+0xed/0x1d0 [<ffffffff81936f2d>] vfs_get_tree+0x7d/0x230 [<ffffffff819b2bd4>] path_mount+0xdd4/0x17b0 [<ffffffff819b37aa>] __x64_sys_mount+0x1fa/0x270 [<ffffffff83c14295>] do_syscall_64+0x35/0x80 [<ffffffff83e0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 unreferenced object 0xffff8881798a6e00 (size 512): comm "mount", pid 19677, jiffies 4297121912 (age 37.816s) hex dump (first 32 bytes): 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk backtrace: [<ffffffff8176cecd>] __kmalloc+0x4d/0x150 [<ffffffffa0418342>] ubifs_wbuf_init+0x52/0x480 [ubifs] [<ffffffffa0406ca5>] ubifs_mount+0x31f5/0x7170 [ubifs] [<ffffffff819fa8fd>] legacy_get_tree+0xed/0x1d0 [<ffffffff81936f2d>] vfs_get_tree+0x7d/0x230 [<ffffffff819b2bd4>] path_mount+0xdd4/0x17b0 [<ffffffff819b37aa>] __x64_sys_mount+0x1fa/0x270 [<ffffffff83c14295>] do_syscall_64+0x35/0x80 [<ffffffff83e0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 The problem is that the ubifs_wbuf_init() returns an error in the loop which in the alloc_wbufs(), then the wbuf->buf and wbuf->inodes that were successfully alloced before are not freed. Fix it by adding error hanging path in alloc_wbufs() which frees the memory alloced before when ubifs_wbuf_init() returns an error. Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | ubifs: Reserve one leb for each journal head while doing budgetZhihao Cheng2023-02-021-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UBIFS calculates available space by c->main_bytes - c->lst.total_used (which means non-index lebs' free and dirty space is accounted into total available), then index lebs and four lebs (one for gc_lnum, one for deletions, two for journal heads) are deducted. In following situation, ubifs may get -ENOSPC from make_reservation(): LEB 84: DATAHD free 122880 used 1920 dirty 2176 dark 6144 LEB 110:DELETION free 126976 used 0 dirty 0 dark 6144 (empty) LEB 201:gc_lnum free 126976 used 0 dirty 0 dark 6144 LEB 272:GCHD free 77824 used 47672 dirty 1480 dark 6144 LEB 356:BASEHD free 0 used 39776 dirty 87200 dark 6144 OTHERS: index lebs, zero-available non-index lebs UBIFS calculates the available bytes is 6888 (How to calculate it: 126976 * 5[remain main bytes] - 1920[used] - 47672[used] - 39776[used] - 126976 * 1[deletions] - 126976 * 1[gc_lnum] - 126976 * 2[journal heads] - 6144 * 5[dark] = 6888) after doing budget, however UBIFS cannot use BASEHD's dirty space(87200), because UBIFS cannot find next BASEHD to reclaim current BASEHD. (c->bi.min_idx_lebs equals to c->lst.idx_lebs, the empty leb won't be found by ubifs_find_free_space(), and dirty index lebs won't be picked as gced lebs. All non-index lebs has dirty space less then c->dead_wm, non-index lebs won't be picked as gced lebs either. So new free lebs won't be produced.). See more details in Link. To fix it, reserve one leb for each journal head while doing budget. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216562 Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | ubifs: do_rename: Fix wrong space budget when target inode's nlink > 1Zhihao Cheng2023-02-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If target inode is a special file (eg. block/char device) with nlink count greater than 1, the inode with ui->data will be re-written on disk. However, UBIFS losts target inode's data_len while doing space budget. Bad space budget may let make_reservation() return with -ENOSPC, which could turn ubifs to read-only mode in do_writepage() process. Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216494 Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | ubifs: Fix wrong dirty space budget for dirty inodeZhihao Cheng2023-02-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each dirty inode should reserve 'c->bi.inode_budget' bytes in space budget calculation. Currently, space budget for dirty inode reports more space than what UBIFS actually needs to write. Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | ubifs: Add comments and debug info for ubifs_xrename()Zhihao Cheng2023-02-021-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just like other operations (eg. ubifs_create, do_rename), add comments and debug information for ubifs_xrename(). Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | ubifs: Rectify space budget for ubifs_xrename()Zhihao Cheng2023-02-021-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no space budget for ubifs_xrename(). It may let make_reservation() return with -ENOSPC, which could turn ubifs to read-only mode in do_writepage() process. Fix it by adding space budget for ubifs_xrename(). Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216569 Fixes: 9ec64962afb170 ("ubifs: Implement RENAME_EXCHANGE") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | ubifs: Rectify space budget for ubifs_symlink() if symlink is encryptedZhihao Cheng2023-02-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix bad space budget when symlink file is encrypted. Bad space budget may let make_reservation() return with -ENOSPC, which could turn ubifs to read-only mode in do_writepage() process. Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216490 Fixes: ca7f85be8d6cf9 ("ubifs: Add support for encrypted symlinks") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | ubifs: Fix memory leak in ubifs_sysfs_init()Liu Shixin2023-02-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When insmod ubifs.ko, a kmemleak reported as below: unreferenced object 0xffff88817fb1a780 (size 8): comm "insmod", pid 25265, jiffies 4295239702 (age 100.130s) hex dump (first 8 bytes): 75 62 69 66 73 00 ff ff ubifs... backtrace: [<ffffffff81b3fc4c>] slab_post_alloc_hook+0x9c/0x3c0 [<ffffffff81b44bf3>] __kmalloc_track_caller+0x183/0x410 [<ffffffff8198d3da>] kstrdup+0x3a/0x80 [<ffffffff8198d486>] kstrdup_const+0x66/0x80 [<ffffffff83989325>] kvasprintf_const+0x155/0x190 [<ffffffff83bf55bb>] kobject_set_name_vargs+0x5b/0x150 [<ffffffff83bf576b>] kobject_set_name+0xbb/0xf0 [<ffffffff8100204c>] do_one_initcall+0x14c/0x5a0 [<ffffffff8157e380>] do_init_module+0x1f0/0x660 [<ffffffff815857be>] load_module+0x6d7e/0x7590 [<ffffffff8158644f>] __do_sys_finit_module+0x19f/0x230 [<ffffffff815866b3>] __x64_sys_finit_module+0x73/0xb0 [<ffffffff88c98e85>] do_syscall_64+0x35/0x80 [<ffffffff88e00087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd When kset_register() failed, we should call kset_put to cleanup it. Fixes: 2e3cbf425804 ("ubifs: Export filesystem error counters") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | ubifs: Fix build errors as symbol undefinedLi Hua2023-02-021-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With CONFIG_UBIFS_FS_AUTHENTICATION not set, the compiler can assume that ubifs_node_check_hash() is never true and drops the call to ubifs_bad_hash(). Is CONFIG_CC_OPTIMIZE_FOR_SIZE enabled this optimization does not happen anymore. So When CONFIG_UBIFS_FS and CONFIG_CC_OPTIMIZE_FOR_SIZE is enabled but CONFIG_UBIFS_FS_AUTHENTICATION is not set, the build errors is as followd: ERROR: modpost: "ubifs_bad_hash" [fs/ubifs/ubifs.ko] undefined! Fix it by add no-op ubifs_bad_hash() for the CONFIG_UBIFS_FS_AUTHENTICATION=n case. Fixes: 16a26b20d2af ("ubifs: authentication: Add hashes to index nodes") Signed-off-by: Li Hua <hucool.lihua@huawei.com> Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | jffs2: correct logic when creating a hole in jffs2_write_beginYifei Liu2023-02-021-8/+7
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bug description and fix: 1. Write data to a file, say all 1s from offset 0 to 16. 2. Truncate the file to a smaller size, say 8 bytes. 3. Write new bytes (say 2s) from an offset past the original size of the file, say at offset 20, for 4 bytes. This is supposed to create a "hole" in the file, meaning that the bytes from offset 8 (where it was truncated above) up to the new write at offset 20, should all be 0s (zeros). 4. Flush all caches using "echo 3 > /proc/sys/vm/drop_caches" (or unmount and remount) the f/s. 5. Check the content of the file. It is wrong. The 1s that used to be between bytes 9 and 16, before the truncation, have REAPPEARED (they should be 0s). We wrote a script and helper C program to reproduce the bug (reproduce_jffs2_write_begin_issue.sh, write_file.c, and Makefile). We can make them available to anyone. The above example is shown when writing a small file within the same first page. But the bug happens for larger files, as long as steps 1, 2, and 3 above all happen within the same page. The problem was traced to the jffs2_write_begin code, where it goes into an 'if' statement intended to handle writes past the current EOF (i.e., writes that may create a hole). The code computes a 'pageofs' that is the floor of the write position (pos), aligned to the page size boundary. In other words, 'pageofs' will never be larger than 'pos'. The code then sets the internal jffs2_raw_inode->isize to the size of max(current inode size, pageofs) but that is wrong: the new file size should be the 'pos', which is larger than both the current inode size and pageofs. Similarly, the code incorrectly sets the internal jffs2_raw_inode->dsize to the difference between the pageofs minus current inode size; instead it should be the current pos minus the current inode size. Finally, inode->i_size was also set incorrectly. The patch below fixes this bug. The bug was discovered using a new tool for finding f/s bugs using model checking, called MCFS (Model Checking File Systems). Signed-off-by: Yifei Liu <yifeliu@cs.stonybrook.edu> Signed-off-by: Erez Zadok <ezk@cs.stonybrook.edu> Signed-off-by: Manish Adkar <madkar@cs.stonybrook.edu> Signed-off-by: Richard Weinberger <richard@nod.at>
* | | | Merge tag '9p-6.3-for-linus-part1' of ↵Linus Torvalds2023-03-016-14/+14
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs Pull 9p updates from Eric Van Hensbergen: - some fixes and cleanup setting up for a larger set of performance patches I've been working on - a contributed fixes relating to 9p/rdma - some contributed fixes relating to 9p/xen * tag '9p-6.3-for-linus-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: fs/9p: fix error reporting in v9fs_dir_release net/9p: fix bug in client create for .L 9p/rdma: unmap receive dma buffer in rdma_request()/post_recv() 9p/xen: fix connection sequence 9p/xen: fix version parsing fs/9p: Expand setup of writeback cache to all levels net/9p: Adjust maximum MSIZE to account for p9 header
| * | | | fs/9p: fix error reporting in v9fs_dir_releaseEric Van Hensbergen2023-02-241-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Checking the p9_fid_put value allows us to pass back errors involved if we end up clunking the fid as part of dir_release. This can help with more graceful response to errors in writeback among other things. Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org> Reviewed-by: Dominique Martinet <asmadeus@codewreck.org>
| * | | | fs/9p: Expand setup of writeback cache to all levelsEric Van Hensbergen2023-02-235-11/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If cache is enabled, make sure we are putting the right things in place (mainly impacts mmap). This also sets us up for more cache levels. Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org> Reviewed-by: Dominique Martinet <asmadeus@codewreck.org>
* | | | | Merge tag 'jfs-6.3' of https://github.com/kleikamp/linux-shaggyLinus Torvalds2023-03-011-1/+2
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull jfs update from Dave Kleikamp: "Just one simple sanity check" * tag 'jfs-6.3' of https://github.com/kleikamp/linux-shaggy: fs/jfs: fix shift exponent db_agl2size negative
| * | | | | fs/jfs: fix shift exponent db_agl2size negativeLiu Shixin via Jfs-discussion2023-01-031-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As a shift exponent, db_agl2size can not be less than 0. Add the missing check to fix the shift-out-of-bounds bug reported by syzkaller: UBSAN: shift-out-of-bounds in fs/jfs/jfs_dmap.c:2227:15 shift exponent -744642816 is negative Reported-by: syzbot+0be96567042453c0c820@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
* | | | | | Merge tag 'exfat-for-6.3-rc1' of ↵Linus Torvalds2023-03-018-60/+101
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Handle vendor extension and allocation entries as unrecognized benign secondary entries - Fix wrong ->i_blocks on devices with non-512 byte sector - Add the check to avoid returning -EIO from exfat_readdir() at current position exceeding the directory size - Fix a bug that reach the end of the directory stream at a position not aligned with the dentry size - Redefine DIR_DELETED as 0xFFFFFFF7, the bad cluster number - Two cleanup fixes and fix cluster leakage in error handling * tag 'exfat-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: fix the newly allocated clusters are not freed in error handling exfat: don't print error log in normal case exfat: remove unneeded code from exfat_alloc_cluster() exfat: handle unreconized benign secondary entries exfat: fix inode->i_blocks for non-512 byte sector size device exfat: redefine DIR_DELETED as the bad cluster number exfat: fix reporting fs error when reading dir beyond EOF exfat: fix unexpected EOF while reading dir
| * | | | | | exfat: fix the newly allocated clusters are not freed in error handlingYuezhang Mo2023-02-281-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In error handling 'free_cluster', before num_alloc clusters allocated, p_chain->size will not updated and always 0, thus the newly allocated clusters are not freed. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
| * | | | | | exfat: don't print error log in normal caseYuezhang Mo2023-02-281-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When allocating a new cluster, exFAT first allocates from the next cluster of the last cluster of the file. If the last cluster of the file is the last cluster of the volume, allocate from the first cluster. This is a normal case, but the following error log will be printed. It makes users confused, so this commit removes the error log. [1960905.181545] exFAT-fs (sdb1): hint_cluster is invalid (262130) Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
| * | | | | | exfat: remove unneeded code from exfat_alloc_cluster()Yuezhang Mo2023-02-281-8/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the removed code, num_clusters is 0, nothing is done in exfat_chain_cont_cluster(), so it is unneeded, remove it. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
| * | | | | | exfat: handle unreconized benign secondary entriesNamjae Jeon2023-02-273-25/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sony PXW-Z280 camera add vendor allocation entries to directory of pictures. Currently, linux exfat does not support it and the file is not visible. This patch handle vendor extension and allocation entries as unreconized benign secondary entries. As described in the specification, it is recognized but ignored, and when deleting directory entry set, the associated clusters allocation are removed as well as benign secondary directory entries. Reported-by: Barócsi Dénes <admin@tveger.hu> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
| * | | | | | exfat: fix inode->i_blocks for non-512 byte sector size deviceYuezhang Mo2023-02-274-9/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | inode->i_blocks is not real number of blocks, but 512 byte ones. Fixes: 98d917047e8b ("exfat: add file operations") Cc: stable@vger.kernel.org # v5.7+ Reported-by: Wang Yugui <wangyugui@e16-tech.com> Tested-by: Wang Yugui <wangyugui@e16-tech.com> Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
| * | | | | | exfat: redefine DIR_DELETED as the bad cluster numberSungjong Seo2023-02-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a file or a directory is deleted, the hint for the cluster of its parent directory in its in-memory inode is set as DIR_DELETED. Therefore, DIR_DELETED must be one of invalid cluster numbers. According to the exFAT specification, a volume can have at most 2^32-11 clusters. However, DIR_DELETED is wrongly defined as 0xFFFF0321, which could be a valid cluster number. To fix it, let's redefine DIR_DELETED as 0xFFFFFFF7, the bad cluster number. Fixes: 1acf1a564b60 ("exfat: add in-memory and on-disk structures and headers") Cc: stable@vger.kernel.org # v5.7+ Reported-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
| * | | | | | exfat: fix reporting fs error when reading dir beyond EOFYuezhang Mo2023-02-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since seekdir() does not check whether the position is valid, the position may exceed the size of the directory. We found that for a directory with discontinuous clusters, if the position exceeds the size of the directory and the excess size is greater than or equal to the cluster size, exfat_readdir() will return -EIO, causing a file system error and making the file system unavailable. Reproduce this bug by: seekdir(dir, dir_size + cluster_size); dirent = readdir(dir); The following log will be printed if mount with 'errors=remount-ro'. [11166.712896] exFAT-fs (sdb1): error, invalid access to FAT (entry 0xffffffff) [11166.712905] exFAT-fs (sdb1): Filesystem has been set read-only Fixes: 1e5654de0f51 ("exfat: handle wrong stream entry size in exfat_readdir()") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
| * | | | | | exfat: fix unexpected EOF while reading dirYuezhang Mo2023-02-271-4/+1
| | |_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the position is not aligned with the dentry size, the return value of readdir() will be NULL and errno is 0, which means the end of the directory stream is reached. If the position is aligned with dentry size, but there is no file or directory at the position, exfat_readdir() will continue to get dentry from the next dentry. So the dentry gotten by readdir() may not be at the position. After this commit, if the position is not aligned with the dentry size, round the position up to the dentry size and continue to get the dentry. Fixes: ca06197382bd ("exfat: add directory operations") Cc: stable@vger.kernel.org # v5.7+ Reported-by: Wang Yugui <wangyugui@e16-tech.com> Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>