summaryrefslogtreecommitdiffstats
path: root/fs/erofs
Commit message (Collapse)AuthorAgeFilesLines
* erofs: fix use-after-free of on-stack io[]Hongyu Jin2022-04-152-9/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The root cause is the race as follows: Thread #1 Thread #2(irq ctx) z_erofs_runqueue() struct z_erofs_decompressqueue io_A[]; submit bio A z_erofs_decompress_kickoff(,,1) z_erofs_decompressqueue_endio(bio A) z_erofs_decompress_kickoff(,,-1) spin_lock_irqsave() atomic_add_return() io_wait_event() -> pending_bios is already 0 [end of function] wake_up_locked(io_A[]) // crash Referenced backtrace in kernel 5.4: [ 10.129422] Unable to handle kernel paging request at virtual address eb0454a4 [ 10.364157] CPU: 0 PID: 709 Comm: getprop Tainted: G WC O 5.4.147-ab09225 #1 [ 11.556325] [<c01b33b8>] (__wake_up_common) from [<c01b3300>] (__wake_up_locked+0x40/0x48) [ 11.565487] [<c01b3300>] (__wake_up_locked) from [<c044c8d0>] (z_erofs_vle_unzip_kickoff+0x6c/0xc0) [ 11.575438] [<c044c8d0>] (z_erofs_vle_unzip_kickoff) from [<c044c854>] (z_erofs_vle_read_endio+0x16c/0x17c) [ 11.586082] [<c044c854>] (z_erofs_vle_read_endio) from [<c06a80e8>] (clone_endio+0xb4/0x1d0) [ 11.595428] [<c06a80e8>] (clone_endio) from [<c04a1280>] (blk_update_request+0x150/0x4dc) [ 11.604516] [<c04a1280>] (blk_update_request) from [<c06dea28>] (mmc_blk_cqe_complete_rq+0x144/0x15c) [ 11.614640] [<c06dea28>] (mmc_blk_cqe_complete_rq) from [<c04a5d90>] (blk_done_softirq+0xb0/0xcc) [ 11.624419] [<c04a5d90>] (blk_done_softirq) from [<c010242c>] (__do_softirq+0x184/0x56c) [ 11.633419] [<c010242c>] (__do_softirq) from [<c01051e8>] (irq_exit+0xd4/0x138) [ 11.641640] [<c01051e8>] (irq_exit) from [<c010c314>] (__handle_domain_irq+0x94/0xd0) [ 11.650381] [<c010c314>] (__handle_domain_irq) from [<c04fde70>] (gic_handle_irq+0x50/0xd4) [ 11.659641] [<c04fde70>] (gic_handle_irq) from [<c0101b70>] (__irq_svc+0x70/0xb0) Signed-off-by: Hongyu Jin <hongyu.jin@unisoc.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220401115527.4935-1-hongyu.jin.cn@gmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds2022-03-221-9/+8
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull filesystem folio updates from Matthew Wilcox: "Primarily this series converts some of the address_space operations to take a folio instead of a page. Notably: - a_ops->is_partially_uptodate() takes a folio instead of a page and changes the type of the 'from' and 'count' arguments to make it obvious they're bytes. - a_ops->invalidatepage() becomes ->invalidate_folio() and has a similar type change. - a_ops->launder_page() becomes ->launder_folio() - a_ops->set_page_dirty() becomes ->dirty_folio() and adds the address_space as an argument. There are a couple of other misc changes up front that weren't worth separating into their own pull request" * tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache: (53 commits) fs: Remove aops ->set_page_dirty fb_defio: Use noop_dirty_folio() fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio fs: Convert __set_page_dirty_buffers to block_dirty_folio nilfs: Convert nilfs_set_page_dirty() to nilfs_dirty_folio() mm: Convert swap_set_page_dirty() to swap_dirty_folio() ubifs: Convert ubifs_set_page_dirty to ubifs_dirty_folio f2fs: Convert f2fs_set_node_page_dirty to f2fs_dirty_node_folio f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio f2fs: Convert f2fs_set_meta_page_dirty to f2fs_dirty_meta_folio afs: Convert afs_dir_set_page_dirty() to afs_dir_dirty_folio() btrfs: Convert extent_range_redirty_for_io() to use folios fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio btrfs: Convert from set_page_dirty to dirty_folio fscache: Convert fscache_set_page_dirty() to fscache_dirty_folio() fs: Add aops->dirty_folio fs: Remove aops->launder_page orangefs: Convert launder_page to launder_folio nfs: Convert from launder_page to launder_folio fuse: Convert from launder_page to launder_folio ...
| * erofs: Convert from invalidatepage to invalidate_folioMatthew Wilcox (Oracle)2022-03-151-9/+8
| | | | | | | | | | | | | | | | | | | | A straightforward conversion. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
* | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2022-03-221-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge updates from Andrew Morton: - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs - Most the MM patches which precede the patches in Willy's tree: kasan, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb, userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp, cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap, zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits) mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release() Docs/ABI/testing: add DAMON sysfs interface ABI document Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface selftests/damon: add a test for DAMON sysfs interface mm/damon/sysfs: support DAMOS stats mm/damon/sysfs: support DAMOS watermarks mm/damon/sysfs: support schemes prioritization mm/damon/sysfs: support DAMOS quotas mm/damon/sysfs: support DAMON-based Operation Schemes mm/damon/sysfs: support the physical address space monitoring mm/damon/sysfs: link DAMON for virtual address spaces monitoring mm/damon: implement a minimal stub for sysfs-based DAMON interface mm/damon/core: add number of each enum type values mm/damon/core: allow non-exclusive DAMON start/stop Docs/damon: update outdated term 'regions update interval' Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Docs/vm/damon: call low level monitoring primitives the operations mm/damon: remove unnecessary CONFIG_DAMON option mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() mm/damon/dbgfs-test: fix is_target_id() change ...
| * | fs: allocate inode by using alloc_inode_sb()Muchun Song2022-03-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() of all filesystems to alloc_inode_sb(). Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4] Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'erofs-for-5.18-rc1' of ↵Linus Torvalds2022-03-2210-197/+185
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "In this cycle, we continue converting to use meta buffers for all remaining uncompressed paths to prepare for the upcoming subpage, folio and fscache features. We also fixed a double-free issue when sysfs initialization fails, which was reported by syzbot. Besides, in order for the userspace to control per-file timestamp easier, we now switch to record mtime instead of ctime with a compatible feature marked. And there are also some code cleanups and documentation update as usual. Summary: - Avoid using page structure directly for all uncompressed paths - Fix a double-free issue when sysfs initialization fails - Complete DAX description for erofs - Use mtime instead since there's no (easy) way for users to control ctime - Several code cleanups" * tag 'erofs-for-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: rename ctime to mtime erofs: use meta buffers for inode lookup erofs: use meta buffers for reading directories fs: erofs: add sanity check for kobject in erofs_unregister_sysfs erofs: refine managed inode stuffs erofs: clean up z_erofs_extent_lookback erofs: silence warnings related to impossible m_plen Documentation/filesystem/dax: update DAX description on erofs erofs: clean up preload_compressed_pages() erofs: get rid of `struct z_erofs_collector' erofs: use meta buffers for erofs_read_superblock()
| * | | erofs: rename ctime to mtimeDavid Anderson2022-03-172-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EROFS images should inherit modification time rather than change time, since users and host tooling have no easy way to control change time. To reflect the new timestamp meaning, i_ctime and i_ctime_nsec are renamed to i_mtime and i_mtime_nsec. Link: https://lore.kernel.org/r/20220311041829.3109511-1-dvander@google.com # v1 Signed-off-by: David Anderson <dvander@google.com> [ Gao Xiang: update document as well. ] Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220317114959.106787-1-hsiangkao@linux.alibaba.com # v2 Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
| * | | erofs: use meta buffers for inode lookupGao Xiang2022-03-171-30/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This converts the remaining inode lookup part by using metabuf in a straight-forward way. Except that it doesn't use kmap_atomic() anymore since we now have to maintain two metabufs together. After this patch, all uncompressed paths are handled with metabuf instead of page structure. Link: https://lore.kernel.org/r/20220316012246.95131-2-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
| * | | erofs: use meta buffers for reading directoriesGao Xiang2022-03-173-18/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, directory inodes are directly handled with page cache interfaces. In order to support sub-page directory blocks and folios, let's convert them into the latest metabuf infrastructure as well and this patch addresses the readdir case first. Link: https://lore.kernel.org/r/20220316012246.95131-1-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
| * | | fs: erofs: add sanity check for kobject in erofs_unregister_sysfsDongliang Mu2022-03-171-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Syzkaller hit 'WARNING: kobject bug in erofs_unregister_sysfs'. This bug is triggered by injecting fault in kobject_init_and_add of erofs_unregister_sysfs. Fix this by adding sanity check for kobject in erofs_unregister_sysfs Note that I've tested the patch and the crash does not occur any more. Link: https://lore.kernel.org/r/20220315132814.12332-1-dzm91@hust.edu.cn Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Fixes: 168e9a76200c ("erofs: add sysfs interface") Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
| * | | erofs: refine managed inode stuffsGao Xiang2022-03-172-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set up the correct gfp mask and use it instead of hard coding. Also add comments about .invalidatepage() to show more details. Link: https://lore.kernel.org/r/20220310182743.102365-2-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
| * | | erofs: clean up z_erofs_extent_lookbackGao Xiang2022-03-171-34/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid the unnecessary tail recursion since it can be converted into a loop directly in order to prevent potential stack overflow. It's a pretty straightforward conversion. Link: https://lore.kernel.org/r/20220310182743.102365-1-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
| * | | erofs: silence warnings related to impossible m_plenGao Xiang2022-03-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dan reported two smatch warnings [1], .. warn: should '1 << lclusterbits' be a 64 bit type? .. warn: should 'm->compressedlcs << lclusterbits' be a 64 bit type? In practice, m_plen cannot be more than 1MiB due to on-disk constraint for the compression mode, so we're always safe here. In order to make static analyzers happy and not report again, let's silence them instead. [1] https://lore.kernel.org/r/202203091002.lJVzsX6e-lkp@intel.com Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220310173448.19962-1-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
| * | | erofs: clean up preload_compressed_pages()Gao Xiang2022-03-161-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename preload_compressed_pages() as z_erofs_bind_cache() since we're trying to prepare for adapting folios. Also, add a comment for the gfp setting. No logic changes. Link: https://lore.kernel.org/r/20220301194951.106227-2-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
| * | | erofs: get rid of `struct z_erofs_collector'Gao Xiang2022-03-161-86/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid `struct z_erofs_collector' since there is another context structure called "struct z_erofs_decompress_frontend". No logic changes. Link: https://lore.kernel.org/r/20220301194951.106227-1-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
| * | | erofs: use meta buffers for erofs_read_superblock()Jeffle Xu2022-03-161-8/+5
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only change is that, meta buffers read cache page without __GFP_FS flag, which shall not matter. Link: https://lore.kernel.org/r/20220209060108.43051-7-jefflexu@linux.alibaba.com Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | | Merge tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-blockLinus Torvalds2022-03-211-3/+2
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block updates from Jens Axboe: - BFQ cleanups and fixes (Yu, Zhang, Yahu, Paolo) - blk-rq-qos completion fix (Tejun) - blk-cgroup merge fix (Tejun) - Add offline error return value to distinguish it from an IO error on the device (Song) - IO stats fixes (Zhang, Christoph) - blkcg refcount fixes (Ming, Yu) - Fix for indefinite dispatch loop softlockup (Shin'ichiro) - blk-mq hardware queue management improvements (Ming) - sbitmap dead code removal (Ming, John) - Plugging merge improvements (me) - Show blk-crypto capabilities in sysfs (Eric) - Multiple delayed queue run improvement (David) - Block throttling fixes (Ming) - Start deprecating auto module loading based on dev_t (Christoph) - bio allocation improvements (Christoph, Chaitanya) - Get rid of bio_devname (Christoph) - bio clone improvements (Christoph) - Block plugging improvements (Christoph) - Get rid of genhd.h header (Christoph) - Ensure drivers use appropriate flush helpers (Christoph) - Refcounting improvements (Christoph) - Queue initialization and teardown improvements (Ming, Christoph) - Misc fixes/improvements (Barry, Chaitanya, Colin, Dan, Jiapeng, Lukas, Nian, Yang, Eric, Chengming) * tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block: (127 commits) block: cancel all throttled bios in del_gendisk() block: let blkcg_gq grab request queue's refcnt block: avoid use-after-free on throttle data block: limit request dispatch loop duration block/bfq-iosched: Fix spelling mistake "tenative" -> "tentative" sr: simplify the local variable initialization in sr_block_open() block: don't merge across cgroup boundaries if blkcg is enabled block: fix rq-qos breakage from skipping rq_qos_done_bio() block: flush plug based on hardware and software queue order block: ensure plug merging checks the correct queue at least once block: move rq_qos_exit() into disk_release() block: do more work in elevator_exit block: move blk_exit_queue into disk_release block: move q_usage_counter release into blk_queue_release block: don't remove hctx debugfs dir from blk_mq_exit_queue block: move blkcg initialization/destroy into disk allocation/release handler sr: implement ->free_disk to simplify refcounting sd: implement ->free_disk to simplify refcounting sd: delay calling free_opal_dev sd: call sd_zbc_release_disk before releasing the scsi_device reference ...
| * | block: pass a block_device and opf to bio_allocChristoph Hellwig2022-02-021-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the block_device and operation that we plan to use this bio for to bio_alloc to optimize the assignment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | erofs: fix ztailpacking on > 4GiB filesystemsGao Xiang2022-03-021-1/+1
| |/ |/| | | | | | | | | | | | | | | | | | | | | z_idataoff here is an absolute physical offset, so it should use erofs_off_t (64 bits at least). Otherwise, it'll get trimmed and cause the decompresion failure. Link: https://lore.kernel.org/r/20220222033118.20540-1-hsiangkao@linux.alibaba.com Fixes: ab92184ff8f1 ("erofs: add on-disk compressed tail-packing inline support") Reviewed-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: fix small compressed files inliningGao Xiang2022-02-041-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to ztailpacking feature, it's enough that each lcluster has two pclusters at most, and the last pcluster should be turned into an uncompressed pcluster when necessary. For example, _________________________________________________ |_ pcluster n-2 _|_ pcluster n-1 _|____ EOFed ____| which should be converted into: _________________________________________________ |_ pcluster n-2 _|_ pcluster n-1 (uncompressed)' _| That is fine since either pcluster n-1 or (uncompressed)' takes one physical block. However, after ztailpacking was supported, the game is changed since the last pcluster can be inlined now. And such case above is quite common for inlining small files. Therefore, in order to inline more effectively, special EOF lclusters are now supported which can have three parts at most, as illustrated below: _________________________________________________ |_ pcluster n-2 _|_ pcluster n-1 _|____ EOFed ____| ^ i_size Actually similar code exists in Yue Hu's original patchset [1], but I removed this part on purpose. After evaluating more real cases with small files, I've changed my mind. [1] https://lore.kernel.org/r/20211215094449.15162-1-huyue2@yulong.com Link: https://lore.kernel.org/r/20220203190203.30794-1-xiang@kernel.org Fixes: ab92184ff8f1 ("erofs: add on-disk compressed tail-packing inline support") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: avoid unnecessary z_erofs_decompressqueue_work() declarationGao Xiang2022-01-241-57/+56
| | | | | | | | | | | | | | | | | | Just code rearrange. No logic changes. Link: https://lore.kernel.org/r/20220121091412.86086-1-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: fix fsdax partition offset handlingGao Xiang2022-01-241-4/+4
|/ | | | | | | | | | | | | | | | | | | | After seeking time on testing today upstream fsdax, I found it actually doesn't work well as below: [ 186.492983] ------------[ cut here ]------------ [ 186.493629] WARNING: CPU: 1 PID: 205 at fs/iomap/iter.c:33 iomap_iter+0x2f6/0x310 The problem is that m_dax_part_off should be applied to physical addresses and very sorry about that I didn't catch this eariler. Anyway, let's fix it up now. Also, I need to find a way to set up a standalone testcase to look after this later. Link: https://lore.kernel.org/r/20220113051845.244461-1-hsiangkao@linux.alibaba.com Fixes: de2051147771 ("fsdax: shift partition offset handling into the file systems") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* Merge tag 'libnvdimm-for-5.17' of ↵Linus Torvalds2022-01-123-8/+21
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull dax and libnvdimm updates from Dan Williams: "The bulk of this is a rework of the dax_operations API after discovering the obstacles it posed to the work-in-progress DAX+reflink support for XFS and other copy-on-write filesystem mechanics. Primarily the need to plumb a block_device through the API to handle partition offsets was a sticking point and Christoph untangled that dependency in addition to other cleanups to make landing the DAX+reflink support easier. The DAX_PMEM_COMPAT option has been around for 4 years and not only are distributions shipping userspace that understand the current configuration API, but some are not even bothering to turn this option on anymore, so it seems a good time to remove it per the deprecation schedule. Recall that this was added after the device-dax subsystem moved from /sys/class/dax to /sys/bus/dax for its sysfs organization. All recent functionality depends on /sys/bus/dax. Some other miscellaneous cleanups and reflink prep patches are included as well. Summary: - Simplify the dax_operations API: - Eliminate bdev_dax_pgoff() in favor of the filesystem maintaining and applying a partition offset to all its DAX iomap operations. - Remove wrappers and device-mapper stacked callbacks for ->copy_from_iter() and ->copy_to_iter() in favor of moving block_device relative offset responsibility to the dax_direct_access() caller. - Remove the need for an @bdev in filesystem-DAX infrastructure - Remove unused uio helpers copy_from_iter_flushcache() and copy_mc_to_iter() as only the non-check_copy_size() versions are used for DAX. - Prepare XFS for the pending (next merge window) DAX+reflink support - Remove deprecated DEV_DAX_PMEM_COMPAT support - Cleanup a straggling misuse of the GUID api" * tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (38 commits) iomap: Fix error handling in iomap_zero_iter() ACPI: NFIT: Import GUID before use dax: remove the copy_from_iter and copy_to_iter methods dax: remove the DAXDEV_F_SYNC flag dax: simplify dax_synchronous and set_dax_synchronous uio: remove copy_from_iter_flushcache() and copy_mc_to_iter() iomap: turn the byte variable in iomap_zero_iter into a ssize_t memremap: remove support for external pgmap refcounts fsdax: don't require CONFIG_BLOCK iomap: build the block based code conditionally dax: fix up some of the block device related ifdefs fsdax: shift partition offset handling into the file systems dax: return the partition offset from fs_dax_get_by_bdev iomap: add a IOMAP_DAX flag xfs: pass the mapping flags to xfs_bmbt_to_iomap xfs: use xfs_direct_write_iomap_ops for DAX zeroing xfs: move dax device handling into xfs_{alloc,free}_buftarg ext4: cleanup the dax handling in ext4_fill_super ext2: cleanup the dax handling in ext2_fill_super fsdax: decouple zeroing from the iomap buffered I/O code ...
| * fsdax: shift partition offset handling into the file systemsChristoph Hellwig2021-12-042-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | Remove the last user of ->bdev in dax.c by requiring the file system to pass in an address that already includes the DAX offset. As part of the only set ->bdev or ->daxdev when actually required in the ->iomap_begin methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> [erofs] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-27-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * dax: return the partition offset from fs_dax_get_by_bdevChristoph Hellwig2021-12-042-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | Prepare for the removal of the block_device from the DAX I/O path by returning the partition offset from fs_dax_get_by_bdev so that the file systems have it at hand for use during I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-26-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * dax: remove dax_capableChristoph Hellwig2021-12-041-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | Just open code the block size and dax_dev == NULL checks in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> [erofs] Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-9-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | erofs: use meta buffers for zmap operationsGao Xiang2022-01-044-70/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Get rid of old erofs_get_meta_page() within zmap operations by using on-stack meta buffers in order to prepare subpage and folio features. Finally, erofs_get_meta_page() is useless. Get rid of it! Link: https://lore.kernel.org/r/20220102040017.51352-6-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: use meta buffers for xattr operationsGao Xiang2022-01-042-95/+41
| | | | | | | | | | | | | | | | | | | | | | Get rid of old erofs_get_meta_page() within xattr operations by using on-stack meta buffers in order to prepare subpage and folio features. Link: https://lore.kernel.org/r/20220102040017.51352-5-hsiangkao@linux.alibaba.com Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: use meta buffers for super operationsGao Xiang2022-01-041-77/+27
| | | | | | | | | | | | | | | | | | | | | | | | Get rid of old erofs_get_meta_page() within super operations by using on-stack meta buffers in order to prepare subpage and folio features. Link: https://lore.kernel.org/r/20220102081317.109797-1-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: use meta buffers for inode operationsGao Xiang2022-01-042-36/+35
| | | | | | | | | | | | | | | | | | | | | | | | Get rid of old erofs_get_meta_page() within inode operations by using on-stack meta buffers in order to prepare subpage and folio features. Link: https://lore.kernel.org/r/20220102040017.51352-3-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: introduce meta buffer operationsGao Xiang2022-01-042-21/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | In order to support subpage and folio for all uncompressed files, introduce meta buffer descriptors, which can be effectively stored on stack, in place of meta page operations. This converts the uncompressed data path to meta buffers. Link: https://lore.kernel.org/r/20220102040017.51352-2-hsiangkao@linux.alibaba.com Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: add on-disk compressed tail-packing inline supportYue Hu2021-12-315-31/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduces erofs compressed tail-packing inline support. This approach adds a new field called `h_idata_size' in the per-file compression header to indicate the encoded size of each tail-packing pcluster. At runtime, it will find the start logical offset of the tail pcluster when initializing per-inode zmap and record such extent (headlcn, idataoff) information to the in-memory inode. Therefore, follow-on requests can directly recognize if one pcluster is a tail-packing inline pcluster or not. Link: https://lore.kernel.org/r/20211228054604.114518-6-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: support inline data decompressionYue Hu2021-12-312-44/+119
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we have already support tail-packing inline for uncompressed file, let's also implement this for compressed files to save I/Os and storage space. Different from normal pclusters, compressed data is available in advance because of other metadata I/Os. Therefore, they directly move into the bypass queue without extra I/O submission. It's the last compression feature before folio/subpage support. Link: https://lore.kernel.org/r/20211228232919.21413-1-xiang@kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: support unaligned data decompressionGao Xiang2021-12-311-8/+9
| | | | | | | | | | | | | | | | | | | | Previously, compressed data was assumed as block-aligned. This should be changed due to in-block tail-packing inline data. Link: https://lore.kernel.org/r/20211228054604.114518-4-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: introduce z_erofs_fixup_insizeGao Xiang2021-12-293-21/+36
| | | | | | | | | | | | | | | | | | | | To prepare for the upcoming ztailpacking feature, introduce z_erofs_fixup_insize() and pageofs_in to wrap up the process to get the exact compressed size via zero padding. Link: https://lore.kernel.org/r/20211228054604.114518-3-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: tidy up z_erofs_lz4_decompressGao Xiang2021-12-291-39/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | To prepare for the upcoming ztailpacking feature and further cleanups, introduce a unique z_erofs_lz4_decompress_ctx to keep the context, including inpages, outpages and oend, which are frequently used by the lz4 decompressor. No logic changes. Link: https://lore.kernel.org/r/20211228054604.114518-2-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: clean up erofs_map_blocks tracepointsGao Xiang2021-12-091-22/+17
| | | | | | | | | | | | | | | | | | | | | | Since the new type of chunk-based files is introduced, there is no need to leave flatmode tracepoints. Rename to erofs_map_blocks instead. Link: https://lore.kernel.org/r/20211209012918.30337-1-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: Replace zero-length array with flexible-array memberGao Xiang2021-12-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use `flexible array members' [1] for these cases. The older style of one-element or zero-length arrays should no longer be used [2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.15/process/deprecated.html#zero-length-and-one-element-arrays Link: https://lore.kernel.org/r/20211206121702.221331-1-hsiangkao@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: add sysfs node to control sync decompression strategyHuang Jianan2021-12-084-7/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | Although readpage is a synchronous path, there will be no additional kworker scheduling overhead in non-atomic contexts together with dm-verity. Let's add a sysfs node to disable sync decompression as an option. Link: https://lore.kernel.org/r/20211206143552.8384-1-huangjianan@oppo.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Huang Jianan <huangjianan@oppo.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: add sysfs interfaceHuang Jianan2021-12-084-1/+264
| | | | | | | | | | | | | | | | | | | | Add sysfs interface to configure erofs related parameters later. Link: https://lore.kernel.org/r/20211201145436.4357-1-huangjianan@oppo.com Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Huang Jianan <huangjianan@oppo.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: rename lz4_0pading to zero_paddingHuang Jianan2021-12-013-5/+5
| | | | | | | | | | | | | | | | | | | | Renaming lz4_0padding to zero_padding globally since LZMA and later algorithms also need that. Link: https://lore.kernel.org/r/20211112160935.19394-1-jnhuang95@gmail.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Huang Jianan <huangjianan@oppo.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: fix deadlock when shrink erofs slabHuang Jianan2021-11-231-2/+6
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We observed the following deadlock in the stress test under low memory scenario: Thread A Thread B - erofs_shrink_scan - erofs_try_to_release_workgroup - erofs_workgroup_try_to_freeze -- A - z_erofs_do_read_page - z_erofs_collection_begin - z_erofs_register_collection - erofs_insert_workgroup - xa_lock(&sbi->managed_pslots) -- B - erofs_workgroup_get - erofs_wait_on_workgroup_freezed -- A - xa_erase - xa_lock(&sbi->managed_pslots) -- B To fix this, it needs to hold xa_lock before freezing the workgroup since xarray will be touched then. So let's hold the lock before accessing each workgroup, just like what we did with the radix tree before. [ Gao Xiang: Jianhua Hao also reports this issue at https://lore.kernel.org/r/b10b85df30694bac8aadfe43537c897a@xiaomi.com ] Link: https://lore.kernel.org/r/20211118135844.3559-1-huangjianan@oppo.com Fixes: 64094a04414f ("erofs: convert workstn to XArray") Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Huang Jianan <huangjianan@oppo.com> Reported-by: Jianhua Hao <haojianhua1@xiaomi.com> Signed-off-by: Gao Xiang <xiang@kernel.org>
* Merge tag 'erofs-for-5.16-rc1-fixes' of ↵Linus Torvalds2021-11-133-30/+17
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - fix unsafe pagevec reuse which could cause unexpected behaviors - get rid of the unused DELAYEDALLOC strategy that has been replaced by TRYALLOC * tag 'erofs-for-5.16-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: remove useless cache strategy of DELAYEDALLOC erofs: fix unsafe pagevec reuse of hooked pclusters
| * erofs: remove useless cache strategy of DELAYEDALLOCYue Hu2021-11-082-21/+0
| | | | | | | | | | | | | | | | | | | | | | | | After commit 1825c8d7ce93 ("erofs: force inplace I/O under low memory scenario") and TRYALLOC is widely used, DELAYEDALLOC won't be used anymore. Remove related dead code. Also, remove the blank line at the end of zdata.h. Link: https://lore.kernel.org/r/20211106082315.25781-1-huyue2@yulong.com Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <xiang@kernel.org>
| * erofs: fix unsafe pagevec reuse of hooked pclustersGao Xiang2021-11-082-9/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are pclusters in runtime marked with Z_EROFS_PCLUSTER_TAIL before actual I/O submission. Thus, the decompression chain can be extended if the following pcluster chain hooks such tail pcluster. As the related comment mentioned, if some page is made of a hooked pcluster and another followed pcluster, it can be reused for in-place I/O (since I/O should be submitted anyway): _______________________________________________________________ | tail (partial) page | head (partial) page | |_____PRIMARY_HOOKED___|____________PRIMARY_FOLLOWED____________| However, it's by no means safe to reuse as pagevec since if such PRIMARY_HOOKED pclusters finally move into bypass chain without I/O submission. It's somewhat hard to reproduce with LZ4 and I just found it (general protection fault) by ro_fsstressing a LZMA image for long time. I'm going to actively clean up related code together with multi-page folio adaption in the next few months. Let's address it directly for easier backporting for now. Call trace for reference: z_erofs_decompress_pcluster+0x10a/0x8a0 [erofs] z_erofs_decompress_queue.isra.36+0x3c/0x60 [erofs] z_erofs_runqueue+0x5f3/0x840 [erofs] z_erofs_readahead+0x1e8/0x320 [erofs] read_pages+0x91/0x270 page_cache_ra_unbounded+0x18b/0x240 filemap_get_pages+0x10a/0x5f0 filemap_read+0xa9/0x330 new_sync_read+0x11b/0x1a0 vfs_read+0xf1/0x190 Link: https://lore.kernel.org/r/20211103182006.4040-1-xiang@kernel.org Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Cc: <stable@vger.kernel.org> # 4.19+ Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | Merge tag 'gfs2-v5.15-rc5-mmap-fault' of ↵Linus Torvalds2021-11-021-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 mmap + page fault deadlocks fixes from Andreas Gruenbacher: "Functions gfs2_file_read_iter and gfs2_file_write_iter are both accessing the user buffer to write to or read from while holding the inode glock. In the most basic deadlock scenario, that buffer will not be resident and it will be mapped to the same file. Accessing the buffer will trigger a page fault, and gfs2 will deadlock trying to take the same inode glock again while trying to handle that fault. Fix that and similar, more complex scenarios by disabling page faults while accessing user buffers. To make this work, introduce a small amount of new infrastructure and fix some bugs that didn't trigger so far, with page faults enabled" * tag 'gfs2-v5.15-rc5-mmap-fault' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix mmap + page fault deadlocks for direct I/O iov_iter: Introduce nofault flag to disable page faults gup: Introduce FOLL_NOFAULT flag to disable page faults iomap: Add done_before argument to iomap_dio_rw iomap: Support partial direct I/O on user copy failures iomap: Fix iomap_dio_rw return value for user copies gfs2: Fix mmap + page fault deadlocks for buffered I/O gfs2: Eliminate ip->i_gh gfs2: Move the inode glock locking to gfs2_file_buffered_write gfs2: Introduce flag for glock holder auto-demotion gfs2: Clean up function may_grant gfs2: Add wrapper for iomap_file_buffered_write iov_iter: Introduce fault_in_iov_iter_writeable iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} powerpc/kvm: Fix kvm_use_magic_page iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value
| * | iomap: Add done_before argument to iomap_dio_rwAndreas Gruenbacher2021-10-241-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a done_before argument to iomap_dio_rw that indicates how much of the request has already been transferred. When the request succeeds, we report that done_before additional bytes were tranferred. This is useful for finishing a request asynchronously when part of the request has already been completed synchronously. We'll use that to allow iomap_dio_rw to be used with page faults disabled: when a page fault occurs while submitting a request, we synchronously complete the part of the request that has already been submitted. The caller can then take care of the page fault and call iomap_dio_rw again for the rest of the request, passing in the number of bytes already tranferred. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
* | erofs: don't trigger WARN() when decompression failsGao Xiang2021-10-311-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reported a WARNING [1] due to corrupted compressed data. As Dmitry said, "If this is not a kernel bug, then the code should not use WARN. WARN if for kernel bugs and is recognized as such by all testing systems and humans." [1] https://lore.kernel.org/r/000000000000b3586105cf0ff45e@google.com Link: https://lore.kernel.org/r/20211025074311.130395-1-hsiangkao@linux.alibaba.com Cc: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Reported-by: syzbot+d8aaffc3719597e8cfb4@syzkaller.appspotmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: get rid of ->lru usageGao Xiang2021-10-257-44/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, ->lru is a way to arrange non-LRU pages and has some in-kernel users. In order to minimize noticable issues of page reclaim and cache thrashing under high memory presure, limited temporary pages were all chained with ->lru and can be reused during the request. However, it seems that ->lru could be removed when folio is landing. Let's use page->private to chain temporary pages for now instead and transform EROFS formally after the topic of the folio / file page design is finalized. Link: https://lore.kernel.org/r/20211022090120.14675-1-hsiangkao@linux.alibaba.com Cc: Matthew Wilcox <willy@infradead.org> Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
* | erofs: lzma compression supportGao Xiang2021-10-1911-21/+383
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add MicroLZMA support in order to maximize compression ratios for specific scenarios. For example, it's useful for low-end embedded boards and as a secondary algorithm in a file for specific access patterns. MicroLZMA is a new container format for raw LZMA1, which was created by Lasse Collin aiming to minimize old LZMA headers and get rid of unnecessary EOPM (end of payload marker) as well as to enable fixed-sized output compression, especially for 4KiB pclusters. Similar to LZ4, inplace I/O approach is used to minimize runtime memory footprint when dealing with I/O. Overlapped decompression is handled with 1) bounced buffer for data under processing or 2) extra short-lived pages from the on-stack pagepool which will be shared in the same read request (128KiB for example). Link: https://lore.kernel.org/r/20211010213145.17462-8-xiang@kernel.org Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>