path: root/fs/ocfs2/journal.c
Commit message | Author | Age | Files | Lines
* ocfs2: annotate struct ocfs2_replay_map with __counted_by | Kees Cook | 2023-10-18 | 1 | -1/+1
  Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions).

  As found with Coccinelle[1], add __counted_by for struct ocfs2_replay_map.

  [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

  Link: https://lkml.kernel.org/r/20230922174925.work.293-kees@kernel.org
  Signed-off-by: Kees Cook <keescook@chromium.org>
  Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
  Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  Cc: Mark Fasheh <mark@fasheh.com>
  Cc: Joel Becker <jlbec@evilplan.org>
  Cc: Nathan Chancellor <nathan@kernel.org>
  Cc: Nick Desaulniers <ndesaulniers@google.com>
  Cc: Tom Rix <trix@redhat.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
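A minimal userspace sketch of the annotation, assuming a compiler that may or may not understand the attribute: the fallback macro expands to nothing where `counted_by` is unsupported. The struct `replay_map_demo` and the helper `replay_map_demo_alloc()` are hypothetical and only mirror the general shape of a counter followed by the flexible array it bounds; they are not the ocfs2 definitions. Note that the counter is set before the array is indexed, since run-time bounds checks read it.

```c
#include <assert.h>
#include <stdlib.h>

/* Fallback so the sketch compiles where __has_attribute is missing. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Expand to the attribute only when the compiler understands it. */
#ifndef __counted_by
# if __has_attribute(__counted_by__)
#  define __counted_by(member) __attribute__((__counted_by__(member)))
# else
#  define __counted_by(member)
# endif
#endif

/* Hypothetical struct: a count followed by a flexible array it bounds. */
struct replay_map_demo {
	unsigned int slots;
	unsigned char replay_slots[] __counted_by(slots);
};

/* Hypothetical helper: allocate n slots and record n before indexing. */
static struct replay_map_demo *replay_map_demo_alloc(unsigned int n)
{
	struct replay_map_demo *map;

	map = calloc(1, sizeof(*map) + n * sizeof(map->replay_slots[0]));
	if (!map)
		return NULL;
	map->slots = n; /* set the counter first so bounds checks see it */
	for (unsigned int i = 0; i < n; i++)
		map->replay_slots[i] = 1;
	return map;
}
```

With the attribute active, an access such as `map->replay_slots[map->slots]` becomes detectable by UBSAN bounds checking instead of silently reading past the allocation.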
* Merge tag 'ext4_for_linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds | 2023-08-31 | 1 | -4/+4
  Pull ext4 updates from Ted Ts'o:
  "Many ext4 and jbd2 cleanups and bug fixes:
   - Cleanups in the ext4 remount code when going to and from read-only
   - Cleanups in ext4's multiblock allocator
   - Cleanups in the jbd2 setup/mounting code paths
   - Performance improvements when appending to a delayed allocation file
   - Miscellaneous syzbot and other bug fixes"

  * tag 'ext4_for_linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (60 commits)
    ext4: fix slab-use-after-free in ext4_es_insert_extent()
    libfs: remove redundant checks of s_encoding
    ext4: remove redundant checks of s_encoding
    ext4: reject casefold inode flag without casefold feature
    ext4: use LIST_HEAD() to initialize the list_head in mballoc.c
    ext4: do not mark inode dirty every time when appending using delalloc
    ext4: rename s_error_work to s_sb_upd_work
    ext4: add periodic superblock update check
    ext4: drop dio overwrite only flag and associated warning
    ext4: add correct group descriptors and reserved GDT blocks to system zone
    ext4: remove unused function declaration
    ext4: mballoc: avoid garbage value from err
    ext4: use sbi instead of EXT4_SB(sb) in ext4_mb_new_blocks_simple()
    ext4: change the type of blocksize in ext4_mb_init_cache()
    ext4: fix unttached inode after power cut with orphan file feature enabled
    jbd2: correct the end of the journal recovery scan range
    ext4: ext4_get_{dev}_journal return proper error value
    ext4: cleanup ext4_get_dev_journal() and ext4_get_journal()
    jbd2: jbd2_journal_init_{dev,inode} return proper error return value
    jbd2: drop useless error tag in jbd2_journal_wipe()
    ...
| * jbd2: jbd2_journal_init_{dev,inode} return proper error return value | Zhang Yi | 2023-08-23 | 1 | -4/+4
  Currently, jbd2_journal_init_{dev,inode} return NULL if an error occurs; make them pass out a proper error return value.

  [ Fix from Yang Yingliang folded in. ]

  Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
  Reviewed-by: Jan Kara <jack@suse.cz>
  Link: https://lore.kernel.org/r/20230811063610.2980059-11-yi.zhang@huaweicloud.com
  Link: https://lore.kernel.org/r/20230822030018.644419-1-yangyingliang@huawei.com
  Signed-off-by: Theodore Ts'o <tytso@mit.edu>
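Returning a "proper error return value" from a pointer-returning function normally means the kernel's error-pointer idiom: a small negative errno is encoded in the pointer itself instead of collapsing every failure to NULL. A minimal userspace sketch follows. The `demo_err_ptr()`/`demo_is_err()`/`demo_ptr_err()` helpers are local reimplementations modeled on the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() so the block compiles on its own, and `journal_init_demo()` is a hypothetical stand-in, not the jbd2 function.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno as a pointer in the top page of the address space. */
static inline void *demo_err_ptr(long error)
{
	return (void *)error;
}

/* Recover the errno from an error pointer. */
static inline long demo_ptr_err(const void *ptr)
{
	return (long)ptr;
}

/* True if ptr lies in the range reserved for encoded errnos. */
static inline int demo_is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-DEMO_MAX_ERRNO;
}

struct journal_demo {
	int blocksize;
};

/* Hypothetical init: tell the caller *why* it failed instead of returning NULL. */
static struct journal_demo *journal_init_demo(int blocksize)
{
	struct journal_demo *j;

	if (blocksize <= 0)
		return demo_err_ptr(-EINVAL);
	j = malloc(sizeof(*j));
	if (!j)
		return demo_err_ptr(-ENOMEM);
	j->blocksize = blocksize;
	return j;
}
```

A caller then checks `if (IS_ERR(journal)) return PTR_ERR(journal);` and can propagate -EINVAL or -ENOMEM as appropriate, rather than guessing one errno for every NULL.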
* | Merge tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm | Linus Torvalds | 2023-08-29 | 1 | -7/+4
  Pull non-MM updates from Andrew Morton:
   - An extensive rework of kexec and crash Kconfig from Eric DeVolder ("refactor Kconfig to consolidate KEXEC and CRASH options")
   - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a couple of macros to args.h")
   - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper commands")
   - vsprintf inclusion rationalization from Andy Shevchenko ("lib/vsprintf: Rework header inclusions")
   - Switch the handling of kdump from a udev scheme to in-kernel handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory hot un/plug")
   - Many singleton patches to various parts of the tree

  * tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits)
    document while_each_thread(), change first_tid() to use for_each_thread()
    drivers/char/mem.c: shrink character device's devlist[] array
    x86/crash: optimize CPU changes
    crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
    crash: hotplug support for kexec_load()
    x86/crash: add x86 crash hotplug support
    crash: memory and CPU hotplug sysfs attributes
    kexec: exclude elfcorehdr from the segment digest
    crash: add generic infrastructure for crash hotplug support
    crash: move a few code bits to setup support of crash hotplug
    kstrtox: consistently use _tolower()
    kill do_each_thread()
    nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse
    scripts/bloat-o-meter: count weak symbol sizes
    treewide: drop CONFIG_EMBEDDED
    lockdep: fix static memory detection even more
    lib/vsprintf: declare no_hash_pointers in sprintf.h
    lib/vsprintf: split out sprintf() and friends
    kernel/fork: stop playing lockless games for exe_file replacement
    adfs: delete unused "union adfs_dirtail" definition
    ...
| * | ocfs2: Use struct_size() | Christophe JAILLET | 2023-08-18 | 1 | -3/+3
  Use struct_size() instead of hand-writing it when allocating a structure with a flex array.

  This is less verbose.

  Link: https://lkml.kernel.org/r/9d99ea2090739f816d0dc0c4ebaa42b26fc48a9e.1689533270.git.christophe.jaillet@wanadoo.fr
  Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
  Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  Cc: Mark Fasheh <mark@fasheh.com>
  Cc: Joel Becker <jlbec@evilplan.org>
  Cc: Junxiao Bi <junxiao.bi@oracle.com>
  Cc: Changwei Ge <gechangwei@live.cn>
  Cc: Gang He <ghe@suse.com>
  Cc: Jun Piao <piaojun@huawei.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
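Besides being shorter, struct_size(ptr, member, count) is overflow-safe: it saturates to SIZE_MAX when the size computation overflows, so the following allocation fails instead of returning a short buffer. A userspace sketch of that behavior, using the GCC/Clang `__builtin_mul_overflow()` and `__builtin_add_overflow()` builtins; `demo_struct_size()` and `struct map_demo` are hypothetical stand-ins, not the macro from include/linux/overflow.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct with a flexible array member. */
struct map_demo {
	unsigned int used;
	unsigned int entries[];
};

/* header + count * elem, saturating to SIZE_MAX if either step overflows. */
static size_t demo_struct_size(size_t header, size_t elem, size_t count)
{
	size_t bytes, total;

	if (__builtin_mul_overflow(elem, count, &bytes))
		return SIZE_MAX;
	if (__builtin_add_overflow(header, bytes, &total))
		return SIZE_MAX;
	return total;
}
```

The hand-written form `sizeof(*rm) + n * sizeof(rm->entries[0])` can wrap around silently for a huge `n`; the saturating version turns that into an allocation failure.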
| * | ocfs2: use flexible array in 'struct ocfs2_recovery_map' | Christophe JAILLET | 2023-08-18 | 1 | -4/+1
  Turn 'rm_entries' in 'struct ocfs2_recovery_map' into a flexible array.

  The advantages are:
   - save the size of a pointer when the new structure is allocated
   - avoid some always ugly pointer arithmetic to get the address of 'rm_entries'
   - avoid an indirection when the array is accessed

  While at it, use struct_size() to compute the size of the new structure.

  Link: https://lkml.kernel.org/r/c645911ffd2720fce5e344c17de642518cd0db52.1689533270.git.christophe.jaillet@wanadoo.fr
  Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
  Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  Cc: Mark Fasheh <mark@fasheh.com>
  Cc: Joel Becker <jlbec@evilplan.org>
  Cc: Junxiao Bi <junxiao.bi@oracle.com>
  Cc: Changwei Ge <gechangwei@live.cn>
  Cc: Gang He <ghe@suse.com>
  Cc: Jun Piao <piaojun@huawei.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
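The before/after layouts can be compared side by side. A minimal sketch, assuming the old layout kept `rm_entries` as a pointer aimed at the tail of the same allocation (as the "pointer arithmetic" remark suggests); the struct and function names are hypothetical, and only the field names follow the message.

```c
#include <assert.h>
#include <stdlib.h>

/* Old shape: a pointer that must be aimed at the tail of the allocation. */
struct recovery_map_old {
	unsigned int rm_used;
	unsigned int *rm_entries;
};

/* New shape: the entries live directly after the header. */
struct recovery_map_new {
	unsigned int rm_used;
	unsigned int rm_entries[];
};

static struct recovery_map_old *alloc_old(unsigned int slots)
{
	struct recovery_map_old *rm;

	rm = calloc(1, sizeof(*rm) + slots * sizeof(unsigned int));
	if (!rm)
		return NULL;
	/* Pointer arithmetic to find the array just past the header. */
	rm->rm_entries = (unsigned int *)((char *)rm + sizeof(*rm));
	return rm;
}

static struct recovery_map_new *alloc_new(unsigned int slots)
{
	/* No fix-up needed: rm_entries already follows rm_used. */
	return calloc(1, sizeof(struct recovery_map_new) +
			 slots * sizeof(unsigned int));
}
```

Each access through the old layout first loads `rm_entries` and then indexes it; the flexible array removes that load, the stored pointer, and the fix-up line.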
* / ocfs2: stop using bdev->bd_super for journal error logging | Christoph Hellwig | 2023-08-09 | 1 | -3/+3
  All ocfs2 journal error handling and logging is based on buffer_heads, and the owning inode and thus super_block can be retrieved through bh->b_assoc_map->host. Switch to using that to remove the last users of bdev->bd_super.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  Message-Id: <20230807112625.652089-4-hch@lst.de>
  Signed-off-by: Christian Brauner <brauner@kernel.org>
* ocfs2: use filemap_fdatawrite_wbc instead of generic_writepages | Christoph Hellwig | 2023-01-18 | 1 | -1/+1
  filemap_fdatawrite_wbc is a fairly thin wrapper around do_writepages, and the big difference there is support for cgroup writeback, which is not supported by ocfs2, and the potential to use ->writepages instead of ->writepage, which ocfs2 does not currently implement but eventually should.

  Link: https://lkml.kernel.org/r/20221229161031.391878-6-hch@lst.de
  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Reviewed-by: Jan Kara <jack@suse.cz>
  Cc: Joel Becker <jlbec@evilplan.org>
  Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
  Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
  Cc: Mark Fasheh <mark@fasheh.com>
  Cc: Matthew Wilcox <willy@infradead.org>
  Cc: Theodore Ts'o <tytso@mit.edu>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* jbd2,ocfs2: move jbd2_journal_submit_inode_data_buffers to ocfs2 | Christoph Hellwig | 2023-01-18 | 1 | -1/+15
  jbd2_journal_submit_inode_data_buffers is only used by ocfs2, so move it there to prepare for removing generic_writepages.

  Link: https://lkml.kernel.org/r/20221229161031.391878-5-hch@lst.de
  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Cc: Jan Kara <jack@suse.com>
  Cc: Joel Becker <jlbec@evilplan.org>
  Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
  Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
  Cc: Mark Fasheh <mark@fasheh.com>
  Cc: Matthew Wilcox <willy@infradead.org>
  Cc: Theodore Ts'o <tytso@mit.edu>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* ocfs2: fix memory leak in ocfs2_mount_volume() | Li Zetao | 2022-11-30 | 1 | -1/+1
  There is a memory leak reported by kmemleak:

    unreferenced object 0xffff88810cc65e60 (size 32):
      comm "mount.ocfs2", pid 23753, jiffies 4302528942 (age 34735.105s)
      hex dump (first 32 bytes):
        10 00 00 00 00 00 00 00 00 01 01 01 01 01 01 01  ................
        01 01 01 01 01 01 01 01 00 00 00 00 00 00 00 00  ................
      backtrace:
        [<ffffffff8170f73d>] __kmalloc+0x4d/0x150
        [<ffffffffa0ac3f51>] ocfs2_compute_replay_slots+0x121/0x330 [ocfs2]
        [<ffffffffa0b65165>] ocfs2_check_volume+0x485/0x900 [ocfs2]
        [<ffffffffa0b68129>] ocfs2_mount_volume.isra.0+0x1e9/0x650 [ocfs2]
        [<ffffffffa0b7160b>] ocfs2_fill_super+0xe0b/0x1740 [ocfs2]
        [<ffffffff818e1fe2>] mount_bdev+0x312/0x400
        [<ffffffff819a086d>] legacy_get_tree+0xed/0x1d0
        [<ffffffff818de82d>] vfs_get_tree+0x7d/0x230
        [<ffffffff81957f92>] path_mount+0xd62/0x1760
        [<ffffffff81958a5a>] do_mount+0xca/0xe0
        [<ffffffff81958d3c>] __x64_sys_mount+0x12c/0x1a0
        [<ffffffff82f26f15>] do_syscall_64+0x35/0x80
        [<ffffffff8300006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

  This call stack is related to two problems. Firstly, the ocfs2 super uses "replay_map" to track online/offline slots, in order to recover offline slots during recovery and mount. But when ocfs2_truncate_log_init() returns an error in ocfs2_mount_volume(), the memory of "replay_map" is not freed in the error handling path. Secondly, the memory of "replay_map" is not freed if d_make_root() returns an error in ocfs2_fill_super(). In the normal case, "replay_map" is freed in ocfs2_complete_mount_recovery() once recovery and mount complete.

  Fix the first problem by adding an error handling path that frees "replay_map" when ocfs2_truncate_log_init() fails, and fix the second problem by calling ocfs2_free_replay_slots(osb) in the "out_dismount" error handling path.

  In addition, since ocfs2_free_replay_slots() is static, remove its static attribute and declare it in a header file.

  Link: https://lkml.kernel.org/r/20221109074627.2303950-1-lizetao1@huawei.com
  Fixes: 9140db04ef18 ("ocfs2: recover orphans in offline slots during recovery and mount")
  Signed-off-by: Li Zetao <lizetao1@huawei.com>
  Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  Cc: Mark Fasheh <mark@fasheh.com>
  Cc: Joel Becker <jlbec@evilplan.org>
  Cc: Junxiao Bi <junxiao.bi@oracle.com>
  Cc: Changwei Ge <gechangwei@live.cn>
  Cc: Gang He <ghe@suse.com>
  Cc: Jun Piao <piaojun@huawei.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
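The fix follows the usual kernel error-unwinding shape: every failure after an allocation jumps to a label that releases it. A minimal userspace sketch of that pattern; `mount_volume_demo()`, `later_step_demo()` and the replay-slot helpers are hypothetical stand-ins for ocfs2_mount_volume(), ocfs2_truncate_log_init() and the replay map, and the allocation counter exists only so the leak can be tested.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_allocs; /* outstanding allocations, for the demo only */

struct osb_demo {
	unsigned char *replay_map;
};

static int compute_replay_slots_demo(struct osb_demo *osb)
{
	osb->replay_map = malloc(32);
	if (!osb->replay_map)
		return -ENOMEM;
	live_allocs++;
	return 0;
}

/* Safe to call twice: clears the pointer after freeing. */
static void free_replay_slots_demo(struct osb_demo *osb)
{
	if (osb->replay_map) {
		free(osb->replay_map);
		osb->replay_map = NULL;
		live_allocs--;
	}
}

/* Hypothetical later step that can fail, like truncate-log init. */
static int later_step_demo(int fail)
{
	return fail ? -EIO : 0;
}

static int mount_volume_demo(struct osb_demo *osb, int fail_later)
{
	int status;

	status = compute_replay_slots_demo(osb);
	if (status)
		return status;

	status = later_step_demo(fail_later);
	if (status)
		goto out_free_replay; /* the path that used to leak */

	return 0;

out_free_replay:
	free_replay_slots_demo(osb);
	return status;
}
```

Making the free helper clear its pointer is a design choice of this sketch: it lets an outer cleanup path (like "out_dismount") call it again without a double free.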
* Change calling conventions for filldir_t | Al Viro | 2022-08-17 | 1 | -7/+7
  filldir_t instances (directory iterator callbacks) used to return 0 for "OK, keep going" or -E... for "stop". Note that it's *NOT* how the error values are reported - the rules for those are callback-dependent and ->iterate{,_shared}() instances only care about zero vs. non-zero (look at emit_dir() and friends).

  So let's just return bool ("should we keep going?") - it's less confusing that way. The choice between "true means keep going" and "true means stop" is bikesheddable; we have two groups of callbacks:
   - do something for everything in the directory, until we run into a problem
   - find an entry in the directory and do something to it.

  The former tended to use 0/-E... conventions - -E<something> on failure. The latter tended to use 0/1, 1 being "stop, we are done". The callers treated anything non-zero as "stop", ignoring which non-zero value they got.

  "true means stop" would be more natural for the second group; "true means keep going" - for the first one. I tried both variants and things like

      if allocation failed
          something = -ENOMEM;
          return true;

  just looked unnatural and asking for trouble.

  [folded suggestion from Matthew Wilcox <willy@infradead.org>]
  Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
  Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
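A minimal sketch of the "true means keep going" convention with hypothetical names: the actor returns a bool that only says whether to continue, and any real error travels out-of-band in the caller's context. This mirrors the shape of a filldir_t-style callback but is not the kernel's struct dir_context or its signature.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical iteration context. */
struct ctx_demo {
	int count; /* entries accepted so far */
	int limit; /* stop after this many */
	int error; /* error reported out-of-band, not via the return value */
};

/* Actor: true means "keep going", false means "stop". */
static bool actor_demo(struct ctx_demo *ctx, const char *name)
{
	if (strlen(name) == 0) {
		ctx->error = -EINVAL; /* record why we stopped */
		return false;
	}
	ctx->count++;
	return ctx->count < ctx->limit;
}

/* Iterator: only cares whether to continue, never which value came back. */
static void iterate_demo(struct ctx_demo *ctx, const char **names, int n)
{
	for (int i = 0; i < n; i++)
		if (!actor_demo(ctx, names[i]))
			break;
}
```

Both groups from the message fit this shape: "do something for everything" records its failure in `error`, while "find an entry" simply returns false once it is done.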
* ocfs2: fix mounting crash if journal is not alloced | Heming Zhao via Ocfs2-devel | 2022-04-29 | 1 | -10/+23
  Patch series "rewrite error handling during mounting stage".

  This patch (of 5):

  After commit da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown"), the journal is initialized later than before, which causes a NULL pointer access in the free routine.

  Crash flow:

    ocfs2_fill_super
     + ocfs2_mount_volume
     |  + ocfs2_dlm_init //fail & return, osb->journal is NULL.
     |  + ...
     |  + ocfs2_check_volume //no chance to init osb->journal
     |  + ...
     + ocfs2_dismount_volume
        ocfs2_release_system_inodes
         ...
          evict
           ...
            ocfs2_clear_inode
             ocfs2_checkpoint_inode
              ocfs2_ci_fully_checkpointed
               time_after(journal->j_trans_id, ci->ci_last_trans)
                + journal is empty, crash!

  There are three possible fixes:

  1> Partly revert commit da5e7c87827e8.
     This avoids the kernel crash. The only concern is whether any non-system inode is accessed before dlm init, and the answer is no: all journal replay/recovery handling happens after dlm and journal init are done. So this method is not graceful, but it is workable.

  2> Add an osb->journal check in the inode free routine (e.g. ocfs2_clear_inode).
     The fix is specific to the mounting phase, but the check keeps running after mounting is done; in other words, this method adds useless code to the normal inode free flow.

  3> Free inodes directly in the mounting phase.
     This method is brutal/complex and may introduce unsafe code, which the maintainer currently does not like.

  In the end, we chose method <1> and did a partial revert: the journal init code is reverted, and the cleanup code flow is kept.

  Link: https://lkml.kernel.org/r/20220424130952.2436-1-heming.zhao@suse.com
  Link: https://lkml.kernel.org/r/20220424130952.2436-2-heming.zhao@suse.com
  Fixes: da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown")
  Signed-off-by: Heming Zhao <heming.zhao@suse.com>
  Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  Cc: Mark Fasheh <mark@fasheh.com>
  Cc: Joel Becker <jlbec@evilplan.org>
  Cc: Junxiao Bi <junxiao.bi@oracle.com>
  Cc: Changwei Ge <gechangwei@live.cn>
  Cc: Gang He <ghe@suse.com>
  Cc: Jun Piao <piaojun@huawei.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* ocfs2: use BUG_ON instead of if condition followed by BUG. | Zhang Mingyu | 2022-01-15 | 1 | -4/+2
  This issue was detected with the help of Coccinelle.

  Link: https://lkml.kernel.org/r/20211105014424.75372-1-zhang.mingyu@zte.com.cn
  Signed-off-by: Zhang Mingyu <zhang.mingyu@zte.com.cn>
  Reported-by: Zeal Robot <zealci@zte.com.cn>
  Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  Cc: Mark Fasheh <mark@fasheh.com>
  Cc: Joel Becker <jlbec@evilplan.org>
  Cc: Junxiao Bi <junxiao.bi@oracle.com>
  Cc: Changwei Ge <gechangwei@live.cn>
  Cc: Gang He <ghe@suse.com>
  Cc: Jun Piao <piaojun@huawei.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace | Linus Torvalds | 2021-11-10 | 1 | -4/+1
  Pull exit cleanups from Eric Biederman:
  "While looking at some issues related to the exit path in the kernel I found several instances where the code is not using the existing abstractions properly.

  This set of changes introduces force_fatal_sig, a way of sending a signal and not allowing it to be caught, and corrects the misuse of the existing abstractions that I found.

  A lot of the misuse of the existing abstractions are silly things such as doing something after calling a no return function, rolling BUG by hand, doing more work than necessary to terminate a kernel thread, or calling do_exit(SIGKILL) instead of calling force_sig(SIGKILL).

  In the review a deficiency in force_fatal_sig and force_sig_seccomp was found, where ptrace or sigaction could prevent the delivery of the signal. I have added a change that adds SA_IMMUTABLE, which makes it impossible to interrupt the delivery of those signals and allows backporting to fix force_sig_seccomp.

  And Arnd found an issue where a function passed to kthread_run had the wrong prototype, and after my cleanup was failing to build."

  * 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits)
    soc: ti: fix wkup_m3_rproc_boot_thread return type
    signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed
    signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
    exit/r8188eu: Replace the macro thread_exit with a simple return 0
    exit/rtl8712: Replace the macro thread_exit with a simple return 0
    exit/rtl8723bs: Replace the macro thread_exit with a simple return 0
    signal/x86: In emulate_vsyscall force a signal instead of calling do_exit
    signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig
    signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails
    exit/syscall_user_dispatch: Send ordinary signals on failure
    signal: Implement force_fatal_sig
    exit/kthread: Have kernel threads return instead of calling do_exit
    signal/s390: Use force_sigsegv in default_trap_handler
    signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
    signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
    signal/sparc: In setup_tsb_params convert open coded BUG into BUG
    signal/powerpc: On swapcontext failure force SIGSEGV
    signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
    signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
    signal/sparc32: Remove unreachable do_exit in do_sparc_fault
    ...
| * exit/kthread: Have kernel threads return instead of calling do_exit | Eric W. Biederman | 2021-10-29 | 1 | -4/+1
  In 2009 Oleg reworked[1] the kernel threads so that it is not necessary to call do_exit if you are not using kthread_stop(). Remove the explicit calls of do_exit and complete_and_exit (with a NULL completion) that were previously necessary.

  [1] 63706172f332 ("kthreads: rework kthread_stop()")
  Link: https://lkml.kernel.org/r/20211020174406.17889-12-ebiederm@xmission.com
  Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* | ocfs2: cleanup journal init and shutdown | Valentin Vidic | 2021-11-06 | 1 | -5/+21
  Allocate and free struct ocfs2_journal in ocfs2_journal_init and ocfs2_journal_shutdown. Init and release of system inodes reference the journal, so reorder the calls to make sure they work correctly.

  Link: https://lkml.kernel.org/r/20211009145006.3478-1-vvidic@valentin-vidic.from.hr
  Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
  Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  Cc: Mark Fasheh <mark@fasheh.com>
  Cc: Joel Becker <jlbec@evilplan.org>
  Cc: Junxiao Bi <junxiao.bi@oracle.com>
  Cc: Changwei Ge <gechangwei@live.cn>
  Cc: Gang He <ghe@suse.com>
  Cc: Jun Piao <piaojun@huawei.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
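A minimal sketch of the ownership rule described here, with hypothetical names: the init function allocates the journal and publishes it in the per-superblock struct only once it is set up, and the shutdown function frees it and clears the pointer. Whether the real ocfs2_journal_shutdown() tolerates a NULL journal is not stated in the message; the early return below is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical journal and superblock-info types. */
struct journal_demo2 {
	unsigned long trans_id;
};

struct osb_demo2 {
	struct journal_demo2 *journal;
};

/* Init owns the allocation and publishes it when ready. */
static int journal_init_demo2(struct osb_demo2 *osb)
{
	struct journal_demo2 *j = calloc(1, sizeof(*j));

	if (!j)
		return -ENOMEM;
	osb->journal = j;
	return 0;
}

/* Shutdown owns the free; assumption: tolerate init never having run. */
static void journal_shutdown_demo2(struct osb_demo2 *osb)
{
	if (!osb->journal)
		return;
	free(osb->journal);
	osb->journal = NULL;
}
```

The flip side of this rule is that anything which dereferences `osb->journal` during teardown now depends on init having run, which is exactly the ordering the message says had to be adjusted.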
* ext4: add discard/zeroout flags to journal flush | Leah Rumancik | 2021-06-22 | 1 | -4/+4
  Add a flags argument to jbd2_journal_flush to enable discarding or zero-filling the journal blocks while flushing the journal.

  Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
  Link: https://lore.kernel.org/r/20210518151327.130198-1-leah.rumancik@gmail.com
  Signed-off-by: Theodore Ts'o <tytso@mit.edu>
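A flags word like this is typically validated and then dispatched to one post-flush action. A minimal sketch under stated assumptions: the `DEMO_FLUSH_*` names and values are hypothetical (not the jbd2 definitions), and treating discard and zero-out as mutually exclusive is a policy assumed here for illustration. A caller that wants the old behavior passes 0.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag values, not the jbd2 definitions. */
#define DEMO_FLUSH_DISCARD 0x1u
#define DEMO_FLUSH_ZEROOUT 0x2u
#define DEMO_FLUSH_VALID   (DEMO_FLUSH_DISCARD | DEMO_FLUSH_ZEROOUT)

enum erase_demo { ERASE_NONE, ERASE_DISCARD, ERASE_ZEROOUT };

/* Decide what to do with the journal blocks after a successful flush. */
static int flush_erase_mode_demo(unsigned int flags, enum erase_demo *mode)
{
	if (flags & ~DEMO_FLUSH_VALID)
		return -EINVAL; /* unknown bits */
	if ((flags & DEMO_FLUSH_DISCARD) && (flags & DEMO_FLUSH_ZEROOUT))
		return -EINVAL; /* assumption: the two modes are exclusive */
	if (flags & DEMO_FLUSH_DISCARD)
		*mode = ERASE_DISCARD;
	else if (flags & DEMO_FLUSH_ZEROOUT)
		*mode = ERASE_ZEROOUT;
	else
		*mode = ERASE_NONE; /* plain flush, the previous behavior */
	return 0;
}
```

Rejecting unknown bits up front keeps the door open for adding new flags later without old callers accidentally triggering them.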
* treewide: remove editor modelines and cruft | Masahiro Yamada | 2021-05-07 | 1 | -3/+1
  The section "19) Editor modelines and other cruft" in Documentation/process/coding-style.rst clearly says, "Do not include any of these in source files."

  I recently received a patch to explicitly add a new one.

  Let's do treewide cleanups, otherwise some people follow the existing code and attempt to upstream their favorite editor setups.

  It would be even nicer if scripts/checkpatch.pl could check it.

  If we want to impose coding style in an editor-independent manner, I think editorconfig (patch [1]) is a saner solution.

  [1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/

  Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org
  Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
  Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
  Reviewed-by: Miguel Ojeda <ojeda@kernel.org> [auxdisplay]
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* jbd2: rename j_maxlen to j_total_len and add jbd2_journal_max_txn_bufs | Harshad Shirwadkar | 2020-11-06 | 1 | -1/+1
  The on-disk superblock field sb->s_maxlen represents the total size of the journal including the fast commit area and is no longer the max number of blocks available for a transaction. The maximum number of blocks available to a transaction is reduced by the number of fast commit blocks. So, this patch renames j_maxlen to j_total_len to better represent its intent. Also, it adds a function to calculate the max number of bufs available for a transaction.

  Suggested-by: Jan Kara <jack@suse.cz>
  Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
  Link: https://lore.kernel.org/r/20201106035911.1942128-6-harshadshirwadkar@gmail.com
  Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2, ext4, ocfs2: introduce/use journal callbacks j_submit|finish_inode_data_buffers() | Mauricio Faria de Oliveira | 2020-10-18 | 1 | -0/+4
  Introduce journal callbacks to allow different behaviors for an inode in journal_submit|finish_inode_data_buffers().

  The existing users of the current behavior (ext4, ocfs2) are adapted to use the previously exported functions that implement the current behavior.

  Users are callers of jbd2_journal_inode_ranged_write|wait(), which adds the inode to the transaction's inode list with the JI_WRITE|WAIT_DATA flags. Only ext4 and ocfs2 are in-tree.

  Both CONFIG_EXT4_FS and CONFIG_OCFS2_FS select CONFIG_JBD2, which builds fs/jbd2/commit.c and journal.c that define and export the functions, so we can call them directly in ext4/ocfs2.

  Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
  Suggested-by: Jan Kara <jack@suse.cz>
  Reviewed-by: Jan Kara <jack@suse.cz>
  Reviewed-by: Andreas Dilger <adilger@dilger.ca>
  Link: https://lore.kernel.org/r/20201006004841.600488-3-mfo@canonical.com
  Signed-off-by: Theodore Ts'o <tytso@mit.edu>
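The mechanism is a pair of per-journal function pointers that the commit path calls for each inode, with each filesystem plugging in the previously exported default behavior. A minimal sketch of that dispatch pattern; all types and names here (`struct journal_cb_demo`, `default_submit_demo()`, `commit_inode_demo()`, ...) are hypothetical and do not reproduce jbd2's structures or signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inode with flags recording which phases ran. */
struct inode_demo {
	int id;
	int submitted;
	int waited;
};

/* Hypothetical journal holding the per-inode hooks. */
struct journal_cb_demo {
	int (*submit_inode_data)(struct inode_demo *inode);
	int (*finish_inode_data)(struct inode_demo *inode);
};

/* Stand-ins for the generic behavior a filesystem opts into. */
static int default_submit_demo(struct inode_demo *inode)
{
	inode->submitted = 1;
	return 0;
}

static int default_finish_demo(struct inode_demo *inode)
{
	inode->waited = 1;
	return 0;
}

/* Commit path: call the hooks only if the filesystem installed them. */
static int commit_inode_demo(struct journal_cb_demo *j, struct inode_demo *inode)
{
	int err = 0;

	if (j->submit_inode_data)
		err = j->submit_inode_data(inode);
	if (!err && j->finish_inode_data)
		err = j->finish_inode_data(inode);
	return err;
}
```

A filesystem keeping today's behavior installs the defaults; one that needs something different for some inodes can install its own hook without changing the commit code.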
* ocfs2: replace zero-length array with flexible-array member | Gustavo A. R. Silva | 2020-04-02 | 1 | -1/+1
  The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99:

      struct foo {
              int stuff;
              struct boo array[];
      };

  By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on.

  Also, notice that dynamic memory allocations won't be affected by this change:

  "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1]

  This issue was found with the help of Coccinelle.

  [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
  [2] https://github.com/KSPP/linux/issues/21
  [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

  Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  Cc: Mark Fasheh <mark@fasheh.com>
  Cc: Joel Becker <jlbec@evilplan.org>
  Cc: Junxiao Bi <junxiao.bi@oracle.com>
  Cc: Changwei Ge <gechangwei@live.cn>
  Cc: Gang He <ghe@suse.com>
  Cc: Jun Piao <piaojun@huawei.com>
  Link: http://lkml.kernel.org/r/20200213160244.GA6088@embeddedor
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: call journal flush to mark journal as empty after journal recovery when mount | Kai Li | 2020-01-04 | 1 | -0/+8
  If the journal is dirty at mount time, it will be replayed, but the jbd2 sb log tail cannot be updated to mark a new start because journal->j_flag has already been set with JBD2_ABORT first in journal_init_common. When a new transaction is committed, it will be recorded in block 1 first (journal->j_tail is set to 1 in journal_reset). If, unfortunately, an emergency restart happens again before the journal super block is updated, the newly recorded transaction will not be replayed in the next mount.

  The following steps describe this procedure in detail.
   1. mount and touch some files
   2. these transactions are committed to journal area but not checkpointed
   3. emergency restart
   4. mount again and its journals are replayed
   5. journal super block's first s_start is 1, but its s_seq is not updated
   6. touch a new file and its trans is committed but not checkpointed
   7. emergency restart again
   8. mount and journal is dirty, but trans committed in 6 will not be replayed.

  This exception happens easily when this LUN is used by only one node. If it is used by multiple nodes, another node will replay its journal, and its journal super block will be updated after recovery, as this patch does (ocfs2_recover_node->ocfs2_replay_journal).

  The following jbd2 journal can be generated by touching a new file after the journal is replayed; seq 15 is the first valid commit, but the first seq is 13 in the journal super block.

  logdump:
    Block 0: Journal Superblock
    Seq: 0   Type: 4 (JBD2_SUPERBLOCK_V2)
    Blocksize: 4096   Total Blocks: 32768   First Block: 1
    First Commit ID: 13   Start Log Blknum: 1
    Error: 0
    Feature Compat: 0
    Feature Incompat: 2 block64
    Feature RO compat: 0
    Journal UUID: 4ED3822C54294467A4F8E87D2BA4BC36
    FS Share Cnt: 1   Dynamic Superblk Blknum: 0
    Per Txn Block Limit    Journal: 0    Data: 0

    Block 1: Journal Commit Block
    Seq: 14   Type: 2 (JBD2_COMMIT_BLOCK)

    Block 2: Journal Descriptor
    Seq: 15   Type: 1 (JBD2_DESCRIPTOR_BLOCK)
    No. Blocknum        Flags
     0. 587             none
    UUID: 00000000000000000000000000000000
     1. 8257792         JBD2_FLAG_SAME_UUID
     2. 619             JBD2_FLAG_SAME_UUID
     3. 24772864        JBD2_FLAG_SAME_UUID
     4. 8257802         JBD2_FLAG_SAME_UUID
     5. 513             JBD2_FLAG_SAME_UUID JBD2_FLAG_LAST_TAG
    ...
    Block 7: Inode
    Inode: 8257802   Mode: 0640   Generation: 57157641 (0x3682809)
    FS Generation: 2839773110 (0xa9437fb6)
    CRC32: 00000000   ECC: 0000
    Type: Regular   Attr: 0x0   Flags: Valid
    Dynamic Features: (0x1) InlineData
    User: 0 (root)   Group: 0 (root)   Size: 7
    Links: 1   Clusters: 0
    ctime: 0x5de5d870 0x11104c61 -- Tue Dec 3 11:37:20.286280801 2019
    atime: 0x5de5d870 0x113181a1 -- Tue Dec 3 11:37:20.288457121 2019
    mtime: 0x5de5d870 0x11104c61 -- Tue Dec 3 11:37:20.286280801 2019
    dtime: 0x0 -- Thu Jan 1 08:00:00 1970
    ...
    Block 9: Journal Commit Block
    Seq: 15   Type: 2 (JBD2_COMMIT_BLOCK)

  The following is the journal recovery log when recovering the jbd2 journal above on the next mount.

  syslog:
    ocfs2: File system on device (252,1) was not unmounted cleanly, recovering it.
    fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 0
    fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 1
    fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 2
    fs/jbd2/recovery.c:(jbd2_journal_recover, 278): JBD2: recovery, exit status 0, recovered transactions 13 to 13

  Because the first commit seq 13 recorded in the journal super block is not consistent with the value recorded in block 1 (seq 14), journal recovery terminates before seq 15 even though it is an unbroken commit, so inode 8257802, a new file, is lost.

  Link: http://lkml.kernel.org/r/20191217020140.2197-1-li.kai4@h3c.com
  Signed-off-by: Kai Li <li.kai4@h3c.com>
  Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  Reviewed-by: Changwei Ge <gechangwei@live.cn>
  Cc: Mark Fasheh <mark@fasheh.com>
  Cc: Joel Becker <jlbec@evilplan.org>
  Cc: Junxiao Bi <junxiao.bi@oracle.com>
  Cc: Gang He <ghe@suse.com>
  Cc: Jun Piao <piaojun@huawei.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
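The failure mode can be illustrated with a toy model under loudly simplifying assumptions: each log block is reduced to the sequence number it carries, and the scan starts at the superblock's first log block expecting the superblock's first commit ID, stopping at the first mismatch. The model ignores descriptor/commit block structure, checksums and jbd2's three recovery passes, and `replayable_demo()` is a hypothetical name, not jbd2 code.

```c
#include <assert.h>

/*
 * Toy model (hypothetical): count consecutive log blocks whose sequence
 * number matches the expected one, starting from first_seq; the first
 * mismatch is treated as the end of the log.
 */
static int replayable_demo(unsigned int first_seq,
			   const unsigned int *log_seq, int nblocks)
{
	unsigned int expect = first_seq;
	int replayed = 0;

	for (int i = 0; i < nblocks; i++) {
		if (log_seq[i] != expect)
			break; /* recovery stops here */
		replayed++;
		expect++;
	}
	return replayed;
}
```

With the superblock still saying 13 while block 1 carries 14, the model replays nothing, so the valid transaction 15 behind it is never reached; if the superblock agrees with what is on disk (14 in this toy), the scan walks through both blocks.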
* Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds | 2019-11-30 | 1 | -4/+4
  Pull ext4 updates from Ted Ts'o:
  "This merge window saw the following new features added to ext4:
   - Direct I/O via iomap (required the iomap-for-next branch from Darrick as a prereq).
   - Support for using dioread-nolock where the block size < page size.
   - Support for encryption for file systems where the block size < page size.
   - Rework of journal credits handling so a revoke-heavy workload will not cause the journal to run out of space.
   - Replace bit-spinlocks with spinlocks in jbd2

  Also included were some bug fixes and cleanups, mostly to clean up corner cases from fuzzed file systems and error path handling"

  * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (59 commits)
    ext4: work around deleting a file with i_nlink == 0 safely
    ext4: add more paranoia checking in ext4_expand_extra_isize handling
    jbd2: make jbd2_handle_buffer_credits() handle reserved handles
    ext4: fix a bug in ext4_wait_for_tail_page_commit
    ext4: bio_alloc with __GFP_DIRECT_RECLAIM never fails
    ext4: code cleanup for get_next_id
    ext4: fix leak of quota reservations
    ext4: remove unused variable warning in parse_options()
    ext4: Enable encryption for subpage-sized blocks
    fs/buffer.c: support fscrypt in block_read_full_page()
    ext4: Add error handling for io_end_vec struct allocation
    jbd2: Fine tune estimate of necessary descriptor blocks
    jbd2: Provide trace event for handle restarts
    ext4: Reserve revoke credits for freed blocks
    jbd2: Make credit checking more strict
    jbd2: Rename h_buffer_credits to h_total_credits
    jbd2: Reserve space for revoke descriptor blocks
    jbd2: Drop jbd2_space_needed()
    jbd2: Account descriptor blocks into t_outstanding_credits
    jbd2: Factor out common parts of stopping and restarting a handle
    ...
| * jbd2: Reserve space for revoke descriptor blocks (Jan Kara, 2019-11-05, 1 file, -2/+2)
| | Extend functions for starting, extending, and restarting transaction handles to take number of revoke records handle must be able to accommodate. These functions then make sure transaction has enough credits to be able to store resulting revoke descriptor blocks. Also revoke code tracks number of revoke records created by a handle to catch situation where some place didn't reserve enough space for revoke records. Similarly to standard transaction credits, space for unused reserved revoke records is released when the handle is stopped.
| |
| | On the ext4 side we currently take a simplistic approach of reserving space for 1024 revoke records for any transaction. This grows amount of credits reserved for each handle only by a few and is enough for any normal workload so that we don't hit warnings in jbd2. We will refine the logic in following commits.
| |
| | Signed-off-by: Jan Kara <jack@suse.cz>
| | Link: https://lore.kernel.org/r/20191105164437.32602-20-jack@suse.cz
| | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ocfs2: Use accessor function for h_buffer_credits (Jan Kara, 2019-11-05, 1 file, -2/+2)
| | Use the jbd2 accessor function for h_buffer_credits.
| |
| | Reviewed-by: Theodore Ts'o <tytso@mit.edu>
| | Signed-off-by: Jan Kara <jack@suse.cz>
| | Link: https://lore.kernel.org/r/20191105164437.32602-12-jack@suse.cz
| | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | ocfs2: fix panic due to ocfs2_wq is null (Yi Li, 2019-10-19, 1 file, -1/+2)
|/  mount.ocfs2 failed when reading ocfs2 filesystem superblock encounters an error. ocfs2_initialize_super() returns before allocating ocfs2_wq. ocfs2_dismount_volume() triggers the following panic.
|
|   Oct 15 16:09:27 cnwarekv-205120 kernel: On-disk corruption discovered.Please run fsck.ocfs2 once the filesystem is unmounted.
|   Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_read_locked_inode:537 ERROR: status = -30
|   Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:458 ERROR: status = -30
|   Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:491 ERROR: status = -30
|   Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_initialize_super:2313 ERROR: status = -30
|   Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_fill_super:1033 ERROR: status = -30
|   ------------[ cut here ]------------
|   Oops: 0002 [#1] SMP NOPTI
|   CPU: 1 PID: 11753 Comm: mount.ocfs2 Tainted: G E 4.14.148-200.ckv.x86_64 #1
|   Hardware name: Sugon H320-G30/35N16-US, BIOS 0SSDX017 12/21/2018
|   task: ffff967af0520000 task.stack: ffffa5f05484000
|   RIP: 0010:mutex_lock+0x19/0x20
|   Call Trace:
|    flush_workqueue+0x81/0x460
|    ocfs2_shutdown_local_alloc+0x47/0x440 [ocfs2]
|    ocfs2_dismount_volume+0x84/0x400 [ocfs2]
|    ocfs2_fill_super+0xa4/0x1270 [ocfs2]
|    ? ocfs2_initialize_super.isa.211+0xf20/0xf20 [ocfs2]
|    mount_bdev+0x17f/0x1c0
|    mount_fs+0x3a/0x160
|
| Link: http://lkml.kernel.org/r/1571139611-24107-1-git-send-email-yili@winhong.com
| Signed-off-by: Yi Li <yilikernel@gmail.com>
| Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
| Cc: Mark Fasheh <mark@fasheh.com>
| Cc: Joel Becker <jlbec@evilplan.org>
| Cc: Junxiao Bi <junxiao.bi@oracle.com>
| Cc: Changwei Ge <gechangwei@live.cn>
| Cc: Gang He <ghe@suse.com>
| Cc: Jun Piao <piaojun@huawei.com>
| Cc: <stable@vger.kernel.org>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 145 (Thomas Gleixner, 2019-05-30, 1 file, -15/+1)
| Based on 1 normalized pattern(s):
|
|   this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 021110 1307 usa
|
| extracted by the scancode license scanner the SPDX license identifier
|
|   GPL-2.0-or-later
|
| has been chosen to replace the boilerplate/reference in 84 file(s).
|
| Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| Reviewed-by: Richard Fontana <rfontana@redhat.com>
| Reviewed-by: Allison Randal <allison@lohutok.net>
| Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
| Cc: linux-spdx@vger.kernel.org
| Link: https://lkml.kernel.org/r/20190524100844.756442981@linutronix.de
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ocfs2: clear journal dirty flag after shutdown journal (Junxiao Bi, 2018-12-28, 1 file, -4/+2)
| Dirty flag of the journal should be cleared at the last stage of umount, if do it before jbd2_journal_destroy(), then some metadata in uncommitted transaction could be lost due to io error, but as dirty flag of journal was already cleared, we can't find that until run a full fsck. This may cause system panic or other corruption.
|
| Link: http://lkml.kernel.org/r/20181121020023.3034-3-junxiao.bi@oracle.com
| Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
| Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
| Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
| Cc: Jun Piao <piaojun@huawei.com>
| Cc: Changwei Ge <ge.changwei@h3c.com>
| Cc: Joel Becker <jlbec@evilplan.org>
| Cc: Mark Fasheh <mfasheh@versity.com>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: without quota support, avoid calling quota recovery (Guozhonghua, 2018-11-03, 1 file, -17/+34)
| During one dead node's recovery by other node, quota recovery work will be queued. We should avoid calling quota when it is not supported, so check the quota flags.
|
| Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA401071AC9FB@H3CMLB12-EX.srv.huawei-3com.com
| Signed-off-by: guozhonghua <guozhonghua@h3c.com>
| Reviewed-by: Jan Kara <jack@suse.cz>
| Cc: Mark Fasheh <mark@fasheh.com>
| Cc: Joel Becker <jlbec@evilplan.org>
| Cc: Junxiao Bi <junxiao.bi@oracle.com>
| Cc: Joseph Qi <jiangqi903@gmail.com>
| Cc: Changwei Ge <ge.changwei@h3c.com>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* treewide: kzalloc() -> kcalloc() (Kees Cook, 2018-06-12, 1 file, -1/+1)
| The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of:
|
|   kzalloc(a * b, gfp)
|
| with:
|
|   kcalloc(a, b, gfp)
|
| as well as handling cases of:
|
|   kzalloc(a * b * c, gfp)
|
| with:
|
|   kzalloc(array3_size(a, b, c), gfp)
|
| as it's slightly less ugly than:
|
|   kzalloc_array(array_size(a, b), c, gfp)
|
| This does, however, attempt to ignore constant size factors like:
|
|   kzalloc(4 * 1024, gfp)
|
| though any constants defined via macros get caught up in the conversion.
|
| Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant.
|
| The Coccinelle script used for this was:
|
| // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...)
| kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) 
| kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) 
| kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
* ocfs2: return error when we attempt to access a dirty bh in jbd2 (piaojun, 2018-01-31, 1 file, -11/+12)
| We should not reuse the dirty bh in jbd2 directly due to the following situation:
|
| 1. When removing extent rec, we will dirty the bhs of extent rec and truncate log at the same time, and hand them over to jbd2.
| 2. The bhs are submitted to jbd2 area successfully.
| 3. The write-back thread of device help flush the bhs to disk but encounter write error due to abnormal storage link.
| 4. After a while the storage link become normal. Truncate log flush worker triggered by the next space reclaiming found the dirty bh of truncate log and clear its 'BH_Write_EIO' and then set it uptodate in __ocfs2_journal_access():
|
|    ocfs2_truncate_log_worker
|      ocfs2_flush_truncate_log
|        __ocfs2_flush_truncate_log
|          ocfs2_replay_truncate_records
|            ocfs2_journal_access_di
|              __ocfs2_journal_access // here we clear io_error and set 'tl_bh' uptodate.
|
| 5. Then jbd2 will flush the bh of truncate log to disk, but the bh of extent rec is still in error state, and unfortunately nobody will take care of it.
| 6. At last the space of extent rec was not reduced, but truncate log flush worker have given it back to globalalloc. That will cause duplicate cluster problem which could be identified by fsck.ocfs2.
|
| Sadly we can hardly revert this but set fs read-only in case of ruining atomicity and consistency of space reclaim.
|
| Link: http://lkml.kernel.org/r/5A6E8092.8090701@huawei.com
| Fixes: acf8fdbe6afb ("ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_access")
| Signed-off-by: Jun Piao <piaojun@huawei.com>
| Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
| Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
| Cc: Mark Fasheh <mfasheh@versity.com>
| Cc: Joel Becker <jlbec@evilplan.org>
| Cc: Junxiao Bi <junxiao.bi@oracle.com>
| Cc: Joseph Qi <jiangqi903@gmail.com>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: clean up some dead code (Jun Piao, 2017-09-06, 1 file, -1/+0)
| Clean up some unused functions and parameters.
|
| Link: http://lkml.kernel.org/r/598A5E21.2080807@huawei.com
| Signed-off-by: Jun Piao <piaojun@huawei.com>
| Reviewed-by: Alex Chen <alex.chen@huawei.com>
| Cc: Mark Fasheh <mfasheh@versity.com>
| Cc: Joel Becker <jlbec@evilplan.org>
| Cc: Junxiao Bi <junxiao.bi@oracle.com>
| Cc: Joseph Qi <jiangqi903@gmail.com>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: use time64_t to represent orphan scan times (Deepa Dinamani, 2016-12-12, 1 file, -2/+2)
| struct timespec is not y2038 safe. Use time64_t which is y2038 safe to represent orphan scan times. time64_t is sufficient here as only the seconds delta times are relevant.
|
| Also use appropriate time functions that return time in time64_t format. Time functions now return monotonic time instead of real time as only delta scan times are relevant and these values are not persistent across reboots.
|
| The format string for the debug print is still using long as this is only the time elapsed since the last scan and long is sufficient to represent this value.
|
| Link: http://lkml.kernel.org/r/1475365138-20567-1-git-send-email-deepa.kernel@gmail.com
| Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
| Reviewed-by: Arnd Bergmann <arnd@arndb.de>
| Cc: Mark Fasheh <mfasheh@versity.com>
| Cc: Joel Becker <jlbec@evilplan.org>
| Cc: Junxiao Bi <junxiao.bi@oracle.com>
| Cc: Joseph Qi <jiangqi903@gmail.com>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: improve recovery performance (Junxiao Bi, 2016-07-26, 1 file, -20/+21)
| Journal replay will be run when performing recovery for a dead node. To avoid the stale cache impact, all blocks of dead node's journal inode were reloaded from disk. This hurts the performance. Check whether one block is cached before reloading it can improve performance a lot. In my test env, the time doing recovery was improved from 120s to 1s.
|
| [akpm@linux-foundation.org: clean up the for loop p_blkno handling]
| Link: http://lkml.kernel.org/r/1466155682-24656-1-git-send-email-junxiao.bi@oracle.com
| Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
| Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
| Cc: "Gang He" <ghe@suse.com>
| Cc: Mark Fasheh <mfasheh@suse.de>
| Cc: Joel Becker <jlbec@evilplan.org>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local (jiangyiwen, 2016-03-25, 1 file, -4/+4)
| This patch fixes a deadlock, as follows:
|
| 1) Node 1: volume a and b are mounted. Node 2: only mount vol a. Node 3: only mount vol b.
| 2) Node 2: start to mount b. Node 3: start to mount a.
| 3) Node 2: check hb of Node 3 in vol a, qs_holds++. Node 3: check hb of Node 2 in vol b, qs_holds++.
| 4) All nodes' network down.
| 5) Node 2: progress of mount b failed, and then call ocfs2_dismount_volume. But the process is hung, since there is a work in ocfs2_wq that cannot be completed. This work is about vol a, because ocfs2_wq is a global wq. BTW, this work which is scheduled in ocfs2_wq is ocfs2_orphan_scan_work, and the context in this work needs to take the inode lock of orphan_dir; because the lockres owner is Node 1 and all nodes' network has been down at the same time, it can't get the inode lock. Node 3: the same situation as Node 2.
| 6) Why can't this node be fenced when the network is disconnected? Because the mount process is hung, qs_holds is not equal to 0.
|
| Because all works in the ocfs2_wq are relative to the super block, the solution is to change the ocfs2_wq from global to local. In other words, move it into struct ocfs2_super.
|
| Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
| Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
| Cc: Xue jiufei <xuejiufei@huawei.com>
| Cc: Mark Fasheh <mfasheh@suse.de>
| Cc: Joel Becker <jlbec@evilplan.org>
| Cc: Junxiao Bi <junxiao.bi@oracle.com>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* wrappers for ->i_mutex access (Al Viro, 2016-01-22, 1 file, -4/+4)
| parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex).
|
| Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared.
|
| Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ocfs2: clean up redundant NULL check before iput (Joseph Qi, 2016-01-14, 1 file, -7/+3)
| Since iput will take care of the NULL check itself, a NULL check before calling it is redundant. So clean them up.
|
| Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
| Cc: Mark Fasheh <mfasheh@suse.de>
| Cc: Joel Becker <jlbec@evilplan.org>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: add uuid to ocfs2 thread name for problem analysis (Joseph Qi, 2015-11-05, 1 file, -2/+2)
| A node can mount multiple ocfs2 volumes. And if thread names are same for each volume/domain, it will bring inconvenience when analyzing problems because we have to identify which volume/domain the messages belong to.
|
| Since thread name will be printed to messages, so add volume uuid or dlm name to thread name can benefit problem analysis.
|
| Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
| Cc: Mark Fasheh <mfasheh@suse.com>
| Cc: Joel Becker <jlbec@evilplan.org>
| Cc: Gang He <ghe@suse.com>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: only take lock if dio entry when recover orphans (Joseph Qi, 2015-11-05, 1 file, -39/+47)
| We have no need to take inode mutex, rw and inode lock if it is not dio entry when recover orphans. Optimize it by adding a flag OCFS2_INODE_DIO_ORPHAN_ENTRY to ocfs2_inode_info to reduce contention.
|
| Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
| Cc: Mark Fasheh <mfasheh@suse.de>
| Cc: Joel Becker <jlbec@evilplan.org>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: do not include dio entry in case of orphan scan (Joseph Qi, 2015-11-05, 1 file, -3/+12)
| dio entry will only do truncate in case of ORPHAN_NEED_TRUNCATE. So do not include it when doing normal orphan scan to reduce contention.
|
| Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
| Cc: Mark Fasheh <mfasheh@suse.de>
| Cc: Joel Becker <jlbec@evilplan.org>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: neaten do_error, ocfs2_error and ocfs2_abort (Joe Perches, 2015-09-04, 1 file, -1/+1)
| These uses sometimes do and sometimes don't have '\n' terminations. Make the uses consistently use '\n' terminations and remove the newline from the functions.
|
| Miscellanea:
|
| o Coalesce formats
| o Realign arguments
|
| Signed-off-by: Joe Perches <joe@perches.com>
| Reviewed-by: Mark Fasheh <mfasheh@suse.de>
| Cc: Joel Becker <jlbec@evilplan.org>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: fix race between crashed dio and rm (Joseph Qi, 2015-09-04, 1 file, -1/+3)
| There is a race case between crashed dio and rm, which will lead to OCFS2_VALID_FL not set read-only.
|
|   N1                                  N2
|   ------------------------------------------------------------------
|   dd with direct flag
|                                       rm file
|   crashed with an dio entry left
|   in orphan dir
|                                       clear OCFS2_VALID_FL in
|                                       ocfs2_remove_inode
|                                       recover N1 and read the corrupted
|                                       inode, and set filesystem read-only
|
| So we skip the inode deletion this time and wait for dio entry recovered first.
|
| Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
| Reviewed-by: Mark Fasheh <mfasheh@suse.de>
| Cc: Joel Becker <jlbec@evilplan.org>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_access (Joseph Qi, 2015-09-04, 1 file, -1/+17)
| When storage network is unstable, it may trigger the BUG in __ocfs2_journal_access because of buffer not uptodate. We can retry the write in this case or return error instead of BUG.
|
| Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
| Reported-by: Zhangguanghui <zhang.guanghui@h3c.com>
| Tested-by: Zhangguanghui <zhang.guanghui@h3c.com>
| Cc: Mark Fasheh <mfasheh@suse.com>
| Cc: Joel Becker <jlbec@evilplan.org>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: fix race between dio and recover orphan (Joseph Qi, 2015-09-04, 1 file, -4/+4)
| During direct io the inode will be added to orphan first and then deleted from orphan. There is a race window that the orphan entry will be deleted twice and thus trigger the BUG when validating OCFS2_DIO_ORPHANED_FL in ocfs2_del_inode_from_orphan.
|
|   ocfs2_direct_IO_write
|     ...
|     ocfs2_add_inode_to_orphan
|     >>>>>>>> race window.
|              1) another node may rm the file and then down, this node
|              take care of orphan recovery and clear flag
|              OCFS2_DIO_ORPHANED_FL.
|              2) since rw lock is unlocked, it may race with another
|              orphan recovery and append dio.
|     ocfs2_del_inode_from_orphan
|
| So take inode mutex lock when recovering orphans and make rw unlock at the end of aio write in case of append dio.
|
| Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
| Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
| Cc: Weiwei Wang <wangww631@huawei.com>
| Cc: Mark Fasheh <mfasheh@suse.com>
| Cc: Joel Becker <jlbec@evilplan.org>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: mark local functions as static (Joseph Qi, 2015-06-24, 1 file, -5/+5)
| Some functions are only used locally, so mark them as static.
|
| Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
| Cc: Mark Fasheh <mfasheh@suse.com>
| Cc: Joel Becker <jlbec@evilplan.org>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: fix NULL pointer dereference in function ocfs2_abort_trigger() (Xue jiufei, 2015-06-24, 1 file, -3/+1)
| ocfs2_abort_trigger() use bh->b_assoc_map to get sb. But there's no function to set bh->b_assoc_map in ocfs2, it will trigger NULL pointer dereference while calling this function. We can get sb from bh->b_bdev->bd_super instead of b_assoc_map.
|
| [akpm@linux-foundation.org: update comment, per Joseph]
| Signed-off-by: joyce.xue <xuejiufei@huawei.com>
| Cc: Joseph Qi <joseph.qi@huawei.com>
| Cc: Mark Fasheh <mfasheh@suse.com>
| Cc: Joel Becker <jlbec@evilplan.org>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: do not BUG if jbd2_journal_dirty_metadata fails (Joseph Qi, 2015-06-24, 1 file, -1/+14)
| jbd2_journal_dirty_metadata may fail. Currently it cannot take care of non zero return value and just BUG in ocfs2_journal_dirty. This patch is aborting the handle and journal instead of BUG.
|
| Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
| Cc: joyce.xue <xuejiufei@huawei.com>
| Cc: Mark Fasheh <mfasheh@suse.com>
| Cc: Joel Becker <jlbec@evilplan.org>
| Cc: Junxiao Bi <junxiao.bi@oracle.com>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: fix a tiny race when truncate dio orohaned entry (Joseph Qi, 2015-06-24, 1 file, -26/+21)
| Once dio crashed it will leave an entry in orphan dir. And orphan scan will take care of the clean up. There is a tiny race case that the same entry will be truncated twice and then trigger the BUG in ocfs2_del_inode_from_orphan.
|
| Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
| Cc: Mark Fasheh <mfasheh@suse.com>
| Cc: Joel Becker <jlbec@evilplan.org>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: wait for orphan recovery first once append O_DIRECT write crash (Joseph Qi, 2015-02-16, 1 file, -0/+2)
| If one node has crashed with orphan entry leftover, another node which do append O_DIRECT write to the same file will override the i_dio_orphaned_slot. Then the old entry won't be cleaned forever. If this case happens, we let it wait for orphan recovery first.
|
| Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
| Cc: Weiwei Wang <wangww631@huawei.com>
| Cc: Joel Becker <jlbec@evilplan.org>
| Cc: Junxiao Bi <junxiao.bi@oracle.com>
| Cc: Mark Fasheh <mfasheh@suse.com>
| Cc: Xuejiufei <xuejiufei@huawei.com>
| Cc: alex chen <alex.chen@huawei.com>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: add orphan recovery types in ocfs2_recover_orphans (Joseph Qi, 2015-02-16, 1 file, -20/+88)
| Define two orphan recovery types, which indicates if need truncate file or not.
|
| Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
| Cc: Weiwei Wang <wangww631@huawei.com>
| Cc: Junxiao Bi <junxiao.bi@oracle.com>
| Cc: Joel Becker <jlbec@evilplan.org>
| Cc: Mark Fasheh <mfasheh@suse.com>
| Cc: Xuejiufei <xuejiufei@huawei.com>
| Cc: alex chen <alex.chen@huawei.com>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>