summaryrefslogtreecommitdiffstats
path: root/fs/jbd2
Commit message (Collapse)AuthorAgeFilesLines
* jbd2: fix data missing when reusing bh which is ready to be checkpointedZhihao Cheng2023-03-101-21/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e6b9bd7290d334451ce054e98e752abc055e0034 upstream. Following process will make data lost and could lead to a filesystem corrupted problem: 1. jh(bh) is inserted into T1->t_checkpoint_list, bh is dirty, and jh->b_transaction = NULL 2. T1 is added into journal->j_checkpoint_transactions. 3. Get bh prepare to write while doing checkpoing: PA PB do_get_write_access jbd2_log_do_checkpoint spin_lock(&jh->b_state_lock) if (buffer_dirty(bh)) clear_buffer_dirty(bh) // clear buffer dirty set_buffer_jbddirty(bh) transaction = journal->j_checkpoint_transactions jh = transaction->t_checkpoint_list if (!buffer_dirty(bh)) __jbd2_journal_remove_checkpoint(jh) // bh won't be flushed jbd2_cleanup_journal_tail __jbd2_journal_file_buffer(jh, transaction, BJ_Reserved) 4. Aborting journal/Power-cut before writing latest bh on journal area. In this way we get a corrupted filesystem with bh's data lost. Fix it by moving the clearing of buffer_dirty bit just before the call to __jbd2_journal_file_buffer(), both bit clearing and jh->b_transaction assignment are under journal->j_list_lock locked, so that jbd2_log_do_checkpoint() will wait until jh's new transaction fininshed even bh is currently not dirty. And journal_shrink_one_cp_list() won't remove jh from checkpoint list if the buffer head is reused in do_get_write_access(). Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216898 Cc: <stable@kernel.org> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: zhanchengbin <zhanchengbin1@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230110015327.1181863-1-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* jbd2: add miss release buffer head in fc_do_one_pass()Ye Bin2022-10-261-0/+1
| | | | | | | | | | | | | | commit dfff66f30f66b9524b661f311bbed8ff3d2ca49f upstream. In fc_do_one_pass() miss release buffer head after use which will lead to reference count leak. Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220917093805.1782845-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* jbd2: fix potential use-after-free in jbd2_fc_wait_bufsYe Bin2022-10-261-3/+3
| | | | | | | | | | | | | | | commit 243d1a5d505d0b0460c9af0ad56ed4a56ef0bebd upstream. In 'jbd2_fc_wait_bufs' use 'bh' after put buffer head reference count which may lead to use-after-free. So judge buffer if uptodate before put buffer head reference count. Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220914100812.1414768-3-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* jbd2: fix potential buffer head reference count leakYe Bin2022-10-261-1/+7
| | | | | | | | | | | | | | | | | commit e0d5fc7a6d80ac2406c7dfc6bb625201d0250a8a upstream. As in 'jbd2_fc_wait_bufs' if buffer isn't uptodate, will return -EIO without update 'journal->j_fc_off'. But 'jbd2_fc_release_bufs' will release buffer head from ‘j_fc_off - 1’ if 'bh' is NULL will terminal release which will lead to buffer head buffer head reference count leak. To solve above issue, update 'journal->j_fc_off' before return -EIO. Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220914100812.1414768-2-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* jbd2: wake up journal waiters in FIFO order, not LIFOAndrew Perepechko2022-10-262-4/+4
| | | | | | | | | | | | | | | | | commit 34fc8768ec6089565d6d73bad26724083cecf7bd upstream. LIFO wakeup order is unfair and sometimes leads to a journal user not being able to get a journal handle for hundreds of transactions in a row. FIFO wakeup can make things more fair. Cc: stable@kernel.org Signed-off-by: Alexey Lyashkov <alexey.lyashkov@gmail.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20220907165959.1137482-1-alexey.lyashkov@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* jbd2: fix assertion 'jh->b_frozen_data == NULL' failure when journal abortedZhihao Cheng2022-08-171-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 4a734f0869f970b8a9b65062ea40b09a5da9dba8 ] Following process will fail assertion 'jh->b_frozen_data == NULL' in jbd2_journal_dirty_metadata(): jbd2_journal_commit_transaction unlink(dir/a) jh->b_transaction = trans1 jh->b_jlist = BJ_Metadata journal->j_running_transaction = NULL trans1->t_state = T_COMMIT unlink(dir/b) handle->h_trans = trans2 do_get_write_access jh->b_modified = 0 jh->b_frozen_data = frozen_buffer jh->b_next_transaction = trans2 jbd2_journal_dirty_metadata is_handle_aborted is_journal_aborted // return false --> jbd2 abort <-- while (commit_transaction->t_buffers) if (is_journal_aborted) jbd2_journal_refile_buffer __jbd2_journal_refile_buffer WRITE_ONCE(jh->b_transaction, jh->b_next_transaction) WRITE_ONCE(jh->b_next_transaction, NULL) __jbd2_journal_file_buffer(jh, BJ_Reserved) J_ASSERT_JH(jh, jh->b_frozen_data == NULL) // assertion failure ! The reproducer (See detail in [Link]) reports: ------------[ cut here ]------------ kernel BUG at fs/jbd2/transaction.c:1629! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 2 PID: 584 Comm: unlink Tainted: G W 5.19.0-rc6-00115-g4a57a8400075-dirty #697 RIP: 0010:jbd2_journal_dirty_metadata+0x3c5/0x470 RSP: 0018:ffffc90000be7ce0 EFLAGS: 00010202 Call Trace: <TASK> __ext4_handle_dirty_metadata+0xa0/0x290 ext4_handle_dirty_dirblock+0x10c/0x1d0 ext4_delete_entry+0x104/0x200 __ext4_unlink+0x22b/0x360 ext4_unlink+0x275/0x390 vfs_unlink+0x20b/0x4c0 do_unlinkat+0x42f/0x4c0 __x64_sys_unlink+0x37/0x50 do_syscall_64+0x35/0x80 After journal aborting, __jbd2_journal_refile_buffer() is executed with holding @jh->b_state_lock, we can fix it by moving 'is_handle_aborted()' into the area protected by @jh->b_state_lock. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216251 Fixes: 470decc613ab20 ("[PATCH] jbd2: initial copy of files from jbd") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20220715125152.4022726-1-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
* jbd2: fix outstanding credits assert in jbd2_journal_commit_transaction()Zhang Yi2022-08-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit a89573ce4ad32f19f43ec669771726817e185be0 ] We catch an assert problem in jbd2_journal_commit_transaction() when doing fsstress and request falut injection tests. The problem is happened in a race condition between jbd2_journal_commit_transaction() and ext4_end_io_end(). Firstly, ext4_writepages() writeback dirty pages and start reserved handle, and then the journal was aborted due to some previous metadata IO error, jbd2_journal_abort() start to commit current running transaction, the committing procedure could be raced by ext4_end_io_end() and lead to subtract j_reserved_credits twice from commit_transaction->t_outstanding_credits, finally the t_outstanding_credits is mistakenly smaller than t_nr_buffers and trigger assert. kjournald2 kworker jbd2_journal_commit_transaction() write_unlock(&journal->j_state_lock); atomic_sub(j_reserved_credits, t_outstanding_credits); //sub once jbd2_journal_start_reserved() start_this_handle() //detect aborted journal jbd2_journal_free_reserved() //get running transaction read_lock(&journal->j_state_lock) __jbd2_journal_unreserve_handle() atomic_sub(j_reserved_credits, t_outstanding_credits); //sub again read_unlock(&journal->j_state_lock); journal->j_running_transaction = NULL; J_ASSERT(t_nr_buffers <= t_outstanding_credits) //bomb!!! Fix this issue by using journal->j_state_lock to protect the subtraction in jbd2_journal_commit_transaction(). Fixes: 96f1e0974575 ("jbd2: avoid long hold times of j_state_lock while committing a transaction") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220611130426.2013258-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
* jbd2: fix a potential race while discarding reserved buffers after an abortYe Bin2022-04-271-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 23e3d7f7061f8682c751c46512718f47580ad8f0 upstream. we got issue as follows: [ 72.796117] EXT4-fs error (device sda): ext4_journal_check_start:83: comm fallocate: Detected aborted journal [ 72.826847] EXT4-fs (sda): Remounting filesystem read-only fallocate: fallocate failed: Read-only file system [ 74.791830] jbd2_journal_commit_transaction: jh=0xffff9cfefe725d90 bh=0x0000000000000000 end delay [ 74.793597] ------------[ cut here ]------------ [ 74.794203] kernel BUG at fs/jbd2/transaction.c:2063! [ 74.794886] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 74.795533] CPU: 4 PID: 2260 Comm: jbd2/sda-8 Not tainted 5.17.0-rc8-next-20220315-dirty #150 [ 74.798327] RIP: 0010:__jbd2_journal_unfile_buffer+0x3e/0x60 [ 74.801971] RSP: 0018:ffffa828c24a3cb8 EFLAGS: 00010202 [ 74.802694] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 74.803601] RDX: 0000000000000001 RSI: ffff9cfefe725d90 RDI: ffff9cfefe725d90 [ 74.804554] RBP: ffff9cfefe725d90 R08: 0000000000000000 R09: ffffa828c24a3b20 [ 74.805471] R10: 0000000000000001 R11: 0000000000000001 R12: ffff9cfefe725d90 [ 74.806385] R13: ffff9cfefe725d98 R14: 0000000000000000 R15: ffff9cfe833a4d00 [ 74.807301] FS: 0000000000000000(0000) GS:ffff9d01afb00000(0000) knlGS:0000000000000000 [ 74.808338] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 74.809084] CR2: 00007f2b81bf4000 CR3: 0000000100056000 CR4: 00000000000006e0 [ 74.810047] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 74.810981] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 74.811897] Call Trace: [ 74.812241] <TASK> [ 74.812566] __jbd2_journal_refile_buffer+0x12f/0x180 [ 74.813246] jbd2_journal_refile_buffer+0x4c/0xa0 [ 74.813869] jbd2_journal_commit_transaction.cold+0xa1/0x148 [ 74.817550] kjournald2+0xf8/0x3e0 [ 74.819056] kthread+0x153/0x1c0 [ 74.819963] ret_from_fork+0x22/0x30 Above issue may happen as follows: write truncate kjournald2 generic_perform_write ext4_write_begin ext4_walk_page_buffers do_journal_get_write_access ->add BJ_Reserved list ext4_journalled_write_end ext4_walk_page_buffers write_end_fn ext4_handle_dirty_metadata ***************JBD2 ABORT************** jbd2_journal_dirty_metadata -> return -EROFS, jh in reserved_list jbd2_journal_commit_transaction while (commit_transaction->t_reserved_list) jh = commit_transaction->t_reserved_list; truncate_pagecache_range do_invalidatepage ext4_journalled_invalidatepage jbd2_journal_invalidatepage journal_unmap_buffer __dispose_buffer __jbd2_journal_unfile_buffer jbd2_journal_put_journal_head ->put last ref_count __journal_remove_journal_head bh->b_private = NULL; jh->b_bh = NULL; jbd2_journal_refile_buffer(journal, jh); bh = jh2bh(jh); ->bh is NULL, later will trigger null-ptr-deref journal_free_journal_head(jh); After commit 96f1e0974575, we no longer hold the j_state_lock while iterating over the list of reserved handles in jbd2_journal_commit_transaction(). This potentially allows the journal_head to be freed by journal_unmap_buffer while the commit codepath is also trying to free the BJ_Reserved buffers. Keeping j_state_lock held while trying extends hold time of the lock minimally, and solves this issue. Fixes: 96f1e0974575("jbd2: avoid long hold times of j_state_lock while committing a transaction") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220317142137.1821590-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: fast commit may not fallback for ineligible commitXin Yin2022-03-082-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e85c81ba8859a4c839bcd69c5d83b32954133a5b ] For the follow scenario: 1. jbd start commit transaction n 2. task A get new handle for transaction n+1 3. task A do some ineligible actions and mark FC_INELIGIBLE 4. jbd complete transaction n and clean FC_INELIGIBLE 5. task A call fsync In this case fast commit will not fallback to full commit and transaction n+1 also not handled by jbd. Make ext4_fc_mark_ineligible() also record transaction tid for latest ineligible case, when call ext4_fc_cleanup() check current transaction tid, if small than latest ineligible tid do not clear the EXT4_MF_FC_INELIGIBLE. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: Ritesh Harjani <riteshh@linux.ibm.com> Suggested-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Link: https://lore.kernel.org/r/20220117093655.35160-2-yinxin.x@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
* jbd2: export jbd2_journal_[grab|put]_journal_headJoseph Qi2022-02-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 4cd1103d8c66b2cdb7e64385c274edb0ac5e8887 upstream. Patch series "ocfs2: fix a deadlock case". This fixes a deadlock case in ocfs2. We firstly export jbd2 symbols jbd2_journal_[grab|put]_journal_head as preparation and later use them in ocfs2 insread of jbd_[lock|unlock]_bh_journal_head to fix the deadlock. This patch (of 2): This exports symbols jbd2_journal_[grab|put]_journal_head, which will be used outside modules, e.g. ocfs2. Link: https://lkml.kernel.org/r/20220121071205.100648-2-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com> Cc: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: Support for checksumming from journal triggersJan Kara2021-08-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | JBD2 layer support triggers which are called when journaling layer moves buffer to a certain state. We can use the frozen trigger, which gets called when buffer data is frozen and about to be written out to the journal, to compute block checksums for some buffer types (similarly as does ocfs2). This avoids unnecessary repeated recomputation of the checksum (at the cost of larger window where memory corruption won't be caught by checksumming) and is even necessary when there are unsynchronized updaters of the checksummed data. So add superblock and journal trigger type arguments to ext4_journal_get_write_access() and ext4_journal_get_create_access() so that frozen triggers can be set accordingly. Also add inode argument to ext4_walk_page_buffers() and all the callbacks used with that function for the same purpose. This patch is mostly only a change of prototype of the above mentioned functions and a few small helpers. Real checksumming will come later. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210816095713.16537-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: add sparse annotations for add_transaction_credits()Theodore Ts'o2021-08-301-1/+18
| | | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: clean up two gcc -Wall warnings in recovery.cTheodore Ts'o2021-08-101-3/+3
| | | | | | | | | Fix a signed vs unsigned and a void * pointer arithmetic warning. This cleanup is also in e2fsprogs commit aec460db9a93 ("e2fsck: clean up two gcc -Wall warnings in recovery.c"). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: fix clang warning in recovery.cTheodore Ts'o2021-08-101-1/+0
| | | | | | | | | | Remove unused variable store which was never used. This fix is also in e2fsprogs commit 99a2294f85f0 ("e2fsck: value stored to err is never read"). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: fix portability problems caused by unaligned accessesTheodore Ts'o2021-08-101-11/+11
| | | | | | | | | | | | | This commit applies the e2fsck/recovery.c portions of commit 1e0c8ca7c08a ("e2fsck: fix portability problems caused by unaligned accesses) from the e2fsprogs git tree. The on-disk format for the ext4 journal can have unaigned 32-bit integers. This can happen when replaying a journal using a obsolete checksum format (which was never popularly used, since the v3 format replaced v2 while the metadata checksum feature was being stablized). Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: inline jbd2_journal_[un]register_shrinker()Theodore Ts'o2021-07-082-91/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | The function jbd2_journal_unregister_shrinker() was getting called twice when the file system was getting unmounted. On Power and ARM platforms this was causing kernel crash when unmounting the file system, when a percpu_counter was destroyed twice. Fix this by removing jbd2_journal_[un]register_shrinker() functions, and inlining the shrinker setup and teardown into journal_init_common() and jbd2_journal_destroy(). This means that ext4 and ocfs2 now no longer need to know about registering and unregistering jbd2's shrinker. Also, while we're at it, rename the percpu counter from j_jh_shrink_count to j_checkpoint_jh_count, since this makes it clearer what this counter is intended to track. Link: https://lore.kernel.org/r/20210705145025.3363130-1-tytso@mit.edu Fixes: 4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers") Reported-by: Jon Hunter <jonathanh@nvidia.com> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: export jbd2_journal_[un]register_shrinker()Zhang Yi2021-06-301-0/+2
| | | | | | | | | | | | | | Export jbd2_journal_[un]register_shrinker() to fix this error when ext4 is built as a module: ERROR: modpost: "jbd2_journal_unregister_shrinker" undefined! ERROR: modpost: "jbd2_journal_register_shrinker" undefined! Fixes: 4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210630083638.140218-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: simplify journal_clean_one_cp_list()Zhang Yi2021-06-241-26/+4
| | | | | | | | | | | | Now that __try_to_free_cp_buf() remove checkpointed buffer or transaction when the buffer is not 'busy', which is only called by journal_clean_one_cp_list(). This patch simplify this function by remove __try_to_free_cp_buf() and invoke __cp_buffer_busy() directly. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-7-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2,ext4: add a shrinker to release checkpointed buffersZhang Yi2021-06-242-0/+234
| | | | | | | | | | | | | | | | | | | | | | | | | | Current metadata buffer release logic in bdev_try_to_free_page() have a lot of use-after-free issues when umount filesystem concurrently, and it is difficult to fix directly because ext4 is the only user of s_op->bdev_try_to_free_page callback and we may have to add more special refcount or lock that is only used by ext4 into the common vfs layer, which is unacceptable. One better solution is remove the bdev_try_to_free_page callback, but the real problem is we cannot easily release journal_head on the checkpointed buffer, so try_to_free_buffers() cannot release buffers and page under memory pressure, which is more likely to trigger out-of-memory. So we cannot remove the callback directly before we find another way to release journal_head. This patch introduce a shrinker to free journal_head on the checkpointed transaction. After the journal_head got freed, try_to_free_buffers() could free buffer properly. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-6-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: remove redundant buffer io error checksZhang Yi2021-06-241-11/+2
| | | | | | | | | | | | Now that __jbd2_journal_remove_checkpoint() can detect buffer io error and mark journal checkpoint error, then we abort the journal later before updating log tail to ensure the filesystem works consistently. So we could remove other redundant buffer io error checkes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-5-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: don't abort the journal when freeing buffersZhang Yi2021-06-241-17/+0
| | | | | | | | | | | | | Now that we can be sure the journal is aborted once a buffer has failed to be written back to disk, we can remove the journal abort logic in jbd2_journal_try_to_free_buffers() which was introduced in commit c044f3d8360d ("jbd2: abort journal if free a async write error metadata buffer"), because it may cost and propably is not safe. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-4-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: ensure abort the journal if detect IO error when writing original ↵Zhang Yi2021-06-242-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | buffer back Although we merged c044f3d8360 ("jbd2: abort journal if free a async write error metadata buffer"), there is a race between jbd2_journal_try_to_free_buffers() and jbd2_journal_destroy(), so the jbd2_log_do_checkpoint() may still fail to detect the buffer write io error flag which may lead to filesystem inconsistency. jbd2_journal_try_to_free_buffers() ext4_put_super() jbd2_journal_destroy() __jbd2_journal_remove_checkpoint() detect buffer write error jbd2_log_do_checkpoint() jbd2_cleanup_journal_tail() <--- lead to inconsistency jbd2_journal_abort() Fix this issue by introducing a new atomic flag which only have one JBD2_CHECKPOINT_IO_ERROR bit now, and set it in __jbd2_journal_remove_checkpoint() when freeing a checkpoint buffer which has write_io_error flag. Then jbd2_journal_destroy() will detect this mark and abort the journal to prevent updating log tail. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-3-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: remove the out label in __jbd2_journal_remove_checkpoint()Zhang Yi2021-06-241-12/+12
| | | | | | | | | | | The 'out' lable just return the 'ret' value and seems not required, so remove this label and switch to return appropriate value immediately. This patch also do some minor cleanup, no logical change. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-2-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: clean up misleading comments for jbd2_fc_release_bufsyangerkun2021-06-241-8/+0
| | | | | | | | | | This comments was for jbd2_fc_wait_bufs, not for jbd2_fc_release_bufs. Remove this misleading comments. Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210608141236.459441-1-yangerkun@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: add discard/zeroout flags to journal flushLeah Rumancik2021-06-221-3/+116
| | | | | | | | | Add a flags argument to jbd2_journal_flush to enable discarding or zero-filling the journal blocks while flushing the journal. Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Link: https://lore.kernel.org/r/20210518151327.130198-1-leah.rumancik@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: fix debug format string warningArnd Bergmann2021-04-091-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | Using no_printk() for jbd_debug() revealed two warnings: fs/jbd2/recovery.c: In function 'fc_do_one_pass': fs/jbd2/recovery.c:256:30: error: format '%d' expects a matching 'int' argument [-Werror=format=] 256 | jbd_debug(3, "Processing fast commit blk with seq %d"); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ fs/ext4/fast_commit.c: In function 'ext4_fc_replay_add_range': fs/ext4/fast_commit.c:1732:30: error: format '%d' expects argument of type 'int', but argument 2 has type 'long unsigned int' [-Werror=format=] 1732 | jbd_debug(1, "Converting from %d to %d %lld", The first one was added incorrectly, and was also missing a few newlines in debug output, and the second one happened when the type of an argument changed. Reported-by: kernel test robot <lkp@intel.com> Fixes: d556435156b7 ("jbd2: avoid -Wempty-body warnings") Fixes: 6db074618969 ("ext4: use BIT() macro for BH_** state bits") Fixes: 5b849b5f96b4 ("jbd2: fast commit recovery path") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210409201211.1866633-1-arnd@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: annotate data race in jbd2_journal_dirty_metadata()Jan Kara2021-04-091-4/+4
| | | | | | | | | | | | | Assertion checks in jbd2_journal_dirty_metadata() are known to be racy but we don't want to be grabbing locks just for them. We thus recheck them under b_state_lock only if it looks like they would fail. Annotate the checks with data_race(). Cc: stable@kernel.org Reported-by: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210406161804.20150-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: annotate data race in start_this_handle()Jan Kara2021-04-091-1/+6
| | | | | | | | | | | | Access to journal->j_running_transaction is not protected by appropriate lock and thus is racy. We are well aware of that and the code handles the race properly. Just add a comment and data_race() annotation. Cc: stable@kernel.org Reported-by: syzbot+30774a6acf6a2cf6d535@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210406161804.20150-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* block: use an on-stack bio in blkdev_issue_flushChristoph Hellwig2021-01-273-4/+4
| | | | | | | | | | There is no point in allocating memory for a synchronous flush. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* jbd2: add a helper to find out number of fast commit blocksHarshad Shirwadkar2020-12-171-6/+2
| | | | | | | | | | | | Add a helper to read number of fast commit blocks from jbd2 superblock and also rename the JBD2_MIN_FC_BLKS to JBD2_DEFAULT_FAST_COMMIT_BLOCKS since this constant is just the default number of fast commit blocks to use in case number of fast commit blocks isn't set in jbd2 superblock. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201120202232.2240293-2-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: fix kernel-doc markupsMauro Carvalho Chehab2020-11-192-31/+34
| | | | | | | | | | | | | | | | Kernel-doc markup should use this format: identifier - description They should not have any type before that, as otherwise the parser won't do the right thing. Also, some identifiers have different names between their prototypes and the kernel-doc markup. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/72f5c6628f5f278d67625f60893ffbc2ca28d46e.1605521731.git.mchehab+huawei@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: fix up sparse warnings in checkpoint codeTheodore Ts'o2020-11-072-1/+5
| | | | | | | | | | | | Add missing __acquires() and __releases() annotations. Also, in an "this should never happen" WARN_ON check, if it *does* actually happen, we need to release j_state_lock since this function is always supposed to release that lock. Otherwise, things will quickly grind to a halt after the WARN_ON trips. Fixes: 96f1e0974575 ("jbd2: avoid long hold times of j_state_lock...") Cc: stable@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: don't start fast commit on aborted journalHarshad Shirwadkar2020-11-061-0/+2
| | | | | | | | Fast commit should not be started if the journal is aborted. Signed-off-by: Harshad Shirwadkar<harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-22-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: don't read journal->j_commit_sequence without taking a lockHarshad Shirwadkar2020-11-061-2/+4
| | | | | | | | Take journal state lock before reading journal->j_commit_sequence. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-13-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: don't touch buffer state until it is filledHarshad Shirwadkar2020-11-061-4/+0
| | | | | | | | | | | Fast commit buffers should be filled in before toucing their state. Remove code that sets buffer state as dirty before the buffer is passed to the file system. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-12-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: add todo for a fast commit performance optimizationHarshad Shirwadkar2020-11-061-0/+9
| | | | | | | | | | | Fast commit performance can be optimized if commit thread doesn't wait for ongoing fast commits to complete until the transaction enters T_FLUSH state. Document this optimization. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-11-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: don't pass tid to jbd2_fc_end_commit_fallback()Harshad Shirwadkar2020-11-061-3/+9
| | | | | | | | | | In jbd2_fc_end_commit_fallback(), we know which tid to commit. There's no need for caller to pass it. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-10-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: don't use state lock during commit pathHarshad Shirwadkar2020-11-061-6/+0
| | | | | | | | | | | | | Variables journal->j_fc_off, journal->j_fc_wbuf are accessed during commit path. Since today we allow only one process to perform a fast commit, there is no need take state lock before accessing these variables. This patch removes these locks and adds comments to describe this. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-9-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: clean up the JBD2 API that initializes fast commitsHarshad Shirwadkar2020-11-061-40/+56
| | | | | | | | | | | | | | | | | | | | | | | | This patch removes jbd2_fc_init() API and its related functions to simplify enabling fast commits. With this change, the number of fast commit blocks to use is solely determined by the JBD2 layer. So, we move the default value for minimum number of fast commit blocks from ext4/fast_commit.h to include/linux/jbd2.h. However, whether or not to use fast commits is determined by the file system. The file system just sets the fast commit feature using jbd2_journal_set_features(). JBD2 layer then determines how many blocks to use for fast commits (based on the value found in the JBD2 superblock). Note that the JBD2 feature flag of fast commits is just an indication that there are fast commit blocks present on disk. It doesn't tell JBD2 layer about the intent of the file system of whether to it wants to use fast commit or not. That's why, we blindly clear the fast commit flag in journal_reset() after the recovery is done. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-7-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: rename j_maxlen to j_total_len and add jbd2_journal_max_txn_bufsHarshad Shirwadkar2020-11-063-10/+10
| | | | | | | | | | | | | | | The on-disk superblock field sb->s_maxlen represents the total size of the journal including the fast commit area and is no more the max number of blocks available for a transaction. The maximum number of blocks available to a transaction is reduced by the number of fast commit blocks. So, this patch renames j_maxlen to j_total_len to better represent its intent. Also, it adds a function to calculate max number of bufs available for a transaction. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-6-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: fast commit recovery pathHarshad Shirwadkar2020-10-211-4/+53
| | | | | | | | This patch adds fast commit recovery support in JBD2. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201015203802.3597742-7-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: add fast commit machineryHarshad Shirwadkar2020-10-212-1/+233
| | | | | | | | | This functions adds necessary APIs needed in JBD2 layer for fast commits. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201015203802.3597742-5-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4 / jbd2: add fast commit initializationHarshad Shirwadkar2020-10-211-5/+48
| | | | | | | | | | | | | This patch adds fast commit area trackers in the journal_t structure. These are initialized via the jbd2_fc_init() routine that this patch adds. This patch also adds ext4/fast_commit.c and ext4/fast_commit.h files for fast commit code that will be added in subsequent patches in this series. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201015203802.3597742-4-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: avoid transaction reuse after reformattingchangfengnan2020-10-181-12/+66
| | | | | | | | | | | | | | | | | | When ext4 is formatted with lazy_journal_init=1 and transactions from the previous filesystem are still on disk, it is possible that they are considered during a recovery after a crash. Because the checksum seed has changed, the CRC check will fail, and the journal recovery fails with checksum error although the journal is otherwise perfectly valid. Fix the problem by checking commit block time stamps to determine whether the data in the journal block is just stale or whether it is indeed corrupt. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Fengnan Chang <changfengnan@hikvision.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201012164900.20197-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2, ext4, ocfs2: introduce/use journal callbacks ↵Mauricio Faria de Oliveira2020-10-181-12/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | j_submit|finish_inode_data_buffers() Introduce journal callbacks to allow different behaviors for an inode in journal_submit|finish_inode_data_buffers(). The existing users of the current behavior (ext4, ocfs2) are adapted to use the previously exported functions that implement the current behavior. Users are callers of jbd2_journal_inode_ranged_write|wait(), which adds the inode to the transaction's inode list with the JI_WRITE|WAIT_DATA flags. Only ext4 and ocfs2 in-tree. Both CONFIG_EXT4_FS and CONFIG_OCSFS2_FS select CONFIG_JBD2, which builds fs/jbd2/commit.c and journal.c that define and export the functions, so we can call directly in ext4/ocfs2. Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201006004841.600488-3-mfo@canonical.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: introduce/export functions jbd2_journal_submit|finish_inode_data_buffers()Mauricio Faria de Oliveira2020-10-182-20/+18
| | | | | | | | | | | | | | Export functions that implement the current behavior done for an inode in journal_submit|finish_inode_data_buffers(). No functional change. Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201006004841.600488-2-mfo@canonical.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: clean up checksum verification in do_one_pass()Shijie Luo2020-08-191-34/+12
| | | | | | | | | | | | | Remove the unnecessary chksum_err and checksum_seen variables as well as some redundant code to make the function easier to understand. [ With changes suggested by jack@ and tytso@ ] Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200819122955.33526-1-luoshijie1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: fix incorrect code styleXianting Tian2020-08-071-6/+6
| | | | | | | | | Remove unnecessary blank. Signed-off-by: Xianting Tian <xianting_tian@126.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/1595077057-8048-1-git-send-email-xianting_tian@126.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: remove unused parameter in jbd2_journal_try_to_free_buffers()zhangyi (F)2020-08-071-6/+1
| | | | | | | | | | Parameter gfp_mask in jbd2_journal_try_to_free_buffers() is no longer used after commit <536fc240e7147> ("jbd2: clean up jbd2_journal_try_to_free_buffers()"), so just remove it. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20200620025427.1756360-6-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* jbd2: abort journal if free a async write error metadata bufferzhangyi (F)2020-08-071-0/+16
| | | | | | | | | | | | | If we free a metadata buffer which has been failed to async write out in the background, the jbd2 checkpoint procedure will not detect this failure in jbd2_log_do_checkpoint(), so it may lead to filesystem inconsistency after cleanup journal tail. This patch abort the journal if free a buffer has write_io_error flag to prevent potential further inconsistency. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20200620025427.1756360-5-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>