summaryrefslogtreecommitdiffstats
path: root/fs/jbd2/transaction.c
Commit message (Collapse)AuthorAgeFilesLines
* jbd2: invalidate handle if jbd2_journal_restart() failsTheodore Ts'o2013-07-011-31/+43
| | | | | | | | | | | | | | | | | | | | If jbd2_journal_restart() fails the handle will have been disconnected from the current transaction. In this situation, the handle must not be used for for any jbd2 function other than jbd2_journal_stop(). Enforce this with by treating a handle which has a NULL transaction pointer as an aborted handle, and issue a kernel warning if jbd2_journal_extent(), jbd2_journal_get_write_access(), jbd2_journal_dirty_metadata(), etc. is called with an invalid handle. This commit also fixes a bug where jbd2_journal_stop() would trip over a kernel jbd2 assertion check when trying to free an invalid handle. Also move the responsibility of setting current->journal_info to start_this_handle(), simplifying the three users of this function. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Younger Liu <younger.liu@huawei.com> Cc: Jan Kara <jack@suse.cz>
* jbd2: fix theoretical race in jbd2__journal_restartTheodore Ts'o2013-07-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | Once we decrement transaction->t_updates, if this is the last handle holding the transaction from closing, and once we release the t_handle_lock spinlock, it's possible for the transaction to commit and be released. In practice with normal kernels, this probably won't happen, since the commit happens in a separate kernel thread and it's unlikely this could all happen within the space of a few CPU cycles. On the other hand, with a real-time kernel, this could potentially happen, so save the tid found in transaction->t_tid before we release t_handle_lock. It would require an insane configuration, such as one where the jbd2 thread was set to a very high real-time priority, perhaps because a high priority real-time thread is trying to read or write to a file system. But some people who use real-time kernels have been known to do insane things, including controlling laser-wielding industrial robots. :-) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
* jbd2: optimize jbd2_journal_force_commitDmitry Monakhov2013-06-121-23/+0
| | | | | | | | | | | Current implementation of jbd2_journal_force_commit() is suboptimal because result in empty and useless commits. But callers just want to force and wait any unfinished commits. We already have jbd2_journal_force_commit_nested() which does exactly what we want, except we are guaranteed that we do not hold journal transaction open. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: transaction reservation supportJan Kara2013-06-041-88/+240
| | | | | | | | | | | | | | | | In some cases we cannot start a transaction because of locking constraints and passing started transaction into those places is not handy either because we could block transaction commit for too long. Transaction reservation is designed to solve these issues. It reserves a handle with given number of credits in the journal and the handle can be later attached to the running transaction without blocking on commit or checkpointing. Reserved handles do not block transaction commit in any way, they only reduce maximum size of the running transaction (because we have to always be prepared to accomodate request for attaching reserved handle). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: fix race in t_outstanding_credits update in jbd2_journal_extend()Jan Kara2013-06-041-2/+4
| | | | | | | | | | | | jbd2_journal_extend() first checked whether transaction can accept extending handle with more credits and then added credits to t_outstanding_credits. This can race with start_this_handle() adding another handle to a transaction and thus overbooking a transaction. Make jbd2_journal_extend() use atomic_add_return() to close the race. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: cleanup needed free block estimates when starting a transactionJan Kara2013-06-041-4/+5
| | | | | | | | | | | | | | | | | | | __jbd2_log_space_left() and jbd_space_needed() were kind of odd. jbd_space_needed() accounted also credits needed for currently committing transaction while it didn't account for credits needed for control blocks. __jbd2_log_space_left() then accounted for control blocks as a fraction of free space. Since results of these two functions are always only compared against each other, this works correct but is somewhat strange. Move the estimates so that jbd_space_needed() returns number of blocks needed for a transaction including control blocks and __jbd2_log_space_left() returns free space in the journal (with the committing transaction already subtracted). Rename functions to jbd2_log_space_left() and jbd2_space_needed() while we are changing them. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: remove outdated commentJan Kara2013-06-041-10/+0
| | | | | | | | | The comment about credit estimates isn't true anymore. We do what the comment describes now. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: refine waiting for shadow buffersJan Kara2013-06-041-25/+19
| | | | | | | | | | | | | | | | | | | | | | | Currently when we add a buffer to a transaction, we wait until the buffer is removed from BJ_Shadow list (so that we prevent any changes to the buffer that is just written to the journal). This can take unnecessarily long as a lot happens between the time the buffer is submitted to the journal and the time when we remove the buffer from BJ_Shadow list. (e.g. We wait for all data buffers in the transaction, we issue a cache flush, etc.) Also this creates a dependency of do_get_write_access() on transaction commit (namely waiting for data IO to complete) which we want to avoid when implementing transaction reservation. So we modify commit code to set new BH_Shadow flag when temporary shadowing buffer is created and we clear that flag once IO on that buffer is complete. This allows do_get_write_access() to wait only for BH_Shadow bit and thus removes the dependency on data IO completion. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: remove journal_head from descriptor buffersJan Kara2013-06-041-6/+0
| | | | | | | | | Similarly as for metadata buffers, also log descriptor buffers don't really need the journal head. So strip it and remove BJ_LogCtl list. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: don't create journal_head for temporary journal buffersJan Kara2013-06-041-10/+4
| | | | | | | | | | | | | | | When writing metadata to the journal, we create temporary buffer heads for that task. We also attach journal heads to these buffer heads but the only purpose of the journal heads is to keep buffers linked in transaction's BJ_IO list. We remove the need for journal heads by reusing buffer_head's b_assoc_buffers list for that purpose. Also since BJ_IO list is just a temporary list for transaction commit, we use a private list in jbd2_journal_commit_transaction() for that thus removing BJ_IO list from transaction completely. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: change jbd2_journal_invalidatepage to accept lengthLukas Czerner2013-05-211-7/+17
| | | | | | | | | | | | | invalidatepage now accepts range to invalidate and there are two file system using jbd2 also implementing punch hole feature which can benefit from this. We need to implement the same thing for jbd2 layer in order to allow those file system take benefit of this functionality. This commit adds length argument to the jbd2_journal_invalidatepage() and updates all instances in ext4 and ocfs2. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
* jbd2: trace when lock_buffer in do_get_write_access takes a long timeTheodore Ts'o2013-04-211-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | While investigating interactivity problems it was clear that processes sometimes stall for long periods of times if an attempt is made to lock a buffer which is undergoing writeback. It would stall in a trace looking something like [<ffffffff811a39de>] __lock_buffer+0x2e/0x30 [<ffffffff8123a60f>] do_get_write_access+0x43f/0x4b0 [<ffffffff8123a7cb>] jbd2_journal_get_write_access+0x2b/0x50 [<ffffffff81220f79>] __ext4_journal_get_write_access+0x39/0x80 [<ffffffff811f3198>] ext4_reserve_inode_write+0x78/0xa0 [<ffffffff811f3209>] ext4_mark_inode_dirty+0x49/0x220 [<ffffffff811f57d1>] ext4_dirty_inode+0x41/0x60 [<ffffffff8119ac3e>] __mark_inode_dirty+0x4e/0x2d0 [<ffffffff8118b9b9>] update_time+0x79/0xc0 [<ffffffff8118ba98>] file_update_time+0x98/0x100 [<ffffffff81110ffc>] __generic_file_aio_write+0x17c/0x3b0 [<ffffffff811112aa>] generic_file_aio_write+0x7a/0xf0 [<ffffffff811ea853>] ext4_file_write+0x83/0xd0 [<ffffffff81172b23>] do_sync_write+0xa3/0xe0 [<ffffffff811731ae>] vfs_write+0xae/0x180 [<ffffffff8117361d>] sys_write+0x4d/0x90 [<ffffffff8159d62d>] system_call_fastpath+0x1a/0x1f [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memsetZheng Liu2013-04-191-1/+0
| | | | | | | | | The jbd2_alloc_handle() function is only called by new_handle(). So this commit uses kmem_cache_zalloc() instead of kmem_cache_alloc()/memset(). Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: fix use after free in jbd2_journal_dirty_metadata()Jan Kara2013-03-111-5/+10
| | | | | | | | | | | | | | | | | | jbd2_journal_dirty_metadata() didn't get a reference to journal_head it was working with. This is OK in most of the cases since the journal head should be attached to a transaction but in rare occasions when we are journalling data, __ext4_journalled_writepage() can race with jbd2_journal_invalidatepage() stripping buffers from a page and thus journal head can be freed under hands of jbd2_journal_dirty_metadata(). Fix the problem by getting own journal head reference in jbd2_journal_dirty_metadata() (and also in jbd2_journal_set_triggers() which can possibly have the same issue). Reported-by: Zheng Liu <gnehzuil.liu@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
* jbd2: fix ERR_PTR dereference in jbd2__journal_startDmitry Monakhov2013-03-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | If start_this_handle() failed handle will be initialized to ERR_PTR() and can not be dereferenced. paging request at fffffffffffffff6 IP: [<ffffffff813c073f>] jbd2__journal_start+0x18f/0x290 PGD 200e067 PUD 200f067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod CPU 0 journal commit I/O error Pid: 2694, comm: fio Not tainted 3.8.0-rc3+ #79 /DQ67SW RIP: 0010:[<ffffffff813c073f>] [<ffffffff813c073f>] jbd2__journal_start+0x18f/0x290 RSP: 0018:ffff880233b8ba58 EFLAGS: 00010292 RAX: 00000000ffffffe2 RBX: ffffffffffffffe2 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff82128f48 RBP: ffff880233b8ba98 R08: 0000000000000000 R09: ffff88021440a6e0 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: add tracepoints which provide per-handle statistics Theodore Ts'o2013-02-081-2/+26
| | | | | | | | | | Handles which stay open a long time are problematic when it comes time to close down a transaction so it can be committed. These tracepoints will help us determine which ones are the problematic ones, and to validate whether changes makes things better or worse. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: track request delay statisticsTheodore Ts'o2013-02-061-0/+1
| | | | | | | | | | | | Track the delay between when we first request that the commit begin and when it actually begins, so we can see how much of a gap exists. In theory, this should just be the remaining scheduling quantuum of the thread which requested the commit (assuming it was not a synchronous operation which triggered the commit request) plus scheduling overhead; however, it's possible that real time processes might get in the way of letting the kjournald thread from executing. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* Merge tag 'ext4_for_linus' of ↵Linus Torvalds2013-01-021-14/+16
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bug fixes from Ted Ts'o: "Various bug fixes for ext4. Perhaps the most serious bug fixed is one which could cause file system corruptions when performing file punch operations." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: avoid hang when mounting non-journal filesystems with orphan list ext4: lock i_mutex when truncating orphan inodes ext4: do not try to write superblock on ro remount w/o journal ext4: include journal blocks in df overhead calcs ext4: remove unaligned AIO warning printk ext4: fix an incorrect comment about i_mutex ext4: fix deadlock in journal_unmap_buffer() ext4: split off ext4_journalled_invalidatepage() jbd2: fix assertion failure in jbd2_journal_flush() ext4: check dioread_nolock on remount ext4: fix extent tree corruption caused by hole punch
| * ext4: fix deadlock in journal_unmap_buffer()Jan Kara2012-12-251-13/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We cannot wait for transaction commit in journal_unmap_buffer() because we hold page lock which ranks below transaction start. We solve the issue by bailing out of journal_unmap_buffer() and jbd2_journal_invalidatepage() with -EBUSY. Caller is then responsible for waiting for transaction commit to finish and try invalidation again. Since the issue can happen only for page stradding i_size, it is simple enough to manually call jbd2_journal_invalidatepage() for such page from ext4_setattr(), check the return value and wait if necessary. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * jbd2: fix assertion failure in jbd2_journal_flush()Jan Kara2012-12-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following race is possible between start_this_handle() and someone calling jbd2_journal_flush(). Process A Process B start_this_handle(). if (journal->j_barrier_count) # false if (!journal->j_running_transaction) { #true read_unlock(&journal->j_state_lock); jbd2_journal_lock_updates() jbd2_journal_flush() write_lock(&journal->j_state_lock); if (journal->j_running_transaction) { # false ... wait for committing trans ... write_unlock(&journal->j_state_lock); ... write_lock(&journal->j_state_lock); if (!journal->j_running_transaction) { # true jbd2_get_transaction(journal, new_transaction); write_unlock(&journal->j_state_lock); goto repeat; # eventually blocks on j_barrier_count > 0 ... J_ASSERT(!journal->j_running_transaction); # fails We fix the race by rechecking j_barrier_count after reacquiring j_state_lock in exclusive mode. Reported-by: yjwsignal@empal.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
* | Merge tag 'ext4_for_linus' of ↵Linus Torvalds2012-12-161-11/+0
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 update from Ted Ts'o: "There are two major features for this merge window. The first is inline data, which allows small files or directories to be stored in the in-inode extended attribute area. (This requires that the file system use inodes which are at least 256 bytes or larger; 128 byte inodes do not have any room for in-inode xattrs.) The second new feature is SEEK_HOLE/SEEK_DATA support. This is enabled by the extent status tree patches, and this infrastructure will be used to further optimize ext4 in the future. Beyond that, we have the usual collection of code cleanups and bug fixes." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (63 commits) ext4: zero out inline data using memset() instead of empty_zero_page ext4: ensure Inode flags consistency are checked at build time ext4: Remove CONFIG_EXT4_FS_XATTR ext4: remove unused variable from ext4_ext_in_cache() ext4: remove redundant initialization in ext4_fill_super() ext4: remove redundant code in ext4_alloc_inode() ext4: use sync_inode_metadata() when syncing inode metadata ext4: enable ext4 inline support ext4: let fallocate handle inline data correctly ext4: let ext4_truncate handle inline data correctly ext4: evict inline data out if we need to strore xattr in inode ext4: let fiemap work with inline data ext4: let ext4_rename handle inline dir ext4: let empty_dir handle inline dir ext4: let ext4_delete_entry() handle inline data ext4: make ext4_delete_entry generic ext4: let ext4_find_entry handle inline data ext4: create a new function search_dir ext4: let ext4_readdir handle inline data ext4: let add_dir_entry handle inline data properly ...
| * ext4: remove ext4_handle_release_buffer()Eric Sandeen2012-11-081-11/+0
| | | | | | | | | | | | | | | | | | | | | | ext4_handle_release_buffer() was intended to remove journal write access from a buffer, but it doesn't actually do anything at all other than add a BUFFER_TRACE point, but it's not reliably used for that either. Remove all the associated dead code. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
* | Fix misspellings of "whether" in comments.Adam Buchbinder2012-11-191-1/+1
|/ | | | | | | | "Whether" is misspelled in various comments across the tree; this fixes them. No code changes. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* jbd2: fix assertion failure in commit code due to lacking transaction creditsJan Kara2012-09-261-20/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | ext4 users of data=journal mode with blocksize < pagesize were occasionally hitting assertion failure in jbd2_journal_commit_transaction() checking whether the transaction has at least as many credits reserved as buffers attached. The core of the problem is that when a file gets truncated, buffers that still need checkpointing or that are attached to the committing transaction are left with buffer_mapped set. When this happens to buffers beyond i_size attached to a page stradding i_size, subsequent write extending the file will see these buffers and as they are mapped (but underlying blocks were freed) things go awry from here. The assertion failure just coincidentally (and in this case luckily as we would start corrupting filesystem) triggers due to journal_head not being properly cleaned up as well. We fix the problem by unmapping buffers if possible (in lots of cases we just need a buffer attached to a transaction as a place holder but it must not be written out anyway). And in one case, we just have to bite the bullet and wait for transaction commit to finish. CC: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Jan Kara <jack@suse.cz>
* jbd2: use kmem_cache_zalloc wrapper instead of flagWanlong Gao2012-06-011-2/+2
| | | | | | | Use kmem_cache_zalloc wrapper instead of flag __GFP_ZERO. Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* Merge tag 'ext4_for_linus' of ↵Linus Torvalds2012-03-281-9/+39
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates for 3.4 from Ted Ts'o: "Ext4 commits for 3.3 merge window; mostly cleanups and bug fixes The changes to export dirty_writeback_interval are from Artem's s_dirt cleanup patch series. The same is true of the change to remove the s_dirt helper functions which never got used by anyone in-tree. I've run these changes by Al Viro, and am carrying them so that Artem can more easily fix up the rest of the file systems during the next merge window. (Originally we had hopped to remove the use of s_dirt from ext4 during this merge window, but his patches had some bugs, so I ultimately ended dropping them from the ext4 tree.)" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (66 commits) vfs: remove unused superblock helpers mm: export dirty_writeback_interval ext4: remove useless s_dirt assignment ext4: write superblock only once on unmount ext4: do not mark superblock as dirty unnecessarily ext4: correct ext4_punch_hole return codes ext4: remove restrictive checks for EOFBLOCKS_FL ext4: always set then trimmed blocks count into len ext4: fix trimmed block count accunting ext4: fix start and len arguments handling in ext4_trim_fs() ext4: update s_free_{inodes,blocks}_count during online resize ext4: change some printk() calls to use ext4_msg() instead ext4: avoid output message interleaving in ext4_error_<foo>() ext4: remove trailing newlines from ext4_msg() and ext4_error() messages ext4: add no_printk argument validation, fix fallout ext4: remove redundant "EXT4-fs: " from uses of ext4_msg ext4: give more helpful error message in ext4_ext_rm_leaf() ext4: remove unused code from ext4_ext_map_blocks() ext4: rewrite punch hole to use ext4_ext_remove_space() jbd2: cleanup journal tail after transaction commit ...
| * jbd2: remove always true condition in __journal_try_to_free_buffer()Jan Kara2012-03-131-4/+2
| | | | | | | | | | | | | | | | | | The check b_jlist == BJ_None in __journal_try_to_free_buffer() is always true (__jbd2_journal_temp_unlink_buffer() also checks this in an assertion) so just remove it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * jbd2: declare __jbd2_journal_temp_unlink_buffer() staticJan Kara2012-03-131-2/+2
| | | | | | | | | | Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * jbd2: allocate transaction from separate slab cacheYongqiang Yang2012-02-201-3/+33
| | | | | | | | | | | | | | | | | | | | | | There is normally only a handful of these active at any one time, but putting them in a separate slab cache makes debugging memory corruption problems easier. Manish Katiyar also wanted this make it easier to test memory failure scenarios in the jbd2 layer. Cc: Manish Katiyar <mkatiyar@gmail.com> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_bufferEric Sandeen2012-02-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | journal_unmap_buffer()'s zap_buffer: code clears a lot of buffer head state ala discard_buffer(), but does not touch _Delay or _Unwritten as discard_buffer() does. This can be problematic in some areas of the ext4 code which assume that if they have found a buffer marked unwritten or delay, then it's a live one. Perhaps those spots should check whether it is mapped as well, but if jbd2 is going to tear down a buffer, let's really tear it down completely. Without this I get some fsx failures on sub-page-block filesystems up until v3.2, at which point 4e96b2dbbf1d7e81f22047a50f862555a6cb87cb and 189e868fa8fdca702eb9db9d8afc46b5cb9144c9 make the failures go away, because buried within that large change is some more flag clearing. I still think it's worth doing in jbd2, since ->invalidatepage leads here directly, and it's the right place to clear away these flags. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
* | jbd2: remove the second argument of k[un]map_atomic()Cong Wang2012-03-201-2/+2
|/ | | | Signed-off-by: Cong Wang <amwang@redhat.com>
* jbd2: fix hung processes in jbd2_journal_lock_updates()Jan Kara2012-01-041-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Toshiyuki Okajima found out that when running for ((i=0; i < 100000; i++)); do if ((i%2 == 0)); then chattr +j /mnt/file else chattr -j /mnt/file fi echo "0" >> /mnt/file done process sometimes hangs indefinitely in jbd2_journal_lock_updates(). Toshiyuki identified that the following race happens: jbd2_journal_lock_updates() |jbd2_journal_stop() ---------------------------------------+--------------------------------------- write_lock(&journal->j_state_lock) | . ++journal->j_barrier_count | . spin_lock(&tran->t_handle_lock) | . atomic_read(&tran->t_updates) //not 0 | | atomic_dec_and_test(&tran->t_updates) | // t_updates = 0 | wake_up(&journal->j_wait_updates) prepare_to_wait() | // no process is woken up. spin_unlock(&tran->t_handle_lock) | write_unlock(&journal->j_state_lock) | schedule() // never return | We fix the problem by first calling prepare_to_wait() and only after that checking t_updates in jbd2_journal_lock_updates(). Reported-and-analyzed-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: Unify log messages in jbd2 codeEryu Guan2011-11-011-2/+2
| | | | | | | | | Some jbd2 code prints out kernel messages with "JBD2: " prefix, at the same time other jbd2 code prints with "JBD: " prefix. Unify the prefix to "JBD2: ". Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: fix build when CONFIG_BUG is not enabledRandy Dunlap2011-10-271-2/+2
| | | | | | | | | | | | | | Fix build error when CONFIG_BUG is not enabled: fs/jbd2/transaction.c:1175:3: error: implicit declaration of function '__WARN' by changing __WARN() to WARN_ON(), as suggested by Arnaud Lacombe <lacombar@gmail.com>. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arnaud Lacombe <lacombar@gmail.com>
* jbd2: use gfp_t instead of intDan Carpenter2011-09-041-3/+3
| | | | | | | | | | This silences some Sparse warnings: fs/jbd2/transaction.c:135:69: warning: incorrect type in argument 2 (different base types) fs/jbd2/transaction.c:135:69: expected restricted gfp_t [usertype] flags fs/jbd2/transaction.c:135:69: got int [signed] gfp_mask Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: add debugging information to jbd2_journal_dirty_metadata()Theodore Ts'o2011-09-041-6/+52
| | | | | | | | | | | | | | | | | | Add debugging information in case jbd2_journal_dirty_metadata() is called with a buffer_head which didn't have jbd2_journal_get_write_access() called on it, or if the journal_head has the wrong transaction in it. In addition, return an error code. This won't change anything for ocfs2, which will BUG_ON() the non-zero exit code. For ext4, the caller of this function is ext4_handle_dirty_metadata(), and on seeing a non-zero return code, will call __ext4_journal_stop(), which will print the function and line number of the (buggy) calling function and abort the journal. This will allow us to recover instead of bug halting, which is better from a robustness and reliability point of view. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: Fix oops in jbd2_journal_remove_journal_head()Jan Kara2011-06-131-32/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | jbd2_journal_remove_journal_head() can oops when trying to access journal_head returned by bh2jh(). This is caused for example by the following race: TASK1 TASK2 jbd2_journal_commit_transaction() ... processing t_forget list __jbd2_journal_refile_buffer(jh); if (!jh->b_transaction) { jbd_unlock_bh_state(bh); jbd2_journal_try_to_free_buffers() jbd2_journal_grab_journal_head(bh) jbd_lock_bh_state(bh) __journal_try_to_free_buffer() jbd2_journal_put_journal_head(jh) jbd2_journal_remove_journal_head(bh); jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and buffer is not part of any transaction and thus frees journal_head before TASK1 gets to doing so. Note that even buffer_head can be released by try_to_free_buffers() after jbd2_journal_put_journal_head() which adds even larger opportunity for oops (but I didn't see this happen in reality). Fix the problem by making transactions hold their own journal_head reference (in b_jcount). That way we don't have to remove journal_head explicitely via jbd2_journal_remove_journal_head() and instead just remove journal_head when b_jcount drops to zero. The result of this is that [__]jbd2_journal_refile_buffer(), [__]jbd2_journal_unfile_buffer(), and __jdb2_journal_remove_checkpoint() can free journal_head which needs modification of a few callers. Also we have to be careful because once journal_head is removed, buffer_head might be freed as well. So we have to get our own buffer_head reference where it matters. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: Remove obsolete parameters in the comments for some jbd2 functionsTao Ma2011-06-121-2/+0
| | | | | | | | | | credits isn't a parameter for jbd2_journal_get_write_access and jbd2_journal_get_undo_access. So remove the corresponding comments. Acked-by: Jan Kara <jack@suse.cz> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: fix a potential leak of a journal_head on an error pathDing Dinghua2011-05-251-1/+1
| | | | | | | drop jh->b_jcount in error path Signed-off-by: Ding Dinghua <dingdinghua@nrchpc.ac.cn> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: Fix comment to match the code in jbd2__journal_start()Eryu Guan2011-05-241-1/+2
| | | | | | | | jbd2__journal_start() returns an ERR_PTR() value rather than NULL on failure. Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: fix sending of data flush on journal commitJan Kara2011-05-241-0/+7
| | | | | | | | | | | | | | | In data=ordered mode, it's theoretically possible (however rare) that an inode is filed to transaction's t_inode_list and a flusher thread writes all the data and inode is reclaimed before the transaction starts to commit. In such a case, we could erroneously omit sending a flush to file system device when it is different from the journal device (because data can still be in disk cache only). Fix the problem by setting a flag in a transaction when some inode is added to it and then send disk flush in the commit code when the flag is set. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: Fix the wrong calculation of t_max_wait in update_t_max_waitTao Ma2011-05-221-5/+5
| | | | | | | | | | | | | | | | | | | | t_max_wait is added in commit 8e85fb3f to indicate how long we were waiting for new transaction to start. In commit 6d0bf005, it is moved to another function named update_t_max_wait to avoid a build warning. But the wrong thing is that the original 'ts' is initialized in the start of function start_this_handle and we can calculate t_max_wait in the right way. while with this change, ts is initialized within the function and t_max_wait can never be calculated right. This patch moves the initialization of ts to the original beginning of start_this_handle and pass it to function update_t_max_wait so that it can be calculated right and the build warning is avoided also. Cc: Jan Kara <jack@suse.cz> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
* Fix common misspellingsLucas De Marchi2011-03-311-1/+1
| | | | | | Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
* jbd2: call __jbd2_log_start_commit with j_state_lock write lockedTheodore Ts'o2011-02-121-7/+14
| | | | | | | | | | | | | | | | | On an SMP ARM system running ext4, I've received a report that the first J_ASSERT in jbd2_journal_commit_transaction has been triggering: J_ASSERT(journal->j_running_transaction != NULL); While investigating possible causes for this problem, I noticed that __jbd2_log_start_commit() is getting called with j_state_lock only read-locked, in spite of the fact that it's possible for it might j_commit_request. Fix this by grabbing the necessary information so we can test to see if we need to start a new transaction before dropping the read lock, and then calling jbd2_log_start_commit() which will grab the write lock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* Merge branch 'for-next' of ↵Linus Torvalds2011-01-131-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) Documentation/trace/events.txt: Remove obsolete sched_signal_send. writeback: fix global_dirty_limits comment runtime -> real-time ppc: fix comment typo singal -> signal drivers: fix comment typo diable -> disable. m68k: fix comment typo diable -> disable. wireless: comment typo fix diable -> disable. media: comment typo fix diable -> disable. remove doc for obsolete dynamic-printk kernel-parameter remove extraneous 'is' from Documentation/iostats.txt Fix spelling milisec -> ms in snd_ps3 module parameter description Fix spelling mistakes in comments Revert conflicting V4L changes i7core_edac: fix typos in comments mm/rmap.c: fix comment sound, ca0106: Fix assignment to 'channel'. hrtimer: fix a typo in comment init/Kconfig: fix typo anon_inodes: fix wrong function name in comment fix comment typos concerning "consistent" poll: fix a typo in comment ... Fix up trivial conflicts in: - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c) - fs/ext4/ext4.h Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
| * fix comment typos concerning "consistent"Uwe Kleine-König2010-12-101-1/+1
| | | | | | | | | | Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | jbd2: remove unnecessary goto statementTheodore Ts'o2010-12-181-2/+0
| | | | | | | | | | | | | | This is a port to jbd2 of a patch which Namhyung Kim <namhyung@gmail.com> originally made to fs/jbd. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* | jbd2: use offset_in_page() instead of manual calculationTheodore Ts'o2010-12-181-1/+1
| | | | | | | | | | | | | | This is a port to jbd2 of a patch which Namhyung Kim <namhyung@gmail.com> originally made to fs/jbd. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* | jbd2: Fix a debug message in do_get_write_access()Theodore Ts'o2010-12-181-1/+1
|/ | | | | | | | | 'buffer_head' should be 'journal_head' This is a port of a patch which Namhyung Kim <namhyung@gmail.com> made to fs/jbd to jbd2. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: Add sanity check for attempts to start handle during umountTheodore Ts'o2010-10-271-0/+1
| | | | | | | | | An attempt to modify the file system during the call to jbd2_destroy_journal() can lead to a system lockup. So add some checking to make it much more obvious when this happens to and to determine where the offending code is located. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>