summaryrefslogtreecommitdiffstats
path: root/fs/btrfs/transaction.c
Commit message (Collapse)AuthorAgeFilesLines
* Btrfs: Use btrfs_update_inode_fallback when creating a snapshotJosef Bacik2012-10-251-1/+1
| | | | | | | | | On a really full file system I was getting ENOSPC back from btrfs_update_inode when trying to update the parent inode when creating a snapshot. Just use the fallback method so we can update the inode and not have to worry about having a delayed ref. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: cache extent state when writing out dirty metadata pagesJosef Bacik2012-10-091-5/+9
| | | | | | | | | | Everytime we write out dirty pages we search for an offset in the tree, convert the bits in the state, and then when we wait we search for the offset again and clear the bits. So for every dirty range in the io tree we are doing 4 rb searches, which is suboptimal. With this patch we are only doing 2 searches for every cycle (modulo weird things happening). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: fix orphan transaction on the freezed filesystemMiao Xie2012-10-091-15/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the following debug patch: static int btrfs_freeze(struct super_block *sb) { + struct btrfs_fs_info *fs_info = btrfs_sb(sb); + struct btrfs_transaction *trans; + + spin_lock(&fs_info->trans_lock); + trans = fs_info->running_transaction; + if (trans) { + printk("Transid %llu, use_count %d, num_writer %d\n", + trans->transid, atomic_read(&trans->use_count), + atomic_read(&trans->num_writers)); + } + spin_unlock(&fs_info->trans_lock); return 0; } I found there was a orphan transaction after the freeze operation was done. It is because the transaction may not be committed when the transaction handle end even though it is the last handle of the current transaction. This design avoid committing the transaction frequently, but also introduce the above problem. So I add btrfs_attach_transaction() which can catch the current transaction and commit it. If there is no transaction, it will return ENOENT, and do not anything. This function also can be used to instead of btrfs_join_transaction_freeze() because it don't increase the writer counter and don't start a new transaction, so it also can fix the deadlock between sync and freeze. Besides that, it is used to instead of btrfs_join_transaction() in transaction_kthread(), because if there is no transaction, the transaction kthread needn't anything. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
* Btrfs: add a type field for the transaction handleMiao Xie2012-10-091-23/+6
| | | | | | | | This patch add a type field into the transaction handle structure, in this way, we needn't implement various end-transaction functions and can make the code more simple and readable. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
* Btrfs: fix memory leak in start_transaction()Miao Xie2012-10-091-1/+3
| | | | | | | This patch fixes memory leak of the transaction handle which happened when starting transaction failed on a freezed fs. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
* Btrfs: fix the missing error information in create_pending_snapshot()Miao Xie2012-10-081-22/+35
| | | | | | | | | The macro btrfs_abort_transaction() can get the line number of the code where the problem happens, so we should invoke it in the place that the error occurs, or we will lose the line number. Reported-by: David Sterba <dave@jikos.cz> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
* Btrfs: fix race with freeze and free space inodesJosef Bacik2012-10-041-2/+11
| | | | | | | | | | | | | | So we start our freeze, somebody comes in and does an fsync() on a file where we have to commit a transaction for whatever reason, and we will deadlock because the freeze is waiting on FS_FREEZE people to stop writing to the file system, but the transaction is waiting for its free space inodes to be written out, which are in turn waiting on sb_start_intwrite while trying to write the file extents. To fix this we'll just skip the sb_start_intwrite() if we TRANS_JOIN_NOLOCK since we're being waited on by a transaction commit so we're safe wrt to freeze and this will keep us from deadlocking. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: kill obsolete arguments in btrfs_wait_ordered_extentsLiu Bo2012-10-041-1/+1
| | | | | | nocow_only is now an obsolete argument. Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
* Btrfs: fix race in sync and freeze againJosef Bacik2012-10-041-1/+11
| | | | | | | | | | | | I screwed this up, there is a race between checking if there is a running transaction and actually starting a transaction in sync where we could race with a freezer and get ourselves into trouble. To fix this we need to make a new join type to only do the try lock on the freeze stuff. If it fails we'll return EPERM and just return from sync. This fixes a hang Liu Bo reported when running xfstest 68 in a loop. Thanks, Reported-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: delay block group item insertionJosef Bacik2012-10-011-0/+10
| | | | | | | | | | | | | | | So we have lots of places where we try to preallocate chunks in order to make sure we have enough space as we make our allocations. This has historically meant that we're constantly tweaking when we should allocate a new chunk, and historically we have gotten this horribly wrong so we way over allocate either metadata or data. To try and keep this from happening we are going to make it so that the block group item insertion is done out of band at the end of a transaction. This will allow us to create chunks even if we are trying to make an allocation for the extent tree. With this patch my enospc tests run faster (didn't expect this) and more efficiently use the disk space (this is what I wanted). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: move the sb_end_intwrite until after the throttle logicJosef Bacik2012-10-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sage reported the following lockdep backtrace ===================================== [ BUG: bad unlock balance detected! ] 3.6.0-rc2-ceph-00171-gc7ed62d #1 Not tainted ------------------------------------- btrfs-cleaner/7607 is trying to release lock (sb_internal) at: [<ffffffffa00422ae>] btrfs_commit_transaction+0xa6e/0xb20 [btrfs] but there are no more locks to release! other info that might help us debug this: 1 lock held by btrfs-cleaner/7607: #0: (&fs_info->cleaner_mutex){+.+...}, at: [<ffffffffa003b405>] cleaner_kthread+0x95/0x120 [btrfs] stack backtrace: Pid: 7607, comm: btrfs-cleaner Not tainted 3.6.0-rc2-ceph-00171-gc7ed62d #1 Call Trace: [<ffffffffa00422ae>] ? btrfs_commit_transaction+0xa6e/0xb20 [btrfs] [<ffffffff810afa9e>] print_unlock_inbalance_bug+0xfe/0x110 [<ffffffff810b289e>] lock_release_non_nested+0x1ee/0x310 [<ffffffff81172f9b>] ? kmem_cache_free+0x7b/0x160 [<ffffffffa004106c>] ? put_transaction+0x8c/0x130 [btrfs] [<ffffffffa00422ae>] ? btrfs_commit_transaction+0xa6e/0xb20 [btrfs] [<ffffffff810b2a95>] lock_release+0xd5/0x220 [<ffffffff81173071>] ? kmem_cache_free+0x151/0x160 [<ffffffff8117d9ed>] __sb_end_write+0x7d/0x90 [<ffffffffa00422ae>] btrfs_commit_transaction+0xa6e/0xb20 [btrfs] [<ffffffff81079850>] ? __init_waitqueue_head+0x60/0x60 [<ffffffff81634c6b>] ? _raw_spin_unlock+0x2b/0x40 [<ffffffffa0042758>] __btrfs_end_transaction+0x368/0x3c0 [btrfs] [<ffffffffa0042808>] btrfs_end_transaction_throttle+0x18/0x20 [btrfs] [<ffffffffa00318f0>] btrfs_drop_snapshot+0x410/0x600 [btrfs] [<ffffffff8132babd>] ? do_raw_spin_unlock+0x5d/0xb0 [<ffffffffa00430ef>] btrfs_clean_old_snapshots+0xaf/0x150 [btrfs] [<ffffffffa003b405>] ? cleaner_kthread+0x95/0x120 [btrfs] [<ffffffffa003b419>] cleaner_kthread+0xa9/0x120 [btrfs] [<ffffffffa003b370>] ? btrfs_destroy_delayed_refs.isra.102+0x220/0x220 [btrfs] [<ffffffff810791ee>] kthread+0xae/0xc0 [<ffffffff810b379d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff8163e744>] kernel_thread_helper+0x4/0x10 [<ffffffff81635430>] ? retint_restore_args+0x13/0x13 [<ffffffff81079140>] ? flush_kthread_work+0x1a0/0x1a0 [<ffffffff8163e740>] ? gs_change+0x13/0x13 This is because the throttle stuff can commit the transaction, which expects to be the one stopping the intwrite stuff, but we've already done it in the __btrfs_end_transaction. Moving the sb_end_intewrite after this logic makes the lockdep go away. Thanks, Tested-by: Sage Weil <sage@inktank.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* Btrfs: fix corrupted metadata in the snapshotMiao Xie2012-10-011-8/+21
| | | | | | | | | | | | | | | | When we delete a inode, we will remove all the delayed items including delayed inode update, and then truncate all the relative metadata. If there is lots of metadata, we will end the current transaction, and start a new transaction to truncate the left metadata. In this way, we will leave a inode item that its link counter is > 0, and also may leave some directory index items in fs/file tree after the current transaction ends. In other words, the metadata in this fs/file tree is inconsistent. If we create a snapshot for this tree now, we will find a inode with corrupted metadata in the new snapshot, and we won't continue to drop the left metadata, because its link counter is not 0. We fix this problem by updating the inode item before the current transaction ends. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
* Btrfs: fix the snapshot that should not existMiao Xie2012-10-011-15/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The snapshot should be the image of the fs tree before it was created, so the metadata of the snapshot should not exist in the its tree. But now, we found the directory item and directory name index is in both the snapshot tree and the fs tree. It introduces some problems and makes the users feel strange: # mkfs.btrfs /dev/sda1 # mount /dev/sda1 /mnt # mkdir /mnt/1 # cd /mnt/1 # btrfs subvolume snapshot /mnt snap0 # ls -a /mnt/1/snap0/1 . .. [no other file/dir] # ll /mnt/1/snap0/ total 0 drwxr-xr-x 1 root root 10 Ju1 24 12:11 1 ^^^ There is no file/dir in it, but it's size is 10 # cd /mnt/1/snap0/1/snap0 [Enter a unexisted directory successfully...] There is nothing in the directory 1 in snap0, but btrfs told the length of this directory is 10. Beside that, we can enter an unexisted directory, it is very strange to the users. # btrfs subvolume snapshot /mnt/1/snap0 /mnt/snap1 # ll /mnt/1/snap0/1/ total 0 [None] # ll /mnt/snap1/1/ total 0 drwxr-xr-x 1 root root 0 Ju1 24 12:14 snap0 And the source of snap1 did have any directory in Directory 1, but snap1 have a snap0, it is different between the source and the snapshot. So I think we should insert directory item and directory name index and update the parent inode as the last step of snapshot creation, and do not leave the useless metadata in the file tree. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
* Btrfs: fix full backref problem when inserting shared block referenceMiao Xie2012-10-011-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we create several snapshots at the same time, the following BUG_ON() will be triggered. kernel BUG at fs/btrfs/extent-tree.c:6047! Steps to reproduce: # mkfs.btrfs <partition> # mount <partition> <mnt> # cd <mnt> # for ((i=0;i<2400;i++)); do touch long_name_to_make_tree_more_deep$i; done # for ((i=0; i<4; i++)) > do > mkdir $i > for ((j=0; j<200; j++)) > do > btrfs sub snap . $i/$j > done & > done The reason is: Before transaction commit, some operations changed the fs tree and new tree blocks were allocated because of COW. We used the implicit non-shared back reference for those newly allocated tree blocks because they were not shared by two or more trees. And then we created the first snapshot for the fs tree, according to the back reference rules, we also used implicit back refs for the child tree blocks of the root node of the fs tree, now those child nodes/leaves were shared by two trees. Then We didn't deal with the delayed references, and continued to change the fs tree(created the second snapshot and inserted the dir item of the new snapshot into the fs tree). According to the rules of the back reference, we added full back refs for those tree blocks whose parents have be shared by two trees. Now some newly allocated tree blocks had two types of the references. As we know, the delayed reference system handles these delayed references from back to front, and the full delayed reference is inserted after the implicit ones. So when we dealt with the back references of those newly allocated tree blocks, the full references was dealt with at first. And if the first reference is a shared back reference and the tree block that the reference points to is newly allocated, It would be considered as a tree block which is shared by two or more trees when it is allocated and should be a full back reference not a implicit one, the flag of its reference also should be set to FULL_BACKREF. But in fact, it was a non-shared tree block with a implicit reference at beginning, so it was not compulsory to set the flags to FULL_BACKREF. So BUG_ON was triggered. We have several methods to fix this bug: 1. deal with delayed references after the snapshot is created and before we change the source tree of the snapshot. This is the easiest and safest way. 2. modify the sort method of the delayed reference tree, make the full delayed references be inserted before the implicit ones. It is also very easy, but I don't know if it will introduce some problems or not. 3. modify select_delayed_ref() and make it select the implicit delayed reference at first. This way is not so good because it may wastes CPU time if we have lots of delayed references. 4. set the flags to FULL_BACKREF, this method is a little complex comparing with the 1st way. I chose the 1st way to fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
* Btrfs: fix error path in create_pending_snapshot()Miao Xie2012-10-011-23/+17
| | | | | | | | | | | | | This patch fixes the following problem: - If we failed to deal with the delayed dir items, we should abort transaction, just as its comment said. Fix it. - If root reference or root back reference insertion failed, we should abort transaction. Fix it. - Fix the double free problem of pending->inherit. - Do not restore the trans->rsv if we doesn't change it. - make the error path more clearly. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
* Btrfs: set journal_info in async trans commit workerSage Weil2012-10-011-0/+2
| | | | | | | We expect current->journal_info to point to the trans handle we are committing. Signed-off-by: Sage Weil <sage@inktank.com>
* Btrfs: pass lockdep rwsem metadata to async commit transactionSage Weil2012-10-011-0/+16
| | | | | | | | | | | The freeze rwsem is taken by sb_start_intwrite() and dropped during the commit_ or end_transaction(). In the async case, that happens in a worker thread. Tell lockdep the calling thread is releasing ownership of the rwsem and the async thread is picking it up. XFS plays the same trick in fs/xfs/xfs_aops.c. Signed-off-by: Sage Weil <sage@inktank.com>
* Merge branch 'for-linus' of ↵Linus Torvalds2012-08-291-1/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "I've split out the big send/receive update from my last pull request and now have just the fixes in my for-linus branch. The send/recv branch will wander over to linux-next shortly though. The largest patches in this pull are Josef's patches to fix DIO locking problems and his patch to fix a crash during balance. They are both well tested. The rest are smaller fixes that we've had queued. The last rc came out while I was hacking new and exciting ways to recover from a misplaced rm -rf on my dev box, so these missed rc3." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits) Btrfs: fix that repair code is spuriously executed for transid failures Btrfs: fix ordered extent leak when failing to start a transaction Btrfs: fix a dio write regression Btrfs: fix deadlock with freeze and sync V2 Btrfs: revert checksum error statistic which can cause a BUG() Btrfs: remove superblock writing after fatal error Btrfs: allow delayed refs to be merged Btrfs: fix enospc problems when deleting a subvol Btrfs: fix wrong mtime and ctime when creating snapshots Btrfs: fix race in run_clustered_refs Btrfs: don't run __tree_mod_log_free_eb on leaves Btrfs: increase the size of the free space cache Btrfs: barrier before waitqueue_active Btrfs: fix deadlock in wait_for_more_refs btrfs: fix second lock in btrfs_delete_delayed_items() Btrfs: don't allocate a seperate csums array for direct reads Btrfs: do not strdup non existent strings Btrfs: do not use missing devices when showing devname Btrfs: fix that error value is changed by mistake Btrfs: lock extents as we map them in DIO ...
| * Btrfs: fix wrong mtime and ctime when creating snapshotsMiao Xie2012-08-281-0/+1
| | | | | | | | | | | | | | | | When we created a new snapshot, the mtime and ctime of its parent directory were not updated. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * Btrfs: fix some endian bugs handling the root timesDan Carpenter2012-08-281-1/+1
| | | | | | | | | | | | | | "trans->transid" is cpu endian but we want to store the data as little endian. "item->ctime.nsec" is only 32 bits, not 64. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2012-08-011-0/+7
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull second vfs pile from Al Viro: "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the deadlock reproduced by xfstests 068), symlink and hardlink restriction patches, plus assorted cleanups and fixes. Note that another fsfreeze deadlock (emergency thaw one) is *not* dealt with - the series by Fernando conflicts a lot with Jan's, breaks userland ABI (FIFREEZE semantics gets changed) and trades the deadlock for massive vfsmount leak; this is going to be handled next cycle. There probably will be another pull request, but that stuff won't be in it." Fix up trivial conflicts due to unrelated changes next to each other in drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) delousing target_core_file a bit Documentation: Correct s_umount state for freeze_fs/unfreeze_fs fs: Remove old freezing mechanism ext2: Implement freezing btrfs: Convert to new freezing mechanism nilfs2: Convert to new freezing mechanism ntfs: Convert to new freezing mechanism fuse: Convert to new freezing mechanism gfs2: Convert to new freezing mechanism ocfs2: Convert to new freezing mechanism xfs: Convert to new freezing code ext4: Convert to new freezing mechanism fs: Protect write paths by sb_start_write - sb_end_write fs: Skip atime update on frozen filesystem fs: Add freezing handling to mnt_want_write() / mnt_drop_write() fs: Improve filesystem freezing handling switch the protection of percpu_counter list to spinlock nfsd: Push mnt_want_write() outside of i_mutex btrfs: Push mnt_want_write() outside of i_mutex fat: Push mnt_want_write() outside of i_mutex ...
| * btrfs: Convert to new freezing mechanismJan Kara2012-07-311-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We convert btrfs_file_aio_write() to use new freeze check. We also add proper freeze protection to btrfs_page_mkwrite(). We also add freeze protection to the transaction mechanism to avoid starting transactions on frozen filesystem. At minimum this is necessary to stop iput() of unlinked file to change frozen filesystem during truncation. Checks in cleaner_kthread() and transaction_kthread() can be safely removed since btrfs_freeze() will lock the mutexes and thus block the threads (and they shouldn't have anything to do anyway). CC: linux-btrfs@vger.kernel.org CC: Chris Mason <chris.mason@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'send-v2' of git://github.com/ablock84/linux-btrfs into for-linusChris Mason2012-07-251-0/+17
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the kernel portion of btrfs send/receive Conflicts: fs/btrfs/Makefile fs/btrfs/backref.h fs/btrfs/ctree.c fs/btrfs/ioctl.c fs/btrfs/ioctl.h Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * | Btrfs: introduce subvol uuids and timesAlexander Block2012-07-251-0/+17
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces uuids for subvolumes. Each subvolume has it's own uuid. In case it was snapshotted, it also contains parent_uuid. In case it was received, it also contains received_uuid. It also introduces subvolume ctime/otime/stime/rtime. The first two are comparable to the times found in inodes. otime is the origin/creation time and ctime is the change time. stime/rtime are only valid on received subvolumes. stime is the time of the subvolume when it was sent. rtime is the time of the subvolume when it was received. Additionally to the times, we have a transid for each time. They are updated at the same place as the times. btrfs receive uses stransid and rtransid to find out if a received subvolume changed in the meantime. If an older kernel mounts a filesystem with the extented fields, all fields become invalid. The next mount with a new kernel will detect this and reset the fields. Signed-off-by: Alexander Block <ablock84@googlemail.com> Reviewed-by: David Sterba <dave@jikos.cz> Reviewed-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
* | Merge branch 'qgroup' of git://git.jan-o-sch.net/btrfs-unstable into for-linusChris Mason2012-07-251-4/+55
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: fs/btrfs/ioctl.c fs/btrfs/ioctl.h fs/btrfs/transaction.c fs/btrfs/transaction.h Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * | Btrfs: add qgroup inheritanceArne Jansen2012-07-121-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When creating a subvolume or snapshot, it is necessary to initialize the qgroup account with a copy of some other (tracking) qgroup. This patch adds parameters to the ioctls to pass the information from which qgroup to inherit. Signed-off-by: Arne Jansen <sensille@gmx.net>
| * | Btrfs: hooks to reserve qgroup spaceArne Jansen2012-07-121-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Like block reserves, reserve a small piece of space on each transaction start and for delalloc. These are the hooks that can actually return EDQUOT to the user. The amount of space reserved is tracked in the transaction handle. Signed-off-by: Arne Jansen <sensille@gmx.net>
| * | Btrfs: hooks for qgroup to record delayed refsJan Schmidt2012-07-121-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | Hooks into qgroup code to record refs and into transaction commit. This is the main entry point for qgroup. Basically every change in extent backrefs got accounted to the appropriate qgroups. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * | Btrfs: call the qgroup accounting functionsJan Schmidt2012-07-121-0/+14
| | | | | | | | | | | | Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * | Btrfs: qgroup implementation and prototypesArne Jansen2012-07-121-0/+2
| | | | | | | | | | | | | | | Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * | Btrfs: check the root passed to btrfs_end_transactionArne Jansen2012-07-101-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | This patch only add a consistancy check to validate that the same root is passed to start_transaction and end_transaction. Subvolume quota depends on this. Signed-off-by: Arne Jansen <sensille@gmx.net>
| * | Btrfs: join tree mod log code with the code holding back delayed refsJan Schmidt2012-07-101-4/+0
| |/ | | | | | | | | | | | | | | | | | | | | | | We've got two mechanisms both required for reliable backref resolving (tree mod log and holding back delayed refs). You cannot make use of one without the other. So instead of requiring the user of this mechanism to setup both correctly, we join them into a single interface. Additionally, we stop inserting non-blockers into fs_info->tree_mod_seq_list as we did before, which was of no value. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
* | Btrfs: change how we indicate we're adding csumsJosef Bacik2012-07-231-14/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is weird logic I had to put in place to make sure that when we were adding csums that we'd used the delalloc block rsv instead of the global block rsv. Part of this meant that we had to free up our transaction reservation before we ran the delayed refs since csum deletion happens during the delayed ref work. The problem with this is that when we release a reservation we will add it to the global reserve if it is not full in order to keep us going along longer before we have to force a transaction commit. By releasing our reservation before we run delayed refs we don't get the opportunity to drain down the global reserve for the work we did, so we won't refill it as often. This isn't a problem per-se, it just results in us possibly committing transactions more and more often, and in rare cases could cause those WARN_ON()'s to pop in use_block_rsv because we ran out of space in our block rsv. This also helps us by holding onto space while the delayed refs run so we don't end up with as many people trying to do things at the same time, which again will help us not force commits or hit the use_block_rsv warnings. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
* | Btrfs: small naming cleanup in join_transaction()Dan Carpenter2012-07-231-2/+2
| | | | | | | | | | | | | | | | "root->fs_info" and "fs_info" are the same, but "fs_info" is prefered because it is shorter and that's what is used in the rest of the function. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
* | Btrfs: don't wait around for new log writers on an SSDChris Mason2012-07-231-1/+2
|/ | | | | | | Waiting on spindles improves performance, but ssds want all the IO as quickly as we can push it down. Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* Btrfs: abort the transaction if the commit failsJosef Bacik2012-06-141-2/+8
| | | | | | | | | If a transaction commit fails we don't abort it so we don't set an error on the file system. This patch fixes that by actually calling the abort stuff and then adding a check for a fs error in the transaction start stuff to make sure it is caught properly. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: wake up transaction waiters when aborting a transactionJosef Bacik2012-06-141-0/+4
| | | | | | | | | | | I was getting lots of hung tasks and a NULL pointer dereference because we are not cleaning up the transaction properly when it aborts. First we need to reset the running_transaction to NULL so we don't get a bad dereference for any start_transaction callers after this. Also we cannot rely on waitqueue_active() since it's just a list_empty(), so just call wake_up() directly since that will do the barrier for us and such. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into ↵Chris Mason2012-05-311-18/+37
|\ | | | | | | | | | | | | | | | | for-linus Conflicts: fs/btrfs/ulist.h Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: tree mod log sanity checks in join_transactionJan Schmidt2012-05-301-0/+18
| | | | | | | | | | | | | | | | | | | | | | When a fresh transaction begins, the tree mod log must be clean. Users of the tree modification log must ensure they never span across transaction boundaries. We reset the sequence to 0 in this safe situation to make absolutely sure overflow can't happen. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * Btrfs: fs_info variable for join_transactionJan Schmidt2012-05-301-18/+19
| | | | | | | | Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
* | Btrfs: read device stats on mount, write modified ones during commitStefan Behrens2012-05-301-0/+4
|/ | | | | | | | | | The device statistics are written into the device tree with each transaction commit. Only modified statistics are written. When a filesystem is mounted, the device statistics for each involved device are read from the device tree and used to initialize the counters. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
* btrfs: add missing unlocks to transaction abort pathsDavid Sterba2012-04-181-1/+5
| | | | | | | | Added in commit 49b25e0540904be0bf558b84475c69d72e4de66e ("btrfs: enhance transaction abort infrastructure") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* Btrfs: fix use-after-free in __btrfs_end_transactionDave Jones2012-04-121-4/+5
| | | | | | | | | | | | 49b25e0540904be0bf558b84475c69d72e4de66e introduced a use-after-free bug that caused spurious -EIO's to be returned. Do the check before we free the transaction. Cc: David Sterba <dsterba@suse.cz> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: show useful info in space reservation tracepointLiu Bo2012-03-291-2/+1
| | | | | | | | o For space info, the type of space info is useful for debug. o For transaction handle, its transid is useful. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: replace many BUG_ONs with proper error handlingJeff Mahoney2012-03-221-13/+23
| | | | | | | | | | | btrfs currently handles most errors with BUG_ON. This patch is a work-in- progress but aims to handle most errors other than internal logic errors and ENOMEM more gracefully. This iteration prevents most crashes but can run into lockups with the page lock on occasion when the timing "works out." Signed-off-by: Jeff Mahoney <jeffm@suse.com>
* btrfs: enhance transaction abort infrastructureJeff Mahoney2012-03-221-35/+155
| | | | Signed-off-by: Jeff Mahoney <jeffm@suse.com>
* btrfs: btrfs_drop_snapshot should return intJeff Mahoney2012-03-221-2/+5
| | | | | | | | | | Commit cb1b69f4 (Btrfs: forced readonly when btrfs_drop_snapshot() fails) made btrfs_drop_snapshot return void because there were no callers checking the return value. That is the wrong order to handle error propogation since the caller will have no idea that an error has occured and continue on as if nothing went wrong. Signed-off-by: Jeff Mahoney <jeffm@suse.com>
* btrfs: return void in functions without error conditionsJeff Mahoney2012-03-221-2/+1
| | | | Signed-off-by: Jeff Mahoney <jeffm@suse.com>
* Btrfs: fix compiler warnings on 32 bit systemsChris Mason2012-02-241-1/+2
| | | | | | | The enospc tracing code added some interesting uses of u64 pointer casts. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: improve error handling for btrfs_insert_dir_item callersChris Mason2012-02-231-6/+7
| | | | | | | This allows us to gracefully continue if we aren't able to insert directory items, both for normal files/dirs and snapshots. Signed-off-by: Chris Mason <chris.mason@oracle.com>