summaryrefslogtreecommitdiffstats
path: root/fs/btrfs/tree-defrag.c
Commit message (Collapse)AuthorAgeFilesLines
* Btrfs: nuke fs wide allocation mutex V2Josef Bacik2008-10-291-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the giant fs_info->alloc_mutex and replaces it with a bunch of little locks. There is now a pinned_mutex, which is used when messing with the pinned_extents extent io tree, and the extent_ins_mutex which is used with the pending_del and extent_ins extent io trees. The locking for the extent tree stuff was inspired by a patch that Yan Zheng wrote to fix a race condition, I cleaned it up some and changed the locking around a little bit, but the idea remains the same. Basically instead of holding the extent_ins_mutex throughout the processing of an extent on the extent_ins or pending_del trees, we just hold it while we're searching and when we clear the bits on those trees, and lock the extent for the duration of the operations on the extent. Also to keep from getting hung up waiting to lock an extent, I've added a try_lock_extent so if we cannot lock the extent, move on to the next one in the tree and we'll come back to that one. I have tested this heavily and it does not appear to break anything. This has to be applied on top of my find_free_extent redo patch. I tested this patch on top of Yan's space reblancing code and it worked fine. The only thing that has changed since the last version is I pulled out all my debugging stuff, apparently I forgot to run guilt refresh before I sent the last patch out. Thank you, Signed-off-by: Josef Bacik <jbacik@redhat.com>
* Btrfs: add and improve commentsChris Mason2008-09-291-0/+4
| | | | | | | | | | | This improves the comments at the top of many functions. It didn't dive into the guts of functions because I was trying to avoid merging problems with the new allocator and back reference work. extent-tree.c and volumes.c were both skipped, and there is definitely more work todo in cleaning and commenting the code. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add a write ahead tree log to optimize synchronous operationsChris Mason2008-09-251-2/+2
| | | | | | | | | | | File syncs and directory syncs are optimized by copying their items into a special (copy-on-write) log tree. There is one log tree per subvolume and the btrfs super block points to a tree of log tree roots. After a crash, items are copied out of the log tree and back into the subvolume. See tree-log.c for all the details. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Online btree defragmentation fixesChris Mason2008-09-251-17/+19
| | | | | | | | | | The btree defragger wasn't making forward progress because the new key wasn't being saved by the btrfs_search_forward function. This also disables the automatic btree defrag, it wasn't scaling well to huge filesystems. The auto-defrag needs to be done differently. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add a per-inode csum mutex to avoid races creating csum itemsChris Mason2008-09-251-0/+7
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add btree locking to the tree defragmentation codeChris Mason2008-09-251-192/+47
| | | | | | | The online btree defragger is simplified and rewritten to use standard btree searches instead of a walk up / down mechanism. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Start btree concurrency work.Chris Mason2008-09-251-0/+2
| | | | | | | | | | | | | | | The allocation trees and the chunk trees are serialized via their own dedicated mutexes. This means allocation location is still not very fine grained. The main FS btree is protected by locks on each block in the btree. Locks are taken top / down, and as processing finishes on a given level of the tree, the lock is released after locking the lower level. The end result of a search is now a path where only the lowest level is locked. Releasing or freeing the path drops any locks held. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Allocator fix variety packChris Mason2008-09-251-0/+17
| | | | | | | | | | | | | | * Force chunk allocation when find_free_extent has to do a full scan * Record the max key at the start of defrag so it doesn't run forever * Block groups might not be contiguous, make a forward search for the next block group in extent-tree.c * Get rid of extra checks for total fs size * Fix relocate_one_reference to avoid relocating the same file data block twice when referenced by an older transaction * Use the open device count when allocating chunks so that we don't try to allocate from devices that don't exist Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Handle write errors on raid1 and raid10Chris Mason2008-09-251-3/+4
| | | | | | | | | | | | When duplicate copies exist, writes are allowed to fail to one of those copies. This changeset includes a few changes that allow the FS to continue even when some IOs fail. It also adds verification of the parent generation number for btree blocks. This generation is stored in the pointer to a block, and it ensures that missed writes to are detected. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Pass down the expected generation number when reading tree blocksChris Mason2008-09-251-3/+6
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Verify checksums on tree blocks found without read_tree_blockChris Mason2008-09-251-0/+1
| | | | | | | | | | | | | | | | | Checksums were only verified by btrfs_read_tree_block, which meant the functions to probe the page cache for blocks were not validating checksums. Normally this is fine because the buffers will only be in cache if they have already been validated. But, there is a window while the buffer is being read from disk where it could be up to date in the cache but not yet verified. This patch makes sure all buffers go through checksum verification before they are used. This is safer, and it prevents modification of buffers before they go through the csum code. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Disable tree defrag in SSD modeChris Mason2008-09-251-0/+3
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Leave on the tree defragger in mount -o ssd, it still helps thereChris Mason2008-09-251-3/+0
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add mount -o ssd, which includes optimizations for seek free storageChris Mason2008-09-251-0/+3
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add back pointers from extents to the btree or file referencing themChris Mason2008-09-251-0/+2
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Optimize allocations as we need to mix data and metadata into one groupChris Mason2008-09-251-1/+5
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Make defrag check nodes against the progress key to prevent repeating ↵Chris Mason2008-09-251-1/+2
| | | | | | work Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Tune the automatic defrag codeChris Mason2008-09-251-1/+0
| | | | | | | | | | | 1) Forced defrag wasn't working properly (btrfsctl -d) because some cache only checks were incorrect. 2) Defrag only the leaves unless in forced defrag mode. 3) Don't use complex logic to figure out if a leaf is needs defrag Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Defrag only leaves, and only when the parent node has a single objectidChris Mason2008-09-251-31/+5
| | | | | | | This allows us to defrag huge directories, but skip the expensive defrag case in more common usage, where it does not help as much. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Defrag: only walk into nodes with the defrag bit setChris Mason2008-09-251-4/+4
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Large block related defrag optimizationsChris Mason2008-09-251-1/+22
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Breakout BTRFS_SETGET_FUNCS into a separate C file, the inlines were too big.Chris Mason2008-09-251-1/+1
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add back the online defragging codeChris Mason2008-09-251-8/+6
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Allow tree blocks larger than the page sizeChris Mason2008-09-251-7/+11
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Create extent_buffer interface for large blocksizesChris Mason2008-09-251-26/+32
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Add support for defragging files via btrfsctl -d. Avoid OOM on extent treeChris Mason2007-09-101-0/+3
| | | | | | defrag. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add BH_Defrag to mark buffers that are in need of defraggingChris Mason2007-08-101-1/+2
| | | | | | | | This allows the tree walking code to defrag only the newly allocated buffers, it seems to be a good balance between perfect defragging and the performance hit of repeatedly reallocating blocks. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Btree defrag on the extent-mapping tree as wellChris Mason2007-08-101-7/+29
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Further reduce the concurrency penalty of defrag and drop_snapshotChris Mason2007-08-081-5/+2
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add run time btree defrag, and an ioctl to force btree defragChris Mason2007-08-071-0/+222
This adds two types of btree defrag, a run time form that tries to defrag recently allocated blocks in the btree when they are still in ram, and an ioctl that forces defrag of all btree blocks. File data blocks are not defragged yet, but this can make a huge difference in sequential btree reads. Signed-off-by: Chris Mason <chris.mason@oracle.com>