summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Btrfs: stop providing a bmap operation to avoid swapfile corruptionsChris Mason2009-01-211-6/+12
| | | | | | | | | | | | | | | | | | | | | Swapfiles use bmap to build a list of extents belonging to the file, and they assume these extents won't change over the life of the file. They also use resulting list to do IO directly to the block device. This causes problems for btrfs in a few ways: btrfs returns logical block numbers through bmap, and these are not suitable for IO. They might translate to different devices, raid etc. COW means that file block mappings are going to change frequently. Using swapfiles on btrfs will lead to corruption, so we're avoiding the problem for now by dropping bmap support entirely. A later commit will add fiemap support for people that really want to know how a file is laid out. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix tree logs parallel syncYan Zheng2009-01-216-210/+248
| | | | | | | | | | | | | | | | | | | | | | | | | | To improve performance, btrfs_sync_log merges tree log sync requests. But it wrongly merges sync requests for different tree logs. If multiple tree logs are synced at the same time, only one of them actually gets synced. This patch has following changes to fix the bug: Move most tree log related fields in btrfs_fs_info to btrfs_root. This allows merging sync requests separately for each tree log. Don't insert root item into the log root tree immediately after log tree is allocated. Root item for log tree is inserted when log tree get synced for the first time. This allows syncing the log root tree without first syncing all log trees. At tree-log sync, btrfs_sync_log first sync the log tree; then updates corresponding root item in the log root tree; sync the log root tree; then update the super block. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
* Btrfs: open_ctree() error handling can oops on fs_infoQinghuang Feng2009-01-211-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | a bug in open_ctree: struct btrfs_root *open_ctree(..) { .... if (!extent_root || !tree_root || !fs_info || !chunk_root || !dev_root || !csum_root) { err = -ENOMEM; goto fail; //When code flow goes to "fail", fs_info may be NULL or uninitialized. } .... fail: btrfs_close_devices(fs_info->fs_devices);// ! btrfs_mapping_tree_free(&fs_info->mapping_tree);// ! kfree(extent_root); kfree(tree_root); bdi_destroy(&fs_info->bdi);// ! ... ) Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix stop searching test in replace_one_extentYan Zheng2009-01-211-6/+7
| | | | | | | | | | | | | | replace_one_extent searches tree leaves for references to a given extent. It stops searching if it goes beyond the last possible position. The last possible position is computed by adding the starting offset of a found file extent to the full size of the extent. The code uses physical size of the extent as the full size. This is incorrect when compression is used. The fix is get the full size from ram_bytes field of file extent item. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
* Btrfs: change/remove typedefJan Engelhardt2009-01-211-8/+2
| | | | | | | | Change one typedef to a regular enum, and remove an unused one. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: remove duplicated #includeHuang Weiyi2009-01-211-1/+0
| | | | | | | | | Removed duplicated #include "compat.h"in fs/btrfs/extent-tree.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix infinite loop in btrfs_extent_post_opYan Zheng2009-01-211-1/+3
| | | | | | | | | | | | | | | btrfs_extent_post_op calls finish_current_insert and del_pending_extents. They both may enter infinite loops. finish_current_insert enters infinite loop if it only finds some backrefs to update. The fix is to check for pending backref updates before restarting the loop. The infinite loop in del_pending_extents is due to a the skipped variable not being properly reset before looping around. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
* Btrfs: fix locking issue in btrfs_remove_block_groupYan Zheng2009-01-211-1/+3
| | | | | | | | We should hold the block_group_cache_lock while modifying the block groups red-black tree. Thank you, Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
* Btrfs: simplify iteration codesQinghuang Feng2009-01-216-52/+19
| | | | | | | | Merge list_for_each* and list_entry to list_for_each_entry* Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: check return value for kthread_run() correctlyQinghuang Feng2009-01-211-2/+2
| | | | | | | | kthread_run() returns the kthread or ERR_PTR(-ENOMEM), not NULL. Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Remove extra KERN_INFO in the middle of a lineRoland Dreier2009-01-211-1/+1
| | | | | | | | | | | | | | | The "devid <xxx> transid <xxx>" printk in btrfs_scan_one_device() actually follows another printk that doesn't end in a newline (since the intention is for the two printks to make one line of output), so the KERN_INFO just ends up messing up the output: device label exp <6>devid 1 transid 9 /dev/sda5 Fix this by changing the extra KERN_INFO to KERN_CONT. Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: removed unused #include <version.h>'sHuang Weiyi2009-01-2111-11/+0
| | | | | | | | Removed unused #include <version.h>'s in btrfs Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: cleanup xattr codeJosef Bacik2009-01-211-2/+12
| | | | | | | | | | Andrew's review of the xattr code revealed some minor issues that this patch addresses. Just an error return fix, got rid of a useless statement and commented one of the trickier parts of __btrfs_getxattr. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: cleanup fs/btrfs/super.c::btrfs_control_ioctl()Wang Cong2009-01-211-2/+3
| | | | | | | | | - Remove the unused local variable 'len'; - Check return value of kmalloc(). Signed-off-by: Wang Cong <wangcong@zeuux.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix ioctl arg size (userland incompatible change!)Chris Mason2009-01-162-7/+10
| | | | | | | | | | | | | | The structure used to send device in btrfs ioctl calls was not properly aligned, and so 32 bit ioctls would not work properly on 64 bit kernels. We could fix this with compat ioctls, but we're just one byte away and it doesn't make sense at this stage to carry about the compat ioctls forever at this stage in the project. This patch brings the ioctl arg up to an evenly aligned 4k. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Clear the device->running_pending flag before bailing on congestionChris Mason2009-01-161-0/+1
| | | | | | | | | | | | | | | | | | Btrfs maintains a queue of async bio submissions so the checksumming threads don't have to wait on get_request_wait. In order to avoid extra wakeups, this code has a running_pending flag that is used to tell new submissions they don't need to wake the thread. When the threads notice congestion on a single device, they may decide to requeue the job and move on to other devices. This makes sure the running_pending flag is cleared before the job is requeued. It should help avoid IO stalls by making sure the task is woken up when new submissions come in. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: explicitly mark the tree log root for writebackChris Mason2009-01-091-0/+13
| | | | | | | | | | | | | | | | | Each subvolume has an extent_state_tree used to mark metadata that needs to be sent to disk while syncing the tree. This is used in addition to the dirty bits on the pages themselves so that a single subvolume can be sent to disk efficiently in disk order. Normally this marking happens in btrfs_alloc_free_block, which also does special recording of dirty tree blocks for the tree log roots. Yan Zheng noticed that when the root of the log tree is allocated, it is added to the wrong writeback list. The fix used here is to explicitly set it dirty as part of tree log creation. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Drop the hardware crc32c asm codeChris Mason2009-01-071-94/+3
| | | | | | | | This is already in the arch specific directories in mainline and shouldn't be copied into btrfs. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add Documentation/filesystem/btrfs.txt, remove old COPYINGDavid Woodhouse2009-01-072-404/+0
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: kmap_atomic(KM_USER0) is safe for btrfs_readpage_end_io_hookChris Mason2009-01-071-3/+3
| | | | | | | None of the checksum verification code schedules, so we can use the faster kmap_atomic Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Don't use kmap_atomic(..., KM_IRQ0) during checksum verifiesChris Mason2009-01-061-7/+3
| | | | | | | Checksum verification happens in a helper thread, and there is no need to mess with interrupts. This switches to kmap() instead. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: tree logging checksum fixesYan Zheng2009-01-064-232/+130
| | | | | | | | | | | | | | | | | | This patch contains following things. 1) Limit the max size of btrfs_ordered_sum structure to PAGE_SIZE. This struct is kmalloced so we want to keep it reasonable. 2) Replace copy_extent_csums by btrfs_lookup_csums_range. This was duplicated code in tree-log.c 3) Remove replay_one_csum. csum items are replayed at the same time as replaying file extents. This guarantees we only replay useful csums. 4) nbytes accounting fix. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
* Btrfs: don't change file extent's ram_bytes in btrfs_drop_extentsYan Zheng2009-01-061-4/+0
| | | | | | | | | btrfs_drop_extents doesn't change file extent's ram_bytes in the case of booked extent. To be consistent, we should also not change ram_bytes when truncating existing extent. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
* Btrfs: Use btrfs_join_transaction to avoid deadlocks during snapshot creationYan Zheng2009-01-062-2/+2
| | | | | | | | | | | | | | | | | | | | | | | Snapshot creation happens at a specific time during transaction commit. We need to make sure the code called by snapshot creation doesn't wait for the running transaction to commit. This changes btrfs_delete_inode and finish_pending_snaps to use btrfs_join_transaction instead of btrfs_start_transaction to avoid deadlocks. It would be better if btrfs_delete_inode didn't use the join, but the call path that triggers it is: btrfs_commit_transaction->create_pending_snapshots-> create_pending_snapshot->btrfs_lookup_dentry-> fixup_tree_root_location->btrfs_read_fs_root-> btrfs_read_fs_root_no_name->btrfs_orphan_cleanup->iput This will be fixed in a later patch by moving the orphan cleanup to the cleaner thread. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: drop remaining LINUX_KERNEL_VERSION checks and compat codeChris Mason2009-01-062-29/+0
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Merge branch 'master' of ↵Chris Mason2009-01-0659-0/+43024
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
| * Btrfs: drop EXPORT symbols from extent_io.cChris Mason2009-01-051-56/+0
| | | | | | | | | | | | They should stay out until this is turned into generic code. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: Fix checkpatch.pl warningsChris Mason2009-01-0533-898/+770
| | | | | | | | | | | | | | There were many, most are fixed now. struct-funcs.c generates some warnings but these are bogus. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: Fix free block discard calls down to the block layerLiu Hui2009-01-051-51/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a patch to fix discard semantic to make Btrfs work with FTL and SSD. We can improve FTL's performance by telling it which sectors are freed by file system. But if we don't tell FTL the information of free sectors in proper time, the transaction mechanism of Btrfs will be destroyed and Btrfs could not roll back the previous transaction under the power loss condition. There are some problems in the old implementation: 1, In __free_extent(), the pinned down extents should not be discarded. 2, In free_extents(), the free extents are all pinned, so they need to be discarded in transaction committing time instead of free_extents(). 3, The reserved extent used by log tree should be discard too. This patch change discard behavior as follows: 1, For the extents which need to be free at once, we discard them in update_block_group(). 2, Delay discarding the pinned extent in btrfs_finish_extent_commit() when committing transaction. 3, Remove discarding from free_extents() and __free_extent() 4, Add discard interface into btrfs_free_reserved_extent() 5, Discard sectors before updating the free space cache, otherwise, FTL will destroy file system data.
| * Btrfs: avoid orphan inode caused by log replayYan Zheng2009-01-051-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | drop_one_dir_item does not properly update inode's link count. It can be reproduced by executing following commands: #touch test #sync #rm -f test #dd if=/dev/zero bs=4k count=1 of=test conv=fsync #echo b > /proc/sysrq-trigger This fixes it by adding an BTRFS_ORPHAN_ITEM_KEY for the inode Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
| * Btrfs: avoid potential super block corruptionYan Zheng2009-01-051-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | The data in fs_info->super_for_commit are zeros before the first transaction commit. If tree log sync and system crash both occur before the first transaction commit, super block will get corrupted. This fixes it by properly filling in the super_for_commit field at open time. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
| * Btrfs: do not call kfree if kmalloc failed in btrfs_sysfs_add_superShen Feng2009-01-051-2/+1
| | | | | | | | | | Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
| * Btrfs: fix a memory leak in btrfs_get_sbShen Feng2009-01-051-5/+4
| | | | | | | | | | | | | | subvol_name should be freed if error occurs. Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
| * Btrfs: Fix typo in clear_state_cbLiu Hui2009-01-051-1/+1
| | | | | | | | | | | | | | | | In clear_state_cb, we should check 'tree->ops->clear_bit_hook' instead of 'tree->ops->set_bit_hook'. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: Fix memset length in btrfs_file_writeyanhai zhu2009-01-051-1/+1
| | | | | | | | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: update directory's size when creating subvol/snapshotYan Zheng2009-01-052-0/+8
| | | | | | | | | | | | | | | | Make sure directory's size properly updated when creating subvol/snapshot. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
| * Btrfs: add permission checks to the ioctlsChris Mason2009-01-052-2/+26
| | | | | | | | | | | | | | | | | | Only root can add/remove devices Only root can defrag subtrees Only files open for writing can be defragged Only files open for writing can be the destination for a clone Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: Fix compile warning around num_online_cpus() in a min statementChris Mason2008-12-191-1/+2
| | | | | | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: set EXTENT_BOUNDARY bit before marking extent delalloc.Yan Zheng2008-12-191-3/+2
| | | | | | | | | | | | | | | | | | There is a race in relocate_inode_pages, it happens when find_delalloc_range finds the delalloc extent before the boundary bit is set. Thank you, Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
| * Btrfs: properly update block accounting for metadataYan Zheng2008-12-191-3/+13
| | | | | | | | | | | | | | | | This adds the missing block accounting code to finish_current_insert and makes block accounting for root item properly protected by the delalloc spin lock. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
| * Btrfs: Add missing mnt_drop_write in ioctl.cYan Zheng2008-12-191-2/+7
| | | | | | | | | | | | | | | | This patch adds the missing mnt_drop_write to match mnt_want_write in btrfs_ioctl_defrag and btrfs_ioctl_clone Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
| * Btrfs: fix return value from btrfs_listxattr when buffer size is too smallYehuda Sadeh Weinraub2008-12-171-1/+1
| | | | | | | | | | | | | | The return value was being overwritten. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
| * Btrfs: shift all end_io work to thread poolsChris Mason2008-12-174-46/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bio_end_io for reads without checksumming on and btree writes were happening without using async thread pools. This means the extent_io.c code had to use spin_lock_irq and friends on the rb tree locks for extent state. There were some irq safe vs unsafe lock inversions between the delallock lock and the extent state locks. This patch gets rid of them by moving all end_io code into the thread pools. To avoid contention and deadlocks between the data end_io processing and the metadata end_io processing yet another thread pool is added to finish off metadata writes. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: properly check free space for tree balancingYan Zheng2008-12-173-33/+32
| | | | | | | | | | | | | | | | | | | | | | | | btrfs_insert_empty_items takes the space needed by the btrfs_item structure into account when calculating the required free space. So the tree balancing code shouldn't add sizeof(struct btrfs_item) to the size when checking the free space. This patch removes these superfluous additions. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
| * Btrfs: delete checksum items before marking blocks freeChris Mason2008-12-162-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Btrfs maintains a cache of blocks available for allocation in ram. The code that frees extents was marking the extents free and then deleting the checksum items. This meant it was possible the extent would be reallocated before the checksum item was actually deleted, leading to races and other problems as the checksums were updated for the newly allocated extent. The fix is to delete the checksum before marking the extent free. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: Don't use spin*lock_irq for the delalloc lockChris Mason2008-12-152-20/+26
| | | | | | | | | | | | | | The delalloc lock doesn't need to have irqs disabled, nobody that changes the number of delalloc bytes in the FS is running with irqs off. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: Fix compressed writes on truncated pagesChris Mason2008-12-152-4/+6
| | | | | | | | | | | | | | | | | | | | | | The compression code was using isize to limit the amount of data it sent through zlib. But, it wasn't properly limiting the looping to just the pages inside i_size. The end result was trying to compress too many pages, including those that had not been setup and properly locked down. This made the compression code oops while trying find_get_page on a page that didn't exist. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix nodatasum handling in balancing codeYan Zheng2008-12-127-35/+226
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Checksums on data can be disabled by mount option, so it's possible some data extents don't have checksums or have invalid checksums. This causes trouble for data relocation. This patch contains following things to make data relocation work. 1) make nodatasum/nodatacow mount option only affects new files. Checksums and COW on data are only controlled by the inode flags. 2) check the existence of checksum in the nodatacow checker. If checksums exist, force COW the data extent. This ensure that checksum for a given block is either valid or does not exist. 3) update data relocation code to properly handle the case of checksum missing. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
| * Btrfs: shared seed deviceYan Zheng2008-12-125-134/+156
| | | | | | | | | | | | | | | | | | | | This patch makes seed device possible to be shared by multiple mounted file systems. The sharing is achieved by cloning seed device's btrfs_fs_devices structure. Thanks you, Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
| * Btrfs: fix leaking block group on balanceYan Zheng2008-12-117-118/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | The block group structs are referenced in many different places, and it's not safe to free while balancing. So, those block group structs were simply leaked instead. This patch replaces the block group pointer in the inode with the starting byte offset of the block group and adds reference counting to the block group struct. Signed-off-by: Yan Zheng <zheng.yan@oracle.com>