summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | btrfs: export global block reserve size as space_infoDavid Sterba2014-04-072-1/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a block group type bit for a global reserve and fill the space info for SPACE_INFO ioctl. This should replace the newly added ioctl (01e219e8069516cdb98594d417b8bb8d906ed30d) to get just the 'size' part of the global reserve, while the actual usage can be now visible in the 'btrfs fi df' output during ENOSPC stress. The unpatched userspace tools will show the blockgroup as 'unknown'. CC: Jeff Mahoney <jeffm@suse.com> CC: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | btrfs: fix crash in remount(thread_pool=) caseSergei Trofimovich2014-04-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reproducer: mount /dev/ubda /mnt mount -oremount,thread_pool=42 /mnt Gives a crash: ? btrfs_workqueue_set_max+0x0/0x70 btrfs_resize_thread_pool+0xe3/0xf0 ? sync_filesystem+0x0/0xc0 ? btrfs_resize_thread_pool+0x0/0xf0 btrfs_remount+0x1d2/0x570 ? kern_path+0x0/0x80 do_remount_sb+0xd9/0x1c0 do_mount+0x26a/0xbf0 ? kfree+0x0/0x1b0 SyS_mount+0xc4/0x110 It's a call btrfs_workqueue_set_max(fs_info->scrub_wr_completion_workers, new_pool_size); with fs_info->scrub_wr_completion_workers = NULL; as scrub wqs get created only on user's demand. Patch skips not-created-yet workqueues. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> CC: Qu Wenruo <quwenruo@cn.fujitsu.com> CC: Chris Mason <clm@fb.com> CC: Josef Bacik <jbacik@fb.com> CC: linux-btrfs@vger.kernel.org Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | Btrfs: abort the transaction when we don't find our extent refJosef Bacik2014-04-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I'm not sure why we weren't aborting here in the first place, it is obviously a bad time from the fact that we print the leaf and yell loudly about it. Fix this up, otherwise we panic because our path could be pointing into oblivion. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | Btrfs: fix EINVAL checks in btrfs_cloneChris Mason2014-04-071-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | btrfs_drop_extents can now return -EINVAL, but only one caller in btrfs_clone was checking for it. This adds it to the caller for inline extents, which is where we really need it. Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | Btrfs: fix unlock in __start_delalloc_inodes()Wang Shilong2014-04-071-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fix a regression caused by the following patch: Btrfs: don't flush all delalloc inodes when we doesn't get s_umount lock break while loop will make us call @spin_unlock() without calling @spin_lock() before, fix it. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | Btrfs: scrub raid56 stripes in the right wayWang Shilong2014-04-071-19/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Steps to reproduce: # mkfs.btrfs -f /dev/sda[8-11] -m raid5 -d raid5 # mount /dev/sda8 /mnt # btrfs scrub start -BR /mnt # echo $? <--unverified errors make return value be 3 This is because we don't setup right mapping between physical and logical address for raid56, which makes checksum mismatch. But we will find everthing is fine later when rechecking using btrfs_map_block(). This patch fixed the problem by settuping right mappings and we only verify data stripes' checksums. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | Btrfs: don't compress for a small writeWang Shilong2014-04-071-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To compress a small file range(<=blocksize) that is not an inline extent can not save disk space at all. skip it can save us some cpu time. This patch can also fix wrong setting nocompression flag for inode, say a case when @total_in is 4096, and then we get @total_compressed 52,because we do aligment to page cache size firstly, and then we get into conclusion @total_in=@total_compressed thus we will clear this inode's compression flag. An exception comes from inserting inline extent failure but we still have @total_compressed < @total_in,so we will still reset inode's flag, this is ok, because we don't have good compression effect. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | Btrfs: more efficient io tree navigation on wait_extent_bitFilipe Manana2014-04-071-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we don't reschedule use rb_next to find the next extent state instead of a full tree search, which is more efficient and safe since we didn't release the io tree's lock. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | Btrfs: send, build path string only once in send_holeFilipe Manana2014-04-071-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no point building the path string in each iteration of the send_hole loop, as it produces always the same string. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | btrfs: filter invalid arg for btrfs resizeGui Hecheng2014-04-071-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Originally following cmds will work: # btrfs fi resize -10A <mnt> # btrfs fi resize -10Gaha <mnt> Filter the arg by checking the return pointer of memparse. Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | Btrfs: send, fix data corruption due to incorrect hole detectionFilipe Manana2014-04-071-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During an incremental send, when we finish processing an inode (corresponding to a regular file) we would assume the gap between the end of the last processed file extent and the file's size corresponded to a file hole, and therefore incorrectly send a bunch of zero bytes to overwrite that region in the file. This affects only kernel 3.14. Reproducer: mkfs.btrfs -f /dev/sdc mount /dev/sdc /mnt xfs_io -f -c "falloc -k 0 268435456" /mnt/foo btrfs subvolume snapshot -r /mnt /mnt/mysnap0 xfs_io -c "pwrite -S 0x01 -b 9216 16190218 9216" /mnt/foo xfs_io -c "pwrite -S 0x02 -b 1121 198720104 1121" /mnt/foo xfs_io -c "pwrite -S 0x05 -b 9216 107887439 9216" /mnt/foo xfs_io -c "pwrite -S 0x06 -b 9216 225520207 9216" /mnt/foo xfs_io -c "pwrite -S 0x07 -b 67584 102138300 67584" /mnt/foo xfs_io -c "pwrite -S 0x08 -b 7000 94897484 7000" /mnt/foo xfs_io -c "pwrite -S 0x09 -b 113664 245083212 113664" /mnt/foo xfs_io -c "pwrite -S 0x10 -b 123 17937788 123" /mnt/foo xfs_io -c "pwrite -S 0x11 -b 39936 229573311 39936" /mnt/foo xfs_io -c "pwrite -S 0x12 -b 67584 174792222 67584" /mnt/foo xfs_io -c "pwrite -S 0x13 -b 9216 249253213 9216" /mnt/foo xfs_io -c "pwrite -S 0x16 -b 67584 150046083 67584" /mnt/foo xfs_io -c "pwrite -S 0x17 -b 39936 118246040 39936" /mnt/foo xfs_io -c "pwrite -S 0x18 -b 67584 215965442 67584" /mnt/foo xfs_io -c "pwrite -S 0x19 -b 33792 97096725 33792" /mnt/foo xfs_io -c "pwrite -S 0x20 -b 125952 166300596 125952" /mnt/foo xfs_io -c "pwrite -S 0x21 -b 123 1078957 123" /mnt/foo xfs_io -c "pwrite -S 0x25 -b 9216 212044492 9216" /mnt/foo xfs_io -c "pwrite -S 0x26 -b 7000 265037146 7000" /mnt/foo xfs_io -c "pwrite -S 0x27 -b 42757 215922685 42757" /mnt/foo xfs_io -c "pwrite -S 0x28 -b 7000 69865411 7000" /mnt/foo xfs_io -c "pwrite -S 0x29 -b 67584 67948958 67584" /mnt/foo xfs_io -c "pwrite -S 0x30 -b 39936 266967019 39936" /mnt/foo xfs_io -c "pwrite -S 0x31 -b 1121 19582453 1121" /mnt/foo xfs_io -c "pwrite -S 0x32 -b 17408 257710255 17408" /mnt/foo xfs_io -c "pwrite -S 0x33 -b 39936 3895518 39936" /mnt/foo xfs_io -c "pwrite -S 0x34 -b 125952 12045847 125952" /mnt/foo xfs_io -c "pwrite -S 0x35 -b 17408 19156379 17408" /mnt/foo xfs_io -c "pwrite -S 0x36 -b 39936 50160066 39936" /mnt/foo xfs_io -c "pwrite -S 0x37 -b 113664 9549793 113664" /mnt/foo xfs_io -c "pwrite -S 0x38 -b 105472 94391506 105472" /mnt/foo xfs_io -c "pwrite -S 0x39 -b 23552 143632863 23552" /mnt/foo xfs_io -c "pwrite -S 0x40 -b 39936 241283845 39936" /mnt/foo xfs_io -c "pwrite -S 0x41 -b 113664 199937606 113664" /mnt/foo xfs_io -c "pwrite -S 0x42 -b 67584 67380093 67584" /mnt/foo xfs_io -c "pwrite -S 0x43 -b 67584 26793129 67584" /mnt/foo xfs_io -c "pwrite -S 0x44 -b 39936 14421913 39936" /mnt/foo xfs_io -c "pwrite -S 0x45 -b 123 253097405 123" /mnt/foo xfs_io -c "pwrite -S 0x46 -b 1121 128233424 1121" /mnt/foo xfs_io -c "pwrite -S 0x47 -b 105472 91577959 105472" /mnt/foo xfs_io -c "pwrite -S 0x48 -b 1121 7245381 1121" /mnt/foo xfs_io -c "pwrite -S 0x49 -b 113664 182414694 113664" /mnt/foo xfs_io -c "pwrite -S 0x50 -b 9216 32750608 9216" /mnt/foo xfs_io -c "pwrite -S 0x51 -b 67584 266546049 67584" /mnt/foo xfs_io -c "pwrite -S 0x52 -b 67584 87969398 67584" /mnt/foo xfs_io -c "pwrite -S 0x53 -b 9216 260848797 9216" /mnt/foo xfs_io -c "pwrite -S 0x54 -b 39936 119461243 39936" /mnt/foo xfs_io -c "pwrite -S 0x55 -b 7000 200178693 7000" /mnt/foo xfs_io -c "pwrite -S 0x56 -b 9216 243316029 9216" /mnt/foo xfs_io -c "pwrite -S 0x57 -b 7000 209658229 7000" /mnt/foo xfs_io -c "pwrite -S 0x58 -b 101376 179745192 101376" /mnt/foo xfs_io -c "pwrite -S 0x59 -b 9216 64012300 9216" /mnt/foo xfs_io -c "pwrite -S 0x60 -b 125952 181705139 125952" /mnt/foo xfs_io -c "pwrite -S 0x61 -b 23552 235737348 23552" /mnt/foo xfs_io -c "pwrite -S 0x62 -b 113664 106021355 113664" /mnt/foo xfs_io -c "pwrite -S 0x63 -b 67584 135753552 67584" /mnt/foo xfs_io -c "pwrite -S 0x64 -b 23552 95730888 23552" /mnt/foo xfs_io -c "pwrite -S 0x65 -b 11 17311415 11" /mnt/foo xfs_io -c "pwrite -S 0x66 -b 33792 120695553 33792" /mnt/foo xfs_io -c "pwrite -S 0x67 -b 9216 17164631 9216" /mnt/foo xfs_io -c "pwrite -S 0x68 -b 9216 136065853 9216" /mnt/foo xfs_io -c "pwrite -S 0x69 -b 67584 37752198 67584" /mnt/foo xfs_io -c "pwrite -S 0x70 -b 101376 189717473 101376" /mnt/foo xfs_io -c "pwrite -S 0x71 -b 7000 227463698 7000" /mnt/foo xfs_io -c "pwrite -S 0x72 -b 9216 12655137 9216" /mnt/foo xfs_io -c "pwrite -S 0x73 -b 7000 7488866 7000" /mnt/foo xfs_io -c "pwrite -S 0x74 -b 113664 87813649 113664" /mnt/foo xfs_io -c "pwrite -S 0x75 -b 33792 25802183 33792" /mnt/foo xfs_io -c "pwrite -S 0x76 -b 39936 93524024 39936" /mnt/foo xfs_io -c "pwrite -S 0x77 -b 33792 113336388 33792" /mnt/foo xfs_io -c "pwrite -S 0x78 -b 105472 184955320 105472" /mnt/foo xfs_io -c "pwrite -S 0x79 -b 101376 225691598 101376" /mnt/foo xfs_io -c "pwrite -S 0x80 -b 23552 77023155 23552" /mnt/foo xfs_io -c "pwrite -S 0x81 -b 11 201888192 11" /mnt/foo xfs_io -c "pwrite -S 0x82 -b 11 115332492 11" /mnt/foo xfs_io -c "pwrite -S 0x83 -b 67584 230278015 67584" /mnt/foo xfs_io -c "pwrite -S 0x84 -b 11 120589073 11" /mnt/foo xfs_io -c "pwrite -S 0x85 -b 125952 202207819 125952" /mnt/foo xfs_io -c "pwrite -S 0x86 -b 113664 86672080 113664" /mnt/foo xfs_io -c "pwrite -S 0x87 -b 17408 208459603 17408" /mnt/foo xfs_io -c "pwrite -S 0x88 -b 7000 73372211 7000" /mnt/foo xfs_io -c "pwrite -S 0x89 -b 7000 42252122 7000" /mnt/foo xfs_io -c "pwrite -S 0x90 -b 23552 46784881 23552" /mnt/foo xfs_io -c "pwrite -S 0x91 -b 101376 63172351 101376" /mnt/foo xfs_io -c "pwrite -S 0x92 -b 23552 59341931 23552" /mnt/foo xfs_io -c "pwrite -S 0x93 -b 39936 239599283 39936" /mnt/foo xfs_io -c "pwrite -S 0x94 -b 67584 175643105 67584" /mnt/foo xfs_io -c "pwrite -S 0x97 -b 23552 105534880 23552" /mnt/foo xfs_io -c "pwrite -S 0x98 -b 113664 8236844 113664" /mnt/foo xfs_io -c "pwrite -S 0x99 -b 125952 144489686 125952" /mnt/foo xfs_io -c "pwrite -S 0xa0 -b 7000 73273112 7000" /mnt/foo xfs_io -c "pwrite -S 0xa1 -b 125952 194580243 125952" /mnt/foo xfs_io -c "pwrite -S 0xa2 -b 123 56296779 123" /mnt/foo xfs_io -c "pwrite -S 0xa3 -b 11 233066845 11" /mnt/foo xfs_io -c "pwrite -S 0xa4 -b 39936 197727090 39936" /mnt/foo xfs_io -c "pwrite -S 0xa5 -b 101376 53579812 101376" /mnt/foo xfs_io -c "pwrite -S 0xa6 -b 9216 85669738 9216" /mnt/foo xfs_io -c "pwrite -S 0xa7 -b 125952 21266322 125952" /mnt/foo xfs_io -c "pwrite -S 0xa8 -b 23552 125726568 23552" /mnt/foo xfs_io -c "pwrite -S 0xa9 -b 9216 18423680 9216" /mnt/foo xfs_io -c "pwrite -S 0xb0 -b 1121 165901483 1121" /mnt/foo btrfs subvolume snapshot -r /mnt /mnt/mysnap1 xfs_io -c "pwrite -S 0xff -b 10 16190218 10" /mnt/foo btrfs subvolume snapshot -r /mnt /mnt/mysnap2 md5sum /mnt/foo # returns 79e53f1466bfc09fd82b450689e6119e md5sum /mnt/mysnap2/foo # returns 79e53f1466bfc09fd82b450689e6119e too btrfs send /mnt/mysnap1 -f /tmp/1.snap btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/2.snap mkfs.btrfs -f /dev/sdc mount /dev/sdc /mnt btrfs receive /mnt -f /tmp/1.snap btrfs receive /mnt -f /tmp/2.snap md5sum /mnt/mysnap2/foo # returns 2bb414c5155767cedccd7063e51beabd !! A testcase for xfstests follows soon too. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | Btrfs: kmalloc() doesn't return an ERR_PTRDan Carpenter2014-04-071-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The error handling was copy and pasted from memdup_user(). It should be checking for NULL obviously. Fixes: abccd00f8af2 ('btrfs: Fix 32/64-bit problem with BTRFS_SET_RECEIVED_SUBVOL ioctl') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | Btrfs: fix snapshot vs nocow writtingWang Shilong2014-04-071-2/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While running fsstress and snapshots concurrently, we will hit something like followings: Thread 1 Thread 2 |->fallocate |->write pages |->join transaction |->add ordered extent |->end transaction |->flushing data |->creating pending snapshots |->write data into src root's fallocated space After above work flows finished, we will get a state that source and snapshot root share same space, but source root have written data into fallocated space, this will make fsck fail to verify checksums for snapshot root's preallocating file extent data.Nocow writting also has this same problem. Fix this problem by syncing snapshots with nocow writting: 1.for nocow writting,if there are pending snapshots, we will fall into COW way. 2.if there are pending nocow writes, snapshots for this root will be blocked until nocow writting finish. Reported-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | btrfs: Change the expanding write sequence to fix snapshot related bug.Qu Wenruo2014-04-071-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When testing fsstress with snapshot making background, some snapshot following problem. Snapshot 270: inode 323: size 0 Snapshot 271: inode 323: size 349145 |-------Hole---|---------Empty gap-------|-------Hole-----| 0 122880 172032 349145 Snapshot 272: inode 323: size 349145 |-------Hole---|------------Data---------|-------Hole-----| 0 122880 172032 349145 The fsstress operation on inode 323 is the following: write: offset 126832 len 43124 truncate: size 349145 Since the write with offset is consist of 2 operations: 1. punch hole 2. write data Hole punching is faster than data write, so hole punching in write and truncate is done first and then buffered write, so the snapshot 271 got empty gap, which will not pass btrfsck. To fix the bug, this patch will change the write sequence which will first punch a hole covering the write end if a hole is needed. Reported-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | btrfs: make device scan less noisyDavid Sterba2014-04-071-11/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Print the message only when the device is seen for the first time. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | btrfs: fix lockdep warning with reclaim lock inversionJeff Mahoney2014-04-071-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When encountering memory pressure, testers have run into the following lockdep warning. It was caused by __link_block_group calling kobject_add with the groups_sem held. kobject_add calls kvasprintf with GFP_KERNEL, which gets us into reclaim context. The kobject doesn't actually need to be added under the lock -- it just needs to ensure that it's only added for the first block group to be linked. ========================================================= [ INFO: possible irq lock inversion dependency detected ] 3.14.0-rc8-default #1 Not tainted --------------------------------------------------------- kswapd0/169 just changed the state of lock: (&delayed_node->mutex){+.+.-.}, at: [<ffffffffa018baea>] __btrfs_release_delayed_node+0x3a/0x200 [btrfs] but this lock took another, RECLAIM_FS-unsafe lock in the past: (&found->groups_sem){+++++.} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&found->groups_sem); local_irq_disable(); lock(&delayed_node->mutex); lock(&found->groups_sem); <Interrupt> lock(&delayed_node->mutex); *** DEADLOCK *** 2 locks held by kswapd0/169: #0: (shrinker_rwsem){++++..}, at: [<ffffffff81159e8a>] shrink_slab+0x3a/0x160 #1: (&type->s_umount_key#27){++++..}, at: [<ffffffff811bac6f>] grab_super_passive+0x3f/0x90 Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | Btrfs: hold the commit_root_sem when getting the commit root during sendJosef Bacik2014-04-073-16/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently rely too heavily on roots being read-only to save us from just accessing root->commit_root. We can easily balance blocks out from underneath a read only root, so to save us from getting screwed make sure we only access root->commit_root under the commit root sem. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | Btrfs: remove transaction from sendJosef Bacik2014-04-069-187/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lets try this again. We can deadlock the box if we send on a box and try to write onto the same fs with the app that is trying to listen to the send pipe. This is because the writer could get stuck waiting for a transaction commit which is being blocked by the send. So fix this by making sure looking at the commit roots is always going to be consistent. We do this by keeping track of which roots need to have their commit roots swapped during commit, and then taking the commit_root_sem and swapping them all at once. Then make sure we take a read lock on the commit_root_sem in cases where we search the commit root to make sure we're always looking at a consistent view of the commit roots. Previously we had problems with this because we would swap a fs tree commit root and then swap the extent tree commit root independently which would cause the backref walking code to screw up sometimes. With this patch we no longer deadlock and pass all the weird send/receive corner cases. Thanks, Reportedy-by: Hugo Mills <hugo@carfax.org.uk> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | Btrfs: don't clear uptodate if the eb is under IOJosef Bacik2014-04-066-3/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So I have an awful exercise script that will run snapshot, balance and send/receive in parallel. This sometimes would crash spectacularly and when it came back up the fs would be completely hosed. Turns out this is because of a bad interaction of balance and send/receive. Send will hold onto its entire path for the whole send, but its blocks could get relocated out from underneath it, and because it doesn't old tree locks theres nothing to keep this from happening. So it will go to read in a slot with an old transid, and we could have re-allocated this block for something else and it could have a completely different transid. But because we think it is invalid we clear uptodate and re-read in the block. If we do this before we actually write out the new block we could write back stale data to the fs, and boom we're screwed. Now we definitely need to fix this disconnect between send and balance, but we really really need to not allow ourselves to accidently read in stale data over new data. So make sure we check if the extent buffer is not under io before clearing uptodate, this will kick back EIO to the caller instead of reading in stale data and keep us from corrupting the fs. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | Btrfs: check for an extent_op on the locked refJosef Bacik2014-04-061-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We could have possibly added an extent_op to the locked_ref while we dropped locked_ref->lock, so check for this case as well and loop around. Otherwise we could lose flag updates which would lead to extent tree corruption. Thanks, cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
| * | | | | Btrfs: do not reset last_snapshot after relocationJosef Bacik2014-04-061-21/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was done to allow NO_COW to continue to be NO_COW after relocation but it is not right. When relocating we will convert blocks to FULL_BACKREF that we relocate. We can leave some of these full backref blocks behind if they are not cow'ed out during the relocation, like if we fail the relocation with ENOSPC and then just drop the reloc tree. Then when we go to cow the block again we won't lookup the extent flags because we won't think there has been a snapshot recently which means we will do our normal ref drop thing instead of adding back a tree ref and dropping the shared ref. This will cause btrfs_free_extent to blow up because it can't find the ref we are trying to free. This was found with my ref verifying tool. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
* | | | | | Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osdLinus Torvalds2014-04-102-3/+3
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull exofs updates from Boaz Harrosh: "Trivial updates to exofs for 3.15-rc1 Just a few fixes sent by people" * 'for-linus' of git://git.open-osd.org/linux-open-osd: MAINTAINERS: Update email address for bhalevy fs: Mark functions as static in exofs/ore_raid.c fs: Mark function as static in exofs/super.c
| * | | | | | fs: Mark functions as static in exofs/ore_raid.cRashika Kheria2014-04-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark functions as static in exofs/ore_raid.c because they are not used outside this file. This also eliminates the following warning in exofs/ore_raid.c: fs/exofs/ore_raid.c:24:14: warning: no previous prototype for _raid_page_alloc [-Wmissing-prototypes] fs/exofs/ore_raid.c:29:6: warning: no previous prototype for _raid_page_free [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | | | | | fs: Mark function as static in exofs/super.cRashika Kheria2014-04-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark function as static in exofs/super.c because it is not used outside this file. This also eliminates the following warning in exofs/super.c: fs/exofs/super.c:546:5: warning: no previous prototype \ for __alloc_dev_table[-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
* | | | | | | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2014-04-101-10/+12
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block layer fixes from Jens Axboe: "A small collection of fixes that should go in before -rc1. The pull request contains: - A two patch fix for a regression with block enabled tagging caused by a commit in the initial pull request. One patch is from Martin and ensures that SCSI doesn't truncate 64-bit block flags, the other one is from me and prevents us from double using struct request queuelist for both completion and busy tags. This caused anything from a boot crash for some, to crashes under load. - A blk-mq fix for a potential soft stall when hot unplugging CPUs with busy IO. - percpu_counter fix is listed in here, that caused a suspend issue with virtio-blk due to percpu counters having an inconsistent state during CPU removal. Andrew sent this in separately a few days ago, but it's here. JFYI. - A few fixes for block integrity from Martin. - A ratelimit fix for loop from Mike Galbraith, to avoid spewing too much in error cases" * 'for-linus' of git://git.kernel.dk/linux-block: block: fix regression with block enabled tagging scsi: Make sure cmd_flags are 64-bit block: Ensure we only enable integrity metadata for reads and writes block: Fix integrity verification block: Fix for_each_bvec() drivers/block/loop.c: ratelimit error messages blk-mq: fix potential stall during CPU unplug with IO pending percpu_counter: fix bad counter state during suspend
| * | | | | | | block: Ensure we only enable integrity metadata for reads and writesMartin K. Petersen2014-04-091-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We'd occasionally attempt to generate protection information for flushes and other requests with a zero payload. Make sure we only attempt to enable integrity for reads and writes. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | | | | block: Fix integrity verificationMartin K. Petersen2014-04-091-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit bf36f9cfa6d3d caused a regression by effectively reverting Nic's fix from 5837c80e870b that ensures we traverse the full bio_vec list upon completion. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | | | | | | | Merge branch 'for-3.15' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2014-04-0813-50/+130
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "Highlights: - server-side nfs/rdma fixes from Jeff Layton and Tom Tucker - xdr fixes (a larger xdr rewrite has been posted but I decided it would be better to queue it up for 3.16). - miscellaneous fixes and cleanup from all over (thanks especially to Kinglong Mee)" * 'for-3.15' of git://linux-nfs.org/~bfields/linux: (36 commits) nfsd4: don't create unnecessary mask acl nfsd: revert v2 half of "nfsd: don't return high mode bits" nfsd4: fix memory leak in nfsd4_encode_fattr() nfsd: check passed socket's net matches NFSd superblock's one SUNRPC: Clear xpt_bc_xprt if xs_setup_bc_tcp failed NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp SUNRPC: New helper for creating client with rpc_xprt NFSD: Free backchannel xprt in bc_destroy NFSD: Clear wcc data between compound ops nfsd: Don't return NFS4ERR_STALE_STATEID for NFSv4.1+ nfsd4: fix nfs4err_resource in 4.1 case nfsd4: fix setclientid encode size nfsd4: remove redundant check from nfsd4_check_resp_size nfsd4: use more generous NFS4_ACL_MAX nfsd4: minor nfsd4_replay_cache_entry cleanup nfsd4: nfsd4_replay_cache_entry should be static nfsd4: update comments with obsolete function name rpc: Allow xdr_buf_subsegment to operate in-place NFSD: Using free_conn free connection SUNRPC: fix memory leak of peer addresses in XPRT ...
| * | | | | | | | nfsd4: don't create unnecessary mask aclJ. Bruce Fields2014-04-041-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Any setattr of the ACL attribute, even if it sets just the basic 3-ACE ACL exactly as it was returned from a file with only mode bits, creates a mask entry, and it is only the mask, not group, entry that is changed by subsequent modifications of the mode bits. So, for example, it's surprising that GROUP@ is left without read or write permissions after a chmod 0666: touch test chmod 0600 test nfs4_getfacl test A::OWNER@:rwatTcCy A::GROUP@:tcy A::EVERYONE@:tcy nfs4_getfacl test | nfs4_setfacl -S - test # chmod 0666 test nfs4_getfacl test A::OWNER@:rwatTcCy A::GROUP@:tcy D::GROUP@:rwa A::EVERYONE@:rwatcy So, let's stop creating the unnecessary mask ACL. A mask will still be created on non-trivial ACLs (ACLs with actual named user and group ACEs), so the odd posix-acl behavior of chmod modifying only the mask will still be left in that case; but that's consistent with local behavior. Reported-by: Soumya Koduri <skoduri@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | nfsd: revert v2 half of "nfsd: don't return high mode bits"J. Bruce Fields2014-04-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts the part of commit 6e14b46b91fee8a049b0940333ce13a820beaaa5 that changes NFSv2 behavior. Mark Lord found that it broke nfs-root for Linux clients, because it broke NFSv2. In fact, from RFC 1094: "Notice that the file type is specified both in the mode bits and in the file type. This is really a bug in the protocol and will be fixed in future versions." So NFSv2 clients really are expected to depend on the high bits of the mode. Cc: stable@kernel.org Reported-by: Mark Lord <mlord@pobox.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | nfsd4: fix memory leak in nfsd4_encode_fattr()Yan, Zheng2014-03-311-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fh_put() does not free the temporary file handle. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | nfsd: check passed socket's net matches NFSd superblock's oneStanislav Kinsbursky2014-03-311-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There could be a case, when NFSd file system is mounted in network, different to socket's one, like below: "ip netns exec" creates new network and mount namespace, which duplicates NFSd mount point, created in init_net context. And thus NFS server stop in nested network context leads to RPCBIND client destruction in init_net. Then, on NFSd start in nested network context, rpc.nfsd process creates socket in nested net and passes it into "write_ports", which leads to RPCBIND sockets creation in init_net context because of the same reason (NFSd monut point was created in init_net context). An attempt to register passed socket in nested net leads to panic, because no RPCBIND client present in nexted network namespace. This patch add check that passed socket's net matches NFSd superblock's one. And returns -EINVAL error to user psace otherwise. v2: Put socket on exit. Reported-by: Weng Meiling <wengmeiling.weng@huawei.com> Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcpKinglong Mee2014-03-301-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Besides checking rpc_xprt out of xs_setup_bc_tcp, increase it's reference (it's important). Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | NFSD: Clear wcc data between compound opsKinglong Mee2014-03-302-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Testing NFS4.0 by pynfs, I got some messeages as, "nfsd: inode locked twice during operation." When one compound RPC contains two or more ops that locks the filehandle,the second op will cause the message. As two SETATTR ops, after the first SETATTR, nfsd will not call fh_put() to release current filehandle, it means filehandle have unlocked with fh_post_saved = 1. The second SETATTR find fh_post_saved = 1, and printk the message. v2: introduce helper fh_clear_wcc(). Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | nfsd: Don't return NFS4ERR_STALE_STATEID for NFSv4.1+Trond Myklebust2014-03-301-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RFC5661 obsoletes NFS4ERR_STALE_STATEID in favour of NFS4ERR_BAD_STATEID. Note that because nfsd encodes the clientid boot time in the stateid, we can hit this error case in certain scenarios where the Linux client state management thread exits early, before it has finished recovering all state. Reported-by: Idan Kedar <idank@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | nfsd4: fix nfs4err_resource in 4.1 caseJ. Bruce Fields2014-03-281-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | encode_getattr, for example, can return nfserr_resource to indicate it ran out of buffer space. That's not a legal error in the 4.1 case. And in the 4.1 case, if we ran out of buffer space, we should have exceeded a session limit too. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | nfsd4: fix setclientid encode sizeJ. Bruce Fields2014-03-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | nfsd4: remove redundant check from nfsd4_check_resp_sizeJ. Bruce Fields2014-03-281-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cstate->slot and ->session are each set together in nfsd4_sequence. If one is non-NULL, so is the other. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | nfsd4: use more generous NFS4_ACL_MAXJ. Bruce Fields2014-03-281-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | nfsd4: minor nfsd4_replay_cache_entry cleanupJ. Bruce Fields2014-03-281-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Maybe this is comment true, who cares? Handle this like any other error. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | nfsd4: nfsd4_replay_cache_entry should be staticJ. Bruce Fields2014-03-282-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This isn't actually used anywhere else. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | nfsd4: update comments with obsolete function nameJ. Bruce Fields2014-03-282-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | NFSD: Using free_conn free connectionKinglong Mee2014-03-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Connection from alloc_conn must be freed through free_conn, otherwise, the reference of svc_xprt will never be put. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | nfsd: typo in nfsd_rename commentJ. Bruce Fields2014-03-281-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | NFSD: simplify saved/current fh uses in nfsd4_proc_compoundKinglong Mee2014-03-281-13/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | lockd: ensure we tear down any live sockets when socket creation fails ↵Jeff Layton2014-03-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | during lockd_up We had a Fedora ABRT report with a stack trace like this: kernel BUG at net/sunrpc/svc.c:550! invalid opcode: 0000 [#1] SMP [...] CPU: 2 PID: 913 Comm: rpc.nfsd Not tainted 3.13.6-200.fc20.x86_64 #1 Hardware name: Hewlett-Packard HP ProBook 4740s/1846, BIOS 68IRR Ver. F.40 01/29/2013 task: ffff880146b00000 ti: ffff88003f9b8000 task.ti: ffff88003f9b8000 RIP: 0010:[<ffffffffa0305fa8>] [<ffffffffa0305fa8>] svc_destroy+0x128/0x130 [sunrpc] RSP: 0018:ffff88003f9b9de0 EFLAGS: 00010206 RAX: ffff88003f829628 RBX: ffff88003f829600 RCX: 00000000000041ee RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000286 RBP: ffff88003f9b9de8 R08: 0000000000017360 R09: ffff88014fa97360 R10: ffffffff8114ce57 R11: ffffea00051c9c00 R12: ffff88003f829600 R13: 00000000ffffff9e R14: ffffffff81cc7cc0 R15: 0000000000000000 FS: 00007f4fde284840(0000) GS:ffff88014fa80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4fdf5192f8 CR3: 00000000a569a000 CR4: 00000000001407e0 Stack: ffff88003f792300 ffff88003f9b9e18 ffffffffa02de02a 0000000000000000 ffffffff81cc7cc0 ffff88003f9cb000 0000000000000008 ffff88003f9b9e60 ffffffffa033bb35 ffffffff8131c86c ffff88003f9cb000 ffff8800a5715008 Call Trace: [<ffffffffa02de02a>] lockd_up+0xaa/0x330 [lockd] [<ffffffffa033bb35>] nfsd_svc+0x1b5/0x2f0 [nfsd] [<ffffffff8131c86c>] ? simple_strtoull+0x2c/0x50 [<ffffffffa033c630>] ? write_pool_threads+0x280/0x280 [nfsd] [<ffffffffa033c6bb>] write_threads+0x8b/0xf0 [nfsd] [<ffffffff8114efa4>] ? __get_free_pages+0x14/0x50 [<ffffffff8114eff6>] ? get_zeroed_page+0x16/0x20 [<ffffffff811dec51>] ? simple_transaction_get+0xb1/0xd0 [<ffffffffa033c098>] nfsctl_transaction_write+0x48/0x80 [nfsd] [<ffffffff811b8b34>] vfs_write+0xb4/0x1f0 [<ffffffff811c3f99>] ? putname+0x29/0x40 [<ffffffff811b9569>] SyS_write+0x49/0xa0 [<ffffffff810fc2a6>] ? __audit_syscall_exit+0x1f6/0x2a0 [<ffffffff816962e9>] system_call_fastpath+0x16/0x1b Code: 31 c0 e8 82 db 37 e1 e9 2a ff ff ff 48 8b 07 8b 57 14 48 c7 c7 d5 c6 31 a0 48 8b 70 20 31 c0 e8 65 db 37 e1 e9 f4 fe ff ff 0f 0b <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 RIP [<ffffffffa0305fa8>] svc_destroy+0x128/0x130 [sunrpc] RSP <ffff88003f9b9de0> Evidently, we created some lockd sockets and then failed to create others. make_socks then returned an error and we tried to tear down the svc, but svc->sv_permsocks was not empty so we ended up tripping over the BUG() in svc_destroy(). Fix this by ensuring that we tear down any live sockets we created when socket creation is going to return an error. Fixes: 786185b5f8abefa (SUNRPC: move per-net operations from...) Reported-by: Raphos <raphoszap@laposte.net> Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | NFSD: Traverse unconfirmed client through hash-tableKinglong Mee2014-03-281-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When stopping nfsd, I got BUG messages, and soft lockup messages, The problem is cuased by double rb_erase() in nfs4_state_destroy_net() and destroy_client(). This patch just let nfsd traversing unconfirmed client through hash-table instead of rbtree. [ 2325.021995] BUG: unable to handle kernel NULL pointer dereference at (null) [ 2325.022809] IP: [<ffffffff8133c18c>] rb_erase+0x14c/0x390 [ 2325.022982] PGD 7a91b067 PUD 7a33d067 PMD 0 [ 2325.022982] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [ 2325.022982] Modules linked in: nfsd(OF) cfg80211 rfkill bridge stp llc snd_intel8x0 snd_ac97_codec ac97_bus auth_rpcgss nfs_acl serio_raw e1000 i2c_piix4 ppdev snd_pcm snd_timer lockd pcspkr joydev parport_pc snd parport i2c_core soundcore microcode sunrpc ata_generic pata_acpi [last unloaded: nfsd] [ 2325.022982] CPU: 1 PID: 2123 Comm: nfsd Tainted: GF O 3.14.0-rc8+ #2 [ 2325.022982] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 2325.022982] task: ffff88007b384800 ti: ffff8800797f6000 task.ti: ffff8800797f6000 [ 2325.022982] RIP: 0010:[<ffffffff8133c18c>] [<ffffffff8133c18c>] rb_erase+0x14c/0x390 [ 2325.022982] RSP: 0018:ffff8800797f7d98 EFLAGS: 00010246 [ 2325.022982] RAX: ffff880079c1f010 RBX: ffff880079f4c828 RCX: 0000000000000000 [ 2325.022982] RDX: 0000000000000000 RSI: ffff880079bcb070 RDI: ffff880079f4c810 [ 2325.022982] RBP: ffff8800797f7d98 R08: 0000000000000000 R09: ffff88007964fc70 [ 2325.022982] R10: 0000000000000000 R11: 0000000000000400 R12: ffff880079f4c800 [ 2325.022982] R13: ffff880079bcb000 R14: ffff8800797f7da8 R15: ffff880079f4c860 [ 2325.022982] FS: 0000000000000000(0000) GS:ffff88007f900000(0000) knlGS:0000000000000000 [ 2325.022982] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 2325.022982] CR2: 0000000000000000 CR3: 000000007a3ef000 CR4: 00000000000006e0 [ 2325.022982] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2325.022982] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2325.022982] Stack: [ 2325.022982] ffff8800797f7de0 ffffffffa0191c6e ffff8800797f7da8 ffff8800797f7da8 [ 2325.022982] ffff880079f4c810 ffff880079bcb000 ffffffff81cc26c0 ffff880079c1f010 [ 2325.022982] ffff880079bcb070 ffff8800797f7e28 ffffffffa01977f2 ffff8800797f7df0 [ 2325.022982] Call Trace: [ 2325.022982] [<ffffffffa0191c6e>] destroy_client+0x32e/0x3b0 [nfsd] [ 2325.022982] [<ffffffffa01977f2>] nfs4_state_shutdown_net+0x1a2/0x220 [nfsd] [ 2325.022982] [<ffffffffa01700b8>] nfsd_shutdown_net+0x38/0x70 [nfsd] [ 2325.022982] [<ffffffffa017013e>] nfsd_last_thread+0x4e/0x80 [nfsd] [ 2325.022982] [<ffffffffa001f1eb>] svc_shutdown_net+0x2b/0x30 [sunrpc] [ 2325.022982] [<ffffffffa017064b>] nfsd_destroy+0x5b/0x80 [nfsd] [ 2325.022982] [<ffffffffa0170773>] nfsd+0x103/0x130 [nfsd] [ 2325.022982] [<ffffffffa0170670>] ? nfsd_destroy+0x80/0x80 [nfsd] [ 2325.022982] [<ffffffff810a8232>] kthread+0xd2/0xf0 [ 2325.022982] [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40 [ 2325.022982] [<ffffffff816c493c>] ret_from_fork+0x7c/0xb0 [ 2325.022982] [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40 [ 2325.022982] Code: 48 83 e1 fc 48 89 10 0f 84 02 01 00 00 48 3b 41 10 0f 84 08 01 00 00 48 89 51 08 48 89 fa e9 74 ff ff ff 0f 1f 40 00 48 8b 50 10 <f6> 02 01 0f 84 93 00 00 00 48 8b 7a 10 48 85 ff 74 05 f6 07 01 [ 2325.022982] RIP [<ffffffff8133c18c>] rb_erase+0x14c/0x390 [ 2325.022982] RSP <ffff8800797f7d98> [ 2325.022982] CR2: 0000000000000000 [ 2325.022982] ---[ end trace 28c27ed011655e57 ]--- [ 228.064071] BUG: soft lockup - CPU#0 stuck for 22s! [nfsd:558] [ 228.064428] Modules linked in: ip6t_rpfilter ip6t_REJECT cfg80211 xt_conntrack rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw nfsd(OF) auth_rpcgss nfs_acl lockd snd_intel8x0 snd_ac97_codec ac97_bus joydev snd_pcm snd_timer e1000 sunrpc snd ppdev parport_pc serio_raw pcspkr i2c_piix4 microcode parport soundcore i2c_core ata_generic pata_acpi [ 228.064539] CPU: 0 PID: 558 Comm: nfsd Tainted: GF O 3.14.0-rc8+ #2 [ 228.064539] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 228.064539] task: ffff880076adec00 ti: ffff880074616000 task.ti: ffff880074616000 [ 228.064539] RIP: 0010:[<ffffffff8133ba17>] [<ffffffff8133ba17>] rb_next+0x27/0x50 [ 228.064539] RSP: 0018:ffff880074617de0 EFLAGS: 00000282 [ 228.064539] RAX: ffff880074478010 RBX: ffff88007446f860 RCX: 0000000000000014 [ 228.064539] RDX: ffff880074478010 RSI: 0000000000000000 RDI: ffff880074478010 [ 228.064539] RBP: ffff880074617de0 R08: 0000000000000000 R09: 0000000000000012 [ 228.064539] R10: 0000000000000001 R11: ffffffffffffffec R12: ffffea0001d11a00 [ 228.064539] R13: ffff88007f401400 R14: ffff88007446f800 R15: ffff880074617d50 [ 228.064539] FS: 0000000000000000(0000) GS:ffff88007f800000(0000) knlGS:0000000000000000 [ 228.064539] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 228.064539] CR2: 00007fe9ac6ec000 CR3: 000000007a5d6000 CR4: 00000000000006f0 [ 228.064539] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 228.064539] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 228.064539] Stack: [ 228.064539] ffff880074617e28 ffffffffa01ab7db ffff880074617df0 ffff880074617df0 [ 228.064539] ffff880079273000 ffffffff81cc26c0 ffffffff81cc26c0 0000000000000000 [ 228.064539] 0000000000000000 ffff880074617e48 ffffffffa01840b8 ffffffff81cc26c0 [ 228.064539] Call Trace: [ 228.064539] [<ffffffffa01ab7db>] nfs4_state_shutdown_net+0x18b/0x220 [nfsd] [ 228.064539] [<ffffffffa01840b8>] nfsd_shutdown_net+0x38/0x70 [nfsd] [ 228.064539] [<ffffffffa018413e>] nfsd_last_thread+0x4e/0x80 [nfsd] [ 228.064539] [<ffffffffa00aa1eb>] svc_shutdown_net+0x2b/0x30 [sunrpc] [ 228.064539] [<ffffffffa018464b>] nfsd_destroy+0x5b/0x80 [nfsd] [ 228.064539] [<ffffffffa0184773>] nfsd+0x103/0x130 [nfsd] [ 228.064539] [<ffffffffa0184670>] ? nfsd_destroy+0x80/0x80 [nfsd] [ 228.064539] [<ffffffff810a8232>] kthread+0xd2/0xf0 [ 228.064539] [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40 [ 228.064539] [<ffffffff816c493c>] ret_from_fork+0x7c/0xb0 [ 228.064539] [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40 [ 228.064539] Code: 1f 44 00 00 55 48 8b 17 48 89 e5 48 39 d7 74 3b 48 8b 47 08 48 85 c0 75 0e eb 25 66 0f 1f 84 00 00 00 00 00 48 89 d0 48 8b 50 10 <48> 85 d2 75 f4 5d c3 66 90 48 3b 78 08 75 f6 48 8b 10 48 89 c7 Fixes: ac55fdc408039 (nfsd: move the confirmed and unconfirmed hlists...) Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | svcrpc: explicitly reject compounds that are not padded out to 4-byte multipleJeff Layton2014-03-271-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a WARN_ON in the nfsd4_decode_write() that tells us when the client has sent a request that is not padded out properly according to RFC4506. A WARN_ON really isn't appropriate in this case though since this indicates a client bug, not a server one. Move this check out to the top-level compound decoder and have it just explicitly return an error. Also add a dprintk() that shows the client address and xid to help track down clients and frames that trigger it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | nfsd: notify_change needs elevated write countJ. Bruce Fields2014-03-271-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Looks like this bug has been here since these write counts were introduced, not sure why it was just noticed now. Thanks also to Jan Kara for pointing out the problem. Cc: stable@vger.kernel.org Reported-by: Matthew Rahtz <mrahtz@rapitasystems.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | | | nfsd4: fix test_stateid error reply encodingJ. Bruce Fields2014-03-271-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the entire operation fails then there's nothing to encode. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>