summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixesLinus Torvalds2012-11-076-38/+43
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull gfs2 fixes from Steven Whitehouse: "Here are a number of GFS2 bug fixes. There are three from Andy Price which fix various issues spotted by automated code analysis. There are two from Lukas Czerner fixing my mistaken assumptions as to how FITRIM should work. Finally Ben Marzinski has fixed a bug relating to mmap and atime and also a bug relating to a locking issue in the transaction code." * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes: GFS2: Test bufdata with buffer locked and gfs2_log_lock held GFS2: Don't call file_accessed() with a shared glock GFS2: Fix FITRIM argument handling GFS2: Require user to provide argument for FITRIM GFS2: Clean up some unused assignments GFS2: Fix possible null pointer deref in gfs2_rs_alloc GFS2: Fix an unchecked error from gfs2_rs_alloc
| * GFS2: Test bufdata with buffer locked and gfs2_log_lock heldBenjamin Marzinski2012-11-072-12/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In gfs2_trans_add_bh(), gfs2 was testing if a there was a bd attached to the buffer without having the gfs2_log_lock held. It was then assuming it would stay attached for the rest of the function. However, without either the log lock being held of the buffer locked, __gfs2_ail_flush() could detach bd at any time. This patch moves the locking before the test. If there isn't a bd already attached, gfs2 can safely allocate one and attach it before locking. There is no way that the newly allocated bd could be on the ail list, and thus no way for __gfs2_ail_flush() to detach it. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * GFS2: Don't call file_accessed() with a shared glockBenjamin Marzinski2012-11-072-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | file_accessed() was being called by gfs2_mmap() with a shared glock. If it needed to update the atime, it was crashing because it dirtied the inode in gfs2_dirty_inode() without holding an exclusive lock. gfs2_dirty_inode() checked if the caller was already holding a glock, but it didn't make sure that the glock was in the exclusive state. Now, instead of calling file_accessed() while holding the shared lock in gfs2_mmap(), file_accessed() is called after grabbing and releasing the glock to update the inode. If file_accessed() needs to update the atime, it will grab an exclusive lock in gfs2_dirty_inode(). gfs2_dirty_inode() now also checks to make sure that if the calling process has already locked the glock, it has an exclusive lock. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * GFS2: Fix FITRIM argument handlingLukas Czerner2012-11-071-3/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently implementation in gfs2 uses FITRIM arguments as it were in file system blocks units which is wrong. The FITRIM arguments (fstrim_range.start, fstrim_range.len and fstrim_range.minlen) are actually in bytes. Moreover, check for start argument beyond the end of file system, len argument being smaller than file system block and minlen argument being bigger than biggest resource group were missing. This commit converts the code to convert FITRIM argument to file system blocks and also adds appropriate checks mentioned above. All the problems were recognised by xfstests 251 and 260. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * GFS2: Require user to provide argument for FITRIMLukas Czerner2012-11-071-6/+2
| | | | | | | | | | | | | | | | | | When the fstrim_range argument is not provided by user in FITRIM ioctl we should just return EFAULT and not promoting bad behaviour by filling the structure in kernel. Let the user deal with it. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * GFS2: Clean up some unused assignmentsAndrew Price2012-11-072-4/+0
| | | | | | | | | | | | | | | | Cleans up two cases where variables were assigned values but then never used again. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * GFS2: Fix possible null pointer deref in gfs2_rs_allocAndrew Price2012-11-071-3/+2
| | | | | | | | | | | | | | | | | | | | Despite the return value from kmem_cache_zalloc() being checked, the error wasn't being returned until after a possible null pointer dereference. This patch returns the error immediately, allowing the removal of the error variable. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * GFS2: Fix an unchecked error from gfs2_rs_allocAndrew Price2012-11-071-2/+5
| | | | | | | | | | | | | | | | Check the return value of gfs2_rs_alloc(ip) and avoid a possible null pointer dereference. Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | NFS4: nfs4_opendata_access should return errnoWeston Andros Adamson2012-11-021-1/+1
| | | | | | | | | | | | | | Return errno - not an NFS4ERR_. This worked because NFS4ERR_ACCESS == EACCES. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NFSv4: Initialise the NFSv4.1 slot table highest_used_slotid correctlyTrond Myklebust2012-11-011-1/+1
| | | | | | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NFS: add nfs_sb_deactive_async to avoid deadlockWeston Andros Adamson2012-10-315-3/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use nfs_sb_deactive_async instead of nfs_sb_deactive when in a workqueue context. This avoids a deadlock where rpc_shutdown_client loops forever in a workqueue kworker context, trying to kill all RPC tasks associated with the client, while one or more of these tasks have already been assigned to the same kworker (and will never run rpc_exit_task). This approach is needed because RPC tasks that have already been assigned to a kworker by queue_work cannot be canceled, as explained in the comment for workqueue.c:insert_wq_barrier. Signed-off-by: Weston Andros Adamson <dros@netapp.com> [Trond: add module_get/put.] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | nfs: Show original device name verbatim in /proc/*/mount{s,info}Ben Hutchings2012-10-314-9/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit c7f404b ('vfs: new superblock methods to override /proc/*/mount{s,info}'), nfs_path() is used to generate the mounted device name reported back to userland. nfs_path() always generates a trailing slash when the given dentry is the root of an NFS mount, but userland may expect the original device name to be returned verbatim (as it used to be). Make this canonicalisation optional and change the callers accordingly. [jrnieder@gmail.com: use flag instead of bool argument] Reported-and-tested-by: Chris Hiestand <chiestand@salk.edu> Reference: http://bugs.debian.org/669314 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: <stable@vger.kernel.org> # v2.6.39+ Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | nfsv3: Make v3 mounts fail with ETIMEDOUTs instead EIO on mountd timeoutsScott Mayhew2012-10-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In very busy v3 environment, rpc.mountd can respond to the NULL procedure but not the MNT procedure in a timely manner causing the MNT procedure to time out. The problem is the mount system call returns EIO which causes the mount to fail, instead of ETIMEDOUT, which would cause the mount to be retried. This patch sets the RPC_TASK_SOFT|RPC_TASK_TIMEOUT flags to the rpc_call_sync() call in nfs_mount() which causes ETIMEDOUT to be returned on timed out connections. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
* | nfs: Check whether a layout pointer is NULL before free itYanchuan Nian2012-10-311-2/+2
| | | | | | | | | | | | | | | | | | The new layout pointer in pnfs_find_alloc_layout() may be NULL because of out of memory. we must do some check work, otherwise pnfs_free_layout_hdr() will go wrong because it can not deal with a NULL pointer. Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NFS: fix bug in legacy DNS resolver.NeilBrown2012-10-311-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The DNS resolver's use of the sunrpc cache involves a 'ttl' number (relative) rather that a timeout (absolute). This confused me when I wrote commit c5b29f885afe890f953f7f23424045cdad31d3e4 "sunrpc: use seconds since boot in expiry cache" and I managed to break it. The effect is that any TTL is interpreted as 0, and nothing useful gets into the cache. This patch removes the use of get_expiry() - which really expects an expiry time - and uses get_uint() instead, treating the int correctly as a ttl. This fixes a regression that has been present since 2.6.37, causing certain NFS accesses in certain environments to incorrectly fail. Reported-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NFSv4: nfs4_locku_done must release the sequence idTrond Myklebust2012-10-311-0/+1
| | | | | | | | | | | | | | | | If the state recovery machinery is triggered by the call to nfs4_async_handle_error() then we can deadlock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
* | NFSv4.1: We must release the sequence id when we fail to get a session slotTrond Myklebust2012-10-311-13/+23
| | | | | | | | | | | | | | | | If we do not release the sequence id in cases where we fail to get a session slot, then we can deadlock if we hit a recovery scenario. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
* | NFS: Wait for session recovery to finish before returningBryan Schumaker2012-10-311-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we will schedule session recovery and then return to the caller of nfs4_handle_exception. This works for most cases, but causes a hang on the following test case: Client Server ------ ------ Open file over NFS v4.1 Write to file Expire client Try to lock file The server will return NFS4ERR_BADSESSION, prompting the client to schedule recovery. However, the client will continue placing lock attempts and the open recovery never seems to be scheduled. The simplest solution is to wait for session recovery to run before retrying the lock. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
* | Return the right error value when dup[23]() newfd argument is too largeAl Viro2012-10-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jack Lin reports that the error return from dup3() for the RLIMIT_NOFILE case changed incorrectly after 3.6. The culprit is commit f33ff9927f42 ("take rlimit check to callers of expand_files()") which when it moved the "return -EMFILE" out to the caller, didn't notice that the dup3() had special code to turn the EMFILE return into EBADF. The replace_fd() helper that got added later then inherited the bug too. Reported-by: Jack Lin <linliangjie@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [ Noted more bugs, wrote proper changelog, fixed up typos - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds2012-10-301-10/+9
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfix from Ted Ts'o: "This fixes the root cause of the ext4 data corruption bug which raised a ruckus on LWN, Phoronix, and Slashdot. This bug only showed up when non-standard mount options (journal_async_commit and/or journal_checksum) were enabled, and when the file system was not cleanly unmounted, but the root cause was the inode bitmap modifications was not being properly journaled. This could potentially lead to minor file system corruptions (pass 5 complaints with the inode allocation bitmap) after an unclean shutdown under the wrong/unlucky workloads, but it turned into major failure if the journal_checksum and/or jouaral_async_commit was enabled." * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix unjournaled inode bitmap modification
| * | ext4: fix unjournaled inode bitmap modificationEric Sandeen2012-10-281-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 changed ext4_new_inode() such that the inode bitmap was being modified outside a transaction, which could lead to corruption, and was discovered when journal_checksum found a bad checksum in the journal during log replay. Nix ran into this when using the journal_async_commit mount option, which enables journal checksumming. The ensuing journal replay failures due to the bad checksums led to filesystem corruption reported as the now infamous "Apparent serious progressive ext4 data corruption bug" [ Changed by tytso to only call ext4_journal_get_write_access() only when we're fairly certain that we're going to allocate the inode. ] I've tested this by mounting with journal_checksum and running fsstress then dropping power; I've also tested by hacking DM to create snapshots w/o first quiescing, which allows me to test journal replay repeatedly w/o actually power-cycling the box. Without the patch I hit a journal checksum error every time. With this fix it survives many iterations. Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
* | | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2012-10-301-2/+4
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block driver update from Jens Axboe: "Distilled down variant, the rest will pass over to 3.8. I pulled it into the for-linus branch I had waiting for a pull request as well, in case you are wondering why there are new entries in here too. This also got rid of two reverts and the ones of the mtip32xx patches that went in later in the 3.6 cycle, so the series looks a bit cleaner." * 'for-linus' of git://git.kernel.dk/linux-block: loop: Make explicit loop device destruction lazy mtip32xx:Added appropriate timeout value for secure erase xen/blkback: Change xen_vbd's flush_support and discard_secure to have type unsigned int, rather than bool cciss: select CONFIG_CHECK_SIGNATURE cciss: remove unneeded memset() xen/blkback: use kmem_cache_zalloc instead of kmem_cache_alloc/memset pktcdvd: update MAINTAINERS floppy: remove dr, reuse drive on do_floppy_init floppy: use common function to check if floppies can be registered floppy: properly handle failure on add_disk loop floppy: do put_disk on current dr if blk_init_queue fails floppy: don't call alloc_ordered_workqueue inside the alloc_disk loop xen/blkback: Fix compile warning block: Add blk_rq_pos(rq) to sort rq when plushing drivers/block: remove CONFIG_EXPERIMENTAL block: remove CONFIG_EXPERIMENTAL vfs: fix: don't increase bio_slab_max if krealloc() fails blkcg: stop iteration early if root_rl is the only request list blkcg: Fix use-after-free of q->root_blkg and q->root_rl.blkg
| * | | vfs: fix: don't increase bio_slab_max if krealloc() failsAnna Leuschner2012-10-221-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Without the patch, bio_slab_max, representing bio_slabs capacity, is increased before krealloc() of bio_slabs. If krealloc() fails, bio_slab_max is too high. Fix that by only updating bio_slab_max if krealloc() is successful. Signed-off-by: Anna Leuschner <anna.m.leuschner@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-10-291-0/+2
|\ \ \ \ | |_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes form Sage Weil: "There are two fixes in the messenger code, one that can trigger a NULL dereference, and one that error in refcounting (extra put). There is also a trivial fix that in the fs client code that is triggered by NFS reexport." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: fix dentry reference leak in encode_fh() libceph: avoid NULL kref_put when osd reset races with alloc_msg rbd: reset BACKOFF if unable to re-queue
| * | | ceph: fix dentry reference leak in encode_fh()David Zafman2012-10-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Call to d_find_alias() needs a corresponding dput() This fixes http://tracker.newdream.net/issues/3271 Signed-off-by: David Zafman <david.zafman@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* | | | Lock splice_read and splice_write functionsMikulas Patocka2012-10-281-2/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Functions generic_file_splice_read and generic_file_splice_write access the pagecache directly. For block devices these functions must be locked so that block size is not changed while they are in progress. This patch is an additional fix for commit b87570f5d349 ("Fix a crash when block device is read and block size is changed at the same time") that locked aio_read, aio_write and mmap against block size change. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge tag 'pm+acpi-for-3.7-rc3' of ↵Linus Torvalds2012-10-261-1/+2
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI fixes from Rafael J Wysocki: - Fix for a memory leak in acpi_bind_one() from Jesper Juhl. - Fix for an error code path memory leak in pm_genpd_attach_cpuidle() from Jonghwan Choi. - Fix for smp_processor_id() usage in preemptible code in powernow-k8 from Andreas Herrmann. - Fix for a suspend-related memory leak in cpufreq stats from Xiaobing Tu. - Freezer fix for failure to clear PF_NOFREEZE along with PF_KTHREAD in flush_old_exec() from Oleg Nesterov. - acpi_processor_notify() fix from Alan Cox. * tag 'pm+acpi-for-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: missing break freezer: exec should clear PF_NOFREEZE along with PF_KTHREAD Fix memory leak in cpufreq stats. cpufreq / powernow-k8: Remove usage of smp_processor_id() in preemptible code PM / Domains: Fix memory leak on error path in pm_genpd_attach_cpuidle ACPI: Fix memory leak in acpi_bind_one()
| * | | | freezer: exec should clear PF_NOFREEZE along with PF_KTHREADOleg Nesterov2012-10-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | flush_old_exec() clears PF_KTHREAD but forgets about PF_NOFREEZE. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | | Merge tag 'driver-core-3.7-rc3' of ↵Linus Torvalds2012-10-261-8/+8
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg Kroah-Hartman: "Here are a number of firmware core fixes for 3.7, and some other minor fixes. And some documentation updates thrown in for good measure. All have been in the linux-next tree for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'driver-core-3.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: Documentation:Chinese translation of Documentation/arm64/memory.txt Documentation:Chinese translation of Documentation/arm64/booting.txt Documentation:Chinese translation of Documentation/IRQ.txt firmware loader: document kernel direct loading sysfs: sysfs_pathname/sysfs_add_one: Use strlcat() instead of strcat() dynamic_debug: Remove unnecessary __used firmware loader: sync firmware cache by async_synchronize_full_domain firmware loader: let direct loading back on 'firmware_buf' firmware loader: fix one reqeust_firmware race firmware loader: cancel uncache work before caching firmware
| * | | | | sysfs: sysfs_pathname/sysfs_add_one: Use strlcat() instead of strcat()Geert Uytterhoeven2012-10-241-8/+8
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The warning check for duplicate sysfs entries can cause a buffer overflow when printing the warning, as strcat() doesn't check buffer sizes. Use strlcat() instead. Since strlcat() doesn't return a pointer to the passed buffer, unlike strcat(), I had to convert the nested concatenation in sysfs_add_one() to an admittedly more obscure comma operator construct, to avoid emitting code for the concatenation if CONFIG_BUG is disabled. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | | VFS: don't do protected {sym,hard}links by defaultLinus Torvalds2012-10-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 800179c9b8a1 ("This adds symlink and hardlink restrictions to the Linux VFS"), the new link protections were enabled by default, in the hope that no actual application would care, despite it being technically against legacy UNIX (and documented POSIX) behavior. However, it does turn out to break some applications. It's rare, and it's unfortunate, but it's unacceptable to break existing systems, so we'll have to default to legacy behavior. In particular, it has broken the way AFD distributes files, see http://www.dwd.de/AFD/ along with some legacy scripts. Distributions can end up setting this at initrd time or in system scripts: if you have security problems due to link attacks during your early boot sequence, you have bigger problems than some kernel sysctl setting. Do: echo 1 > /proc/sys/fs/protected_symlinks echo 1 > /proc/sys/fs/protected_hardlinks to re-enable the link protections. Alternatively, we may at some point introduce a kernel config option that sets these kinds of "more secure but not traditional" behavioural options automatically. Reported-by: Nick Bowler <nbowler@elliptictech.com> Reported-by: Holger Kiehl <Holger.Kiehl@dwd.de> Cc: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # v3.6 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-10-2611-105/+199
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "This has our series of fixes for the next rc. The biggest batch is from Jan Schmidt, fixing up some problems in our subvolume quota code and fixing btrfs send/receive to work with the new extended inode refs." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: do not bug when we fail to commit the transaction Btrfs: fix memory leak when cloning root's node Btrfs: Use btrfs_update_inode_fallback when creating a snapshot Btrfs: Send: preserve ownership (uid and gid) also for symlinks. Btrfs: fix deadlock caused by the nested chunk allocation btrfs: Return EINVAL when length to trim is less than FSB Btrfs: fix memory leak in btrfs_quota_enable() Btrfs: send correct rdev and mode in btrfs-send Btrfs: extended inode refs support for send mechanism Btrfs: Fix wrong error handling code Fix a sign bug causing invalid memory access in the ino_paths ioctl. Btrfs: comment for loop in tree_mod_log_insert_move Btrfs: fix extent buffer reference for tree mod log roots Btrfs: determine level of old roots Btrfs: tree mod log's old roots could still be part of the tree Btrfs: fix a tree mod logging issue for root replacement operations Btrfs: don't put removals from push_node_left into tree mod log twice
| * | | | | Btrfs: do not bug when we fail to commit the transactionJosef Bacik2012-10-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We BUG if we fail to commit the transaction when creating a snapshot, which is just obnoxious. Remove the BUG_ON(). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | Btrfs: fix memory leak when cloning root's nodeLiu Bo2012-10-251-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After cloning root's node, we forgot to dec the src's ref which can lead to a memory leak. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * | | | | Merge branch 'for-chris-fixed' of git://git.jan-o-sch.net/btrfs-unstableChris Mason2012-10-253-18/+55
| |\ \ \ \ \
| | * | | | | Btrfs: comment for loop in tree_mod_log_insert_moveJan Schmidt2012-10-241-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Emphasis the way tree_mod_log_insert_move avoids adding MOD_LOG_KEY_REMOVE_WHILE_MOVING operations, depending on the direction of the move operation. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| | * | | | | Btrfs: fix extent buffer reference for tree mod log rootsJan Schmidt2012-10-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In get_old_root we grab a lock on the extent buffer before we obtain a reference on that buffer. That order is changed now. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| | * | | | | Btrfs: determine level of old rootsJan Schmidt2012-10-243-3/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In btrfs_find_all_roots' termination condition, we compare the level of the old buffer we got from btrfs_search_old_slot to the level of the current root node. We'd better compare it to the level of the rewinded root node. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| | * | | | | Btrfs: tree mod log's old roots could still be part of the treeJan Schmidt2012-10-241-4/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tree mod log treated old root buffers as always empty buffers when starting the rewind operations. However, the old root may still be part of the current tree at a lower level, with still some valid entries. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| | * | | | | Btrfs: fix a tree mod logging issue for root replacement operationsJan Schmidt2012-10-231-8/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid the implicit free by tree_mod_log_set_root_pointer, which is wrong in two places. Where needed, we call tree_mod_log_free_eb explicitly now. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| | * | | | | Btrfs: don't put removals from push_node_left into tree mod log twiceJan Schmidt2012-10-231-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Independant of the check (push_items < src_items) tree_mod_log_eb_copy did log the removal of the old data entries from the source buffer. Therefore, we must not call tree_mod_log_eb_move if the check evaluates to true, as that would log the removal twice, finally resulting in (rewinded) buffers with wrong values for header_nritems. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * | | | | | Btrfs: Use btrfs_update_inode_fallback when creating a snapshotJosef Bacik2012-10-253-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a really full file system I was getting ENOSPC back from btrfs_update_inode when trying to update the parent inode when creating a snapshot. Just use the fallback method so we can update the inode and not have to worry about having a delayed ref. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
| * | | | | | Btrfs: Send: preserve ownership (uid and gid) also for symlinks.Alex Lyakas2012-10-251-14/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch also requires a change in the user-space part of "receive". We need to use "lchown" instead of "chown". We will do this in the following patch. Signed-off-by: Alex Lyakas <alex.btrfs@zadarastorage.com> if (S_ISREG(sctx->cur_inode_mode)) {
| * | | | | | Btrfs: fix deadlock caused by the nested chunk allocationMiao Xie2012-10-251-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Steps to reproduce: # mkfs.btrfs -m raid1 <disk1> <disk2> # btrfstune -S 1 <disk1> # mount <disk1> <mnt> # btrfs device add <disk3> <disk4> <mnt> # mount -o remount,rw <mnt> # dd if=/dev/zero of=<mnt>/tmpfile bs=1M count=1 Deadlock happened. It is because of the nested chunk allocation. When we wrote the data into the filesystem, we would allocate the data chunk because there was no data chunk in the filesystem. At the end of the data chunk allocation, we should insert the metadata of the data chunk into the extent tree, but there was no raid1 chunk, so we tried to lock the chunk allocation mutex to allocate the new chunk, but we had held the mutex, the deadlock happened. By rights, we would allocate the raid1 chunk when we added the second device because the profile of the seed filesystem is raid1 and we had two devices. But we didn't do that in fact. It is because the last step of the first device insertion didn't commit the transaction. So when we added the second device, we didn't cow the tree, and just inserted the relative metadata into the leaves which were generated by the first device insertion, and its profile was dup. So, I fix this problem by commiting the transaction at the end of the first device insertion. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
| * | | | | | btrfs: Return EINVAL when length to trim is less than FSBLukas Czerner2012-10-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently if len argument in btrfs_ioctl_fitrim() is smaller than one FSB we will continue and finally return 0 bytes discarded. However if the length to discard is smaller then file system block we should really return EINVAL. Signed-off-by: Lukas Czerner <lczerner@redhat.com>
| * | | | | | Btrfs: fix memory leak in btrfs_quota_enable()Tsutomu Itoh2012-10-251-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should free quota_root before returning from the error handling code. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
| * | | | | | Btrfs: send correct rdev and mode in btrfs-sendArne Jansen2012-10-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When sending a device file, the stream was missing the mode. Also the rdev was encoded wrongly. Signed-off-by: Arne Jansen <sensille@gmx.net>
| * | | | | | Btrfs: extended inode refs support for send mechanismJan Schmidt2012-10-253-58/+94
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for the new extended inode refs to btrfs send. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * | | | | | Btrfs: Fix wrong error handling codeStefan Behrens2012-10-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gcc says "warning: comparison of unsigned expression >= 0 is always true" because i is an unsigned long. And gcc is right this time. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
| * | | | | | Fix a sign bug causing invalid memory access in the ino_paths ioctl.Gabriel de Perthuis2012-10-251-1/+1
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To see the problem, create many hardlinks to the same file (120 should do it), then look up paths by inode with: ls -i btrfs inspect inode-resolve -v $ino /mnt/btrfs I noticed the memory layout of the fspath->val data had some irregularities (some unnecessary gaps that stop appearing about halfway), so I'm not sure there aren't any bugs left in it.