summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* cifs: always revalidate hardlinked inodes when using noserverinoJeff Layton2010-05-171-0/+5
| | | | | | | | | The old cifs_revalidate logic always revalidated hardlinked inodes. This hack allowed CIFS to pass some connectathon tests when server inode numbers aren't used (basic test7, in particular). Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
* Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6Steve French2010-05-13101-468/+975
|\ | | | | | | | | Conflicts: fs/cifs/inode.c
| * Merge branch 'for-linus' of ↵Linus Torvalds2010-05-132-2/+20
| |\ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: guard against hardlinking directories
| | * cifs: guard against hardlinking directoriesJeff Layton2010-05-112-2/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we made serverino the default, we trusted that the field sent by the server in the "uniqueid" field was actually unique. It turns out that it isn't reliably so. Samba, in particular, will just put the st_ino in the uniqueid field when unix extensions are enabled. When a share spans multiple filesystems, it's quite possible that there will be collisions. This is a server bug, but when the inodes in question are a directory (as is often the case) and there is a collision with the root inode of the mount, the result is a kernel panic on umount. Fix this by checking explicitly for directory inodes with the same uniqueid. If that is the case, then we can assume that using server inode numbers will be a problem and that they should be disabled. Fixes Samba bugzilla 7407 Signed-off-by: Jeff Layton <jlayton@redhat.com> CC: Stable <stable@kernel.org> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | vfs: Fix O_NOFOLLOW behavior for paths with trailing slashesJan Kara2010-05-131-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to specification mkdir d; ln -s d a; open("a/", O_NOFOLLOW | O_RDONLY) should return success but currently it returns ELOOP. This is a regression caused by path lookup cleanup patch series. Fix the code to ignore O_NOFOLLOW in case the provided path has trailing slashes. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de> Acked-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | Merge branch 'for-linus' of ↵Linus Torvalds2010-05-1212-49/+116
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: preserve seq # on requeued messages after transient transport errors ceph: fix cap removal races ceph: zero unused message header, footer fields ceph: fix locking for waking session requests after reconnect ceph: resubmit requests on pg mapping change (not just primary change) ceph: fix open file counting on snapped inodes when mds returns no caps ceph: unregister osd request on failure ceph: don't use writeback_control in writepages completion ceph: unregister bdi before kill_anon_super releases device name
| | * | ceph: preserve seq # on requeued messages after transient transport errorsSage Weil2010-05-112-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the tcp connection drops and we reconnect to reestablish a stateful session (with the mds), we need to resend previously sent (and possibly received) messages with the _same_ seq # so that they can be dropped on the other end if needed. Only assign a new seq once after the message is queued. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | ceph: fix cap removal racesSage Weil2010-05-112-9/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The iterate_session_caps helper traverses the session caps list and tries to grab an inode reference. However, the __ceph_remove_cap was clearing the inode backpointer _before_ removing itself from the session list, causing a null pointer dereference. Clear cap->ci under protection of s_cap_lock to avoid the race, and to tightly couple the list and backpointer state. Use a local flag to indicate whether we are releasing the cap, as cap->session may be modified by a racing thread in iterate_session_caps. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | ceph: zero unused message header, footer fieldsSage Weil2010-05-111-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We shouldn't leak any prior memory contents to other parties. And random data, particularly in the 'version' field, can cause problems down the line. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | ceph: fix locking for waking session requests after reconnectSage Weil2010-05-111-13/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The session->s_waiting list is protected by mdsc->mutex, not s_mutex. This was causing (rare) s_waiting list corruption. Fix errors paths too, while we're here. A more thorough cleanup of this function is coming soon. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | ceph: resubmit requests on pg mapping change (not just primary change)Sage Weil2010-05-115-9/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OSD requests need to be resubmitted on any pg mapping change, not just when the pg primary changes. Resending only when the primary changes results in occasional 'hung' requests during osd cluster recovery or rebalancing. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | ceph: fix open file counting on snapped inodes when mds returns no capsSage Weil2010-05-111-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's possible the MDS will not issue caps on a snapped inode, in which case an open request may not __ceph_get_fmode(), botching the open file counting. (This is actually a server bug, but the client shouldn't BUG out in this case.) Signed-off-by: Sage Weil <sage@newdream.net>
| | * | ceph: unregister osd request on failureSage Weil2010-05-111-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The osd request wasn't being unregistered when the osd returned a failure code, even though the result was returned to the caller. This would cause it to eventually time out, and then crash the kernel when it tried to resend the request using a stale page vector. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | ceph: don't use writeback_control in writepages completionSage Weil2010-05-052-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ->writepages writeback_control is not still valid in the writepages completion. We were touching it solely to adjust pages_skipped when there was a writeback error (EIO, ENOSPC, EPERM due to bad osd credentials), causing an oops in the writeback code shortly thereafter. Updating pages_skipped on error isn't correct anyway, so let's just rip out this (clearly broken) code to pass the wbc to the completion. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | ceph: unregister bdi before kill_anon_super releases device nameSage Weil2010-05-041-7/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unregister and destroy the bdi in put_super, after mount is r/o, but before put_anon_super releases the device name. For symmetry, bdi_destroy in destroy_client (we bdi_init in create_client). Only set s_bdi if bdi_register succeeds, since we use it to decide whether to bdi_unregister. Signed-off-by: Sage Weil <sage@newdream.net>
| * | | CacheFiles: Fix error handling in cachefiles_determine_cache_security()David Howells2010-05-121-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cachefiles_determine_cache_security() is expected to return with a security override in place. However, if set_create_files_as() fails, we fail to do this. In this case, we should just reinstate the security override that was set by the caller. Furthermore, if set_create_files_as() fails, we should dispose of the new credentials we were in the process of creating. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | revert "procfs: provide stack information for threads" and its fixup commitsRobin Holt2010-05-114-25/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Originally, commit d899bf7b ("procfs: provide stack information for threads") attempted to introduce a new feature for showing where the threadstack was located and how many pages are being utilized by the stack. Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was applied to fix the NO_MMU case. Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on 64-bit") was applied to fix a bug in ia32 executables being loaded. Commit 9ebd4eba7 ("procfs: fix /proc/<pid>/stat stack pointer for kernel threads") was applied to fix a bug which had kernel threads printing a userland stack address. Commit 1306d603f ('proc: partially revert "procfs: provide stack information for threads"') was then applied to revert the stack pages being used to solve a significant performance regression. This patch nearly undoes the effect of all these patches. The reason for reverting these is it provides an unusable value in field 28. For x86_64, a fork will result in the task->stack_start value being updated to the current user top of stack and not the stack start address. This unpredictability of the stack_start value makes it worthless. That includes the intended use of showing how much stack space a thread has. Other architectures will get different values. As an example, ia64 gets 0. The do_fork() and copy_process() functions appear to treat the stack_start and stack_size parameters as architecture specific. I only partially reverted c44972f1 ("procfs: disable per-task stack usage on NOMMU") . If I had completely reverted it, I would have had to change mm/Makefile only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is configured. Since I could not test the builds without significant effort, I decided to not change mm/Makefile. I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on 64-bit") . I left the KSTK_ESP() change in place as that seemed worthwhile. Signed-off-by: Robin Holt <holt@sgi.com> Cc: Stefani Seibold <stefani@seibold.net> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | CacheFiles: Fix occasional EIO on call to vfs_unlink()David Howells2010-05-112-12/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix an occasional EIO returned by a call to vfs_unlink(): [ 4868.465413] CacheFiles: I/O Error: Unlink failed [ 4868.465444] FS-Cache: Cache cachefiles stopped due to I/O error [ 4947.320011] CacheFiles: File cache on md3 unregistering [ 4947.320041] FS-Cache: Withdrawing cache "mycache" [ 5127.348683] FS-Cache: Cache "mycache" added (type cachefiles) [ 5127.348716] CacheFiles: File cache on md3 registered [ 7076.871081] CacheFiles: I/O Error: Unlink failed [ 7076.871130] FS-Cache: Cache cachefiles stopped due to I/O error [ 7116.780891] CacheFiles: File cache on md3 unregistering [ 7116.780937] FS-Cache: Withdrawing cache "mycache" [ 7296.813394] FS-Cache: Cache "mycache" added (type cachefiles) [ 7296.813432] CacheFiles: File cache on md3 registered What happens is this: (1) A cached NFS file is seen to have become out of date, so NFS retires the object and immediately acquires a new object with the same key. (2) Retirement of the old object is done asynchronously - so the lookup/create to generate the new object may be done first. This can be a problem as the old object and the new object must exist at the same point in the backing filesystem (i.e. they must have the same pathname). (3) The lookup for the new object sees that a backing file already exists, checks to see whether it is valid and sees that it isn't. It then deletes that file and creates a new one on disk. (4) The retirement phase for the old file is then performed. It tries to delete the dentry it has, but ext4_unlink() returns -EIO because the inode attached to that dentry no longer matches the inode number associated with the filename in the parent directory. The trace below shows this quite well. [md5sum] ==> __fscache_relinquish_cookie(ffff88002d12fb58{NFS.fh,ffff88002ce62100},1) [md5sum] ==> __fscache_acquire_cookie({NFS.server},{NFS.fh},ffff88002ce62100) NFS has retired the old cookie and asked for a new one. [kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_ACTIVE,24}) [kslowd] <== fscache_object_state_machine() [->OBJECT_DYING] [kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_INIT,0}) [kslowd] <== fscache_object_state_machine() [->OBJECT_LOOKING_UP] [kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_DYING,24}) [kslowd] <== fscache_object_state_machine() [->OBJECT_RECYCLING] The old object (OBJ52) is going through the terminal states to get rid of it, whilst the new object - (OBJ53) - is coming into being. [kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_LOOKING_UP,0}) [kslowd] ==> cachefiles_walk_to_object({ffff88003029d8b8},OBJ53,@68,) [kslowd] lookup '@68' [kslowd] next -> ffff88002ce41bd0 positive [kslowd] advance [kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA' [kslowd] next -> ffff8800369faac8 positive The new object has looked up the subdir in which the file would be in (getting dentry ffff88002ce41bd0) and then looked up the file itself (getting dentry ffff8800369faac8). [kslowd] validate 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA' [kslowd] ==> cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA') [kslowd] remove ffff8800369faac8 from ffff88002ce41bd0 [kslowd] unlink stale object [kslowd] <== cachefiles_bury_object() = 0 It then checks the file's xattrs to see if it's valid. NFS says that the auxiliary data indicate the file is out of date (obvious to us - that's why NFS ditched the old version and got a new one). CacheFiles then deletes the old file (dentry ffff8800369faac8). [kslowd] redo lookup [kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA' [kslowd] next -> ffff88002cd94288 negative [kslowd] create -> ffff88002cd94288{ffff88002cdaf238{ino=148247}} CacheFiles then redoes the lookup and gets a negative result in a new dentry (ffff88002cd94288) which it then creates a file for. [kslowd] ==> cachefiles_mark_object_active(,OBJ53) [kslowd] <== cachefiles_mark_object_active() = 0 [kslowd] === OBTAINED_OBJECT === [kslowd] <== cachefiles_walk_to_object() = 0 [148247] [kslowd] <== fscache_object_state_machine() [->OBJECT_AVAILABLE] The new object is then marked active and the state machine moves to the available state - at which point NFS can start filling the object. [kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_RECYCLING,20}) [kslowd] ==> fscache_release_object() [kslowd] ==> cachefiles_drop_object({OBJ52,2}) [kslowd] ==> cachefiles_delete_object(,OBJ52{ffff8800369faac8}) The old object, meanwhile, goes on with being retired. If allocation occurs first, cachefiles_delete_object() has to wait for dir->d_inode->i_mutex to become available before it can continue. [kslowd] ==> cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA') [kslowd] remove ffff8800369faac8 from ffff88002ce41bd0 [kslowd] unlink stale object EXT4-fs warning (device sda6): ext4_unlink: Inode number mismatch in unlink (148247!=148193) CacheFiles: I/O Error: Unlink failed FS-Cache: Cache cachefiles stopped due to I/O error CacheFiles then tries to delete the file for the old object, but the dentry it has (ffff8800369faac8) no longer points to a valid inode for that directory entry, and so ext4_unlink() returns -EIO when de->inode does not match i_ino. [kslowd] <== cachefiles_bury_object() = -5 [kslowd] <== cachefiles_delete_object() = -5 [kslowd] <== fscache_object_state_machine() [->OBJECT_DEAD] [kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_AVAILABLE,0}) [kslowd] <== fscache_object_state_machine() [->OBJECT_ACTIVE] (Note that the above trace includes extra information beyond that produced by the upstream code). The fix is to note when an object that is being retired has had its object deleted preemptively by a replacement object that is being created, and to skip the second removal attempt in such a case. Reported-by: Greg M <gregm@servu.net.au> Reported-by: Mark Moseley <moseleymark@gmail.com> Reported-by: Romain DEGEZ <romain.degez@smartjog.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | autofs4-2.6.34-rc1 - fix link_count usageIan Kent2010-05-101-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit 1f36f774b2 ("Switch !O_CREAT case to use of do_last()") in 2.6.34-rc1 autofs direct mounts stopped working. This is caused by current->link_count being 0 when ->follow_link() is called from do_filp_open(). I can't work out why this hasn't been seen before Als patch series. This patch removes the autofs dependence on current->link_count. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds2010-05-071-35/+51
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Fix RCU issues in the NFSv4 delegation code NFSv4: Fix the locking in nfs_inode_reclaim_delegation()
| | * | | NFS: Fix RCU issues in the NFSv4 delegation codeDavid Howells2010-05-011-21/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a number of RCU issues in the NFSv4 delegation code. (1) delegation->cred doesn't need to be RCU protected as it's essentially an invariant refcounted structure. By the time we get to nfs_free_delegation(), the delegation is being released, so no one else should be attempting to use the saved credentials, and they can be cleared. However, since the list of delegations could still be under traversal at this point by such as nfs_client_return_marked_delegations(), the cred should be released in nfs_do_free_delegation() rather than in nfs_free_delegation(). Simply using rcu_assign_pointer() to clear it is insufficient as that doesn't stop the cred from being destroyed, and nor does calling put_rpccred() after call_rcu(), given that the latter is asynchronous. (2) nfs_detach_delegation_locked() and nfs_inode_set_delegation() should use rcu_derefence_protected() because they can only be called if nfs_client::cl_lock is held, and that guards against anyone changing nfsi->delegation under it. Furthermore, the barrier imposed by rcu_dereference() is superfluous, given that the spin_lock() is also a barrier. (3) nfs_detach_delegation_locked() is now passed a pointer to the nfs_client struct so that it can issue lockdep advice based on clp->cl_lock for (2). (4) nfs_inode_return_delegation_noreclaim() and nfs_inode_return_delegation() should use rcu_access_pointer() outside the spinlocked region as they merely examine the pointer and don't follow it, thus rendering unnecessary the need to impose a partial ordering over the one item of interest. These result in an RCU warning like the following: [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- fs/nfs/delegation.c:332 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by mount.nfs4/2281: #0: (&type->s_umount_key#34){+.+...}, at: [<ffffffff810b25b4>] deactivate_super+0x60/0x80 #1: (iprune_sem){+.+...}, at: [<ffffffff810c332a>] invalidate_inodes+0x39/0x13a stack backtrace: Pid: 2281, comm: mount.nfs4 Not tainted 2.6.34-rc1-cachefs #110 Call Trace: [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2 [<ffffffffa00b4591>] nfs_inode_return_delegation_noreclaim+0x5b/0xa0 [nfs] [<ffffffffa0095d63>] nfs4_clear_inode+0x11/0x1e [nfs] [<ffffffff810c2d92>] clear_inode+0x9e/0xf8 [<ffffffff810c3028>] dispose_list+0x67/0x10e [<ffffffff810c340d>] invalidate_inodes+0x11c/0x13a [<ffffffff810b1dc1>] generic_shutdown_super+0x42/0xf4 [<ffffffff810b1ebe>] kill_anon_super+0x11/0x4f [<ffffffffa009893c>] nfs4_kill_super+0x3f/0x72 [nfs] [<ffffffff810b25bc>] deactivate_super+0x68/0x80 [<ffffffff810c6744>] mntput_no_expire+0xbb/0xf8 [<ffffffff810c681b>] release_mounts+0x9a/0xb0 [<ffffffff810c689b>] put_mnt_ns+0x6a/0x79 [<ffffffffa00983a1>] nfs_follow_remote_path+0x5a/0x146 [nfs] [<ffffffffa0098334>] ? nfs_do_root_mount+0x82/0x95 [nfs] [<ffffffffa00985a9>] nfs4_try_mount+0x75/0xaf [nfs] [<ffffffffa0098874>] nfs4_get_sb+0x291/0x31a [nfs] [<ffffffff810b2059>] vfs_kern_mount+0xb8/0x177 [<ffffffff810b2176>] do_kern_mount+0x48/0xe8 [<ffffffff810c810b>] do_mount+0x782/0x7f9 [<ffffffff810c8205>] sys_mount+0x83/0xbe [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b Also on: fs/nfs/delegation.c:215 invoked rcu_dereference_check() without protection! [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2 [<ffffffffa00b4223>] nfs_inode_set_delegation+0xfe/0x219 [nfs] [<ffffffffa00a9c6f>] nfs4_opendata_to_nfs4_state+0x2c2/0x30d [nfs] [<ffffffffa00aa15d>] nfs4_do_open+0x2a6/0x3a6 [nfs] ... And: fs/nfs/delegation.c:40 invoked rcu_dereference_check() without protection! [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2 [<ffffffffa00b3bef>] nfs_free_delegation+0x3d/0x6e [nfs] [<ffffffffa00b3e71>] nfs_do_return_delegation+0x26/0x30 [nfs] [<ffffffffa00b406a>] __nfs_inode_return_delegation+0x1ef/0x1fe [nfs] [<ffffffffa00b448a>] nfs_client_return_marked_delegations+0xc9/0x124 [nfs] ... Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * | | NFSv4: Fix the locking in nfs_inode_reclaim_delegation()Trond Myklebust2010-05-011-14/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure that we correctly rcu-dereference the delegation itself, and that we protect against removal while we're changing the contents. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| * | | | Merge branch 'fixes' of ↵Linus Torvalds2010-05-048-74/+110
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Avoid a gcc warning in ocfs2_wipe_inode(). ocfs2: Avoid direct write if we fall back to buffered I/O ocfs2_dlmfs: Fix math error when reading LVB. ocfs2: Update VFS inode's id info after reflink. ocfs2: potential ERR_PTR dereference on error paths ocfs2: Add directory entry later in ocfs2_symlink() and ocfs2_mknod() ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_mknod error path ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_symlink error path ocfs2: add OCFS2_INODE_SKIP_ORPHAN_DIR flag and honor it in the inode wipe code ocfs2: Reset status if we want to restart file extension. ocfs2: Compute metaecc for superblocks during online resize. ocfs2: Check the owner of a lockres inside the spinlock ocfs2: one more warning fix in ocfs2_file_aio_write(), v2 ocfs2_dlmfs: User DLM_* when decoding file open flags.
| | * | | | ocfs2: Avoid a gcc warning in ocfs2_wipe_inode().Joel Becker2010-05-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gcc warns that a variable is uninitialized. It's actually handled, but an early return fools gcc. Let's just initialize the variable to a garbage value that will crash if the usage is ever broken. Signed-off-by: Joel Becker <joel.becker@oracle.com>
| | * | | | ocfs2: Avoid direct write if we fall back to buffered I/OLi Dongyang2010-04-301-11/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | when we fall back to buffered write from direct write, we call __generic_file_aio_write() but that will end up doing direct write even we are only prepared to do buffered write because the file has the O_DIRECT flag set. This is a fix for https://bugzilla.novell.com/show_bug.cgi?id=591039 revised with Joel's comments. Signed-off-by: Li Dongyang <lidongyang@novell.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| | * | | | Merge branch 'skip_delete_inode' of ↵Joel Becker2010-04-30387-905/+1370
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2-mark into ocfs2-fixes
| | | * | | | ocfs2: Add directory entry later in ocfs2_symlink() and ocfs2_mknod()Mark Fasheh2010-04-231-15/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we get a failure during creation of an inode we'll allow the orphan code to remove the inode, which is correct. However, we need to ensure that we don't get any errors after the call to ocfs2_add_entry(), otherwise we could leave a dangling directory reference. The solution is simple - in both cases, all I had to do was move ocfs2_dentry_attach_lock() above the ocfs2_add_entry() call. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
| | | * | | | ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_mknod error pathLi Dongyang2010-04-231-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark the inode with flag OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_mknod, so we can kill the inode in case of error. [ Fixed up comment style -Mark ] Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
| | | * | | | ocfs2: use OCFS2_INODE_SKIP_ORPHAN_DIR in ocfs2_symlink error pathLi Dongyang2010-04-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark the inode with flag OCFS2_INODE_SKIP_ORPHAN_DIR when we get an error after allocating one, so that we can kill the inode. Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
| | | * | | | ocfs2: add OCFS2_INODE_SKIP_ORPHAN_DIR flag and honor it in the inode wipe codeLi Dongyang2010-04-233-29/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently in the error path of ocfs2_symlink and ocfs2_mknod, we just call iput with the inode we failed with, but the inode wipe code will complain because we don't add the inode to orphan dir. One solution would be to lock the orphan dir during the entire transaction, but that's too heavy for a rare error path. Instead, we add a flag, OCFS2_INODE_SKIP_ORPHAN_DIR which tells the inode wipe code that it won't find this inode in the orphan dir. [ Merge fixes and comment style cleanups -Mark ] Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
| | * | | | | ocfs2_dlmfs: Fix math error when reading LVB.Joel Becker2010-04-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When asked for a partial read of the LVB in a dlmfs file, we can accidentally calculate a negative count. Reported-by: Dan Carpenter <error27@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| | * | | | | ocfs2: Update VFS inode's id info after reflink.Tao Ma2010-04-231-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In reflink we update the id info on the disk but forgot to update the corresponding information in the VFS inode. Update them accordingly when we want to preserve the attributes. Reported-by: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Cc: <stable@kernel.org> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| | * | | | | ocfs2: potential ERR_PTR dereference on error pathsDan Carpenter2010-04-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If "handle" is non null at the end of the function then we assume it's a valid pointer and pass it to ocfs2_commit_trans(); Signed-off-by: Dan Carpenter <error27@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| | * | | | | ocfs2: Reset status if we want to restart file extension.Tao Ma2010-04-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In __ocfs2_extend_allocation, we will restart our file extension if ((!status) && restart_func). But there is a bug that the status is still left as -EGAIN. This is really an old bug, but it is masked by the return value of ocfs2_journal_dirty. So it show up when we make ocfs2_journal_dirty void. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| | * | | | | ocfs2: Compute metaecc for superblocks during online resize.Joel Becker2010-03-311-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Online resize writes out the new superblock and its backups directly. The metaecc data wasn't being recomputed. Let's do that directly. Signed-off-by: Joel Becker <joel.becker@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com>[ Cc: stable@kernel.org
| | * | | | | ocfs2: Check the owner of a lockres inside the spinlockWengang Wang2010-03-301-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The checking of lockres owner in dlm_update_lvb() is not inside spinlock protection. I don't see problem in current call path of dlm_update_lvb(). But just for code robustness. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| | * | | | | ocfs2: one more warning fix in ocfs2_file_aio_write(), v2Coly Li2010-03-301-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes another compiling warning in ocfs2_file_aio_write() like this, fs/ocfs2/file.c: In function ‘ocfs2_file_aio_write’: fs/ocfs2/file.c:2026: warning: suggest parentheses around ‘&&’ within ‘||’ As Joel suggested, '!ret' is unary, this version removes the wrap from '!ret'. Signed-off-by: Coly Li <coly.li@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| | * | | | | ocfs2_dlmfs: User DLM_* when decoding file open flags.Tao Ma2010-03-301-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 0016eedc4185a3cd7e578b027a6e69001b85d6c4, we have changed dlmfs to use stackglue. So when use DLM* when we decode dlm flags from open level. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2010-05-0312-38/+71
| |\ \ \ \ \ \ | | | |_|_|/ / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: remove bad auth_x kmem_cache ceph: fix lockless caps check ceph: clear dir complete, invalidate dentry on replayed rename ceph: fix direct io truncate offset ceph: discard incoming messages with bad seq # ceph: fix seq counting for skipped messages ceph: add missing #includes ceph: fix leaked spinlock during mds reconnect ceph: print more useful version info on module load ceph: fix snap realm splits ceph: clear dir complete on d_move
| | * | | | | ceph: remove bad auth_x kmem_cacheSage Weil2010-05-031-22/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's useless, since our allocations are already a power of 2. And it was allocated per-instance (not globally), which caused a name collision when we tried to mount a second file system with auth_x enabled. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | ceph: fix lockless caps checkSage Weil2010-05-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The __ variant requires caller to hold i_lock. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | ceph: clear dir complete, invalidate dentry on replayed renameSage Weil2010-05-031-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a rename operation is resent to the MDS following an MDS restart, the client does not get a full reply (containing the resulting metadata) back. In that case, a ceph_rename() needs to compensate by doing anything useful that fill_inode() would have, like d_move(). It also needs to invalidate the dentry (to workaround the vfs_rename_dir() bug) and clear the dir complete flag, just like fill_trace(). Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | ceph: fix direct io truncate offsetSage Weil2010-05-031-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | truncate_inode_pages_range wants the end offset to align with the last byte in a page. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | ceph: discard incoming messages with bad seq #Sage Weil2010-05-031-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can get old message seq #'s after a tcp reconnect for stateful sessions (i.e., the MDS). If we get a higher seq #, that is an error, and we shouldn't see any bad seq #'s for stateless (mon, osd) connections. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | ceph: fix seq counting for skipped messagesSage Weil2010-05-031-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Increment in_seq even when the message is skipped for some reason. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | ceph: add missing #includesSage Weil2010-05-033-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | ceph: fix leaked spinlock during mds reconnectSage Weil2010-05-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | ceph: print more useful version info on module loadSage Weil2010-05-031-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Decouple the client version from the server side. Print relevant protocol and map version info instead. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | ceph: fix snap realm splitsSage Weil2010-05-031-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The snap realm split was checking i_snap_realm, not the list_head, to determine if an inode belonged in the new realm. The check always failed, which meant we always moved the inode, corrupting the old realm's list and causing various crashes. Also wait to release old realm reference to avoid possibility of use after free. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | ceph: clear dir complete on d_moveSage Weil2010-05-031-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | d_move() reorders the d_subdirs list, breaking the readdir result caching. Unless/until d_move preserves that ordering, clear CEPH_I_COMPLETE on rename. Signed-off-by: Sage Weil <sage@newdream.net>