summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'driver-core-next' into cgroup/for-3.15Tejun Heo2014-02-0812-364/+675
|\ | | | | | | | | | | | | Pending kernfs conversion depends on kernfs improvements in driver-core-next. Pull it into for-3.15. Signed-off-by: Tejun Heo <tj@kernel.org>
| * kernfs: add CONFIG_KERNFSTejun Heo2014-02-074-1/+11
| | | | | | | | | | | | | | | | | | | | | | As sysfs was kernfs's only user, kernfs has been piggybacking on CONFIG_SYSFS; however, kernfs is scheduled to grow a new user very soon. Introduce a separate config option CONFIG_KERNFS which is to be selected by kernfs users. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernfs: implement kernfs_get_parent(), kernfs_name/path() and friendsTejun Heo2014-02-072-41/+178
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernfs_node->parent and ->name are currently marked as "published" indicating that kernfs users may access them directly; however, those fields may get updated by kernfs_rename[_ns]() and unrestricted access may lead to erroneous values or oops. Protect ->parent and ->name updates with a irq-safe spinlock kernfs_rename_lock and implement the following accessors for these fields. * kernfs_name() - format the node's name into the specified buffer * kernfs_path() - format the node's path into the specified buffer * pr_cont_kernfs_name() - pr_cont a node's name (doesn't need buffer) * pr_cont_kernfs_path() - pr_cont a node's path (doesn't need buffer) * kernfs_get_parent() - pin and return a node's parent All can be called under any context. The recursive sysfs_pathname() in fs/sysfs/dir.c is replaced with kernfs_path() and sysfs_rename_dir_ns() is updated to use kernfs_get_parent() instead of dereferencing parent directly. v2: Dummy definition of kernfs_path() for !CONFIG_KERNFS was missing static inline making it cause a lot of build warnings. Add it. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernfs: implement kernfs_node_from_dentry(), kernfs_root_from_sb() and ↵Tejun Heo2014-02-072-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernfs_rename() Implement helpers to determine node from dentry and root from super_block. Also add a kernfs_rename_ns() wrapper which assumes NULL namespace. These generally make sense and will be used by cgroup. v2: Some dummy implementations for !CONFIG_SYSFS was missing. Fixed. Reported by kbuild test robot. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernfs: implement kernfs_ops->atomic_write_lenTejun Heo2014-02-071-18/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A write to a kernfs_node is buffered through a kernel buffer. Writes <= PAGE_SIZE are performed atomically, while larger ones are executed in PAGE_SIZE chunks. While this is enough for sysfs, cgroup which is scheduled to be converted to use kernfs needs a bit more control over it. This patch adds kernfs_ops->atomic_write_len. If not set (zero), the behavior stays the same. If set, writes upto the size are executed atomically and larger writes are rejected with -E2BIG. A different implementation strategy would be allowing configuring chunking size while making the original write size available to the write method; however, such strategy, while being more complicated, doesn't really buy anything. If the write implementation has to handle chunking, the specific chunk size shouldn't matter all that much. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernfs: allow nodes to be created in the deactivated stateTejun Heo2014-02-072-8/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, kernfs_nodes are made visible to userland on creation, which makes it difficult for kernfs users to atomically succeed or fail creation of multiple nodes. In addition, if something fails after creating some nodes, the created nodes might already be in use and their active refs need to be drained for removal, which has the potential to introduce tricky reverse locking dependency on active_ref depending on how the error path is synchronized. This patch introduces per-root flag KERNFS_ROOT_CREATE_DEACTIVATED. If set, all nodes under the root are created in the deactivated state and stay invisible to userland until explicitly enabled by the new kernfs_activate() API. Also, nodes which have never been activated are guaranteed to bypass draining on removal thus allowing error paths to not worry about lockding dependency on active_ref draining. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernfs: add missing kernfs_active() checks in directory operationsTejun Heo2014-02-071-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernfs_iop_lookup(), kernfs_dir_pos() and kernfs_dir_next_pos() were missing kernfs_active() tests before using the found kernfs_node. As deactivated state is currently visible only while a node is being removed, this doesn't pose an actual problem. e.g. lookup succeeding on a deactivated node doesn't harm anything as the eventual file operations are gonna fail and those failures are indistinguishible from the cases in which the lookups had happened before the node was deactivated. However, we're gonna allow new nodes to be created deactivated and then activated explicitly by the kernfs user when it sees fit. This is to support atomically making multiple nodes visible to userland and thus those nodes must not be visible to userland before activated. Let's plug the lookup and readdir holes so that deactivated nodes are invisible to userland. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernfs: implement kernfs_syscall_ops->remount_fs() and ->show_options()Tejun Heo2014-02-071-0/+23
| | | | | | | | | | | | | | | | | | Add two super_block related syscall callbacks ->remount_fs() and ->show_options() to kernfs_syscall_ops. These simply forward the matching super_operations. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernfs: rename kernfs_dir_ops to kernfs_syscall_opsTejun Heo2014-02-071-12/+13
| | | | | | | | | | | | | | | | | | | | We're gonna need non-dir syscall callbacks, which will make dir_ops a misnomer. Let's rename kernfs_dir_ops to kernfs_syscall_ops. This is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernfs: invoke dir_ops while holding active ref of the target nodeTejun Heo2014-02-071-3/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernfs_dir_ops are currently being invoked without any active reference, which makes it tricky for the invoked operations to determine whether the objects associated those nodes are safe to access and will remain that way for the duration of such operations. kernfs already has active_ref mechanism to deal with this which makes the removal of a given node the synchronization point for gating the file operations. There's no reason for dir_ops to be any different. Update the dir_ops handling so that active_ref is held while the dir_ops are executing. This guarantees that while a dir_ops is executing the target nodes stay alive. As kernfs_dir_ops doesn't have any in-kernel user at this point, this doesn't affect anybody. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()Tejun Heo2014-02-071-92/+0
| | | | | | | | | | | | | | | | | | All device_schedule_callback_owner() users are converted to use device_remove_file_self(). Remove now unused {sysfs|device}_schedule_callback_owner(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappersTejun Heo2014-02-072-1/+160
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sometimes it's necessary to implement a node which wants to delete nodes including itself. This isn't straightforward because of kernfs active reference. While a file operation is in progress, an active reference is held and kernfs_remove() waits for all such references to drain before completing. For a self-deleting node, this is a deadlock as kernfs_remove() ends up waiting for an active reference that itself is sitting on top of. This currently is worked around in the sysfs layer using sysfs_schedule_callback() which makes such removals asynchronous. While it works, it's rather cumbersome and inherently breaks synchronicity of the operation - the file operation which triggered the operation may complete before the removal is finished (or even started) and the removal may fail asynchronously. If a removal operation is immmediately followed by another operation which expects the specific name to be available (e.g. removal followed by rename onto the same name), there's no way to make the latter operation reliable. The thing is there's no inherent reason for this to be asynchrnous. All that's necessary to do this synchronous is a dedicated operation which drops its own active ref and deactivates self. This patch implements kernfs_remove_self() and its wrappers in sysfs and driver core. kernfs_remove_self() is to be called from one of the file operations, drops the active ref the task is holding, removes the self node, and restores active ref to the dead node so that the ref is balanced afterwards. __kernfs_remove() is updated so that it takes an early exit if the target node is already fully removed so that the active ref restored by kernfs_remove_self() after removal doesn't confuse the deactivation path. This makes implementing self-deleting nodes very easy. The normal removal path doesn't even need to be changed to use kernfs_remove_self() for the self-deleting node. The method can invoke kernfs_remove_self() on itself before proceeding the normal removal path. kernfs_remove() invoked on the node by the normal deletion path will simply be ignored. This will replace sysfs_schedule_callback(). A subtle feature of sysfs_schedule_callback() is that it collapses multiple invocations - even if multiple removals are triggered, the removal callback is run only once. An equivalent effect can be achieved by testing the return value of kernfs_remove_self() - only the one which gets %true return value should proceed with actual deletion. All other instances of kernfs_remove_self() will wait till the enclosing kernfs operation which invoked the winning instance of kernfs_remove_self() finishes and then return %false. This trivially makes all users of kernfs_remove_self() automatically show correct synchronous behavior even when there are multiple concurrent operations - all "echo 1 > delete" instances will finish only after the whole operation is completed by one of the instances. Note that manipulation of active ref is implemented in separate public functions - kernfs_[un]break_active_protection(). kernfs_remove_self() is the only user at the moment but this will be used to cater to more complex cases. v2: For !CONFIG_SYSFS, dummy version kernfs_remove_self() was missing and sysfs_remove_file_self() had incorrect return type. Fix it. Reported by kbuild test bot. v3: kernfs_[un]break_active_protection() separated out from kernfs_remove_self() and exposed as public API. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernfs: remove KERNFS_REMOVEDTejun Heo2014-02-072-32/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KERNFS_REMOVED is used to mark half-initialized and dying nodes so that they don't show up in lookups and deny adding new nodes under or renaming it; however, its role overlaps that of deactivation. It's necessary to deny addition of new children while removal is in progress; however, this role considerably intersects with deactivation - KERNFS_REMOVED prevents new children while deactivation prevents new file operations. There's no reason to have them separate making things more complex than necessary. This patch removes KERNFS_REMOVED. * Instead of KERNFS_REMOVED, each node now starts its life deactivated. This means that we now use both atomic_add() and atomic_sub() on KN_DEACTIVATED_BIAS, which is INT_MIN. The compiler generates an overflow warnings when negating INT_MIN as the negation can't be represented as a positive number. Nothing is actually broken but let's bump BIAS by one to avoid the warnings for archs which negates the subtrahend.. * A new helper kernfs_active() which tests whether kn->active >= 0 is added for convenience and lockdep annotation. All KERNFS_REMOVED tests are replaced with negated kernfs_active() tests. * __kernfs_remove() is updated to deactivate, but not drain, all nodes in the subtree instead of setting KERNFS_REMOVED. This removes deactivation from kernfs_deactivate(), which is now renamed to kernfs_drain(). * Sanity check on KERNFS_REMOVED in kernfs_put() is replaced with checks on the active ref. * Some comment style updates in the affected area. v2: Reordered before removal path restructuring. kernfs_active() dropped and kernfs_get/put_active() used instead. RB_EMPTY_NODE() used in the lookup paths. v3: Reverted most of v2 except for creating a new node with KN_DEACTIVATED_BIAS. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep()Tejun Heo2014-02-071-7/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There currently are two mechanisms gating active ref lockdep annotations - KERNFS_LOCKDEP flag and KERNFS_ACTIVE_REF type mask. The former disables lockdep annotations in kernfs_get/put_active() while the latter disables all of kernfs_deactivate(). While KERNFS_ACTIVE_REF also behaves as an optimization to skip the deactivation step for non-file nodes, the benefit is marginal and it needlessly diverges code paths. Let's drop KERNFS_ACTIVE_REF. While at it, add a test helper kernfs_lockdep() to test KERNFS_LOCKDEP flag so that it's more convenient and the related code can be compiled out when not enabled. v2: Refreshed on top of ("kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag"). As the earlier patch already added KERNFS_LOCKDEP tests to kernfs_deactivate(), those additions are dropped from this patch and the existing ones are simply converted to kernfs_lockdep(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernfs: remove kernfs_addrm_cxtTejun Heo2014-02-074-104/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernfs_addrm_cxt and the accompanying kernfs_addrm_start/finish() were added because there were operations which should be performed outside kernfs_mutex after adding and removing kernfs_nodes. The necessary operations were recorded in kernfs_addrm_cxt and performed by kernfs_addrm_finish(); however, after the recent changes which relocated deactivation and unmapping so that they're performed directly during removal, the only operation kernfs_addrm_finish() performs is kernfs_put(), which can be moved inside the removal path too. This patch moves the kernfs_put() of the base ref to __kernfs_remove() and remove kernfs_addrm_cxt and kernfs_addrm_start/finish(). * kernfs_add_one() is updated to grab and release kernfs_mutex itself. sysfs_addrm_start/finish() invocations around it are removed from all users. * __kernfs_remove() puts an unlinked node directly instead of chaining it to kernfs_addrm_cxt. Its callers are updated to grab and release kernfs_mutex instead of calling kernfs_addrm_start/finish() around it. v2: Rebased on top of "kernfs: associate a new kernfs_node with its parent on creation" which dropped @parent from kernfs_add_one(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernfs: invoke kernfs_unmap_bin_file() directly from kernfs_deactivate()Tejun Heo2014-02-071-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernfs_unmap_bin_file() is supposed to unmap all memory mappings of the target file before kernfs_remove() finishes; however, it currently is being called from kernfs_addrm_finish() and has the same race problem as the original implementation of deactivation when there are multiple removers - only the remover which snatches the node to its addrm_cxt->removed list is guaranteed to wait for its completion before returning. It can be easily fixed by moving kernfs_unmap_bin_file() invocation from kernfs_addrm_finish() to kernfs_deactivated(). The function may be called multiple times but that shouldn't do any harm. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernfs: restructure removal path to fix possible premature returnTejun Heo2014-02-071-61/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recursive nature of kernfs_remove() means that, even if kernfs_remove() is not allowed to be called multiple times on the same node, there may be race conditions between removal of parent and its descendants. While we can claim that kernfs_remove() shouldn't be called on one of the descendants while the removal of an ancestor is in progress, such rule is unnecessarily restrictive and very difficult to enforce. It's better to simply allow invoking kernfs_remove() as the caller sees fit as long as the caller ensures that the node is accessible. The current behavior in such situations is broken. Whoever enters removal path first takes the node off the hierarchy and then deactivates. Following removers either return as soon as it notices that it's not the first one or can't even find the target node as it has already been removed from the hierarchy. In both cases, the following removers may finish prematurely while the nodes which should be removed and drained are still being processed by the first one. This patch restructures so that multiple removers, whether through recursion or direction invocation, always follow the following rules. * When there are multiple concurrent removers, only one puts the base ref. * Regardless of which one puts the base ref, all removers are blocked until the target node is fully deactivated and removed. To achieve the above, removal path now first marks all descendants including self REMOVED and then deactivates and unlinks leftmost descendant one-by-one. kernfs_deactivate() is called directly from __kernfs_removal() and drops and regrabs kernfs_mutex for each descendant to drain active refs. As this means that multiple removers can enter kernfs_deactivate() for the same node, the function is updated so that it can handle multiple deactivators of the same node - only one actually deactivates but all wait till drain completion. The restructured removal path guarantees that a removed node gets unlinked only after the node is deactivated and drained. Combined with proper multiple deactivator handling, this guarantees that any invocation of kernfs_remove() returns only after the node itself and all its descendants are deactivated, drained and removed. v2: Draining separated into a separate loop (used to be in the same loop as unlink) and done from __kernfs_deactivate(). This is to allow exposing deactivation as a separate interface later. Root node removal was broken in v1 patch. Fixed. v3: Revert most of v2 except for root node removal fix and simplification of KERNFS_REMOVED setting loop. v4: Refreshed on top of ("kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag"). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitqTejun Heo2014-02-071-18/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernfs_node->u.completion is used to notify deactivation completion from kernfs_put_active() to kernfs_deactivate(). We now allow multiple racing removals of the same node and the current removal scheme is no longer correct - kernfs_remove() invocation may return before the node is properly deactivated if it races against another removal. The removal path will be restructured to address the issue. To help such restructure which requires supporting multiple waiters, this patch replaces kernfs_node->u.completion with kernfs_root->deactivate_waitq. This makes deactivation event notifications share a per-root waitqueue_head; however, the wait path is quite cold and this will also allow shaving one pointer off kernfs_node. v2: Refreshed on top of ("kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag"). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flagTejun Heo2014-02-071-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernfs_deactivate() forgot to check whether KERNFS_LOCKDEP is set before performing lockdep annotations and ends up feeding uninitialized lockdep_map to lockdep triggering warning like the following on USB stick hotunplug. usb 1-2: USB disconnect, device number 2 INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 1 PID: 62 Comm: khubd Not tainted 3.13.0-work+ #82 Hardware name: empty empty/S3992, BIOS 080011 10/26/2007 ffff880065ca7f60 ffff88013a4ffa08 ffffffff81cfb6bd 0000000000000002 ffff88013a4ffac8 ffffffff810f8530 ffff88013a4fc710 0000000000000002 ffff880100000000 ffffffff82a3db50 0000000000000001 ffff88013a4fc710 Call Trace: [<ffffffff81cfb6bd>] dump_stack+0x4e/0x7a [<ffffffff810f8530>] __lock_acquire+0x1910/0x1e70 [<ffffffff810f931a>] lock_acquire+0x9a/0x1d0 [<ffffffff8127c75e>] kernfs_deactivate+0xee/0x130 [<ffffffff8127d4c8>] kernfs_addrm_finish+0x38/0x60 [<ffffffff8127d701>] kernfs_remove_by_name_ns+0x51/0xa0 [<ffffffff8127b4f1>] remove_files.isra.1+0x41/0x80 [<ffffffff8127b7e7>] sysfs_remove_group+0x47/0xa0 [<ffffffff8127b873>] sysfs_remove_groups+0x33/0x50 [<ffffffff8177d66d>] device_remove_attrs+0x4d/0x80 [<ffffffff8177e25e>] device_del+0x12e/0x1d0 [<ffffffff819722c2>] usb_disconnect+0x122/0x1a0 [<ffffffff819749b5>] hub_thread+0x3c5/0x1290 [<ffffffff810c6a6d>] kthread+0xed/0x110 [<ffffffff81d0a56c>] ret_from_fork+0x7c/0xb0 Fix it by making kernfs_deactivate() perform lockdep annotations only if KERNFS_LOCKDEP is set. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Fabio Estevam <festevam@gmail.com> Reported-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge branch 'for-3.14-fixes' into for-3.15Tejun Heo2014-02-081-0/+1
|\ \ | | | | | | | | | | | | | | | | | | Pending kernfs conversion depends on fixes in for-3.14-fixes. Pull it into for-3.15. Signed-off-by: Tejun Heo <tj@kernel.org>
| * | nfs: include xattr.h from fs/nfs/nfs3proc.cTejun Heo2014-02-031-0/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | fs/nfs/nfs3proc.c is making use of xattr but was getting linux/xattr.h indirectly through linux/cgroup.h, which will soon drop the inclusion of xattr.h. Explicitly include linux/xattr.h from nfs3proc.c so that compilation doesn't fail when linux/cgroup.h drops linux/xattr.h. As the following cgroup changes will depend on these changes, it probably would be easier to route this through cgroup branch. Would that be okay? Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: linux-nfs@vger.kernel.org
* / cgroup: clean up cgroup_subsys names and initializationTejun Heo2014-02-081-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cgroup_subsys is a bit messier than it needs to be. * The name of a subsys can be different from its internal identifier defined in cgroup_subsys.h. Most subsystems use the matching name but three - cpu, memory and perf_event - use different ones. * cgroup_subsys_id enums are postfixed with _subsys_id and each cgroup_subsys is postfixed with _subsys. cgroup.h is widely included throughout various subsystems, it doesn't and shouldn't have claim on such generic names which don't have any qualifier indicating that they belong to cgroup. * cgroup_subsys->subsys_id should always equal the matching cgroup_subsys_id enum; however, we require each controller to initialize it and then BUG if they don't match, which is a bit silly. This patch cleans up cgroup_subsys names and initialization by doing the followings. * cgroup_subsys_id enums are now postfixed with _cgrp_id, and each cgroup_subsys with _cgrp_subsys. * With the above, renaming subsys identifiers to match the userland visible names doesn't cause any naming conflicts. All non-matching identifiers are renamed to match the official names. cpu_cgroup -> cpu mem_cgroup -> memory perf -> perf_event * controllers no longer need to initialize ->subsys_id and ->name. They're generated in cgroup core and set automatically during boot. * Redundant cgroup_subsys declarations removed. * While updating BUG_ON()s in cgroup_init_early(), convert them to WARN()s. BUGging that early during boot is stupid - the kernel can't print anything, even through serial console and the trap handler doesn't even link stack frame properly for back-tracing. This patch doesn't introduce any behavior changes. v2: Rebased on top of fe1217c4f3f7 ("net: net_cls: move cgroupfs classid handling into core"). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: "David S. Miller" <davem@davemloft.net> Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Ingo Molnar <mingo@redhat.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Thomas Graf <tgraf@suug.ch>
* hpfs: optimize quad buffer loadingMikulas Patocka2014-02-021-46/+50
| | | | | | | | | | | | | | | | | | | | | | HPFS needs to load 4 consecutive 512-byte sectors when accessing the directory nodes or bitmaps. We can't switch to 2048-byte block size because files are allocated in the units of 512-byte sectors. Previously, the driver would allocate a 2048-byte area using kmalloc, copy the data from four buffers to this area and eventually copy them back if they were modified. In the current implementation of the buffer cache, buffers are allocated in the pagecache. That means that 4 consecutive 512-byte buffers are stored in consecutive areas in the kernel address space. So, we don't need to allocate extra memory and copy the content of the buffers there. This patch optimizes the code to avoid copying the buffers. It checks if the four buffers are stored in contiguous memory - if they are not, it falls back to allocating a 2048-byte area and copying data there. Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hpfs: remember free spaceMikulas Patocka2014-02-023-10/+87
| | | | | | | | | | | | | | | | | | Previously, hpfs scanned all bitmaps each time the user asked for free space using statfs. This patch changes it so that hpfs scans the bitmaps only once, remembes the free space and on next invocation of statfs it returns the value instantly. New versions of wine are hammering on the statfs syscall very heavily, making some games unplayable when they're stored on hpfs, with load times in minutes. This should be backported to the stable kernels because it fixes user-visible problem (excessive level load times in wine). Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* afs: proc cells and rootcell are writeablePali Rohár2014-02-011-2/+2
| | | | | | | | | | | | | Both proc files are writeable and used for configuring cells. But there is missing correct mode flag for writeable files. Without this patch both proc files are read only. [ It turns out they aren't really read-only, since root can write to them even if the write bit isn't set due to CAP_DAC_OVERRIDE ] Signed-off-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2014-02-0111-420/+553
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull cifs fixes from Steve French: "A set of cifs fixes (mostly for symlinks, and SMB2 xattrs) and cleanups" * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix check for regular file in couldbe_mf_symlink() [CIFS] Fix SMB2 mounts so they don't try to set or get xattrs via cifs CIFS: Cleanup cifs open codepath CIFS: Remove extra indentation in cifs_sfu_type CIFS: Cleanup cifs_mknod CIFS: Cleanup CIFSSMBOpen cifs: Add support for follow_link on dfs shares under posix extensions cifs: move unix extension call to cifs_query_symlink() cifs: Re-order M-F Symlink code cifs: Add create MFSymlinks to protocol ops struct cifs: use protocol specific call for query_mf_symlink() cifs: Rename MF symlink function names cifs: Rename and cleanup open_query_close_cifs_symlink() cifs: Fix memory leak in cifs_hardlink()
| * cifs: Fix check for regular file in couldbe_mf_symlink()Sachin Prabhu2014-01-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MF Symlinks are regular files containing content in a specified format. The function couldbe_mf_symlink() checks the mode for a set S_IFREG bit as a test to confirm that it is a regular file. This bit is also set for other filetypes and simply checking for this bit being set may return false positives. We ensure that we are actually checking for a regular file by using the S_ISREG macro to test instead. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reported-by: Neil Brown <neilb@suse.de> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <smfrench@gmail.com>
| * [CIFS] Fix SMB2 mounts so they don't try to set or get xattrs via cifsSteve French2014-01-262-19/+36
| | | | | | | | | | | | | | | | | | | | | | When mounting with smb2 (or smb2.1 or smb3) we need to check to make sure that attempts to query or set extended attributes do not attempt to send the request with the older cifs protocol instead (eventually we also need to add the support in SMB2 to query/set extended attributes but this patch prevents us from using the wrong protocol for extended attribute operations). Signed-off-by: Steve French <smfrench@gmail.com>
| * CIFS: Cleanup cifs open codepathPavel Shilovsky2014-01-208-100/+174
| | | | | | | | | | | | | | | | Rename CIFSSMBOpen to CIFS_open and make it take cifs_open_parms structure as a parm. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
| * CIFS: Remove extra indentation in cifs_sfu_typePavel Shilovsky2014-01-201-47/+50
| | | | | | | | | | Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
| * CIFS: Cleanup cifs_mknodPavel Shilovsky2014-01-201-26/+22
| | | | | | | | | | | | | | Rename camel case variable and fix comment style. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
| * CIFS: Cleanup CIFSSMBOpenPavel Shilovsky2014-01-202-72/+86
| | | | | | | | | | | | | | | | | | Remove indentation, fix comment style, rename camel case variables in preparation to make it work with cifs_open_parms structure as a parm. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
| * cifs: Add support for follow_link on dfs shares under posix extensionsSachin Prabhu2014-01-201-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using posix extensions, dfs shares in the dfs root show up as symlinks resulting in userland tools such as 'ls' calling readlink() on these shares. Since these are dfs shares, we end up returning -EREMOTE. $ ls -l /mnt ls: cannot read symbolic link /mnt/test: Object is remote total 0 lrwxrwxrwx. 1 root root 19 Nov 6 09:47 test With added follow_link() support for dfs shares, when using unix extensions, we call GET_DFS_REFERRAL to obtain the DFS referral and return the first node returned. The dfs share in the dfs root is now displayed in the following manner. $ ls -l /mnt total 0 lrwxrwxrwx. 1 root root 19 Nov 6 09:47 test -> \vm140-31\test Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
| * cifs: move unix extension call to cifs_query_symlink()Sachin Prabhu2014-01-202-10/+15
| | | | | | | | | | | | | | | | | | | | Unix extensions rigth now are only applicable to smb1 operations. Move the check and subsequent unix extension call to the smb1 specific call to query_symlink() ie. cifs_query_symlink(). Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
| * cifs: Re-order M-F Symlink codeSachin Prabhu2014-01-201-56/+68
| | | | | | | | | | | | | | | | | | This patch makes cosmetic changes. We group similar functions together and separate out the protocol specific functions. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
| * cifs: Add create MFSymlinks to protocol ops structSachin Prabhu2014-01-204-42/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | Add a new protocol ops function create_mf_symlink and have create_mf_symlink() use it. This patchset moves the MFSymlink operations completely to the ops structure so that we only use the right protocol versions when querying or creating MFSymlinks. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
| * cifs: use protocol specific call for query_mf_symlink()Sachin Prabhu2014-01-201-41/+20
| | | | | | | | | | | | | | | | | | We have an existing protocol specific call query_mf_symlink() created for check_mf_symlink which can also be used for query_mf_symlink(). Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
| * cifs: Rename MF symlink function namesSachin Prabhu2014-01-204-26/+24
| | | | | | | | | | | | | | | | Clean up camel case in functionnames. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
| * cifs: Rename and cleanup open_query_close_cifs_symlink()Sachin Prabhu2014-01-204-31/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | Rename open_query_close_cifs_symlink to cifs_query_mf_symlink() to make the name more consistent with other protocol version specific functions. We also pass tcon as an argument to the function. This is already available in the calling functions and we can avoid having to make an unnecessary lookup. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
| * cifs: Fix memory leak in cifs_hardlink()Christian Engelmayer2014-01-191-2/+4
| | | | | | | | | | | | | | | | Fix a potential memory leak in the cifs_hardlink() error handling path. Detected by Coverity: CID 728510, CID 728511. Signed-off-by: Christian Engelmayer <cengelma@gmx.at> Signed-off-by: Steve French <smfrench@gmail.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2014-02-016-50/+25
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Several obvious fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Fix mountpoint reference leakage in linkat hfsplus: use xattr handlers for removexattr Typo in compat_sys_lseek() declaration fs/super.c: sync ro remount after blocking writers vfs: unexport the getname() symbol
| * | Fix mountpoint reference leakage in linkatOleg Drokin2014-01-311-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent changes to retry on ESTALE in linkat (commit 442e31ca5a49e398351b2954b51f578353fdf210) introduced a mountpoint reference leak and a small memory leak in case a filesystem link operation returns ESTALE which is pretty normal for distributed filesystems like lustre, nfs and so on. Free old_path in such a case. [AV: there was another missing path_put() nearby - on the previous goto retry] Signed-off-by: Oleg Drokin: <green@linuxhacker.ru> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | hfsplus: use xattr handlers for removexattrChristoph Hellwig2014-01-314-47/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hfsplus was already using the handlers for get and set operations, and with the removal of can_set_xattr we've now allow operations that wouldn't otherwise be allowed. With this we can also centralize the special-casing of the osx. attrs that don't have prefixes on disk in the osx xattr handlers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | fs/super.c: sync ro remount after blocking writersAndrew Ruder2014-01-311-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move sync_filesystem() after sb_prepare_remount_readonly(). If writers sneak in anywhere from sync_filesystem() to sb_prepare_remount_readonly() it can cause inodes to be dirtied and writeback to occur well after sys_mount() has completely successfully. This was spotted by corrupted ubifs filesystems on reboot, but appears that it can cause issues with any filesystem using writeback. Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> CC: Richard Weinberger <richard@nod.at> Co-authored-by: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Ruder <andrew.ruder@elecsyscorp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | vfs: unexport the getname() symbolJeff Layton2014-01-311-1/+0
| | | | | | | | | | | | | | | | | | | | | Leaving getname() exported when putname() isn't is a bad idea. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Merge tag 'nfs-for-3.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2014-01-319-35/+84
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes from Trond Myklebust: "Highlights: - Fix several races in nfs_revalidate_mapping - NFSv4.1 slot leakage in the pNFS files driver - Stable fix for a slot leak in nfs40_sequence_done - Don't reject NFSv4 servers that support ACLs with only ALLOW aces" * tag 'nfs-for-3.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: initialize the ACL support bits to zero. NFSv4.1: Cleanup NFSv4.1: Clean up nfs41_sequence_done NFSv4: Fix a slot leak in nfs40_sequence_done NFSv4.1 free slot before resending I/O to MDS nfs: add memory barriers around NFS_INO_INVALID_DATA and NFS_INO_INVALIDATING NFS: Fix races in nfs_revalidate_mapping sunrpc: turn warn_gssd() log message into a dprintk() NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping nfs: handle servers that support only ALLOW ACE type.
| * | | nfs: initialize the ACL support bits to zero.Malahal Naineni2014-01-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid returning incorrect acl mask attributes when the server doesn't support ACLs. Signed-off-by: Malahal Naineni <malahal@us.ibm.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | NFSv4.1: CleanupTrond Myklebust2014-01-291-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It is now completely safe to call nfs41_sequence_free_slot with a NULL slot. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | NFSv4.1: Clean up nfs41_sequence_doneTrond Myklebust2014-01-291-11/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the test for res->sr_slot == NULL out of the nfs41_sequence_free_slot helper and into the main function for efficiency. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | NFSv4: Fix a slot leak in nfs40_sequence_doneTrond Myklebust2014-01-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The check for whether or not we sent an RPC call in nfs40_sequence_done is insufficient to decide whether or not we are holding a session slot, and thus should not be used to decide when to free that slot. This patch replaces the RPC_WAS_SENT() test with the correct test for whether or not slot == NULL. Cc: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org # 3.12+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>