summaryrefslogtreecommitdiffstats
path: root/fs/notify/dnotify
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'selinux-pr-20190917' of ↵Linus Torvalds2019-09-231-3/+12
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Add LSM hooks, and SELinux access control hooks, for dnotify, fanotify, and inotify watches. This has been discussed with both the LSM and fs/notify folks and everybody is good with these new hooks. - The LSM stacking changes missed a few calls to current_security() in the SELinux code; we fix those and remove current_security() for good. - Improve our network object labeling cache so that we always return the object's label, even when under memory pressure. Previously we would return an error if we couldn't allocate a new cache entry, now we always return the label even if we can't create a new cache entry for it. - Convert the sidtab atomic_t counter to a normal u32 with READ/WRITE_ONCE() and memory barrier protection. - A few patches to policydb.c to clean things up (remove forward declarations, long lines, bad variable names, etc) * tag 'selinux-pr-20190917' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: lsm: remove current_security() selinux: fix residual uses of current_security() for the SELinux blob selinux: avoid atomic_t usage in sidtab fanotify, inotify, dnotify, security: add security hook for fs notifications selinux: always return a secid from the network caches if we find one selinux: policydb - rename type_val_to_struct_array selinux: policydb - fix some checkpatch.pl warnings selinux: shuffle around policydb.c to get rid of forward declarations
| * fanotify, inotify, dnotify, security: add security hook for fs notificationsAaron Goidel2019-08-121-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As of now, setting watches on filesystem objects has, at most, applied a check for read access to the inode, and in the case of fanotify, requires CAP_SYS_ADMIN. No specific security hook or permission check has been provided to control the setting of watches. Using any of inotify, dnotify, or fanotify, it is possible to observe, not only write-like operations, but even read access to a file. Modeling the watch as being merely a read from the file is insufficient for the needs of SELinux. This is due to the fact that read access should not necessarily imply access to information about when another process reads from a file. Furthermore, fanotify watches grant more power to an application in the form of permission events. While notification events are solely, unidirectional (i.e. they only pass information to the receiving application), permission events are blocking. Permission events make a request to the receiving application which will then reply with a decision as to whether or not that action may be completed. This causes the issue of the watching application having the ability to exercise control over the triggering process. Without drawing a distinction within the permission check, the ability to read would imply the greater ability to control an application. Additionally, mount and superblock watches apply to all files within the same mount or superblock. Read access to one file should not necessarily imply the ability to watch all files accessed within a given mount or superblock. In order to solve these issues, a new LSM hook is implemented and has been placed within the system calls for marking filesystem objects with inotify, fanotify, and dnotify watches. These calls to the hook are placed at the point at which the target path has been resolved and are provided with the path struct, the mask of requested notification events, and the type of object on which the mark is being set (inode, superblock, or mount). The mask and obj_type have already been translated into common FS_* values shared by the entirety of the fs notification infrastructure. The path struct is passed rather than just the inode so that the mount is available, particularly for mount watches. This also allows for use of the hook by pathname-based security modules. However, since the hook is intended for use even by inode based security modules, it is not placed under the CONFIG_SECURITY_PATH conditional. Otherwise, the inode-based security modules would need to enable all of the path hooks, even though they do not use any of them. This only provides a hook at the point of setting a watch, and presumes that permission to set a particular watch implies the ability to receive all notification about that object which match the mask. This is all that is required for SELinux. If other security modules require additional hooks or infrastructure to control delivery of notification, these can be added by them. It does not make sense for us to propose hooks for which we have no implementation. The understanding that all notifications received by the requesting application are all strictly of a type for which the application has been granted permission shows that this implementation is sufficient in its coverage. Security modules wishing to provide complete control over fanotify must also implement a security_file_open hook that validates that the access requested by the watching application is authorized. Fanotify has the issue that it returns a file descriptor with the file mode specified during fanotify_init() to the watching process on event. This is already covered by the LSM security_file_open hook if the security module implements checking of the requested file mode there. Otherwise, a watching process can obtain escalated access to a file for which it has not been authorized. The selinux_path_notify hook implementation works by adding five new file permissions: watch, watch_mount, watch_sb, watch_reads, and watch_with_perm (descriptions about which will follow), and one new filesystem permission: watch (which is applied to superblock checks). The hook then decides which subset of these permissions must be held by the requesting application based on the contents of the provided mask and the obj_type. The selinux_file_open hook already checks the requested file mode and therefore ensures that a watching process cannot escalate its access through fanotify. The watch, watch_mount, and watch_sb permissions are the baseline permissions for setting a watch on an object and each are a requirement for any watch to be set on a file, mount, or superblock respectively. It should be noted that having either of the other two permissions (watch_reads and watch_with_perm) does not imply the watch, watch_mount, or watch_sb permission. Superblock watches further require the filesystem watch permission to the superblock. As there is no labeled object in view for mounts, there is no specific check for mount watches beyond watch_mount to the inode. Such a check could be added in the future, if a suitable labeled object existed representing the mount. The watch_reads permission is required to receive notifications from read-exclusive events on filesystem objects. These events include accessing a file for the purpose of reading and closing a file which has been opened read-only. This distinction has been drawn in order to provide a direct indication in the policy for this otherwise not obvious capability. Read access to a file should not necessarily imply the ability to observe read events on a file. Finally, watch_with_perm only applies to fanotify masks since it is the only way to set a mask which allows for the blocking, permission event. This permission is needed for any watch which is of this type. Though fanotify requires CAP_SYS_ADMIN, this is insufficient as it gives implicit trust to root, which we do not do, and does not support least privilege. Signed-off-by: Aaron Goidel <acgoide@tycho.nsa.gov> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <paul@paul-moore.com>
* | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 118Thomas Gleixner2019-05-241-10/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 44 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190523091651.032047323@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner2019-05-212-0/+2
|/ | | | | | | | | | | | | | Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fsnotify: switch send_to_group() and ->handle_event to const struct qstr *Al Viro2019-04-261-1/+1
| | | | | | | note that conditions surrounding accesses to dname in audit_watch_handle_event() and audit_mark_handle_event() guarantee that dname won't have been NULL. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'siginfo-linus' of ↵Linus Torvalds2018-08-211-1/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull core signal handling updates from Eric Biederman: "It was observed that a periodic timer in combination with a sufficiently expensive fork could prevent fork from every completing. This contains the changes to remove the need for that restart. This set of changes is split into several parts: - The first part makes PIDTYPE_TGID a proper pid type instead something only for very special cases. The part starts using PIDTYPE_TGID enough so that in __send_signal where signals are actually delivered we know if the signal is being sent to a a group of processes or just a single process. - With that prep work out of the way the logic in fork is modified so that fork logically makes signals received while it is running appear to be received after the fork completes" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits) signal: Don't send signals to tasks that don't exist signal: Don't restart fork when signals come in. fork: Have new threads join on-going signal group stops fork: Skip setting TIF_SIGPENDING in ptrace_init_task signal: Add calculate_sigpending() fork: Unconditionally exit if a fatal signal is pending fork: Move and describe why the code examines PIDNS_ADDING signal: Push pid type down into complete_signal. signal: Push pid type down into __send_signal signal: Push pid type down into send_signal signal: Pass pid type into do_send_sig_info signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task signal: Pass pid type into group_send_sig_info signal: Pass pid and pid type into send_sigqueue posix-timers: Noralize good_sigevent signal: Use PIDTYPE_TGID to clearly store where file signals will be sent pid: Implement PIDTYPE_TGID pids: Move the pgrp and session pid pointers from task_struct to signal_struct kvm: Don't open code task_pid in kvm_vcpu_ioctl pids: Compute task_tgid using signal->leader_pid ...
| * signal: Use PIDTYPE_TGID to clearly store where file signals will be sentEric W. Biederman2018-07-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When f_setown is called a pid and a pid type are stored. Replace the use of PIDTYPE_PID with PIDTYPE_TGID as PIDTYPE_TGID goes to the entire thread group. Replace the use of PIDTYPE_MAX with PIDTYPE_PID as PIDTYPE_PID now is only for a thread. Update the users of __f_setown to use PIDTYPE_TGID instead of PIDTYPE_PID. For now the code continues to capture task_pid (when task_tgid would really be appropriate), and iterate on PIDTYPE_PID (even when type == PIDTYPE_TGID) out of an abundance of caution to preserve existing behavior. Oleg Nesterov suggested using the test to ensure we use PIDTYPE_PID for tgid lookup also be used to avoid taking the tasklist lock. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* | fs: fsnotify: account fsnotify metadata to kmemcgShakeel Butt2018-08-171-2/+3
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "Directed kmem charging", v8. The Linux kernel's memory cgroup allows limiting the memory usage of the jobs running on the system to provide isolation between the jobs. All the kernel memory allocated in the context of the job and marked with __GFP_ACCOUNT will also be included in the memory usage and be limited by the job's limit. The kernel memory can only be charged to the memcg of the process in whose context kernel memory was allocated. However there are cases where the allocated kernel memory should be charged to the memcg different from the current processes's memcg. This patch series contains two such concrete use-cases i.e. fsnotify and buffer_head. The fsnotify event objects can consume a lot of system memory for large or unlimited queues if there is either no or slow listener. The events are allocated in the context of the event producer. However they should be charged to the event consumer. Similarly the buffer_head objects can be allocated in a memcg different from the memcg of the page for which buffer_head objects are being allocated. To solve this issue, this patch series introduces mechanism to charge kernel memory to a given memcg. In case of fsnotify events, the memcg of the consumer can be used for charging and for buffer_head, the memcg of the page can be charged. For directed charging, the caller can use the scope API memalloc_[un]use_memcg() to specify the memcg to charge for all the __GFP_ACCOUNT allocations within the scope. This patch (of 2): A lot of memory can be consumed by the events generated for the huge or unlimited queues if there is either no or slow listener. This can cause system level memory pressure or OOMs. So, it's better to account the fsnotify kmem caches to the memcg of the listener. However the listener can be in a different memcg than the memcg of the producer and these allocations happen in the context of the event producer. This patch introduces remote memcg charging API which the producer can use to charge the allocations to the memcg of the listener. There are seven fsnotify kmem caches and among them allocations from dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and inotify_inode_mark_cachep happens in the context of syscall from the listener. So, SLAB_ACCOUNT is enough for these caches. The objects from fsnotify_mark_connector_cachep are not accounted as they are small compared to the notification mark or events and it is unclear whom to account connector to since it is shared by all events attached to the inode. The allocations from the event caches happen in the context of the event producer. For such caches we will need to remote charge the allocations to the listener's memcg. Thus we save the memcg reference in the fsnotify_group structure of the listener. This patch has also moved the members of fsnotify_group to keep the size same, at least for 64 bit build, even with additional member by filling the holes. [shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it] Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Greg Thelen <gthelen@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fsnotify: add fsnotify_add_inode_mark() wrappersAmir Goldstein2018-05-181-1/+1
| | | | | | | | Before changing the arguments of the functions fsnotify_add_mark() and fsnotify_add_mark_locked(), convert most callers to use a wrapper. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fsnotify: remove redundant arguments to handle_event()Amir Goldstein2018-05-181-3/+3
| | | | | | | | | | | | | | | | | | | inode_mark and vfsmount_mark arguments are passed to handle_event() operation as function arguments as well as on iter_info struct. The difference is that iter_info struct may contain marks that should not be handled and are represented as NULL arguments to inode_mark or vfsmount_mark. Instead of passing the inode_mark and vfsmount_mark arguments, add a report_mask member to iter_info struct to indicate which marks should be handled, versus marks that should only be kept alive during user wait. This change is going to be used for passing more mark types with handle_event() (i.e. super block marks). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* dnotify: Handle errors from fsnotify_add_mark_locked() in fcntl_dirnotify()Jan Kara2017-10-311-1/+6
| | | | | | | | | | | | | | fsnotify_add_mark_locked() can fail but we do not check its return value. This didn't matter before commit 9dd813c15b2c "fsnotify: Move mark list head from object into dedicated structure" as none of possible failures could happen for dnotify but after that commit -ENOMEM can be returned. Handle this error properly in fcntl_dirnotify() as otherwise we just hit BUG_ON(dn_mark->dn) in dnotify_free_mark(). Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reported-by: syzkaller Fixes: 9dd813c15b2c101168808d4f5941a29985758973 Signed-off-by: Jan Kara <jack@suse.cz>
* fsnotify: make dnotify_fsnotify_ops constBhumika Goyal2017-08-301-1/+1
| | | | | | | Make this const as it is never modified. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fsnotify: Move ->free_mark callback to fsnotify_opsJan Kara2017-04-101-1/+2
| | | | | | | | | | Pointer to ->free_mark callback unnecessarily occupies one long in each fsnotify_mark although they are the same for all marks from one notification group. Move the callback pointer to fsnotify_ops. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fsnotify: Add group pointer in fsnotify_init_mark()Jan Kara2017-04-101-3/+2
| | | | | | | | | | | | | Currently we initialize mark->group only in fsnotify_add_mark_lock(). However we will need to access fsnotify_ops of corresponding group from fsnotify_put_mark() so we need mark->group initialized earlier. Do that in fsnotify_init_mark() which has a consequence that once fsnotify_init_mark() is called on a mark, the mark has to be destroyed by fsnotify_put_mark(). Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fsnotify: Remove fsnotify_find_{inode|vfsmount}_mark()Jan Kara2017-04-101-2/+2
| | | | | | | | | These are very thin wrappers, just remove them. Drop fs/notify/vfsmount_mark.c as it is empty now. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fsnotify: Remove fsnotify_set_mark_{,ignored_}mask_locked()Jan Kara2017-04-101-6/+3
| | | | | | | | | These helpers are now only a simple assignment and just obfuscate what is going on. Remove them. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fsnotify: Pass fsnotify_iter_info into handle_event handlerJan Kara2017-04-101-1/+2
| | | | | | | | | | | | Pass fsnotify_iter_info into ->handle_event() handler so that it can release and reacquire SRCU lock via fsnotify_prepare_user_wait() and fsnotify_finish_user_wait() functions. These functions also make sure current marks are appropriately pinned so that iteration protected by srcu in fsnotify() stays safe. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fsnotify: Move locking into fsnotify_recalc_mask()Jan Kara2017-04-101-2/+1
| | | | | | | | | | Move locking of locks protecting a list of marks into fsnotify_recalc_mask(). This reduces code churn in the following patch which changes the lock protecting the list of marks. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fsnotify: Move object pointer to fsnotify_mark_connectorJan Kara2017-04-101-2/+2
| | | | | | | | | | | Move pointer to inode / vfsmount from mark itself to the fsnotify_mark_connector structure. This is another step on the path towards decoupling inode / vfsmount lifetime from notification mark lifetime. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fsnotify: constify 'data' passed to ->handle_event()Al Viro2016-12-051-1/+1
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fsnotify: get rid of fsnotify_destroy_mark_locked()Jan Kara2015-09-041-4/+10
| | | | | | | | | | | | | | | fsnotify_destroy_mark_locked() is subtle to use because it temporarily releases group->mark_mutex. To avoid future problems with this function, split it into two. fsnotify_detach_mark() is the part that needs group->mark_mutex and fsnotify_free_mark() is the part that must be called outside of group->mark_mutex. This way it's much clearer what's going on and we also avoid some pointless acquisitions of group->mark_mutex. Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fsnotify: unify inode and mount marks handlingJan Kara2014-12-131-2/+2
| | | | | | | | | | | There's a lot of common code in inode and mount marks handling. Factor it out to a common helper function. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* security: make security_file_set_fowner, f_setown and __f_setown void returnJeff Layton2014-09-091-7/+1
| | | | | | | | | | security_file_set_fowner always returns 0, so make it f_setown and __f_setown void return functions and fix up the error handling in the callers. Cc: linux-security-module@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* inotify: Fix reporting of cookies for inotify eventsJan Kara2014-02-181-1/+1
| | | | | | | | | | | | | | | | My rework of handling of notification events (namely commit 7053aee26a35 "fsnotify: do not share events between notification groups") broke sending of cookies with inotify events. We didn't propagate the value passed to fsnotify() properly and passed 4 uninitialized bytes to userspace instead (so it is also an information leak). Sadly I didn't notice this during my testing because inotify cookies aren't used very much and LTP inotify tests ignore them. Fix the problem by passing the cookie value properly. Fixes: 7053aee26a3548ebaba046ae2e52396ccf56ac6c Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
* fsnotify: remove pointless NULL initializersJan Kara2014-01-211-3/+0
| | | | | | | | | | | | | We usually rely on the fact that struct members not specified in the initializer are set to NULL. So do that with fsnotify function pointers as well. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fsnotify: remove .should_send_event callbackJan Kara2014-01-211-18/+4
| | | | | | | | | | | | | After removing event structure creation from the generic layer there is no reason for separate .should_send_event and .handle_event callbacks. So just remove the first one. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fsnotify: do not share events between notification groupsJan Kara2014-01-211-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently fsnotify framework creates one event structure for each notification event and links this event into all interested notification groups. This is done so that we save memory when several notification groups are interested in the event. However the need for event structure shared between inotify & fanotify bloats the event structure so the result is often higher memory consumption. Another problem is that fsnotify framework keeps path references with outstanding events so that fanotify can return open file descriptors with its events. This has the undesirable effect that filesystem cannot be unmounted while there are outstanding events - a regression for inotify compared to a situation before it was converted to fsnotify framework. For fanotify this problem is hard to avoid and users of fanotify should kind of expect this behavior when they ask for file descriptors from notified files. This patch changes fsnotify and its users to create separate event structure for each group. This allows for much simpler code (~400 lines removed by this patch) and also smaller event structures. For example on 64-bit system original struct fsnotify_event consumes 120 bytes, plus additional space for file name, additional 24 bytes for second and each subsequent group linking the event, and additional 32 bytes for each inotify group for private data. After the conversion inotify event consumes 48 bytes plus space for file name which is considerably less memory unless file names are long and there are several groups interested in the events (both of which are uncommon). Fanotify event fits in 56 bytes after the conversion (fanotify doesn't care about file names so its events don't have to have it allocated). A win unless there are four or more fanotify groups interested in the event. The conversion also solves the problem with unmount when only inotify is used as we don't have to grab path references for inotify events. [hughd@google.com: fanotify: fix corruption preventing startup] Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* dnotify: replace dnotify_mark_mutex with mark mutex of dnotify_groupLino Sanfilippo2013-07-091-12/+13
| | | | | | | | | | | | There is no need to use a special mutex to protect against the fcntl/close race (see dnotify.c for a description of this race). Instead the dnotify_groups mark mutex can be used. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* new helper: file_inode(file)Al Viro2013-02-221-2/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fsnotify: pass group to fsnotify_destroy_mark()Lino Sanfilippo2012-12-111-2/+2
| | | | | | | | In fsnotify_destroy_mark() dont get the group from the passed mark anymore, but pass the group itself as an additional parameter to the function. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
* fanotify: use both marks when possibleEric Paris2010-07-281-1/+1
| | | | | | | | fanotify currently, when given a vfsmount_mark will look up (if it exists) the corresponding inode mark. This patch drops that lookup and uses the mark provided. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: pass both the vfsmount mark and inode markEric Paris2010-07-281-7/+11
| | | | | | | | should_send_event() and handle_event() will both need to look up the inode event if they get a vfsmount event. Lets just pass both at the same time since we have them both after walking the lists in lockstep. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: remove group->maskEric Paris2010-07-281-4/+0
| | | | | | | group->mask is now useless. It was originally a shortcut for fsnotify to save on performance. These checks are now redundant, so we remove them. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: cleanup should_send_eventEric Paris2010-07-281-10/+1
| | | | | | | | The change to use srcu and walk the object list rather than the global fsnotify_group list means that should_send_event is no longer needed for a number of groups and can be simplified for others. Do that. Signed-off-by: Eric Paris <eparis@redhat.com>
* dnotify: use the mark in handler functionsEric Paris2010-07-281-17/+5
| | | | | | | | dnotify now gets a mark in the should_send_event and handle_event functions. Rather than look up the mark themselves dnotify should just use the mark it was handed. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: send fsnotify_mark to groups in event handling functionsEric Paris2010-07-281-1/+3
| | | | | | | | | With the change of fsnotify to use srcu walking the marks list instead of walking the global groups list we now know the mark in question. The code can send the mark to the group's handling functions and the groups won't have to find those marks themselves. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: allow marks to not pin inodes in coreEric Paris2010-07-281-1/+1
| | | | | | | | | | | | | | | | | inotify marks must pin inodes in core. dnotify doesn't technically need to since they are closed when the directory is closed. fanotify also need to pin inodes in core as it works today. But the next step is to introduce the concept of 'ignored masks' which is actually a mask of events for an inode of no interest. I claim that these should be liberally sent to the kernel and should not pin the inode in core. If the inode is brought back in the listener will get an event it may have thought excluded, but this is not a serious situation and one any listener should deal with. This patch lays the ground work for non-pinning inode marks by using lazy inode pinning. We do not pin a mark until it has a non-zero mask entry. If a listener new sets a mask we never pin the inode. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: split generic and inode specific mark codeEric Paris2010-07-281-6/+6
| | | | | | | | currently all marking is done by functions in inode-mark.c. Some of this is pretty generic and should be instead done in a generic function and we should only put the inode specific code in inode-mark.c Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: take inode->i_lock inside fsnotify_find_mark_entry()Andreas Gruenbacher2010-07-281-12/+0
| | | | | | | | | All callers to fsnotify_find_mark_entry() except one take and release inode->i_lock around the call. Take the lock inside fsnotify_find_mark_entry() instead. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>
* dnotify: rename mark_entry to markEric Paris2010-07-281-85/+85
| | | | | | | nomenclature change. Used to call things 'entries' but now we just call them 'marks.' Do those changes for dnotify. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: rename fsnotify_find_mark_entry to fsnotify_find_markEric Paris2010-07-281-7/+7
| | | | | | the _entry portion of fsnotify functions is useless. Drop it. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: rename fsnotify_mark_entry to just fsnotify_markEric Paris2010-07-281-12/+12
| | | | | | | The name is long and it serves no real purpose. So rename fsnotify_mark_entry to just fsnotify_mark. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: put inode specific fields in an fsnotify_mark in a unionEric Paris2010-07-281-2/+2
| | | | | | | | | The addition of marks on vfs mounts will be simplified if the inode specific parts of a mark and the vfsmnt specific parts of a mark are actually in a union so naming can be easy. This patch just implements the inode struct and the union. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: include vfsmount in should_send_event when appropriateEric Paris2010-07-281-2/+2
| | | | | | | | | To ensure that a group will not duplicate events when it receives it based on the vfsmount and the inode should_send_event test we should distinguish those two cases. We pass a vfsmount to this function so groups can make their own determinations. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: drop mask argument from fsnotify_alloc_groupEric Paris2010-07-281-1/+1
| | | | | | | Nothing uses the mask argument to fsnotify_alloc_group. This patch drops that argument. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: fsnotify_obtain_group should be fsnotify_alloc_groupEric Paris2010-07-281-1/+1
| | | | | | | | fsnotify_obtain_group was intended to be able to find an already existing group. Nothing uses that functionality. This just renames it to fsnotify_alloc_group so it is clear what it is doing. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: remove group_num altogetherEric Paris2010-07-281-2/+1
| | | | | | | | The original fsnotify interface has a group-num which was intended to be able to find a group after it was added. I no longer think this is a necessary thing to do and so we remove the group_num. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: include data in should_send callsEric Paris2010-07-281-1/+1
| | | | | | | | | fanotify is going to need to look at file->private_data to know if an event should be sent or not. This passes the data (which might be a file, dentry, inode, or none) to the should_send function calls so fanotify can get that information when available Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: provide the data type to should_send_eventEric Paris2010-07-281-1/+2
| | | | | | | | | | fanotify is only interested in event types which contain enough information to open the original file in the context of the fanotify listener. Since fanotify may not want to send events if that data isn't present we pass the data type to the should_send_event function call so fanotify can express its lack of interest. Signed-off-by: Eric Paris <eparis@redhat.com>
* fsnotify: allow addition of duplicate fsnotify marksEric Paris2010-07-281-1/+1
| | | | | | | | | This patch allows a task to add a second fsnotify mark to an inode for the same group. This mark will be added to the end of the inode's list and this will never be found by the stand fsnotify_find_mark() function. This is useful if a user wants to add a new mark before removing the old one. Signed-off-by: Eric Paris <eparis@redhat.com>