summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* slab: convert slab_is_available() to booleanDenis Kirjanov2015-11-052-2/+2
| | | | | | | | | | | | A good candidate to return a boolean result. Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Cc: Christoph Lameter <cl@linux.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kernel/watchdog.c: fix race between proc_watchdog_thresh() and ↵Ulrich Obergfell2015-11-051-1/+1
| | | | | | | | | | | | | | | | | | watchdog_timer_fn() Theoretically it is possible that the watchdog timer expires right at the time when a user sets 'watchdog_thresh' to zero (note: this disables the lockup detectors). In this scenario, the is_softlockup() function - which is called by the timer - could produce a false positive. Fix this by checking the current value of 'watchdog_thresh'. Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Acked-by: Don Zickus <dzickus@redhat.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kernel/watchdog.c: remove {get|put}_online_cpus() from ↵Ulrich Obergfell2015-11-051-4/+6
| | | | | | | | | | | | | | | watchdog_{park|unpark}_threads() watchdog_{park|unpark}_threads() are now called in code paths that protect themselves against CPU hotplug, so {get|put}_online_cpus() calls are redundant and can be removed. Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Acked-by: Don Zickus <dzickus@redhat.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kernel/watchdog.c: avoid races between /proc handlers and CPU hotplugUlrich Obergfell2015-11-051-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | The handler functions for watchdog parameters in /proc/sys/kernel do not protect themselves against races with CPU hotplug. Hence, theoretically it is possible that a new watchdog thread is started on a hotplugged CPU while a parameter is being modified, and the thread could thus use a parameter value that is 'in transition'. For example, if 'watchdog_thresh' is being set to zero (note: this disables the lockup detectors) the thread would erroneously use the value zero as the sample period. To avoid such races and to keep the /proc handler code consistent, call {get|put}_online_cpus() in proc_watchdog_common() {get|put}_online_cpus() in proc_watchdog_thresh() {get|put}_online_cpus() in proc_watchdog_cpumask() Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Acked-by: Don Zickus <dzickus@redhat.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kernel/watchdog.c: avoid race between lockup detector suspend/resume and CPU ↵Ulrich Obergfell2015-11-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | hotplug The lockup detector suspend/resume interface that was introduced by commit 8c073d27d7ad ("watchdog: introduce watchdog_suspend() and watchdog_resume()") does not protect itself against races with CPU hotplug. Hence, theoretically it is possible that a new watchdog thread is started on a hotplugged CPU while the lockup detector is suspended, and the thread could thus interfere unexpectedly with the code that requested to suspend the lockup detector. Avoid the race by calling get_online_cpus() in lockup_detector_suspend() put_online_cpus() in lockup_detector_resume() Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Acked-by: Don Zickus <dzickus@redhat.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kernel/watchdog.c: add sysctl knob hardlockup_panicDon Zickus2015-11-054-3/+16
| | | | | | | | | | | | | | | | | | The only way to enable a hardlockup to panic the machine is to set 'nmi_watchdog=panic' on the kernel command line. This makes it awkward for end users and folks who want to run automate tests (like myself). Mimic the softlockup_panic knob and create a /proc/sys/kernel/hardlockup_panic knob. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Acked-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kernel/watchdog.c: perform all-CPU backtrace in case of hard lockupJiri Kosina2015-11-055-5/+55
| | | | | | | | | | | | | | | | | | | In many cases of hardlockup reports, it's actually not possible to know why it triggered, because the CPU that got stuck is usually waiting on a resource (with IRQs disabled) in posession of some other CPU is holding. IOW, we are often looking at the stacktrace of the victim and not the actual offender. Introduce sysctl / cmdline parameter that makes it possible to have hardlockup detector perform all-CPU backtrace. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* watchdog: do not unpark threads in watchdog_park_threads() on errorUlrich Obergfell2015-11-051-4/+6
| | | | | | | | | | | | | | | If kthread_park() returns an error, watchdog_park_threads() should not blindly 'roll back' the already parked threads to the unparked state. Instead leave it up to the callers to handle such errors appropriately in their context. For example, it is redundant to unpark the threads if the lockup detectors will soon be disabled by the callers anyway. Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* watchdog: implement error handling in lockup_detector_suspend()Ulrich Obergfell2015-11-051-0/+5
| | | | | | | | | | | lockup_detector_suspend() now handles errors from watchdog_park_threads(). Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* watchdog: implement error handling in update_watchdog_all_cpus() and callersUlrich Obergfell2015-11-051-7/+23
| | | | | | | | | | | | | update_watchdog_all_cpus() now passes errors from watchdog_park_threads() up to functions in the call chain. This allows watchdog_enable_all_cpus() and proc_watchdog_update() to handle such errors too. Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* watchdog: move watchdog_disable_all_cpus() outside of ifdefUlrich Obergfell2015-11-051-3/+5
| | | | | | | | | | | | | | Move watchdog_disable_all_cpus() outside of the ifdef so that it is available if CONFIG_SYSCTL is not defined. This is preparation for "watchdog: implement error handling in update_watchdog_all_cpus() and callers". Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* watchdog: fix error handling in proc_watchdog_thresh()Ulrich Obergfell2015-11-051-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original watchdog_park_threads() function that was introduced by commit 81a4beef91ba ("watchdog: introduce watchdog_park_threads() and watchdog_unpark_threads()") takes a very simple approach to handle errors returned by kthread_park(): It attempts to roll back all watchdog threads to the unparked state. However, this may be undesired behaviour from the perspective of the caller which may want to handle errors as appropriate in its specific context. Currently, there are two possible call chains: - watchdog suspend/resume interface lockup_detector_suspend watchdog_park_threads - write to parameters in /proc/sys/kernel proc_watchdog_update watchdog_enable_all_cpus update_watchdog_all_cpus watchdog_park_threads Instead of 'blindly' attempting to unpark the watchdog threads if a kthread_park() call fails, the new approach is to disable the lockup detectors in the above call chains. Failure becomes visible to the user as follows: - error messages from lockup_detector_suspend() or watchdog_enable_all_cpus() - the state that can be read from /proc/sys/kernel/watchdog_enabled - the 'write' system call in the latter call chain returns an error I did not experience kthread_park() failures in practice, I used some instrumentation to fake error returns from kthread_park() in order to test the patches. This patch (of 5): Restore the previous value of watchdog_thresh _and_ sample_period if proc_watchdog_update() returns an error. The variables must be consistent to avoid false positives of the lockup detectors. Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kernel/watchdog.c: is_hardlockup can be booleanYaowei Bai2015-11-051-3/+3
| | | | | | | | | | | | | Make is_hardlockup return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 9p: do not overwrite return code when locking failsDominique Martinet2015-11-051-1/+2
| | | | | | | | | | | | | | If the remote locking fail, we run a local vfs unlock that should work and return success to userland when we didn't actually lock at all. We need to tell the application that tried to lock that it didn't get it, not that all went well. Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* rcu: force alignment on struct callback_head/rcu_headKirill A. Shutemov2015-11-051-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make struct callback_head aligned to size of pointer. On most architectures it happens naturally due ABI requirements, but some architectures (like CRIS) have weird ABI and we need to ask it explicitly. The alignment is required to guarantee that bits 0 and 1 of @next will be clear under normal conditions -- as long as we use call_rcu(), call_rcu_bh(), call_rcu_sched(), or call_srcu() to queue callback. This guarantee is important for few reasons: - future call_rcu_lazy() will make use of lower bits in the pointer; - the structure shares storage spacer in struct page with @compound_head, which encode PageTail() in bit 0. The guarantee is needed to avoid false-positive PageTail(). False postive PageTail() caused crash on crisv32[1]. It happend due misaligned task_struct->rcu, which was byte-aligned. [1] http://lkml.kernel.org/r/55FAEA67.9000102@roeck-us.net Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: clean up unused variable in ocfs2_duplicate_clusters_by_page()Joseph Qi2015-11-051-4/+1
| | | | | | | | | | | readahead_pages in ocfs2_duplicate_clusters_by_page is defined but not used, so clean it up. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: add uuid to ocfs2 thread name for problem analysisJoseph Qi2015-11-055-6/+10
| | | | | | | | | | | | | | | | A node can mount multiple ocfs2 volumes. And if thread names are same for each volume/domain, it will bring inconvenience when analyzing problems because we have to identify which volume/domain the messages belong to. Since thread name will be printed to messages, so add volume uuid or dlm name to thread name can benefit problem analysis. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Gang He <ghe@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an erroralex chen2015-11-051-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ocfs2_mknod_locked if '__ocfs2_mknod_locke d' returns an error, we should reclaim the inode successfully claimed above, otherwise, the inode never be reused. The case is described below: ocfs2_mknod ocfs2_mknod_locked ocfs2_claim_new_inode Successfully claim the inode __ocfs2_mknod_locked ocfs2_journal_access_di Failed because of -ENOMEM or other reasons, the inode lockres has not been initialized yet. iput(inode) ocfs2_evict_inode ocfs2_delete_inode ocfs2_inode_lock ocfs2_inode_lock_full_nested __ocfs2_cluster_lock Return -EINVAL because of the inode lockres has not been initialized. So the following operations are not performed ocfs2_wipe_inode ocfs2_remove_inode ocfs2_free_dinode ocfs2_free_suballoc_bits Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: fix race between mount and delete node/clusterJoseph Qi2015-11-051-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | There is a race case between mount and delete node/cluster, which will lead o2hb_thread to malfunctioning dead loop. o2hb_thread { o2nm_depend_this_node(); <<<<<< race window, node may have already been deleted, and then enter the loop, o2hb thread will be malfunctioning because of no configured nodes found. while (!kthread_should_stop() && !reg->hr_unclean_stop && !reg->hr_aborted_start) { } So check the return value of o2nm_depend_this_node() is needed. If node has been deleted, do not enter the loop and let mount fail. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: only take lock if dio entry when recover orphansJoseph Qi2015-11-052-39/+49
| | | | | | | | | | | | We have no need to take inode mutex, rw and inode lock if it is not dio entry when recover orphans. Optimize it by adding a flag OCFS2_INODE_DIO_ORPHAN_ENTRY to ocfs2_inode_info to reduce contention. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: do not include dio entry in case of orphan scanJoseph Qi2015-11-053-5/+15
| | | | | | | | | | | dio entry will only do truncate in case of ORPHAN_NEED_TRUNCATE. So do not include it when doing normal orphan scan to reduce contention. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: improve performance for localallocJoseph Qi2015-11-051-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently cluster allocation is always trying to find a victim chain (a chian has most space), and this may lead to poor performance because of discontiguous allocation in some scenarios. Our test case is block size 4k, cluster size 1M and mount option with localalloc=2048 (2G), since a gd is 32256M (about 31.5G) and a localalloc window is only 2G, creating 50G file will result in 2G from gd0, 2G from gd1, ... One way to improve performance is enlarge localalloc window size (max 31104M), but this will make end user feel that about 30G is suddenly "missing", and localalloc currently do not support steal, which means one node cannot use another node's localalloc even it is not used in fact. So using the last gd to record the allocation and continues with the gd if it has enough space for a localalloc window can make the allocation as more contiguous as possible. Our test result is below (evaluated in IOPS), which is using iometer running in VM, dynamic vhd virtual disk stored in ocfs2. IO model Original After Improved(%) 16K60%Write100%Random 703 876 24.59% 8K90%Write100%Random 735 827 12.59% 4K100%Write100%Random 859 915 6.52% 4K100%Read100%Random 2092 2600 24.30% Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Tested-by: Norton Zhu <norton.zhu@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2: fill in the unused portion of the block with zeros by dio_zero_block()jiangyiwen2015-11-051-0/+1
| | | | | | | | | | | | | | | | | | | A simplified test case is (this case from Ryan): 1) dd if=/dev/zero of=/mnt/hello bs=512 count=1 oflag=direct; 2) truncate /mnt/hello -s 2097152 file 'hello' is not exist before test. After this command, file 'hello' should be all zero. But 512~4096 is some random data. Setting bh state to new when get a new block, if so, direct_io_worker()->dio_zero_block() will fill-in the unused portion of the block with zero. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ocfs2_direct_IO_write() misses ocfs2_is_overwrite() error codeNorton.Zhu2015-11-051-0/+1
| | | | | | | | | | | If ocfs2_is_overwrite failed, ocfs2_direct_IO_write mays till return success to the caller. Signed-off-by: Norton.Zhu <norton.zhu@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* logfs: fix build warningSudip Mukherjee2015-11-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | fs/logfs/dev_bdev.c: In function '__bdev_writeseg': include/linux/kernel.h:601:17: warning: comparison of distinct pointer types lacks a cast [enabled by default] (void) (&_min1 == &_min2); \ fs/logfs/dev_bdev.c:84:14: note: in expansion of macro 'min' max_pages = min(nr_pages, BIO_MAX_PAGES); fs/logfs/dev_bdev.c: In function 'do_erase': include/linux/kernel.h:601:17: warning: comparison of distinct pointer types lacks a cast [enabled by default] (void) (&_min1 == &_min2); \ fs/logfs/dev_bdev.c:174:14: note: in expansion of macro 'min' max_pages = min(nr_pages, BIO_MAX_PAGES); Lets use min_t and mention the type. Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* inotify: actually check for invalid bits in sys_inotify_add_watch()Dave Hansen2015-11-051-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | The comment here says that it is checking for invalid bits. But, the mask is *actually* checking to ensure that _any_ valid bit is set, which is quite different. Without this check, an unexpected bit could get set on an inotify object. Since these bits are also interpreted by the fsnotify/dnotify code, there is the potential for an object to be mishandled inside the kernel. For instance, can we be sure that setting the dnotify flag FS_DN_RENAME on an inotify watch is harmless? Add the actual check which was intended. Retain the existing inotify bits are being added to the watch. Plus, this is existing behavior which would be nice to preserve. I did a quick sniff test that inotify functions and that my 'inotify-tools' package passes 'make check'. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Cc: Eric Paris <eparis@parisplace.org> Cc: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* inotify: hide internal kernel bits from fdinfoDave Hansen2015-11-051-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a report that my patch: inotify: actually check for invalid bits in sys_inotify_add_watch() broke CRIU. The reason is that CRIU looks up raw flags in /proc/$pid/fdinfo/* to figure out how to rebuild inotify watches and then passes those flags directly back in to the inotify API. One of those flags (FS_EVENT_ON_CHILD) is set in mark->mask, but is not part of the inotify API. It is used inside the kernel to _implement_ inotify but it is not and has never been part of the API. My patch above ensured that we only allow bits which are part of the API (IN_ALL_EVENTS). This broke CRIU. FS_EVENT_ON_CHILD is really internal to the kernel. It is set _anyway_ on all inotify marks. So, CRIU was really just trying to set a bit that was already set. This patch hides that bit from fdinfo. CRIU will not see the bit, not try to set it, and should work as before. We should not have been exposing this bit in the first place, so this is a good patch independent of the CRIU problem. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reported-by: Andrey Wagin <avagin@gmail.com> Acked-by: Andrey Vagin <avagin@openvz.org> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Eric Paris <eparis@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'char-misc-4.4-rc1' of ↵Linus Torvalds2015-11-04204-2645/+21697
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big char/misc driver update for 4.4-rc1. Lots of different driver and subsystem updates, hwtracing being the largest with the addition of some new platforms that are now supported. Full details in the shortlog. All of these have been in linux-next for a long time with no reported issues" * tag 'char-misc-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (181 commits) fpga: socfpga: Fix check of return value of devm_request_irq lkdtm: fix ACCESS_USERSPACE test mcb: Destroy IDA on module unload mcb: Do not return zero on error path in mcb_pci_probe() mei: bus: set the device name before running fixup mei: bus: use correct lock ordering mei: Fix debugfs filename in error output char: ipmi: ipmi_ssif: Replace timeval with timespec64 fpga: zynq-fpga: Fix issue with drvdata being overwritten. fpga manager: remove unnecessary null pointer checks fpga manager: ensure lifetime with of_fpga_mgr_get fpga: zynq-fpga: Change fw format to handle bin instead of bit. fpga: zynq-fpga: Fix unbalanced clock handling misc: sram: partition base address belongs to __iomem space coresight: etm3x: adding documentation for sysFS's cpu interface vme: 8-bit status/id takes 256 values, not 255 fpga manager: Adding FPGA Manager support for Xilinx Zynq 7000 ARM: zynq: dt: Updated devicetree for Zynq 7000 platform. ARM: dt: fpga: Added binding docs for Xilinx Zynq FPGA manager. ver_linux: proc/modules, limit text processing to 'sed' ...
| * fpga: socfpga: Fix check of return value of devm_request_irqMoritz Fischer2015-10-291-1/+1
| | | | | | | | | | | | | | | | | | | | The return value should be checked for non-zero, instead of checking it being IS_ERR_VALUE(). Acked-by: Alan Tull <atull@opensource.altera.com> Reviewed-by: Josh Cartwright <joshc@eso.teric.us> Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * lkdtm: fix ACCESS_USERSPACE testStephen Smalley2015-10-291-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a copy_to_user() call to the ACCESS_USERSPACE test prior to attempting direct dereferencing of the user address to ensure the page is present. Otherwise, a fault occurs on arm kernels even prior to the introduction of CONFIG_CPU_SW_DOMAIN_PAN, and there is no difference in behavior for CONFIG_CPU_SW_DOMAIN_PAN=n vs CONFIG_CPU_SW_DOMAIN_PAN=y. Before this change, for any value of CONFIG_CPU_SW_DOMAIN_PAN: lkdtm: Performing direct entry ACCESS_USERSPACE lkdtm: attempting bad read at b6fe8000 Unable to handle kernel paging request at virtual address b6fe8000 After this change, for CONFIG_CPU_SW_DOMAIN_PAN=n: lkdtm: Performing direct entry ACCESS_USERSPACE lkdtm: attempting bad read at b6efc000 lkdtm: attempting bad write at b6efc000 After this change, for CONFIG_CPU_SW_DOMAIN_PAN=y: lkdtm: Performing direct entry ACCESS_USERSPACE lkdtm: attempting bad read at b6f7d000 Unhandled fault: page domain fault (0x01b) at 0xb6f7d000 ... Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * mcb: Destroy IDA on module unloadJohannes Thumshirn2015-10-291-0/+1
| | | | | | | | | | | | | | Destroy mcb_ida on module_unload Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * mcb: Do not return zero on error path in mcb_pci_probe()Alexey Khoroshilov2015-10-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | There is an error path in mcb_pci_probe() where it returns zero instead of error code. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * mei: bus: set the device name before running fixupTomas Winkler2015-10-291-4/+16
| | | | | | | | | | | | | | | | | | The mei bus fixup use dev_xxx services for printing to kernel log so we need to setup the device name prior to running fixup hooks. Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * mei: bus: use correct lock orderingTomas Winkler2015-10-291-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The correct lock order is cl_bus_lock device_lock me_clients_rwsem This order was violated in bus rescan and remove routines when me_client_rwsem was locked before cl_bus_lock. Chain exists of: [ 4.321653] &dev->device_lock --> &dev->me_clients_rwsem --> &dev->cl_bus_lock [ 4.321653] [ 4.321679] Possible unsafe locking scenario: [ 4.321679] [ 4.321693] CPU0 CPU1 [ 4.321701] ---- ---- [ 4.321709] lock(&dev->cl_bus_lock); [ 4.321720] lock(&dev->me_clients_rwsem); [ 4.321733] lock(&dev->cl_bus_lock); [ 4.321745] lock(&dev->device_lock); [ 4.321755] [ 4.321755] *** DEADLOCK *** [ 4.321755] Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * mei: Fix debugfs filename in error outputAlexander Kuleshov2015-10-271-1/+1
| | | | | | | | | | | | Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * char: ipmi: ipmi_ssif: Replace timeval with timespec64Amitoj Kaur Chawla2015-10-241-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch replaces timeval with timespec64 as 32 bit 'struct timeval' will not give current time beyond 2038. The patch changes the code to use ktime_get_real_ts64() which returns a 'struct timespec64' instead of do_gettimeofday() which returns a 'struct timeval' This patch also alters the format string in pr_info() for now.tv_sec to incorporate 'long long' on 32 bit architectures. Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * fpga: zynq-fpga: Fix issue with drvdata being overwritten.Moritz Fischer2015-10-231-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | Upon registering a FPGA Manager low level driver, FPGA Manager core overwrites the platform drvdata pointer. Prior to this commit zynq-fpga falsely relied on this pointer to still be valid at remove() time. Reported-by: Alan Tull <atull@opensource.altera.com> Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com> Acked-by: Alan Tull <atull@opensource.altera.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * fpga manager: remove unnecessary null pointer checksAlan Tull2015-10-231-8/+4
| | | | | | | | | | | | | | | | | | | | Remove unnecessary null pointer checks. We want the caller of these functions to do their own pointer checks. Add some comments to document this. Signed-off-by: Alan Tull <atull@opensource.altera.com> Reviewed-by: Moritz Fischer <moritz.fischer@ettus.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * fpga manager: ensure lifetime with of_fpga_mgr_getAlan Tull2015-10-231-15/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure device and driver lifetime from of_fpga_mgr_get() to fpga_mgr_put(). * Don't put_device() in of_fpga_mgr_get, do it in fpga_mgr_put(). (still do put_device if there is an error). * Do module_get on the low level driver. * Don't need to module_get(THIS_MODULE) since we won't be allowed to unload the fpga manager core without unloading low level driver first. * Remove unnedessary null check for node pointer. Signed-off-by: Alan Tull <atull@opensource.altera.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * fpga: zynq-fpga: Change fw format to handle bin instead of bit.Moritz Fischer2015-10-231-22/+2
| | | | | | | | | | | | | | | | | | | | This gets rid of the code to strip away the header and byteswap, as well as the check for the sync word. Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com> Reviewed-by: Josh Cartwright <joshc@ni.com> Acked-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * fpga: zynq-fpga: Fix unbalanced clock handlingMoritz Fischer2015-10-231-2/+2
| | | | | | | | | | | | | | | | | | This commit fixes the unbalanced clock handling, where a failed probe would leave the clock with an enable count of -1. Reported-by: Josh Cartwright <joshc@ni.com> Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * misc: sram: partition base address belongs to __iomem spaceVladimir Zapolskiy2015-10-181-3/+3
| | | | | | | | | | | | | | | | | | | | | | The change fixes a warning found by sparse: drivers/misc/sram.c:134:20: warning: incorrect type in assignment (different address spaces) drivers/misc/sram.c:134:20: expected void *base drivers/misc/sram.c:134:20: got void [noderef] <asn:2>* Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * coresight: etm3x: adding documentation for sysFS's cpu interfaceMathieu Poirier2015-10-171-0/+6
| | | | | | | | | | | | | | | | Supplementing ABI documentation with a description of the newly added interface. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * vme: 8-bit status/id takes 256 values, not 255Dmitry Kalinkin2015-10-172-1/+6
| | | | | | | | | | | | | | Fixes an off by one array size. Signed-off-by: Dmitry Kalinkin <dmitry.kalinkin@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * fpga manager: Adding FPGA Manager support for Xilinx Zynq 7000Moritz Fischer2015-10-173-0/+539
| | | | | | | | | | | | | | | | | | This commit adds FPGA Manager support for the Xilinx Zynq chip. The code borrows some from the xdevcfg driver in Xilinx' vendor tree. Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ARM: zynq: dt: Updated devicetree for Zynq 7000 platform.Moritz Fischer2015-10-171-0/+5
| | | | | | | | | | | | | | | | | | Added addtional nodes required for FPGA Manager operation of the Xilinx Zynq Devc configuration interface. Reviewed-by: Sören Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ARM: dt: fpga: Added binding docs for Xilinx Zynq FPGA manager.Moritz Fischer2015-10-171-0/+19
| | | | | | | | | | Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ver_linux: proc/modules, limit text processing to 'sed'Alexander Kapshuk2015-10-171-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is more of a personal preference, rather than a fix for a problem. The current implementation used a combination of both 'cat' and 'sed' to generate an unsorted list of kernel modules separated by while space. The proposed implementation uses 'sort' and 'sed' to generate a sort list of kernel modules separated by while space. Tested on: Gentoo Linux Debian 6.0.10 Oracle Linux Server release 7.1 Arch Linux openSuSE 13.2 Signed-off-by: Alexander Kapshuk <alexander.kapshuk@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ver_linux: wireless-tools, look for numerical input, not field numberAlexander Kapshuk2015-10-171-2/+6
| | | | | | | | | | | | | | | | | | | | Rely on regex to find the version number, rather than rely on numerical input to be found in a particular input field. Tested on: Gentoo Linux Signed-off-by: Alexander Kapshuk <alexander.kapshuk@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * ver_linux: use 'udevadm', instead of 'udevinfo'Alexander Kapshuk2015-10-171-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'udevinfo' no longer seems to be available across various distros. 'udevadm' seems to be the currently valid way to look up the 'udev' version. Tested on: Gentoo Linux Debian 6.0.10 Oracle Linux Server release 7.1 Rely on regex to find the version number, rather than rely on numerical input to be found in a particular input field. Proposed implementation also eliminates the necessity to invoke 'grep' + 'awk'. Signed-off-by: Alexander Kapshuk <alexander.kapshuk@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>