From 01f8fa4f01d8362358eb90e412bd7ae18a3ec1ad Mon Sep 17 00:00:00 2001
From: Thomas Gleixner
Date: Wed, 16 Apr 2014 14:36:44 +0000
Subject: genirq: Allow forcing cpu affinity of interrupts

The current implementation of irq_set_affinity() rightfully refuses to route an interrupt to an offline cpu. But there is a special case where this is actually desired. Some of the ARM SoCs have per cpu timers which require setting the affinity during cpu startup, where the cpu is not yet in the online mask. If we can't do that, then the local timer interrupt for the about-to-become-online cpu is routed to some random online cpu.

The developers of the affected machines tried to work around that issue, but that results in a massive mess in that timer code.

We have a yet unused argument in the set_affinity callbacks of the irq chips, which I added back then for a similar reason. It was never required, so it went unused. But I'm happy that I never removed it.

That allows us to implement a sane handling of the above scenario. So the affected SoC drivers can add the required force handling to their interrupt chip, switch the timer code to irq_force_affinity(), and things just work.

This does not affect any existing user of irq_set_affinity().

Tagged for stable to allow a simple fix of the affected SoC clock event drivers.

Reported-and-tested-by: Krzysztof Kozlowski
Signed-off-by: Thomas Gleixner
Cc: Kyungmin Park
Cc: Marek Szyprowski
Cc: Bartlomiej Zolnierkiewicz
Cc: Tomasz Figa
Cc: Daniel Lezcano
Cc: Kukjin Kim
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20140416143315.717251504@linutronix.de
Signed-off-by: Thomas Gleixner
---
 kernel/irq/manage.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

(limited to 'kernel')

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 2486a4c1a710..d34131ca372b 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -180,7 +180,7 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	struct irq_chip *chip = irq_data_get_irq_chip(data);
 	int ret;
 
-	ret = chip->irq_set_affinity(data, mask, false);
+	ret = chip->irq_set_affinity(data, mask, force);
 	switch (ret) {
 	case IRQ_SET_MASK_OK:
 		cpumask_copy(data->affinity, mask);
@@ -192,7 +192,8 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	return ret;
 }
 
-int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask)
+int irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask,
+			    bool force)
 {
 	struct irq_chip *chip = irq_data_get_irq_chip(data);
 	struct irq_desc *desc = irq_data_to_desc(data);
@@ -202,7 +203,7 @@ int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask)
 		return -EINVAL;
 
 	if (irq_can_move_pcntxt(data)) {
-		ret = irq_do_set_affinity(data, mask, false);
+		ret = irq_do_set_affinity(data, mask, force);
 	} else {
 		irqd_set_move_pending(data);
 		irq_copy_pending(desc, mask);
@@ -217,13 +218,7 @@ int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask)
 	return ret;
 }
 
-/**
- * irq_set_affinity - Set the irq affinity of a given irq
- * @irq:	Interrupt to set affinity
- * @mask:	cpumask
- *
- */
-int irq_set_affinity(unsigned int irq, const struct cpumask *mask)
+int __irq_set_affinity(unsigned int irq, const struct cpumask *mask, bool force)
 {
 	struct irq_desc *desc = irq_to_desc(irq);
 	unsigned long flags;
@@ -233,7 +228,7 @@ int irq_set_affinity(unsigned int irq, const struct cpumask *mask)
 		return -EINVAL;
 	raw_spin_lock_irqsave(&desc->lock, flags);
-	ret = __irq_set_affinity_locked(irq_desc_get_irq_data(desc), mask);
+	ret = irq_set_affinity_locked(irq_desc_get_irq_data(desc), mask, force);
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
 	return ret;
 }
--
cgit v1.2.3

From f3f125324fc1b8500cd20a2907628f7e5d88a708 Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki"
Date: Sun, 20 Apr 2014 23:43:01 +0200
Subject: PM / suspend: Make cpuidle work in the "freeze" state

The "freeze" system sleep state introduced by commit 7e73c5ae6e79 (PM: Introduce suspend state PM_SUSPEND_FREEZE) requires cpuidle to be functional when freeze_enter() is executed in order to work correctly (that is, to be able to save any more energy than runtime idle), but that is impossible after commit 8651f97bd951d (PM / cpuidle: System resume hang fix with cpuidle), which caused cpuidle to be paused in dpm_suspend_noirq() and resumed in dpm_resume_noirq().

To avoid that problem, add cpuidle_resume() and cpuidle_pause() to the beginning and the end of freeze_enter(), respectively.

Reported-by: Zhang Rui
Signed-off-by: Rafael J. Wysocki
Reviewed-by: Preeti U Murthy
---
 kernel/power/suspend.c | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'kernel')

diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index c3ad9cafe930..8233cd4047d7 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -53,7 +54,9 @@ static void freeze_begin(void)
 
 static void freeze_enter(void)
 {
+	cpuidle_resume();
 	wait_event(suspend_freeze_wait_head, suspend_freeze_wake);
+	cpuidle_pause();
 }
 
 void freeze_wake(void)
--
cgit v1.2.3

From 90f62cf30a78721641e08737bda787552428061e Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman"
Date: Wed, 23 Apr 2014 14:29:27 -0700
Subject: net: Use netlink_ns_capable to verify the permissions of netlink messages

By passing a netlink socket to a more privileged executable and then fooling that executable into writing to the socket data that happens to be a valid netlink message, it is possible to make the privileged executable do something it did not intend to do.

To keep this from happening, replace bare capable and ns_capable calls with netlink_capable, netlink_net_capable and netlink_ns_capable calls. These act the same as the previous calls, except that they also verify that the opener of the socket had the desired permissions.

Reported-by: Andy Lutomirski
Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
---
 kernel/audit.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'kernel')

diff --git a/kernel/audit.c b/kernel/audit.c
index 7c2893602d06..47845c57eb19 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -643,13 +643,13 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
 		if ((task_active_pid_ns(current) != &init_pid_ns))
 			return -EPERM;
 
-		if (!capable(CAP_AUDIT_CONTROL))
+		if (!netlink_capable(skb, CAP_AUDIT_CONTROL))
 			err = -EPERM;
 		break;
 	case AUDIT_USER:
 	case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
 	case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
-		if (!capable(CAP_AUDIT_WRITE))
+		if (!netlink_capable(skb, CAP_AUDIT_WRITE))
 			err = -EPERM;
 		break;
 	default:  /* bad msg */
--
cgit v1.2.3

From 79465d2fd48e68940c2bdecddbdecd45bbba06fe Mon Sep 17 00:00:00 2001
From: Rusty Russell
Date: Mon, 28 Apr 2014 11:05:43 +0930
Subject: module: remove warning about waiting module removal.

We removed waiting module removal in commit 3f2b9c9cdf38 (September 2013), but it turns out that modprobe in kmod (< version 16) was asking for waiting module removal. No one noticed, since modprobe would check for 0 usage immediately before trying to remove the module, and the race is unlikely.

However, it means that anyone running old (but not ancient) kmod versions is hitting the printk designed to see if anyone was running "rmmod -w". All reports so far have been false positives, so remove the warning.

Fixes: 3f2b9c9cdf389e303b2273679af08aab5f153517
Reported-by: Valerio Vanni
Cc: Elliott, Robert (Server Storage)
Cc: stable@kernel.org
Acked-by: Lucas De Marchi
Signed-off-by: Rusty Russell
---
 kernel/module.c | 3 ---
 1 file changed, 3 deletions(-)

(limited to 'kernel')

diff --git a/kernel/module.c b/kernel/module.c
index 11869408f79b..ae7821898bf2 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -815,9 +815,6 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
 		return -EFAULT;
 	name[MODULE_NAME_LEN-1] = '\0';
 
-	if (!(flags & O_NONBLOCK))
-		pr_warn("waiting module removal not supported: please upgrade\n");
-
 	if (mutex_lock_interruptible(&module_mutex) != 0)
 		return -EINTR;
--
cgit v1.2.3

From 62a08ae2a5763aabeee98264605236b001503e0c Mon Sep 17 00:00:00 2001
From: Thomas Gleixner
Date: Thu, 24 Apr 2014 09:50:53 +0200
Subject: genirq: x86: Ensure that dynamic irq allocation does not conflict

On x86 the allocation of irq descriptors may allocate interrupts which are in the range of the GSI interrupts. That's wrong, as those interrupts are hardwired and we don't have the irq domain translation like PPC. So one of these interrupts can be hooked up later to one of the devices which are hard wired to it, and the io_apic init code for that particular interrupt line happily reuses that descriptor with a completely different configuration, so hell breaks loose.

Inside x86 we allocate dynamic interrupts from above nr_gsi_irqs, except for a few usage sites which have not yet blown up in our face for whatever reason. But for drivers which need an irq range, like the GPIO drivers, we have no limit in place and we don't want to expose such a detail to a driver.

To cure this, introduce a function which an architecture can implement to impose a lower bound on the dynamic interrupt allocations. Implement it for x86 and set the lower bound to nr_gsi_irqs, which is the end of the hardwired interrupt space, so all dynamic allocations happen above.

That not only allows the GPIO driver to work sanely, it also protects the bogus callsites of create_irq_nr() in hpet, uv, irq_remapping and htirq code. They need to be cleaned up as well, but that's a separate issue.

Reported-by: Jin Yao
Signed-off-by: Thomas Gleixner
Tested-by: Mika Westerberg
Cc: Mathias Nyman
Cc: Linus Torvalds
Cc: Grant Likely
Cc: H. Peter Anvin
Cc: Rafael J. Wysocki
Cc: Andy Shevchenko
Cc: Krogerus Heikki
Cc: Linus Walleij
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1404241617360.28206@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner
---
 kernel/irq/irqdesc.c | 7 +++++++
 kernel/softirq.c     | 5 +++++
 2 files changed, 12 insertions(+)

(limited to 'kernel')

diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index a7174617616b..bb07f2928f4b 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -363,6 +363,13 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
 		if (from > irq)
 			return -EINVAL;
 		from = irq;
+	} else {
+		/*
+		 * For interrupts which are freely allocated the
+		 * architecture can force a lower bound to the @from
+		 * argument. x86 uses this to exclude the GSI space.
+		 */
+		from = arch_dynirq_lower_bound(from);
 	}
 
 	mutex_lock(&sparse_irq_lock);
diff --git a/kernel/softirq.c b/kernel/softirq.c
index b50990a5bea0..33e4648ae0e7 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -779,3 +779,8 @@ int __init __weak arch_early_irq_init(void)
 {
 	return 0;
 }
+
+unsigned int __weak arch_dynirq_lower_bound(unsigned int from)
+{
+	return from;
+}
--
cgit v1.2.3

From a949ae560a511fe4e3adf48fa44fefded93e5c2b Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Red Hat)"
Date: Thu, 24 Apr 2014 10:40:12 -0400
Subject: ftrace/module: Hardcode ftrace_module_init() call into load_module()

A race exists between module loading and enabling of function tracer.

	CPU 1					CPU 2
	-----					-----
  load_module()
   module->state = MODULE_STATE_COMING

					register_ftrace_function()
					 mutex_lock(&ftrace_lock);
					 ftrace_startup()
					  update_ftrace_function();
					  ftrace_arch_code_modify_prepare()
					   set_all_module_text_rw();
					  ftrace_arch_code_modify_post_process()
					   set_all_module_text_ro();
					[ here all module text is set to RO,
					  including the module that is loading!! ]

  blocking_notifier_call_chain(MODULE_STATE_COMING);
   ftrace_init_module()

   [ tries to modify code, but it's RO, and fails!
     ftrace_bug() is called ]

When this race happens, ftrace_bug() will produce a nasty warning and all of the function tracing features will be disabled until reboot.

The simple solution is to treat module loading the same way the core kernel is treated at boot: hardcode the ftrace function modification of converting calls to mcount into nops. This is done in init/main.c, and there's no reason it could not be done in load_module(). This gives better control of the changes and doesn't tie the state of the module to its notifiers as much. Ftrace is special, it needs to be treated as such.

The reason this works is that ftrace_module_init() is called while the module is in MODULE_STATE_UNFORMED, which is ignored by the set_all_module_text_ro() call.

Link: http://lkml.kernel.org/r/1395637826-3312-1-git-send-email-indou.takao@jp.fujitsu.com

Reported-by: Takao Indoh
Acked-by: Rusty Russell
Cc: stable@vger.kernel.org # 2.6.38+
Signed-off-by: Steven Rostedt
---
 kernel/module.c       |  3 +++
 kernel/trace/ftrace.c | 27 ++++-----------------------
 2 files changed, 7 insertions(+), 23 deletions(-)

(limited to 'kernel')

diff --git a/kernel/module.c b/kernel/module.c
index 11869408f79b..5f14fec9f825 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3271,6 +3271,9 @@ static int load_module(struct load_info *info, const char __user *uargs,
 
 	dynamic_debug_setup(info->debug, info->num_debug);
 
+	/* Ftrace init must be called in the MODULE_STATE_UNFORMED state */
+	ftrace_module_init(mod);
+
 	/* Finally it's fully formed, ready to start executing. */
 	err = complete_formation(mod, info);
 	if (err)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 1fd4b9479210..4a54a25afa2f 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -4330,16 +4330,11 @@ static void ftrace_init_module(struct module *mod,
 	ftrace_process_locs(mod, start, end);
 }
 
-static int ftrace_module_notify_enter(struct notifier_block *self,
-				      unsigned long val, void *data)
+void ftrace_module_init(struct module *mod)
 {
-	struct module *mod = data;
-
-	if (val == MODULE_STATE_COMING)
-		ftrace_init_module(mod, mod->ftrace_callsites,
-				   mod->ftrace_callsites +
-				   mod->num_ftrace_callsites);
-	return 0;
+	ftrace_init_module(mod, mod->ftrace_callsites,
+			   mod->ftrace_callsites +
+			   mod->num_ftrace_callsites);
 }
 
 static int ftrace_module_notify_exit(struct notifier_block *self,
@@ -4353,11 +4348,6 @@ static int ftrace_module_notify_exit(struct notifier_block *self,
 	return 0;
 }
 #else
-static int ftrace_module_notify_enter(struct notifier_block *self,
-				      unsigned long val, void *data)
-{
-	return 0;
-}
 static int ftrace_module_notify_exit(struct notifier_block *self,
 				     unsigned long val, void *data)
 {
@@ -4365,11 +4355,6 @@ static int ftrace_module_notify_exit(struct notifier_block *self,
 }
 #endif /* CONFIG_MODULES */
 
-struct notifier_block ftrace_module_enter_nb = {
-	.notifier_call = ftrace_module_notify_enter,
-	.priority = INT_MAX,	/* Run before anything that can use kprobes */
-};
-
 struct notifier_block ftrace_module_exit_nb = {
 	.notifier_call = ftrace_module_notify_exit,
 	.priority = INT_MIN,	/* Run after anything that can remove kprobes */
@@ -4403,10 +4388,6 @@ void __init ftrace_init(void)
 				  __start_mcount_loc,
 				  __stop_mcount_loc);
 
-	ret = register_module_notifier(&ftrace_module_enter_nb);
-	if (ret)
-		pr_warning("Failed to register trace ftrace module enter notifier\n");
-
 	ret = register_module_notifier(&ftrace_module_exit_nb);
 	if (ret)
 		pr_warning("Failed to register trace ftrace module exit notifier\n");
--
cgit v1.2.3

From 6c6c0d5a1c949d2e084706f9e5fb1fccc175b265 Mon Sep 17 00:00:00 2001
From: Stuart Hayes
Date: Tue, 29 Apr 2014 17:55:02 -0500
Subject: hrtimer: Prevent all reprogramming if hang detected

If the last hrtimer interrupt detected a hang, it sets hang_detected=1 and programs the clock event device with a delay to let the system make progress.

If hang_detected == 1, we prevent reprogramming of the clock event device in hrtimer_reprogram() but not in hrtimer_force_reprogram(). This can lead to the following situation:

hrtimer_interrupt()
   hang_detected = 1;
   program ce device to Xms from now (hang delay)

We have two timers pending:
   T1 expires 50ms from now
   T2 expires 5s from now

Now T1 gets canceled, which causes hrtimer_force_reprogram() to be invoked, which in turn programs the clock event device to T2 (5 seconds from now).

Any hrtimer_start after that will not reprogram the hardware due to hang_detected still being set. So we effectively block all timers until the T2 event fires and cleans up the hang situation.

Add a check for hang_detected to hrtimer_force_reprogram() which prevents the reprogramming of the hang delay in the hardware timer. The subsequent hrtimer_interrupt will resolve all outstanding issues.

[ tglx: Rewrote subject and changelog and fixed up the comment in
  hrtimer_force_reprogram() ]

Signed-off-by: Stuart Hayes
Link: http://lkml.kernel.org/r/53602DC6.2060101@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner
---
 kernel/hrtimer.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

(limited to 'kernel')

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index d55092ceee29..e3724fdac2da 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -569,6 +569,23 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal)
 
 	cpu_base->expires_next.tv64 = expires_next.tv64;
 
+	/*
+	 * If a hang was detected in the last timer interrupt then we
+	 * leave the hang delay active in the hardware. We want the
+	 * system to make progress. That also prevents the following
+	 * scenario:
+	 * T1 expires 50ms from now
+	 * T2 expires 5s from now
+	 *
+	 * T1 is removed, so this code is called and would reprogram
+	 * the hardware to 5s from now. Any hrtimer_start after that
+	 * will not reprogram the hardware due to hang_detected being
+	 * set. So we'd effectivly block all timers until the T2 event
+	 * fires.
+	 */
+	if (cpu_base->hang_detected)
+		return;
+
 	if (cpu_base->expires_next.tv64 != KTIME_MAX)
 		tick_program_event(cpu_base->expires_next, 1);
 }
--
cgit v1.2.3

From 012a45e3f4af68e86d85cce060c6c2fed56498b2 Mon Sep 17 00:00:00 2001
From: Leon Ma
Date: Wed, 30 Apr 2014 16:43:10 +0800
Subject: hrtimer: Prevent remote enqueue of leftmost timers

If a cpu is idle and starts an hrtimer which is not pinned on that same cpu, the nohz code might target the timer to a different cpu.

In the case that we switch the cpu base of the timer, we already have a sanity check in place which determines whether the timer is earlier than the current leftmost timer on the target cpu. In that case we enqueue the timer on the current cpu, because we cannot reprogram the clock event device on the target.

If the timer's base is already the target CPU, we do not have this sanity check in place, so we enqueue the timer as the leftmost timer in the target cpu's rb tree, but we cannot reprogram the clock event device on the target cpu. So the timer expires late and subsequently prevents the reprogramming of the target cpu clock event device until the previously programmed event fires or a timer with an earlier expiry time gets enqueued on the target cpu itself.

Add the same target check as we have for the switch base case and start the timer on the current cpu if it would become the leftmost timer on the target.

[ tglx: Rewrote subject and changelog ]

Signed-off-by: Leon Ma
Link: http://lkml.kernel.org/r/1398847391-5994-1-git-send-email-xindong.ma@intel.com
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner
---
 kernel/hrtimer.c | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'kernel')

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index e3724fdac2da..6b715c0af1b1 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -234,6 +234,11 @@ again:
 			goto again;
 		}
 		timer->base = new_base;
+	} else {
+		if (cpu != this_cpu && hrtimer_check_target(timer, new_base)) {
+			cpu = this_cpu;
+			goto again;
+		}
 	}
 	return new_base;
 }
--
cgit v1.2.3

From 98a01e779f3c66b0b11cd7e64d531c0e41c95762 Mon Sep 17 00:00:00 2001
From: Jiri Bohac
Date: Fri, 18 Apr 2014 17:23:11 +0200
Subject: timer: Prevent overflow in apply_slack

On architectures with sizeof(int) < sizeof(long), the computation of mask inside apply_slack() can be undefined if the computed bit is > 32.

E.g.
with: expires = 0xffffe6f5 and slack = 25, we get:

expires_limit = 0x20000000e
bit = 33
mask = (1 << 33) - 1	/* undefined */

On x86, mask becomes 1 and the slack is not applied properly. On s390, mask is -1, expires is set to 0 and the timer fires immediately.

Use 1UL << bit to solve that issue.

Suggested-by: Deborah Townsend
Signed-off-by: Jiri Bohac
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20140418152310.GA13654@midget.suse.cz
Signed-off-by: Thomas Gleixner
---
 kernel/timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'kernel')

diff --git a/kernel/timer.c b/kernel/timer.c
index 87bd529879c2..3bb01a323b2a 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -838,7 +838,7 @@ unsigned long apply_slack(struct timer_list *timer, unsigned long expires)
 
 	bit = find_last_bit(&mask, BITS_PER_LONG);
 
-	mask = (1 << bit) - 1;
+	mask = (1UL << bit) - 1;
 
 	expires_limit = expires_limit & ~(mask);
--
cgit v1.2.3

From 561a4fe851ccab9dd0d14989ab566f9392d9f8b5 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Red Hat)"
Date: Fri, 2 May 2014 13:30:04 -0400
Subject: tracing: Use rcu_dereference_sched() for trace event triggers

As trace event triggers are now part of the mainline kernel, I added my trace event trigger tests to the test suite I run on all my kernels. Now these tests get run under different config options, and one of those options is CONFIG_PROVE_RCU, which checks under lockdep that the rcu locking primitives are being used correctly. This triggered the following splat:

===============================
[ INFO: suspicious RCU usage. ]
3.15.0-rc2-test+ #11 Not tainted
-------------------------------
kernel/trace/trace_events_trigger.c:80 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
4 locks held by swapper/1/0:
 #0: ((&(&j_cdbs->work)->timer)){..-...}, at: [] call_timer_fn+0x5/0x1be
 #1: (&(&pool->lock)->rlock){-.-...}, at: [] __queue_work+0x140/0x283
 #2: (&p->pi_lock){-.-.-.}, at: [] try_to_wake_up+0x2e/0x1e8
 #3: (&rq->lock){-.-.-.}, at: [] try_to_wake_up+0x1a0/0x1e8

stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15.0-rc2-test+ #11
Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
 0000000000000001 ffff88007e083b98 ffffffff819f53a5 0000000000000006
 ffff88007b0942c0 ffff88007e083bc8 ffffffff81081307 ffff88007ad96d20
 0000000000000000 ffff88007af2d840 ffff88007b2e701c ffff88007e083c18
Call Trace:
 [] dump_stack+0x4f/0x7c
 [] lockdep_rcu_suspicious+0x107/0x110
 [] event_triggers_call+0x99/0x108
 [] ftrace_event_buffer_commit+0x42/0xa4
 [] ftrace_raw_event_sched_wakeup_template+0x71/0x7c
 [] ttwu_do_wakeup+0x7f/0xff
 [] ttwu_do_activate.constprop.126+0x5c/0x61
 [] try_to_wake_up+0x1ac/0x1e8
 [] wake_up_process+0x36/0x3b
 [] wake_up_worker+0x24/0x26
 [] insert_work+0x5c/0x65
 [] __queue_work+0x26c/0x283
 [] ? __queue_work+0x283/0x283
 [] delayed_work_timer_fn+0x1e/0x20
 [] call_timer_fn+0xdf/0x1be
 [] ? call_timer_fn+0x5/0x1be
 [] ? __queue_work+0x283/0x283
 [] run_timer_softirq+0x1a4/0x22f
 [] __do_softirq+0x17b/0x31b
 [] irq_exit+0x42/0x97
 [] smp_apic_timer_interrupt+0x37/0x44
 [] apic_timer_interrupt+0x6f/0x80
 [] ? default_idle+0x21/0x32
 [] ? default_idle+0x1f/0x32
 [] arch_cpu_idle+0xf/0x11
 [] cpu_startup_entry+0x1a3/0x213
 [] start_secondary+0x212/0x219

The cause is that the triggers are protected by rcu_read_lock_sched(), but the data is dereferenced with rcu_dereference(), which expects it to be protected with rcu_read_lock().
The proper reference should be rcu_dereference_sched().

Cc: Tom Zanussi
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Steven Rostedt
---
 kernel/trace/trace_events_trigger.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'kernel')

diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 925f537f07d1..4747b476a030 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -77,7 +77,7 @@ event_triggers_call(struct ftrace_event_file *file, void *rec)
 			data->ops->func(data);
 			continue;
 		}
-		filter = rcu_dereference(data->filter);
+		filter = rcu_dereference_sched(data->filter);
 		if (filter && !filter_match_preds(filter, rec))
 			continue;
 		if (data->cmd_ops->post_trigger) {
--
cgit v1.2.3
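For readers less familiar with the sched-flavored RCU API that the last fix relies on, the sketch below illustrates the read-side pairing in isolation. It is not code from the kernel tree: the trigger_cfg structure, active_cfg pointer and trigger_fires() function are made-up names used only for illustration; the RCU primitives themselves are the real kernel APIs.

/*
 * Illustrative only: a reader that runs under rcu_read_lock_sched()
 * (e.g. with preemption disabled in a tracing path) must fetch the
 * RCU-protected pointer with rcu_dereference_sched(), not plain
 * rcu_dereference(), so that CONFIG_PROVE_RCU/lockdep checks the
 * matching read-side flavor.
 */
#include <linux/types.h>
#include <linux/rcupdate.h>

struct trigger_cfg {
	int threshold;
};

/* Published elsewhere with rcu_assign_pointer(active_cfg, new_cfg). */
static struct trigger_cfg __rcu *active_cfg;

static bool trigger_fires(int value)
{
	struct trigger_cfg *cfg;
	bool hit = false;

	rcu_read_lock_sched();				/* sched-RCU read side */
	cfg = rcu_dereference_sched(active_cfg);	/* matches the lock flavor */
	if (cfg && value > cfg->threshold)
		hit = true;
	rcu_read_unlock_sched();

	return hit;
}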