Age | Commit message | Author
2013-05-15 | Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
    Pull timer fixes from Thomas Gleixner:

    - Cure for not using zalloc in the first place, which leads to random crashes with CPUMASK_OFF_STACK.
    - Revert a user space visible change which broke udev
    - Add a missing cpu_online early return introduced by the new full dyntick conversions
    - Plug a long standing race in the timer wheel cpu hotplug code. Sigh...
    - Cleanup NOHZ per cpu data on cpu down to prevent stale data on cpu up.

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons
      timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE
      tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline
      tick: Cleanup NOHZ per cpu data on cpu down
      tick: Use zalloc_cpumask_var for allocating offstack cpumasks
2013-05-15 | Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
    Pull core fixes from Thomas Gleixner:

    - Two fixlets for the fallout of the generic idle task conversion
    - Documentation update

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      rcu/idle: Wrap cpu-idle poll mode within rcu_idle_enter/exit
      idle: Fix hlt/nohlt command-line handling in new generic idle
      kthread: Document ways of reducing OS jitter due to per-CPU kthreads
2013-05-15 | Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm | Linus Torvalds
    Pull ARM fixes from Russell King:
     "A small number of fixes for stuff from the last merge window, and in one case (IRQ time accounting) the previous merge window."

    * 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
      ARM: 7720/1: ARM v6/v7 cmpxchg64 shouldn't clear upper 32 bits of the old/new value
      ARM: 7715/1: MCPM: adapt to GIC changes after upstream merge
      ARM: 7714/1: mmc: mmci: Ensure return value of regulator_enable() is checked
      ARM: 7712/1: Remove trailing whitespace in arch/arm/Makefile
      ARM: 7711/1: dove: fix Dove cpu type from V7 to PJ4
      ARM: finally enable IRQ time accounting config
2013-05-15 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client | Linus Torvalds
    Pull Ceph fixes from Sage Weil:
     "Yes, this is a much larger pull than I would like after -rc1. There are a few things included:

      - a few fixes for leaks and incorrect assertions
      - a few patches fixing behavior when mapped images are resized
      - handling for cloned/layered images that are flattened out from underneath the client

      The last bit was non-trivial, and there is some code movement and associated cleanup mixed in. This was ready and was meant to go in last week but I missed the boat on Friday. My only excuse is that I was waiting for an all clear from the testing and there were many other shiny things to distract me.

      Strictly speaking, handling the flatten case isn't a regression and could wait, so if you like we can try to pull the series apart, but Alex and I would much prefer to have it all in as it is a case real users will hit with 3.10."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (33 commits)
      rbd: re-submit flattened write request (part 2)
      rbd: re-submit write request for flattened clone
      rbd: re-submit read request for flattened clone
      rbd: detect when clone image is flattened
      rbd: reference count parent requests
      rbd: define parent image request routines
      rbd: define rbd_dev_unparent()
      rbd: don't release write request until necessary
      rbd: get parent info on refresh
      rbd: ignore zero-overlap parent
      rbd: support reading parent page data for writes
      rbd: fix parent request size assumption
      libceph: init sent and completed when starting
      rbd: kill rbd_img_request_get()
      rbd: only set up watch for mapped images
      rbd: set mapping read-only flag in rbd_add()
      rbd: support reading parent page data
      rbd: fix an incorrect assertion condition
      rbd: define rbd_dev_v2_header_info()
      rbd: get rid of trivial v1 header wrappers
      ...
2013-05-14 | time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons | John Stultz
    Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config, which enables some minor compile time optimization to avoid unnecessary code, mostly in the suspend/resume path, could cause problems for userland.

    In particular, the dependency for RTC_HCTOSYS on !ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time twice and simplifies suspend/resume, has the side effect of causing the /sys/class/rtc/rtcN/hctosys flag to always be zero. This flag is commonly used by udev to set up the /dev/rtc symlink to /dev/rtcN, so the change can cause pain for older applications.

    While the udev rules could use some work to be less fragile, breaking userland should strongly be avoided. Additionally the compile time optimizations are fairly minor, and the code being optimized is likely to be reworked in the future, so let's revert this change.

    Reported-by: Kay Sievers <kay@vrfy.org>
    Signed-off-by: John Stultz <john.stultz@linaro.org>
    Cc: stable <stable@vger.kernel.org> #3.9
    Cc: Feng Tang <feng.tang@intel.com>
    Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
    Link: http://lkml.kernel.org/r/1366828376-18124-1-git-send-email-john.stultz@linaro.org
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-14 | Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds
    Pull ext4 update from Ted Ts'o:
     "Fixed regressions (two stability regressions and a performance regression) introduced during the 3.10-rc1 merge window.

      Also included is a bug fix relating to allocating blocks after resizing an ext3 file system when using the ext4 file system driver"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
      jbd,jbd2: fix oops in jbd2_journal_put_journal_head()
      ext4: revert "ext4: use io_end for multiple bios"
      ext4: limit group search loop for non-extent files
      ext4: fix fio regression
2013-05-14 | Merge branch 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq | Linus Torvalds
    Pull workqueue fix from Tejun Heo:
     "A fix for a workqueue_congested() regression that broke fscache"

    * 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
      workqueue: workqueue_congested() shouldn't translate WORK_CPU_UNBOUND into node number
2013-05-14 | timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE | Tirupathi Reddy
    An inactive timer's base can refer to an offline CPU's base.

    In the current code, cpu_base's lock is blindly reinitialized each time a CPU is brought up. If a CPU is brought online while another thread is modifying an inactive timer on that CPU and holding its timer base lock, the lock is reinitialized under that thread's feet. This leads to the following SPIN_BUG().

    <0> BUG: spinlock already unlocked on CPU#3, kworker/u:3/1466
    <0> lock: 0xe3ebe000, .magic: dead4ead, .owner: kworker/u:3/1466, .owner_cpu: 1
    <4> [<c0013dc4>] (unwind_backtrace+0x0/0x11c) from [<c026e794>] (do_raw_spin_unlock+0x40/0xcc)
    <4> [<c026e794>] (do_raw_spin_unlock+0x40/0xcc) from [<c076c160>] (_raw_spin_unlock+0x8/0x30)
    <4> [<c076c160>] (_raw_spin_unlock+0x8/0x30) from [<c009b858>] (mod_timer+0x294/0x310)
    <4> [<c009b858>] (mod_timer+0x294/0x310) from [<c00a5e04>] (queue_delayed_work_on+0x104/0x120)
    <4> [<c00a5e04>] (queue_delayed_work_on+0x104/0x120) from [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c)
    <4> [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c) from [<c04d8780>] (sdhci_disable+0x40/0x48)
    <4> [<c04d8780>] (sdhci_disable+0x40/0x48) from [<c04bf300>] (mmc_release_host+0x4c/0xb0)
    <4> [<c04bf300>] (mmc_release_host+0x4c/0xb0) from [<c04c7aac>] (mmc_sd_detect+0x90/0xfc)
    <4> [<c04c7aac>] (mmc_sd_detect+0x90/0xfc) from [<c04c2504>] (mmc_rescan+0x7c/0x2c4)
    <4> [<c04c2504>] (mmc_rescan+0x7c/0x2c4) from [<c00a6a7c>] (process_one_work+0x27c/0x484)
    <4> [<c00a6a7c>] (process_one_work+0x27c/0x484) from [<c00a6e94>] (worker_thread+0x210/0x3b0)
    <4> [<c00a6e94>] (worker_thread+0x210/0x3b0) from [<c00aad9c>] (kthread+0x80/0x8c)
    <4> [<c00aad9c>] (kthread+0x80/0x8c) from [<c000ea80>] (kernel_thread_exit+0x0/0x8)

    As an example, this particular crash occurred when CPU #3 was executing mod_timer() on an inactive timer whose base referred to the offlined CPU #2. The code locked the timer_base corresponding to CPU #2. Before it could proceed, CPU #2 came online and reinitialized the spinlock corresponding to its base. CPU #3 therefore held a lock which had been reinitialized. When CPU #3 finally unlocked the old cpu_base corresponding to CPU #2, we hit the above SPIN_BUG().

    CPU #0          CPU #3                                   CPU #2
    ------          -------                                  -------
    .....           ......                                   <Offline>
                    mod_timer()
                      lock_timer_base
                        spin_lock_irqsave(&base->lock)
    cpu_up(2)       .....                                    ......
                                                             init_timers_cpu()
    ....            .....                                    spin_lock_init(&base->lock)
    .....           spin_unlock_irqrestore(&base->lock)      ......
                    <spin_bug>

    Allocation of per_cpu timer vector bases is done only once under the "tvec_base_done[]" check. In the current code, spinlock initialization of base->lock isn't under this check, so the base lock is reinitialized each time a CPU comes up. Move base spinlock initialization under the check.

    Signed-off-by: Tirupathi Reddy <tirupath@codeaurora.org>
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1368520142-4136-1-git-send-email-tirupath@codeaurora.org
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
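The fix described in this commit amounts to an initialize-once pattern: the lock is set up together with the one-time base allocation and is left alone on later bring-ups. A minimal sketch in plain C follows. It is not the kernel's timer code; the toy_* names are hypothetical, and a counter stands in for the real spinlock so the effect is visible without threads.

```c
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-in for a spinlock: records how often it was initialized. */
struct toy_lock {
    int init_count;
    bool held;
};

struct toy_base {
    struct toy_lock lock;
};

static struct toy_base toy_bases[TOY_NR_CPUS];
static bool toy_base_done[TOY_NR_CPUS];   /* plays the role of tvec_base_done[] */

void toy_lock_init(struct toy_lock *l)
{
    l->init_count++;
    l->held = false;   /* re-initializing silently forgets any current holder */
}

/* Called every time a CPU is brought up. */
void toy_init_timers_cpu(int cpu)
{
    if (!toy_base_done[cpu]) {
        /* One-time setup: the lock is initialized here and nowhere else. */
        toy_lock_init(&toy_bases[cpu].lock);
        toy_base_done[cpu] = true;
    }
    /* Per-bring-up state could be reset here, but never the lock. */
}
```

With this shape, a second call to toy_init_timers_cpu() for the same CPU leaves a lock that another CPU is holding untouched, which is the property the commit restores.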
2013-05-14 | rcu/idle: Wrap cpu-idle poll mode within rcu_idle_enter/exit | Srivatsa S. Bhat
    Bjørn Mork reported the following warning when running powertop.

    [ 49.289034] ------------[ cut here ]------------
    [ 49.289055] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common.isra.48+0x3d/0x125()
    [ 49.289244] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-bisect-rcu-warn+ #107
    [ 49.289251] ffffffff8157d8c8 ffffffff81801e28 ffffffff8137e4e3 ffffffff81801e68
    [ 49.289260] ffffffff8103094f ffffffff81801e68 0000000000000000 ffff88023afcd9b0
    [ 49.289268] 0000000000000000 0140000000000000 ffff88023bee7700 ffffffff81801e78
    [ 49.289276] Call Trace:
    [ 49.289285] [<ffffffff8137e4e3>] dump_stack+0x19/0x1b
    [ 49.289293] [<ffffffff8103094f>] warn_slowpath_common+0x62/0x7b
    [ 49.289300] [<ffffffff8103097d>] warn_slowpath_null+0x15/0x17
    [ 49.289306] [<ffffffff810a9006>] rcu_eqs_exit_common.isra.48+0x3d/0x125
    [ 49.289314] [<ffffffff81079b49>] ? trace_hardirqs_off_caller+0x37/0xa6
    [ 49.289320] [<ffffffff810a9692>] rcu_idle_exit+0x85/0xa8
    [ 49.289327] [<ffffffff8107076e>] trace_cpu_idle_rcuidle+0xae/0xff
    [ 49.289334] [<ffffffff810708b1>] cpu_startup_entry+0x72/0x115
    [ 49.289341] [<ffffffff813689e5>] rest_init+0x149/0x150
    [ 49.289347] [<ffffffff8136889c>] ? csum_partial_copy_generic+0x16c/0x16c
    [ 49.289355] [<ffffffff81a82d34>] start_kernel+0x3f0/0x3fd
    [ 49.289362] [<ffffffff81a8274c>] ? repair_env_string+0x5a/0x5a
    [ 49.289368] [<ffffffff81a82481>] x86_64_start_reservations+0x2a/0x2c
    [ 49.289375] [<ffffffff81a82550>] x86_64_start_kernel+0xcd/0xd1
    [ 49.289379] ---[ end trace 07a1cc95e29e9036 ]---

    The warning is that 'rdtp->dynticks' has an unexpected value, which roughly means that the calls to rcu_idle_enter() and rcu_idle_exit() were not made in the correct order, or were otherwise messed up. Bjørn's painstaking debugging indicated that this happens when the idle loop enters poll mode.

    Looking at the poll function cpu_idle_poll() and the implementation of trace_cpu_idle_rcuidle(), the problem becomes very clear: cpu_idle_poll() lacks calls to rcu_idle_enter/exit(), and trace_cpu_idle_rcuidle() calls them in the reverse order: first rcu_idle_exit(), and then rcu_idle_enter(). Hence the even/odd alternation of rdtp->dynticks is thrown off. powertop readily triggers this because it uses the idle-tracing infrastructure extensively.

    To fix this, wrap the code in cpu_idle_poll() within rcu_idle_enter/exit(), so that it blends properly with the calls inside trace_cpu_idle_rcuidle() and gets the function ordering right.

    Reported-and-tested-by: Bjørn Mork <bjorn@mork.no>
    Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Dipankar Sarma <dipankar@in.ibm.com>
    Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
    Link: http://lkml.kernel.org/r/519169BF.4080208@linux.vnet.ibm.com
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
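The even/odd sequencing mentioned in this commit can be illustrated with a toy model. The sketch below is a simplification written for this log, not the RCU implementation: it assumes a counter that is odd while the CPU is non-idle and even while it is idle, and it counts a warning whenever an enter or exit call leaves the counter with the wrong parity. All toy_* names are hypothetical.

```c
/* Toy model: count is odd while the CPU is non-idle, even while idle. */
struct toy_dynticks {
    unsigned long count;
    int warnings;
};

/* Entering idle must leave the counter even. */
void toy_idle_enter(struct toy_dynticks *d)
{
    d->count++;
    if (d->count & 1)
        d->warnings++;
}

/* Leaving idle must leave the counter odd. */
void toy_idle_exit(struct toy_dynticks *d)
{
    d->count++;
    if (!(d->count & 1))
        d->warnings++;
}

/* Models a tracing hook that briefly leaves idle and then re-enters it. */
void toy_trace_hook(struct toy_dynticks *d)
{
    toy_idle_exit(d);
    /* the tracepoint body would run here */
    toy_idle_enter(d);
}

/* Poll loop without the wrapping: the hook runs while the CPU is non-idle. */
void toy_poll_unwrapped(struct toy_dynticks *d)
{
    toy_trace_hook(d);
}

/* Poll loop with the wrapping described in the commit message. */
void toy_poll_wrapped(struct toy_dynticks *d)
{
    toy_idle_enter(d);
    toy_trace_hook(d);
    toy_idle_exit(d);
}
```

Starting from an odd counter, toy_poll_unwrapped() trips the parity check while toy_poll_wrapped() does not, which mirrors why wrapping the poll loop makes the hook's exit/enter pair line up.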
2013-05-14 | tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline | Thomas Gleixner
    commit 5b39939a4 (nohz: Move ts->idle_calls incrementation into strict idle logic) moved code out of tick_nohz_stop_sched_tick() and failed to bail out when the cpu is offline. That causes subsequent failures, as an offline CPU is supposed to die and not to fiddle with nohz magic.

    Return false in can_stop_idle_tick() if the cpu is offline.

    Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz>
    Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: x86@kernel.org
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305132138160.2863@ionos
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-05-14 | Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc | Linus Torvalds
    Pull powerpc fixes from Benjamin Herrenschmidt:
     "This is mostly bug fixes (some of them regressions, some of them I deemed worth merging now) along with some patches from Li Zhong hooking up the new context tracking stuff (for the new full NO_HZ)"

    * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (25 commits)
      powerpc: Set show_unhandled_signals to 1 by default
      powerpc/perf: Fix setting of "to" addresses for BHRB
      powerpc/pmu: Fix order of interpreting BHRB target entries
      powerpc/perf: Move BHRB code into CONFIG_PPC64 region
      powerpc: select HAVE_CONTEXT_TRACKING for pSeries
      powerpc: Use the new schedule_user API on userspace preemption
      powerpc: Exit user context on notify resume
      powerpc: Exception hooks for context tracking subsystem
      powerpc: Syscall hooks for context tracking subsystem
      powerpc/booke64: Fix kernel hangs at kernel_dbg_exc
      powerpc: Fix irq_set_affinity() return values
      powerpc: Provide __bswapdi2
      powerpc/powernv: Fix starting of secondary CPUs on OPALv2 and v3
      powerpc/powernv: Detect OPAL v3 API version
      powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again
      powerpc: Make CONFIG_RTAS_PROC depend on CONFIG_PROC_FS
      powerpc: Bring all threads online prior to migration/hibernation
      powerpc/rtas_flash: Fix validate_flash buffer overflow issue
      powerpc/kexec: Fix kexec when using VMX optimised memcpy
      powerpc: Fix build errors STRICT_MM_TYPECHECKS
      ...
2013-05-14 | powerpc: Set show_unhandled_signals to 1 by default | Benjamin Herrenschmidt
    Just like other architectures

    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 | powerpc/perf: Fix setting of "to" addresses for BHRB | Michael Neuling
    Currently we only set the "to" address in the branch stack when the CPU explicitly gives us a value. Unfortunately it only does this for XL form branches (e.g. blr, bctr, bctar) and not I and B form branches (e.g. b, bc).

    Fortunately if we read the instruction from memory we can extract the offset of a branch and calculate the target address.

    This adds a function power_pmu_bhrb_to() to calculate the target/to address of the corresponding I and B form branches. It handles branches in both user and kernel spaces. It also plumbs this into the perf BHRB reading code.

    Signed-off-by: Michael Neuling <mikey@neuling.org>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
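The offset extraction mentioned in this commit can be sketched in a few lines of C. This is an illustration based on the standard Power ISA encodings for I-form (primary opcode 18, 24-bit LI field) and B-form (primary opcode 16, 14-bit BD field) branches; it is not the body of power_pmu_bhrb_to(), and the toy_* names are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of value to 64 bits. */
int64_t toy_sign_extend(uint32_t value, unsigned bits)
{
    uint32_t sign = UINT32_C(1) << (bits - 1);
    return (int64_t)(value ^ sign) - (int64_t)sign;
}

/*
 * Compute the target of an I-form or B-form branch located at addr.
 * Returns 0 for any other instruction, e.g. XL-form branches such as blr,
 * whose target cannot be derived from the instruction word alone.
 */
uint64_t toy_branch_target(uint32_t insn, uint64_t addr)
{
    uint32_t opcode = insn >> 26;   /* primary opcode, top 6 bits */
    int64_t offset;

    if (opcode == 18)               /* I-form: b, ba, bl, bla */
        offset = toy_sign_extend(insn & UINT32_C(0x03fffffc), 26);
    else if (opcode == 16)          /* B-form: bc, bca, bcl, bcla */
        offset = toy_sign_extend(insn & UINT32_C(0x0000fffc), 16);
    else
        return 0;

    if (insn & UINT32_C(0x2))       /* AA bit set: offset is an absolute address */
        return (uint64_t)offset;
    return addr + (uint64_t)offset; /* otherwise relative to the branch itself */
}
```

For example, the word 0x48000008 (b .+8) at address 0x1000 yields 0x1008, while 0x4e800020 (blr) yields 0 because its target lives in the link register.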
2013-05-14 | powerpc/pmu: Fix order of interpreting BHRB target entries | Michael Neuling
    The current Branch History Rolling Buffer (BHRB) code misinterprets the order of entries in the hardware buffer. It assumes that a branch target address will be read _after_ its corresponding branch. In reality the branch target comes before (lower mfbhrb entry) its corresponding branch.

    This is a rewrite of the code to take this into account.

    Signed-off-by: Michael Neuling <mikey@neuling.org>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 | powerpc/perf: Move BHRB code into CONFIG_PPC64 region | Michael Neuling
    The new Branch History Rolling buffer (BHRB) code is only useful on 64bit processors, so move it into the #ifdef CONFIG_PPC64 region.

    This avoids code bloat on 32bit systems.

    Signed-off-by: Michael Neuling <mikey@neuling.org>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 | powerpc: select HAVE_CONTEXT_TRACKING for pSeries | Li Zhong
    Start context tracking support from pSeries.

    Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 | powerpc: Use the new schedule_user API on userspace preemption | Li Zhong
    This patch corresponds to
    [PATCH] x86: Use the new schedule_user API on userspace preemption
      commit 0430499ce9d78691f3985962021b16bf8f8a8048

    Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 | powerpc: Exit user context on notify resume | Li Zhong
    This patch allows RCU usage in do_notify_resume, e.g. signal handling. It corresponds to
    [PATCH] x86: Exit RCU extended QS on notify resume
      commit edf55fda35c7dc7f2d9241c3abaddaf759b457c6

    Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 | powerpc: Exception hooks for context tracking subsystem | Li Zhong
    These are the exception hooks for the context tracking subsystem, covering data access, program check, single step, instruction breakpoint, machine check, alignment, fp unavailable, altivec assist, and unknown exception, whose handlers might use RCU.

    This patch corresponds to
    [PATCH] x86: Exception hooks for userspace RCU extended QS
      commit 6ba3c97a38803883c2eee489505796cb0a727122

    However, since exception handling moved to generic code, with some changes, in the following two commits:
    56dd9470d7c8734f055da2a6bac553caf4a468eb
      context_tracking: Move exception handling to generic code
    6c1e0256fad84a843d915414e4b5973b7443d48d
      context_tracking: Restore correct previous context state on exception exit

    the exception hooks can use the generic code above instead of a redundant arch implementation.

    Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 | powerpc: Syscall hooks for context tracking subsystem | Li Zhong
    These are the syscall slow path hooks for the context tracking subsystem, corresponding to
    [PATCH] x86: Syscall hooks for userspace RCU extended QS
      commit bf5a3c13b939813d28ce26c01425054c740d6731

    TIF_MEMDIE is moved to the second 16 bits (with value 17), as there seems to be no asm code using it. TIF_NOHZ is added to _TIF_SYSCALL_T_OR_A, so it is better kept in the same 16 bits as the other flags in the group; that way andi. with this group still works in the asm code.

    Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
    Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
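The reason for keeping the flag group within one 16-bit half is that the PowerPC andi. instruction takes a 16-bit unsigned immediate, so a mask with any bit above bit 15 cannot be tested with a single andi.. A small sketch of that constraint follows. The value 17 for the moved flag comes from the commit message; the value 9 for the other flag and all toy_/TOY_ names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers: 17 is taken from the commit message; 9 is an assumed example. */
#define TOY_TIF_NOHZ    9
#define TOY_TIF_MEMDIE  17

#define TOY_FLAG(bit)   (UINT32_C(1) << (bit))

/* andi. encodes its mask as a 16-bit unsigned immediate. */
bool toy_fits_andi_immediate(uint32_t mask)
{
    return (mask & ~UINT32_C(0xffff)) == 0;
}
```

A flag at bit 9 passes this check and can join a group tested with one andi., whereas a flag at bit 17 does not, which is harmless only if no asm code needs to test it that way.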
2013-05-14 | powerpc/booke64: Fix kernel hangs at kernel_dbg_exc | Scott Wood
    MSR_DE is not cleared on entry to the kernel, and we don't clear it explicitly outside of debug code. If we have MSR_DE set in prime_debug_regs(), and the new thread has events enabled in DBCR0 (e.g. ICMP is set in thread->dbcr0, even though it was cleared in the real DBCR0 when the thread got scheduled out), we'll end up taking a debug exception in the kernel when DBCR0 is loaded. DSRR0 will not point to an exception vector, and the kernel ends up hanging at kernel_dbg_exc. Fix this by always clearing MSR_DE when we load new debug state.

    Another observed source of kernel_dbg_exc hangs is the branch taken event. If this event is active but we take a non-debug trap (e.g. a TLB miss or an asynchronous interrupt) before the next branch, we end up taking a branch-taken debug exception on the initial branch instruction of the exception vector. Because the debug exception is DBSR_BT rather than DBSR_IC, we branch to kernel_dbg_exc before even checking the DSRR0 address. Fix this by checking for DBSR_BT as well as DBSR_IC, which is what 32-bit does and what the comments suggest was intended in the 64-bit code as well.

    Signed-off-by: Scott Wood <scottwood@freescale.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 | powerpc: Fix irq_set_affinity() return values | Alexander Gordeev
    Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 | powerpc: Provide __bswapdi2 | David Woodhouse
    Some versions of GCC apparently expect this to be provided by libgcc.

    Updates from Mikey to fix 32 bit version and adding "r" to registers.

    Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
    Signed-off-by: Michael Neuling <mikey@neuling.org>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 | powerpc/powernv: Fix starting of secondary CPUs on OPALv2 and v3 | Benjamin Herrenschmidt
    The current code fails to handle kexec on OPALv2. This fixes it and adds code to improve the situation on OPALv3 where we can query the CPU status from the firmware and decide what to do based on that.

    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 | powerpc/powernv: Detect OPAL v3 API version | Benjamin Herrenschmidt
    Future firmwares will support that new version

    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 | powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again | Li Zhong
    Saw this warning again, and this time from the ret_from_fork path.

    It seems we could clear the back chain earlier in copy_thread(), which would cover both paths, and also fix potential lockdep usage in schedule_tail(), or an exception occurring before we clear the back chain.

    Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 | powerpc: Make CONFIG_RTAS_PROC depend on CONFIG_PROC_FS | Michael Ellerman
    We are getting build errors with CONFIG_PROC_FS=n:

    arch/powerpc/kernel/rtas_flash.c
      In function 'rtas_flash_init':
      745:33: error: unused variable 'f' [-Werror=unused-variable]

    But rtas_flash.c should not be built when CONFIG_PROC_FS=n, because all it does is provide a /proc interface to the RTAS flash routines.

    CONFIG_RTAS_FLASH already depends on CONFIG_RTAS_PROC, to indicate that it depends on the RTAS proc support, but CONFIG_RTAS_PROC does not depend on CONFIG_PROC_FS. So fix that.

    Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 | powerpc: Bring all threads online prior to migration/hibernation | Robert Jennings
    This patch brings online all threads which are present but not online prior to migration/hibernation. After migration/hibernation those threads are taken back offline.

    During migration/hibernation all online CPUs must call H_JOIN, this is required by the hypervisor. Without this patch, threads that are offline (H_CEDE'd) will not be woken to make the H_JOIN call and the OS will be deadlocked (all threads either JOIN'd or CEDE'd).

    Cc: <stable@kernel.org>
    Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 | powerpc/rtas_flash: Fix validate_flash buffer overflow issue | Vasant Hegde
    The ibm,validate-flash-image RTAS call output buffer contains 150 - 200 bytes of data on the latest systems. Presently the output buffer size is 64 bytes and we use sprintf to copy data from the RTAS buffer to the local buffer. This causes a kernel oops (see the call trace below).

    This patch increases the local buffer size to 256 and also uses snprintf instead of sprintf to copy data from the RTAS buffer.

    Kernel call trace :
    -------------------
    Oops: Kernel access of bad area, sig: 11 [#1]
    SMP NR_CPUS=1024 NUMA pSeries
    Modules linked in: nfs fscache lockd auth_rpcgss nfs_acl sunrpc fuse loop dm_mod ipv6 ipv6_lib usb_storage ehea(X) sr_mod qlge ses cdrom enclosure st be2net sg ext3 jbd mbcache usbhid hid ohci_hcd ehci_hcd usbcore qla2xxx usb_common sd_mod crc_t10dif scsi_dh_hp_sw scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh lpfc scsi_transport_fc scsi_tgt ipr(X) libata scsi_mod
    Supported: Yes
    NIP: 4520323031333130 LR: 4520323031333130 CTR: 0000000000000000
    REGS: c0000001b91779b0 TRAP: 0400 Tainted: G X (3.0.13-0.27-ppc64)
    MSR: 8000000040009032 <EE,ME,IR,DR> CR: 44022488 XER: 20000018
    TASK = c0000001bca1aba0[4736] 'cat' THREAD: c0000001b9174000 CPU: 36
    GPR00: 4520323031333130 c0000001b9177c30 c000000000f87c98 000000000000009b
    GPR04: c0000001b9177c4a 000000000000000b 3520323031333130 2032303133313031
    GPR08: 3133313031350a4d 000000000000009b 0000000000000000 c0000000003664a4
    GPR12: 0000000022022448 c000000003ee6c00 0000000000000002 00000000100e8a90
    GPR16: 00000000100cb9d8 0000000010093370 000000001001d310 0000000000000000
    GPR20: 0000000000008000 00000000100fae60 000000000000005e 0000000000000000
    GPR24: 0000000010129350 46573738302e3030 2046573738302e30 300a4d4720323031
    GPR28: 333130313520554e 4b4e4f574e0a4d47 2032303133313031 3520323031333130
    NIP [4520323031333130] 0x4520323031333130
    LR [4520323031333130] 0x4520323031333130
    Call Trace:
    [c0000001b9177c30] [4520323031333130] 0x4520323031333130 (unreliable)
    Instruction dump:
    XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
    XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX

    Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
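The sprintf-to-snprintf change described in this commit can be shown with a short userspace sketch. It is not the rtas_flash.c code: toy_copy_report() is a hypothetical helper, and the only detail taken from the commit message is the idea of bounding the copy by the destination size.

```c
#include <stdio.h>

/*
 * Copy a NUL-terminated report into dst without writing past dst_size bytes.
 * Returns the number of characters stored, not counting the terminating NUL.
 * Unlike sprintf(), snprintf() truncates instead of overrunning the buffer.
 */
size_t toy_copy_report(char *dst, size_t dst_size, const char *src)
{
    int n = snprintf(dst, dst_size, "%s", src);

    if (n < 0) {                      /* encoding error: leave an empty string */
        if (dst_size > 0)
            dst[0] = '\0';
        return 0;
    }
    if ((size_t)n >= dst_size)        /* output was truncated to fit */
        return dst_size > 0 ? dst_size - 1 : 0;
    return (size_t)n;
}
```

With an 8-byte destination and a 10-character source, the helper stores the first 7 characters plus the terminator, where a plain sprintf() would have written 11 bytes and corrupted whatever follows the buffer.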
2013-05-14 | powerpc/kexec: Fix kexec when using VMX optimised memcpy | Anton Blanchard
    commit b3f271e86e5a (powerpc: POWER7 optimised memcpy using VMX and enhanced prefetch) uses VMX when it is safe to do so (i.e. not in interrupt). It also looks at the task struct to decide if we have to save the current task's VMX state.

    kexec calls memcpy() at a point where the task struct may have been overwritten by the new kexec segments. If it has been overwritten then when memcpy -> enable_altivec looks up current->thread.regs->msr we get a cryptic oops or lockup.

    I also notice we aren't initialising thread_info->cpu, which means smp_processor_id is broken. Fix that too.

    Signed-off-by: Anton Blanchard <anton@samba.org>
    Cc: <stable@vger.kernel.org> # 3.6+
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 | powerpc: Fix build errors STRICT_MM_TYPECHECKS | Aneesh Kumar K.V
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 | powerpc/mm: Use the correct mask value when looking at pgtable address | Aneesh Kumar K.V
    Our pgtables are 2*sizeof(pte_t)*PTRS_PER_PTE in size, which is PTE_FRAG_SIZE. Instead of depending on the frag size, mask with PMD_MASKED_BITS.

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-13 | Merge tag 'fixes-for-3.10-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen | Linus Torvalds
    Pull Xen/arm fixes from Stefano Stabellini:
     "This contains a couple of Xen on ARM initialization fixes and a patch to improve error handling"

    * tag 'fixes-for-3.10-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen:
      xen/arm: rename xen_secondary_init and run it on every online cpu
      xen/arm: do not handle VCPUOP_register_vcpu_info failures
      xen/arm: initialize pm functions later
2013-05-13 | Merge branch 'parisc-for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux | Linus Torvalds
    Pull parisc update from Helge Deller:
     "The second round of parisc updates for 3.10 includes build fixes and enhancements to utilize irq stacks, fixes SMP races when updating PTE and TLB entries by proper locking and makes the search for the correct cross compiler more robust on Debian and Gentoo."

    * 'parisc-for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
      parisc: make default cross compiler search more robust (v3)
      parisc: fix SMP races when updating PTE and TLB entries in entry.S
      parisc: implement irq stacks - part 2 (v2)
2013-05-13 | ARM: 7720/1: ARM v6/v7 cmpxchg64 shouldn't clear upper 32 bits of the old/new value | Jaccon Bastiaansen
    The implementation of cmpxchg64() for the ARM v6 and v7 architecture casts parameters 2 and 3 (the old and new 64bit values) to an unsigned long before calling the atomic_cmpxchg64() function. This clears the top 32 bits of the old and new values, resulting in the wrong values being compare-exchanged. Luckily, this only appears to be used for 64-bit sched_clock, which we don't (yet) have on ARM.

    This bug was introduced by commit 3e0f5a15f500 ("ARM: 7404/1: cmpxchg64: use atomic64 and local64 routines for cmpxchg64").

    Cc: <stable@vger.kernel.org>
    Acked-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Jaccon Bastiaansen <jaccon.bastiaansen@gmail.com>
    Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
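The effect of the stray cast can be reproduced in portable C. In the sketch below, uint32_t models unsigned long as it is on 32-bit ARM, and toy_cmpxchg64() is a hypothetical, non-atomic stand-in for the real compare-and-exchange; neither function is the kernel's implementation.

```c
#include <stdint.h>

/* Models the buggy cast: on 32-bit ARM, unsigned long is only 32 bits wide. */
uint64_t toy_through_ulong32(uint64_t v)
{
    return (uint64_t)(uint32_t)v;   /* upper 32 bits are discarded */
}

/* Non-atomic stand-in for a 64-bit compare-and-exchange; returns the old value. */
uint64_t toy_cmpxchg64(uint64_t *ptr, uint64_t old_val, uint64_t new_val)
{
    uint64_t prev = *ptr;

    if (prev == old_val)
        *ptr = new_val;
    return prev;
}
```

Passing the old and new values through toy_through_ulong32() first shows both failure modes: a comparison against a value with non-zero upper bits never matches, and a successful exchange stores only the lower 32 bits of the new value.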
2013-05-13 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | Linus Torvalds
    Pull networking fixes from David Miller:
     "Several small bug fixes all over:

       1) be2net driver uses wrong payload length when submitting MAC list get requests to the chip. From Sathya Perla.
       2) Fix mwifiex memory leak on driver unload, from Amitkumar Karwar.
       3) Prevent random memory access in batman-adv, from Marek Lindner.
       4) batman-adv doesn't check for pskb_trim_rcsum() errors, also from Marek Lindner.
       5) Fix fec crashes on rapid link up/down, from Frank Li.
       6) Fix inner protocol grovelling in GSO, from Pravin B Shelar.
       7) Link event validation fix in qlcnic from Rajesh Borundia.
       8) Not all FEC chips can support checksum offload, fix from Shawn Guo.
       9) EXPORT_SYMBOL + inline doesn't make any sense, from Denis Efremov.
      10) Fix race in passthru mode during device removal in macvlan, from Jiri Pirko.
      11) Fix RCU hash table lookup socket state race in ipv6, leading to NULL pointer derefs, from Eric Dumazet.
      12) Add several missing HAS_DMA kconfig dependencies, from Geert Uytterhoeven.
      13) Fix bogus PCI resource management in 3c59x driver, from Sergei Shtylyov.
      14) Fix info leak in ipv6 GRE tunnel driver, from Amerigo Wang.
      15) Fix device leak in ipv6 IPSEC policy layer, from Cong Wang.
      16) DMA mapping leak fix in qlge from Thadeu Lima de Souza Cascardo.
      17) Missing iounmap on probe failure in bna driver, from Wei Yongjun."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits)
      bna: add missing iounmap() on error in bnad_init()
      qlge: fix dma map leak when the last chunk is not allocated
      xfrm6: release dev before returning error
      ipv6,gre: do not leak info to user-space
      virtio_net: use default napi weight by default
      emac: Fix EMAC soft reset on 460EX/GT
      3c59x: fix PCI resource management
      caif: CAIF_VIRTIO should depend on HAS_DMA
      net/ethernet: MACB should depend on HAS_DMA
      net/ethernet: ARM_AT91_ETHER should depend on HAS_DMA
      net/wireless: ATH9K should depend on HAS_DMA
      net/ethernet: STMMAC_ETH should depend on HAS_DMA
      net/ethernet: NET_CALXEDA_XGMAC should depend on HAS_DMA
      ipv6: do not clear pinet6 field
      macvlan: fix passthru mode race between dev removal and rx path
      ipv4: ip_output: remove inline marking of EXPORT_SYMBOL functions
      net/mlx4: Strengthen VLAN tags/priorities enforcement in VST mode
      net/mlx4_core: Add missing report on VST and spoof-checking dev caps
      net: fec: enable hardware checksum only on imx6q-fec
      qlcnic: Fix validation of link event command.
      ...
2013-05-13 | parisc: make default cross compiler search more robust (v3) | Helge Deller
    People/distros vary how they prefix the toolchain name for 64bit builds. Rather than enforce one convention over another, add a for loop which does a search for all the general prefixes.

    For 64bit builds, we now search for (in order):
        hppa64-unknown-linux-gnu
        hppa64-linux-gnu
        hppa64-linux

    For 32bit builds, we look for:
        hppa-unknown-linux-gnu
        hppa-linux-gnu
        hppa-linux
        hppa2.0-unknown-linux-gnu
        hppa2.0-linux-gnu
        hppa2.0-linux
        hppa1.1-unknown-linux-gnu
        hppa1.1-linux-gnu
        hppa1.1-linux

    This patch was initiated by Mike Frysinger, with feedback from Jeroen Roovers, John David Anglin and Helge Deller.

    Signed-off-by: Mike Frysinger <vapier@gentoo.org>
    Signed-off-by: Jeroen Roovers <jer@gentoo.org>
    Signed-off-by: John David Anglin <dave.anglin@bell.net>
    Signed-off-by: Helge Deller <deller@gmx.de>
2013-05-13  rbd: re-submit flattened write request (part 2)  Alex Elder
Add code to rbd_img_obj_exists_callback() to detect when a clone's parent image has disappeared, and re-submit the original write request in that case.

Kill off some redundant assertions.

This completes the resolution for:
    http://tracker.ceph.com/issues/3763

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13  rbd: re-submit write request for flattened clone  Alex Elder
Add code to rbd_img_obj_parent_read_full_callback() to detect when a clone's parent image has disappeared, and re-submit the original write request in that case.

(See the previous commit for more reasoning about why this is appropriate.)

Rename some variables in rbd_img_obj_parent_read_full_callback() to match the convention used in the previous patch.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13  rbd: re-submit read request for flattened clone  Alex Elder
If a clone image gets flattened while a parent read request is underway, the original rbd object request needs to be resubmitted. The reason is that by the time we get the response to the parent read request, the data read from the parent may be out of date.

In other words, we could see this sequence of events:

    rbd client                          parent image/osd
    ----------                          ----------------
    original object ENOENT;
      issue parent read
                                        respond to parent read
                                        child image flattened
    original image header refresh
      <--- original object written independently here
    parent read response received

Add code to rbd_img_parent_read_callback() to detect when a clone's parent image has disappeared (as evidenced by its parent overlap becoming 0), and re-submit the original read request in that case.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13  rbd: detect when clone image is flattened  Alex Elder
A format 2 clone image can be the subject of a "flatten" operation, during which all of its data gets "copied up" from its parent image, leaving the image fully populated. Once this is complete, the clone's association with the parent is abolished.

Since this can occur when a clone is mapped, we need to detect when it has occurred and handle it accordingly. We know an image has been flattened when we know it at one time had a parent, but we have learned (via a "get_parent" object class method call) it no longer has one.

There might be in-flight requests at the point we learn an image has been flattened, so we can't simply clean up parent data structures right away. Instead, we'll drop the initial parent reference when the parent has disappeared (rather than when the image gets destroyed), which will allow the last in-flight reference to clean things up when it's complete.

We leverage the fact that a zero parent overlap renders an image effectively unlayered. We set the overlap to 0 at the point we detect the clone image has flattened, which allows the unlayered behavior to take effect immediately, while keeping other parent structures in place until in-flight requests complete.

This and the next few patches resolve:
    http://tracker.ceph.com/issues/3763

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
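The detection rule in this message (a parent was known earlier, and a later refresh reports none) can be sketched in plain C. This is a simplified userspace model under stated assumptions, not the driver's code: `struct demo_image`, `demo_refresh()` and `demo_parent_put()` are hypothetical names, and the "get_parent" result is passed in as a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a mapped clone image. */
struct demo_image {
    bool has_parent;         /* a parent was reported by an earlier refresh */
    uint64_t parent_overlap; /* zero means "effectively unlayered" */
    int parent_ref;          /* initial reference plus in-flight requests */
    bool parent_freed;       /* set when the last reference is dropped */
};

/* Drop one parent reference; the last one cleans up the parent data. */
static void demo_parent_put(struct demo_image *img)
{
    assert(img->parent_ref > 0);
    if (--img->parent_ref == 0)
        img->parent_freed = true;
}

/* Handle one refresh, given what the (modelled) get_parent query reported. */
static void demo_refresh(struct demo_image *img, bool parent_reported,
                         uint64_t overlap)
{
    if (parent_reported) {
        if (!img->has_parent) {
            img->has_parent = true;
            img->parent_ref = 1; /* the initial reference */
        }
        img->parent_overlap = overlap;
        return;
    }
    if (img->has_parent) {
        /* Had a parent before, has none now: the image was flattened. */
        img->has_parent = false;
        img->parent_overlap = 0; /* unlayered behavior applies immediately */
        demo_parent_put(img);    /* in-flight requests keep the rest alive */
    }
}
```

In this model the parent data is not freed at the moment of detection; it is freed by whichever holder drops the final reference, which matches the ordering the message describes.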
2013-05-13  rbd: reference count parent requests  Alex Elder
Keep a reference count for uses of the parent information for an rbd device.

An initial reference is set in rbd_img_request_create() if the target image has a parent (with non-zero overlap). Each image request for an image with a non-zero parent overlap gets another reference when it's created, and that reference is dropped when the request is destroyed.

The initial reference is dropped when the image gets torn down.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
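A minimal sketch of the counting scheme in this message, assuming a plain integer counter in a userspace model. All `demo_*` names are hypothetical, and the real driver's locking and atomics are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the parent-related state of an rbd device. */
struct demo_dev {
    uint64_t parent_overlap; /* non-zero means the image is layered */
    int parent_ref;          /* uses of the parent information */
    bool parent_freed;       /* set when the count reaches zero */
};

static void demo_parent_put(struct demo_dev *dev)
{
    assert(dev->parent_ref > 0);
    if (--dev->parent_ref == 0)
        dev->parent_freed = true;
}

/* Take the initial reference only if there is a parent with overlap. */
static void demo_dev_setup(struct demo_dev *dev, uint64_t overlap)
{
    dev->parent_overlap = overlap;
    dev->parent_ref = overlap ? 1 : 0;
    dev->parent_freed = false;
}

/* Creating a request pins the parent info; returns true if it did. */
static bool demo_request_create(struct demo_dev *dev)
{
    if (dev->parent_overlap == 0)
        return false;
    dev->parent_ref++;
    return true;
}

/* Destroying a request drops the reference it took, if any. */
static void demo_request_destroy(struct demo_dev *dev, bool layered)
{
    if (layered)
        demo_parent_put(dev);
}

/* Tearing the image down drops the initial reference. */
static void demo_dev_teardown(struct demo_dev *dev)
{
    if (dev->parent_overlap != 0)
        demo_parent_put(dev);
}
```

The request remembers whether it took a reference, so its destructor stays correct even if the device's overlap changes while the request is in flight.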
2013-05-13  rbd: define parent image request routines  Alex Elder
Define rbd_parent_request_create() and rbd_parent_request_destroy() to handle the creation of parent image requests submitted for layered image objects.

For simplicity, let rbd_img_request_put() handle dropping the reference to any image request (parent or not), and call whichever destructor is appropriate on the last put.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13  rbd: define rbd_dev_unparent()  Alex Elder
Define rbd_dev_unparent() to encapsulate cleaning up parent data structures from a layered rbd image.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13  rbd: don't release write request until necessary  Alex Elder
Previously when a layered write was going to involve a copyup request, the original osd request was released before submitting the parent full-object read. The osd request for the copyup would then be allocated in rbd_img_obj_parent_read_full_callback().

Shortly we will be handling the event of mapped layered images getting flattened, and when that occurs we need to resubmit the original request. We therefore don't want to release the osd request until we really know we're going to replace it--in the callback function.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13  rbd: get parent info on refresh  Alex Elder
Get parent info for format 2 images on every refresh (rather than just during the initial probe). This will be needed to detect the disappearance of the parent image in the event a mapped image becomes unlayered (i.e., flattened). Avoid leaking the previous parent spec on the second and subsequent times this information is requested by dropping the previous one (if any) before updating it. (Also, extract the pool id into a local variable before assigning it into the parent spec.)

Switch to using a non-zero parent overlap value rather than the existence of a parent (a non-null parent_spec pointer) to determine whether to mark a request layered. It will soon be possible for a layered image to become unlayered while a request is in flight.

This means that the layered flag for an image request indicates that there was a non-zero parent overlap at the time the image request was created. The parent overlap can change thereafter, which may lead to special handling at request submission or completion time.

This and the next several patches are related to:
    http://tracker.ceph.com/issues/3763

NOTE: If an error occurs while refreshing the parent info (i.e., requesting it after initial probe), the old parent info will persist. This is not really correct, and is a scenario that needs to be addressed. For now we'll assert that the failure mode is unlikely, but the issue has been documented in tracker issue 5040.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13  bna: add missing iounmap() on error in bnad_init()  Wei Yongjun
Add the missing iounmap() before return from bnad_init() in the error handling case.

Introduced by commit 01b54b1451853593739816a392485c4e2bee7dda (bna: tx rx cleanup fix).

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
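The class of bug fixed here (an error path that returns without undoing an earlier mapping) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the bna driver's code: `demo_ioremap()` and `demo_iounmap()` are hypothetical stand-ins that only track a counter, and `demo_init()` is an invented init routine.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of mappings created and not yet released (model only). */
static int demo_live_mappings;

/* Stand-in for ioremap(): allocates memory and counts the mapping. */
static void *demo_ioremap(size_t len)
{
    void *p = malloc(len);
    if (p)
        demo_live_mappings++;
    return p;
}

/* Stand-in for iounmap(): releases the memory and uncounts the mapping. */
static void demo_iounmap(void *p)
{
    free(p);
    demo_live_mappings--;
}

/*
 * Hypothetical init routine. A later step may fail after the mapping
 * succeeded; the error label undoes the mapping before returning.
 * Returns 0 on success (with *bar_out set) or a negative value on error.
 */
static int demo_init(int later_step_fails, void **bar_out)
{
    void *bar = demo_ioremap(64);

    if (!bar)
        return -1;
    if (later_step_fails)
        goto err_unmap;

    *bar_out = bar;
    return 0;

err_unmap:
    demo_iounmap(bar); /* the kind of call the error path was missing */
    return -2;
}
```

Without the call under `err_unmap`, the failing path would leave `demo_live_mappings` at 1, which is the userspace analogue of the leaked mapping.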
2013-05-13  qlge: fix dma map leak when the last chunk is not allocated  Thadeu Lima de Souza Cascardo
qlge allocates chunks from a page that it maps and unmaps that page when the last chunk is released. When the driver is unloaded or the card is removed, all chunks are released and the page is unmapped for the last chunk.

However, when the last chunk of a page is not allocated and the device is removed, that page is not unmapped. In fact, its last reference is not put and there's also a page leak. This bug prevents a device from being properly hotplugged.

When the DMA API debug option is enabled, the following messages show the pending DMA allocation after we remove the driver.

This patch fixes the bug by unmapping and putting the page from the ring if its last chunk has not been allocated.

pci 0005:98:00.0: DMA-API: device driver has pending DMA allocations while released from device [count=1]
One of leaked entries details: [device address=0x0000000060a80000] [size=65536 bytes] [mapped with DMA_FROM_DEVICE] [mapped as page]
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:746
Modules linked in: qlge(-) rpadlpar_io rpaphp pci_hotplug fuse [last unloaded: qlge]
NIP: c0000000003fc3ec LR: c0000000003fc3e8 CTR: c00000000054de60
REGS: c0000003ee9c74e0 TRAP: 0700 Tainted: G O (3.7.2)
MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 28002424 XER: 00000001
SOFTE: 1
CFAR: c0000000007a39c8
TASK = c0000003ee8d5c90[8406] 'rmmod' THREAD: c0000003ee9c4000 CPU: 31
GPR00: c0000000003fc3e8 c0000003ee9c7760 c000000000c789f8 00000000000000ee
GPR04: 0000000000000000 00000000000000ef 0000000000004000 0000000000010000
GPR08: 00000000000000be c000000000b22088 c000000000c4c218 00000000007c0000
GPR12: 0000000028002422 c00000000ff26c80 0000000000000000 000001001b0f1b40
GPR16: 00000000100cb9d8 0000000010093088 c000000000cdf910 0000000000000001
GPR20: 0000000000000000 c000000000dbfc00 0000000000000000 c000000000dbfb80
GPR24: c0000003fafc9d80 0000000000000001 000000000001ff80 c0000003f38f7888
GPR28: c000000000ddfc00 0000000000000400 c000000000bd7790 c000000000ddfb80
NIP [c0000000003fc3ec] .dma_debug_device_change+0x22c/0x2b0
LR [c0000000003fc3e8] .dma_debug_device_change+0x228/0x2b0
Call Trace:
[c0000003ee9c7760] [c0000000003fc3e8] .dma_debug_device_change+0x228/0x2b0 (unreliable)
[c0000003ee9c7840] [c00000000079a098] .notifier_call_chain+0x78/0xf0
[c0000003ee9c78e0] [c0000000000acc20] .__blocking_notifier_call_chain+0x70/0xb0
[c0000003ee9c7990] [c0000000004a9580] .__device_release_driver+0x100/0x140
[c0000003ee9c7a20] [c0000000004a9708] .driver_detach+0x148/0x150
[c0000003ee9c7ac0] [c0000000004a8144] .bus_remove_driver+0xc4/0x150
[c0000003ee9c7b60] [c0000000004aa58c] .driver_unregister+0x8c/0xe0
[c0000003ee9c7bf0] [c0000000004090b4] .pci_unregister_driver+0x34/0xf0
[c0000003ee9c7ca0] [d000000002231194] .qlge_exit+0x1c/0x34 [qlge]
[c0000003ee9c7d20] [c0000000000e36d8] .SyS_delete_module+0x1e8/0x290
[c0000003ee9c7e30] [c0000000000098d4] syscall_exit+0x0/0x94
Instruction dump:
7f26cb78 e818003a e87e81a0 e8f80028 e9180030 796b1f24 78001f24 7d6a5a14
7d2a002a e94b0020 483a7595 60000000 <0fe00000> 2fb80000 40de0048 80120050
---[ end trace 4294f9abdb01031d ]---
Mapped at:
[<d000000002222f54>] .ql_update_lbq+0x384/0x580 [qlge]
[<d000000002227bd0>] .ql_clean_inbound_rx_ring+0x300/0xc60 [qlge]
[<d0000000022288cc>] .ql_napi_poll_msix+0x39c/0x5a0 [qlge]
[<c0000000006b3c50>] .net_rx_action+0x170/0x300
[<c000000000081840>] .__do_softirq+0x170/0x300

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Acked-by: Jitendra Kalsaria <Jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
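The leak described above can be modelled in a few lines of C. This is a simplified sketch under stated assumptions, not the qlge code: the page is a struct with a `mapped` flag and a reference count, a page holds four chunks, and every `demo_*` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_CHUNKS_PER_PAGE 4 /* assumption for the model */

/* Modelled page: "mapped" stands for the DMA mapping. */
struct demo_page {
    bool mapped;
    int refs;
};

/* Modelled ring: the page currently being carved into chunks. */
struct demo_ring {
    struct demo_page *cur; /* NULL once the last chunk was handed out */
    int next_chunk;
};

/* Map a fresh page and let the ring hold the first reference. */
static void demo_ring_start_page(struct demo_ring *ring, struct demo_page *page)
{
    page->mapped = true;
    page->refs = 1;
    ring->cur = page;
    ring->next_chunk = 0;
}

/* Hand out the next chunk; returns true if it is the page's last chunk. */
static bool demo_alloc_chunk(struct demo_ring *ring)
{
    assert(ring->cur != NULL);
    bool last = (++ring->next_chunk == DEMO_CHUNKS_PER_PAGE);
    if (last)
        ring->cur = NULL;  /* the last chunk inherits the ring's reference */
    else
        ring->cur->refs++; /* every other chunk takes its own reference */
    return last;
}

/* Release a chunk; the unmap is tied to the release of the last chunk. */
static void demo_release_chunk(struct demo_page *page, bool last)
{
    if (last)
        page->mapped = false;
    assert(page->refs > 0);
    page->refs--;
}

/* The fix, modelled: unmap and put a page whose last chunk was never allocated. */
static void demo_ring_teardown(struct demo_ring *ring)
{
    if (ring->cur) {
        ring->cur->mapped = false;
        ring->cur->refs--;
        ring->cur = NULL;
    }
}
```

If only two of the four chunks are handed out and released, the page is still mapped with one reference left; in this model only `demo_ring_teardown()` clears that state, mirroring the pending allocation reported by the DMA-API debug output.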
2013-05-13  rbd: ignore zero-overlap parent  Alex Elder
An rbd clone image that has an overlap with its parent of 0 is effectively not a layered image at all. Detect this case and treat such an image as non-layered. Issue a warning to be sure the user knows what's going on.

This resolves:
    http://tracker.ceph.com/issues/5028

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13  rbd: support reading parent page data for writes  Alex Elder
Currently, rbd_img_obj_parent_read_full() assumes the incoming object request contains bio data. But if a layered image is part of a multi-layer stack of images it will result in read requests of page data to parent images.

This is handling the same kind of issue as was resolved by this commit:
    5b2ab72d rbd: support reading parent page data

This resolves:
    http://tracker.ceph.com/issues/5027

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>