summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* perf: Add pmu callbacks to track event mapping and unmappingAndy Lutomirski2015-02-042-0/+16
| | | | | | | | | | | | | | | Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/266afcba1d1f91ea5501e4e16e94bbbc1a9339b6.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86: Add a comment clarifying LDT context switchingAndy Lutomirski2015-02-041-6/+8
| | | | | | | | | | | | | | | | | | | | | | | The code is correct, but only for a rather subtle reason. This confused me for quite a while when I read switch_mm, so clarify the code to avoid confusing other people, too. TBH, I wouldn't be surprised if this code was only correct by accident. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/0db86397f968996fb772c443c251415b0b430ddd.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86: Store a per-cpu shadow copy of CR4Andy Lutomirski2015-02-0420-46/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | Context switches and TLB flushes can change individual bits of CR4. CR4 reads take several cycles, so store a shadow copy of CR4 in a per-cpu variable. To avoid wasting a cache line, I added the CR4 shadow to cpu_tlbstate, which is already touched in switch_mm. The heaviest users of the cr4 shadow will be switch_mm and __switch_to_xtra, and __switch_to_xtra is called shortly after switch_mm during context switch, so the cacheline is likely to be hot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86: Clean up cr4 manipulationAndy Lutomirski2015-02-0415-57/+70
| | | | | | | | | | | | | | | | | | | | | | | | | CR4 manipulation was split, seemingly at random, between direct (write_cr4) and using a helper (set/clear_in_cr4). Unfortunately, the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code, which only a small subset of users actually wanted. This patch replaces all cr4 access in functions that don't leave cr4 exactly the way they found it with new helpers cr4_set_bits, cr4_clear_bits, and cr4_set_bits_and_update_boot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Merge branch 'x86/asm' into perf/x86, to avoid conflicts with upcoming patchesIngo Molnar2015-02-0416-327/+374
|\ | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Merge tag 'pr-20150201-x86-entry' of ↵Ingo Molnar2015-02-032-54/+77
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm Pull "x86: Entry cleanups and a bugfix for 3.20" from Andy Lutomirski: " This fixes a bug in the RCU code I added in ist_enter. It also includes the sysret stuff discussed here: http://lkml.kernel.org/g/cover.1421453410.git.luto%40amacapital.net " Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * x86_64, entry: Remove the syscall exit audit and schedule optimizationsAndy Lutomirski2015-02-011-47/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We used to optimize rescheduling and audit on syscall exit. Now that the full slow path is reasonably fast, remove these optimizations. Syscall exit auditing is now handled exclusively by syscall_trace_leave. This adds something like 10ns to the previously optimized paths on my computer, presumably due mostly to SAVE_REST / RESTORE_REST. I think that we should eventually replace both the syscall and non-paranoid interrupt exit slow paths with a pair of C functions along the lines of the syscall entry hooks. Link: http://lkml.kernel.org/r/22f2aa4a0361707a5cfb1de9d45260b39965dead.1421453410.git.luto@amacapital.net Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
| | * x86_64, entry: Use sysret to return to userspace when possibleAndy Lutomirski2015-02-011-0/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The x86_64 entry code currently jumps through complex and inconsistent hoops to try to minimize the impact of syscall exit work. For a true fast-path syscall, almost nothing needs to be done, so returning is just a check for exit work and sysret. For a full slow-path return from a syscall, the C exit hook is invoked if needed and we join the iret path. Using iret to return to userspace is very slow, so the entry code has accumulated various special cases to try to do certain forms of exit work without invoking iret. This is error-prone, since it duplicates assembly code paths, and it's dangerous, since sysret can malfunction in interesting ways if used carelessly. It's also inefficient, since a lot of useful cases aren't optimized and therefore force an iret out of a combination of paranoia and the fact that no one has bothered to write even more asm code to avoid it. I would argue that this approach is backwards. Rather than trying to avoid the iret path, we should instead try to make the iret path fast. Under a specific set of conditions, iret is unnecessary. In particular, if RIP==RCX, RFLAGS==R11, RIP is canonical, RF is not set, and both SS and CS are as expected, then movq 32(%rsp),%rsp;sysret does the same thing as iret. This set of conditions is nearly always satisfied on return from syscalls, and it can even occasionally be satisfied on return from an irq. Even with the careful checks for sysret applicability, this cuts nearly 80ns off of the overhead from syscalls with unoptimized exit work. This includes tracing and context tracking, and any return that invokes KVM's user return notifier. For example, the cost of getpid with CONFIG_CONTEXT_TRACKING_FORCE=y drops from ~360ns to ~280ns on my computer. This may allow the removal and even eventual conversion to C of a respectable amount of exit asm. This may require further tweaking to give the full benefit on Xen. It may be worthwhile to adjust signal delivery and exec to try hit the sysret path. This does not optimize returns to 32-bit userspace. Making the same optimization for CS == __USER32_CS is conceptually straightforward, but it will require some tedious code to handle the differences between sysretl and sysexitl. Link: http://lkml.kernel.org/r/71428f63e681e1b4aa1a781e3ef7c27f027d1103.1421453410.git.luto@amacapital.net Signed-off-by: Andy Lutomirski <luto@amacapital.net>
| | * x86, traps: Fix ist_enter from userspaceAndy Lutomirski2015-02-011-7/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | context_tracking_user_exit() has no effect if in_interrupt() returns true, so ist_enter() didn't work. Fix it by calling exception_enter(), and thus context_tracking_user_exit(), before incrementing the preempt count. This also adds an assertion that will catch the problem reliably if CONFIG_PROVE_RCU=y to help prevent the bug from being reintroduced. Link: http://lkml.kernel.org/r/261ebee6aee55a4724746d0d7024697013c40a08.1422709102.git.luto@amacapital.net Fixes: 959274753857 x86, traps: Track entry into and exit from IST context Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
| * | Merge tag 'pr-20150201-x86-vdso' of ↵Ingo Molnar2015-02-031-1/+1
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm Pull VDSO fix fro Andy Lutomirski: "x86, vdso: One trivial last-minute VDSO build improvement Andrey noticed that the VDSO build wasn't cleaning itself up. This one-liner fixes it." Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | x86, vdso: teach 'make clean' remove vdso64 binariesAndrey Skvortsov2015-01-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After 'make clean' vdso64.so and vdso64.dbg.so were left in arch/x86/vdso/. Link: http://lkml.kernel.org/r/1422453867-17326-1-git-send-email-andrej.skvortzov@gmail.com Signed-off-by: Andrey Skvortsov <andrej.skvortzov@gmail.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
| * | | Merge tag 'v3.19-rc7' into x86/asm, to refresh the branch before pulling in ↵Ingo Molnar2015-02-03459-3532/+4747
| |\ \ \ | | |_|/ | |/| | | | | | | | | | | | | | new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | Merge tag 'pr-20150114-x86-entry' of ↵Ingo Molnar2015-01-2815-278/+301
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm Pull x86/entry enhancements from Andy Lutomirski: " This is my accumulated x86 entry work, part 1, for 3.20. The meat of this is an IST rework. When an IST exception interrupts user space, we will handle it on the per-thread kernel stack instead of on the IST stack. This sounds messy, but it actually simplifies the IST entry/exit code, because it eliminates some ugly games we used to play in order to handle rescheduling, signal delivery, etc on the way out of an IST exception. The IST rework introduces proper context tracking to IST exception handlers. I haven't seen any bug reports, but the old code could have incorrectly treated an IST exception handler as an RCU extended quiescent state. The memory failure change (included in this pull request with Borislav and Tony's permission) eliminates a bunch of code that is no longer needed now that user memory failure handlers are called in process context. Finally, this includes a few on Denys' uncontroversial and Obviously Correct (tm) cleanups. The IST and memory failure changes have been in -next for a while. LKML references: IST rework: http://lkml.kernel.org/r/cover.1416604491.git.luto@amacapital.net Memory failure change: http://lkml.kernel.org/r/54ab2ffa301102cd6e@agluck-desk.sc.intel.com Denys' cleanups: http://lkml.kernel.org/r/1420927210-19738-1-git-send-email-dvlasenk@redhat.com " This tree semantically depends on and is based on the following RCU commit: 734d16801349 ("rcu: Make rcu_nmi_enter() handle nesting") ... and for that reason won't be pushed upstream before the RCU bits hit Linus's tree. Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | x86: entry_64.S: fold SAVE_ARGS_IRQ macro into its sole userDenys Vlasenko2015-01-131-46/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No code changes. This is a preparatory patch for change in "struct pt_regs" handling. CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Oleg Nesterov <oleg@redhat.com> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Andy Lutomirski <luto@amacapital.net> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: X86 ML <x86@kernel.org> CC: Alexei Starovoitov <ast@plumgrid.com> CC: Will Drewry <wad@chromium.org> CC: Kees Cook <keescook@chromium.org> CC: linux-kernel@vger.kernel.org Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
| | * | | x86: ia32entry.S: fix wrong symbolic constant usage: R11->ARGOFFSETDenys Vlasenko2015-01-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The values of these two constants are the same, the meaning is different. Acked-by: Borislav Petkov <bp@suse.de> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Oleg Nesterov <oleg@redhat.com> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Borislav Petkov <bp@alien8.de> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: X86 ML <x86@kernel.org> CC: Alexei Starovoitov <ast@plumgrid.com> CC: Will Drewry <wad@chromium.org> CC: Kees Cook <keescook@chromium.org> CC: linux-kernel@vger.kernel.org Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
| | * | | x86: entry_64.S: delete unused codeDenys Vlasenko2015-01-132-35/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A define, two macros and an unreferenced bit of assembly are gone. Acked-by: Borislav Petkov <bp@suse.de> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Oleg Nesterov <oleg@redhat.com> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Andy Lutomirski <luto@amacapital.net> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: X86 ML <x86@kernel.org> CC: Alexei Starovoitov <ast@plumgrid.com> CC: Will Drewry <wad@chromium.org> CC: Kees Cook <keescook@chromium.org> CC: linux-kernel@vger.kernel.org Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
| | * | | x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricksLuck, Tony2015-01-074-94/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We now switch to the kernel stack when a machine check interrupts during user mode. This means that we can perform recovery actions in the tail of do_machine_check() Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
| | * | | x86, traps: Add ist_begin_non_atomic and ist_end_non_atomicAndy Lutomirski2015-01-022-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some IST handlers, if the interrupt came from user mode, we can safely enable preemption. Add helpers to do it safely. This is intended to be used my the memory failure code in do_machine_check. Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
| | * | | x86: Clean up current_stack_pointerAndy Lutomirski2015-01-022-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no good reason for it to be a macro, and x86_64 will want to use it, so it should be in a header. Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
| | * | | x86, traps: Track entry into and exit from IST contextAndy Lutomirski2015-01-025-6/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently pretend that IST context is like standard exception context, but this is incorrect. IST entries from userspace are like standard exceptions except that they use per-cpu stacks, so they are atomic. IST entries from kernel space are like NMIs from RCU's perspective -- they are not quiescent states even if they interrupted the kernel during a quiescent state. Add and use ist_enter and ist_exit to track IST context. Even though x86_32 has no IST stacks, we track these interrupts the same way. This fixes two issues: - Scheduling from an IST interrupt handler will now warn. It would previously appear to work as long as we got lucky and nothing overwrote the stack frame. (I don't know of any bugs in this that would trigger the warning, but it's good to be on the safe side.) - RCU handling in IST context was dangerous. As far as I know, only machine checks were likely to trigger this, but it's good to be on the safe side. Note that the machine check handlers appears to have been missing any context tracking at all before this patch. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
| | * | | x86, entry: Switch stacks on a paranoid entry from userspaceAndy Lutomirski2015-01-024-68/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This causes all non-NMI, non-double-fault kernel entries from userspace to run on the normal kernel stack. Double-fault is exempt to minimize confusion if we double-fault directly from userspace due to a bad kernel stack. This is, suprisingly, simpler and shorter than the current code. It removes the IMO rather frightening paranoid_userspace path, and it make sync_regs much simpler. There is no risk of stack overflow due to this change -- the kernel stack that we switch to is empty. This will also enable us to create non-atomic sections within machine checks from userspace, which will simplify memory failure handling. It will also allow the upcoming fsgsbase code to be simplified, because it doesn't need to worry about usergs when scheduling in paranoid_exit, as that code no longer exists. Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Tony Luck <tony.luck@intel.com> Acked-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
| | * | | rcu: Make rcu_nmi_enter() handle nestingPaul E. McKenney2014-12-301-17/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The x86 architecture has multiple types of NMI-like interrupts: real NMIs, machine checks, and, for some values of NMI-like, debugging and breakpoint interrupts. These interrupts can nest inside each other. Andy Lutomirski is adding RCU support to these interrupts, so rcu_nmi_enter() and rcu_nmi_exit() must now correctly handle nesting. This commit therefore introduces nesting, using a clever NMI-coordination algorithm suggested by Andy. The trick is to atomically increment ->dynticks (if needed) before manipulating ->dynticks_nmi_nesting on entry (and, accordingly, after on exit). In addition, ->dynticks_nmi_nesting is incremented by one if ->dynticks was incremented and by two otherwise. This means that when rcu_nmi_exit() sees ->dynticks_nmi_nesting equal to one, it knows that ->dynticks must be atomically incremented. This NMI-coordination algorithms has been validated by the following Promela model: ------------------------------------------------------------------------ /* * Promela model for Andy Lutomirski's suggested change to rcu_nmi_enter() * that allows nesting. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, you can access it online at * http://www.gnu.org/licenses/gpl-2.0.html. * * Copyright IBM Corporation, 2014 * * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> */ byte dynticks_nmi_nesting = 0; byte dynticks = 0; /* * Promela verision of rcu_nmi_enter(). */ inline rcu_nmi_enter() { byte incby; byte tmp; incby = BUSY_INCBY; assert(dynticks_nmi_nesting >= 0); if :: (dynticks & 1) == 0 -> atomic { dynticks = dynticks + 1; } assert((dynticks & 1) == 1); incby = 1; :: else -> skip; fi; tmp = dynticks_nmi_nesting; tmp = tmp + incby; dynticks_nmi_nesting = tmp; assert(dynticks_nmi_nesting >= 1); } /* * Promela verision of rcu_nmi_exit(). */ inline rcu_nmi_exit() { byte tmp; assert(dynticks_nmi_nesting > 0); assert((dynticks & 1) != 0); if :: dynticks_nmi_nesting != 1 -> tmp = dynticks_nmi_nesting; tmp = tmp - BUSY_INCBY; dynticks_nmi_nesting = tmp; :: else -> dynticks_nmi_nesting = 0; atomic { dynticks = dynticks + 1; } assert((dynticks & 1) == 0); fi; } /* * Base-level NMI runs non-atomically. Crudely emulates process-level * dynticks-idle entry/exit. */ proctype base_NMI() { byte busy; busy = 0; do :: /* Emulate base-level dynticks and not. */ if :: 1 -> atomic { dynticks = dynticks + 1; } busy = 1; :: 1 -> skip; fi; /* Verify that we only sometimes have base-level dynticks. */ if :: busy == 0 -> skip; :: busy == 1 -> skip; fi; /* Model RCU's NMI entry and exit actions. */ rcu_nmi_enter(); assert((dynticks & 1) == 1); rcu_nmi_exit(); /* Emulated re-entering base-level dynticks and not. */ if :: !busy -> skip; :: busy -> atomic { dynticks = dynticks + 1; } busy = 0; fi; /* We had better now be in dyntick-idle mode. */ assert((dynticks & 1) == 0); od; } /* * Nested NMI runs atomically to emulate interrupting base_level(). */ proctype nested_NMI() { do :: /* * Use an atomic section to model a nested NMI. This is * guaranteed to interleave into base_NMI() between a pair * of base_NMI() statements, just as a nested NMI would. */ atomic { /* Verify that we only sometimes are in dynticks. */ if :: (dynticks & 1) == 0 -> skip; :: (dynticks & 1) == 1 -> skip; fi; /* Model RCU's NMI entry and exit actions. */ rcu_nmi_enter(); assert((dynticks & 1) == 1); rcu_nmi_exit(); } od; } init { run base_NMI(); run nested_NMI(); } ------------------------------------------------------------------------ The following script can be used to run this model if placed in rcu_nmi.spin: ------------------------------------------------------------------------ if ! spin -a rcu_nmi.spin then echo Spin errors!!! exit 1 fi if ! cc -DSAFETY -o pan pan.c then echo Compilation errors!!! exit 1 fi ./pan -m100000 Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
| * | | | Merge commit 3669ef9fa7d3 ("x86, tls: Interpret an all-zero struct user_desc ↵Ingo Molnar2015-01-28161-571/+1334
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | as 'no segment'") into x86/asm Pick up the latestest asm fixes before advancing it any further. Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | x86_64 entry: Fix RCX for ptraced syscallsAndy Lutomirski2015-01-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The int_ret_from_sys_call and syscall tracing code disagrees with the sysret path as to the value of RCX. The Intel SDM, the AMD APM, and my laptop all agree that sysret returns with RCX == RIP. The syscall tracing code does not respect this property. For example, this program: int main() { extern const char syscall_rip[]; unsigned long rcx = 1; unsigned long orig_rcx = rcx; asm ("mov $-1, %%eax\n\t" "syscall\n\t" "syscall_rip:" : "+c" (rcx) : : "r11"); printf("syscall: RCX = %lX RIP = %lX orig RCX = %lx\n", rcx, (unsigned long)syscall_rip, orig_rcx); return 0; } prints: syscall: RCX = 400556 RIP = 400556 orig RCX = 1 Running it under strace gives this instead: syscall: RCX = FFFFFFFFFFFFFFFF RIP = 400556 orig RCX = 1 This changes FIXUP_TOP_OF_STACK to match sysret, causing the test to show RCX == RIP even under strace. It looks like this is a partial revert of: 88e4bc32686e ("[PATCH] x86-64 architecture specific sync for 2.5.8") from the historic git tree. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/c9a418c3dc3993cb88bb7773800225fd318a4c67.1421453410.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | perf: Decouple unthrottling and rotatingMark Rutland2015-02-042-53/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the adjusments made as part of perf_event_task_tick() use the percpu rotation lists to iterate over any active PMU contexts, but these are not used by the context rotation code, having been replaced by separate (per-context) hrtimer callbacks. However, some manipulation of the rotation lists (i.e. removal of contexts) has remained in perf_rotate_context(). This leads to the following issues: * Contexts are not always removed from the rotation lists. Removal of PMUs which have been placed in rotation lists, but have not been removed by a hrtimer callback can result in corruption of the rotation lists (when memory backing the context is freed). This has been observed to result in hangs when PMU drivers built as modules are inserted and removed around the creation of events for said PMUs. * Contexts which do not require rotation may be removed from the rotation lists as a result of a hrtimer, and will not be considered by the unthrottling code in perf_event_task_tick. This patch fixes the issue by updating the rotation ist when events are scheduled in/out, ensuring that each rotation list stays in sync with the HW state. As each event holds a refcount on the module of its PMU, this ensures that when a PMU module is unloaded none of its CPU contexts can be in a rotation list. By maintaining a list of perf_event_contexts rather than perf_event_cpu_contexts, we don't need separate paths to handle the cpu and task contexts, which also makes the code a little simpler. As the rotation_list variables are not used for rotation, these are renamed to active_ctx_list, which better matches their current function. perf_pmu_rotate_{start,stop} are renamed to perf_pmu_ctx_{activate,deactivate}. Reported-by: Johannes Jensen <johannes.jensen@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Will Deacon <Will.Deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150129134511.GR17721@leverpostej Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | perf: Drop module reference on event init failureMark Rutland2015-02-041-12/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When initialising an event, perf_init_event will call try_module_get() to ensure that the PMU's module cannot be removed for the lifetime of the event, with __free_event() dropping the reference when the event is finally destroyed. If something fails after the event has been initialised, but before the event is installed, perf_event_alloc will drop the reference on the module. However, if we fail to initialise an event for some reason (e.g. we ask an uncore PMU to perform sampling, and it refuses to initialise the event), we do not drop the refcount. If we try to open such a bogus event without a precise IDR type, we will loop over each PMU in the pmus list, incrementing each of their refcounts without decrementing them. This patch adds a module_put when pmu->event_init(event) fails, ensuring that the refcounts are balanced in failure cases. As the innards of the precise and search based initialisation look very similar, this logic is hoisted out into a new helper function. While the early return for the failed try_module_get is removed from the search case, this is handled by the remaining return when ret is not -ENOENT. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1420642611-22667-1-git-send-email-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | perf: Use POLLIN instead of POLL_IN for perf poll data in flagJiri Olsa2015-02-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we flag available data (via poll syscall) on perf fd with POLL_IN macro, which is normally used for SIGIO interface. We've been lucky, because POLLIN (0x1) is subset of POLL_IN (0x20001) and sys_poll (do_pollfd function) cut the extra bit out (0x20000). Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422467678-22341-1-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | perf: Fix put_event() ctx lockPeter Zijlstra2015-02-041-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So what I suspect; but I'm in zombie mode today it seems; is that while I initially thought that it was impossible for ctx to change when refcount dropped to 0, I now suspect its possible. Note that until perf_remove_from_context() the event is still active and visible on the lists. So a concurrent sys_perf_event_open() from another task into this task can race. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: mark.rutland@arm.com Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150129134434.GB26304@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | perf: Fix move_group() orderPeter Zijlstra (Intel)2015-02-041-9/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jiri reported triggering the new WARN_ON_ONCE in event_sched_out over the weekend: event_sched_out.isra.79+0x2b9/0x2d0 group_sched_out+0x69/0xc0 ctx_sched_out+0x106/0x130 task_ctx_sched_out+0x37/0x70 __perf_install_in_context+0x70/0x1a0 remote_function+0x48/0x60 generic_exec_single+0x15b/0x1d0 smp_call_function_single+0x67/0xa0 task_function_call+0x53/0x80 perf_install_in_context+0x8b/0x110 I think the below should cure this; if we install a group leader it will iterate the (still intact) group list and find its siblings and try and install those too -- even though those still have the old event->ctx -- in the new ctx. Upon installing the first group sibling we'd try and schedule out the group and trigger the above warn. Fix this by installing the group leader last, installing siblings would have no effect, they're not reachable through the group lists and therefore we don't schedule them. Also delay resetting the state until we're absolutely sure the events are quiescent. Reported-by: Jiri Olsa <jolsa@redhat.com> Reported-by: vincent.weaver@maine.edu Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150126162639.GA21418@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | perf: Fix event->ctx lockingPeter Zijlstra2015-02-041-37/+207
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There have been a few reported issues wrt. the lack of locking around changing event->ctx. This patch tries to address those. It avoids the whole rwsem thing; and while it appears to work, please give it some thought in review. What I did fail at is sensible runtime checks on the use of event->ctx, the RCU use makes it very hard. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.209535886@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | perf: Add a bit of paranoiaPeter Zijlstra2015-02-041-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a few WARN()s to catch things that should never happen. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.150481799@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | Merge tag 'v3.19-rc7' into perf/core, to merge fixes before applying new changesIngo Molnar2015-02-04206-1190/+2194
|\ \ \ \ \ \ | | |_|_|/ / | |/| | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | Linux 3.19-rc7v3.19-rc7Linus Torvalds2015-02-011-1/+1
| | | | | |
| * | | | | Merge tag 'armsoc-for-linus' of ↵Linus Torvalds2015-02-0122-71/+150
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "One more week's worth of fixes. Worth pointing out here are: - A patch fixing detaching of iommu registrations when a device is removed -- earlier the ops pointer wasn't managed properly - Another set of Renesas boards get the same GIC setup fixup as others have in previous -rcs - Serial port aliases fixups for sunxi. We did the same to tegra but we caught that in time before the merge window due to more machines being affected. Here it took longer for anyone to notice. - A couple more DT tweaks on sunxi - A follow-up patch for the mvebu coherency disabling in last -rc batch" * tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: arm: dma-mapping: Set DMA IOMMU ops in arm_iommu_attach_device() ARM: shmobile: r8a7790: Instantiate GIC from C board code in legacy builds ARM: shmobile: r8a73a4: Instantiate GIC from C board code in legacy builds ARM: mvebu: don't set the PL310 in I/O coherency mode when I/O coherency is disabled ARM: sunxi: dt: Fix aliases ARM: dts: sun4i: Add simplefb node with de_fe0-de_be0-lcd0-hdmi pipeline ARM: dts: sun6i: ippo-q8h-v5: Fix serial0 alias ARM: dts: sunxi: Fix usb-phy support for sun4i/sun5i
| | * \ \ \ \ Merge tag 'renesas-soc-fixes3-for-v3.19' of ↵Olof Johansson2015-02-014-0/+47
| | |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes Merge "Third Round of Renesas ARM Based SoC Fixes for v3.19" from Simon Horman: * Instantiate GIC from C board code in legacy builds on r8a7790 and r8a73a4 * tag 'renesas-soc-fixes3-for-v3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas: ARM: shmobile: r8a7790: Instantiate GIC from C board code in legacy builds ARM: shmobile: r8a73a4: Instantiate GIC from C board code in legacy builds Signed-off-by: Olof Johansson <olof@lixom.net>
| | | * | | | | ARM: shmobile: r8a7790: Instantiate GIC from C board code in legacy buildsMagnus Damm2015-01-293-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As of commit 9a1091ef0017c40a ("irqchip: gic: Support hierarchy irq domain."), the Lager legacy board support is known to be broken. The IRQ numbers of the GIC are now virtual, and no longer match the hardcoded hardware IRQ numbers in the legacy platform board code. To fix this issue specific to non-multiplatform r8a7790 and Lager: 1) Instantiate the GIC from platform board code and also 2) Skip over the DT arch timer as well as 3) Force delay setup based on DT CPU frequency With these 3 fixes in place interrupts on Lager are now unbroken. Partially based on legacy GIC fix by Geert Uytterhoeven, thanks to him for the initial work. Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
| | | * | | | | ARM: shmobile: r8a73a4: Instantiate GIC from C board code in legacy buildsMagnus Damm2015-01-292-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As of commit 9a1091ef0017c40a ("irqchip: gic: Support hierarchy irq domain."), the APE6EVM legacy board support is known to be broken. The IRQ numbers of the GIC are now virtual, and no longer match the hardcoded hardware IRQ numbers in the legacy platform board code. To fix this issue specific to non-muliplatform r8a73a4 and APE6EVM: 1) Instantiate the GIC from platform board code and also 2) Skip over the DT arch timer as well as 3) Force delay setup based on DT CPU frequency With these 3 fixes in place interrupts on APE6EVM are now unbroken. Partially based on legacy GIC fix by Geert Uytterhoeven, thanks to him for the initial work. Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
| | * | | | | | arm: dma-mapping: Set DMA IOMMU ops in arm_iommu_attach_device()Laurent Pinchart2015-01-291-15/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 4bb25789ed28228a ("arm: dma-mapping: plumb our iommu mapping ops into arch_setup_dma_ops") moved the setting of the DMA operations from arm_iommu_attach_device() to arch_setup_dma_ops() where the DMA operations to be used are selected based on whether the device is connected to an IOMMU. However, the IOMMU detection scheme requires the IOMMU driver to be ported to the new IOMMU of_xlate API. As no driver has been ported yet, this effectively breaks all IOMMU ARM users that depend on the IOMMU being handled transparently by the DMA mapping API. Fix this by restoring the setting of DMA IOMMU ops in arm_iommu_attach_device() and splitting the rest of the function into a new internal __arm_iommu_attach_device() function, called by arch_setup_dma_ops(). Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Olof Johansson <olof@lixom.net>
| | * | | | | | Merge tag 'mvebu-fixes-3.19-6' of git://git.infradead.org/linux-mvebu into fixesOlof Johansson2015-01-281-0/+7
| | |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge "mvebu-fixes-6" from Andrew Lunn: The previous fix for Armada XP, disabling I/O coherency, broke Armada 375/38x. Only switch the PL310 to I/O coherent mode if I/O coherency is enabled. * tag 'mvebu-fixes-3.19-6' of git://git.infradead.org/linux-mvebu: ARM: mvebu: don't set the PL310 in I/O coherency mode when I/O coherency is disabled Signed-off-by: Olof Johansson <olof@lixom.net>
| | | * | | | | | ARM: mvebu: don't set the PL310 in I/O coherency mode when I/O coherency is ↵Thomas Petazzoni2015-01-281-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | disabled Since commit f2c3c67f00 (merge commit that adds commit "ARM: mvebu: completely disable hardware I/O coherency"), we disable I/O coherency on Armada EBU platforms. However, we continue to initialize the coherency fabric, because this coherency fabric is needed on Armada XP for inter-CPU coherency. Unfortunately, due to this, we also continued to execute the coherency fabric initialization code for Armada 375/38x, which switched the PL310 into I/O coherent mode. This has the effect of disabling the outer cache sync operation: this is needed when I/O coherency is enabled to work around a PCIe/L2 deadlock. But obviously, when I/O coherency is disabled, having the outer cache sync operation is crucial. Therefore, this commit fixes the armada_375_380_coherency_init() so that the PL310 is switched to I/O coherent mode only if I/O coherency is enabled. Without this fix, all devices using DMA are broken on Armada 375/38x. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Cc: <stable@vger.kernel.org> # v3.8+
| | * | | | | | | Merge tag 'sunxi-fixes-for-3.19' of ↵Olof Johansson2015-01-2616-56/+58
| | |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into fixes Merge "Allwinner fixes for 3.19" from Maxime Ripard: Allwinner fixes for 3.19 A few minor fixes for the 3.19 kernel: - The 8250 uart driver now respects the aliases, which pointed out that we were using them wrong. Fixed them. - The simplefb pipeline that was used on the A10 caused flickering and tearing, and rendered it pretty much useless. Added a new simplefb node with another pipeline that removes this issue. Note that we need to keep the old node because u-boot 2015.01 uses it. - Added a fix for the USB phy node on sun4i/sun5i * tag 'sunxi-fixes-for-3.19' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux: ARM: sunxi: dt: Fix aliases ARM: dts: sun4i: Add simplefb node with de_fe0-de_be0-lcd0-hdmi pipeline ARM: dts: sun6i: ippo-q8h-v5: Fix serial0 alias ARM: dts: sunxi: Fix usb-phy support for sun4i/sun5i Signed-off-by: Olof Johansson <olof@lixom.net>
| | | * | | | | | | ARM: sunxi: dt: Fix aliasesMaxime Ripard2015-01-2515-50/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f77d55a3b56a ("serial: 8250_dw: get index of serial line from DT aliases") made the serial driver now use the serial aliases to get the tty number, pointing out that our aliases have been wrong all along. Remove them from the DTSI and add custom ones in the relevant boards. Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
| | | * | | | | | | ARM: dts: sun4i: Add simplefb node with de_fe0-de_be0-lcd0-hdmi pipelineHans de Goede2015-01-211-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Testing has shown that on sun4i the display backend engine does not have deep enough fifo-s causing flickering / tearing in full-hd mode due to fifo underruns. This can be avoided by letting the display frontend engine do the dma from memory, and then letting it feed the data directly into the backend unmodified, as the frontend does have deep enough fifo-s. Note since u-boot-v2015.01 has been released using the de_be0-lcd0-hdmi pipeline on sun4i, we need to keep that one around too (unfortunately). Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
| | | * | | | | | | ARM: dts: sun6i: ippo-q8h-v5: Fix serial0 aliasHans de Goede2015-01-061-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Ippo q8h has its serial console connected to the r-uart. Adjust the serial0 alias to match. This fixes the kernel serial console no longer working since 3.19-rc1, because 8250_dw.c now honors dt aliases, causing the serial console to be ttyS5 rather then being ttyS0, as it was in 3.18 and before. Note that adjusting bootargs instead is not an acceptable fix, because console=ttyS0,115200 is used by a lot of bootscripts, etc. and this should continue to work. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
| | | * | | | | | | ARM: dts: sunxi: Fix usb-phy support for sun4i/sun5iChen-Yu Tsai2014-12-213-6/+6
| | | | |_|_|_|/ / | | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | usbphy0 support in the sunxi usb-phy driver has been merged, but the dtsi's for sun4i/sun5i haven't been updated. This results in the phy driver failing to load, breaking usb support. Fixes: 6827a46f5994 ('phy: sun4i: add support for USB phy0') Signed-off-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
| * | | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2015-02-013-3/+28
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input layer updates from Dmitry Torokhov: "Just a few quirks for PS/2 this time" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: elantech - add more Fujtisu notebooks to force crc_enabled Input: i8042 - add noloop quirk for Medion Akoya E7225 (MD98857) Input: synaptics - adjust min/max for Lenovo ThinkPad X1 Carbon 2nd
| | * | | | | | | | Input: elantech - add more Fujtisu notebooks to force crc_enabledRainer Koenig2015-02-011-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add two more Fujitsu LIFEBOOK models that also ship with the Elantech touchpad and don't work with crc_disabled to the quirk list. Signed-off-by: Rainer Koenig <Rainer.Koenig@ts.fujitsu.com> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
| | * | | | | | | | Input: i8042 - add noloop quirk for Medion Akoya E7225 (MD98857)Jochen Hein2015-01-221-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without this the aux port does not get detected, and consequently the touchpad will not work. With this patch the touchpad is detected: $ dmesg | grep -E "(SYN|i8042|serio)" pnp 00:03: Plug and Play ACPI device, IDs SYN1d22 PNP0f13 (active) i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12 serio: i8042 KBD port at 0x60,0x64 irq 1 serio: i8042 AUX port at 0x60,0x64 irq 12 input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input4 psmouse serio1: synaptics: Touchpad model: 1, fw: 8.1, id: 0x1e2b1, caps: 0xd00123/0x840300/0x126800, board id: 2863, fw id: 1473085 input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio1/input/input6 dmidecode excerpt for this laptop is: Handle 0x0001, DMI type 1, 27 bytes System Information Manufacturer: Medion Product Name: Akoya E7225 Version: 1.0 Cc: stable@vger.kernel.org Signed-off-by: Jochen Hein <jochen@jochen.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
| | * | | | | | | | Input: synaptics - adjust min/max for Lenovo ThinkPad X1 Carbon 2ndPeter Hutterer2015-01-191-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LEN0037 found in the Lenovo ThinkPad X1 Carbon 2nd (2014 model) Cc: stable@vger.kernel.org Reported-and-tested-by: Bjoern Olausson <bjoern@olausson.de> Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
| * | | | | | | | | sched: don't cause task state changes in nested sleep debuggingLinus Torvalds2015-02-012-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8eb23b9f35aa ("sched: Debug nested sleeps") added code to report on nested sleep conditions, which we generally want to avoid because the inner sleeping operation can re-set the thread state to TASK_RUNNING, but that will then cause the outer sleep loop not actually sleep when it calls schedule. However, that's actually valid traditional behavior, with the inner sleep being some fairly rare case (like taking a sleeping lock that normally doesn't actually need to sleep). And the debug code would actually change the state of the task to TASK_RUNNING internally, which makes that kind of traditional and working code not work at all, because now the nested sleep doesn't just sometimes cause the outer one to not block, but will cause it to happen every time. In particular, it will cause the cardbus kernel daemon (pccardd) to basically busy-loop doing scheduling, converting a laptop into a heater, as reported by Bruno Prémont. But there may be other legacy uses of that nested sleep model in other drivers that are also likely to never get converted to the new model. This fixes both cases: - don't set TASK_RUNNING when the nested condition happens (note: even if WARN_ONCE() only _warns_ once, the return value isn't whether the warning happened, but whether the condition for the warning was true. So despite the warning only happening once, the "if (WARN_ON(..))" would trigger for every nested sleep. - in the cases where we knowingly disable the warning by using "sched_annotate_sleep()", don't change the task state (that is used for all core scheduling decisions), instead use '->task_state_change' that is used for the debugging decision itself. (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the suggested change to use 'task_state_change' as part of the test) Reported-and-bisected-by: Bruno Prémont <bonbons@linux-vserver.org> Tested-by: Rafael J Wysocki <rjw@rjwysocki.net> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>, Cc: Ilya Dryomov <ilya.dryomov@inktank.com>, Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com>, Cc: Davidlohr Bueso <dave@stgolabs.net>, Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>