summaryrefslogtreecommitdiffstats
path: root/arch
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2012-01-065-167/+133
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) sched/tracing: Add a new tracepoint for sleeptime sched: Disable scheduler warnings during oopses sched: Fix cgroup movement of waking process sched: Fix cgroup movement of newly created process sched: Fix cgroup movement of forking process sched: Remove cfs bandwidth period check in tg_set_cfs_period() sched: Fix load-balance lock-breaking sched: Replace all_pinned with a generic flags field sched: Only queue remote wakeups when crossing cache boundaries sched: Add missing rcu_dereference() around ->real_parent usage [S390] fix cputime overflow in uptime_proc_show [S390] cputime: add sparse checking and cleanup sched: Mark parent and real_parent as __rcu sched, nohz: Fix missing RCU read lock sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer sched, nohz: Fix the idle cpu check in nohz_idle_balance sched: Use jump_labels for sched_feat sched/accounting: Fix parameter passing in task_group_account_field sched/accounting: Fix user/system tick double accounting sched/accounting: Re-use scheduler statistics for the root cgroup ... Fix up conflicts in - arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h usecs_to_cputime64() vs the sparse cleanups - kernel/sched/fair.c, kernel/time/tick-sched.c scheduler changes in multiple branches
| * Merge branch 'sched/core' of ↵Martin Schwidefsky2011-12-192-9/+9
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into cputime-tip Conflicts: drivers/cpufreq/cpufreq_conservative.c drivers/cpufreq/cpufreq_ondemand.c drivers/macintosh/rack-meter.c fs/proc/stat.c fs/proc/uptime.c kernel/sched/core.c
| | * Merge commit 'v3.2-rc5' into sched/coreIngo Molnar2011-12-15334-1594/+2377
| | |\ | | | | | | | | | | | | | | | | | | | | Merge reason: Pick up the latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | sched/accounting: Change cpustat fields to an arrayGlauber Costa2011-12-062-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes fields in cpustat from a structure, to an u64 array. Math gets easier, and the code is more flexible. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Tuner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1322498719-2255-2-git-send-email-glommer@parallels.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | [S390] cputime: add sparse checking and cleanupMartin Schwidefsky2011-12-153-157/+122
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make cputime_t and cputime64_t nocast to enable sparse checking to detect incorrect use of cputime. Drop the cputime macros for simple scalar operations. The conversion macros are still needed. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* | | | Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2012-01-0621-493/+1406
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (106 commits) perf kvm: Fix copy & paste error in description perf script: Kill script_spec__delete perf top: Fix a memory leak perf stat: Introduce get_ratio_color() helper perf session: Remove impossible condition check perf tools: Fix feature-bits rework fallout, remove unused variable perf script: Add generic perl handler to process events perf tools: Use for_each_set_bit() to iterate over feature flags perf tools: Unify handling of features when writing feature section perf report: Accept fifos as input file perf tools: Moving code in some files perf tools: Fix out-of-bound access to struct perf_session perf tools: Continue processing header on unknown features perf tools: Improve macros for struct feature_ops perf: builtin-record: Document and check that mmap_pages must be a power of two. perf: builtin-record: Provide advice if mmap'ing fails with EPERM. perf tools: Fix truncated annotation perf script: look up thread using tid instead of pid perf tools: Look up thread names for system wide profiling perf tools: Fix comm for processes with named threads ...
| * | | | perf events: Add Intel x86 mapping for PERF_COUNT_HW_REF_CPU_CYCLESStephane Eranian2011-12-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add event maps for Intel x86 processors (with architected PMU v2 or later). On AMD, there is frequency scaling but no Turbo. There is no core cycle event not subject to frequency scaling, therefore we do not provide a mapping. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1323559734-3488-4-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | perf events: Enable raw event support for Intel unhalted_reference_cycles eventStephane Eranian2011-12-213-18/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the encoding and definitions necessary for the unhalted_reference_cycles event avaialble since Intel Core 2 processors. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1323559734-3488-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | Merge commit 'v3.2-rc6' into perf/coreIngo Molnar2011-12-20339-1664/+2425
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge reason: Update with the latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * \ \ \ \ Merge branch 'for-tip' of ↵Ingo Molnar2011-12-203-31/+372
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/core
| | * | | | | oprofile, s390: Add event interface to the System z hardware sampling moduleAndreas Krebbel2011-12-073-30/+372
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With this patch the OProfile Basic Mode Sampling support for System z is enhanced with a counter file system. That way hardware sampling can be configured using the user space tools with only little modifications. With the patch by default new cpu_types (s390/z10, s390/z196) are returned in order to indicate that we are running a CPU which provides the hardware sampling facility. Existing user space tools will complain about an unknown cpu type. In order to be compatible with existing user space tools the `cpu_type' module parameter has been added. Setting the parameter to `timer' will force the module to return `timer' as cpu_type. The module will still try to use hardware sampling if available and the hwsampling virtual filesystem will be also be available for configuration. So this has a different effect than using the generic oprofile module parameter `timer=1'. If the basic mode sampling is enabled on the machine and the cpu_type=timer parameter is not used the kernel module will provide the following virtual filesystem: /dev/oprofile/0/enabled /dev/oprofile/0/event /dev/oprofile/0/count /dev/oprofile/0/unit_mask /dev/oprofile/0/kernel /dev/oprofile/0/user In the counter file system only the values of 'enabled', 'count', 'kernel', and 'user' are evaluated by the kernel module. Everything else must contain fixed values. The 'event' value only supports a single event - HWSAMPLING with value 0. The 'count' value specifies the hardware sampling rate as it is passed to the CPU measurement facility. The 'kernel' and 'user' flags can now be used to filter for samples when using hardware sampling. Additionally also the following file will be created: /dev/oprofile/timer/enabled This will always be the inverted value of /dev/oprofile/0/enabled. 0 is not accepted without hardware sampling. Signed-off-by: Andreas Krebbel <krebbel@linux.vnet.ibm.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
| | * | | | | oprofile: Fix oprofile_timer_exit() breakageRobert Richter2011-12-071-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Removing remainings of oprofile_timer_exit() completly. Signed-off-by: Robert Richter <robert.richter@amd.com>
| * | | | | | perf, x86: Expose perf capability to other modulesGleb Natapov2011-12-062-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM needs to know perf capability to decide which PMU it can expose to a guest. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1320929850-10480-8-git-send-email-gleb@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | perf, x86: Implement arch event mask as quirkPeter Zijlstra2011-12-063-33/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement the disabling of arch events as a quirk so that we can print a message along with it. This creates some visibility into the problem space and could allow us to work on adding more work-around like the AAJ80 one. Requested-by: Ingo Molnar <mingo@elte.hu> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-wcja2z48wklzu1b0nkz0a5y7@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | x86, perf: Disable non available architectural eventsGleb Natapov2011-12-063-5/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intel CPUs report non-available architectural events in cpuid leaf 0AH.EBX. Use it to disable events that are not available according to CPU. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1320929850-10480-7-git-send-email-gleb@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | jump_label, x86: Fix section mismatchPeter Zijlstra2011-12-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | WARNING: arch/x86/kernel/built-in.o(.text+0x4c71): Section mismatch in reference from the function arch_jump_label_transform_static() to the function .init.text:text_poke_early() The function arch_jump_label_transform_static() references the function __init text_poke_early(). This is often because arch_jump_label_transform_static lacks a __init annotation or the annotation of text_poke_early is wrong. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jason Baron <jbaron@redhat.com> Link: http://lkml.kernel.org/n/tip-9lefe89mrvurrwpqw5h8xm8z@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | perf, x86: Prefer fixed-purpose counters when schedulingPeter Zijlstra2011-12-061-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This avoids a scheduling failure for cases like: cycles, cycles, instructions, instructions (on Core2) Which would end up being programmed like: PMC0, PMC1, FP-instructions, fail Because all events will have the same weight. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-8tnwb92asqj7xajqqoty4gel@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | perf, x86: Fix event scheduler for constraints with overlapping countersRobert Richter2011-12-063-5/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current x86 event scheduler fails to resolve scheduling problems of certain combinations of events and constraints. This happens if the counter mask of such an event is not a subset of any other counter mask of a constraint with an equal or higher weight, e.g. constraints of the AMD family 15h pmu: counter mask weight amd_f15_PMC30 0x09 2 <--- overlapping counters amd_f15_PMC20 0x07 3 amd_f15_PMC53 0x38 3 The scheduler does not find then an existing solution. Here is an example: event code counter failure possible solution 0x02E PMC[3,0] 0 3 0x043 PMC[2:0] 1 0 0x045 PMC[2:0] 2 1 0x046 PMC[2:0] FAIL 2 The event scheduler may not select the correct counter in the first cycle because it needs to know which subsequent events will be scheduled. It may fail to schedule the events then. To solve this, we now save the scheduler state of events with overlapping counter counstraints. If we fail to schedule the events we rollback to those states and try to use another free counter. Constraints with overlapping counters are marked with a new introduced overlap flag. We set the overlap flag for such constraints to give the scheduler a hint which events to select for counter rescheduling. The EVENT_CONSTRAINT_OVERLAP() macro can be used for this. Care must be taken as the rescheduling algorithm is O(n!) which will increase scheduling cycles for an over-commited system dramatically. The number of such EVENT_CONSTRAINT_OVERLAP() macros and its counter masks must be kept at a minimum. Thus, the current stack is limited to 2 states to limit the number of loops the algorithm takes in the worst case. On systems with no overlapping-counter constraints, this implementation does not increase the loop count compared to the previous algorithm. V2: * Renamed redo -> overlap. * Reimplementation using perf scheduling helper functions. V3: * Added WARN_ON_ONCE() if out of save states. * Changed function interface of perf_sched_restore_state() to use bool as return value. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1321616122-1533-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | perf, x86: Implement event scheduler helper functionsRobert Richter2011-12-061-53/+132
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces x86 perf scheduler code helper functions. We need this to later add more complex functionality to support overlapping counter constraints (next patch). The algorithm is modified so that the range of weight values is now generated from the constraints. There shouldn't be other functional changes. With the helper functions the scheduler is controlled. There are functions to initialize, traverse the event list, find unused counters etc. The scheduler keeps its own state. V3: * Added macro for_each_set_bit_cont(). * Changed functions interfaces of perf_sched_find_counter() and perf_sched_next_event() to use bool as return value. * Added some comments to make code better understandable. V4: * Fix broken event assignment if weight of the first event is not wmin (perf_sched_init()). Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1321616122-1533-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | Merge branch 'perf/urgent' into perf/coreIngo Molnar2011-12-06112-1709/+1242
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge reason: Add these cherry-picked commits so that future changes on perf/core don't conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | | x86/tools: Add decoded instruction dump modeMasami Hiramatsu2011-12-051-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add instruction dump mode to insn_sanity tool for checking decoder really decoded instructions. This mode is enabled when passing double -v (-vv) to insn_sanity. It is useful for who wants to check whether the decoder can decode some instructions correctly. e.g. $ echo 0f 73 10 11 | ./insn_sanity -y -vv -i - Instruction = { .prefixes = { .value = 0, bytes[] = {0, 0, 0, 0}, .got = 1, .nbytes = 0}, .rex_prefix = { .value = 0, bytes[] = {0, 0, 0, 0}, .got = 1, .nbytes = 0}, .vex_prefix = { .value = 0, bytes[] = {0, 0, 0, 0}, .got = 1, .nbytes = 0}, .opcode = { .value = 29455, bytes[] = {f, 73, 0, 0}, .got = 1, .nbytes = 2}, .modrm = { .value = 16, bytes[] = {10, 0, 0, 0}, .got = 1, .nbytes = 1}, .sib = { .value = 0, bytes[] = {0, 0, 0, 0}, .got = 1, .nbytes = 0}, .displacement = { .value = 0, bytes[] = {0, 0, 0, 0}, .got = 1, .nbytes = 0}, .immediate1 = { .value = 17, bytes[] = {11, 0, 0, 0}, .got = 1, .nbytes = 1}, .immediate2 = { .value = 0, bytes[] = {0, 0, 0, 0}, .got = 0, .nbytes = 0}, .attr = 44800, .opnd_bytes = 4, .addr_bytes = 8, .length = 4, .x86_64 = 1, .kaddr = 0x7fff0f7d9430} Success: decoded and checked 1 given instructions with 0 errors (seed:0x0) Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20111205120603.15475.91192.stgit@cloud Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | | x86: Update instruction decoder to support new AVX formatsMasami Hiramatsu2011-12-052-282/+345
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since new Intel software developers manual introduces new format for AVX instruction set (including AVX2), it is important to update x86-opcode-map.txt to fit those changes. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20111205120557.15475.13236.stgit@cloud Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | | x86/tools: Fix insn_sanity message outputsMasami Hiramatsu2011-12-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix x86 instruction decoder test to dump all error messages to stderr and others to stdout. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20111205120550.15475.70149.stgit@cloud Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | | x86/tools: Fix instruction decoder message outputMasami Hiramatsu2011-12-051-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix instruction decoder test (insn_sanity), so that it doesn't show both info and error messages twice on same instruction. (In that case, show only error message) Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20111205120545.15475.7928.stgit@cloud Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | | x86: Fix instruction decoder to handle grouped AVX instructionsMasami Hiramatsu2011-12-052-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For reducing memory usage of attribute table, x86 instruction decoder puts "Group" attribute only on "no-last-prefix" attribute table (same as vex_p == 0 case). Thus, the decoder should look no-last-prefix table first, and then only if it is not a group, move on to "with-last-prefix" table (vex_p != 0). However, current implementation, inat_get_avx_attribute() looks with-last-prefix directly. So, when decoding a grouped AVX instruction, the decoder fails to find correct group because there is no "Group" attribute on the table. This ends up with the mis-decoding of instructions, as Ingo reported in http://thread.gmane.org/gmane.linux.kernel/1214103 This patch fixes it to check no-last-prefix table first even if that is an AVX instruction, and get an attribute from "with last-prefix" table only if that is not a group. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20111205120539.15475.91428.stgit@cloud Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | | x86/tools: Fix Makefile to build all test toolsMasami Hiramatsu2011-12-051-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix arch/x86/tools/Makefile to compile both test tools correctly. This bug leads build error. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20111205120533.15475.62047.stgit@cloud Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | | Merge branch 'core' of git://amd64.org/linux/rric into perf/coreIngo Molnar2011-11-155-76/+33
| |\ \ \ \ \ \ \ | | | |/ / / / / | | |/| | | | |
| | * | | | | | Merge branch 'perf/core' into oprofile/masterRobert Richter2011-11-0850-443/+789
| | |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge reason: Resolve conflicts with Don's NMI rework: commit 9c48f1c629ecfa114850c03f875c6691003214de Author: Don Zickus <dzickus@redhat.com> Date: Fri Sep 30 15:06:21 2011 -0400 x86, nmi: Wire up NMI handlers to new routines Conflicts: arch/x86/oprofile/nmi_timer_int.c Signed-off-by: Robert Richter <robert.richter@amd.com>
| | * | | | | | | oprofile, x86: Reimplement nmi timer mode using perf eventRobert Richter2011-11-044-90/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The legacy x86 nmi watchdog code was removed with the implementation of the perf based nmi watchdog. This broke Oprofile's nmi timer mode. To run nmi timer mode we relied on a continuous ticking nmi source which the nmi watchdog provided. The nmi tick was no longer available and current watchdog can not be used anymore since it runs with very long periods in the range of seconds. This patch reimplements the nmi timer mode using a perf counter nmi source. V2: * removing pr_info() * fix undefined reference to `__udivdi3' for 32 bit build * fix section mismatch of .cpuinit.data:nmi_timer_cpu_nb * removed nmi timer setup in arch/x86 * implemented function stubs for op_nmi_init/exit() * made code more readable in oprofile_init() V3: * fix architectural initialization in oprofile_init() * fix CONFIG_OPROFILE_NMI_TIMER dependencies Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Robert Richter <robert.richter@amd.com>
| | * | | | | | | oprofile, x86: Add kernel parameter oprofile.cpu_type=timerRobert Richter2011-11-041-6/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need this to better test x86 NMI timer mode. Otherwise it is very hard to setup systems with NMI timer enabled, but we have this e.g. in virtual machine environments. Signed-off-by: Robert Richter <robert.richter@amd.com>
| * | | | | | | | x86, perf: Add a build-time sanity test to the x86 decoderMasami Hiramatsu2011-11-103-1/+291
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a sanity test of x86 insn decoder against a stream of randomly generated input, at build time. This test is also able to reproduce any bug that might trigger by allowing the passing of random-seed and iteration-number to the test, or by passing input which has invalid byte code. Changes in V2: - Code cleanup. - Show how to reproduce the error by insn_sanity test. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: acme@redhat.com Cc: ming.m.lin@intel.com Cc: robert.richter@amd.com Cc: ravitillo@lbl.gov Cc: yrl.pp-manager.tt@hitachi.com Cc: Andi Kleen <andi@firstfloor.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20111020140109.20938.92572.stgit@localhost.localdomain Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | | | | | | Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds2012-01-0624-46/+97
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) cpu: Export cpu_up() rcu: Apply ACCESS_ONCE() to rcu_boost() return value Revert "rcu: Permit rt_mutex_unlock() with irqs disabled" docs: Additional LWN links to RCU API rcu: Augment rcu_batch_end tracing for idle and callback state rcu: Add rcutorture tests for srcu_read_lock_raw() rcu: Make rcutorture test for hotpluggability before offlining CPUs driver-core/cpu: Expose hotpluggability to the rest of the kernel rcu: Remove redundant rcu_cpu_stall_suppress declaration rcu: Adaptive dyntick-idle preparation rcu: Keep invoking callbacks if CPU otherwise idle rcu: Irq nesting is always 0 on rcu_enter_idle_common rcu: Don't check irq nesting from rcu idle entry/exit rcu: Permit dyntick-idle with callbacks pending rcu: Document same-context read-side constraints rcu: Identify dyntick-idle CPUs on first force_quiescent_state() pass rcu: Remove dynticks false positives and RCU failures rcu: Reduce latency of rcu_prepare_for_idle() rcu: Eliminate RCU_FAST_NO_HZ grace-period hang rcu: Avoid needlessly IPIing CPUs at GP end ...
| * \ \ \ \ \ \ \ \ Merge branch 'rcu/next' of ↵Ingo Molnar2011-12-1424-46/+97
| |\ \ \ \ \ \ \ \ \ | | |_|_|_|_|_|/ / / | |/| | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
| | * | | | | | | | nohz: Remove tick_nohz_idle_enter_norcu() / tick_nohz_idle_exit_norcu()Frederic Weisbecker2011-12-1115-38/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Those two APIs were provided to optimize the calls of tick_nohz_idle_enter() and rcu_idle_enter() into a single irq disabled section. This way no interrupt happening in-between would needlessly process any RCU job. Now we are talking about an optimization for which benefits have yet to be measured. Let's start simple and completely decouple idle rcu and dyntick idle logics to simplify. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | * | | | | | | | tile: Make tile use the new is_idle_task() APIPaul E. McKenney2011-12-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change from direct comparison of ->pid with zero to is_idle_task(). Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Chris Metcalf <cmetcalf@tilera.com>
| | * | | | | | | | sparc: Make SPARC use the new is_idle_task() APIPaul E. McKenney2011-12-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change from direct comparison of ->pid with zero to is_idle_task(). Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
| | * | | | | | | | powerpc: Tell RCU about idle after hcall tracingPaul E. McKenney2011-12-112-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PowerPC pSeries platform (CONFIG_PPC_PSERIES=y) enables hypervisor-call tracing for CONFIG_TRACEPOINTS=y kernels. One of the hypervisor calls that is traced is the H_CEDE call in the idle loop that tells the hypervisor that this OS instance no longer needs the current CPU. However, tracing uses RCU, so this combination of kernel configuration variables needs to avoid telling RCU about the current CPU's idleness until after the H_CEDE-entry tracing completes on the one hand, and must tell RCU that the the current CPU is no longer idle before the H_CEDE-exit tracing starts. In all other cases, it suffices to inform RCU of CPU idleness upon idle-loop entry and exit. This commit makes the required adjustments. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
| | * | | | | | | | x86: Call idle notifier after irq_enter()Frederic Weisbecker2011-12-115-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Interrupts notify the idle exit state before calling irq_enter(). But the notifier code calls rcu_read_lock() and this is not allowed while rcu is in an extended quiescent state. We need to wait for irq_enter() -> rcu_idle_exit() to be called before doing so otherwise this results in a grumpy RCU: [ 0.099991] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110() [ 0.099991] Hardware name: AMD690VM-FMH [ 0.099991] Modules linked in: [ 0.099991] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #255 [ 0.099991] Call Trace: [ 0.099991] <IRQ> [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0 [ 0.099991] [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20 [ 0.099991] [<ffffffff817d6fa2>] __atomic_notifier_call_chain+0xd2/0x110 [ 0.099991] [<ffffffff817d6ff1>] atomic_notifier_call_chain+0x11/0x20 [ 0.099991] [<ffffffff81001873>] exit_idle+0x43/0x50 [ 0.099991] [<ffffffff81020439>] smp_apic_timer_interrupt+0x39/0xa0 [ 0.099991] [<ffffffff817da253>] apic_timer_interrupt+0x13/0x20 [ 0.099991] <EOI> [<ffffffff8100ae67>] ? default_idle+0xa7/0x350 [ 0.099991] [<ffffffff8100ae65>] ? default_idle+0xa5/0x350 [ 0.099991] [<ffffffff8100b19b>] amd_e400_idle+0x8b/0x110 [ 0.099991] [<ffffffff810cb01f>] ? rcu_enter_nohz+0x8f/0x160 [ 0.099991] [<ffffffff810019a0>] cpu_idle+0xb0/0x110 [ 0.099991] [<ffffffff817a7505>] rest_init+0xe5/0x140 [ 0.099991] [<ffffffff817a7468>] ? rest_init+0x48/0x140 [ 0.099991] [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc [ 0.099991] [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135 [ 0.099991] [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Henroid <andrew.d.henroid@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
| | * | | | | | | | x86: Enter rcu extended qs after idle notifier callFrederic Weisbecker2011-12-111-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The idle notifier, called by enter_idle(), enters into rcu read side critical section but at that time we already switched into the RCU-idle window (rcu_idle_enter() has been called). And it's illegal to use rcu_read_lock() in that state. This results in rcu reporting its bad mood: [ 1.275635] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110() [ 1.275635] Hardware name: AMD690VM-FMH [ 1.275635] Modules linked in: [ 1.275635] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #252 [ 1.275635] Call Trace: [ 1.275635] [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0 [ 1.275635] [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20 [ 1.275635] [<ffffffff817d6f22>] __atomic_notifier_call_chain+0xd2/0x110 [ 1.275635] [<ffffffff817d6f71>] atomic_notifier_call_chain+0x11/0x20 [ 1.275635] [<ffffffff810018a0>] enter_idle+0x20/0x30 [ 1.275635] [<ffffffff81001995>] cpu_idle+0xa5/0x110 [ 1.275635] [<ffffffff817a7465>] rest_init+0xe5/0x140 [ 1.275635] [<ffffffff817a73c8>] ? rest_init+0x48/0x140 [ 1.275635] [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc [ 1.275635] [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135 [ 1.275635] [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4 [ 1.275635] ---[ end trace a22d306b065d4a66 ]--- Fix this by entering rcu extended quiescent state later, just before the CPU goes to sleep. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
| | * | | | | | | | nohz: Allow rcu extended quiescent state handling seperately from tick stopFrederic Weisbecker2011-12-1116-34/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is assumed that rcu won't be used once we switch to tickless mode and until we restart the tick. However this is not always true, as in x86-64 where we dereference the idle notifiers after the tick is stopped. To prepare for fixing this, add two new APIs: tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu(). If no use of RCU is made in the idle loop between tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch must instead call the new *_norcu() version such that the arch doesn't need to call rcu_idle_enter() and rcu_idle_exit(). Otherwise the arch must call tick_nohz_enter_idle() and tick_nohz_exit_idle() and also call explicitly: - rcu_idle_enter() after its last use of RCU before the CPU is put to sleep. - rcu_idle_exit() before the first use of RCU after the CPU is woken up. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: David Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | * | | | | | | | nohz: Separate out irq exit and idle loop dyntick logicFrederic Weisbecker2011-12-1116-34/+34
| | | |_|_|_|_|_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tick_nohz_stop_sched_tick() function, which tries to delay the next timer tick as long as possible, can be called from two places: - From the idle loop to start the dytick idle mode - From interrupt exit if we have interrupted the dyntick idle mode, so that we reprogram the next tick event in case the irq changed some internal state that requires this action. There are only few minor differences between both that are handled by that function, driven by the ts->inidle cpu variable and the inidle parameter. The whole guarantees that we only update the dyntick mode on irq exit if we actually interrupted the dyntick idle mode, and that we enter in RCU extended quiescent state from idle loop entry only. Split this function into: - tick_nohz_idle_enter(), which sets ts->inidle to 1, enters dynticks idle mode unconditionally if it can, and enters into RCU extended quiescent state. - tick_nohz_irq_exit() which only updates the dynticks idle mode when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called). To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed into tick_nohz_idle_exit(). This simplifies the code and micro-optimize the irq exit path (no need for local_irq_save there). This also prepares for the split between dynticks and rcu extended quiescent state logics. We'll need this split to further fix illegal uses of RCU in extended quiescent states in the idle loop. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: David Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* | | | | | | | | Merge branch 'core-memblock-for-linus' of ↵Linus Torvalds2012-01-0664-772/+240
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) memblock: Reimplement memblock allocation using reverse free area iterator memblock: Kill early_node_map[] score: Use HAVE_MEMBLOCK_NODE_MAP s390: Use HAVE_MEMBLOCK_NODE_MAP mips: Use HAVE_MEMBLOCK_NODE_MAP ia64: Use HAVE_MEMBLOCK_NODE_MAP SuperH: Use HAVE_MEMBLOCK_NODE_MAP sparc: Use HAVE_MEMBLOCK_NODE_MAP powerpc: Use HAVE_MEMBLOCK_NODE_MAP memblock: Implement memblock_add_node() memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users memblock: Track total size of regions automatically powerpc: Cleanup memblock usage memblock: Reimplement memblock_enforce_memory_limit() using __memblock_remove() memblock: Make memblock functions handle overflowing range @size memblock: Reimplement __memblock_remove() using memblock_isolate_range() memblock: Separate out memblock_isolate_range() from memblock_set_node() memblock: Kill memblock_init() memblock: Kill sentinel entries at the end of static region arrays memblock: Add __memblock_dump_all() ...
| * \ \ \ \ \ \ \ \ Merge branch 'memblock-kill-early_node_map' of ↵Ingo Molnar2011-12-2064-772/+240
| |\ \ \ \ \ \ \ \ \ | | |_|_|_|_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/memblock
| | * | | | | | | | memblock: Kill early_node_map[]Tejun Heo2011-12-088-24/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now all ARCH_POPULATES_NODE_MAP archs select HAVE_MEBLOCK_NODE_MAP - there's no user of early_node_map[] left. Kill early_node_map[] and replace ARCH_POPULATES_NODE_MAP with HAVE_MEMBLOCK_NODE_MAP. Also, relocate for_each_mem_pfn_range() and helper from mm.h to memblock.h as page_alloc.c would no longer host an alternative implementation. This change is ultimately one to one mapping and shouldn't cause any observable difference; however, after the recent changes, there are some functions which now would fit memblock.c better than page_alloc.c and dependency on HAVE_MEMBLOCK_NODE_MAP instead of HAVE_MEMBLOCK doesn't make much sense on some of them. Further cleanups for functions inside HAVE_MEMBLOCK_NODE_MAP in mm.h would be nice. -v2: Fix compile bug introduced by mis-spelling CONFIG_HAVE_MEMBLOCK_NODE_MAP to CONFIG_MEMBLOCK_HAVE_NODE_MAP in mmzone.h. Reported by Stephen Rothwell. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com>
| | * | | | | | | | score: Use HAVE_MEMBLOCK_NODE_MAPTejun Heo2011-12-082-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | score used early_node_map[] just to prime free_area_init_nodes(). Now memblock can be used for the same purpose and early_node_map[] is scheduled to be dropped. Use memblock instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Lennox Wu <lennox.wu@gmail.com>
| | * | | | | | | | s390: Use HAVE_MEMBLOCK_NODE_MAPTejun Heo2011-12-082-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | s390 used early_node_map[] just to prime free_area_init_nodes(). Now memblock can be used for the same purpose and early_node_map[] is scheduled to be dropped. Use memblock instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-s390@vger.kernel.org
| | * | | | | | | | mips: Use HAVE_MEMBLOCK_NODE_MAPTejun Heo2011-12-083-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mips used early_node_map[] just to prime free_area_init_nodes(). Now memblock can be used for the same purpose and early_node_map[] is scheduled to be dropped. Use memblock instead. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: linux-mips@linux-mips.org
| | * | | | | | | | ia64: Use HAVE_MEMBLOCK_NODE_MAPTejun Heo2011-12-083-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ia64 used early_node_map[] just to prime free_area_init_nodes(). Now memblock can be used for the same purpose and early_node_map[] is scheduled to be dropped. Use memblock instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: linux-ia64@vger.kernel.org
| | * | | | | | | | SuperH: Use HAVE_MEMBLOCK_NODE_MAPTejun Heo2011-12-082-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sh doesn't access early_node_map[] directly and enabling HAVE_MEMBLOCK_NODE_MAP is trivial - replacing add_active_range() calls with memblock_set_node() and selecting HAVE_MEMBLOCK_NODE_MAP is enough. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: linux-sh@vger.kernel.org
| | * | | | | | | | sparc: Use HAVE_MEMBLOCK_NODE_MAPTejun Heo2011-12-082-20/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sparc doesn't access early_node_map[] directly and enabling HAVE_MEMBLOCK_NODE_MAP is trivial - replacing add_active_range() calls with memblock_set_node() and selecting HAVE_MEMBLOCK_NODE_MAP is enough. -v2: Use select in Kconfig instead as suggested by Sam Ravnborg. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: "David S. Miller" <davem@davemloft.net> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: sparclinux@vger.kernel.org