summaryrefslogtreecommitdiffstats
path: root/arch/x86/lib
Commit message (Collapse)AuthorAgeFilesLines
* x86, mem: Optimize memmove for small size and unaligned casesMa Ling2010-09-242-76/+362
| | | | | | | | | | | | | | | movs instruction will combine data to accelerate moving data, however we need to concern two cases about it. 1. movs instruction need long lantency to startup, so here we use general mov instruction to copy data. 2. movs instruction is not good for unaligned case, even if src offset is 0x10, dest offset is 0x0, we avoid and handle the case by general mov instruction. Signed-off-by: Ma Ling <ling.ma@intel.com> LKML-Reference: <1284664360-6138-1-git-send-email-ling.ma@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* x86, mem: Optimize memcpy by avoiding memory false dependeceMa Ling2010-08-232-59/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All read operations after allocation stage can run speculatively, all write operation will run in program order, and if addresses are different read may run before older write operation, otherwise wait until write commit. However CPU don't check each address bit, so read could fail to recognize different address even they are in different page.For example if rsi is 0xf004, rdi is 0xe008, in following operation there will generate big performance latency. 1. movq (%rsi), %rax 2. movq %rax, (%rdi) 3. movq 8(%rsi), %rax 4. movq %rax, 8(%rdi) If %rsi and rdi were in really the same meory page, there are TRUE read-after-write dependence because instruction 2 write 0x008 and instruction 3 read 0x00c, the two address are overlap partially. Actually there are in different page and no any issues, but without checking each address bit CPU could think they are in the same page, and instruction 3 have to wait for instruction 2 to write data into cache from write buffer, then load data from cache, the cost time read spent is equal to mfence instruction. We may avoid it by tuning operation sequence as follow. 1. movq 8(%rsi), %rax 2. movq %rax, 8(%rdi) 3. movq (%rsi), %rax 4. movq %rax, (%rdi) Instruction 3 read 0x004, instruction 2 write address 0x010, no any dependence. At last on Core2 we gain 1.83x speedup compared with original instruction sequence. In this patch we first handle small size(less 20bytes), then jump to different copy mode. Based on our micro-benchmark small bytes from 1 to 127 bytes, we got up to 2X improvement, and up to 1.5X improvement for 1024 bytes on Corei7. (We use our micro-benchmark, and will do further test according to your requirment) Signed-off-by: Ma Ling <ling.ma@intel.com> LKML-Reference: <1277753065-18610-1-git-send-email-ling.ma@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* x86, mem: Don't implement forward memmove() as memcpy()Ma, Ling2010-08-232-16/+68
| | | | | | | | | | | memmove() allow source and destination address to be overlap, but there is no such limitation for memcpy(). Therefore, explicitly implement memmove() in both the forwards and backward directions, to give us the ability to optimize memcpy(). Signed-off-by: Ma Ling <ling.ma@intel.com> LKML-Reference: <C10D3FB0CD45994C8A51FEC1227CE22F0E483AD86A@shsmsx502.ccr.corp.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* Merge branch 'x86/urgent' of ↵Linus Torvalds2010-08-131-108/+130
|\ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, asm: Use a lower case name for the end macro in atomic64_386_32.S x86, asm: Refactor atomic64_386_32.S to support old binutils and be cleaner x86: Document __phys_reloc_hide() usage in __pa_symbol() x86, apic: Map the local apic when parsing the MP table.
| * x86, asm: Use a lower case name for the end macro in atomic64_386_32.SLuca Barbieri2010-08-121-18/+20
| | | | | | | | | | | | | | | | | | Use a lowercase name for the end macro, which somehow fixes a binutils 2.16 problem. Signed-off-by: Luca Barbieri <luca@luca-barbieri.com> LKML-Reference: <tip-30246557a06bb20618bed906a06d1e1e0faa8bb4@git.kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86, asm: Refactor atomic64_386_32.S to support old binutils and be cleanerLuca Barbieri2010-08-111-108/+128
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old code didn't work on binutils 2.12 because setting a symbol to a register apparently requires a fairly recent version. This commit refactors the code to use the C preprocessor instead, and in the process makes the whole code a bit easier to understand. The object code produced is unchanged as expected. This fixes kernel bugzilla 16506. Reported-by: Dieter Stussy <kd6lvw+software@kd6lvw.ampr.org> Signed-off-by: Luca Barbieri <luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@kernel.org> 2.6.35 LKML-Reference: <tip-*@git.kernel.org>
* | Merge branch 'x86-alternatives-for-linus' of ↵Linus Torvalds2010-08-065-5/+5
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, alternatives: BUG on encountering an invalid CPU feature number x86, alternatives: Fix one more open-coded 8-bit alternative number x86, alternatives: Use 16-bit numbers for cpufeature index
| * | x86, alternatives: Fix one more open-coded 8-bit alternative numberH. Peter Anvin2010-07-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a missing case of an 8-bit alternative number, buried inside an assembly macro. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Reported-by: Yinghai Lu <yinhai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <4C3BDDA3.2060900@kernel.org>
| * | x86, alternatives: Use 16-bit numbers for cpufeature indexH. Peter Anvin2010-07-074-4/+4
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | We already have cpufeature indicies above 255, so use a 16-bit number for the alternatives index. This consumes a padding field and so doesn't add any size, but it means that abusing the padding field to create assembly errors on overflow no longer works. We can retain the test simply by redirecting it to the .discard section, however. [ v3: updated to include open-coded locations ] Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <tip-f88731e3068f9d1392ba71cc9f50f035d26a0d4f@git.kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | x86, asm: Merge cmpxchg_486_u64() and cmpxchg8b_emu()H. Peter Anvin2010-07-281-18/+0
| | | | | | | | | | | | | | | | | | | | | | We have two functions for doing exactly the same thing -- emulating cmpxchg8b on 486 and older hardware -- with different calling conventions, and yet doing the same thing. Drop the C version and use the assembly version, via alternatives, for both the local and non-local versions of cmpxchg8b. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <AANLkTikAmaDPji-TVDarmG1yD=fwbffcsmEU=YEuP+8r@mail.gmail.com>
* | x86, asm: Move cmpxchg emulation code to arch/x86/libH. Peter Anvin2010-07-282-0/+73
|/ | | | | | | | | Move cmpxchg emulation code from arch/x86/kernel/cpu (which is otherwise CPU identification) to arch/x86/lib, where other emulation code lives already. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <AANLkTikAmaDPji-TVDarmG1yD=fwbffcsmEU=YEuP+8r@mail.gmail.com>
* Merge branch 'x86-atomic-for-linus' of ↵Linus Torvalds2010-05-184-223/+451
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-atomic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix LOCK_PREFIX_HERE for uniprocessor build x86, atomic64: In selftest, distinguish x86-64 from 586+ x86-32: Fix atomic64_inc_not_zero return value convention lib: Fix atomic64_inc_not_zero test lib: Fix atomic64_add_unless return value convention x86-32: Fix atomic64_add_unless return value convention lib: Fix atomic64_add_unless test x86: Implement atomic[64]_dec_if_positive() lib: Only test atomic64_dec_if_positive on archs having it x86-32: Rewrite 32-bit atomic64 functions in assembly lib: Add self-test for atomic64_t x86-32: Allow UP/SMP lock replacement in cmpxchg64 x86: Add support for lock prefix in alternatives
| * Merge branch 'x86/asm' into x86/atomicH. Peter Anvin2010-04-294-27/+103
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge reason: Conflict between LOCK_PREFIX_HERE and relative alternatives pointers Resolved Conflicts: arch/x86/include/asm/alternative.h arch/x86/kernel/alternative.c Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86-32: Fix atomic64_inc_not_zero return value conventionLuca Barbieri2010-03-012-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | atomic64_inc_not_zero must return 1 if it perfomed the add and 0 otherwise. It was doing the opposite thing. Signed-off-by: Luca Barbieri <luca@luca-barbieri.com> LKML-Reference: <1267469749-11878-6-git-send-email-luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86-32: Fix atomic64_add_unless return value conventionLuca Barbieri2010-03-012-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | atomic64_add_unless must return 1 if it perfomed the add and 0 otherwise. The implementation did the opposite thing. Reported-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Luca Barbieri <luca@luca-barbieri.com> LKML-Reference: <1267469749-11878-3-git-send-email-luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86-32: Rewrite 32-bit atomic64 functions in assemblyLuca Barbieri2010-02-254-223/+453
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch replaces atomic64_32.c with two assembly implementations, one for 386/486 machines using pushf/cli/popf and one for 586+ machines using cmpxchg8b. The cmpxchg8b implementation provides the following advantages over the current one: 1. Implements atomic64_add_unless, atomic64_dec_if_positive and atomic64_inc_not_zero 2. Uses the ZF flag changed by cmpxchg8b instead of doing a comparison 3. Uses custom register calling conventions that reduce or eliminate register moves to suit cmpxchg8b 4. Reads the initial value instead of using cmpxchg8b to do that. Currently we use lock xaddl and movl, which seems the fastest. 5. Does not use the lock prefix for atomic64_set 64-bit writes are already atomic, so we don't need that. We still need it for atomic64_read to avoid restoring a value changed in the meantime. 6. Allocates registers as well or better than gcc The 386 implementation provides support for 386 and 486 machines. 386/486 SMP is not supported (we dropped it), but such support can be added easily if desired. A pure assembly implementation is required due to the custom calling conventions, and desire to use %ebp in atomic64_add_return (we need 7 registers...), as well as the ability to use pushf/popf in the 386 code without an intermediate pop/push. The parameter names are changed to match the convention in atomic_64.h Changes in v3 (due to rebasing to tip/x86/asm): - Patches atomic64_32.h instead of atomic_32.h - Uses the CALL alternative mechanism from commit 1b1d9258181bae199dc940f4bd0298126b9a73d9 Changes in v2: - Merged 386 and cx8 support in the same patch - 386 support now done in assembly, C code no longer used at all - cmpxchg64 is used for atomic64_cmpxchg - stop using macros, use one-line inline functions instead - miscellanous changes and improvements Signed-off-by: Luca Barbieri <luca@luca-barbieri.com> LKML-Reference: <1267005265-27958-5-git-send-email-luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | | Merge branch 'perf/urgent' into perf/coreIngo Molnar2010-05-071-1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | Merge reason: Resolve patch dependency Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | Fix the x86_64 implementation of call_rwsem_wait()David Howells2010-05-041-1/+1
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | The x86_64 call_rwsem_wait() treats the active state counter part of the R/W semaphore state as being 16-bit when it's actually 32-bit (it's half of the 64-bit state). It should do "decl %edx" not "decw %dx". Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* / | perf, x86: Add INSTRUCTION_DECODER config flagIngo Molnar2010-03-101-1/+1
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PEBS+LBR decoding magic needs the insn_get_length() infrastructure to be able to decode x86 instruction length. So split it out of KPROBES dependency and make it enabled when either KPROBES or PERF_EVENTS is enabled. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | Merge branch 'x86-rwsem-for-linus' of ↵Linus Torvalds2010-02-282-0/+82
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-rwsem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-64, rwsem: Avoid store forwarding hazard in __downgrade_write x86-64, rwsem: 64-bit xadd rwsem implementation x86: Fix breakage of UML from the changes in the rwsem system x86-64: support native xadd rwsem implementation x86: clean up rwsem type system
| * | x86-64: support native xadd rwsem implementationLinus Torvalds2010-01-132-0/+82
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This one is much faster than the spinlock based fallback rwsem code, with certain artifical benchmarks having shown 300%+ improvement on threaded page faults etc. Again, note the 32767-thread limit here. So this really does need that whole "make rwsem_count_t be 64-bit and fix the BIAS values to match" extension on top of it, but that is conceptually a totally independent issue. NOT TESTED! The original patch that this all was based on were tested by KAMEZAWA Hiroyuki, but maybe I screwed up something when I created the cleaned-up series, so caveat emptor.. Also note that it _may_ be a good idea to mark some more registers clobbered on x86-64 in the inline asms instead of saving/restoring them. They are inline functions, but they are only used in places where there are not a lot of live registers _anyway_, so doing for example the clobbers of %r8-%r11 in the asm wouldn't make the fast-path code any worse, and would make the slow-path code smaller. (Not that the slow-path really matters to that degree. Saving a few unnecessary registers is the _least_ of our problems when we hit the slow path. The instruction/cycle counting really only matters in the fast path). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <alpine.LFD.2.00.1001121810410.17145@localhost.localdomain> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | Merge branch 'x86-io-for-linus' of ↵Linus Torvalds2010-02-282-26/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-io-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Merge io.h x86: Simplify flush_write_buffers() x86: Clean up mem*io functions. x86-64: Use BUILDIO in io_64.h x86-64: Reorganize io_64.h x86-32: Remove _local variants of in/out from io_32.h x86-32: Move XQUAD definitions to numaq.h
| * | x86: Clean up mem*io functions.Brian Gerst2010-02-052-26/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Iomem has no special significance on x86. Use the standard mem* functions instead of trying to call other versions. Some fixups are needed to match the function prototypes. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1265380629-3212-6-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | | Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds2010-02-282-1/+20
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, cacheinfo: Enable L3 CID only on AMD x86, cacheinfo: Remove NUMA dependency, fix for AMD Fam10h rev D1 x86, cpu: Print AMD virtualization features in /proc/cpuinfo x86, cacheinfo: Calculate L3 indices x86, cacheinfo: Add cache index disable sysfs attrs only to L3 caches x86, cacheinfo: Fix disabling of L3 cache indices intel-agp: Switch to wbinvd_on_all_cpus x86, lib: Add wbinvd smp helpers
| * | x86, lib: Add wbinvd smp helpersBorislav Petkov2010-01-222-1/+20
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | Add wbinvd_on_cpu and wbinvd_on_all_cpus stubs for executing wbinvd on a particular CPU. [ hpa: renamed lib/smp.c to lib/cache-smp.c ] [ hpa: wbinvd_on_all_cpus() returns int, but wbinvd() returns void. Thus, the former cannot be a macro for the latter, replace with an inline function. ] Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1264172467-25155-2-git-send-email-bp@amd64.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | x86-64: Modify memcpy()/memset() alternatives mechanismJan Beulich2009-12-302-27/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to avoid unnecessary chains of branches, rather than implementing memcpy()/memset()'s access to their alternative implementations via a jump, patch the (larger) original function directly. The memcpy() part of this is slightly subtle: while alternative instruction patching does itself use memcpy(), with the replacement block being less than 64-bytes in size the main loop of the original function doesn't get used for copying memcpy_c() over memcpy(), and hence we can safely write over its beginning. Also note that the CFI annotations are fine for both variants of each of the functions. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4B2BB8D30200007800026AF2@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | x86-64: Modify copy_user_generic() alternatives mechanismJan Beulich2009-12-301-6/+0
|/ | | | | | | | | | | | | | | | | | In order to avoid unnecessary chains of branches, rather than implementing copy_user_generic() as a function consisting of just a single (possibly patched) branch, instead properly deal with patching call instructions in the alternative instructions framework, and move the patching into the callers. As a follow-on, one could also introduce something like __EXPORT_SYMBOL_ALT() to avoid patching call sites in modules. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4B2BB8180200007800026AE7@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds2009-12-193-215/+206
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu system Makefile: Unexport LC_ALL instead of clearing it x86: Fix objdump version check in arch/x86/tools/chkobjdump.awk x86: Reenable TSC sync check at boot, even with NONSTOP_TSC x86: Don't use POSIX character classes in gen-insn-attr-x86.awk Makefile: set LC_CTYPE, LC_COLLATE, LC_NUMERIC to C x86: Increase MAX_EARLY_RES; insufficient on 32-bit NUMA x86: Fix checking of SRAT when node 0 ram is not from 0 x86, cpuid: Add "volatile" to asm in native_cpuid() x86, msr: msrs_alloc/free for CONFIG_SMP=n x86, amd: Get multi-node CPU info from NodeId MSR instead of PCI config space x86: Add IA32_TSC_AUX MSR and use it x86, msr/cpuid: Register enough minors for the MSR and CPUID drivers initramfs: add missing decompressor error check bzip2: Add missing checks for malloc returning NULL bzip2/lzma/gzip: pre-boot malloc doesn't return NULL on failure
| * x86, msr: msrs_alloc/free for CONFIG_SMP=nBorislav Petkov2009-12-163-215/+206
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Randy Dunlap reported the following build error: "When CONFIG_SMP=n, CONFIG_X86_MSR=m: ERROR: "msrs_free" [drivers/edac/amd64_edac_mod.ko] undefined! ERROR: "msrs_alloc" [drivers/edac/amd64_edac_mod.ko] undefined!" This is due to the fact that <arch/x86/lib/msr.c> is conditioned on CONFIG_SMP and in the UP case we have only the stubs in the header. Fork off SMP functionality into a new file (msr-smp.c) and build msrs_{alloc,free} unconditionally. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <petkovbb@gmail.com> LKML-Reference: <20091216231625.GD27228@liondog.tnic> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds2009-12-141-4/+22
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mce: Clean up thermal init by introducing intel_thermal_supported() x86, mce: Thermal monitoring depends on APIC being enabled x86: Gart: fix breakage due to IOMMU initialization cleanup x86: Move swiotlb initialization before dma32_free_bootmem x86: Fix build warning in arch/x86/mm/mmio-mod.c x86: Remove usedac in feature-removal-schedule.txt x86: Fix duplicated UV BAU interrupt vector nvram: Fix write beyond end condition; prove to gcc copy is safe mm: Adjust do_pages_stat() so gcc can see copy_from_user() is safe x86: Limit the number of processor bootup messages x86: Remove enabling x2apic message for every CPU doc: Add documentation for bootloader_{type,version} x86, msr: Add support for non-contiguous cpumasks x86: Use find_e820() instead of hard coded trampoline address x86, AMD: Fix stale cpuid4_info shared_map data in shared_cpu_map cpumasks Trivial percpu-naming-introduced conflicts in arch/x86/kernel/cpu/intel_cacheinfo.c
| * x86, msr: Add support for non-contiguous cpumasksBorislav Petkov2009-12-111-4/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current rd/wrmsr_on_cpus helpers assume that the supplied cpumasks are contiguous. However, there are machines out there like some K8 multinode Opterons which have a non-contiguous core enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see http://www.gossamer-threads.com/lists/linux/kernel/1160268. This patch fixes out-of-bounds writes (see URL above) by adding per-CPU msr structs which are used on the respective cores. Additionally, two helpers, msrs_{alloc,free}, are provided for use by the callers of the MSR accessors. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20091211171440.GD31998@aftab> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds2009-12-111-2/+2
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (57 commits) x86, perf events: Check if we have APIC enabled perf_event: Fix variable initialization in other codepaths perf kmem: Fix unused argument build warning perf symbols: perf_header__read_build_ids() offset'n'size should be u64 perf symbols: dsos__read_build_ids() should read both user and kernel buildids perf tools: Align long options which have no short forms perf kmem: Show usage if no option is specified sched: Mark sched_clock() as notrace perf sched: Add max delay time snapshot perf tools: Correct size given to memset perf_event: Fix perf_swevent_hrtimer() variable initialization perf sched: Fix for getting task's execution time tracing/kprobes: Fix field creation's bad error handling perf_event: Cleanup for cpu_clock_perf_event_update() perf_event: Allocate children's perf_event_ctxp at the right time perf_event: Clean up __perf_event_init_context() hw-breakpoints: Modify breakpoints without unregistering them perf probe: Update perf-probe document perf probe: Support --del option trace-kprobe: Support delete probe syntax ...
| * x86 insn: Delete empty or incomplete inat-tables.cMasami Hiramatsu2009-12-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Delete empty or incomplete inat-tables.c if gen-insn-attr-x86.awk failed, because it causes a build error if user tries to build kernel next time. Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: systemtap <systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091207170033.19230.37688.stgit@dhcp-100-2-132.bos.redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * x86: Compile insn.c and inat.c only for KPROBESOGAWA Hirofumi2009-12-071-1/+1
| | | | | | | | | | | | | | | | | | | | At least, insn.c and inat.c is needed for kprobe for now. So, this compile those only if KPROBES is enabled. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Masami Hiramatsu <mhiramat@redhat.com> LKML-Reference: <878wdg8icq.fsf@devron.myhome.or.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds2009-12-051-27/+19
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, msr, cpumask: Use struct cpumask rather than the deprecated cpumask_t x86, cpuid: Simplify the code in cpuid_open x86, cpuid: Remove the bkl from cpuid_open() x86, msr: Remove the bkl from msr_open() x86: AMD Geode LX optimizations x86, msr: Unify rdmsr_on_cpus/wrmsr_on_cpus
| * | x86, msr: Unify rdmsr_on_cpus/wrmsr_on_cpusBorislav Petkov2009-09-151-27/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since rdmsr_on_cpus and wrmsr_on_cpus are almost identical, unify them into a common __rwmsr_on_cpus helper thus avoiding code duplication. While at it, convert cpumask_t's to const struct cpumask *. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds2009-12-052-12/+12
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: include/linux/compiler-gcc4.h: Fix build bug - gcc-4.0.2 doesn't understand __builtin_object_size x86/alternatives: No need for alternatives-asm.h to re-invent stuff already in asm.h x86/alternatives: Check replacementlen <= instrlen at build time x86, 64-bit: Set data segments to null after switching to 64-bit mode x86: Clean up the loadsegment() macro x86: Optimize loadsegment() x86: Add missing might_fault() checks to copy_{to,from}_user() x86-64: __copy_from_user_inatomic() adjustments x86: Remove unused thread_return label from switch_to() x86, 64-bit: Fix bstep_iret jump x86: Don't use the strict copy checks when branch profiling is in use x86, 64-bit: Move K8 B step iret fixup to fault entry asm x86: Generate cmpxchg build failures x86: Add a Kconfig option to turn the copy_from_user warnings into errors x86: Turn the copy_from_user check into an (optional) compile time warning x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy x86: Use __builtin_object_size() to validate the buffer size for copy_from_user()
| * | x86: Add missing might_fault() checks to copy_{to,from}_user()Frederic Weisbecker2009-11-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On x86-64, copy_[to|from]_user() rely on assembly routines that never call might_fault(), making us missing various lockdep checks. This doesn't apply to __copy_from,to_user() that explicitly handle these calls, neither is it a problem in x86-32 where copy_to,from_user() rely on the "__" prefixed versions that also call might_fault(). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1258382538-30979-1-git-send-email-fweisbec@gmail.com> [ v2: fix module export ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | x86-64: __copy_from_user_inatomic() adjustmentsJan Beulich2009-11-151-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This v2.6.26 commit: ad2fc2c: x86: fix copy_user on x86 rendered __copy_from_user_inatomic() identical to copy_user_generic(), yet didn't make the former just call the latter from an inline function. Furthermore, this v2.6.19 commit: b885808: [PATCH] Add proper sparse __user casts to __copy_to_user_inatomic converted the return type of __copy_to_user_inatomic() from unsigned long to int, but didn't do the same to __copy_from_user_inatomic(). Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: <v.mayatskih@gmail.com> LKML-Reference: <4AFD5778020000780001F8F4@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | x86: Turn the copy_from_user check into an (optional) compile time warningArjan van de Ven2009-10-011-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A previous patch added the buffer size check to copy_from_user(). One of the things learned from analyzing the result of the previous patch is that in general, gcc is really good at proving that the code contains sufficient security checks to not need to do a runtime check. But that for those cases where gcc could not prove this, there was a relatively high percentage of real security issues. This patch turns the case of "gcc cannot prove" into a compile time warning, as long as a sufficiently new gcc is in use that supports this. The objective is that these warnings will trigger developers checking new cases out before a security hole enters a linux kernel release. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Cc: Jan Beulich <jbeulich@novell.com> LKML-Reference: <20090930130523.348ae6c4@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | x86: Use __builtin_object_size() to validate the buffer size for ↵Arjan van de Ven2009-09-262-4/+4
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | copy_from_user() gcc (4.x) supports the __builtin_object_size() builtin, which reports the size of an object that a pointer point to, when known at compile time. If the buffer size is not known at compile time, a constant -1 is returned. This patch uses this feature to add a sanity check to copy_from_user(); if the target buffer is known to be smaller than the copy size, the copy is aborted and a WARNing is emitted in memory debug mode. These extra checks compile away when the object size is not known, or if both the buffer size and the copy length are constants. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> LKML-Reference: <20090926143301.2c396b94@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | x86: Gitignore: arch/x86/lib/inat-tables.cHiroshi Shimamoto2009-11-041-0/+1
| | | | | | | | | | | | | | | | | | Ignore generated file arch/x86/lib/inat-tables.c. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> LKML-Reference: <4AF0FBD7.7000501@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | x86: Add Intel FMA instructions to x86 opcode mapMasami Hiramatsu2009-10-291-1/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add Intel FMA(FUSED-MULTIPLY-ADD) instructions to x86 opcode map for x86 instruction decoder. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> LKML-Reference: <20091027204235.30545.33997.stgit@harusame> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | x86: AVX instruction set decoder supportMasami Hiramatsu2009-10-293-206/+289
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add Intel AVX(Advanced Vector Extensions) instruction set support to x86 instruction decoder. This adds insn.vex_prefix field for storing VEX prefixes, and introduces some original tags for expressing opcodes attributes. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> LKML-Reference: <20091027204226.30545.23451.stgit@harusame> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | x86: Add pclmulq to x86 opcode mapMasami Hiramatsu2009-10-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add pclmulq opcode to x86 opcode map. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> LKML-Reference: <20091027204219.30545.82039.stgit@harusame> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | x86: Merge INAT_REXPFX into INAT_PFX_*Masami Hiramatsu2009-10-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge INAT_REXPFX into INAT_PFX_* macro and rename it to INAT_PFX_REX. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> LKML-Reference: <20091027204211.30545.58090.stgit@harusame> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | x86: Fix SSE opcode map bugMasami Hiramatsu2009-10-291-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix superscripts position because some superscripts of SSE opcode are not put in correct position. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> LKML-Reference: <20091027204204.30545.97296.stgit@harusame> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | x86: Add AES opcodes to opcode mapMasami Hiramatsu2009-10-211-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | Add Intel AES opcodes to x86 opcode map. These opcodes are used in arch/x86/crypt/aesni-intel_asm.S. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: systemtap<systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091020165531.4145.21872.stgit@dhcp-100-2-132.bos.redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | x86: Fix group attribute decoding bugMasami Hiramatsu2009-10-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Fix a typo in inat_get_group_attribute() which should refer inat_group_tables, not inat_escape_tables. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: systemtap<systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091020165524.4145.97333.stgit@dhcp-100-2-132.bos.redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | Merge commit 'v2.6.32-rc5' into perf/probesIngo Molnar2009-10-172-1/+60
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: kernel/trace/trace_event_profile.c Merge reason: update to -rc5 and resolve conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>